Repository: alivanz/go-simd
Branch: main
Commit: f3b5f7d73797
Files: 76
Total size: 1.9 MB

Directory structure:
gitextract_6cse9yuu/

├── .gitignore
├── LICENSE
├── README.md
├── arm/
│   ├── generate.go
│   ├── neon/
│   │   ├── functions.c
│   │   ├── functions.go
│   │   ├── functions_bypass.go
│   │   ├── functions_cgo.go
│   │   ├── functions_test.go
│   │   ├── loops.c
│   │   ├── loops.go
│   │   └── loops_test.go
│   └── types.go
├── example/
│   ├── neon/
│   │   └── main.go
│   └── sse2/
│       └── main.go
├── generator/
│   ├── arm/
│   │   ├── arm.go
│   │   ├── main.go
│   │   └── sort.go
│   ├── scanner/
│   │   ├── scan.go
│   │   ├── scan_test.go
│   │   └── util.go
│   ├── types/
│   │   ├── function.go
│   │   └── type.go
│   ├── utils/
│   │   ├── download.go
│   │   ├── filter.go
│   │   └── slice.go
│   ├── writer/
│   │   ├── cgo.go
│   │   ├── function.go
│   │   ├── package.go
│   │   ├── package_test.go
│   │   ├── type.go
│   │   └── writer.go
│   └── x86/
│       ├── info.go
│       └── main.go
├── go.mod
├── go.sum
└── x86/
    ├── aes/
    │   ├── functions.c
    │   └── functions.go
    ├── avx/
    │   ├── functions.c
    │   └── functions.go
    ├── avx2/
    │   ├── functions.c
    │   └── functions.go
    ├── bmi/
    │   ├── functions.c
    │   └── functions.go
    ├── bmi2/
    │   ├── functions.c
    │   └── functions.go
    ├── crc32/
    │   ├── functions.c
    │   └── functions.go
    ├── f16c/
    │   ├── functions.c
    │   └── functions.go
    ├── fma/
    │   ├── functions.c
    │   └── functions.go
    ├── fsgsbase/
    │   ├── functions.c
    │   └── functions.go
    ├── generate.go
    ├── lzcnt/
    │   ├── functions.c
    │   └── functions.go
    ├── mmx/
    │   ├── functions.c
    │   └── functions.go
    ├── mmx_sse/
    │   ├── functions.c
    │   └── functions.go
    ├── mmx_sse2/
    │   ├── functions.c
    │   └── functions.go
    ├── mmx_ssse3/
    │   ├── functions.c
    │   └── functions.go
    ├── popcnt/
    │   ├── functions.c
    │   └── functions.go
    ├── sse/
    │   ├── functions.c
    │   └── functions.go
    ├── sse2/
    │   ├── functions.c
    │   └── functions.go
    ├── sse3/
    │   ├── functions.c
    │   └── functions.go
    ├── ssse3/
    │   ├── functions.c
    │   └── functions.go
    └── types.go

================================================
FILE CONTENTS
================================================

================================================
FILE: .gitignore
================================================
.vscode
raw.h
intrinsics.json
data.xml

================================================
FILE: LICENSE
================================================
MIT License

Copyright (c) 2023 Alivan Akbar

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.


================================================
FILE: README.md
================================================
# SIMD Implementation in Golang

This repository implements SIMD (Single Instruction, Multiple Data) operations in Go, currently targeting the ARM NEON instruction set. The goal is to provide optimized parallel processing for certain computational tasks.

## Future Plans

We are actively working on expanding the SIMD implementation to support x86 architecture as well. The upcoming x86 implementation will provide similar SIMD functionalities for parallel processing on x86-based systems.

## Hacks

Calling a C function through CGO incurs some overhead due to Go's design.
In general, avoiding CGO is a good idea.
But I found a hack: instead of relying on CGO, we can use the `linkname` directive to call C code, bypassing CGO and getting better performance.

```
goos: darwin
goarch: arm64
pkg: github.com/alivanz/go-simd/arm/neon
BenchmarkMultRef-8                131395              9168 ns/op
BenchmarkMultSimd-8               598742              1954 ns/op
BenchmarkMultSimdBypass-8         605554              1959 ns/op
BenchmarkMultSimdFull-8          1816879               661.3 ns/op
BenchmarkMultSimdCgo-8             13020             92213 ns/op
PASS
```

```
goos: darwin
goarch: arm64
pkg: github.com/alivanz/go-simd/arm/neon
cpu: Apple M2
BenchmarkVmulqF32N-8                8848            124616 ns/op        33657.86 MB/s       1422 B/op          0 allocs/op
BenchmarkVmulqF32C-8                2256            528683 ns/op        7933.49 MB/s        5577 B/op          0 allocs/op
BenchmarkVmulqF32Ref-8              3630            327995 ns/op        12787.69 MB/s       3466 B/op          0 allocs/op
PASS
ok      github.com/alivanz/go-simd/arm/neon     5.793s
```

The floating-point multiplication benchmarks show significant performance differences between implementations. All three report 0 allocs/op, so the differences come from per-call cost rather than heap allocation:

- `VmulqF32N` (Native): Achieves the highest throughput at 33.7 GB/s (124616 ns/op, 1422 B/op). This implementation leverages direct SIMD instructions for optimal performance.
- `VmulqF32C` (C): Shows the lowest throughput at 7.9 GB/s (528683 ns/op, 5577 B/op), likely due to the overhead of CGO calls.
- `VmulqF32Ref` (Reference): Reaches 12.8 GB/s (327995 ns/op, 3466 B/op) and serves as the baseline for comparison.

These results highlight the importance of using native SIMD implementations over CGO-based solutions for performance-critical applications. The native implementation is approximately 2.6x faster than the reference implementation, while the C implementation is about 1.6x slower than the reference.
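
The ratios quoted above follow directly from the `ns/op` column of the benchmark output. As a minimal sketch of that arithmetic (the `speedup` helper is a hypothetical name introduced only for illustration and is not part of this repository):

```go
package main

import "fmt"

// speedup reports how many times longer slowNs is than fastNs.
// It is an illustrative helper, not part of go-simd.
func speedup(slowNs, fastNs float64) float64 {
	return slowNs / fastNs
}

func main() {
	// ns/op values copied from the VmulqF32 benchmark output above.
	const (
		nativeNs = 124616.0 // BenchmarkVmulqF32N
		cNs      = 528683.0 // BenchmarkVmulqF32C
		refNs    = 327995.0 // BenchmarkVmulqF32Ref
	)

	fmt.Printf("native vs reference: %.2fx faster\n", speedup(refNs, nativeNs))
	fmt.Printf("C vs reference:      %.2fx slower\n", speedup(cNs, refNs))
}
```

Rounded to one decimal place, these are the 2.6x and 1.6x figures stated above.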

## Features

- SIMD operations for ARM NEON architecture.
- High-performance parallel processing for specific tasks.
- Utilizes the power of SIMD instructions to process multiple data elements simultaneously.
- Supports a range of data types, including integers and floating-point numbers.
- Modular design for easy integration into existing projects.
- Well-documented code for understanding and extending the implementation.

## Roadmap

- [x] Implement SIMD operations for ARM NEON architecture.
- [ ] Add support for x86 architecture.
- [ ] Expand SIMD operations for additional data types.
- [ ] Optimize performance for specific use cases.
- [ ] Develop comprehensive test suite for validation.

## Usage

To use the SIMD implementations in your project, follow these steps:

1. Import the required packages in your Go code:

```go
import (
	"github.com/alivanz/go-simd/arm"
	"github.com/alivanz/go-simd/arm/neon"
)
```

2. Use the SIMD functions in your code as needed. Example:

```go
package main

import (
	"log"

	"github.com/alivanz/go-simd/arm"
	"github.com/alivanz/go-simd/arm/neon"
)

func main() {
	var a, b arm.Int8X8
	var add, mul arm.Int16X8
	for i := 0; i < 8; i++ {
		a[i] = arm.Int8(i)
		b[i] = arm.Int8(i * i)
	}
	log.Printf("a = %+v", a)
	log.Printf("b = %+v", b)
	neon.VaddlS8(&add, &a, &b)
	neon.VmullS8(&mul, &a, &b)
	log.Printf("add = %+v", add)
	log.Printf("mul = %+v", mul)
}

```
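
To make it easier to check the output of the example above on any machine, here is a plain-Go scalar sketch of what the two widening operations compute per lane: each `int8` lane is extended to `int16` before the add or multiply, so the result cannot overflow 8 bits. The helper names `widenAddS8` and `widenMulS8` are hypothetical and use plain arrays rather than the `arm.Int8X8` and `arm.Int16X8` types; this illustrates the semantics only, not how the library implements them.

```go
package main

import "fmt"

// widenAddS8 is an illustrative scalar reference for a widening add:
// every int8 lane is sign-extended to int16 and then added.
func widenAddS8(a, b [8]int8) [8]int16 {
	var r [8]int16
	for i := range r {
		r[i] = int16(a[i]) + int16(b[i])
	}
	return r
}

// widenMulS8 is an illustrative scalar reference for a widening multiply:
// every int8 lane is sign-extended to int16 and then multiplied.
func widenMulS8(a, b [8]int8) [8]int16 {
	var r [8]int16
	for i := range r {
		r[i] = int16(a[i]) * int16(b[i])
	}
	return r
}

func main() {
	// Same inputs as the NEON example: a[i] = i, b[i] = i*i.
	var a, b [8]int8
	for i := 0; i < 8; i++ {
		a[i] = int8(i)
		b[i] = int8(i * i)
	}

	fmt.Println("add =", widenAddS8(a, b)) // add = [0 2 6 12 20 30 42 56]
	fmt.Println("mul =", widenMulS8(a, b)) // mul = [0 1 8 27 64 125 216 343]
}
```

Note that `mul[7]` is 343, which would not fit in an `int8`; that is why the result type is twice as wide as the inputs.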

## Supported Operations

Only ARM NEON is supported for now.

Refer to the documentation in each respective file for more details on how to use each operation.

## Contributing

Contributions to this project are welcome. To contribute, please follow these steps:

1. Fork the repository.
2. Create a new branch for your feature or bug fix.
3. Make your changes and commit them with descriptive messages.
4. Push your changes to your forked repository.
5. Submit a pull request to the main repository.

Please ensure that your code follows the existing code style and includes appropriate tests.

## Acknowledgments

- The ARM NEON architecture documentation for providing valuable insights into SIMD programming techniques.
- The open-source community for their contributions and inspiration.

## Contact

For any questions or feedback regarding this repository, please feel free to contact me at [alivan1627@gmail.com](mailto:alivan1627@gmail.com).

================================================
FILE: arm/generate.go
================================================
package arm

//go:generate go run ../generator/arm


================================================
FILE: arm/neon/functions.c
================================================
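#include <arm_neon.h>

// Each wrapper below takes pointers to its result (r) and operands (v0, v1, ...),
// dereferences the operands, applies the matching NEON intrinsic, and stores
// the result through r.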

void VabaS8(int8x8_t* r, int8x8_t* v0, int8x8_t* v1, int8x8_t* v2) { *r = vaba_s8(*v0, *v1, *v2); }
void VabaS16(int16x4_t* r, int16x4_t* v0, int16x4_t* v1, int16x4_t* v2) { *r = vaba_s16(*v0, *v1, *v2); }
void VabaS32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1, int32x2_t* v2) { *r = vaba_s32(*v0, *v1, *v2); }
void VabaU8(uint8x8_t* r, uint8x8_t* v0, uint8x8_t* v1, uint8x8_t* v2) { *r = vaba_u8(*v0, *v1, *v2); }
void VabaU16(uint16x4_t* r, uint16x4_t* v0, uint16x4_t* v1, uint16x4_t* v2) { *r = vaba_u16(*v0, *v1, *v2); }
void VabaU32(uint32x2_t* r, uint32x2_t* v0, uint32x2_t* v1, uint32x2_t* v2) { *r = vaba_u32(*v0, *v1, *v2); }
void VabalS8(int16x8_t* r, int16x8_t* v0, int8x8_t* v1, int8x8_t* v2) { *r = vabal_s8(*v0, *v1, *v2); }
void VabalS16(int32x4_t* r, int32x4_t* v0, int16x4_t* v1, int16x4_t* v2) { *r = vabal_s16(*v0, *v1, *v2); }
void VabalS32(int64x2_t* r, int64x2_t* v0, int32x2_t* v1, int32x2_t* v2) { *r = vabal_s32(*v0, *v1, *v2); }
void VabalU8(uint16x8_t* r, uint16x8_t* v0, uint8x8_t* v1, uint8x8_t* v2) { *r = vabal_u8(*v0, *v1, *v2); }
void VabalU16(uint32x4_t* r, uint32x4_t* v0, uint16x4_t* v1, uint16x4_t* v2) { *r = vabal_u16(*v0, *v1, *v2); }
void VabalU32(uint64x2_t* r, uint64x2_t* v0, uint32x2_t* v1, uint32x2_t* v2) { *r = vabal_u32(*v0, *v1, *v2); }
void VabalHighS8(int16x8_t* r, int16x8_t* v0, int8x16_t* v1, int8x16_t* v2) { *r = vabal_high_s8(*v0, *v1, *v2); }
void VabalHighS16(int32x4_t* r, int32x4_t* v0, int16x8_t* v1, int16x8_t* v2) { *r = vabal_high_s16(*v0, *v1, *v2); }
void VabalHighS32(int64x2_t* r, int64x2_t* v0, int32x4_t* v1, int32x4_t* v2) { *r = vabal_high_s32(*v0, *v1, *v2); }
void VabalHighU8(uint16x8_t* r, uint16x8_t* v0, uint8x16_t* v1, uint8x16_t* v2) { *r = vabal_high_u8(*v0, *v1, *v2); }
void VabalHighU16(uint32x4_t* r, uint32x4_t* v0, uint16x8_t* v1, uint16x8_t* v2) { *r = vabal_high_u16(*v0, *v1, *v2); }
void VabalHighU32(uint64x2_t* r, uint64x2_t* v0, uint32x4_t* v1, uint32x4_t* v2) { *r = vabal_high_u32(*v0, *v1, *v2); }
void VabaqS8(int8x16_t* r, int8x16_t* v0, int8x16_t* v1, int8x16_t* v2) { *r = vabaq_s8(*v0, *v1, *v2); }
void VabaqS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1, int16x8_t* v2) { *r = vabaq_s16(*v0, *v1, *v2); }
void VabaqS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1, int32x4_t* v2) { *r = vabaq_s32(*v0, *v1, *v2); }
void VabaqU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1, uint8x16_t* v2) { *r = vabaq_u8(*v0, *v1, *v2); }
void VabaqU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1, uint16x8_t* v2) { *r = vabaq_u16(*v0, *v1, *v2); }
void VabaqU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1, uint32x4_t* v2) { *r = vabaq_u32(*v0, *v1, *v2); }
void VabdS8(int8x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vabd_s8(*v0, *v1); }
void VabdS16(int16x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vabd_s16(*v0, *v1); }
void VabdS32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vabd_s32(*v0, *v1); }
void VabdU8(uint8x8_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vabd_u8(*v0, *v1); }
void VabdU16(uint16x4_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = vabd_u16(*v0, *v1); }
void VabdU32(uint32x2_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = vabd_u32(*v0, *v1); }
void VabdF32(float32x2_t* r, float32x2_t* v0, float32x2_t* v1) { *r = vabd_f32(*v0, *v1); }
void VabdF64(float64x1_t* r, float64x1_t* v0, float64x1_t* v1) { *r = vabd_f64(*v0, *v1); }
void VabddF64(float64_t* r, float64_t* v0, float64_t* v1) { *r = vabdd_f64(*v0, *v1); }
void VabdlS8(int16x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vabdl_s8(*v0, *v1); }
void VabdlS16(int32x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vabdl_s16(*v0, *v1); }
void VabdlS32(int64x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vabdl_s32(*v0, *v1); }
void VabdlU8(uint16x8_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vabdl_u8(*v0, *v1); }
void VabdlU16(uint32x4_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = vabdl_u16(*v0, *v1); }
void VabdlU32(uint64x2_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = vabdl_u32(*v0, *v1); }
void VabdlHighS8(int16x8_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vabdl_high_s8(*v0, *v1); }
void VabdlHighS16(int32x4_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vabdl_high_s16(*v0, *v1); }
void VabdlHighS32(int64x2_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vabdl_high_s32(*v0, *v1); }
void VabdlHighU8(uint16x8_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vabdl_high_u8(*v0, *v1); }
void VabdlHighU16(uint32x4_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vabdl_high_u16(*v0, *v1); }
void VabdlHighU32(uint64x2_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vabdl_high_u32(*v0, *v1); }
void VabdqS8(int8x16_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vabdq_s8(*v0, *v1); }
void VabdqS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vabdq_s16(*v0, *v1); }
void VabdqS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vabdq_s32(*v0, *v1); }
void VabdqU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vabdq_u8(*v0, *v1); }
void VabdqU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vabdq_u16(*v0, *v1); }
void VabdqU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vabdq_u32(*v0, *v1); }
void VabdqF32(float32x4_t* r, float32x4_t* v0, float32x4_t* v1) { *r = vabdq_f32(*v0, *v1); }
void VabdqF64(float64x2_t* r, float64x2_t* v0, float64x2_t* v1) { *r = vabdq_f64(*v0, *v1); }
void VabdsF32(float32_t* r, float32_t* v0, float32_t* v1) { *r = vabds_f32(*v0, *v1); }
void VabsS8(int8x8_t* r, int8x8_t* v0) { *r = vabs_s8(*v0); }
void VabsS16(int16x4_t* r, int16x4_t* v0) { *r = vabs_s16(*v0); }
void VabsS32(int32x2_t* r, int32x2_t* v0) { *r = vabs_s32(*v0); }
void VabsS64(int64x1_t* r, int64x1_t* v0) { *r = vabs_s64(*v0); }
void VabsF32(float32x2_t* r, float32x2_t* v0) { *r = vabs_f32(*v0); }
void VabsF64(float64x1_t* r, float64x1_t* v0) { *r = vabs_f64(*v0); }
void VabsdS64(int64_t* r, int64_t* v0) { *r = vabsd_s64(*v0); }
void VabsqS8(int8x16_t* r, int8x16_t* v0) { *r = vabsq_s8(*v0); }
void VabsqS16(int16x8_t* r, int16x8_t* v0) { *r = vabsq_s16(*v0); }
void VabsqS32(int32x4_t* r, int32x4_t* v0) { *r = vabsq_s32(*v0); }
void VabsqS64(int64x2_t* r, int64x2_t* v0) { *r = vabsq_s64(*v0); }
void VabsqF32(float32x4_t* r, float32x4_t* v0) { *r = vabsq_f32(*v0); }
void VabsqF64(float64x2_t* r, float64x2_t* v0) { *r = vabsq_f64(*v0); }
void VaddS8(int8x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vadd_s8(*v0, *v1); }
void VaddS16(int16x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vadd_s16(*v0, *v1); }
void VaddS32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vadd_s32(*v0, *v1); }
void VaddS64(int64x1_t* r, int64x1_t* v0, int64x1_t* v1) { *r = vadd_s64(*v0, *v1); }
void VaddU8(uint8x8_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vadd_u8(*v0, *v1); }
void VaddU16(uint16x4_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = vadd_u16(*v0, *v1); }
void VaddU32(uint32x2_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = vadd_u32(*v0, *v1); }
void VaddU64(uint64x1_t* r, uint64x1_t* v0, uint64x1_t* v1) { *r = vadd_u64(*v0, *v1); }
void VaddF32(float32x2_t* r, float32x2_t* v0, float32x2_t* v1) { *r = vadd_f32(*v0, *v1); }
void VaddF64(float64x1_t* r, float64x1_t* v0, float64x1_t* v1) { *r = vadd_f64(*v0, *v1); }
void VaddP16(poly16x4_t* r, poly16x4_t* v0, poly16x4_t* v1) { *r = vadd_p16(*v0, *v1); }
void VaddP64(poly64x1_t* r, poly64x1_t* v0, poly64x1_t* v1) { *r = vadd_p64(*v0, *v1); }
void VaddP8(poly8x8_t* r, poly8x8_t* v0, poly8x8_t* v1) { *r = vadd_p8(*v0, *v1); }
void VadddS64(int64_t* r, int64_t* v0, int64_t* v1) { *r = vaddd_s64(*v0, *v1); }
void VadddU64(uint64_t* r, uint64_t* v0, uint64_t* v1) { *r = vaddd_u64(*v0, *v1); }
void VaddhnS16(int8x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vaddhn_s16(*v0, *v1); }
void VaddhnS32(int16x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vaddhn_s32(*v0, *v1); }
void VaddhnS64(int32x2_t* r, int64x2_t* v0, int64x2_t* v1) { *r = vaddhn_s64(*v0, *v1); }
void VaddhnU16(uint8x8_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vaddhn_u16(*v0, *v1); }
void VaddhnU32(uint16x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vaddhn_u32(*v0, *v1); }
void VaddhnU64(uint32x2_t* r, uint64x2_t* v0, uint64x2_t* v1) { *r = vaddhn_u64(*v0, *v1); }
void VaddhnHighS16(int8x16_t* r, int8x8_t* v0, int16x8_t* v1, int16x8_t* v2) { *r = vaddhn_high_s16(*v0, *v1, *v2); }
void VaddhnHighS32(int16x8_t* r, int16x4_t* v0, int32x4_t* v1, int32x4_t* v2) { *r = vaddhn_high_s32(*v0, *v1, *v2); }
void VaddhnHighS64(int32x4_t* r, int32x2_t* v0, int64x2_t* v1, int64x2_t* v2) { *r = vaddhn_high_s64(*v0, *v1, *v2); }
void VaddhnHighU16(uint8x16_t* r, uint8x8_t* v0, uint16x8_t* v1, uint16x8_t* v2) { *r = vaddhn_high_u16(*v0, *v1, *v2); }
void VaddhnHighU32(uint16x8_t* r, uint16x4_t* v0, uint32x4_t* v1, uint32x4_t* v2) { *r = vaddhn_high_u32(*v0, *v1, *v2); }
void VaddhnHighU64(uint32x4_t* r, uint32x2_t* v0, uint64x2_t* v1, uint64x2_t* v2) { *r = vaddhn_high_u64(*v0, *v1, *v2); }
void VaddlS8(int16x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vaddl_s8(*v0, *v1); }
void VaddlS16(int32x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vaddl_s16(*v0, *v1); }
void VaddlS32(int64x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vaddl_s32(*v0, *v1); }
void VaddlU8(uint16x8_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vaddl_u8(*v0, *v1); }
void VaddlU16(uint32x4_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = vaddl_u16(*v0, *v1); }
void VaddlU32(uint64x2_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = vaddl_u32(*v0, *v1); }
void VaddlHighS8(int16x8_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vaddl_high_s8(*v0, *v1); }
void VaddlHighS16(int32x4_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vaddl_high_s16(*v0, *v1); }
void VaddlHighS32(int64x2_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vaddl_high_s32(*v0, *v1); }
void VaddlHighU8(uint16x8_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vaddl_high_u8(*v0, *v1); }
void VaddlHighU16(uint32x4_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vaddl_high_u16(*v0, *v1); }
void VaddlHighU32(uint64x2_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vaddl_high_u32(*v0, *v1); }
void VaddlvS8(int16_t* r, int8x8_t* v0) { *r = vaddlv_s8(*v0); }
void VaddlvS16(int32_t* r, int16x4_t* v0) { *r = vaddlv_s16(*v0); }
void VaddlvS32(int64_t* r, int32x2_t* v0) { *r = vaddlv_s32(*v0); }
void VaddlvU8(uint16_t* r, uint8x8_t* v0) { *r = vaddlv_u8(*v0); }
void VaddlvU16(uint32_t* r, uint16x4_t* v0) { *r = vaddlv_u16(*v0); }
void VaddlvU32(uint64_t* r, uint32x2_t* v0) { *r = vaddlv_u32(*v0); }
void VaddlvqS8(int16_t* r, int8x16_t* v0) { *r = vaddlvq_s8(*v0); }
void VaddlvqS16(int32_t* r, int16x8_t* v0) { *r = vaddlvq_s16(*v0); }
void VaddlvqS32(int64_t* r, int32x4_t* v0) { *r = vaddlvq_s32(*v0); }
void VaddlvqU8(uint16_t* r, uint8x16_t* v0) { *r = vaddlvq_u8(*v0); }
void VaddlvqU16(uint32_t* r, uint16x8_t* v0) { *r = vaddlvq_u16(*v0); }
void VaddlvqU32(uint64_t* r, uint32x4_t* v0) { *r = vaddlvq_u32(*v0); }
void VaddqS8(int8x16_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vaddq_s8(*v0, *v1); }
void VaddqS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vaddq_s16(*v0, *v1); }
void VaddqS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vaddq_s32(*v0, *v1); }
void VaddqS64(int64x2_t* r, int64x2_t* v0, int64x2_t* v1) { *r = vaddq_s64(*v0, *v1); }
void VaddqU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vaddq_u8(*v0, *v1); }
void VaddqU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vaddq_u16(*v0, *v1); }
void VaddqU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vaddq_u32(*v0, *v1); }
void VaddqU64(uint64x2_t* r, uint64x2_t* v0, uint64x2_t* v1) { *r = vaddq_u64(*v0, *v1); }
void VaddqF32(float32x4_t* r, float32x4_t* v0, float32x4_t* v1) { *r = vaddq_f32(*v0, *v1); }
void VaddqF64(float64x2_t* r, float64x2_t* v0, float64x2_t* v1) { *r = vaddq_f64(*v0, *v1); }
void VaddqP128(poly128_t* r, poly128_t* v0, poly128_t* v1) { *r = vaddq_p128(*v0, *v1); }
void VaddqP16(poly16x8_t* r, poly16x8_t* v0, poly16x8_t* v1) { *r = vaddq_p16(*v0, *v1); }
void VaddqP64(poly64x2_t* r, poly64x2_t* v0, poly64x2_t* v1) { *r = vaddq_p64(*v0, *v1); }
void VaddqP8(poly8x16_t* r, poly8x16_t* v0, poly8x16_t* v1) { *r = vaddq_p8(*v0, *v1); }
void VaddvS8(int8_t* r, int8x8_t* v0) { *r = vaddv_s8(*v0); }
void VaddvS16(int16_t* r, int16x4_t* v0) { *r = vaddv_s16(*v0); }
void VaddvS32(int32_t* r, int32x2_t* v0) { *r = vaddv_s32(*v0); }
void VaddvU8(uint8_t* r, uint8x8_t* v0) { *r = vaddv_u8(*v0); }
void VaddvU16(uint16_t* r, uint16x4_t* v0) { *r = vaddv_u16(*v0); }
void VaddvU32(uint32_t* r, uint32x2_t* v0) { *r = vaddv_u32(*v0); }
void VaddvF32(float32_t* r, float32x2_t* v0) { *r = vaddv_f32(*v0); }
void VaddvqS8(int8_t* r, int8x16_t* v0) { *r = vaddvq_s8(*v0); }
void VaddvqS16(int16_t* r, int16x8_t* v0) { *r = vaddvq_s16(*v0); }
void VaddvqS32(int32_t* r, int32x4_t* v0) { *r = vaddvq_s32(*v0); }
void VaddvqS64(int64_t* r, int64x2_t* v0) { *r = vaddvq_s64(*v0); }
void VaddvqU8(uint8_t* r, uint8x16_t* v0) { *r = vaddvq_u8(*v0); }
void VaddvqU16(uint16_t* r, uint16x8_t* v0) { *r = vaddvq_u16(*v0); }
void VaddvqU32(uint32_t* r, uint32x4_t* v0) { *r = vaddvq_u32(*v0); }
void VaddvqU64(uint64_t* r, uint64x2_t* v0) { *r = vaddvq_u64(*v0); }
void VaddvqF32(float32_t* r, float32x4_t* v0) { *r = vaddvq_f32(*v0); }
void VaddvqF64(float64_t* r, float64x2_t* v0) { *r = vaddvq_f64(*v0); }
void VaddwS8(int16x8_t* r, int16x8_t* v0, int8x8_t* v1) { *r = vaddw_s8(*v0, *v1); }
void VaddwS16(int32x4_t* r, int32x4_t* v0, int16x4_t* v1) { *r = vaddw_s16(*v0, *v1); }
void VaddwS32(int64x2_t* r, int64x2_t* v0, int32x2_t* v1) { *r = vaddw_s32(*v0, *v1); }
void VaddwU8(uint16x8_t* r, uint16x8_t* v0, uint8x8_t* v1) { *r = vaddw_u8(*v0, *v1); }
void VaddwU16(uint32x4_t* r, uint32x4_t* v0, uint16x4_t* v1) { *r = vaddw_u16(*v0, *v1); }
void VaddwU32(uint64x2_t* r, uint64x2_t* v0, uint32x2_t* v1) { *r = vaddw_u32(*v0, *v1); }
void VaddwHighS8(int16x8_t* r, int16x8_t* v0, int8x16_t* v1) { *r = vaddw_high_s8(*v0, *v1); }
void VaddwHighS16(int32x4_t* r, int32x4_t* v0, int16x8_t* v1) { *r = vaddw_high_s16(*v0, *v1); }
void VaddwHighS32(int64x2_t* r, int64x2_t* v0, int32x4_t* v1) { *r = vaddw_high_s32(*v0, *v1); }
void VaddwHighU8(uint16x8_t* r, uint16x8_t* v0, uint8x16_t* v1) { *r = vaddw_high_u8(*v0, *v1); }
void VaddwHighU16(uint32x4_t* r, uint32x4_t* v0, uint16x8_t* v1) { *r = vaddw_high_u16(*v0, *v1); }
void VaddwHighU32(uint64x2_t* r, uint64x2_t* v0, uint32x4_t* v1) { *r = vaddw_high_u32(*v0, *v1); }
void VaesdqU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vaesdq_u8(*v0, *v1); }
void VaeseqU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vaeseq_u8(*v0, *v1); }
void VaesimcqU8(uint8x16_t* r, uint8x16_t* v0) { *r = vaesimcq_u8(*v0); }
void VaesmcqU8(uint8x16_t* r, uint8x16_t* v0) { *r = vaesmcq_u8(*v0); }
void VandS8(int8x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vand_s8(*v0, *v1); }
void VandS16(int16x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vand_s16(*v0, *v1); }
void VandS32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vand_s32(*v0, *v1); }
void VandS64(int64x1_t* r, int64x1_t* v0, int64x1_t* v1) { *r = vand_s64(*v0, *v1); }
void VandU8(uint8x8_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vand_u8(*v0, *v1); }
void VandU16(uint16x4_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = vand_u16(*v0, *v1); }
void VandU32(uint32x2_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = vand_u32(*v0, *v1); }
void VandU64(uint64x1_t* r, uint64x1_t* v0, uint64x1_t* v1) { *r = vand_u64(*v0, *v1); }
void VandqS8(int8x16_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vandq_s8(*v0, *v1); }
void VandqS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vandq_s16(*v0, *v1); }
void VandqS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vandq_s32(*v0, *v1); }
void VandqS64(int64x2_t* r, int64x2_t* v0, int64x2_t* v1) { *r = vandq_s64(*v0, *v1); }
void VandqU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vandq_u8(*v0, *v1); }
void VandqU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vandq_u16(*v0, *v1); }
void VandqU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vandq_u32(*v0, *v1); }
void VandqU64(uint64x2_t* r, uint64x2_t* v0, uint64x2_t* v1) { *r = vandq_u64(*v0, *v1); }
void VbcaxqS8(int8x16_t* r, int8x16_t* v0, int8x16_t* v1, int8x16_t* v2) { *r = vbcaxq_s8(*v0, *v1, *v2); }
void VbcaxqS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1, int16x8_t* v2) { *r = vbcaxq_s16(*v0, *v1, *v2); }
void VbcaxqS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1, int32x4_t* v2) { *r = vbcaxq_s32(*v0, *v1, *v2); }
void VbcaxqS64(int64x2_t* r, int64x2_t* v0, int64x2_t* v1, int64x2_t* v2) { *r = vbcaxq_s64(*v0, *v1, *v2); }
void VbcaxqU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1, uint8x16_t* v2) { *r = vbcaxq_u8(*v0, *v1, *v2); }
void VbcaxqU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1, uint16x8_t* v2) { *r = vbcaxq_u16(*v0, *v1, *v2); }
void VbcaxqU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1, uint32x4_t* v2) { *r = vbcaxq_u32(*v0, *v1, *v2); }
void VbcaxqU64(uint64x2_t* r, uint64x2_t* v0, uint64x2_t* v1, uint64x2_t* v2) { *r = vbcaxq_u64(*v0, *v1, *v2); }
void VbicS8(int8x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vbic_s8(*v0, *v1); }
void VbicS16(int16x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vbic_s16(*v0, *v1); }
void VbicS32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vbic_s32(*v0, *v1); }
void VbicS64(int64x1_t* r, int64x1_t* v0, int64x1_t* v1) { *r = vbic_s64(*v0, *v1); }
void VbicU8(uint8x8_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vbic_u8(*v0, *v1); }
void VbicU16(uint16x4_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = vbic_u16(*v0, *v1); }
void VbicU32(uint32x2_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = vbic_u32(*v0, *v1); }
void VbicU64(uint64x1_t* r, uint64x1_t* v0, uint64x1_t* v1) { *r = vbic_u64(*v0, *v1); }
void VbicqS8(int8x16_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vbicq_s8(*v0, *v1); }
void VbicqS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vbicq_s16(*v0, *v1); }
void VbicqS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vbicq_s32(*v0, *v1); }
void VbicqS64(int64x2_t* r, int64x2_t* v0, int64x2_t* v1) { *r = vbicq_s64(*v0, *v1); }
void VbicqU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vbicq_u8(*v0, *v1); }
void VbicqU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vbicq_u16(*v0, *v1); }
void VbicqU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vbicq_u32(*v0, *v1); }
void VbicqU64(uint64x2_t* r, uint64x2_t* v0, uint64x2_t* v1) { *r = vbicq_u64(*v0, *v1); }
void VbslS8(int8x8_t* r, uint8x8_t* v0, int8x8_t* v1, int8x8_t* v2) { *r = vbsl_s8(*v0, *v1, *v2); }
void VbslS16(int16x4_t* r, uint16x4_t* v0, int16x4_t* v1, int16x4_t* v2) { *r = vbsl_s16(*v0, *v1, *v2); }
void VbslS32(int32x2_t* r, uint32x2_t* v0, int32x2_t* v1, int32x2_t* v2) { *r = vbsl_s32(*v0, *v1, *v2); }
void VbslS64(int64x1_t* r, uint64x1_t* v0, int64x1_t* v1, int64x1_t* v2) { *r = vbsl_s64(*v0, *v1, *v2); }
void VbslU8(uint8x8_t* r, uint8x8_t* v0, uint8x8_t* v1, uint8x8_t* v2) { *r = vbsl_u8(*v0, *v1, *v2); }
void VbslU16(uint16x4_t* r, uint16x4_t* v0, uint16x4_t* v1, uint16x4_t* v2) { *r = vbsl_u16(*v0, *v1, *v2); }
void VbslU32(uint32x2_t* r, uint32x2_t* v0, uint32x2_t* v1, uint32x2_t* v2) { *r = vbsl_u32(*v0, *v1, *v2); }
void VbslU64(uint64x1_t* r, uint64x1_t* v0, uint64x1_t* v1, uint64x1_t* v2) { *r = vbsl_u64(*v0, *v1, *v2); }
void VbslF32(float32x2_t* r, uint32x2_t* v0, float32x2_t* v1, float32x2_t* v2) { *r = vbsl_f32(*v0, *v1, *v2); }
void VbslF64(float64x1_t* r, uint64x1_t* v0, float64x1_t* v1, float64x1_t* v2) { *r = vbsl_f64(*v0, *v1, *v2); }
void VbslP16(poly16x4_t* r, uint16x4_t* v0, poly16x4_t* v1, poly16x4_t* v2) { *r = vbsl_p16(*v0, *v1, *v2); }
void VbslP64(poly64x1_t* r, uint64x1_t* v0, poly64x1_t* v1, poly64x1_t* v2) { *r = vbsl_p64(*v0, *v1, *v2); }
void VbslP8(poly8x8_t* r, uint8x8_t* v0, poly8x8_t* v1, poly8x8_t* v2) { *r = vbsl_p8(*v0, *v1, *v2); }
void VbslqS8(int8x16_t* r, uint8x16_t* v0, int8x16_t* v1, int8x16_t* v2) { *r = vbslq_s8(*v0, *v1, *v2); }
void VbslqS16(int16x8_t* r, uint16x8_t* v0, int16x8_t* v1, int16x8_t* v2) { *r = vbslq_s16(*v0, *v1, *v2); }
void VbslqS32(int32x4_t* r, uint32x4_t* v0, int32x4_t* v1, int32x4_t* v2) { *r = vbslq_s32(*v0, *v1, *v2); }
void VbslqS64(int64x2_t* r, uint64x2_t* v0, int64x2_t* v1, int64x2_t* v2) { *r = vbslq_s64(*v0, *v1, *v2); }
void VbslqU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1, uint8x16_t* v2) { *r = vbslq_u8(*v0, *v1, *v2); }
void VbslqU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1, uint16x8_t* v2) { *r = vbslq_u16(*v0, *v1, *v2); }
void VbslqU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1, uint32x4_t* v2) { *r = vbslq_u32(*v0, *v1, *v2); }
void VbslqU64(uint64x2_t* r, uint64x2_t* v0, uint64x2_t* v1, uint64x2_t* v2) { *r = vbslq_u64(*v0, *v1, *v2); }
void VbslqF32(float32x4_t* r, uint32x4_t* v0, float32x4_t* v1, float32x4_t* v2) { *r = vbslq_f32(*v0, *v1, *v2); }
void VbslqF64(float64x2_t* r, uint64x2_t* v0, float64x2_t* v1, float64x2_t* v2) { *r = vbslq_f64(*v0, *v1, *v2); }
void VbslqP16(poly16x8_t* r, uint16x8_t* v0, poly16x8_t* v1, poly16x8_t* v2) { *r = vbslq_p16(*v0, *v1, *v2); }
void VbslqP64(poly64x2_t* r, uint64x2_t* v0, poly64x2_t* v1, poly64x2_t* v2) { *r = vbslq_p64(*v0, *v1, *v2); }
void VbslqP8(poly8x16_t* r, uint8x16_t* v0, poly8x16_t* v1, poly8x16_t* v2) { *r = vbslq_p8(*v0, *v1, *v2); }
void VcaddRot270F32(float32x2_t* r, float32x2_t* v0, float32x2_t* v1) { *r = vcadd_rot270_f32(*v0, *v1); }
void VcaddRot90F32(float32x2_t* r, float32x2_t* v0, float32x2_t* v1) { *r = vcadd_rot90_f32(*v0, *v1); }
void VcaddqRot270F32(float32x4_t* r, float32x4_t* v0, float32x4_t* v1) { *r = vcaddq_rot270_f32(*v0, *v1); }
void VcaddqRot270F64(float64x2_t* r, float64x2_t* v0, float64x2_t* v1) { *r = vcaddq_rot270_f64(*v0, *v1); }
void VcaddqRot90F32(float32x4_t* r, float32x4_t* v0, float32x4_t* v1) { *r = vcaddq_rot90_f32(*v0, *v1); }
void VcaddqRot90F64(float64x2_t* r, float64x2_t* v0, float64x2_t* v1) { *r = vcaddq_rot90_f64(*v0, *v1); }
void VcageF32(uint32x2_t* r, float32x2_t* v0, float32x2_t* v1) { *r = vcage_f32(*v0, *v1); }
void VcageF64(uint64x1_t* r, float64x1_t* v0, float64x1_t* v1) { *r = vcage_f64(*v0, *v1); }
void VcagedF64(uint64_t* r, float64_t* v0, float64_t* v1) { *r = vcaged_f64(*v0, *v1); }
void VcageqF32(uint32x4_t* r, float32x4_t* v0, float32x4_t* v1) { *r = vcageq_f32(*v0, *v1); }
void VcageqF64(uint64x2_t* r, float64x2_t* v0, float64x2_t* v1) { *r = vcageq_f64(*v0, *v1); }
void VcagesF32(uint32_t* r, float32_t* v0, float32_t* v1) { *r = vcages_f32(*v0, *v1); }
void VcagtF32(uint32x2_t* r, float32x2_t* v0, float32x2_t* v1) { *r = vcagt_f32(*v0, *v1); }
void VcagtF64(uint64x1_t* r, float64x1_t* v0, float64x1_t* v1) { *r = vcagt_f64(*v0, *v1); }
void VcagtdF64(uint64_t* r, float64_t* v0, float64_t* v1) { *r = vcagtd_f64(*v0, *v1); }
void VcagtqF32(uint32x4_t* r, float32x4_t* v0, float32x4_t* v1) { *r = vcagtq_f32(*v0, *v1); }
void VcagtqF64(uint64x2_t* r, float64x2_t* v0, float64x2_t* v1) { *r = vcagtq_f64(*v0, *v1); }
void VcagtsF32(uint32_t* r, float32_t* v0, float32_t* v1) { *r = vcagts_f32(*v0, *v1); }
void VcaleF32(uint32x2_t* r, float32x2_t* v0, float32x2_t* v1) { *r = vcale_f32(*v0, *v1); }
void VcaleF64(uint64x1_t* r, float64x1_t* v0, float64x1_t* v1) { *r = vcale_f64(*v0, *v1); }
void VcaledF64(uint64_t* r, float64_t* v0, float64_t* v1) { *r = vcaled_f64(*v0, *v1); }
void VcaleqF32(uint32x4_t* r, float32x4_t* v0, float32x4_t* v1) { *r = vcaleq_f32(*v0, *v1); }
void VcaleqF64(uint64x2_t* r, float64x2_t* v0, float64x2_t* v1) { *r = vcaleq_f64(*v0, *v1); }
void VcalesF32(uint32_t* r, float32_t* v0, float32_t* v1) { *r = vcales_f32(*v0, *v1); }
void VcaltF32(uint32x2_t* r, float32x2_t* v0, float32x2_t* v1) { *r = vcalt_f32(*v0, *v1); }
void VcaltF64(uint64x1_t* r, float64x1_t* v0, float64x1_t* v1) { *r = vcalt_f64(*v0, *v1); }
void VcaltdF64(uint64_t* r, float64_t* v0, float64_t* v1) { *r = vcaltd_f64(*v0, *v1); }
void VcaltqF32(uint32x4_t* r, float32x4_t* v0, float32x4_t* v1) { *r = vcaltq_f32(*v0, *v1); }
void VcaltqF64(uint64x2_t* r, float64x2_t* v0, float64x2_t* v1) { *r = vcaltq_f64(*v0, *v1); }
void VcaltsF32(uint32_t* r, float32_t* v0, float32_t* v1) { *r = vcalts_f32(*v0, *v1); }
void VceqS8(uint8x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vceq_s8(*v0, *v1); }
void VceqS16(uint16x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vceq_s16(*v0, *v1); }
void VceqS32(uint32x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vceq_s32(*v0, *v1); }
void VceqS64(uint64x1_t* r, int64x1_t* v0, int64x1_t* v1) { *r = vceq_s64(*v0, *v1); }
void VceqU8(uint8x8_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vceq_u8(*v0, *v1); }
void VceqU16(uint16x4_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = vceq_u16(*v0, *v1); }
void VceqU32(uint32x2_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = vceq_u32(*v0, *v1); }
void VceqU64(uint64x1_t* r, uint64x1_t* v0, uint64x1_t* v1) { *r = vceq_u64(*v0, *v1); }
void VceqF32(uint32x2_t* r, float32x2_t* v0, float32x2_t* v1) { *r = vceq_f32(*v0, *v1); }
void VceqF64(uint64x1_t* r, float64x1_t* v0, float64x1_t* v1) { *r = vceq_f64(*v0, *v1); }
void VceqP64(uint64x1_t* r, poly64x1_t* v0, poly64x1_t* v1) { *r = vceq_p64(*v0, *v1); }
void VceqP8(uint8x8_t* r, poly8x8_t* v0, poly8x8_t* v1) { *r = vceq_p8(*v0, *v1); }
void VceqdS64(uint64_t* r, int64_t* v0, int64_t* v1) { *r = vceqd_s64(*v0, *v1); }
void VceqdU64(uint64_t* r, uint64_t* v0, uint64_t* v1) { *r = vceqd_u64(*v0, *v1); }
void VceqdF64(uint64_t* r, float64_t* v0, float64_t* v1) { *r = vceqd_f64(*v0, *v1); }
void VceqqS8(uint8x16_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vceqq_s8(*v0, *v1); }
void VceqqS16(uint16x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vceqq_s16(*v0, *v1); }
void VceqqS32(uint32x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vceqq_s32(*v0, *v1); }
void VceqqS64(uint64x2_t* r, int64x2_t* v0, int64x2_t* v1) { *r = vceqq_s64(*v0, *v1); }
void VceqqU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vceqq_u8(*v0, *v1); }
void VceqqU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vceqq_u16(*v0, *v1); }
void VceqqU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vceqq_u32(*v0, *v1); }
void VceqqU64(uint64x2_t* r, uint64x2_t* v0, uint64x2_t* v1) { *r = vceqq_u64(*v0, *v1); }
void VceqqF32(uint32x4_t* r, float32x4_t* v0, float32x4_t* v1) { *r = vceqq_f32(*v0, *v1); }
void VceqqF64(uint64x2_t* r, float64x2_t* v0, float64x2_t* v1) { *r = vceqq_f64(*v0, *v1); }
void VceqqP64(uint64x2_t* r, poly64x2_t* v0, poly64x2_t* v1) { *r = vceqq_p64(*v0, *v1); }
void VceqqP8(uint8x16_t* r, poly8x16_t* v0, poly8x16_t* v1) { *r = vceqq_p8(*v0, *v1); }
void VceqsF32(uint32_t* r, float32_t* v0, float32_t* v1) { *r = vceqs_f32(*v0, *v1); }
void VceqzS8(uint8x8_t* r, int8x8_t* v0) { *r = vceqz_s8(*v0); }
void VceqzS16(uint16x4_t* r, int16x4_t* v0) { *r = vceqz_s16(*v0); }
void VceqzS32(uint32x2_t* r, int32x2_t* v0) { *r = vceqz_s32(*v0); }
void VceqzS64(uint64x1_t* r, int64x1_t* v0) { *r = vceqz_s64(*v0); }
void VceqzU8(uint8x8_t* r, uint8x8_t* v0) { *r = vceqz_u8(*v0); }
void VceqzU16(uint16x4_t* r, uint16x4_t* v0) { *r = vceqz_u16(*v0); }
void VceqzU32(uint32x2_t* r, uint32x2_t* v0) { *r = vceqz_u32(*v0); }
void VceqzU64(uint64x1_t* r, uint64x1_t* v0) { *r = vceqz_u64(*v0); }
void VceqzF32(uint32x2_t* r, float32x2_t* v0) { *r = vceqz_f32(*v0); }
void VceqzF64(uint64x1_t* r, float64x1_t* v0) { *r = vceqz_f64(*v0); }
void VceqzP64(uint64x1_t* r, poly64x1_t* v0) { *r = vceqz_p64(*v0); }
void VceqzP8(uint8x8_t* r, poly8x8_t* v0) { *r = vceqz_p8(*v0); }
void VceqzdS64(uint64_t* r, int64_t* v0) { *r = vceqzd_s64(*v0); }
void VceqzdU64(uint64_t* r, uint64_t* v0) { *r = vceqzd_u64(*v0); }
void VceqzdF64(uint64_t* r, float64_t* v0) { *r = vceqzd_f64(*v0); }
void VceqzqS8(uint8x16_t* r, int8x16_t* v0) { *r = vceqzq_s8(*v0); }
void VceqzqS16(uint16x8_t* r, int16x8_t* v0) { *r = vceqzq_s16(*v0); }
void VceqzqS32(uint32x4_t* r, int32x4_t* v0) { *r = vceqzq_s32(*v0); }
void VceqzqS64(uint64x2_t* r, int64x2_t* v0) { *r = vceqzq_s64(*v0); }
void VceqzqU8(uint8x16_t* r, uint8x16_t* v0) { *r = vceqzq_u8(*v0); }
void VceqzqU16(uint16x8_t* r, uint16x8_t* v0) { *r = vceqzq_u16(*v0); }
void VceqzqU32(uint32x4_t* r, uint32x4_t* v0) { *r = vceqzq_u32(*v0); }
void VceqzqU64(uint64x2_t* r, uint64x2_t* v0) { *r = vceqzq_u64(*v0); }
void VceqzqF32(uint32x4_t* r, float32x4_t* v0) { *r = vceqzq_f32(*v0); }
void VceqzqF64(uint64x2_t* r, float64x2_t* v0) { *r = vceqzq_f64(*v0); }
void VceqzqP64(uint64x2_t* r, poly64x2_t* v0) { *r = vceqzq_p64(*v0); }
void VceqzqP8(uint8x16_t* r, poly8x16_t* v0) { *r = vceqzq_p8(*v0); }
void VceqzsF32(uint32_t* r, float32_t* v0) { *r = vceqzs_f32(*v0); }
void VcgeS8(uint8x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vcge_s8(*v0, *v1); }
void VcgeS16(uint16x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vcge_s16(*v0, *v1); }
void VcgeS32(uint32x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vcge_s32(*v0, *v1); }
void VcgeS64(uint64x1_t* r, int64x1_t* v0, int64x1_t* v1) { *r = vcge_s64(*v0, *v1); }
void VcgeU8(uint8x8_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vcge_u8(*v0, *v1); }
void VcgeU16(uint16x4_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = vcge_u16(*v0, *v1); }
void VcgeU32(uint32x2_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = vcge_u32(*v0, *v1); }
void VcgeU64(uint64x1_t* r, uint64x1_t* v0, uint64x1_t* v1) { *r = vcge_u64(*v0, *v1); }
void VcgeF32(uint32x2_t* r, float32x2_t* v0, float32x2_t* v1) { *r = vcge_f32(*v0, *v1); }
void VcgeF64(uint64x1_t* r, float64x1_t* v0, float64x1_t* v1) { *r = vcge_f64(*v0, *v1); }
void VcgedS64(uint64_t* r, int64_t* v0, int64_t* v1) { *r = vcged_s64(*v0, *v1); }
void VcgedU64(uint64_t* r, uint64_t* v0, uint64_t* v1) { *r = vcged_u64(*v0, *v1); }
void VcgedF64(uint64_t* r, float64_t* v0, float64_t* v1) { *r = vcged_f64(*v0, *v1); }
void VcgeqS8(uint8x16_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vcgeq_s8(*v0, *v1); }
void VcgeqS16(uint16x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vcgeq_s16(*v0, *v1); }
void VcgeqS32(uint32x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vcgeq_s32(*v0, *v1); }
void VcgeqS64(uint64x2_t* r, int64x2_t* v0, int64x2_t* v1) { *r = vcgeq_s64(*v0, *v1); }
void VcgeqU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vcgeq_u8(*v0, *v1); }
void VcgeqU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vcgeq_u16(*v0, *v1); }
void VcgeqU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vcgeq_u32(*v0, *v1); }
void VcgeqU64(uint64x2_t* r, uint64x2_t* v0, uint64x2_t* v1) { *r = vcgeq_u64(*v0, *v1); }
void VcgeqF32(uint32x4_t* r, float32x4_t* v0, float32x4_t* v1) { *r = vcgeq_f32(*v0, *v1); }
void VcgeqF64(uint64x2_t* r, float64x2_t* v0, float64x2_t* v1) { *r = vcgeq_f64(*v0, *v1); }
void VcgesF32(uint32_t* r, float32_t* v0, float32_t* v1) { *r = vcges_f32(*v0, *v1); }
void VcgezS8(uint8x8_t* r, int8x8_t* v0) { *r = vcgez_s8(*v0); }
void VcgezS16(uint16x4_t* r, int16x4_t* v0) { *r = vcgez_s16(*v0); }
void VcgezS32(uint32x2_t* r, int32x2_t* v0) { *r = vcgez_s32(*v0); }
void VcgezS64(uint64x1_t* r, int64x1_t* v0) { *r = vcgez_s64(*v0); }
void VcgezF32(uint32x2_t* r, float32x2_t* v0) { *r = vcgez_f32(*v0); }
void VcgezF64(uint64x1_t* r, float64x1_t* v0) { *r = vcgez_f64(*v0); }
void VcgezdS64(uint64_t* r, int64_t* v0) { *r = vcgezd_s64(*v0); }
void VcgezdF64(uint64_t* r, float64_t* v0) { *r = vcgezd_f64(*v0); }
void VcgezqS8(uint8x16_t* r, int8x16_t* v0) { *r = vcgezq_s8(*v0); }
void VcgezqS16(uint16x8_t* r, int16x8_t* v0) { *r = vcgezq_s16(*v0); }
void VcgezqS32(uint32x4_t* r, int32x4_t* v0) { *r = vcgezq_s32(*v0); }
void VcgezqS64(uint64x2_t* r, int64x2_t* v0) { *r = vcgezq_s64(*v0); }
void VcgezqF32(uint32x4_t* r, float32x4_t* v0) { *r = vcgezq_f32(*v0); }
void VcgezqF64(uint64x2_t* r, float64x2_t* v0) { *r = vcgezq_f64(*v0); }
void VcgezsF32(uint32_t* r, float32_t* v0) { *r = vcgezs_f32(*v0); }
void VcgtS8(uint8x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vcgt_s8(*v0, *v1); }
void VcgtS16(uint16x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vcgt_s16(*v0, *v1); }
void VcgtS32(uint32x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vcgt_s32(*v0, *v1); }
void VcgtS64(uint64x1_t* r, int64x1_t* v0, int64x1_t* v1) { *r = vcgt_s64(*v0, *v1); }
void VcgtU8(uint8x8_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vcgt_u8(*v0, *v1); }
void VcgtU16(uint16x4_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = vcgt_u16(*v0, *v1); }
void VcgtU32(uint32x2_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = vcgt_u32(*v0, *v1); }
void VcgtU64(uint64x1_t* r, uint64x1_t* v0, uint64x1_t* v1) { *r = vcgt_u64(*v0, *v1); }
void VcgtF32(uint32x2_t* r, float32x2_t* v0, float32x2_t* v1) { *r = vcgt_f32(*v0, *v1); }
void VcgtF64(uint64x1_t* r, float64x1_t* v0, float64x1_t* v1) { *r = vcgt_f64(*v0, *v1); }
void VcgtdS64(uint64_t* r, int64_t* v0, int64_t* v1) { *r = vcgtd_s64(*v0, *v1); }
void VcgtdU64(uint64_t* r, uint64_t* v0, uint64_t* v1) { *r = vcgtd_u64(*v0, *v1); }
void VcgtdF64(uint64_t* r, float64_t* v0, float64_t* v1) { *r = vcgtd_f64(*v0, *v1); }
void VcgtqS8(uint8x16_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vcgtq_s8(*v0, *v1); }
void VcgtqS16(uint16x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vcgtq_s16(*v0, *v1); }
void VcgtqS32(uint32x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vcgtq_s32(*v0, *v1); }
void VcgtqS64(uint64x2_t* r, int64x2_t* v0, int64x2_t* v1) { *r = vcgtq_s64(*v0, *v1); }
void VcgtqU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vcgtq_u8(*v0, *v1); }
void VcgtqU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vcgtq_u16(*v0, *v1); }
void VcgtqU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vcgtq_u32(*v0, *v1); }
void VcgtqU64(uint64x2_t* r, uint64x2_t* v0, uint64x2_t* v1) { *r = vcgtq_u64(*v0, *v1); }
void VcgtqF32(uint32x4_t* r, float32x4_t* v0, float32x4_t* v1) { *r = vcgtq_f32(*v0, *v1); }
void VcgtqF64(uint64x2_t* r, float64x2_t* v0, float64x2_t* v1) { *r = vcgtq_f64(*v0, *v1); }
void VcgtsF32(uint32_t* r, float32_t* v0, float32_t* v1) { *r = vcgts_f32(*v0, *v1); }
void VcgtzS8(uint8x8_t* r, int8x8_t* v0) { *r = vcgtz_s8(*v0); }
void VcgtzS16(uint16x4_t* r, int16x4_t* v0) { *r = vcgtz_s16(*v0); }
void VcgtzS32(uint32x2_t* r, int32x2_t* v0) { *r = vcgtz_s32(*v0); }
void VcgtzS64(uint64x1_t* r, int64x1_t* v0) { *r = vcgtz_s64(*v0); }
void VcgtzF32(uint32x2_t* r, float32x2_t* v0) { *r = vcgtz_f32(*v0); }
void VcgtzF64(uint64x1_t* r, float64x1_t* v0) { *r = vcgtz_f64(*v0); }
void VcgtzdS64(uint64_t* r, int64_t* v0) { *r = vcgtzd_s64(*v0); }
void VcgtzdF64(uint64_t* r, float64_t* v0) { *r = vcgtzd_f64(*v0); }
void VcgtzqS8(uint8x16_t* r, int8x16_t* v0) { *r = vcgtzq_s8(*v0); }
void VcgtzqS16(uint16x8_t* r, int16x8_t* v0) { *r = vcgtzq_s16(*v0); }
void VcgtzqS32(uint32x4_t* r, int32x4_t* v0) { *r = vcgtzq_s32(*v0); }
void VcgtzqS64(uint64x2_t* r, int64x2_t* v0) { *r = vcgtzq_s64(*v0); }
void VcgtzqF32(uint32x4_t* r, float32x4_t* v0) { *r = vcgtzq_f32(*v0); }
void VcgtzqF64(uint64x2_t* r, float64x2_t* v0) { *r = vcgtzq_f64(*v0); }
void VcgtzsF32(uint32_t* r, float32_t* v0) { *r = vcgtzs_f32(*v0); }
void VcleS8(uint8x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vcle_s8(*v0, *v1); }
void VcleS16(uint16x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vcle_s16(*v0, *v1); }
void VcleS32(uint32x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vcle_s32(*v0, *v1); }
void VcleS64(uint64x1_t* r, int64x1_t* v0, int64x1_t* v1) { *r = vcle_s64(*v0, *v1); }
void VcleU8(uint8x8_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vcle_u8(*v0, *v1); }
void VcleU16(uint16x4_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = vcle_u16(*v0, *v1); }
void VcleU32(uint32x2_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = vcle_u32(*v0, *v1); }
void VcleU64(uint64x1_t* r, uint64x1_t* v0, uint64x1_t* v1) { *r = vcle_u64(*v0, *v1); }
void VcleF32(uint32x2_t* r, float32x2_t* v0, float32x2_t* v1) { *r = vcle_f32(*v0, *v1); }
void VcleF64(uint64x1_t* r, float64x1_t* v0, float64x1_t* v1) { *r = vcle_f64(*v0, *v1); }
void VcledS64(uint64_t* r, int64_t* v0, int64_t* v1) { *r = vcled_s64(*v0, *v1); }
void VcledU64(uint64_t* r, uint64_t* v0, uint64_t* v1) { *r = vcled_u64(*v0, *v1); }
void VcledF64(uint64_t* r, float64_t* v0, float64_t* v1) { *r = vcled_f64(*v0, *v1); }
void VcleqS8(uint8x16_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vcleq_s8(*v0, *v1); }
void VcleqS16(uint16x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vcleq_s16(*v0, *v1); }
void VcleqS32(uint32x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vcleq_s32(*v0, *v1); }
void VcleqS64(uint64x2_t* r, int64x2_t* v0, int64x2_t* v1) { *r = vcleq_s64(*v0, *v1); }
void VcleqU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vcleq_u8(*v0, *v1); }
void VcleqU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vcleq_u16(*v0, *v1); }
void VcleqU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vcleq_u32(*v0, *v1); }
void VcleqU64(uint64x2_t* r, uint64x2_t* v0, uint64x2_t* v1) { *r = vcleq_u64(*v0, *v1); }
void VcleqF32(uint32x4_t* r, float32x4_t* v0, float32x4_t* v1) { *r = vcleq_f32(*v0, *v1); }
void VcleqF64(uint64x2_t* r, float64x2_t* v0, float64x2_t* v1) { *r = vcleq_f64(*v0, *v1); }
void VclesF32(uint32_t* r, float32_t* v0, float32_t* v1) { *r = vcles_f32(*v0, *v1); }
void VclezS8(uint8x8_t* r, int8x8_t* v0) { *r = vclez_s8(*v0); }
void VclezS16(uint16x4_t* r, int16x4_t* v0) { *r = vclez_s16(*v0); }
void VclezS32(uint32x2_t* r, int32x2_t* v0) { *r = vclez_s32(*v0); }
void VclezS64(uint64x1_t* r, int64x1_t* v0) { *r = vclez_s64(*v0); }
void VclezF32(uint32x2_t* r, float32x2_t* v0) { *r = vclez_f32(*v0); }
void VclezF64(uint64x1_t* r, float64x1_t* v0) { *r = vclez_f64(*v0); }
void VclezdS64(uint64_t* r, int64_t* v0) { *r = vclezd_s64(*v0); }
void VclezdF64(uint64_t* r, float64_t* v0) { *r = vclezd_f64(*v0); }
void VclezqS8(uint8x16_t* r, int8x16_t* v0) { *r = vclezq_s8(*v0); }
void VclezqS16(uint16x8_t* r, int16x8_t* v0) { *r = vclezq_s16(*v0); }
void VclezqS32(uint32x4_t* r, int32x4_t* v0) { *r = vclezq_s32(*v0); }
void VclezqS64(uint64x2_t* r, int64x2_t* v0) { *r = vclezq_s64(*v0); }
void VclezqF32(uint32x4_t* r, float32x4_t* v0) { *r = vclezq_f32(*v0); }
void VclezqF64(uint64x2_t* r, float64x2_t* v0) { *r = vclezq_f64(*v0); }
void VclezsF32(uint32_t* r, float32_t* v0) { *r = vclezs_f32(*v0); }
void VclsS8(int8x8_t* r, int8x8_t* v0) { *r = vcls_s8(*v0); }
void VclsS16(int16x4_t* r, int16x4_t* v0) { *r = vcls_s16(*v0); }
void VclsS32(int32x2_t* r, int32x2_t* v0) { *r = vcls_s32(*v0); }
void VclsU8(int8x8_t* r, uint8x8_t* v0) { *r = vcls_u8(*v0); }
void VclsU16(int16x4_t* r, uint16x4_t* v0) { *r = vcls_u16(*v0); }
void VclsU32(int32x2_t* r, uint32x2_t* v0) { *r = vcls_u32(*v0); }
void VclsqS8(int8x16_t* r, int8x16_t* v0) { *r = vclsq_s8(*v0); }
void VclsqS16(int16x8_t* r, int16x8_t* v0) { *r = vclsq_s16(*v0); }
void VclsqS32(int32x4_t* r, int32x4_t* v0) { *r = vclsq_s32(*v0); }
void VclsqU8(int8x16_t* r, uint8x16_t* v0) { *r = vclsq_u8(*v0); }
void VclsqU16(int16x8_t* r, uint16x8_t* v0) { *r = vclsq_u16(*v0); }
void VclsqU32(int32x4_t* r, uint32x4_t* v0) { *r = vclsq_u32(*v0); }
void VcltS8(uint8x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vclt_s8(*v0, *v1); }
void VcltS16(uint16x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vclt_s16(*v0, *v1); }
void VcltS32(uint32x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vclt_s32(*v0, *v1); }
void VcltS64(uint64x1_t* r, int64x1_t* v0, int64x1_t* v1) { *r = vclt_s64(*v0, *v1); }
void VcltU8(uint8x8_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vclt_u8(*v0, *v1); }
void VcltU16(uint16x4_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = vclt_u16(*v0, *v1); }
void VcltU32(uint32x2_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = vclt_u32(*v0, *v1); }
void VcltU64(uint64x1_t* r, uint64x1_t* v0, uint64x1_t* v1) { *r = vclt_u64(*v0, *v1); }
void VcltF32(uint32x2_t* r, float32x2_t* v0, float32x2_t* v1) { *r = vclt_f32(*v0, *v1); }
void VcltF64(uint64x1_t* r, float64x1_t* v0, float64x1_t* v1) { *r = vclt_f64(*v0, *v1); }
void VcltdS64(uint64_t* r, int64_t* v0, int64_t* v1) { *r = vcltd_s64(*v0, *v1); }
void VcltdU64(uint64_t* r, uint64_t* v0, uint64_t* v1) { *r = vcltd_u64(*v0, *v1); }
void VcltdF64(uint64_t* r, float64_t* v0, float64_t* v1) { *r = vcltd_f64(*v0, *v1); }
void VcltqS8(uint8x16_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vcltq_s8(*v0, *v1); }
void VcltqS16(uint16x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vcltq_s16(*v0, *v1); }
void VcltqS32(uint32x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vcltq_s32(*v0, *v1); }
void VcltqS64(uint64x2_t* r, int64x2_t* v0, int64x2_t* v1) { *r = vcltq_s64(*v0, *v1); }
void VcltqU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vcltq_u8(*v0, *v1); }
void VcltqU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vcltq_u16(*v0, *v1); }
void VcltqU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vcltq_u32(*v0, *v1); }
void VcltqU64(uint64x2_t* r, uint64x2_t* v0, uint64x2_t* v1) { *r = vcltq_u64(*v0, *v1); }
void VcltqF32(uint32x4_t* r, float32x4_t* v0, float32x4_t* v1) { *r = vcltq_f32(*v0, *v1); }
void VcltqF64(uint64x2_t* r, float64x2_t* v0, float64x2_t* v1) { *r = vcltq_f64(*v0, *v1); }
void VcltsF32(uint32_t* r, float32_t* v0, float32_t* v1) { *r = vclts_f32(*v0, *v1); }
void VcltzS8(uint8x8_t* r, int8x8_t* v0) { *r = vcltz_s8(*v0); }
void VcltzS16(uint16x4_t* r, int16x4_t* v0) { *r = vcltz_s16(*v0); }
void VcltzS32(uint32x2_t* r, int32x2_t* v0) { *r = vcltz_s32(*v0); }
void VcltzS64(uint64x1_t* r, int64x1_t* v0) { *r = vcltz_s64(*v0); }
void VcltzF32(uint32x2_t* r, float32x2_t* v0) { *r = vcltz_f32(*v0); }
void VcltzF64(uint64x1_t* r, float64x1_t* v0) { *r = vcltz_f64(*v0); }
void VcltzdS64(uint64_t* r, int64_t* v0) { *r = vcltzd_s64(*v0); }
void VcltzdF64(uint64_t* r, float64_t* v0) { *r = vcltzd_f64(*v0); }
void VcltzqS8(uint8x16_t* r, int8x16_t* v0) { *r = vcltzq_s8(*v0); }
void VcltzqS16(uint16x8_t* r, int16x8_t* v0) { *r = vcltzq_s16(*v0); }
void VcltzqS32(uint32x4_t* r, int32x4_t* v0) { *r = vcltzq_s32(*v0); }
void VcltzqS64(uint64x2_t* r, int64x2_t* v0) { *r = vcltzq_s64(*v0); }
void VcltzqF32(uint32x4_t* r, float32x4_t* v0) { *r = vcltzq_f32(*v0); }
void VcltzqF64(uint64x2_t* r, float64x2_t* v0) { *r = vcltzq_f64(*v0); }
void VcltzsF32(uint32_t* r, float32_t* v0) { *r = vcltzs_f32(*v0); }
void VclzS8(int8x8_t* r, int8x8_t* v0) { *r = vclz_s8(*v0); }
void VclzS16(int16x4_t* r, int16x4_t* v0) { *r = vclz_s16(*v0); }
void VclzS32(int32x2_t* r, int32x2_t* v0) { *r = vclz_s32(*v0); }
void VclzU8(uint8x8_t* r, uint8x8_t* v0) { *r = vclz_u8(*v0); }
void VclzU16(uint16x4_t* r, uint16x4_t* v0) { *r = vclz_u16(*v0); }
void VclzU32(uint32x2_t* r, uint32x2_t* v0) { *r = vclz_u32(*v0); }
void VclzqS8(int8x16_t* r, int8x16_t* v0) { *r = vclzq_s8(*v0); }
void VclzqS16(int16x8_t* r, int16x8_t* v0) { *r = vclzq_s16(*v0); }
void VclzqS32(int32x4_t* r, int32x4_t* v0) { *r = vclzq_s32(*v0); }
void VclzqU8(uint8x16_t* r, uint8x16_t* v0) { *r = vclzq_u8(*v0); }
void VclzqU16(uint16x8_t* r, uint16x8_t* v0) { *r = vclzq_u16(*v0); }
void VclzqU32(uint32x4_t* r, uint32x4_t* v0) { *r = vclzq_u32(*v0); }
void VcntS8(int8x8_t* r, int8x8_t* v0) { *r = vcnt_s8(*v0); }
void VcntU8(uint8x8_t* r, uint8x8_t* v0) { *r = vcnt_u8(*v0); }
void VcntP8(poly8x8_t* r, poly8x8_t* v0) { *r = vcnt_p8(*v0); }
void VcntqS8(int8x16_t* r, int8x16_t* v0) { *r = vcntq_s8(*v0); }
void VcntqU8(uint8x16_t* r, uint8x16_t* v0) { *r = vcntq_u8(*v0); }
void VcntqP8(poly8x16_t* r, poly8x16_t* v0) { *r = vcntq_p8(*v0); }
void VcombineS8(int8x16_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vcombine_s8(*v0, *v1); }
void VcombineS16(int16x8_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vcombine_s16(*v0, *v1); }
void VcombineS32(int32x4_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vcombine_s32(*v0, *v1); }
void VcombineS64(int64x2_t* r, int64x1_t* v0, int64x1_t* v1) { *r = vcombine_s64(*v0, *v1); }
void VcombineU8(uint8x16_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vcombine_u8(*v0, *v1); }
void VcombineU16(uint16x8_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = vcombine_u16(*v0, *v1); }
void VcombineU32(uint32x4_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = vcombine_u32(*v0, *v1); }
void VcombineU64(uint64x2_t* r, uint64x1_t* v0, uint64x1_t* v1) { *r = vcombine_u64(*v0, *v1); }
void VcombineF32(float32x4_t* r, float32x2_t* v0, float32x2_t* v1) { *r = vcombine_f32(*v0, *v1); }
void VcombineF64(float64x2_t* r, float64x1_t* v0, float64x1_t* v1) { *r = vcombine_f64(*v0, *v1); }
void VcombineP16(poly16x8_t* r, poly16x4_t* v0, poly16x4_t* v1) { *r = vcombine_p16(*v0, *v1); }
void VcombineP64(poly64x2_t* r, poly64x1_t* v0, poly64x1_t* v1) { *r = vcombine_p64(*v0, *v1); }
void VcombineP8(poly8x16_t* r, poly8x8_t* v0, poly8x8_t* v1) { *r = vcombine_p8(*v0, *v1); }
void VcvtF32S32(float32x2_t* r, int32x2_t* v0) { *r = vcvt_f32_s32(*v0); }
void VcvtF32U32(float32x2_t* r, uint32x2_t* v0) { *r = vcvt_f32_u32(*v0); }
void VcvtF32F64(float32x2_t* r, float64x2_t* v0) { *r = vcvt_f32_f64(*v0); }
void VcvtF64S64(float64x1_t* r, int64x1_t* v0) { *r = vcvt_f64_s64(*v0); }
void VcvtF64U64(float64x1_t* r, uint64x1_t* v0) { *r = vcvt_f64_u64(*v0); }
void VcvtF64F32(float64x2_t* r, float32x2_t* v0) { *r = vcvt_f64_f32(*v0); }
void VcvtHighF32F64(float32x4_t* r, float32x2_t* v0, float64x2_t* v1) { *r = vcvt_high_f32_f64(*v0, *v1); }
void VcvtHighF64F32(float64x2_t* r, float32x4_t* v0) { *r = vcvt_high_f64_f32(*v0); }
void VcvtS32F32(int32x2_t* r, float32x2_t* v0) { *r = vcvt_s32_f32(*v0); }
void VcvtS64F64(int64x1_t* r, float64x1_t* v0) { *r = vcvt_s64_f64(*v0); }
void VcvtU32F32(uint32x2_t* r, float32x2_t* v0) { *r = vcvt_u32_f32(*v0); }
void VcvtU64F64(uint64x1_t* r, float64x1_t* v0) { *r = vcvt_u64_f64(*v0); }
void VcvtaS32F32(int32x2_t* r, float32x2_t* v0) { *r = vcvta_s32_f32(*v0); }
void VcvtaS64F64(int64x1_t* r, float64x1_t* v0) { *r = vcvta_s64_f64(*v0); }
void VcvtaU32F32(uint32x2_t* r, float32x2_t* v0) { *r = vcvta_u32_f32(*v0); }
void VcvtaU64F64(uint64x1_t* r, float64x1_t* v0) { *r = vcvta_u64_f64(*v0); }
void VcvtadS64F64(int64_t* r, float64_t* v0) { *r = vcvtad_s64_f64(*v0); }
void VcvtadU64F64(uint64_t* r, float64_t* v0) { *r = vcvtad_u64_f64(*v0); }
void VcvtaqS32F32(int32x4_t* r, float32x4_t* v0) { *r = vcvtaq_s32_f32(*v0); }
void VcvtaqS64F64(int64x2_t* r, float64x2_t* v0) { *r = vcvtaq_s64_f64(*v0); }
void VcvtaqU32F32(uint32x4_t* r, float32x4_t* v0) { *r = vcvtaq_u32_f32(*v0); }
void VcvtaqU64F64(uint64x2_t* r, float64x2_t* v0) { *r = vcvtaq_u64_f64(*v0); }
void VcvtasS32F32(int32_t* r, float32_t* v0) { *r = vcvtas_s32_f32(*v0); }
void VcvtasU32F32(uint32_t* r, float32_t* v0) { *r = vcvtas_u32_f32(*v0); }
void VcvtdF64S64(float64_t* r, int64_t* v0) { *r = vcvtd_f64_s64(*v0); }
void VcvtdF64U64(float64_t* r, uint64_t* v0) { *r = vcvtd_f64_u64(*v0); }
void VcvtdS64F64(int64_t* r, float64_t* v0) { *r = vcvtd_s64_f64(*v0); }
void VcvtdU64F64(uint64_t* r, float64_t* v0) { *r = vcvtd_u64_f64(*v0); }
void VcvtmS32F32(int32x2_t* r, float32x2_t* v0) { *r = vcvtm_s32_f32(*v0); }
void VcvtmS64F64(int64x1_t* r, float64x1_t* v0) { *r = vcvtm_s64_f64(*v0); }
void VcvtmU32F32(uint32x2_t* r, float32x2_t* v0) { *r = vcvtm_u32_f32(*v0); }
void VcvtmU64F64(uint64x1_t* r, float64x1_t* v0) { *r = vcvtm_u64_f64(*v0); }
void VcvtmdS64F64(int64_t* r, float64_t* v0) { *r = vcvtmd_s64_f64(*v0); }
void VcvtmdU64F64(uint64_t* r, float64_t* v0) { *r = vcvtmd_u64_f64(*v0); }
void VcvtmqS32F32(int32x4_t* r, float32x4_t* v0) { *r = vcvtmq_s32_f32(*v0); }
void VcvtmqS64F64(int64x2_t* r, float64x2_t* v0) { *r = vcvtmq_s64_f64(*v0); }
void VcvtmqU32F32(uint32x4_t* r, float32x4_t* v0) { *r = vcvtmq_u32_f32(*v0); }
void VcvtmqU64F64(uint64x2_t* r, float64x2_t* v0) { *r = vcvtmq_u64_f64(*v0); }
void VcvtmsS32F32(int32_t* r, float32_t* v0) { *r = vcvtms_s32_f32(*v0); }
void VcvtmsU32F32(uint32_t* r, float32_t* v0) { *r = vcvtms_u32_f32(*v0); }
void VcvtnS32F32(int32x2_t* r, float32x2_t* v0) { *r = vcvtn_s32_f32(*v0); }
void VcvtnS64F64(int64x1_t* r, float64x1_t* v0) { *r = vcvtn_s64_f64(*v0); }
void VcvtnU32F32(uint32x2_t* r, float32x2_t* v0) { *r = vcvtn_u32_f32(*v0); }
void VcvtnU64F64(uint64x1_t* r, float64x1_t* v0) { *r = vcvtn_u64_f64(*v0); }
void VcvtndS64F64(int64_t* r, float64_t* v0) { *r = vcvtnd_s64_f64(*v0); }
void VcvtndU64F64(uint64_t* r, float64_t* v0) { *r = vcvtnd_u64_f64(*v0); }
void VcvtnqS32F32(int32x4_t* r, float32x4_t* v0) { *r = vcvtnq_s32_f32(*v0); }
void VcvtnqS64F64(int64x2_t* r, float64x2_t* v0) { *r = vcvtnq_s64_f64(*v0); }
void VcvtnqU32F32(uint32x4_t* r, float32x4_t* v0) { *r = vcvtnq_u32_f32(*v0); }
void VcvtnqU64F64(uint64x2_t* r, float64x2_t* v0) { *r = vcvtnq_u64_f64(*v0); }
void VcvtnsS32F32(int32_t* r, float32_t* v0) { *r = vcvtns_s32_f32(*v0); }
void VcvtnsU32F32(uint32_t* r, float32_t* v0) { *r = vcvtns_u32_f32(*v0); }
void VcvtpS32F32(int32x2_t* r, float32x2_t* v0) { *r = vcvtp_s32_f32(*v0); }
void VcvtpS64F64(int64x1_t* r, float64x1_t* v0) { *r = vcvtp_s64_f64(*v0); }
void VcvtpU32F32(uint32x2_t* r, float32x2_t* v0) { *r = vcvtp_u32_f32(*v0); }
void VcvtpU64F64(uint64x1_t* r, float64x1_t* v0) { *r = vcvtp_u64_f64(*v0); }
void VcvtpdS64F64(int64_t* r, float64_t* v0) { *r = vcvtpd_s64_f64(*v0); }
void VcvtpdU64F64(uint64_t* r, float64_t* v0) { *r = vcvtpd_u64_f64(*v0); }
void VcvtpqS32F32(int32x4_t* r, float32x4_t* v0) { *r = vcvtpq_s32_f32(*v0); }
void VcvtpqS64F64(int64x2_t* r, float64x2_t* v0) { *r = vcvtpq_s64_f64(*v0); }
void VcvtpqU32F32(uint32x4_t* r, float32x4_t* v0) { *r = vcvtpq_u32_f32(*v0); }
void VcvtpqU64F64(uint64x2_t* r, float64x2_t* v0) { *r = vcvtpq_u64_f64(*v0); }
void VcvtpsS32F32(int32_t* r, float32_t* v0) { *r = vcvtps_s32_f32(*v0); }
void VcvtpsU32F32(uint32_t* r, float32_t* v0) { *r = vcvtps_u32_f32(*v0); }
void VcvtqF32S32(float32x4_t* r, int32x4_t* v0) { *r = vcvtq_f32_s32(*v0); }
void VcvtqF32U32(float32x4_t* r, uint32x4_t* v0) { *r = vcvtq_f32_u32(*v0); }
void VcvtqF64S64(float64x2_t* r, int64x2_t* v0) { *r = vcvtq_f64_s64(*v0); }
void VcvtqF64U64(float64x2_t* r, uint64x2_t* v0) { *r = vcvtq_f64_u64(*v0); }
void VcvtqS32F32(int32x4_t* r, float32x4_t* v0) { *r = vcvtq_s32_f32(*v0); }
void VcvtqS64F64(int64x2_t* r, float64x2_t* v0) { *r = vcvtq_s64_f64(*v0); }
void VcvtqU32F32(uint32x4_t* r, float32x4_t* v0) { *r = vcvtq_u32_f32(*v0); }
void VcvtqU64F64(uint64x2_t* r, float64x2_t* v0) { *r = vcvtq_u64_f64(*v0); }
void VcvtsF32S32(float32_t* r, int32_t* v0) { *r = vcvts_f32_s32(*v0); }
void VcvtsF32U32(float32_t* r, uint32_t* v0) { *r = vcvts_f32_u32(*v0); }
void VcvtsS32F32(int32_t* r, float32_t* v0) { *r = vcvts_s32_f32(*v0); }
void VcvtsU32F32(uint32_t* r, float32_t* v0) { *r = vcvts_u32_f32(*v0); }
void VcvtxF32F64(float32x2_t* r, float64x2_t* v0) { *r = vcvtx_f32_f64(*v0); }
void VcvtxHighF32F64(float32x4_t* r, float32x2_t* v0, float64x2_t* v1) { *r = vcvtx_high_f32_f64(*v0, *v1); }
void VcvtxdF32F64(float32_t* r, float64_t* v0) { *r = vcvtxd_f32_f64(*v0); }
void VdivF32(float32x2_t* r, float32x2_t* v0, float32x2_t* v1) { *r = vdiv_f32(*v0, *v1); }
void VdivF64(float64x1_t* r, float64x1_t* v0, float64x1_t* v1) { *r = vdiv_f64(*v0, *v1); }
void VdivqF32(float32x4_t* r, float32x4_t* v0, float32x4_t* v1) { *r = vdivq_f32(*v0, *v1); }
void VdivqF64(float64x2_t* r, float64x2_t* v0, float64x2_t* v1) { *r = vdivq_f64(*v0, *v1); }
void VdotS32(int32x2_t* r, int32x2_t* v0, int8x8_t* v1, int8x8_t* v2) { *r = vdot_s32(*v0, *v1, *v2); }
void VdotU32(uint32x2_t* r, uint32x2_t* v0, uint8x8_t* v1, uint8x8_t* v2) { *r = vdot_u32(*v0, *v1, *v2); }
void VdotqS32(int32x4_t* r, int32x4_t* v0, int8x16_t* v1, int8x16_t* v2) { *r = vdotq_s32(*v0, *v1, *v2); }
void VdotqU32(uint32x4_t* r, uint32x4_t* v0, uint8x16_t* v1, uint8x16_t* v2) { *r = vdotq_u32(*v0, *v1, *v2); }
void VdupNS8(int8x8_t* r, int8_t* v0) { *r = vdup_n_s8(*v0); }
void VdupNS16(int16x4_t* r, int16_t* v0) { *r = vdup_n_s16(*v0); }
void VdupNS32(int32x2_t* r, int32_t* v0) { *r = vdup_n_s32(*v0); }
void VdupNS64(int64x1_t* r, int64_t* v0) { *r = vdup_n_s64(*v0); }
void VdupNU8(uint8x8_t* r, uint8_t* v0) { *r = vdup_n_u8(*v0); }
void VdupNU16(uint16x4_t* r, uint16_t* v0) { *r = vdup_n_u16(*v0); }
void VdupNU32(uint32x2_t* r, uint32_t* v0) { *r = vdup_n_u32(*v0); }
void VdupNU64(uint64x1_t* r, uint64_t* v0) { *r = vdup_n_u64(*v0); }
void VdupNF32(float32x2_t* r, float32_t* v0) { *r = vdup_n_f32(*v0); }
void VdupNF64(float64x1_t* r, float64_t* v0) { *r = vdup_n_f64(*v0); }
void VdupNP16(poly16x4_t* r, poly16_t* v0) { *r = vdup_n_p16(*v0); }
void VdupNP64(poly64x1_t* r, poly64_t* v0) { *r = vdup_n_p64(*v0); }
void VdupNP8(poly8x8_t* r, poly8_t* v0) { *r = vdup_n_p8(*v0); }
void VdupqNS8(int8x16_t* r, int8_t* v0) { *r = vdupq_n_s8(*v0); }
void VdupqNS16(int16x8_t* r, int16_t* v0) { *r = vdupq_n_s16(*v0); }
void VdupqNS32(int32x4_t* r, int32_t* v0) { *r = vdupq_n_s32(*v0); }
void VdupqNS64(int64x2_t* r, int64_t* v0) { *r = vdupq_n_s64(*v0); }
void VdupqNU8(uint8x16_t* r, uint8_t* v0) { *r = vdupq_n_u8(*v0); }
void VdupqNU16(uint16x8_t* r, uint16_t* v0) { *r = vdupq_n_u16(*v0); }
void VdupqNU32(uint32x4_t* r, uint32_t* v0) { *r = vdupq_n_u32(*v0); }
void VdupqNU64(uint64x2_t* r, uint64_t* v0) { *r = vdupq_n_u64(*v0); }
void VdupqNF32(float32x4_t* r, float32_t* v0) { *r = vdupq_n_f32(*v0); }
void VdupqNF64(float64x2_t* r, float64_t* v0) { *r = vdupq_n_f64(*v0); }
void VdupqNP16(poly16x8_t* r, poly16_t* v0) { *r = vdupq_n_p16(*v0); }
void VdupqNP64(poly64x2_t* r, poly64_t* v0) { *r = vdupq_n_p64(*v0); }
void VdupqNP8(poly8x16_t* r, poly8_t* v0) { *r = vdupq_n_p8(*v0); }
void VeorS8(int8x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = veor_s8(*v0, *v1); }
void VeorS16(int16x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = veor_s16(*v0, *v1); }
void VeorS32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = veor_s32(*v0, *v1); }
void VeorS64(int64x1_t* r, int64x1_t* v0, int64x1_t* v1) { *r = veor_s64(*v0, *v1); }
void VeorU8(uint8x8_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = veor_u8(*v0, *v1); }
void VeorU16(uint16x4_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = veor_u16(*v0, *v1); }
void VeorU32(uint32x2_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = veor_u32(*v0, *v1); }
void VeorU64(uint64x1_t* r, uint64x1_t* v0, uint64x1_t* v1) { *r = veor_u64(*v0, *v1); }
void Veor3QS8(int8x16_t* r, int8x16_t* v0, int8x16_t* v1, int8x16_t* v2) { *r = veor3q_s8(*v0, *v1, *v2); }
void Veor3QS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1, int16x8_t* v2) { *r = veor3q_s16(*v0, *v1, *v2); }
void Veor3QS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1, int32x4_t* v2) { *r = veor3q_s32(*v0, *v1, *v2); }
void Veor3QS64(int64x2_t* r, int64x2_t* v0, int64x2_t* v1, int64x2_t* v2) { *r = veor3q_s64(*v0, *v1, *v2); }
void Veor3QU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1, uint8x16_t* v2) { *r = veor3q_u8(*v0, *v1, *v2); }
void Veor3QU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1, uint16x8_t* v2) { *r = veor3q_u16(*v0, *v1, *v2); }
void Veor3QU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1, uint32x4_t* v2) { *r = veor3q_u32(*v0, *v1, *v2); }
void Veor3QU64(uint64x2_t* r, uint64x2_t* v0, uint64x2_t* v1, uint64x2_t* v2) { *r = veor3q_u64(*v0, *v1, *v2); }
void VeorqS8(int8x16_t* r, int8x16_t* v0, int8x16_t* v1) { *r = veorq_s8(*v0, *v1); }
void VeorqS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = veorq_s16(*v0, *v1); }
void VeorqS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = veorq_s32(*v0, *v1); }
void VeorqS64(int64x2_t* r, int64x2_t* v0, int64x2_t* v1) { *r = veorq_s64(*v0, *v1); }
void VeorqU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = veorq_u8(*v0, *v1); }
void VeorqU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = veorq_u16(*v0, *v1); }
void VeorqU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = veorq_u32(*v0, *v1); }
void VeorqU64(uint64x2_t* r, uint64x2_t* v0, uint64x2_t* v1) { *r = veorq_u64(*v0, *v1); }
// vfma*: fused multiply-add, r = v0 + v1 * v2 with a single rounding. The vfms* wrappers below compute r = v0 - v1 * v2. The _n forms broadcast the scalar v2 to every lane.
void VfmaF32(float32x2_t* r, float32x2_t* v0, float32x2_t* v1, float32x2_t* v2) { *r = vfma_f32(*v0, *v1, *v2); }
void VfmaF64(float64x1_t* r, float64x1_t* v0, float64x1_t* v1, float64x1_t* v2) { *r = vfma_f64(*v0, *v1, *v2); }
void VfmaNF32(float32x2_t* r, float32x2_t* v0, float32x2_t* v1, float32_t* v2) { *r = vfma_n_f32(*v0, *v1, *v2); }
void VfmaNF64(float64x1_t* r, float64x1_t* v0, float64x1_t* v1, float64_t* v2) { *r = vfma_n_f64(*v0, *v1, *v2); }
void VfmaqF32(float32x4_t* r, float32x4_t* v0, float32x4_t* v1, float32x4_t* v2) { *r = vfmaq_f32(*v0, *v1, *v2); }
void VfmaqF64(float64x2_t* r, float64x2_t* v0, float64x2_t* v1, float64x2_t* v2) { *r = vfmaq_f64(*v0, *v1, *v2); }
void VfmaqNF32(float32x4_t* r, float32x4_t* v0, float32x4_t* v1, float32_t* v2) { *r = vfmaq_n_f32(*v0, *v1, *v2); }
void VfmaqNF64(float64x2_t* r, float64x2_t* v0, float64x2_t* v1, float64_t* v2) { *r = vfmaq_n_f64(*v0, *v1, *v2); }
void VfmsF32(float32x2_t* r, float32x2_t* v0, float32x2_t* v1, float32x2_t* v2) { *r = vfms_f32(*v0, *v1, *v2); }
void VfmsF64(float64x1_t* r, float64x1_t* v0, float64x1_t* v1, float64x1_t* v2) { *r = vfms_f64(*v0, *v1, *v2); }
void VfmsNF32(float32x2_t* r, float32x2_t* v0, float32x2_t* v1, float32_t* v2) { *r = vfms_n_f32(*v0, *v1, *v2); }
void VfmsNF64(float64x1_t* r, float64x1_t* v0, float64x1_t* v1, float64_t* v2) { *r = vfms_n_f64(*v0, *v1, *v2); }
void VfmsqF32(float32x4_t* r, float32x4_t* v0, float32x4_t* v1, float32x4_t* v2) { *r = vfmsq_f32(*v0, *v1, *v2); }
void VfmsqF64(float64x2_t* r, float64x2_t* v0, float64x2_t* v1, float64x2_t* v2) { *r = vfmsq_f64(*v0, *v1, *v2); }
void VfmsqNF32(float32x4_t* r, float32x4_t* v0, float32x4_t* v1, float32_t* v2) { *r = vfmsq_n_f32(*v0, *v1, *v2); }
void VfmsqNF64(float64x2_t* r, float64x2_t* v0, float64x2_t* v1, float64_t* v2) { *r = vfmsq_n_f64(*v0, *v1, *v2); }
void VgetHighS8(int8x8_t* r, int8x16_t* v0) { *r = vget_high_s8(*v0); }
void VgetHighS16(int16x4_t* r, int16x8_t* v0) { *r = vget_high_s16(*v0); }
void VgetHighS32(int32x2_t* r, int32x4_t* v0) { *r = vget_high_s32(*v0); }
void VgetHighS64(int64x1_t* r, int64x2_t* v0) { *r = vget_high_s64(*v0); }
void VgetHighU8(uint8x8_t* r, uint8x16_t* v0) { *r = vget_high_u8(*v0); }
void VgetHighU16(uint16x4_t* r, uint16x8_t* v0) { *r = vget_high_u16(*v0); }
void VgetHighU32(uint32x2_t* r, uint32x4_t* v0) { *r = vget_high_u32(*v0); }
void VgetHighU64(uint64x1_t* r, uint64x2_t* v0) { *r = vget_high_u64(*v0); }
void VgetHighF32(float32x2_t* r, float32x4_t* v0) { *r = vget_high_f32(*v0); }
void VgetHighF64(float64x1_t* r, float64x2_t* v0) { *r = vget_high_f64(*v0); }
void VgetHighP16(poly16x4_t* r, poly16x8_t* v0) { *r = vget_high_p16(*v0); }
void VgetHighP64(poly64x1_t* r, poly64x2_t* v0) { *r = vget_high_p64(*v0); }
void VgetHighP8(poly8x8_t* r, poly8x16_t* v0) { *r = vget_high_p8(*v0); }
void VgetLowS8(int8x8_t* r, int8x16_t* v0) { *r = vget_low_s8(*v0); }
void VgetLowS16(int16x4_t* r, int16x8_t* v0) { *r = vget_low_s16(*v0); }
void VgetLowS32(int32x2_t* r, int32x4_t* v0) { *r = vget_low_s32(*v0); }
void VgetLowS64(int64x1_t* r, int64x2_t* v0) { *r = vget_low_s64(*v0); }
void VgetLowU8(uint8x8_t* r, uint8x16_t* v0) { *r = vget_low_u8(*v0); }
void VgetLowU16(uint16x4_t* r, uint16x8_t* v0) { *r = vget_low_u16(*v0); }
void VgetLowU32(uint32x2_t* r, uint32x4_t* v0) { *r = vget_low_u32(*v0); }
void VgetLowU64(uint64x1_t* r, uint64x2_t* v0) { *r = vget_low_u64(*v0); }
void VgetLowF32(float32x2_t* r, float32x4_t* v0) { *r = vget_low_f32(*v0); }
void VgetLowF64(float64x1_t* r, float64x2_t* v0) { *r = vget_low_f64(*v0); }
void VgetLowP16(poly16x4_t* r, poly16x8_t* v0) { *r = vget_low_p16(*v0); }
void VgetLowP64(poly64x1_t* r, poly64x2_t* v0) { *r = vget_low_p64(*v0); }
void VgetLowP8(poly8x8_t* r, poly8x16_t* v0) { *r = vget_low_p8(*v0); }
// vhadd*/vhsub*: halving add/subtract, (v0 + v1) >> 1 and (v0 - v1) >> 1 per lane, computed without intermediate overflow; the result is truncated, not rounded.
void VhaddS8(int8x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vhadd_s8(*v0, *v1); }
void VhaddS16(int16x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vhadd_s16(*v0, *v1); }
void VhaddS32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vhadd_s32(*v0, *v1); }
void VhaddU8(uint8x8_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vhadd_u8(*v0, *v1); }
void VhaddU16(uint16x4_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = vhadd_u16(*v0, *v1); }
void VhaddU32(uint32x2_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = vhadd_u32(*v0, *v1); }
void VhaddqS8(int8x16_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vhaddq_s8(*v0, *v1); }
void VhaddqS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vhaddq_s16(*v0, *v1); }
void VhaddqS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vhaddq_s32(*v0, *v1); }
void VhaddqU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vhaddq_u8(*v0, *v1); }
void VhaddqU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vhaddq_u16(*v0, *v1); }
void VhaddqU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vhaddq_u32(*v0, *v1); }
void VhsubS8(int8x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vhsub_s8(*v0, *v1); }
void VhsubS16(int16x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vhsub_s16(*v0, *v1); }
void VhsubS32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vhsub_s32(*v0, *v1); }
void VhsubU8(uint8x8_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vhsub_u8(*v0, *v1); }
void VhsubU16(uint16x4_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = vhsub_u16(*v0, *v1); }
void VhsubU32(uint32x2_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = vhsub_u32(*v0, *v1); }
void VhsubqS8(int8x16_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vhsubq_s8(*v0, *v1); }
void VhsubqS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vhsubq_s16(*v0, *v1); }
void VhsubqS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vhsubq_s32(*v0, *v1); }
void VhsubqU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vhsubq_u8(*v0, *v1); }
void VhsubqU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vhsubq_u16(*v0, *v1); }
void VhsubqU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vhsubq_u32(*v0, *v1); }
void VmaxS8(int8x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vmax_s8(*v0, *v1); }
void VmaxS16(int16x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vmax_s16(*v0, *v1); }
void VmaxS32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vmax_s32(*v0, *v1); }
void VmaxU8(uint8x8_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vmax_u8(*v0, *v1); }
void VmaxU16(uint16x4_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = vmax_u16(*v0, *v1); }
void VmaxU32(uint32x2_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = vmax_u32(*v0, *v1); }
void VmaxF32(float32x2_t* r, float32x2_t* v0, float32x2_t* v1) { *r = vmax_f32(*v0, *v1); }
void VmaxF64(float64x1_t* r, float64x1_t* v0, float64x1_t* v1) { *r = vmax_f64(*v0, *v1); }
// vmaxnm*/vminnm*: IEEE 754-2008 maxNum/minNum semantics. If exactly one operand of a lane is a quiet NaN, the numeric operand is returned, whereas vmax*/vmin* propagate the NaN.
void VmaxnmF32(float32x2_t* r, float32x2_t* v0, float32x2_t* v1) { *r = vmaxnm_f32(*v0, *v1); }
void VmaxnmF64(float64x1_t* r, float64x1_t* v0, float64x1_t* v1) { *r = vmaxnm_f64(*v0, *v1); }
void VmaxnmqF32(float32x4_t* r, float32x4_t* v0, float32x4_t* v1) { *r = vmaxnmq_f32(*v0, *v1); }
void VmaxnmqF64(float64x2_t* r, float64x2_t* v0, float64x2_t* v1) { *r = vmaxnmq_f64(*v0, *v1); }
void VmaxnmvF32(float32_t* r, float32x2_t* v0) { *r = vmaxnmv_f32(*v0); }
void VmaxnmvqF32(float32_t* r, float32x4_t* v0) { *r = vmaxnmvq_f32(*v0); }
void VmaxnmvqF64(float64_t* r, float64x2_t* v0) { *r = vmaxnmvq_f64(*v0); }
void VmaxqS8(int8x16_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vmaxq_s8(*v0, *v1); }
void VmaxqS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vmaxq_s16(*v0, *v1); }
void VmaxqS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vmaxq_s32(*v0, *v1); }
void VmaxqU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vmaxq_u8(*v0, *v1); }
void VmaxqU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vmaxq_u16(*v0, *v1); }
void VmaxqU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vmaxq_u32(*v0, *v1); }
void VmaxqF32(float32x4_t* r, float32x4_t* v0, float32x4_t* v1) { *r = vmaxq_f32(*v0, *v1); }
void VmaxqF64(float64x2_t* r, float64x2_t* v0, float64x2_t* v1) { *r = vmaxq_f64(*v0, *v1); }
// vmaxv*/vminv*: across-vector reductions; r receives a single scalar, the maximum (or minimum) of all lanes in v0.
void VmaxvS8(int8_t* r, int8x8_t* v0) { *r = vmaxv_s8(*v0); }
void VmaxvS16(int16_t* r, int16x4_t* v0) { *r = vmaxv_s16(*v0); }
void VmaxvS32(int32_t* r, int32x2_t* v0) { *r = vmaxv_s32(*v0); }
void VmaxvU8(uint8_t* r, uint8x8_t* v0) { *r = vmaxv_u8(*v0); }
void VmaxvU16(uint16_t* r, uint16x4_t* v0) { *r = vmaxv_u16(*v0); }
void VmaxvU32(uint32_t* r, uint32x2_t* v0) { *r = vmaxv_u32(*v0); }
void VmaxvF32(float32_t* r, float32x2_t* v0) { *r = vmaxv_f32(*v0); }
void VmaxvqS8(int8_t* r, int8x16_t* v0) { *r = vmaxvq_s8(*v0); }
void VmaxvqS16(int16_t* r, int16x8_t* v0) { *r = vmaxvq_s16(*v0); }
void VmaxvqS32(int32_t* r, int32x4_t* v0) { *r = vmaxvq_s32(*v0); }
void VmaxvqU8(uint8_t* r, uint8x16_t* v0) { *r = vmaxvq_u8(*v0); }
void VmaxvqU16(uint16_t* r, uint16x8_t* v0) { *r = vmaxvq_u16(*v0); }
void VmaxvqU32(uint32_t* r, uint32x4_t* v0) { *r = vmaxvq_u32(*v0); }
void VmaxvqF32(float32_t* r, float32x4_t* v0) { *r = vmaxvq_f32(*v0); }
void VmaxvqF64(float64_t* r, float64x2_t* v0) { *r = vmaxvq_f64(*v0); }
void VminS8(int8x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vmin_s8(*v0, *v1); }
void VminS16(int16x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vmin_s16(*v0, *v1); }
void VminS32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vmin_s32(*v0, *v1); }
void VminU8(uint8x8_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vmin_u8(*v0, *v1); }
void VminU16(uint16x4_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = vmin_u16(*v0, *v1); }
void VminU32(uint32x2_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = vmin_u32(*v0, *v1); }
void VminF32(float32x2_t* r, float32x2_t* v0, float32x2_t* v1) { *r = vmin_f32(*v0, *v1); }
void VminF64(float64x1_t* r, float64x1_t* v0, float64x1_t* v1) { *r = vmin_f64(*v0, *v1); }
void VminnmF32(float32x2_t* r, float32x2_t* v0, float32x2_t* v1) { *r = vminnm_f32(*v0, *v1); }
void VminnmF64(float64x1_t* r, float64x1_t* v0, float64x1_t* v1) { *r = vminnm_f64(*v0, *v1); }
void VminnmqF32(float32x4_t* r, float32x4_t* v0, float32x4_t* v1) { *r = vminnmq_f32(*v0, *v1); }
void VminnmqF64(float64x2_t* r, float64x2_t* v0, float64x2_t* v1) { *r = vminnmq_f64(*v0, *v1); }
void VminnmvF32(float32_t* r, float32x2_t* v0) { *r = vminnmv_f32(*v0); }
void VminnmvqF32(float32_t* r, float32x4_t* v0) { *r = vminnmvq_f32(*v0); }
void VminnmvqF64(float64_t* r, float64x2_t* v0) { *r = vminnmvq_f64(*v0); }
void VminqS8(int8x16_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vminq_s8(*v0, *v1); }
void VminqS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vminq_s16(*v0, *v1); }
void VminqS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vminq_s32(*v0, *v1); }
void VminqU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vminq_u8(*v0, *v1); }
void VminqU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vminq_u16(*v0, *v1); }
void VminqU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vminq_u32(*v0, *v1); }
void VminqF32(float32x4_t* r, float32x4_t* v0, float32x4_t* v1) { *r = vminq_f32(*v0, *v1); }
void VminqF64(float64x2_t* r, float64x2_t* v0, float64x2_t* v1) { *r = vminq_f64(*v0, *v1); }
void VminvS8(int8_t* r, int8x8_t* v0) { *r = vminv_s8(*v0); }
void VminvS16(int16_t* r, int16x4_t* v0) { *r = vminv_s16(*v0); }
void VminvS32(int32_t* r, int32x2_t* v0) { *r = vminv_s32(*v0); }
void VminvU8(uint8_t* r, uint8x8_t* v0) { *r = vminv_u8(*v0); }
void VminvU16(uint16_t* r, uint16x4_t* v0) { *r = vminv_u16(*v0); }
void VminvU32(uint32_t* r, uint32x2_t* v0) { *r = vminv_u32(*v0); }
void VminvF32(float32_t* r, float32x2_t* v0) { *r = vminv_f32(*v0); }
void VminvqS8(int8_t* r, int8x16_t* v0) { *r = vminvq_s8(*v0); }
void VminvqS16(int16_t* r, int16x8_t* v0) { *r = vminvq_s16(*v0); }
void VminvqS32(int32_t* r, int32x4_t* v0) { *r = vminvq_s32(*v0); }
void VminvqU8(uint8_t* r, uint8x16_t* v0) { *r = vminvq_u8(*v0); }
void VminvqU16(uint16_t* r, uint16x8_t* v0) { *r = vminvq_u16(*v0); }
void VminvqU32(uint32_t* r, uint32x4_t* v0) { *r = vminvq_u32(*v0); }
void VminvqF32(float32_t* r, float32x4_t* v0) { *r = vminvq_f32(*v0); }
void VminvqF64(float64_t* r, float64x2_t* v0) { *r = vminvq_f64(*v0); }
// vmla*/vmls*: multiply-accumulate, r = v0 + v1 * v2 and r = v0 - v1 * v2 per lane. For floating-point lanes these are not defined as fused operations; use the vfma*/vfms* wrappers when a single rounding is required.
void VmlaS8(int8x8_t* r, int8x8_t* v0, int8x8_t* v1, int8x8_t* v2) { *r = vmla_s8(*v0, *v1, *v2); }
void VmlaS16(int16x4_t* r, int16x4_t* v0, int16x4_t* v1, int16x4_t* v2) { *r = vmla_s16(*v0, *v1, *v2); }
void VmlaS32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1, int32x2_t* v2) { *r = vmla_s32(*v0, *v1, *v2); }
void VmlaU8(uint8x8_t* r, uint8x8_t* v0, uint8x8_t* v1, uint8x8_t* v2) { *r = vmla_u8(*v0, *v1, *v2); }
void VmlaU16(uint16x4_t* r, uint16x4_t* v0, uint16x4_t* v1, uint16x4_t* v2) { *r = vmla_u16(*v0, *v1, *v2); }
void VmlaU32(uint32x2_t* r, uint32x2_t* v0, uint32x2_t* v1, uint32x2_t* v2) { *r = vmla_u32(*v0, *v1, *v2); }
void VmlaF32(float32x2_t* r, float32x2_t* v0, float32x2_t* v1, float32x2_t* v2) { *r = vmla_f32(*v0, *v1, *v2); }
void VmlaF64(float64x1_t* r, float64x1_t* v0, float64x1_t* v1, float64x1_t* v2) { *r = vmla_f64(*v0, *v1, *v2); }
void VmlaNS16(int16x4_t* r, int16x4_t* v0, int16x4_t* v1, int16_t* v2) { *r = vmla_n_s16(*v0, *v1, *v2); }
void VmlaNS32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1, int32_t* v2) { *r = vmla_n_s32(*v0, *v1, *v2); }
void VmlaNU16(uint16x4_t* r, uint16x4_t* v0, uint16x4_t* v1, uint16_t* v2) { *r = vmla_n_u16(*v0, *v1, *v2); }
void VmlaNU32(uint32x2_t* r, uint32x2_t* v0, uint32x2_t* v1, uint32_t* v2) { *r = vmla_n_u32(*v0, *v1, *v2); }
void VmlaNF32(float32x2_t* r, float32x2_t* v0, float32x2_t* v1, float32_t* v2) { *r = vmla_n_f32(*v0, *v1, *v2); }
// vmlal*/vmlsl*: widening multiply-accumulate; the product of two narrow lanes is added to (or subtracted from) an accumulator lane of twice the width. The _high forms read the upper half of the 128-bit inputs.
void VmlalS8(int16x8_t* r, int16x8_t* v0, int8x8_t* v1, int8x8_t* v2) { *r = vmlal_s8(*v0, *v1, *v2); }
void VmlalS16(int32x4_t* r, int32x4_t* v0, int16x4_t* v1, int16x4_t* v2) { *r = vmlal_s16(*v0, *v1, *v2); }
void VmlalS32(int64x2_t* r, int64x2_t* v0, int32x2_t* v1, int32x2_t* v2) { *r = vmlal_s32(*v0, *v1, *v2); }
void VmlalU8(uint16x8_t* r, uint16x8_t* v0, uint8x8_t* v1, uint8x8_t* v2) { *r = vmlal_u8(*v0, *v1, *v2); }
void VmlalU16(uint32x4_t* r, uint32x4_t* v0, uint16x4_t* v1, uint16x4_t* v2) { *r = vmlal_u16(*v0, *v1, *v2); }
void VmlalU32(uint64x2_t* r, uint64x2_t* v0, uint32x2_t* v1, uint32x2_t* v2) { *r = vmlal_u32(*v0, *v1, *v2); }
void VmlalHighS8(int16x8_t* r, int16x8_t* v0, int8x16_t* v1, int8x16_t* v2) { *r = vmlal_high_s8(*v0, *v1, *v2); }
void VmlalHighS16(int32x4_t* r, int32x4_t* v0, int16x8_t* v1, int16x8_t* v2) { *r = vmlal_high_s16(*v0, *v1, *v2); }
void VmlalHighS32(int64x2_t* r, int64x2_t* v0, int32x4_t* v1, int32x4_t* v2) { *r = vmlal_high_s32(*v0, *v1, *v2); }
void VmlalHighU8(uint16x8_t* r, uint16x8_t* v0, uint8x16_t* v1, uint8x16_t* v2) { *r = vmlal_high_u8(*v0, *v1, *v2); }
void VmlalHighU16(uint32x4_t* r, uint32x4_t* v0, uint16x8_t* v1, uint16x8_t* v2) { *r = vmlal_high_u16(*v0, *v1, *v2); }
void VmlalHighU32(uint64x2_t* r, uint64x2_t* v0, uint32x4_t* v1, uint32x4_t* v2) { *r = vmlal_high_u32(*v0, *v1, *v2); }
void VmlalHighNS16(int32x4_t* r, int32x4_t* v0, int16x8_t* v1, int16_t* v2) { *r = vmlal_high_n_s16(*v0, *v1, *v2); }
void VmlalHighNS32(int64x2_t* r, int64x2_t* v0, int32x4_t* v1, int32_t* v2) { *r = vmlal_high_n_s32(*v0, *v1, *v2); }
void VmlalHighNU16(uint32x4_t* r, uint32x4_t* v0, uint16x8_t* v1, uint16_t* v2) { *r = vmlal_high_n_u16(*v0, *v1, *v2); }
void VmlalHighNU32(uint64x2_t* r, uint64x2_t* v0, uint32x4_t* v1, uint32_t* v2) { *r = vmlal_high_n_u32(*v0, *v1, *v2); }
void VmlalNS16(int32x4_t* r, int32x4_t* v0, int16x4_t* v1, int16_t* v2) { *r = vmlal_n_s16(*v0, *v1, *v2); }
void VmlalNS32(int64x2_t* r, int64x2_t* v0, int32x2_t* v1, int32_t* v2) { *r = vmlal_n_s32(*v0, *v1, *v2); }
void VmlalNU16(uint32x4_t* r, uint32x4_t* v0, uint16x4_t* v1, uint16_t* v2) { *r = vmlal_n_u16(*v0, *v1, *v2); }
void VmlalNU32(uint64x2_t* r, uint64x2_t* v0, uint32x2_t* v1, uint32_t* v2) { *r = vmlal_n_u32(*v0, *v1, *v2); }
void VmlaqS8(int8x16_t* r, int8x16_t* v0, int8x16_t* v1, int8x16_t* v2) { *r = vmlaq_s8(*v0, *v1, *v2); }
void VmlaqS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1, int16x8_t* v2) { *r = vmlaq_s16(*v0, *v1, *v2); }
void VmlaqS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1, int32x4_t* v2) { *r = vmlaq_s32(*v0, *v1, *v2); }
void VmlaqU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1, uint8x16_t* v2) { *r = vmlaq_u8(*v0, *v1, *v2); }
void VmlaqU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1, uint16x8_t* v2) { *r = vmlaq_u16(*v0, *v1, *v2); }
void VmlaqU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1, uint32x4_t* v2) { *r = vmlaq_u32(*v0, *v1, *v2); }
void VmlaqF32(float32x4_t* r, float32x4_t* v0, float32x4_t* v1, float32x4_t* v2) { *r = vmlaq_f32(*v0, *v1, *v2); }
void VmlaqF64(float64x2_t* r, float64x2_t* v0, float64x2_t* v1, float64x2_t* v2) { *r = vmlaq_f64(*v0, *v1, *v2); }
void VmlaqNS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1, int16_t* v2) { *r = vmlaq_n_s16(*v0, *v1, *v2); }
void VmlaqNS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1, int32_t* v2) { *r = vmlaq_n_s32(*v0, *v1, *v2); }
void VmlaqNU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1, uint16_t* v2) { *r = vmlaq_n_u16(*v0, *v1, *v2); }
void VmlaqNU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1, uint32_t* v2) { *r = vmlaq_n_u32(*v0, *v1, *v2); }
void VmlaqNF32(float32x4_t* r, float32x4_t* v0, float32x4_t* v1, float32_t* v2) { *r = vmlaq_n_f32(*v0, *v1, *v2); }
void VmlsS8(int8x8_t* r, int8x8_t* v0, int8x8_t* v1, int8x8_t* v2) { *r = vmls_s8(*v0, *v1, *v2); }
void VmlsS16(int16x4_t* r, int16x4_t* v0, int16x4_t* v1, int16x4_t* v2) { *r = vmls_s16(*v0, *v1, *v2); }
void VmlsS32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1, int32x2_t* v2) { *r = vmls_s32(*v0, *v1, *v2); }
void VmlsU8(uint8x8_t* r, uint8x8_t* v0, uint8x8_t* v1, uint8x8_t* v2) { *r = vmls_u8(*v0, *v1, *v2); }
void VmlsU16(uint16x4_t* r, uint16x4_t* v0, uint16x4_t* v1, uint16x4_t* v2) { *r = vmls_u16(*v0, *v1, *v2); }
void VmlsU32(uint32x2_t* r, uint32x2_t* v0, uint32x2_t* v1, uint32x2_t* v2) { *r = vmls_u32(*v0, *v1, *v2); }
void VmlsF32(float32x2_t* r, float32x2_t* v0, float32x2_t* v1, float32x2_t* v2) { *r = vmls_f32(*v0, *v1, *v2); }
void VmlsF64(float64x1_t* r, float64x1_t* v0, float64x1_t* v1, float64x1_t* v2) { *r = vmls_f64(*v0, *v1, *v2); }
void VmlsNS16(int16x4_t* r, int16x4_t* v0, int16x4_t* v1, int16_t* v2) { *r = vmls_n_s16(*v0, *v1, *v2); }
void VmlsNS32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1, int32_t* v2) { *r = vmls_n_s32(*v0, *v1, *v2); }
void VmlsNU16(uint16x4_t* r, uint16x4_t* v0, uint16x4_t* v1, uint16_t* v2) { *r = vmls_n_u16(*v0, *v1, *v2); }
void VmlsNU32(uint32x2_t* r, uint32x2_t* v0, uint32x2_t* v1, uint32_t* v2) { *r = vmls_n_u32(*v0, *v1, *v2); }
void VmlsNF32(float32x2_t* r, float32x2_t* v0, float32x2_t* v1, float32_t* v2) { *r = vmls_n_f32(*v0, *v1, *v2); }
void VmlslS8(int16x8_t* r, int16x8_t* v0, int8x8_t* v1, int8x8_t* v2) { *r = vmlsl_s8(*v0, *v1, *v2); }
void VmlslS16(int32x4_t* r, int32x4_t* v0, int16x4_t* v1, int16x4_t* v2) { *r = vmlsl_s16(*v0, *v1, *v2); }
void VmlslS32(int64x2_t* r, int64x2_t* v0, int32x2_t* v1, int32x2_t* v2) { *r = vmlsl_s32(*v0, *v1, *v2); }
void VmlslU8(uint16x8_t* r, uint16x8_t* v0, uint8x8_t* v1, uint8x8_t* v2) { *r = vmlsl_u8(*v0, *v1, *v2); }
void VmlslU16(uint32x4_t* r, uint32x4_t* v0, uint16x4_t* v1, uint16x4_t* v2) { *r = vmlsl_u16(*v0, *v1, *v2); }
void VmlslU32(uint64x2_t* r, uint64x2_t* v0, uint32x2_t* v1, uint32x2_t* v2) { *r = vmlsl_u32(*v0, *v1, *v2); }
void VmlslHighS8(int16x8_t* r, int16x8_t* v0, int8x16_t* v1, int8x16_t* v2) { *r = vmlsl_high_s8(*v0, *v1, *v2); }
void VmlslHighS16(int32x4_t* r, int32x4_t* v0, int16x8_t* v1, int16x8_t* v2) { *r = vmlsl_high_s16(*v0, *v1, *v2); }
void VmlslHighS32(int64x2_t* r, int64x2_t* v0, int32x4_t* v1, int32x4_t* v2) { *r = vmlsl_high_s32(*v0, *v1, *v2); }
void VmlslHighU8(uint16x8_t* r, uint16x8_t* v0, uint8x16_t* v1, uint8x16_t* v2) { *r = vmlsl_high_u8(*v0, *v1, *v2); }
void VmlslHighU16(uint32x4_t* r, uint32x4_t* v0, uint16x8_t* v1, uint16x8_t* v2) { *r = vmlsl_high_u16(*v0, *v1, *v2); }
void VmlslHighU32(uint64x2_t* r, uint64x2_t* v0, uint32x4_t* v1, uint32x4_t* v2) { *r = vmlsl_high_u32(*v0, *v1, *v2); }
void VmlslHighNS16(int32x4_t* r, int32x4_t* v0, int16x8_t* v1, int16_t* v2) { *r = vmlsl_high_n_s16(*v0, *v1, *v2); }
void VmlslHighNS32(int64x2_t* r, int64x2_t* v0, int32x4_t* v1, int32_t* v2) { *r = vmlsl_high_n_s32(*v0, *v1, *v2); }
void VmlslHighNU16(uint32x4_t* r, uint32x4_t* v0, uint16x8_t* v1, uint16_t* v2) { *r = vmlsl_high_n_u16(*v0, *v1, *v2); }
void VmlslHighNU32(uint64x2_t* r, uint64x2_t* v0, uint32x4_t* v1, uint32_t* v2) { *r = vmlsl_high_n_u32(*v0, *v1, *v2); }
void VmlslNS16(int32x4_t* r, int32x4_t* v0, int16x4_t* v1, int16_t* v2) { *r = vmlsl_n_s16(*v0, *v1, *v2); }
void VmlslNS32(int64x2_t* r, int64x2_t* v0, int32x2_t* v1, int32_t* v2) { *r = vmlsl_n_s32(*v0, *v1, *v2); }
void VmlslNU16(uint32x4_t* r, uint32x4_t* v0, uint16x4_t* v1, uint16_t* v2) { *r = vmlsl_n_u16(*v0, *v1, *v2); }
void VmlslNU32(uint64x2_t* r, uint64x2_t* v0, uint32x2_t* v1, uint32_t* v2) { *r = vmlsl_n_u32(*v0, *v1, *v2); }
void VmlsqS8(int8x16_t* r, int8x16_t* v0, int8x16_t* v1, int8x16_t* v2) { *r = vmlsq_s8(*v0, *v1, *v2); }
void VmlsqS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1, int16x8_t* v2) { *r = vmlsq_s16(*v0, *v1, *v2); }
void VmlsqS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1, int32x4_t* v2) { *r = vmlsq_s32(*v0, *v1, *v2); }
void VmlsqU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1, uint8x16_t* v2) { *r = vmlsq_u8(*v0, *v1, *v2); }
void VmlsqU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1, uint16x8_t* v2) { *r = vmlsq_u16(*v0, *v1, *v2); }
void VmlsqU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1, uint32x4_t* v2) { *r = vmlsq_u32(*v0, *v1, *v2); }
void VmlsqF32(float32x4_t* r, float32x4_t* v0, float32x4_t* v1, float32x4_t* v2) { *r = vmlsq_f32(*v0, *v1, *v2); }
void VmlsqF64(float64x2_t* r, float64x2_t* v0, float64x2_t* v1, float64x2_t* v2) { *r = vmlsq_f64(*v0, *v1, *v2); }
void VmlsqNS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1, int16_t* v2) { *r = vmlsq_n_s16(*v0, *v1, *v2); }
void VmlsqNS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1, int32_t* v2) { *r = vmlsq_n_s32(*v0, *v1, *v2); }
void VmlsqNU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1, uint16_t* v2) { *r = vmlsq_n_u16(*v0, *v1, *v2); }
void VmlsqNU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1, uint32_t* v2) { *r = vmlsq_n_u32(*v0, *v1, *v2); }
void VmlsqNF32(float32x4_t* r, float32x4_t* v0, float32x4_t* v1, float32_t* v2) { *r = vmlsq_n_f32(*v0, *v1, *v2); }
// vmmlaq_*: 8-bit integer matrix multiply-accumulate (2x8 by 8x2 into a 2x2 block of 32-bit lanes). These need the Int8 matrix multiply extension (__ARM_FEATURE_MATMUL_INT8) enabled at compile time.
void VmmlaqS32(int32x4_t* r, int32x4_t* v0, int8x16_t* v1, int8x16_t* v2) { *r = vmmlaq_s32(*v0, *v1, *v2); }
void VmmlaqU32(uint32x4_t* r, uint32x4_t* v0, uint8x16_t* v1, uint8x16_t* v2) { *r = vmmlaq_u32(*v0, *v1, *v2); }
void VmovNS8(int8x8_t* r, int8_t* v0) { *r = vmov_n_s8(*v0); }
void VmovNS16(int16x4_t* r, int16_t* v0) { *r = vmov_n_s16(*v0); }
void VmovNS32(int32x2_t* r, int32_t* v0) { *r = vmov_n_s32(*v0); }
void VmovNS64(int64x1_t* r, int64_t* v0) { *r = vmov_n_s64(*v0); }
void VmovNU8(uint8x8_t* r, uint8_t* v0) { *r = vmov_n_u8(*v0); }
void VmovNU16(uint16x4_t* r, uint16_t* v0) { *r = vmov_n_u16(*v0); }
void VmovNU32(uint32x2_t* r, uint32_t* v0) { *r = vmov_n_u32(*v0); }
void VmovNU64(uint64x1_t* r, uint64_t* v0) { *r = vmov_n_u64(*v0); }
void VmovNF32(float32x2_t* r, float32_t* v0) { *r = vmov_n_f32(*v0); }
void VmovNF64(float64x1_t* r, float64_t* v0) { *r = vmov_n_f64(*v0); }
void VmovNP16(poly16x4_t* r, poly16_t* v0) { *r = vmov_n_p16(*v0); }
void VmovNP64(poly64x1_t* r, poly64_t* v0) { *r = vmov_n_p64(*v0); }
void VmovNP8(poly8x8_t* r, poly8_t* v0) { *r = vmov_n_p8(*v0); }
// vmovl*: widen each lane to twice its width (sign-extend for signed, zero-extend for unsigned). vmovn*: narrow each lane to half its width by keeping the low bits, with no saturation.
void VmovlS8(int16x8_t* r, int8x8_t* v0) { *r = vmovl_s8(*v0); }
void VmovlS16(int32x4_t* r, int16x4_t* v0) { *r = vmovl_s16(*v0); }
void VmovlS32(int64x2_t* r, int32x2_t* v0) { *r = vmovl_s32(*v0); }
void VmovlU8(uint16x8_t* r, uint8x8_t* v0) { *r = vmovl_u8(*v0); }
void VmovlU16(uint32x4_t* r, uint16x4_t* v0) { *r = vmovl_u16(*v0); }
void VmovlU32(uint64x2_t* r, uint32x2_t* v0) { *r = vmovl_u32(*v0); }
void VmovlHighS8(int16x8_t* r, int8x16_t* v0) { *r = vmovl_high_s8(*v0); }
void VmovlHighS16(int32x4_t* r, int16x8_t* v0) { *r = vmovl_high_s16(*v0); }
void VmovlHighS32(int64x2_t* r, int32x4_t* v0) { *r = vmovl_high_s32(*v0); }
void VmovlHighU8(uint16x8_t* r, uint8x16_t* v0) { *r = vmovl_high_u8(*v0); }
void VmovlHighU16(uint32x4_t* r, uint16x8_t* v0) { *r = vmovl_high_u16(*v0); }
void VmovlHighU32(uint64x2_t* r, uint32x4_t* v0) { *r = vmovl_high_u32(*v0); }
void VmovnS16(int8x8_t* r, int16x8_t* v0) { *r = vmovn_s16(*v0); }
void VmovnS32(int16x4_t* r, int32x4_t* v0) { *r = vmovn_s32(*v0); }
void VmovnS64(int32x2_t* r, int64x2_t* v0) { *r = vmovn_s64(*v0); }
void VmovnU16(uint8x8_t* r, uint16x8_t* v0) { *r = vmovn_u16(*v0); }
void VmovnU32(uint16x4_t* r, uint32x4_t* v0) { *r = vmovn_u32(*v0); }
void VmovnU64(uint32x2_t* r, uint64x2_t* v0) { *r = vmovn_u64(*v0); }
// vmovn_high_*: v0 supplies the lower half of the result unchanged; the narrowed lanes of v1 fill the upper half.
void VmovnHighS16(int8x16_t* r, int8x8_t* v0, int16x8_t* v1) { *r = vmovn_high_s16(*v0, *v1); }
void VmovnHighS32(int16x8_t* r, int16x4_t* v0, int32x4_t* v1) { *r = vmovn_high_s32(*v0, *v1); }
void VmovnHighS64(int32x4_t* r, int32x2_t* v0, int64x2_t* v1) { *r = vmovn_high_s64(*v0, *v1); }
void VmovnHighU16(uint8x16_t* r, uint8x8_t* v0, uint16x8_t* v1) { *r = vmovn_high_u16(*v0, *v1); }
void VmovnHighU32(uint16x8_t* r, uint16x4_t* v0, uint32x4_t* v1) { *r = vmovn_high_u32(*v0, *v1); }
void VmovnHighU64(uint32x4_t* r, uint32x2_t* v0, uint64x2_t* v1) { *r = vmovn_high_u64(*v0, *v1); }
void VmovqNS8(int8x16_t* r, int8_t* v0) { *r = vmovq_n_s8(*v0); }
void VmovqNS16(int16x8_t* r, int16_t* v0) { *r = vmovq_n_s16(*v0); }
void VmovqNS32(int32x4_t* r, int32_t* v0) { *r = vmovq_n_s32(*v0); }
void VmovqNS64(int64x2_t* r, int64_t* v0) { *r = vmovq_n_s64(*v0); }
void VmovqNU8(uint8x16_t* r, uint8_t* v0) { *r = vmovq_n_u8(*v0); }
void VmovqNU16(uint16x8_t* r, uint16_t* v0) { *r = vmovq_n_u16(*v0); }
void VmovqNU32(uint32x4_t* r, uint32_t* v0) { *r = vmovq_n_u32(*v0); }
void VmovqNU64(uint64x2_t* r, uint64_t* v0) { *r = vmovq_n_u64(*v0); }
void VmovqNF32(float32x4_t* r, float32_t* v0) { *r = vmovq_n_f32(*v0); }
void VmovqNF64(float64x2_t* r, float64_t* v0) { *r = vmovq_n_f64(*v0); }
void VmovqNP16(poly16x8_t* r, poly16_t* v0) { *r = vmovq_n_p16(*v0); }
void VmovqNP64(poly64x2_t* r, poly64_t* v0) { *r = vmovq_n_p64(*v0); }
void VmovqNP8(poly8x16_t* r, poly8_t* v0) { *r = vmovq_n_p8(*v0); }
void VmulS8(int8x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vmul_s8(*v0, *v1); }
void VmulS16(int16x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vmul_s16(*v0, *v1); }
void VmulS32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vmul_s32(*v0, *v1); }
void VmulU8(uint8x8_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vmul_u8(*v0, *v1); }
void VmulU16(uint16x4_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = vmul_u16(*v0, *v1); }
void VmulU32(uint32x2_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = vmul_u32(*v0, *v1); }
void VmulF32(float32x2_t* r, float32x2_t* v0, float32x2_t* v1) { *r = vmul_f32(*v0, *v1); }
void VmulF64(float64x1_t* r, float64x1_t* v0, float64x1_t* v1) { *r = vmul_f64(*v0, *v1); }
void VmulNS16(int16x4_t* r, int16x4_t* v0, int16_t* v1) { *r = vmul_n_s16(*v0, *v1); }
void VmulNS32(int32x2_t* r, int32x2_t* v0, int32_t* v1) { *r = vmul_n_s32(*v0, *v1); }
void VmulNU16(uint16x4_t* r, uint16x4_t* v0, uint16_t* v1) { *r = vmul_n_u16(*v0, *v1); }
void VmulNU32(uint32x2_t* r, uint32x2_t* v0, uint32_t* v1) { *r = vmul_n_u32(*v0, *v1); }
void VmulNF32(float32x2_t* r, float32x2_t* v0, float32_t* v1) { *r = vmul_n_f32(*v0, *v1); }
void VmulNF64(float64x1_t* r, float64x1_t* v0, float64_t* v1) { *r = vmul_n_f64(*v0, *v1); }
void VmulP8(poly8x8_t* r, poly8x8_t* v0, poly8x8_t* v1) { *r = vmul_p8(*v0, *v1); }
void VmullS8(int16x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vmull_s8(*v0, *v1); }
void VmullS16(int32x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vmull_s16(*v0, *v1); }
void VmullS32(int64x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vmull_s32(*v0, *v1); }
void VmullU8(uint16x8_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vmull_u8(*v0, *v1); }
void VmullU16(uint32x4_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = vmull_u16(*v0, *v1); }
void VmullU32(uint64x2_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = vmull_u32(*v0, *v1); }
void VmullHighS8(int16x8_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vmull_high_s8(*v0, *v1); }
void VmullHighS16(int32x4_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vmull_high_s16(*v0, *v1); }
void VmullHighS32(int64x2_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vmull_high_s32(*v0, *v1); }
void VmullHighU8(uint16x8_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vmull_high_u8(*v0, *v1); }
void VmullHighU16(uint32x4_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vmull_high_u16(*v0, *v1); }
void VmullHighU32(uint64x2_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vmull_high_u32(*v0, *v1); }
void VmullHighNS16(int32x4_t* r, int16x8_t* v0, int16_t* v1) { *r = vmull_high_n_s16(*v0, *v1); }
void VmullHighNS32(int64x2_t* r, int32x4_t* v0, int32_t* v1) { *r = vmull_high_n_s32(*v0, *v1); }
void VmullHighNU16(uint32x4_t* r, uint16x8_t* v0, uint16_t* v1) { *r = vmull_high_n_u16(*v0, *v1); }
void VmullHighNU32(uint64x2_t* r, uint32x4_t* v0, uint32_t* v1) { *r = vmull_high_n_u32(*v0, *v1); }
// Polynomial variants (p8, p64) multiply carry-lessly over GF(2). The p64 forms produce a poly128_t and need the PMULL (AES/crypto) extension enabled at compile time.
void VmullHighP64(poly128_t* r, poly64x2_t* v0, poly64x2_t* v1) { *r = vmull_high_p64(*v0, *v1); }
void VmullHighP8(poly16x8_t* r, poly8x16_t* v0, poly8x16_t* v1) { *r = vmull_high_p8(*v0, *v1); }
void VmullNS16(int32x4_t* r, int16x4_t* v0, int16_t* v1) { *r = vmull_n_s16(*v0, *v1); }
void VmullNS32(int64x2_t* r, int32x2_t* v0, int32_t* v1) { *r = vmull_n_s32(*v0, *v1); }
void VmullNU16(uint32x4_t* r, uint16x4_t* v0, uint16_t* v1) { *r = vmull_n_u16(*v0, *v1); }
void VmullNU32(uint64x2_t* r, uint32x2_t* v0, uint32_t* v1) { *r = vmull_n_u32(*v0, *v1); }
void VmullP64(poly128_t* r, poly64_t* v0, poly64_t* v1) { *r = vmull_p64(*v0, *v1); }
void VmullP8(poly16x8_t* r, poly8x8_t* v0, poly8x8_t* v1) { *r = vmull_p8(*v0, *v1); }
void VmulqS8(int8x16_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vmulq_s8(*v0, *v1); }
void VmulqS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vmulq_s16(*v0, *v1); }
void VmulqS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vmulq_s32(*v0, *v1); }
void VmulqU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vmulq_u8(*v0, *v1); }
void VmulqU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vmulq_u16(*v0, *v1); }
void VmulqU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vmulq_u32(*v0, *v1); }
void VmulqF32(float32x4_t* r, float32x4_t* v0, float32x4_t* v1) { *r = vmulq_f32(*v0, *v1); }
void VmulqF64(float64x2_t* r, float64x2_t* v0, float64x2_t* v1) { *r = vmulq_f64(*v0, *v1); }
void VmulqNS16(int16x8_t* r, int16x8_t* v0, int16_t* v1) { *r = vmulq_n_s16(*v0, *v1); }
void VmulqNS32(int32x4_t* r, int32x4_t* v0, int32_t* v1) { *r = vmulq_n_s32(*v0, *v1); }
void VmulqNU16(uint16x8_t* r, uint16x8_t* v0, uint16_t* v1) { *r = vmulq_n_u16(*v0, *v1); }
void VmulqNU32(uint32x4_t* r, uint32x4_t* v0, uint32_t* v1) { *r = vmulq_n_u32(*v0, *v1); }
void VmulqNF32(float32x4_t* r, float32x4_t* v0, float32_t* v1) { *r = vmulq_n_f32(*v0, *v1); }
void VmulqNF64(float64x2_t* r, float64x2_t* v0, float64_t* v1) { *r = vmulq_n_f64(*v0, *v1); }
void VmulqP8(poly8x16_t* r, poly8x16_t* v0, poly8x16_t* v1) { *r = vmulq_p8(*v0, *v1); }
// vmulx*: floating-point multiply extended (FMULX). Identical to an ordinary multiply except that 0 * infinity yields +/-2.0 instead of NaN.
void VmulxF32(float32x2_t* r, float32x2_t* v0, float32x2_t* v1) { *r = vmulx_f32(*v0, *v1); }
void VmulxF64(float64x1_t* r, float64x1_t* v0, float64x1_t* v1) { *r = vmulx_f64(*v0, *v1); }
void VmulxdF64(float64_t* r, float64_t* v0, float64_t* v1) { *r = vmulxd_f64(*v0, *v1); }
void VmulxqF32(float32x4_t* r, float32x4_t* v0, float32x4_t* v1) { *r = vmulxq_f32(*v0, *v1); }
void VmulxqF64(float64x2_t* r, float64x2_t* v0, float64x2_t* v1) { *r = vmulxq_f64(*v0, *v1); }
void VmulxsF32(float32_t* r, float32_t* v0, float32_t* v1) { *r = vmulxs_f32(*v0, *v1); }
void VmvnS8(int8x8_t* r, int8x8_t* v0) { *r = vmvn_s8(*v0); }
void VmvnS16(int16x4_t* r, int16x4_t* v0) { *r = vmvn_s16(*v0); }
void VmvnS32(int32x2_t* r, int32x2_t* v0) { *r = vmvn_s32(*v0); }
void VmvnU8(uint8x8_t* r, uint8x8_t* v0) { *r = vmvn_u8(*v0); }
void VmvnU16(uint16x4_t* r, uint16x4_t* v0) { *r = vmvn_u16(*v0); }
void VmvnU32(uint32x2_t* r, uint32x2_t* v0) { *r = vmvn_u32(*v0); }
void VmvnP8(poly8x8_t* r, poly8x8_t* v0) { *r = vmvn_p8(*v0); }
void VmvnqS8(int8x16_t* r, int8x16_t* v0) { *r = vmvnq_s8(*v0); }
void VmvnqS16(int16x8_t* r, int16x8_t* v0) { *r = vmvnq_s16(*v0); }
void VmvnqS32(int32x4_t* r, int32x4_t* v0) { *r = vmvnq_s32(*v0); }
void VmvnqU8(uint8x16_t* r, uint8x16_t* v0) { *r = vmvnq_u8(*v0); }
void VmvnqU16(uint16x8_t* r, uint16x8_t* v0) { *r = vmvnq_u16(*v0); }
void VmvnqU32(uint32x4_t* r, uint32x4_t* v0) { *r = vmvnq_u32(*v0); }
void VmvnqP8(poly8x16_t* r, poly8x16_t* v0) { *r = vmvnq_p8(*v0); }
void VnegS8(int8x8_t* r, int8x8_t* v0) { *r = vneg_s8(*v0); }
void VnegS16(int16x4_t* r, int16x4_t* v0) { *r = vneg_s16(*v0); }
void VnegS32(int32x2_t* r, int32x2_t* v0) { *r = vneg_s32(*v0); }
void VnegS64(int64x1_t* r, int64x1_t* v0) { *r = vneg_s64(*v0); }
void VnegF32(float32x2_t* r, float32x2_t* v0) { *r = vneg_f32(*v0); }
void VnegF64(float64x1_t* r, float64x1_t* v0) { *r = vneg_f64(*v0); }
void VnegdS64(int64_t* r, int64_t* v0) { *r = vnegd_s64(*v0); }
void VnegqS8(int8x16_t* r, int8x16_t* v0) { *r = vnegq_s8(*v0); }
void VnegqS16(int16x8_t* r, int16x8_t* v0) { *r = vnegq_s16(*v0); }
void VnegqS32(int32x4_t* r, int32x4_t* v0) { *r = vnegq_s32(*v0); }
void VnegqS64(int64x2_t* r, int64x2_t* v0) { *r = vnegq_s64(*v0); }
void VnegqF32(float32x4_t* r, float32x4_t* v0) { *r = vnegq_f32(*v0); }
void VnegqF64(float64x2_t* r, float64x2_t* v0) { *r = vnegq_f64(*v0); }
void VornS8(int8x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vorn_s8(*v0, *v1); }
void VornS16(int16x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vorn_s16(*v0, *v1); }
void VornS32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vorn_s32(*v0, *v1); }
void VornS64(int64x1_t* r, int64x1_t* v0, int64x1_t* v1) { *r = vorn_s64(*v0, *v1); }
void VornU8(uint8x8_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vorn_u8(*v0, *v1); }
void VornU16(uint16x4_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = vorn_u16(*v0, *v1); }
void VornU32(uint32x2_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = vorn_u32(*v0, *v1); }
void VornU64(uint64x1_t* r, uint64x1_t* v0, uint64x1_t* v1) { *r = vorn_u64(*v0, *v1); }
void VornqS8(int8x16_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vornq_s8(*v0, *v1); }
void VornqS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vornq_s16(*v0, *v1); }
void VornqS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vornq_s32(*v0, *v1); }
void VornqS64(int64x2_t* r, int64x2_t* v0, int64x2_t* v1) { *r = vornq_s64(*v0, *v1); }
void VornqU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vornq_u8(*v0, *v1); }
void VornqU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vornq_u16(*v0, *v1); }
void VornqU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vornq_u32(*v0, *v1); }
void VornqU64(uint64x2_t* r, uint64x2_t* v0, uint64x2_t* v1) { *r = vornq_u64(*v0, *v1); }
void VorrS8(int8x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vorr_s8(*v0, *v1); }
void VorrS16(int16x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vorr_s16(*v0, *v1); }
void VorrS32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vorr_s32(*v0, *v1); }
void VorrS64(int64x1_t* r, int64x1_t* v0, int64x1_t* v1) { *r = vorr_s64(*v0, *v1); }
void VorrU8(uint8x8_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vorr_u8(*v0, *v1); }
void VorrU16(uint16x4_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = vorr_u16(*v0, *v1); }
void VorrU32(uint32x2_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = vorr_u32(*v0, *v1); }
void VorrU64(uint64x1_t* r, uint64x1_t* v0, uint64x1_t* v1) { *r = vorr_u64(*v0, *v1); }
void VorrqS8(int8x16_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vorrq_s8(*v0, *v1); }
void VorrqS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vorrq_s16(*v0, *v1); }
void VorrqS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vorrq_s32(*v0, *v1); }
void VorrqS64(int64x2_t* r, int64x2_t* v0, int64x2_t* v1) { *r = vorrq_s64(*v0, *v1); }
void VorrqU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vorrq_u8(*v0, *v1); }
void VorrqU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vorrq_u16(*v0, *v1); }
void VorrqU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vorrq_u32(*v0, *v1); }
void VorrqU64(uint64x2_t* r, uint64x2_t* v0, uint64x2_t* v1) { *r = vorrq_u64(*v0, *v1); }
void VpadalS8(int16x4_t* r, int16x4_t* v0, int8x8_t* v1) { *r = vpadal_s8(*v0, *v1); }
void VpadalS16(int32x2_t* r, int32x2_t* v0, int16x4_t* v1) { *r = vpadal_s16(*v0, *v1); }
void VpadalS32(int64x1_t* r, int64x1_t* v0, int32x2_t* v1) { *r = vpadal_s32(*v0, *v1); }
void VpadalU8(uint16x4_t* r, uint16x4_t* v0, uint8x8_t* v1) { *r = vpadal_u8(*v0, *v1); }
void VpadalU16(uint32x2_t* r, uint32x2_t* v0, uint16x4_t* v1) { *r = vpadal_u16(*v0, *v1); }
void VpadalU32(uint64x1_t* r, uint64x1_t* v0, uint32x2_t* v1) { *r = vpadal_u32(*v0, *v1); }
void VpadalqS8(int16x8_t* r, int16x8_t* v0, int8x16_t* v1) { *r = vpadalq_s8(*v0, *v1); }
void VpadalqS16(int32x4_t* r, int32x4_t* v0, int16x8_t* v1) { *r = vpadalq_s16(*v0, *v1); }
void VpadalqS32(int64x2_t* r, int64x2_t* v0, int32x4_t* v1) { *r = vpadalq_s32(*v0, *v1); }
void VpadalqU8(uint16x8_t* r, uint16x8_t* v0, uint8x16_t* v1) { *r = vpadalq_u8(*v0, *v1); }
void VpadalqU16(uint32x4_t* r, uint32x4_t* v0, uint16x8_t* v1) { *r = vpadalq_u16(*v0, *v1); }
void VpadalqU32(uint64x2_t* r, uint64x2_t* v0, uint32x4_t* v1) { *r = vpadalq_u32(*v0, *v1); }
void VpaddS8(int8x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vpadd_s8(*v0, *v1); }
void VpaddS16(int16x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vpadd_s16(*v0, *v1); }
void VpaddS32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vpadd_s32(*v0, *v1); }
void VpaddU8(uint8x8_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vpadd_u8(*v0, *v1); }
void VpaddU16(uint16x4_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = vpadd_u16(*v0, *v1); }
void VpaddU32(uint32x2_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = vpadd_u32(*v0, *v1); }
void VpaddF32(float32x2_t* r, float32x2_t* v0, float32x2_t* v1) { *r = vpadd_f32(*v0, *v1); }
void VpadddS64(int64_t* r, int64x2_t* v0) { *r = vpaddd_s64(*v0); }
void VpadddU64(uint64_t* r, uint64x2_t* v0) { *r = vpaddd_u64(*v0); }
void VpadddF64(float64_t* r, float64x2_t* v0) { *r = vpaddd_f64(*v0); }
void VpaddlS8(int16x4_t* r, int8x8_t* v0) { *r = vpaddl_s8(*v0); }
void VpaddlS16(int32x2_t* r, int16x4_t* v0) { *r = vpaddl_s16(*v0); }
void VpaddlS32(int64x1_t* r, int32x2_t* v0) { *r = vpaddl_s32(*v0); }
void VpaddlU8(uint16x4_t* r, uint8x8_t* v0) { *r = vpaddl_u8(*v0); }
void VpaddlU16(uint32x2_t* r, uint16x4_t* v0) { *r = vpaddl_u16(*v0); }
void VpaddlU32(uint64x1_t* r, uint32x2_t* v0) { *r = vpaddl_u32(*v0); }
void VpaddlqS8(int16x8_t* r, int8x16_t* v0) { *r = vpaddlq_s8(*v0); }
void VpaddlqS16(int32x4_t* r, int16x8_t* v0) { *r = vpaddlq_s16(*v0); }
void VpaddlqS32(int64x2_t* r, int32x4_t* v0) { *r = vpaddlq_s32(*v0); }
void VpaddlqU8(uint16x8_t* r, uint8x16_t* v0) { *r = vpaddlq_u8(*v0); }
void VpaddlqU16(uint32x4_t* r, uint16x8_t* v0) { *r = vpaddlq_u16(*v0); }
void VpaddlqU32(uint64x2_t* r, uint32x4_t* v0) { *r = vpaddlq_u32(*v0); }
void VpaddqS8(int8x16_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vpaddq_s8(*v0, *v1); }
void VpaddqS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vpaddq_s16(*v0, *v1); }
void VpaddqS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vpaddq_s32(*v0, *v1); }
void VpaddqS64(int64x2_t* r, int64x2_t* v0, int64x2_t* v1) { *r = vpaddq_s64(*v0, *v1); }
void VpaddqU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vpaddq_u8(*v0, *v1); }
void VpaddqU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vpaddq_u16(*v0, *v1); }
void VpaddqU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vpaddq_u32(*v0, *v1); }
void VpaddqU64(uint64x2_t* r, uint64x2_t* v0, uint64x2_t* v1) { *r = vpaddq_u64(*v0, *v1); }
void VpaddqF32(float32x4_t* r, float32x4_t* v0, float32x4_t* v1) { *r = vpaddq_f32(*v0, *v1); }
void VpaddqF64(float64x2_t* r, float64x2_t* v0, float64x2_t* v1) { *r = vpaddq_f64(*v0, *v1); }
void VpaddsF32(float32_t* r, float32x2_t* v0) { *r = vpadds_f32(*v0); }
void VpmaxS8(int8x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vpmax_s8(*v0, *v1); }
void VpmaxS16(int16x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vpmax_s16(*v0, *v1); }
void VpmaxS32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vpmax_s32(*v0, *v1); }
void VpmaxU8(uint8x8_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vpmax_u8(*v0, *v1); }
void VpmaxU16(uint16x4_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = vpmax_u16(*v0, *v1); }
void VpmaxU32(uint32x2_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = vpmax_u32(*v0, *v1); }
void VpmaxF32(float32x2_t* r, float32x2_t* v0, float32x2_t* v1) { *r = vpmax_f32(*v0, *v1); }
void VpmaxnmF32(float32x2_t* r, float32x2_t* v0, float32x2_t* v1) { *r = vpmaxnm_f32(*v0, *v1); }
void VpmaxnmqF32(float32x4_t* r, float32x4_t* v0, float32x4_t* v1) { *r = vpmaxnmq_f32(*v0, *v1); }
void VpmaxnmqF64(float64x2_t* r, float64x2_t* v0, float64x2_t* v1) { *r = vpmaxnmq_f64(*v0, *v1); }
void VpmaxnmqdF64(float64_t* r, float64x2_t* v0) { *r = vpmaxnmqd_f64(*v0); }
void VpmaxnmsF32(float32_t* r, float32x2_t* v0) { *r = vpmaxnms_f32(*v0); }
void VpmaxqS8(int8x16_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vpmaxq_s8(*v0, *v1); }
void VpmaxqS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vpmaxq_s16(*v0, *v1); }
void VpmaxqS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vpmaxq_s32(*v0, *v1); }
void VpmaxqU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vpmaxq_u8(*v0, *v1); }
void VpmaxqU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vpmaxq_u16(*v0, *v1); }
void VpmaxqU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vpmaxq_u32(*v0, *v1); }
void VpmaxqF32(float32x4_t* r, float32x4_t* v0, float32x4_t* v1) { *r = vpmaxq_f32(*v0, *v1); }
void VpmaxqF64(float64x2_t* r, float64x2_t* v0, float64x2_t* v1) { *r = vpmaxq_f64(*v0, *v1); }
void VpmaxqdF64(float64_t* r, float64x2_t* v0) { *r = vpmaxqd_f64(*v0); }
void VpmaxsF32(float32_t* r, float32x2_t* v0) { *r = vpmaxs_f32(*v0); }
void VpminS8(int8x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vpmin_s8(*v0, *v1); }
void VpminS16(int16x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vpmin_s16(*v0, *v1); }
void VpminS32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vpmin_s32(*v0, *v1); }
void VpminU8(uint8x8_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vpmin_u8(*v0, *v1); }
void VpminU16(uint16x4_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = vpmin_u16(*v0, *v1); }
void VpminU32(uint32x2_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = vpmin_u32(*v0, *v1); }
void VpminF32(float32x2_t* r, float32x2_t* v0, float32x2_t* v1) { *r = vpmin_f32(*v0, *v1); }
void VpminnmF32(float32x2_t* r, float32x2_t* v0, float32x2_t* v1) { *r = vpminnm_f32(*v0, *v1); }
void VpminnmqF32(float32x4_t* r, float32x4_t* v0, float32x4_t* v1) { *r = vpminnmq_f32(*v0, *v1); }
void VpminnmqF64(float64x2_t* r, float64x2_t* v0, float64x2_t* v1) { *r = vpminnmq_f64(*v0, *v1); }
void VpminnmqdF64(float64_t* r, float64x2_t* v0) { *r = vpminnmqd_f64(*v0); }
void VpminnmsF32(float32_t* r, float32x2_t* v0) { *r = vpminnms_f32(*v0); }
void VpminqS8(int8x16_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vpminq_s8(*v0, *v1); }
void VpminqS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vpminq_s16(*v0, *v1); }
void VpminqS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vpminq_s32(*v0, *v1); }
void VpminqU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vpminq_u8(*v0, *v1); }
void VpminqU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vpminq_u16(*v0, *v1); }
void VpminqU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vpminq_u32(*v0, *v1); }
void VpminqF32(float32x4_t* r, float32x4_t* v0, float32x4_t* v1) { *r = vpminq_f32(*v0, *v1); }
void VpminqF64(float64x2_t* r, float64x2_t* v0, float64x2_t* v1) { *r = vpminq_f64(*v0, *v1); }
void VpminqdF64(float64_t* r, float64x2_t* v0) { *r = vpminqd_f64(*v0); }
void VpminsF32(float32_t* r, float32x2_t* v0) { *r = vpmins_f32(*v0); }
void VqabsS8(int8x8_t* r, int8x8_t* v0) { *r = vqabs_s8(*v0); }
void VqabsS16(int16x4_t* r, int16x4_t* v0) { *r = vqabs_s16(*v0); }
void VqabsS32(int32x2_t* r, int32x2_t* v0) { *r = vqabs_s32(*v0); }
void VqabsS64(int64x1_t* r, int64x1_t* v0) { *r = vqabs_s64(*v0); }
void VqabsbS8(int8_t* r, int8_t* v0) { *r = vqabsb_s8(*v0); }
void VqabsdS64(int64_t* r, int64_t* v0) { *r = vqabsd_s64(*v0); }
void VqabshS16(int16_t* r, int16_t* v0) { *r = vqabsh_s16(*v0); }
void VqabsqS8(int8x16_t* r, int8x16_t* v0) { *r = vqabsq_s8(*v0); }
void VqabsqS16(int16x8_t* r, int16x8_t* v0) { *r = vqabsq_s16(*v0); }
void VqabsqS32(int32x4_t* r, int32x4_t* v0) { *r = vqabsq_s32(*v0); }
void VqabsqS64(int64x2_t* r, int64x2_t* v0) { *r = vqabsq_s64(*v0); }
void VqabssS32(int32_t* r, int32_t* v0) { *r = vqabss_s32(*v0); }
void VqaddS8(int8x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vqadd_s8(*v0, *v1); }
void VqaddS16(int16x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vqadd_s16(*v0, *v1); }
void VqaddS32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vqadd_s32(*v0, *v1); }
void VqaddS64(int64x1_t* r, int64x1_t* v0, int64x1_t* v1) { *r = vqadd_s64(*v0, *v1); }
void VqaddU8(uint8x8_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vqadd_u8(*v0, *v1); }
void VqaddU16(uint16x4_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = vqadd_u16(*v0, *v1); }
void VqaddU32(uint32x2_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = vqadd_u32(*v0, *v1); }
void VqaddU64(uint64x1_t* r, uint64x1_t* v0, uint64x1_t* v1) { *r = vqadd_u64(*v0, *v1); }
void VqaddbS8(int8_t* r, int8_t* v0, int8_t* v1) { *r = vqaddb_s8(*v0, *v1); }
void VqaddbU8(uint8_t* r, uint8_t* v0, uint8_t* v1) { *r = vqaddb_u8(*v0, *v1); }
void VqadddS64(int64_t* r, int64_t* v0, int64_t* v1) { *r = vqaddd_s64(*v0, *v1); }
void VqadddU64(uint64_t* r, uint64_t* v0, uint64_t* v1) { *r = vqaddd_u64(*v0, *v1); }
void VqaddhS16(int16_t* r, int16_t* v0, int16_t* v1) { *r = vqaddh_s16(*v0, *v1); }
void VqaddhU16(uint16_t* r, uint16_t* v0, uint16_t* v1) { *r = vqaddh_u16(*v0, *v1); }
void VqaddqS8(int8x16_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vqaddq_s8(*v0, *v1); }
void VqaddqS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vqaddq_s16(*v0, *v1); }
void VqaddqS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vqaddq_s32(*v0, *v1); }
void VqaddqS64(int64x2_t* r, int64x2_t* v0, int64x2_t* v1) { *r = vqaddq_s64(*v0, *v1); }
void VqaddqU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vqaddq_u8(*v0, *v1); }
void VqaddqU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vqaddq_u16(*v0, *v1); }
void VqaddqU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vqaddq_u32(*v0, *v1); }
void VqaddqU64(uint64x2_t* r, uint64x2_t* v0, uint64x2_t* v1) { *r = vqaddq_u64(*v0, *v1); }
void VqaddsS32(int32_t* r, int32_t* v0, int32_t* v1) { *r = vqadds_s32(*v0, *v1); }
void VqaddsU32(uint32_t* r, uint32_t* v0, uint32_t* v1) { *r = vqadds_u32(*v0, *v1); }
void VqdmlalS16(int32x4_t* r, int32x4_t* v0, int16x4_t* v1, int16x4_t* v2) { *r = vqdmlal_s16(*v0, *v1, *v2); }
void VqdmlalS32(int64x2_t* r, int64x2_t* v0, int32x2_t* v1, int32x2_t* v2) { *r = vqdmlal_s32(*v0, *v1, *v2); }
void VqdmlalHighS16(int32x4_t* r, int32x4_t* v0, int16x8_t* v1, int16x8_t* v2) { *r = vqdmlal_high_s16(*v0, *v1, *v2); }
void VqdmlalHighS32(int64x2_t* r, int64x2_t* v0, int32x4_t* v1, int32x4_t* v2) { *r = vqdmlal_high_s32(*v0, *v1, *v2); }
void VqdmlalHighNS16(int32x4_t* r, int32x4_t* v0, int16x8_t* v1, int16_t* v2) { *r = vqdmlal_high_n_s16(*v0, *v1, *v2); }
void VqdmlalHighNS32(int64x2_t* r, int64x2_t* v0, int32x4_t* v1, int32_t* v2) { *r = vqdmlal_high_n_s32(*v0, *v1, *v2); }
void VqdmlalNS16(int32x4_t* r, int32x4_t* v0, int16x4_t* v1, int16_t* v2) { *r = vqdmlal_n_s16(*v0, *v1, *v2); }
void VqdmlalNS32(int64x2_t* r, int64x2_t* v0, int32x2_t* v1, int32_t* v2) { *r = vqdmlal_n_s32(*v0, *v1, *v2); }
void VqdmlalhS16(int32_t* r, int32_t* v0, int16_t* v1, int16_t* v2) { *r = vqdmlalh_s16(*v0, *v1, *v2); }
void VqdmlalsS32(int64_t* r, int64_t* v0, int32_t* v1, int32_t* v2) { *r = vqdmlals_s32(*v0, *v1, *v2); }
void VqdmlslS16(int32x4_t* r, int32x4_t* v0, int16x4_t* v1, int16x4_t* v2) { *r = vqdmlsl_s16(*v0, *v1, *v2); }
void VqdmlslS32(int64x2_t* r, int64x2_t* v0, int32x2_t* v1, int32x2_t* v2) { *r = vqdmlsl_s32(*v0, *v1, *v2); }
void VqdmlslHighS16(int32x4_t* r, int32x4_t* v0, int16x8_t* v1, int16x8_t* v2) { *r = vqdmlsl_high_s16(*v0, *v1, *v2); }
void VqdmlslHighS32(int64x2_t* r, int64x2_t* v0, int32x4_t* v1, int32x4_t* v2) { *r = vqdmlsl_high_s32(*v0, *v1, *v2); }
void VqdmlslHighNS16(int32x4_t* r, int32x4_t* v0, int16x8_t* v1, int16_t* v2) { *r = vqdmlsl_high_n_s16(*v0, *v1, *v2); }
void VqdmlslHighNS32(int64x2_t* r, int64x2_t* v0, int32x4_t* v1, int32_t* v2) { *r = vqdmlsl_high_n_s32(*v0, *v1, *v2); }
void VqdmlslNS16(int32x4_t* r, int32x4_t* v0, int16x4_t* v1, int16_t* v2) { *r = vqdmlsl_n_s16(*v0, *v1, *v2); }
void VqdmlslNS32(int64x2_t* r, int64x2_t* v0, int32x2_t* v1, int32_t* v2) { *r = vqdmlsl_n_s32(*v0, *v1, *v2); }
void VqdmlslhS16(int32_t* r, int32_t* v0, int16_t* v1, int16_t* v2) { *r = vqdmlslh_s16(*v0, *v1, *v2); }
void VqdmlslsS32(int64_t* r, int64_t* v0, int32_t* v1, int32_t* v2) { *r = vqdmlsls_s32(*v0, *v1, *v2); }
void VqdmulhS16(int16x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vqdmulh_s16(*v0, *v1); }
void VqdmulhS32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vqdmulh_s32(*v0, *v1); }
void VqdmulhNS16(int16x4_t* r, int16x4_t* v0, int16_t* v1) { *r = vqdmulh_n_s16(*v0, *v1); }
void VqdmulhNS32(int32x2_t* r, int32x2_t* v0, int32_t* v1) { *r = vqdmulh_n_s32(*v0, *v1); }
void VqdmulhhS16(int16_t* r, int16_t* v0, int16_t* v1) { *r = vqdmulhh_s16(*v0, *v1); }
void VqdmulhqS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vqdmulhq_s16(*v0, *v1); }
void VqdmulhqS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vqdmulhq_s32(*v0, *v1); }
void VqdmulhqNS16(int16x8_t* r, int16x8_t* v0, int16_t* v1) { *r = vqdmulhq_n_s16(*v0, *v1); }
void VqdmulhqNS32(int32x4_t* r, int32x4_t* v0, int32_t* v1) { *r = vqdmulhq_n_s32(*v0, *v1); }
void VqdmulhsS32(int32_t* r, int32_t* v0, int32_t* v1) { *r = vqdmulhs_s32(*v0, *v1); }
void VqdmullS16(int32x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vqdmull_s16(*v0, *v1); }
void VqdmullS32(int64x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vqdmull_s32(*v0, *v1); }
void VqdmullHighS16(int32x4_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vqdmull_high_s16(*v0, *v1); }
void VqdmullHighS32(int64x2_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vqdmull_high_s32(*v0, *v1); }
void VqdmullHighNS16(int32x4_t* r, int16x8_t* v0, int16_t* v1) { *r = vqdmull_high_n_s16(*v0, *v1); }
void VqdmullHighNS32(int64x2_t* r, int32x4_t* v0, int32_t* v1) { *r = vqdmull_high_n_s32(*v0, *v1); }
void VqdmullNS16(int32x4_t* r, int16x4_t* v0, int16_t* v1) { *r = vqdmull_n_s16(*v0, *v1); }
void VqdmullNS32(int64x2_t* r, int32x2_t* v0, int32_t* v1) { *r = vqdmull_n_s32(*v0, *v1); }
void VqdmullhS16(int32_t* r, int16_t* v0, int16_t* v1) { *r = vqdmullh_s16(*v0, *v1); }
void VqdmullsS32(int64_t* r, int32_t* v0, int32_t* v1) { *r = vqdmulls_s32(*v0, *v1); }
void VqmovnS16(int8x8_t* r, int16x8_t* v0) { *r = vqmovn_s16(*v0); }
void VqmovnS32(int16x4_t* r, int32x4_t* v0) { *r = vqmovn_s32(*v0); }
void VqmovnS64(int32x2_t* r, int64x2_t* v0) { *r = vqmovn_s64(*v0); }
void VqmovnU16(uint8x8_t* r, uint16x8_t* v0) { *r = vqmovn_u16(*v0); }
void VqmovnU32(uint16x4_t* r, uint32x4_t* v0) { *r = vqmovn_u32(*v0); }
void VqmovnU64(uint32x2_t* r, uint64x2_t* v0) { *r = vqmovn_u64(*v0); }
void VqmovnHighS16(int8x16_t* r, int8x8_t* v0, int16x8_t* v1) { *r = vqmovn_high_s16(*v0, *v1); }
void VqmovnHighS32(int16x8_t* r, int16x4_t* v0, int32x4_t* v1) { *r = vqmovn_high_s32(*v0, *v1); }
void VqmovnHighS64(int32x4_t* r, int32x2_t* v0, int64x2_t* v1) { *r = vqmovn_high_s64(*v0, *v1); }
void VqmovnHighU16(uint8x16_t* r, uint8x8_t* v0, uint16x8_t* v1) { *r = vqmovn_high_u16(*v0, *v1); }
void VqmovnHighU32(uint16x8_t* r, uint16x4_t* v0, uint32x4_t* v1) { *r = vqmovn_high_u32(*v0, *v1); }
void VqmovnHighU64(uint32x4_t* r, uint32x2_t* v0, uint64x2_t* v1) { *r = vqmovn_high_u64(*v0, *v1); }
void VqmovndS64(int32_t* r, int64_t* v0) { *r = vqmovnd_s64(*v0); }
void VqmovndU64(uint32_t* r, uint64_t* v0) { *r = vqmovnd_u64(*v0); }
void VqmovnhS16(int8_t* r, int16_t* v0) { *r = vqmovnh_s16(*v0); }
void VqmovnhU16(uint8_t* r, uint16_t* v0) { *r = vqmovnh_u16(*v0); }
void VqmovnsS32(int16_t* r, int32_t* v0) { *r = vqmovns_s32(*v0); }
void VqmovnsU32(uint16_t* r, uint32_t* v0) { *r = vqmovns_u32(*v0); }
void VqmovunS16(uint8x8_t* r, int16x8_t* v0) { *r = vqmovun_s16(*v0); }
void VqmovunS32(uint16x4_t* r, int32x4_t* v0) { *r = vqmovun_s32(*v0); }
void VqmovunS64(uint32x2_t* r, int64x2_t* v0) { *r = vqmovun_s64(*v0); }
void VqmovunHighS16(uint8x16_t* r, uint8x8_t* v0, int16x8_t* v1) { *r = vqmovun_high_s16(*v0, *v1); }
void VqmovunHighS32(uint16x8_t* r, uint16x4_t* v0, int32x4_t* v1) { *r = vqmovun_high_s32(*v0, *v1); }
void VqmovunHighS64(uint32x4_t* r, uint32x2_t* v0, int64x2_t* v1) { *r = vqmovun_high_s64(*v0, *v1); }
void VqmovundS64(uint32_t* r, int64_t* v0) { *r = vqmovund_s64(*v0); }
void VqmovunhS16(uint8_t* r, int16_t* v0) { *r = vqmovunh_s16(*v0); }
void VqmovunsS32(uint16_t* r, int32_t* v0) { *r = vqmovuns_s32(*v0); }
void VqnegS8(int8x8_t* r, int8x8_t* v0) { *r = vqneg_s8(*v0); }
void VqnegS16(int16x4_t* r, int16x4_t* v0) { *r = vqneg_s16(*v0); }
void VqnegS32(int32x2_t* r, int32x2_t* v0) { *r = vqneg_s32(*v0); }
void VqnegS64(int64x1_t* r, int64x1_t* v0) { *r = vqneg_s64(*v0); }
void VqnegbS8(int8_t* r, int8_t* v0) { *r = vqnegb_s8(*v0); }
void VqnegdS64(int64_t* r, int64_t* v0) { *r = vqnegd_s64(*v0); }
void VqneghS16(int16_t* r, int16_t* v0) { *r = vqnegh_s16(*v0); }
void VqnegqS8(int8x16_t* r, int8x16_t* v0) { *r = vqnegq_s8(*v0); }
void VqnegqS16(int16x8_t* r, int16x8_t* v0) { *r = vqnegq_s16(*v0); }
void VqnegqS32(int32x4_t* r, int32x4_t* v0) { *r = vqnegq_s32(*v0); }
void VqnegqS64(int64x2_t* r, int64x2_t* v0) { *r = vqnegq_s64(*v0); }
void VqnegsS32(int32_t* r, int32_t* v0) { *r = vqnegs_s32(*v0); }
void VqrdmlahS16(int16x4_t* r, int16x4_t* v0, int16x4_t* v1, int16x4_t* v2) { *r = vqrdmlah_s16(*v0, *v1, *v2); }
void VqrdmlahS32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1, int32x2_t* v2) { *r = vqrdmlah_s32(*v0, *v1, *v2); }
void VqrdmlahhS16(int16_t* r, int16_t* v0, int16_t* v1, int16_t* v2) { *r = vqrdmlahh_s16(*v0, *v1, *v2); }
void VqrdmlahqS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1, int16x8_t* v2) { *r = vqrdmlahq_s16(*v0, *v1, *v2); }
void VqrdmlahqS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1, int32x4_t* v2) { *r = vqrdmlahq_s32(*v0, *v1, *v2); }
void VqrdmlahsS32(int32_t* r, int32_t* v0, int32_t* v1, int32_t* v2) { *r = vqrdmlahs_s32(*v0, *v1, *v2); }
void VqrdmlshS16(int16x4_t* r, int16x4_t* v0, int16x4_t* v1, int16x4_t* v2) { *r = vqrdmlsh_s16(*v0, *v1, *v2); }
void VqrdmlshS32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1, int32x2_t* v2) { *r = vqrdmlsh_s32(*v0, *v1, *v2); }
void VqrdmlshhS16(int16_t* r, int16_t* v0, int16_t* v1, int16_t* v2) { *r = vqrdmlshh_s16(*v0, *v1, *v2); }
void VqrdmlshqS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1, int16x8_t* v2) { *r = vqrdmlshq_s16(*v0, *v1, *v2); }
void VqrdmlshqS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1, int32x4_t* v2) { *r = vqrdmlshq_s32(*v0, *v1, *v2); }
void VqrdmlshsS32(int32_t* r, int32_t* v0, int32_t* v1, int32_t* v2) { *r = vqrdmlshs_s32(*v0, *v1, *v2); }
void VqrdmulhS16(int16x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vqrdmulh_s16(*v0, *v1); }
void VqrdmulhS32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vqrdmulh_s32(*v0, *v1); }
void VqrdmulhNS16(int16x4_t* r, int16x4_t* v0, int16_t* v1) { *r = vqrdmulh_n_s16(*v0, *v1); }
void VqrdmulhNS32(int32x2_t* r, int32x2_t* v0, int32_t* v1) { *r = vqrdmulh_n_s32(*v0, *v1); }
void VqrdmulhhS16(int16_t* r, int16_t* v0, int16_t* v1) { *r = vqrdmulhh_s16(*v0, *v1); }
void VqrdmulhqS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vqrdmulhq_s16(*v0, *v1); }
void VqrdmulhqS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vqrdmulhq_s32(*v0, *v1); }
void VqrdmulhqNS16(int16x8_t* r, int16x8_t* v0, int16_t* v1) { *r = vqrdmulhq_n_s16(*v0, *v1); }
void VqrdmulhqNS32(int32x4_t* r, int32x4_t* v0, int32_t* v1) { *r = vqrdmulhq_n_s32(*v0, *v1); }
void VqrdmulhsS32(int32_t* r, int32_t* v0, int32_t* v1) { *r = vqrdmulhs_s32(*v0, *v1); }
void VqrshlS8(int8x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vqrshl_s8(*v0, *v1); }
void VqrshlS16(int16x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vqrshl_s16(*v0, *v1); }
void VqrshlS32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vqrshl_s32(*v0, *v1); }
void VqrshlS64(int64x1_t* r, int64x1_t* v0, int64x1_t* v1) { *r = vqrshl_s64(*v0, *v1); }
void VqrshlU8(uint8x8_t* r, uint8x8_t* v0, int8x8_t* v1) { *r = vqrshl_u8(*v0, *v1); }
void VqrshlU16(uint16x4_t* r, uint16x4_t* v0, int16x4_t* v1) { *r = vqrshl_u16(*v0, *v1); }
void VqrshlU32(uint32x2_t* r, uint32x2_t* v0, int32x2_t* v1) { *r = vqrshl_u32(*v0, *v1); }
void VqrshlU64(uint64x1_t* r, uint64x1_t* v0, int64x1_t* v1) { *r = vqrshl_u64(*v0, *v1); }
void VqrshlbS8(int8_t* r, int8_t* v0, int8_t* v1) { *r = vqrshlb_s8(*v0, *v1); }
void VqrshlbU8(uint8_t* r, uint8_t* v0, int8_t* v1) { *r = vqrshlb_u8(*v0, *v1); }
void VqrshldS64(int64_t* r, int64_t* v0, int64_t* v1) { *r = vqrshld_s64(*v0, *v1); }
void VqrshldU64(uint64_t* r, uint64_t* v0, int64_t* v1) { *r = vqrshld_u64(*v0, *v1); }
void VqrshlhS16(int16_t* r, int16_t* v0, int16_t* v1) { *r = vqrshlh_s16(*v0, *v1); }
void VqrshlhU16(uint16_t* r, uint16_t* v0, int16_t* v1) { *r = vqrshlh_u16(*v0, *v1); }
void VqrshlqS8(int8x16_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vqrshlq_s8(*v0, *v1); }
void VqrshlqS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vqrshlq_s16(*v0, *v1); }
void VqrshlqS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vqrshlq_s32(*v0, *v1); }
void VqrshlqS64(int64x2_t* r, int64x2_t* v0, int64x2_t* v1) { *r = vqrshlq_s64(*v0, *v1); }
void VqrshlqU8(uint8x16_t* r, uint8x16_t* v0, int8x16_t* v1) { *r = vqrshlq_u8(*v0, *v1); }
void VqrshlqU16(uint16x8_t* r, uint16x8_t* v0, int16x8_t* v1) { *r = vqrshlq_u16(*v0, *v1); }
void VqrshlqU32(uint32x4_t* r, uint32x4_t* v0, int32x4_t* v1) { *r = vqrshlq_u32(*v0, *v1); }
void VqrshlqU64(uint64x2_t* r, uint64x2_t* v0, int64x2_t* v1) { *r = vqrshlq_u64(*v0, *v1); }
void VqrshlsS32(int32_t* r, int32_t* v0, int32_t* v1) { *r = vqrshls_s32(*v0, *v1); }
void VqrshlsU32(uint32_t* r, uint32_t* v0, int32_t* v1) { *r = vqrshls_u32(*v0, *v1); }
void VqshlS8(int8x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vqshl_s8(*v0, *v1); }
void VqshlS16(int16x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vqshl_s16(*v0, *v1); }
void VqshlS32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vqshl_s32(*v0, *v1); }
void VqshlS64(int64x1_t* r, int64x1_t* v0, int64x1_t* v1) { *r = vqshl_s64(*v0, *v1); }
void VqshlU8(uint8x8_t* r, uint8x8_t* v0, int8x8_t* v1) { *r = vqshl_u8(*v0, *v1); }
void VqshlU16(uint16x4_t* r, uint16x4_t* v0, int16x4_t* v1) { *r = vqshl_u16(*v0, *v1); }
void VqshlU32(uint32x2_t* r, uint32x2_t* v0, int32x2_t* v1) { *r = vqshl_u32(*v0, *v1); }
void VqshlU64(uint64x1_t* r, uint64x1_t* v0, int64x1_t* v1) { *r = vqshl_u64(*v0, *v1); }
void VqshlbS8(int8_t* r, int8_t* v0, int8_t* v1) { *r = vqshlb_s8(*v0, *v1); }
void VqshlbU8(uint8_t* r, uint8_t* v0, int8_t* v1) { *r = vqshlb_u8(*v0, *v1); }
void VqshldS64(int64_t* r, int64_t* v0, int64_t* v1) { *r = vqshld_s64(*v0, *v1); }
void VqshldU64(uint64_t* r, uint64_t* v0, int64_t* v1) { *r = vqshld_u64(*v0, *v1); }
void VqshlhS16(int16_t* r, int16_t* v0, int16_t* v1) { *r = vqshlh_s16(*v0, *v1); }
void VqshlhU16(uint16_t* r, uint16_t* v0, int16_t* v1) { *r = vqshlh_u16(*v0, *v1); }
void VqshlqS8(int8x16_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vqshlq_s8(*v0, *v1); }
void VqshlqS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vqshlq_s16(*v0, *v1); }
void VqshlqS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vqshlq_s32(*v0, *v1); }
void VqshlqS64(int64x2_t* r, int64x2_t* v0, int64x2_t* v1) { *r = vqshlq_s64(*v0, *v1); }
void VqshlqU8(uint8x16_t* r, uint8x16_t* v0, int8x16_t* v1) { *r = vqshlq_u8(*v0, *v1); }
void VqshlqU16(uint16x8_t* r, uint16x8_t* v0, int16x8_t* v1) { *r = vqshlq_u16(*v0, *v1); }
void VqshlqU32(uint32x4_t* r, uint32x4_t* v0, int32x4_t* v1) { *r = vqshlq_u32(*v0, *v1); }
void VqshlqU64(uint64x2_t* r, uint64x2_t* v0, int64x2_t* v1) { *r = vqshlq_u64(*v0, *v1); }
void VqshlsS32(int32_t* r, int32_t* v0, int32_t* v1) { *r = vqshls_s32(*v0, *v1); }
void VqshlsU32(uint32_t* r, uint32_t* v0, int32_t* v1) { *r = vqshls_u32(*v0, *v1); }
void VqsubS8(int8x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vqsub_s8(*v0, *v1); }
void VqsubS16(int16x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vqsub_s16(*v0, *v1); }
void VqsubS32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vqsub_s32(*v0, *v1); }
void VqsubS64(int64x1_t* r, int64x1_t* v0, int64x1_t* v1) { *r = vqsub_s64(*v0, *v1); }
void VqsubU8(uint8x8_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vqsub_u8(*v0, *v1); }
void VqsubU16(uint16x4_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = vqsub_u16(*v0, *v1); }
void VqsubU32(uint32x2_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = vqsub_u32(*v0, *v1); }
void VqsubU64(uint64x1_t* r, uint64x1_t* v0, uint64x1_t* v1) { *r = vqsub_u64(*v0, *v1); }
void VqsubbS8(int8_t* r, int8_t* v0, int8_t* v1) { *r = vqsubb_s8(*v0, *v1); }
void VqsubbU8(uint8_t* r, uint8_t* v0, uint8_t* v1) { *r = vqsubb_u8(*v0, *v1); }
void VqsubdS64(int64_t* r, int64_t* v0, int64_t* v1) { *r = vqsubd_s64(*v0, *v1); }
void VqsubdU64(uint64_t* r, uint64_t* v0, uint64_t* v1) { *r = vqsubd_u64(*v0, *v1); }
void VqsubhS16(int16_t* r, int16_t* v0, int16_t* v1) { *r = vqsubh_s16(*v0, *v1); }
void VqsubhU16(uint16_t* r, uint16_t* v0, uint16_t* v1) { *r = vqsubh_u16(*v0, *v1); }
void VqsubqS8(int8x16_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vqsubq_s8(*v0, *v1); }
void VqsubqS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vqsubq_s16(*v0, *v1); }
void VqsubqS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vqsubq_s32(*v0, *v1); }
void VqsubqS64(int64x2_t* r, int64x2_t* v0, int64x2_t* v1) { *r = vqsubq_s64(*v0, *v1); }
void VqsubqU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vqsubq_u8(*v0, *v1); }
void VqsubqU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vqsubq_u16(*v0, *v1); }
void VqsubqU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vqsubq_u32(*v0, *v1); }
void VqsubqU64(uint64x2_t* r, uint64x2_t* v0, uint64x2_t* v1) { *r = vqsubq_u64(*v0, *v1); }
void VqsubsS32(int32_t* r, int32_t* v0, int32_t* v1) { *r = vqsubs_s32(*v0, *v1); }
void VqsubsU32(uint32_t* r, uint32_t* v0, uint32_t* v1) { *r = vqsubs_u32(*v0, *v1); }
void Vqtbl1S8(int8x8_t* r, int8x16_t* v0, uint8x8_t* v1) { *r = vqtbl1_s8(*v0, *v1); }
void Vqtbl1U8(uint8x8_t* r, uint8x16_t* v0, uint8x8_t* v1) { *r = vqtbl1_u8(*v0, *v1); }
void Vqtbl1P8(poly8x8_t* r, poly8x16_t* v0, uint8x8_t* v1) { *r = vqtbl1_p8(*v0, *v1); }
void Vqtbl1QS8(int8x16_t* r, int8x16_t* v0, uint8x16_t* v1) { *r = vqtbl1q_s8(*v0, *v1); }
void Vqtbl1QU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vqtbl1q_u8(*v0, *v1); }
void Vqtbl1QP8(poly8x16_t* r, poly8x16_t* v0, uint8x16_t* v1) { *r = vqtbl1q_p8(*v0, *v1); }
void Vqtbl2S8(int8x8_t* r, int8x16x2_t* v0, uint8x8_t* v1) { *r = vqtbl2_s8(*v0, *v1); }
void Vqtbl2U8(uint8x8_t* r, uint8x16x2_t* v0, uint8x8_t* v1) { *r = vqtbl2_u8(*v0, *v1); }
void Vqtbl2P8(poly8x8_t* r, poly8x16x2_t* v0, uint8x8_t* v1) { *r = vqtbl2_p8(*v0, *v1); }
void Vqtbl2QS8(int8x16_t* r, int8x16x2_t* v0, uint8x16_t* v1) { *r = vqtbl2q_s8(*v0, *v1); }
void Vqtbl2QU8(uint8x16_t* r, uint8x16x2_t* v0, uint8x16_t* v1) { *r = vqtbl2q_u8(*v0, *v1); }
void Vqtbl2QP8(poly8x16_t* r, poly8x16x2_t* v0, uint8x16_t* v1) { *r = vqtbl2q_p8(*v0, *v1); }
void Vqtbl3S8(int8x8_t* r, int8x16x3_t* v0, uint8x8_t* v1) { *r = vqtbl3_s8(*v0, *v1); }
void Vqtbl3U8(uint8x8_t* r, uint8x16x3_t* v0, uint8x8_t* v1) { *r = vqtbl3_u8(*v0, *v1); }
void Vqtbl3P8(poly8x8_t* r, poly8x16x3_t* v0, uint8x8_t* v1) { *r = vqtbl3_p8(*v0, *v1); }
void Vqtbl3QS8(int8x16_t* r, int8x16x3_t* v0, uint8x16_t* v1) { *r = vqtbl3q_s8(*v0, *v1); }
void Vqtbl3QU8(uint8x16_t* r, uint8x16x3_t* v0, uint8x16_t* v1) { *r = vqtbl3q_u8(*v0, *v1); }
void Vqtbl3QP8(poly8x16_t* r, poly8x16x3_t* v0, uint8x16_t* v1) { *r = vqtbl3q_p8(*v0, *v1); }
void Vqtbl4S8(int8x8_t* r, int8x16x4_t* v0, uint8x8_t* v1) { *r = vqtbl4_s8(*v0, *v1); }
void Vqtbl4U8(uint8x8_t* r, uint8x16x4_t* v0, uint8x8_t* v1) { *r = vqtbl4_u8(*v0, *v1); }
void Vqtbl4P8(poly8x8_t* r, poly8x16x4_t* v0, uint8x8_t* v1) { *r = vqtbl4_p8(*v0, *v1); }
void Vqtbl4QS8(int8x16_t* r, int8x16x4_t* v0, uint8x16_t* v1) { *r = vqtbl4q_s8(*v0, *v1); }
void Vqtbl4QU8(uint8x16_t* r, uint8x16x4_t* v0, uint8x16_t* v1) { *r = vqtbl4q_u8(*v0, *v1); }
void Vqtbl4QP8(poly8x16_t* r, poly8x16x4_t* v0, uint8x16_t* v1) { *r = vqtbl4q_p8(*v0, *v1); }
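/* vqtbx1..vqtbx4: table lookup extension (TBX). v1 is the table, v2 holds byte indices; out-of-range indices keep the corresponding element of v0. */
void Vqtbx1S8(int8x8_t* r, int8x8_t* v0, int8x16_t* v1, uint8x8_t* v2) { *r = vqtbx1_s8(*v0, *v1, *v2); }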
void Vqtbx1U8(uint8x8_t* r, uint8x8_t* v0, uint8x16_t* v1, uint8x8_t* v2) { *r = vqtbx1_u8(*v0, *v1, *v2); }
void Vqtbx1P8(poly8x8_t* r, poly8x8_t* v0, poly8x16_t* v1, uint8x8_t* v2) { *r = vqtbx1_p8(*v0, *v1, *v2); }
void Vqtbx1QS8(int8x16_t* r, int8x16_t* v0, int8x16_t* v1, uint8x16_t* v2) { *r = vqtbx1q_s8(*v0, *v1, *v2); }
void Vqtbx1QU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1, uint8x16_t* v2) { *r = vqtbx1q_u8(*v0, *v1, *v2); }
void Vqtbx1QP8(poly8x16_t* r, poly8x16_t* v0, poly8x16_t* v1, uint8x16_t* v2) { *r = vqtbx1q_p8(*v0, *v1, *v2); }
void Vqtbx2S8(int8x8_t* r, int8x8_t* v0, int8x16x2_t* v1, uint8x8_t* v2) { *r = vqtbx2_s8(*v0, *v1, *v2); }
void Vqtbx2U8(uint8x8_t* r, uint8x8_t* v0, uint8x16x2_t* v1, uint8x8_t* v2) { *r = vqtbx2_u8(*v0, *v1, *v2); }
void Vqtbx2P8(poly8x8_t* r, poly8x8_t* v0, poly8x16x2_t* v1, uint8x8_t* v2) { *r = vqtbx2_p8(*v0, *v1, *v2); }
void Vqtbx2QS8(int8x16_t* r, int8x16_t* v0, int8x16x2_t* v1, uint8x16_t* v2) { *r = vqtbx2q_s8(*v0, *v1, *v2); }
void Vqtbx2QU8(uint8x16_t* r, uint8x16_t* v0, uint8x16x2_t* v1, uint8x16_t* v2) { *r = vqtbx2q_u8(*v0, *v1, *v2); }
void Vqtbx2QP8(poly8x16_t* r, poly8x16_t* v0, poly8x16x2_t* v1, uint8x16_t* v2) { *r = vqtbx2q_p8(*v0, *v1, *v2); }
void Vqtbx3S8(int8x8_t* r, int8x8_t* v0, int8x16x3_t* v1, uint8x8_t* v2) { *r = vqtbx3_s8(*v0, *v1, *v2); }
void Vqtbx3U8(uint8x8_t* r, uint8x8_t* v0, uint8x16x3_t* v1, uint8x8_t* v2) { *r = vqtbx3_u8(*v0, *v1, *v2); }
void Vqtbx3P8(poly8x8_t* r, poly8x8_t* v0, poly8x16x3_t* v1, uint8x8_t* v2) { *r = vqtbx3_p8(*v0, *v1, *v2); }
void Vqtbx3QS8(int8x16_t* r, int8x16_t* v0, int8x16x3_t* v1, uint8x16_t* v2) { *r = vqtbx3q_s8(*v0, *v1, *v2); }
void Vqtbx3QU8(uint8x16_t* r, uint8x16_t* v0, uint8x16x3_t* v1, uint8x16_t* v2) { *r = vqtbx3q_u8(*v0, *v1, *v2); }
void Vqtbx3QP8(poly8x16_t* r, poly8x16_t* v0, poly8x16x3_t* v1, uint8x16_t* v2) { *r = vqtbx3q_p8(*v0, *v1, *v2); }
void Vqtbx4S8(int8x8_t* r, int8x8_t* v0, int8x16x4_t* v1, uint8x8_t* v2) { *r = vqtbx4_s8(*v0, *v1, *v2); }
void Vqtbx4U8(uint8x8_t* r, uint8x8_t* v0, uint8x16x4_t* v1, uint8x8_t* v2) { *r = vqtbx4_u8(*v0, *v1, *v2); }
void Vqtbx4P8(poly8x8_t* r, poly8x8_t* v0, poly8x16x4_t* v1, uint8x8_t* v2) { *r = vqtbx4_p8(*v0, *v1, *v2); }
void Vqtbx4QS8(int8x16_t* r, int8x16_t* v0, int8x16x4_t* v1, uint8x16_t* v2) { *r = vqtbx4q_s8(*v0, *v1, *v2); }
void Vqtbx4QU8(uint8x16_t* r, uint8x16_t* v0, uint8x16x4_t* v1, uint8x16_t* v2) { *r = vqtbx4q_u8(*v0, *v1, *v2); }
void Vqtbx4QP8(poly8x16_t* r, poly8x16_t* v0, poly8x16x4_t* v1, uint8x16_t* v2) { *r = vqtbx4q_p8(*v0, *v1, *v2); }
/* vraddhn: rounding add, then narrow by keeping the high half of each sum. */
void VraddhnS16(int8x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vraddhn_s16(*v0, *v1); }
void VraddhnS32(int16x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vraddhn_s32(*v0, *v1); }
void VraddhnS64(int32x2_t* r, int64x2_t* v0, int64x2_t* v1) { *r = vraddhn_s64(*v0, *v1); }
void VraddhnU16(uint8x8_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vraddhn_u16(*v0, *v1); }
void VraddhnU32(uint16x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vraddhn_u32(*v0, *v1); }
void VraddhnU64(uint32x2_t* r, uint64x2_t* v0, uint64x2_t* v1) { *r = vraddhn_u64(*v0, *v1); }
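/* vraddhn_high: as vraddhn on v1 and v2, with the narrowed result placed in the upper half and v0 supplying the lower half. */
void VraddhnHighS16(int8x16_t* r, int8x8_t* v0, int16x8_t* v1, int16x8_t* v2) { *r = vraddhn_high_s16(*v0, *v1, *v2); }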
void VraddhnHighS32(int16x8_t* r, int16x4_t* v0, int32x4_t* v1, int32x4_t* v2) { *r = vraddhn_high_s32(*v0, *v1, *v2); }
void VraddhnHighS64(int32x4_t* r, int32x2_t* v0, int64x2_t* v1, int64x2_t* v2) { *r = vraddhn_high_s64(*v0, *v1, *v2); }
void VraddhnHighU16(uint8x16_t* r, uint8x8_t* v0, uint16x8_t* v1, uint16x8_t* v2) { *r = vraddhn_high_u16(*v0, *v1, *v2); }
void VraddhnHighU32(uint16x8_t* r, uint16x4_t* v0, uint32x4_t* v1, uint32x4_t* v2) { *r = vraddhn_high_u32(*v0, *v1, *v2); }
void VraddhnHighU64(uint32x4_t* r, uint32x2_t* v0, uint64x2_t* v1, uint64x2_t* v2) { *r = vraddhn_high_u64(*v0, *v1, *v2); }
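/* vrax1q_u64: rotate and exclusive OR (RAX1), v0 ^ rotl(v1, 1) per 64-bit lane; needs the SHA3 extension. */
void Vrax1QU64(uint64x2_t* r, uint64x2_t* v0, uint64x2_t* v1) { *r = vrax1q_u64(*v0, *v1); }
/* vrbit: reverse the bit order within each byte. */
void VrbitS8(int8x8_t* r, int8x8_t* v0) { *r = vrbit_s8(*v0); }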
void VrbitU8(uint8x8_t* r, uint8x8_t* v0) { *r = vrbit_u8(*v0); }
void VrbitP8(poly8x8_t* r, poly8x8_t* v0) { *r = vrbit_p8(*v0); }
void VrbitqS8(int8x16_t* r, int8x16_t* v0) { *r = vrbitq_s8(*v0); }
void VrbitqU8(uint8x16_t* r, uint8x16_t* v0) { *r = vrbitq_u8(*v0); }
void VrbitqP8(poly8x16_t* r, poly8x16_t* v0) { *r = vrbitq_p8(*v0); }
/* vrecpe: reciprocal estimate (low precision; refine with vrecps). */
void VrecpeU32(uint32x2_t* r, uint32x2_t* v0) { *r = vrecpe_u32(*v0); }
void VrecpeF32(float32x2_t* r, float32x2_t* v0) { *r = vrecpe_f32(*v0); }
void VrecpeF64(float64x1_t* r, float64x1_t* v0) { *r = vrecpe_f64(*v0); }
void VrecpedF64(float64_t* r, float64_t* v0) { *r = vrecped_f64(*v0); }
void VrecpeqU32(uint32x4_t* r, uint32x4_t* v0) { *r = vrecpeq_u32(*v0); }
void VrecpeqF32(float32x4_t* r, float32x4_t* v0) { *r = vrecpeq_f32(*v0); }
void VrecpeqF64(float64x2_t* r, float64x2_t* v0) { *r = vrecpeq_f64(*v0); }
void VrecpesF32(float32_t* r, float32_t* v0) { *r = vrecpes_f32(*v0); }
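/* vrecps: reciprocal step, 2.0 - v0*v1 per lane; used as the Newton-Raphson refinement of a vrecpe estimate. */
void VrecpsF32(float32x2_t* r, float32x2_t* v0, float32x2_t* v1) { *r = vrecps_f32(*v0, *v1); }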
void VrecpsF64(float64x1_t* r, float64x1_t* v0, float64x1_t* v1) { *r = vrecps_f64(*v0, *v1); }
void VrecpsdF64(float64_t* r, float64_t* v0, float64_t* v1) { *r = vrecpsd_f64(*v0, *v1); }
void VrecpsqF32(float32x4_t* r, float32x4_t* v0, float32x4_t* v1) { *r = vrecpsq_f32(*v0, *v1); }
void VrecpsqF64(float64x2_t* r, float64x2_t* v0, float64x2_t* v1) { *r = vrecpsq_f64(*v0, *v1); }
void VrecpssF32(float32_t* r, float32_t* v0, float32_t* v1) { *r = vrecpss_f32(*v0, *v1); }
/* vrecpx: floating-point reciprocal exponent (scalar forms only). */
void VrecpxdF64(float64_t* r, float64_t* v0) { *r = vrecpxd_f64(*v0); }
void VrecpxsF32(float32_t* r, float32_t* v0) { *r = vrecpxs_f32(*v0); }
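/* vreinterpret_<dst>_<src>: reinterpret the bits of a 64-bit vector as another vector type; no value conversion takes place. */
void VreinterpretF32S8(float32x2_t* r, int8x8_t* v0) { *r = vreinterpret_f32_s8(*v0); }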
void VreinterpretF32S16(float32x2_t* r, int16x4_t* v0) { *r = vreinterpret_f32_s16(*v0); }
void VreinterpretF32S32(float32x2_t* r, int32x2_t* v0) { *r = vreinterpret_f32_s32(*v0); }
void VreinterpretF32S64(float32x2_t* r, int64x1_t* v0) { *r = vreinterpret_f32_s64(*v0); }
void VreinterpretF32U8(float32x2_t* r, uint8x8_t* v0) { *r = vreinterpret_f32_u8(*v0); }
void VreinterpretF32U16(float32x2_t* r, uint16x4_t* v0) { *r = vreinterpret_f32_u16(*v0); }
void VreinterpretF32U32(float32x2_t* r, uint32x2_t* v0) { *r = vreinterpret_f32_u32(*v0); }
void VreinterpretF32U64(float32x2_t* r, uint64x1_t* v0) { *r = vreinterpret_f32_u64(*v0); }
void VreinterpretF32F64(float32x2_t* r, float64x1_t* v0) { *r = vreinterpret_f32_f64(*v0); }
void VreinterpretF32P16(float32x2_t* r, poly16x4_t* v0) { *r = vreinterpret_f32_p16(*v0); }
void VreinterpretF32P64(float32x2_t* r, poly64x1_t* v0) { *r = vreinterpret_f32_p64(*v0); }
void VreinterpretF32P8(float32x2_t* r, poly8x8_t* v0) { *r = vreinterpret_f32_p8(*v0); }
void VreinterpretF64S8(float64x1_t* r, int8x8_t* v0) { *r = vreinterpret_f64_s8(*v0); }
void VreinterpretF64S16(float64x1_t* r, int16x4_t* v0) { *r = vreinterpret_f64_s16(*v0); }
void VreinterpretF64S32(float64x1_t* r, int32x2_t* v0) { *r = vreinterpret_f64_s32(*v0); }
void VreinterpretF64S64(float64x1_t* r, int64x1_t* v0) { *r = vreinterpret_f64_s64(*v0); }
void VreinterpretF64U8(float64x1_t* r, uint8x8_t* v0) { *r = vreinterpret_f64_u8(*v0); }
void VreinterpretF64U16(float64x1_t* r, uint16x4_t* v0) { *r = vreinterpret_f64_u16(*v0); }
void VreinterpretF64U32(float64x1_t* r, uint32x2_t* v0) { *r = vreinterpret_f64_u32(*v0); }
void VreinterpretF64U64(float64x1_t* r, uint64x1_t* v0) { *r = vreinterpret_f64_u64(*v0); }
void VreinterpretF64F32(float64x1_t* r, float32x2_t* v0) { *r = vreinterpret_f64_f32(*v0); }
void VreinterpretF64P16(float64x1_t* r, poly16x4_t* v0) { *r = vreinterpret_f64_p16(*v0); }
void VreinterpretF64P64(float64x1_t* r, poly64x1_t* v0) { *r = vreinterpret_f64_p64(*v0); }
void VreinterpretF64P8(float64x1_t* r, poly8x8_t* v0) { *r = vreinterpret_f64_p8(*v0); }
void VreinterpretP16S8(poly16x4_t* r, int8x8_t* v0) { *r = vreinterpret_p16_s8(*v0); }
void VreinterpretP16S16(poly16x4_t* r, int16x4_t* v0) { *r = vreinterpret_p16_s16(*v0); }
void VreinterpretP16S32(poly16x4_t* r, int32x2_t* v0) { *r = vreinterpret_p16_s32(*v0); }
void VreinterpretP16S64(poly16x4_t* r, int64x1_t* v0) { *r = vreinterpret_p16_s64(*v0); }
void VreinterpretP16U8(poly16x4_t* r, uint8x8_t* v0) { *r = vreinterpret_p16_u8(*v0); }
void VreinterpretP16U16(poly16x4_t* r, uint16x4_t* v0) { *r = vreinterpret_p16_u16(*v0); }
void VreinterpretP16U32(poly16x4_t* r, uint32x2_t* v0) { *r = vreinterpret_p16_u32(*v0); }
void VreinterpretP16U64(poly16x4_t* r, uint64x1_t* v0) { *r = vreinterpret_p16_u64(*v0); }
void VreinterpretP16F32(poly16x4_t* r, float32x2_t* v0) { *r = vreinterpret_p16_f32(*v0); }
void VreinterpretP16F64(poly16x4_t* r, float64x1_t* v0) { *r = vreinterpret_p16_f64(*v0); }
void VreinterpretP16P64(poly16x4_t* r, poly64x1_t* v0) { *r = vreinterpret_p16_p64(*v0); }
void VreinterpretP16P8(poly16x4_t* r, poly8x8_t* v0) { *r = vreinterpret_p16_p8(*v0); }
void VreinterpretP64S8(poly64x1_t* r, int8x8_t* v0) { *r = vreinterpret_p64_s8(*v0); }
void VreinterpretP64S16(poly64x1_t* r, int16x4_t* v0) { *r = vreinterpret_p64_s16(*v0); }
void VreinterpretP64S32(poly64x1_t* r, int32x2_t* v0) { *r = vreinterpret_p64_s32(*v0); }
void VreinterpretP64S64(poly64x1_t* r, int64x1_t* v0) { *r = vreinterpret_p64_s64(*v0); }
void VreinterpretP64U8(poly64x1_t* r, uint8x8_t* v0) { *r = vreinterpret_p64_u8(*v0); }
void VreinterpretP64U16(poly64x1_t* r, uint16x4_t* v0) { *r = vreinterpret_p64_u16(*v0); }
void VreinterpretP64U32(poly64x1_t* r, uint32x2_t* v0) { *r = vreinterpret_p64_u32(*v0); }
void VreinterpretP64U64(poly64x1_t* r, uint64x1_t* v0) { *r = vreinterpret_p64_u64(*v0); }
void VreinterpretP64F32(poly64x1_t* r, float32x2_t* v0) { *r = vreinterpret_p64_f32(*v0); }
void VreinterpretP64F64(poly64x1_t* r, float64x1_t* v0) { *r = vreinterpret_p64_f64(*v0); }
void VreinterpretP64P16(poly64x1_t* r, poly16x4_t* v0) { *r = vreinterpret_p64_p16(*v0); }
void VreinterpretP64P8(poly64x1_t* r, poly8x8_t* v0) { *r = vreinterpret_p64_p8(*v0); }
void VreinterpretP8S8(poly8x8_t* r, int8x8_t* v0) { *r = vreinterpret_p8_s8(*v0); }
void VreinterpretP8S16(poly8x8_t* r, int16x4_t* v0) { *r = vreinterpret_p8_s16(*v0); }
void VreinterpretP8S32(poly8x8_t* r, int32x2_t* v0) { *r = vreinterpret_p8_s32(*v0); }
void VreinterpretP8S64(poly8x8_t* r, int64x1_t* v0) { *r = vreinterpret_p8_s64(*v0); }
void VreinterpretP8U8(poly8x8_t* r, uint8x8_t* v0) { *r = vreinterpret_p8_u8(*v0); }
void VreinterpretP8U16(poly8x8_t* r, uint16x4_t* v0) { *r = vreinterpret_p8_u16(*v0); }
void VreinterpretP8U32(poly8x8_t* r, uint32x2_t* v0) { *r = vreinterpret_p8_u32(*v0); }
void VreinterpretP8U64(poly8x8_t* r, uint64x1_t* v0) { *r = vreinterpret_p8_u64(*v0); }
void VreinterpretP8F32(poly8x8_t* r, float32x2_t* v0) { *r = vreinterpret_p8_f32(*v0); }
void VreinterpretP8F64(poly8x8_t* r, float64x1_t* v0) { *r = vreinterpret_p8_f64(*v0); }
void VreinterpretP8P16(poly8x8_t* r, poly16x4_t* v0) { *r = vreinterpret_p8_p16(*v0); }
void VreinterpretP8P64(poly8x8_t* r, poly64x1_t* v0) { *r = vreinterpret_p8_p64(*v0); }
void VreinterpretS16S8(int16x4_t* r, int8x8_t* v0) { *r = vreinterpret_s16_s8(*v0); }
void VreinterpretS16S32(int16x4_t* r, int32x2_t* v0) { *r = vreinterpret_s16_s32(*v0); }
void VreinterpretS16S64(int16x4_t* r, int64x1_t* v0) { *r = vreinterpret_s16_s64(*v0); }
void VreinterpretS16U8(int16x4_t* r, uint8x8_t* v0) { *r = vreinterpret_s16_u8(*v0); }
void VreinterpretS16U16(int16x4_t* r, uint16x4_t* v0) { *r = vreinterpret_s16_u16(*v0); }
void VreinterpretS16U32(int16x4_t* r, uint32x2_t* v0) { *r = vreinterpret_s16_u32(*v0); }
void VreinterpretS16U64(int16x4_t* r, uint64x1_t* v0) { *r = vreinterpret_s16_u64(*v0); }
void VreinterpretS16F32(int16x4_t* r, float32x2_t* v0) { *r = vreinterpret_s16_f32(*v0); }
void VreinterpretS16F64(int16x4_t* r, float64x1_t* v0) { *r = vreinterpret_s16_f64(*v0); }
void VreinterpretS16P16(int16x4_t* r, poly16x4_t* v0) { *r = vreinterpret_s16_p16(*v0); }
void VreinterpretS16P64(int16x4_t* r, poly64x1_t* v0) { *r = vreinterpret_s16_p64(*v0); }
void VreinterpretS16P8(int16x4_t* r, poly8x8_t* v0) { *r = vreinterpret_s16_p8(*v0); }
void VreinterpretS32S8(int32x2_t* r, int8x8_t* v0) { *r = vreinterpret_s32_s8(*v0); }
void VreinterpretS32S16(int32x2_t* r, int16x4_t* v0) { *r = vreinterpret_s32_s16(*v0); }
void VreinterpretS32S64(int32x2_t* r, int64x1_t* v0) { *r = vreinterpret_s32_s64(*v0); }
void VreinterpretS32U8(int32x2_t* r, uint8x8_t* v0) { *r = vreinterpret_s32_u8(*v0); }
void VreinterpretS32U16(int32x2_t* r, uint16x4_t* v0) { *r = vreinterpret_s32_u16(*v0); }
void VreinterpretS32U32(int32x2_t* r, uint32x2_t* v0) { *r = vreinterpret_s32_u32(*v0); }
void VreinterpretS32U64(int32x2_t* r, uint64x1_t* v0) { *r = vreinterpret_s32_u64(*v0); }
void VreinterpretS32F32(int32x2_t* r, float32x2_t* v0) { *r = vreinterpret_s32_f32(*v0); }
void VreinterpretS32F64(int32x2_t* r, float64x1_t* v0) { *r = vreinterpret_s32_f64(*v0); }
void VreinterpretS32P16(int32x2_t* r, poly16x4_t* v0) { *r = vreinterpret_s32_p16(*v0); }
void VreinterpretS32P64(int32x2_t* r, poly64x1_t* v0) { *r = vreinterpret_s32_p64(*v0); }
void VreinterpretS32P8(int32x2_t* r, poly8x8_t* v0) { *r = vreinterpret_s32_p8(*v0); }
void VreinterpretS64S8(int64x1_t* r, int8x8_t* v0) { *r = vreinterpret_s64_s8(*v0); }
void VreinterpretS64S16(int64x1_t* r, int16x4_t* v0) { *r = vreinterpret_s64_s16(*v0); }
void VreinterpretS64S32(int64x1_t* r, int32x2_t* v0) { *r = vreinterpret_s64_s32(*v0); }
void VreinterpretS64U8(int64x1_t* r, uint8x8_t* v0) { *r = vreinterpret_s64_u8(*v0); }
void VreinterpretS64U16(int64x1_t* r, uint16x4_t* v0) { *r = vreinterpret_s64_u16(*v0); }
void VreinterpretS64U32(int64x1_t* r, uint32x2_t* v0) { *r = vreinterpret_s64_u32(*v0); }
void VreinterpretS64U64(int64x1_t* r, uint64x1_t* v0) { *r = vreinterpret_s64_u64(*v0); }
void VreinterpretS64F32(int64x1_t* r, float32x2_t* v0) { *r = vreinterpret_s64_f32(*v0); }
void VreinterpretS64F64(int64x1_t* r, float64x1_t* v0) { *r = vreinterpret_s64_f64(*v0); }
void VreinterpretS64P16(int64x1_t* r, poly16x4_t* v0) { *r = vreinterpret_s64_p16(*v0); }
void VreinterpretS64P64(int64x1_t* r, poly64x1_t* v0) { *r = vreinterpret_s64_p64(*v0); }
void VreinterpretS64P8(int64x1_t* r, poly8x8_t* v0) { *r = vreinterpret_s64_p8(*v0); }
void VreinterpretS8S16(int8x8_t* r, int16x4_t* v0) { *r = vreinterpret_s8_s16(*v0); }
void VreinterpretS8S32(int8x8_t* r, int32x2_t* v0) { *r = vreinterpret_s8_s32(*v0); }
void VreinterpretS8S64(int8x8_t* r, int64x1_t* v0) { *r = vreinterpret_s8_s64(*v0); }
void VreinterpretS8U8(int8x8_t* r, uint8x8_t* v0) { *r = vreinterpret_s8_u8(*v0); }
void VreinterpretS8U16(int8x8_t* r, uint16x4_t* v0) { *r = vreinterpret_s8_u16(*v0); }
void VreinterpretS8U32(int8x8_t* r, uint32x2_t* v0) { *r = vreinterpret_s8_u32(*v0); }
void VreinterpretS8U64(int8x8_t* r, uint64x1_t* v0) { *r = vreinterpret_s8_u64(*v0); }
void VreinterpretS8F32(int8x8_t* r, float32x2_t* v0) { *r = vreinterpret_s8_f32(*v0); }
void VreinterpretS8F64(int8x8_t* r, float64x1_t* v0) { *r = vreinterpret_s8_f64(*v0); }
void VreinterpretS8P16(int8x8_t* r, poly16x4_t* v0) { *r = vreinterpret_s8_p16(*v0); }
void VreinterpretS8P64(int8x8_t* r, poly64x1_t* v0) { *r = vreinterpret_s8_p64(*v0); }
void VreinterpretS8P8(int8x8_t* r, poly8x8_t* v0) { *r = vreinterpret_s8_p8(*v0); }
void VreinterpretU16S8(uint16x4_t* r, int8x8_t* v0) { *r = vreinterpret_u16_s8(*v0); }
void VreinterpretU16S16(uint16x4_t* r, int16x4_t* v0) { *r = vreinterpret_u16_s16(*v0); }
void VreinterpretU16S32(uint16x4_t* r, int32x2_t* v0) { *r = vreinterpret_u16_s32(*v0); }
void VreinterpretU16S64(uint16x4_t* r, int64x1_t* v0) { *r = vreinterpret_u16_s64(*v0); }
void VreinterpretU16U8(uint16x4_t* r, uint8x8_t* v0) { *r = vreinterpret_u16_u8(*v0); }
void VreinterpretU16U32(uint16x4_t* r, uint32x2_t* v0) { *r = vreinterpret_u16_u32(*v0); }
void VreinterpretU16U64(uint16x4_t* r, uint64x1_t* v0) { *r = vreinterpret_u16_u64(*v0); }
void VreinterpretU16F32(uint16x4_t* r, float32x2_t* v0) { *r = vreinterpret_u16_f32(*v0); }
void VreinterpretU16F64(uint16x4_t* r, float64x1_t* v0) { *r = vreinterpret_u16_f64(*v0); }
void VreinterpretU16P16(uint16x4_t* r, poly16x4_t* v0) { *r = vreinterpret_u16_p16(*v0); }
void VreinterpretU16P64(uint16x4_t* r, poly64x1_t* v0) { *r = vreinterpret_u16_p64(*v0); }
void VreinterpretU16P8(uint16x4_t* r, poly8x8_t* v0) { *r = vreinterpret_u16_p8(*v0); }
void VreinterpretU32S8(uint32x2_t* r, int8x8_t* v0) { *r = vreinterpret_u32_s8(*v0); }
void VreinterpretU32S16(uint32x2_t* r, int16x4_t* v0) { *r = vreinterpret_u32_s16(*v0); }
void VreinterpretU32S32(uint32x2_t* r, int32x2_t* v0) { *r = vreinterpret_u32_s32(*v0); }
void VreinterpretU32S64(uint32x2_t* r, int64x1_t* v0) { *r = vreinterpret_u32_s64(*v0); }
void VreinterpretU32U8(uint32x2_t* r, uint8x8_t* v0) { *r = vreinterpret_u32_u8(*v0); }
void VreinterpretU32U16(uint32x2_t* r, uint16x4_t* v0) { *r = vreinterpret_u32_u16(*v0); }
void VreinterpretU32U64(uint32x2_t* r, uint64x1_t* v0) { *r = vreinterpret_u32_u64(*v0); }
void VreinterpretU32F32(uint32x2_t* r, float32x2_t* v0) { *r = vreinterpret_u32_f32(*v0); }
void VreinterpretU32F64(uint32x2_t* r, float64x1_t* v0) { *r = vreinterpret_u32_f64(*v0); }
void VreinterpretU32P16(uint32x2_t* r, poly16x4_t* v0) { *r = vreinterpret_u32_p16(*v0); }
void VreinterpretU32P64(uint32x2_t* r, poly64x1_t* v0) { *r = vreinterpret_u32_p64(*v0); }
void VreinterpretU32P8(uint32x2_t* r, poly8x8_t* v0) { *r = vreinterpret_u32_p8(*v0); }
void VreinterpretU64S8(uint64x1_t* r, int8x8_t* v0) { *r = vreinterpret_u64_s8(*v0); }
void VreinterpretU64S16(uint64x1_t* r, int16x4_t* v0) { *r = vreinterpret_u64_s16(*v0); }
void VreinterpretU64S32(uint64x1_t* r, int32x2_t* v0) { *r = vreinterpret_u64_s32(*v0); }
void VreinterpretU64S64(uint64x1_t* r, int64x1_t* v0) { *r = vreinterpret_u64_s64(*v0); }
void VreinterpretU64U8(uint64x1_t* r, uint8x8_t* v0) { *r = vreinterpret_u64_u8(*v0); }
void VreinterpretU64U16(uint64x1_t* r, uint16x4_t* v0) { *r = vreinterpret_u64_u16(*v0); }
void VreinterpretU64U32(uint64x1_t* r, uint32x2_t* v0) { *r = vreinterpret_u64_u32(*v0); }
void VreinterpretU64F32(uint64x1_t* r, float32x2_t* v0) { *r = vreinterpret_u64_f32(*v0); }
void VreinterpretU64F64(uint64x1_t* r, float64x1_t* v0) { *r = vreinterpret_u64_f64(*v0); }
void VreinterpretU64P16(uint64x1_t* r, poly16x4_t* v0) { *r = vreinterpret_u64_p16(*v0); }
void VreinterpretU64P64(uint64x1_t* r, poly64x1_t* v0) { *r = vreinterpret_u64_p64(*v0); }
void VreinterpretU64P8(uint64x1_t* r, poly8x8_t* v0) { *r = vreinterpret_u64_p8(*v0); }
void VreinterpretU8S8(uint8x8_t* r, int8x8_t* v0) { *r = vreinterpret_u8_s8(*v0); }
void VreinterpretU8S16(uint8x8_t* r, int16x4_t* v0) { *r = vreinterpret_u8_s16(*v0); }
void VreinterpretU8S32(uint8x8_t* r, int32x2_t* v0) { *r = vreinterpret_u8_s32(*v0); }
void VreinterpretU8S64(uint8x8_t* r, int64x1_t* v0) { *r = vreinterpret_u8_s64(*v0); }
void VreinterpretU8U16(uint8x8_t* r, uint16x4_t* v0) { *r = vreinterpret_u8_u16(*v0); }
void VreinterpretU8U32(uint8x8_t* r, uint32x2_t* v0) { *r = vreinterpret_u8_u32(*v0); }
void VreinterpretU8U64(uint8x8_t* r, uint64x1_t* v0) { *r = vreinterpret_u8_u64(*v0); }
void VreinterpretU8F32(uint8x8_t* r, float32x2_t* v0) { *r = vreinterpret_u8_f32(*v0); }
void VreinterpretU8F64(uint8x8_t* r, float64x1_t* v0) { *r = vreinterpret_u8_f64(*v0); }
void VreinterpretU8P16(uint8x8_t* r, poly16x4_t* v0) { *r = vreinterpret_u8_p16(*v0); }
void VreinterpretU8P64(uint8x8_t* r, poly64x1_t* v0) { *r = vreinterpret_u8_p64(*v0); }
void VreinterpretU8P8(uint8x8_t* r, poly8x8_t* v0) { *r = vreinterpret_u8_p8(*v0); }
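/* vreinterpretq_<dst>_<src>: the same bitwise reinterpretation for 128-bit vectors (including poly128_t). */
void VreinterpretqF32S8(float32x4_t* r, int8x16_t* v0) { *r = vreinterpretq_f32_s8(*v0); }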
void VreinterpretqF32S16(float32x4_t* r, int16x8_t* v0) { *r = vreinterpretq_f32_s16(*v0); }
void VreinterpretqF32S32(float32x4_t* r, int32x4_t* v0) { *r = vreinterpretq_f32_s32(*v0); }
void VreinterpretqF32S64(float32x4_t* r, int64x2_t* v0) { *r = vreinterpretq_f32_s64(*v0); }
void VreinterpretqF32U8(float32x4_t* r, uint8x16_t* v0) { *r = vreinterpretq_f32_u8(*v0); }
void VreinterpretqF32U16(float32x4_t* r, uint16x8_t* v0) { *r = vreinterpretq_f32_u16(*v0); }
void VreinterpretqF32U32(float32x4_t* r, uint32x4_t* v0) { *r = vreinterpretq_f32_u32(*v0); }
void VreinterpretqF32U64(float32x4_t* r, uint64x2_t* v0) { *r = vreinterpretq_f32_u64(*v0); }
void VreinterpretqF32F64(float32x4_t* r, float64x2_t* v0) { *r = vreinterpretq_f32_f64(*v0); }
void VreinterpretqF32P128(float32x4_t* r, poly128_t* v0) { *r = vreinterpretq_f32_p128(*v0); }
void VreinterpretqF32P16(float32x4_t* r, poly16x8_t* v0) { *r = vreinterpretq_f32_p16(*v0); }
void VreinterpretqF32P64(float32x4_t* r, poly64x2_t* v0) { *r = vreinterpretq_f32_p64(*v0); }
void VreinterpretqF32P8(float32x4_t* r, poly8x16_t* v0) { *r = vreinterpretq_f32_p8(*v0); }
void VreinterpretqF64S8(float64x2_t* r, int8x16_t* v0) { *r = vreinterpretq_f64_s8(*v0); }
void VreinterpretqF64S16(float64x2_t* r, int16x8_t* v0) { *r = vreinterpretq_f64_s16(*v0); }
void VreinterpretqF64S32(float64x2_t* r, int32x4_t* v0) { *r = vreinterpretq_f64_s32(*v0); }
void VreinterpretqF64S64(float64x2_t* r, int64x2_t* v0) { *r = vreinterpretq_f64_s64(*v0); }
void VreinterpretqF64U8(float64x2_t* r, uint8x16_t* v0) { *r = vreinterpretq_f64_u8(*v0); }
void VreinterpretqF64U16(float64x2_t* r, uint16x8_t* v0) { *r = vreinterpretq_f64_u16(*v0); }
void VreinterpretqF64U32(float64x2_t* r, uint32x4_t* v0) { *r = vreinterpretq_f64_u32(*v0); }
void VreinterpretqF64U64(float64x2_t* r, uint64x2_t* v0) { *r = vreinterpretq_f64_u64(*v0); }
void VreinterpretqF64F32(float64x2_t* r, float32x4_t* v0) { *r = vreinterpretq_f64_f32(*v0); }
void VreinterpretqF64P128(float64x2_t* r, poly128_t* v0) { *r = vreinterpretq_f64_p128(*v0); }
void VreinterpretqF64P16(float64x2_t* r, poly16x8_t* v0) { *r = vreinterpretq_f64_p16(*v0); }
void VreinterpretqF64P64(float64x2_t* r, poly64x2_t* v0) { *r = vreinterpretq_f64_p64(*v0); }
void VreinterpretqF64P8(float64x2_t* r, poly8x16_t* v0) { *r = vreinterpretq_f64_p8(*v0); }
void VreinterpretqP128S8(poly128_t* r, int8x16_t* v0) { *r = vreinterpretq_p128_s8(*v0); }
void VreinterpretqP128S16(poly128_t* r, int16x8_t* v0) { *r = vreinterpretq_p128_s16(*v0); }
void VreinterpretqP128S32(poly128_t* r, int32x4_t* v0) { *r = vreinterpretq_p128_s32(*v0); }
void VreinterpretqP128S64(poly128_t* r, int64x2_t* v0) { *r = vreinterpretq_p128_s64(*v0); }
void VreinterpretqP128U8(poly128_t* r, uint8x16_t* v0) { *r = vreinterpretq_p128_u8(*v0); }
void VreinterpretqP128U16(poly128_t* r, uint16x8_t* v0) { *r = vreinterpretq_p128_u16(*v0); }
void VreinterpretqP128U32(poly128_t* r, uint32x4_t* v0) { *r = vreinterpretq_p128_u32(*v0); }
void VreinterpretqP128U64(poly128_t* r, uint64x2_t* v0) { *r = vreinterpretq_p128_u64(*v0); }
void VreinterpretqP128F32(poly128_t* r, float32x4_t* v0) { *r = vreinterpretq_p128_f32(*v0); }
void VreinterpretqP128F64(poly128_t* r, float64x2_t* v0) { *r = vreinterpretq_p128_f64(*v0); }
void VreinterpretqP128P16(poly128_t* r, poly16x8_t* v0) { *r = vreinterpretq_p128_p16(*v0); }
void VreinterpretqP128P64(poly128_t* r, poly64x2_t* v0) { *r = vreinterpretq_p128_p64(*v0); }
void VreinterpretqP128P8(poly128_t* r, poly8x16_t* v0) { *r = vreinterpretq_p128_p8(*v0); }
void VreinterpretqP16S8(poly16x8_t* r, int8x16_t* v0) { *r = vreinterpretq_p16_s8(*v0); }
void VreinterpretqP16S16(poly16x8_t* r, int16x8_t* v0) { *r = vreinterpretq_p16_s16(*v0); }
void VreinterpretqP16S32(poly16x8_t* r, int32x4_t* v0) { *r = vreinterpretq_p16_s32(*v0); }
void VreinterpretqP16S64(poly16x8_t* r, int64x2_t* v0) { *r = vreinterpretq_p16_s64(*v0); }
void VreinterpretqP16U8(poly16x8_t* r, uint8x16_t* v0) { *r = vreinterpretq_p16_u8(*v0); }
void VreinterpretqP16U16(poly16x8_t* r, uint16x8_t* v0) { *r = vreinterpretq_p16_u16(*v0); }
void VreinterpretqP16U32(poly16x8_t* r, uint32x4_t* v0) { *r = vreinterpretq_p16_u32(*v0); }
void VreinterpretqP16U64(poly16x8_t* r, uint64x2_t* v0) { *r = vreinterpretq_p16_u64(*v0); }
void VreinterpretqP16F32(poly16x8_t* r, float32x4_t* v0) { *r = vreinterpretq_p16_f32(*v0); }
void VreinterpretqP16F64(poly16x8_t* r, float64x2_t* v0) { *r = vreinterpretq_p16_f64(*v0); }
void VreinterpretqP16P128(poly16x8_t* r, poly128_t* v0) { *r = vreinterpretq_p16_p128(*v0); }
void VreinterpretqP16P64(poly16x8_t* r, poly64x2_t* v0) { *r = vreinterpretq_p16_p64(*v0); }
void VreinterpretqP16P8(poly16x8_t* r, poly8x16_t* v0) { *r = vreinterpretq_p16_p8(*v0); }
void VreinterpretqP64S8(poly64x2_t* r, int8x16_t* v0) { *r = vreinterpretq_p64_s8(*v0); }
void VreinterpretqP64S16(poly64x2_t* r, int16x8_t* v0) { *r = vreinterpretq_p64_s16(*v0); }
void VreinterpretqP64S32(poly64x2_t* r, int32x4_t* v0) { *r = vreinterpretq_p64_s32(*v0); }
void VreinterpretqP64S64(poly64x2_t* r, int64x2_t* v0) { *r = vreinterpretq_p64_s64(*v0); }
void VreinterpretqP64U8(poly64x2_t* r, uint8x16_t* v0) { *r = vreinterpretq_p64_u8(*v0); }
void VreinterpretqP64U16(poly64x2_t* r, uint16x8_t* v0) { *r = vreinterpretq_p64_u16(*v0); }
void VreinterpretqP64U32(poly64x2_t* r, uint32x4_t* v0) { *r = vreinterpretq_p64_u32(*v0); }
void VreinterpretqP64U64(poly64x2_t* r, uint64x2_t* v0) { *r = vreinterpretq_p64_u64(*v0); }
void VreinterpretqP64F32(poly64x2_t* r, float32x4_t* v0) { *r = vreinterpretq_p64_f32(*v0); }
void VreinterpretqP64F64(poly64x2_t* r, float64x2_t* v0) { *r = vreinterpretq_p64_f64(*v0); }
void VreinterpretqP64P128(poly64x2_t* r, poly128_t* v0) { *r = vreinterpretq_p64_p128(*v0); }
void VreinterpretqP64P16(poly64x2_t* r, poly16x8_t* v0) { *r = vreinterpretq_p64_p16(*v0); }
void VreinterpretqP64P8(poly64x2_t* r, poly8x16_t* v0) { *r = vreinterpretq_p64_p8(*v0); }
void VreinterpretqP8S8(poly8x16_t* r, int8x16_t* v0) { *r = vreinterpretq_p8_s8(*v0); }
void VreinterpretqP8S16(poly8x16_t* r, int16x8_t* v0) { *r = vreinterpretq_p8_s16(*v0); }
void VreinterpretqP8S32(poly8x16_t* r, int32x4_t* v0) { *r = vreinterpretq_p8_s32(*v0); }
void VreinterpretqP8S64(poly8x16_t* r, int64x2_t* v0) { *r = vreinterpretq_p8_s64(*v0); }
void VreinterpretqP8U8(poly8x16_t* r, uint8x16_t* v0) { *r = vreinterpretq_p8_u8(*v0); }
void VreinterpretqP8U16(poly8x16_t* r, uint16x8_t* v0) { *r = vreinterpretq_p8_u16(*v0); }
void VreinterpretqP8U32(poly8x16_t* r, uint32x4_t* v0) { *r = vreinterpretq_p8_u32(*v0); }
void VreinterpretqP8U64(poly8x16_t* r, uint64x2_t* v0) { *r = vreinterpretq_p8_u64(*v0); }
void VreinterpretqP8F32(poly8x16_t* r, float32x4_t* v0) { *r = vreinterpretq_p8_f32(*v0); }
void VreinterpretqP8F64(poly8x16_t* r, float64x2_t* v0) { *r = vreinterpretq_p8_f64(*v0); }
void VreinterpretqP8P128(poly8x16_t* r, poly128_t* v0) { *r = vreinterpretq_p8_p128(*v0); }
void VreinterpretqP8P16(poly8x16_t* r, poly16x8_t* v0) { *r = vreinterpretq_p8_p16(*v0); }
void VreinterpretqP8P64(poly8x16_t* r, poly64x2_t* v0) { *r = vreinterpretq_p8_p64(*v0); }
void VreinterpretqS16S8(int16x8_t* r, int8x16_t* v0) { *r = vreinterpretq_s16_s8(*v0); }
void VreinterpretqS16S32(int16x8_t* r, int32x4_t* v0) { *r = vreinterpretq_s16_s32(*v0); }
void VreinterpretqS16S64(int16x8_t* r, int64x2_t* v0) { *r = vreinterpretq_s16_s64(*v0); }
void VreinterpretqS16U8(int16x8_t* r, uint8x16_t* v0) { *r = vreinterpretq_s16_u8(*v0); }
void VreinterpretqS16U16(int16x8_t* r, uint16x8_t* v0) { *r = vreinterpretq_s16_u16(*v0); }
void VreinterpretqS16U32(int16x8_t* r, uint32x4_t* v0) { *r = vreinterpretq_s16_u32(*v0); }
void VreinterpretqS16U64(int16x8_t* r, uint64x2_t* v0) { *r = vreinterpretq_s16_u64(*v0); }
void VreinterpretqS16F32(int16x8_t* r, float32x4_t* v0) { *r = vreinterpretq_s16_f32(*v0); }
void VreinterpretqS16F64(int16x8_t* r, float64x2_t* v0) { *r = vreinterpretq_s16_f64(*v0); }
void VreinterpretqS16P128(int16x8_t* r, poly128_t* v0) { *r = vreinterpretq_s16_p128(*v0); }
void VreinterpretqS16P16(int16x8_t* r, poly16x8_t* v0) { *r = vreinterpretq_s16_p16(*v0); }
void VreinterpretqS16P64(int16x8_t* r, poly64x2_t* v0) { *r = vreinterpretq_s16_p64(*v0); }
void VreinterpretqS16P8(int16x8_t* r, poly8x16_t* v0) { *r = vreinterpretq_s16_p8(*v0); }
void VreinterpretqS32S8(int32x4_t* r, int8x16_t* v0) { *r = vreinterpretq_s32_s8(*v0); }
void VreinterpretqS32S16(int32x4_t* r, int16x8_t* v0) { *r = vreinterpretq_s32_s16(*v0); }
void VreinterpretqS32S64(int32x4_t* r, int64x2_t* v0) { *r = vreinterpretq_s32_s64(*v0); }
void VreinterpretqS32U8(int32x4_t* r, uint8x16_t* v0) { *r = vreinterpretq_s32_u8(*v0); }
void VreinterpretqS32U16(int32x4_t* r, uint16x8_t* v0) { *r = vreinterpretq_s32_u16(*v0); }
void VreinterpretqS32U32(int32x4_t* r, uint32x4_t* v0) { *r = vreinterpretq_s32_u32(*v0); }
void VreinterpretqS32U64(int32x4_t* r, uint64x2_t* v0) { *r = vreinterpretq_s32_u64(*v0); }
void VreinterpretqS32F32(int32x4_t* r, float32x4_t* v0) { *r = vreinterpretq_s32_f32(*v0); }
void VreinterpretqS32F64(int32x4_t* r, float64x2_t* v0) { *r = vreinterpretq_s32_f64(*v0); }
void VreinterpretqS32P128(int32x4_t* r, poly128_t* v0) { *r = vreinterpretq_s32_p128(*v0); }
void VreinterpretqS32P16(int32x4_t* r, poly16x8_t* v0) { *r = vreinterpretq_s32_p16(*v0); }
void VreinterpretqS32P64(int32x4_t* r, poly64x2_t* v0) { *r = vreinterpretq_s32_p64(*v0); }
void VreinterpretqS32P8(int32x4_t* r, poly8x16_t* v0) { *r = vreinterpretq_s32_p8(*v0); }
void VreinterpretqS64S8(int64x2_t* r, int8x16_t* v0) { *r = vreinterpretq_s64_s8(*v0); }
void VreinterpretqS64S16(int64x2_t* r, int16x8_t* v0) { *r = vreinterpretq_s64_s16(*v0); }
void VreinterpretqS64S32(int64x2_t* r, int32x4_t* v0) { *r = vreinterpretq_s64_s32(*v0); }
void VreinterpretqS64U8(int64x2_t* r, uint8x16_t* v0) { *r = vreinterpretq_s64_u8(*v0); }
void VreinterpretqS64U16(int64x2_t* r, uint16x8_t* v0) { *r = vreinterpretq_s64_u16(*v0); }
void VreinterpretqS64U32(int64x2_t* r, uint32x4_t* v0) { *r = vreinterpretq_s64_u32(*v0); }
void VreinterpretqS64U64(int64x2_t* r, uint64x2_t* v0) { *r = vreinterpretq_s64_u64(*v0); }
void VreinterpretqS64F32(int64x2_t* r, float32x4_t* v0) { *r = vreinterpretq_s64_f32(*v0); }
void VreinterpretqS64F64(int64x2_t* r, float64x2_t* v0) { *r = vreinterpretq_s64_f64(*v0); }
void VreinterpretqS64P128(int64x2_t* r, poly128_t* v0) { *r = vreinterpretq_s64_p128(*v0); }
void VreinterpretqS64P16(int64x2_t* r, poly16x8_t* v0) { *r = vreinterpretq_s64_p16(*v0); }
void VreinterpretqS64P64(int64x2_t* r, poly64x2_t* v0) { *r = vreinterpretq_s64_p64(*v0); }
void VreinterpretqS64P8(int64x2_t* r, poly8x16_t* v0) { *r = vreinterpretq_s64_p8(*v0); }
void VreinterpretqS8S16(int8x16_t* r, int16x8_t* v0) { *r = vreinterpretq_s8_s16(*v0); }
void VreinterpretqS8S32(int8x16_t* r, int32x4_t* v0) { *r = vreinterpretq_s8_s32(*v0); }
void VreinterpretqS8S64(int8x16_t* r, int64x2_t* v0) { *r = vreinterpretq_s8_s64(*v0); }
void VreinterpretqS8U8(int8x16_t* r, uint8x16_t* v0) { *r = vreinterpretq_s8_u8(*v0); }
void VreinterpretqS8U16(int8x16_t* r, uint16x8_t* v0) { *r = vreinterpretq_s8_u16(*v0); }
void VreinterpretqS8U32(int8x16_t* r, uint32x4_t* v0) { *r = vreinterpretq_s8_u32(*v0); }
void VreinterpretqS8U64(int8x16_t* r, uint64x2_t* v0) { *r = vreinterpretq_s8_u64(*v0); }
void VreinterpretqS8F32(int8x16_t* r, float32x4_t* v0) { *r = vreinterpretq_s8_f32(*v0); }
void VreinterpretqS8F64(int8x16_t* r, float64x2_t* v0) { *r = vreinterpretq_s8_f64(*v0); }
void VreinterpretqS8P128(int8x16_t* r, poly128_t* v0) { *r = vreinterpretq_s8_p128(*v0); }
void VreinterpretqS8P16(int8x16_t* r, poly16x8_t* v0) { *r = vreinterpretq_s8_p16(*v0); }
void VreinterpretqS8P64(int8x16_t* r, poly64x2_t* v0) { *r = vreinterpretq_s8_p64(*v0); }
void VreinterpretqS8P8(int8x16_t* r, poly8x16_t* v0) { *r = vreinterpretq_s8_p8(*v0); }
void VreinterpretqU16S8(uint16x8_t* r, int8x16_t* v0) { *r = vreinterpretq_u16_s8(*v0); }
void VreinterpretqU16S16(uint16x8_t* r, int16x8_t* v0) { *r = vreinterpretq_u16_s16(*v0); }
void VreinterpretqU16S32(uint16x8_t* r, int32x4_t* v0) { *r = vreinterpretq_u16_s32(*v0); }
void VreinterpretqU16S64(uint16x8_t* r, int64x2_t* v0) { *r = vreinterpretq_u16_s64(*v0); }
void VreinterpretqU16U8(uint16x8_t* r, uint8x16_t* v0) { *r = vreinterpretq_u16_u8(*v0); }
void VreinterpretqU16U32(uint16x8_t* r, uint32x4_t* v0) { *r = vreinterpretq_u16_u32(*v0); }
void VreinterpretqU16U64(uint16x8_t* r, uint64x2_t* v0) { *r = vreinterpretq_u16_u64(*v0); }
void VreinterpretqU16F32(uint16x8_t* r, float32x4_t* v0) { *r = vreinterpretq_u16_f32(*v0); }
void VreinterpretqU16F64(uint16x8_t* r, float64x2_t* v0) { *r = vreinterpretq_u16_f64(*v0); }
void VreinterpretqU16P128(uint16x8_t* r, poly128_t* v0) { *r = vreinterpretq_u16_p128(*v0); }
void VreinterpretqU16P16(uint16x8_t* r, poly16x8_t* v0) { *r = vreinterpretq_u16_p16(*v0); }
void VreinterpretqU16P64(uint16x8_t* r, poly64x2_t* v0) { *r = vreinterpretq_u16_p64(*v0); }
void VreinterpretqU16P8(uint16x8_t* r, poly8x16_t* v0) { *r = vreinterpretq_u16_p8(*v0); }
void VreinterpretqU32S8(uint32x4_t* r, int8x16_t* v0) { *r = vreinterpretq_u32_s8(*v0); }
void VreinterpretqU32S16(uint32x4_t* r, int16x8_t* v0) { *r = vreinterpretq_u32_s16(*v0); }
void VreinterpretqU32S32(uint32x4_t* r, int32x4_t* v0) { *r = vreinterpretq_u32_s32(*v0); }
void VreinterpretqU32S64(uint32x4_t* r, int64x2_t* v0) { *r = vreinterpretq_u32_s64(*v0); }
void VreinterpretqU32U8(uint32x4_t* r, uint8x16_t* v0) { *r = vreinterpretq_u32_u8(*v0); }
void VreinterpretqU32U16(uint32x4_t* r, uint16x8_t* v0) { *r = vreinterpretq_u32_u16(*v0); }
void VreinterpretqU32U64(uint32x4_t* r, uint64x2_t* v0) { *r = vreinterpretq_u32_u64(*v0); }
void VreinterpretqU32F32(uint32x4_t* r, float32x4_t* v0) { *r = vreinterpretq_u32_f32(*v0); }
void VreinterpretqU32F64(uint32x4_t* r, float64x2_t* v0) { *r = vreinterpretq_u32_f64(*v0); }
void VreinterpretqU32P128(uint32x4_t* r, poly128_t* v0) { *r = vreinterpretq_u32_p128(*v0); }
void VreinterpretqU32P16(uint32x4_t* r, poly16x8_t* v0) { *r = vreinterpretq_u32_p16(*v0); }
void VreinterpretqU32P64(uint32x4_t* r, poly64x2_t* v0) { *r = vreinterpretq_u32_p64(*v0); }
void VreinterpretqU32P8(uint32x4_t* r, poly8x16_t* v0) { *r = vreinterpretq_u32_p8(*v0); }
void VreinterpretqU64S8(uint64x2_t* r, int8x16_t* v0) { *r = vreinterpretq_u64_s8(*v0); }
void VreinterpretqU64S16(uint64x2_t* r, int16x8_t* v0) { *r = vreinterpretq_u64_s16(*v0); }
void VreinterpretqU64S32(uint64x2_t* r, int32x4_t* v0) { *r = vreinterpretq_u64_s32(*v0); }
void VreinterpretqU64S64(uint64x2_t* r, int64x2_t* v0) { *r = vreinterpretq_u64_s64(*v0); }
void VreinterpretqU64U8(uint64x2_t* r, uint8x16_t* v0) { *r = vreinterpretq_u64_u8(*v0); }
void VreinterpretqU64U16(uint64x2_t* r, uint16x8_t* v0) { *r = vreinterpretq_u64_u16(*v0); }
void VreinterpretqU64U32(uint64x2_t* r, uint32x4_t* v0) { *r = vreinterpretq_u64_u32(*v0); }
void VreinterpretqU64F32(uint64x2_t* r, float32x4_t* v0) { *r = vreinterpretq_u64_f32(*v0); }
void VreinterpretqU64F64(uint64x2_t* r, float64x2_t* v0) { *r = vreinterpretq_u64_f64(*v0); }
void VreinterpretqU64P128(uint64x2_t* r, poly128_t* v0) { *r = vreinterpretq_u64_p128(*v0); }
void VreinterpretqU64P16(uint64x2_t* r, poly16x8_t* v0) { *r = vreinterpretq_u64_p16(*v0); }
void VreinterpretqU64P64(uint64x2_t* r, poly64x2_t* v0) { *r = vreinterpretq_u64_p64(*v0); }
void VreinterpretqU64P8(uint64x2_t* r, poly8x16_t* v0) { *r = vreinterpretq_u64_p8(*v0); }
void VreinterpretqU8S8(uint8x16_t* r, int8x16_t* v0) { *r = vreinterpretq_u8_s8(*v0); }
void VreinterpretqU8S16(uint8x16_t* r, int16x8_t* v0) { *r = vreinterpretq_u8_s16(*v0); }
void VreinterpretqU8S32(uint8x16_t* r, int32x4_t* v0) { *r = vreinterpretq_u8_s32(*v0); }
void VreinterpretqU8S64(uint8x16_t* r, int64x2_t* v0) { *r = vreinterpretq_u8_s64(*v0); }
void VreinterpretqU8U16(uint8x16_t* r, uint16x8_t* v0) { *r = vreinterpretq_u8_u16(*v0); }
void VreinterpretqU8U32(uint8x16_t* r, uint32x4_t* v0) { *r = vreinterpretq_u8_u32(*v0); }
void VreinterpretqU8U64(uint8x16_t* r, uint64x2_t* v0) { *r = vreinterpretq_u8_u64(*v0); }
void VreinterpretqU8F32(uint8x16_t* r, float32x4_t* v0) { *r = vreinterpretq_u8_f32(*v0); }
void VreinterpretqU8F64(uint8x16_t* r, float64x2_t* v0) { *r = vreinterpretq_u8_f64(*v0); }
void VreinterpretqU8P128(uint8x16_t* r, poly128_t* v0) { *r = vreinterpretq_u8_p128(*v0); }
void VreinterpretqU8P16(uint8x16_t* r, poly16x8_t* v0) { *r = vreinterpretq_u8_p16(*v0); }
void VreinterpretqU8P64(uint8x16_t* r, poly64x2_t* v0) { *r = vreinterpretq_u8_p64(*v0); }
void VreinterpretqU8P8(uint8x16_t* r, poly8x16_t* v0) { *r = vreinterpretq_u8_p8(*v0); }
void Vrev16S8(int8x8_t* r, int8x8_t* v0) { *r = vrev16_s8(*v0); }
void Vrev16U8(uint8x8_t* r, uint8x8_t* v0) { *r = vrev16_u8(*v0); }
void Vrev16P8(poly8x8_t* r, poly8x8_t* v0) { *r = vrev16_p8(*v0); }
void Vrev16QS8(int8x16_t* r, int8x16_t* v0) { *r = vrev16q_s8(*v0); }
void Vrev16QU8(uint8x16_t* r, uint8x16_t* v0) { *r = vrev16q_u8(*v0); }
void Vrev16QP8(poly8x16_t* r, poly8x16_t* v0) { *r = vrev16q_p8(*v0); }
void Vrev32S8(int8x8_t* r, int8x8_t* v0) { *r = vrev32_s8(*v0); }
void Vrev32S16(int16x4_t* r, int16x4_t* v0) { *r = vrev32_s16(*v0); }
void Vrev32U8(uint8x8_t* r, uint8x8_t* v0) { *r = vrev32_u8(*v0); }
void Vrev32U16(uint16x4_t* r, uint16x4_t* v0) { *r = vrev32_u16(*v0); }
void Vrev32P16(poly16x4_t* r, poly16x4_t* v0) { *r = vrev32_p16(*v0); }
void Vrev32P8(poly8x8_t* r, poly8x8_t* v0) { *r = vrev32_p8(*v0); }
void Vrev32QS8(int8x16_t* r, int8x16_t* v0) { *r = vrev32q_s8(*v0); }
void Vrev32QS16(int16x8_t* r, int16x8_t* v0) { *r = vrev32q_s16(*v0); }
void Vrev32QU8(uint8x16_t* r, uint8x16_t* v0) { *r = vrev32q_u8(*v0); }
void Vrev32QU16(uint16x8_t* r, uint16x8_t* v0) { *r = vrev32q_u16(*v0); }
void Vrev32QP16(poly16x8_t* r, poly16x8_t* v0) { *r = vrev32q_p16(*v0); }
void Vrev32QP8(poly8x16_t* r, poly8x16_t* v0) { *r = vrev32q_p8(*v0); }
void Vrev64S8(int8x8_t* r, int8x8_t* v0) { *r = vrev64_s8(*v0); }
void Vrev64S16(int16x4_t* r, int16x4_t* v0) { *r = vrev64_s16(*v0); }
void Vrev64S32(int32x2_t* r, int32x2_t* v0) { *r = vrev64_s32(*v0); }
void Vrev64U8(uint8x8_t* r, uint8x8_t* v0) { *r = vrev64_u8(*v0); }
void Vrev64U16(uint16x4_t* r, uint16x4_t* v0) { *r = vrev64_u16(*v0); }
void Vrev64U32(uint32x2_t* r, uint32x2_t* v0) { *r = vrev64_u32(*v0); }
void Vrev64F32(float32x2_t* r, float32x2_t* v0) { *r = vrev64_f32(*v0); }
void Vrev64P16(poly16x4_t* r, poly16x4_t* v0) { *r = vrev64_p16(*v0); }
void Vrev64P8(poly8x8_t* r, poly8x8_t* v0) { *r = vrev64_p8(*v0); }
void Vrev64QS8(int8x16_t* r, int8x16_t* v0) { *r = vrev64q_s8(*v0); }
void Vrev64QS16(int16x8_t* r, int16x8_t* v0) { *r = vrev64q_s16(*v0); }
void Vrev64QS32(int32x4_t* r, int32x4_t* v0) { *r = vrev64q_s32(*v0); }
void Vrev64QU8(uint8x16_t* r, uint8x16_t* v0) { *r = vrev64q_u8(*v0); }
void Vrev64QU16(uint16x8_t* r, uint16x8_t* v0) { *r = vrev64q_u16(*v0); }
void Vrev64QU32(uint32x4_t* r, uint32x4_t* v0) { *r = vrev64q_u32(*v0); }
void Vrev64QF32(float32x4_t* r, float32x4_t* v0) { *r = vrev64q_f32(*v0); }
void Vrev64QP16(poly16x8_t* r, poly16x8_t* v0) { *r = vrev64q_p16(*v0); }
void Vrev64QP8(poly8x16_t* r, poly8x16_t* v0) { *r = vrev64q_p8(*v0); }
void VrhaddS8(int8x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vrhadd_s8(*v0, *v1); }
void VrhaddS16(int16x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vrhadd_s16(*v0, *v1); }
void VrhaddS32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vrhadd_s32(*v0, *v1); }
void VrhaddU8(uint8x8_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vrhadd_u8(*v0, *v1); }
void VrhaddU16(uint16x4_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = vrhadd_u16(*v0, *v1); }
void VrhaddU32(uint32x2_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = vrhadd_u32(*v0, *v1); }
void VrhaddqS8(int8x16_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vrhaddq_s8(*v0, *v1); }
void VrhaddqS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vrhaddq_s16(*v0, *v1); }
void VrhaddqS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vrhaddq_s32(*v0, *v1); }
void VrhaddqU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vrhaddq_u8(*v0, *v1); }
void VrhaddqU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vrhaddq_u16(*v0, *v1); }
void VrhaddqU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vrhaddq_u32(*v0, *v1); }
void VrndF32(float32x2_t* r, float32x2_t* v0) { *r = vrnd_f32(*v0); }
void VrndF64(float64x1_t* r, float64x1_t* v0) { *r = vrnd_f64(*v0); }
void Vrnd32XF32(float32x2_t* r, float32x2_t* v0) { *r = vrnd32x_f32(*v0); }
void Vrnd32XF64(float64x1_t* r, float64x1_t* v0) { *r = vrnd32x_f64(*v0); }
void Vrnd32XqF32(float32x4_t* r, float32x4_t* v0) { *r = vrnd32xq_f32(*v0); }
void Vrnd32XqF64(float64x2_t* r, float64x2_t* v0) { *r = vrnd32xq_f64(*v0); }
void Vrnd32ZF32(float32x2_t* r, float32x2_t* v0) { *r = vrnd32z_f32(*v0); }
void Vrnd32ZF64(float64x1_t* r, float64x1_t* v0) { *r = vrnd32z_f64(*v0); }
void Vrnd32ZqF32(float32x4_t* r, float32x4_t* v0) { *r = vrnd32zq_f32(*v0); }
void Vrnd32ZqF64(float64x2_t* r, float64x2_t* v0) { *r = vrnd32zq_f64(*v0); }
void Vrnd64XF32(float32x2_t* r, float32x2_t* v0) { *r = vrnd64x_f32(*v0); }
void Vrnd64XF64(float64x1_t* r, float64x1_t* v0) { *r = vrnd64x_f64(*v0); }
void Vrnd64XqF32(float32x4_t* r, float32x4_t* v0) { *r = vrnd64xq_f32(*v0); }
void Vrnd64XqF64(float64x2_t* r, float64x2_t* v0) { *r = vrnd64xq_f64(*v0); }
void Vrnd64ZF32(float32x2_t* r, float32x2_t* v0) { *r = vrnd64z_f32(*v0); }
void Vrnd64ZF64(float64x1_t* r, float64x1_t* v0) { *r = vrnd64z_f64(*v0); }
void Vrnd64ZqF32(float32x4_t* r, float32x4_t* v0) { *r = vrnd64zq_f32(*v0); }
void Vrnd64ZqF64(float64x2_t* r, float64x2_t* v0) { *r = vrnd64zq_f64(*v0); }
void VrndaF32(float32x2_t* r, float32x2_t* v0) { *r = vrnda_f32(*v0); }
void VrndaF64(float64x1_t* r, float64x1_t* v0) { *r = vrnda_f64(*v0); }
void VrndaqF32(float32x4_t* r, float32x4_t* v0) { *r = vrndaq_f32(*v0); }
void VrndaqF64(float64x2_t* r, float64x2_t* v0) { *r = vrndaq_f64(*v0); }
void VrndiF32(float32x2_t* r, float32x2_t* v0) { *r = vrndi_f32(*v0); }
void VrndiF64(float64x1_t* r, float64x1_t* v0) { *r = vrndi_f64(*v0); }
void VrndiqF32(float32x4_t* r, float32x4_t* v0) { *r = vrndiq_f32(*v0); }
void VrndiqF64(float64x2_t* r, float64x2_t* v0) { *r = vrndiq_f64(*v0); }
void VrndmF32(float32x2_t* r, float32x2_t* v0) { *r = vrndm_f32(*v0); }
void VrndmF64(float64x1_t* r, float64x1_t* v0) { *r = vrndm_f64(*v0); }
void VrndmqF32(float32x4_t* r, float32x4_t* v0) { *r = vrndmq_f32(*v0); }
void VrndmqF64(float64x2_t* r, float64x2_t* v0) { *r = vrndmq_f64(*v0); }
void VrndnF32(float32x2_t* r, float32x2_t* v0) { *r = vrndn_f32(*v0); }
void VrndnF64(float64x1_t* r, float64x1_t* v0) { *r = vrndn_f64(*v0); }
void VrndnqF32(float32x4_t* r, float32x4_t* v0) { *r = vrndnq_f32(*v0); }
void VrndnqF64(float64x2_t* r, float64x2_t* v0) { *r = vrndnq_f64(*v0); }
void VrndnsF32(float32_t* r, float32_t* v0) { *r = vrndns_f32(*v0); }
void VrndpF32(float32x2_t* r, float32x2_t* v0) { *r = vrndp_f32(*v0); }
void VrndpF64(float64x1_t* r, float64x1_t* v0) { *r = vrndp_f64(*v0); }
void VrndpqF32(float32x4_t* r, float32x4_t* v0) { *r = vrndpq_f32(*v0); }
void VrndpqF64(float64x2_t* r, float64x2_t* v0) { *r = vrndpq_f64(*v0); }
void VrndqF32(float32x4_t* r, float32x4_t* v0) { *r = vrndq_f32(*v0); }
void VrndqF64(float64x2_t* r, float64x2_t* v0) { *r = vrndq_f64(*v0); }
void VrndxF32(float32x2_t* r, float32x2_t* v0) { *r = vrndx_f32(*v0); }
void VrndxF64(float64x1_t* r, float64x1_t* v0) { *r = vrndx_f64(*v0); }
void VrndxqF32(float32x4_t* r, float32x4_t* v0) { *r = vrndxq_f32(*v0); }
void VrndxqF64(float64x2_t* r, float64x2_t* v0) { *r = vrndxq_f64(*v0); }
void VrshlS8(int8x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vrshl_s8(*v0, *v1); }
void VrshlS16(int16x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vrshl_s16(*v0, *v1); }
void VrshlS32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vrshl_s32(*v0, *v1); }
void VrshlS64(int64x1_t* r, int64x1_t* v0, int64x1_t* v1) { *r = vrshl_s64(*v0, *v1); }
void VrshlU8(uint8x8_t* r, uint8x8_t* v0, int8x8_t* v1) { *r = vrshl_u8(*v0, *v1); }
void VrshlU16(uint16x4_t* r, uint16x4_t* v0, int16x4_t* v1) { *r = vrshl_u16(*v0, *v1); }
void VrshlU32(uint32x2_t* r, uint32x2_t* v0, int32x2_t* v1) { *r = vrshl_u32(*v0, *v1); }
void VrshlU64(uint64x1_t* r, uint64x1_t* v0, int64x1_t* v1) { *r = vrshl_u64(*v0, *v1); }
void VrshldS64(int64_t* r, int64_t* v0, int64_t* v1) { *r = vrshld_s64(*v0, *v1); }
void VrshldU64(uint64_t* r, uint64_t* v0, int64_t* v1) { *r = vrshld_u64(*v0, *v1); }
void VrshlqS8(int8x16_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vrshlq_s8(*v0, *v1); }
void VrshlqS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vrshlq_s16(*v0, *v1); }
void VrshlqS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vrshlq_s32(*v0, *v1); }
void VrshlqS64(int64x2_t* r, int64x2_t* v0, int64x2_t* v1) { *r = vrshlq_s64(*v0, *v1); }
void VrshlqU8(uint8x16_t* r, uint8x16_t* v0, int8x16_t* v1) { *r = vrshlq_u8(*v0, *v1); }
void VrshlqU16(uint16x8_t* r, uint16x8_t* v0, int16x8_t* v1) { *r = vrshlq_u16(*v0, *v1); }
void VrshlqU32(uint32x4_t* r, uint32x4_t* v0, int32x4_t* v1) { *r = vrshlq_u32(*v0, *v1); }
void VrshlqU64(uint64x2_t* r, uint64x2_t* v0, int64x2_t* v1) { *r = vrshlq_u64(*v0, *v1); }
void VrsqrteU32(uint32x2_t* r, uint32x2_t* v0) { *r = vrsqrte_u32(*v0); }
void VrsqrteF32(float32x2_t* r, float32x2_t* v0) { *r = vrsqrte_f32(*v0); }
void VrsqrteF64(float64x1_t* r, float64x1_t* v0) { *r = vrsqrte_f64(*v0); }
void VrsqrtedF64(float64_t* r, float64_t* v0) { *r = vrsqrted_f64(*v0); }
void VrsqrteqU32(uint32x4_t* r, uint32x4_t* v0) { *r = vrsqrteq_u32(*v0); }
void VrsqrteqF32(float32x4_t* r, float32x4_t* v0) { *r = vrsqrteq_f32(*v0); }
void VrsqrteqF64(float64x2_t* r, float64x2_t* v0) { *r = vrsqrteq_f64(*v0); }
void VrsqrtesF32(float32_t* r, float32_t* v0) { *r = vrsqrtes_f32(*v0); }
void VrsqrtsF32(float32x2_t* r, float32x2_t* v0, float32x2_t* v1) { *r = vrsqrts_f32(*v0, *v1); }
void VrsqrtsF64(float64x1_t* r, float64x1_t* v0, float64x1_t* v1) { *r = vrsqrts_f64(*v0, *v1); }
void VrsqrtsdF64(float64_t* r, float64_t* v0, float64_t* v1) { *r = vrsqrtsd_f64(*v0, *v1); }
void VrsqrtsqF32(float32x4_t* r, float32x4_t* v0, float32x4_t* v1) { *r = vrsqrtsq_f32(*v0, *v1); }
void VrsqrtsqF64(float64x2_t* r, float64x2_t* v0, float64x2_t* v1) { *r = vrsqrtsq_f64(*v0, *v1); }
void VrsqrtssF32(float32_t* r, float32_t* v0, float32_t* v1) { *r = vrsqrtss_f32(*v0, *v1); }
void VrsubhnS16(int8x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vrsubhn_s16(*v0, *v1); }
void VrsubhnS32(int16x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vrsubhn_s32(*v0, *v1); }
void VrsubhnS64(int32x2_t* r, int64x2_t* v0, int64x2_t* v1) { *r = vrsubhn_s64(*v0, *v1); }
void VrsubhnU16(uint8x8_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vrsubhn_u16(*v0, *v1); }
void VrsubhnU32(uint16x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vrsubhn_u32(*v0, *v1); }
void VrsubhnU64(uint32x2_t* r, uint64x2_t* v0, uint64x2_t* v1) { *r = vrsubhn_u64(*v0, *v1); }
void VrsubhnHighS16(int8x16_t* r, int8x8_t* v0, int16x8_t* v1, int16x8_t* v2) { *r = vrsubhn_high_s16(*v0, *v1, *v2); }
void VrsubhnHighS32(int16x8_t* r, int16x4_t* v0, int32x4_t* v1, int32x4_t* v2) { *r = vrsubhn_high_s32(*v0, *v1, *v2); }
void VrsubhnHighS64(int32x4_t* r, int32x2_t* v0, int64x2_t* v1, int64x2_t* v2) { *r = vrsubhn_high_s64(*v0, *v1, *v2); }
void VrsubhnHighU16(uint8x16_t* r, uint8x8_t* v0, uint16x8_t* v1, uint16x8_t* v2) { *r = vrsubhn_high_u16(*v0, *v1, *v2); }
void VrsubhnHighU32(uint16x8_t* r, uint16x4_t* v0, uint32x4_t* v1, uint32x4_t* v2) { *r = vrsubhn_high_u32(*v0, *v1, *v2); }
void VrsubhnHighU64(uint32x4_t* r, uint32x2_t* v0, uint64x2_t* v1, uint64x2_t* v2) { *r = vrsubhn_high_u64(*v0, *v1, *v2); }
void Vsha1CqU32(uint32x4_t* r, uint32x4_t* v0, uint32_t* v1, uint32x4_t* v2) { *r = vsha1cq_u32(*v0, *v1, *v2); }
void Vsha1HU32(uint32_t* r, uint32_t* v0) { *r = vsha1h_u32(*v0); }
void Vsha1MqU32(uint32x4_t* r, uint32x4_t* v0, uint32_t* v1, uint32x4_t* v2) { *r = vsha1mq_u32(*v0, *v1, *v2); }
void Vsha1PqU32(uint32x4_t* r, uint32x4_t* v0, uint32_t* v1, uint32x4_t* v2) { *r = vsha1pq_u32(*v0, *v1, *v2); }
void Vsha1Su0QU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1, uint32x4_t* v2) { *r = vsha1su0q_u32(*v0, *v1, *v2); }
void Vsha1Su1QU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vsha1su1q_u32(*v0, *v1); }
void Vsha256H2QU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1, uint32x4_t* v2) { *r = vsha256h2q_u32(*v0, *v1, *v2); }
void Vsha256HqU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1, uint32x4_t* v2) { *r = vsha256hq_u32(*v0, *v1, *v2); }
void Vsha256Su0QU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vsha256su0q_u32(*v0, *v1); }
void Vsha256Su1QU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1, uint32x4_t* v2) { *r = vsha256su1q_u32(*v0, *v1, *v2); }
void Vsha512H2QU64(uint64x2_t* r, uint64x2_t* v0, uint64x2_t* v1, uint64x2_t* v2) { *r = vsha512h2q_u64(*v0, *v1, *v2); }
void Vsha512HqU64(uint64x2_t* r, uint64x2_t* v0, uint64x2_t* v1, uint64x2_t* v2) { *r = vsha512hq_u64(*v0, *v1, *v2); }
void Vsha512Su0QU64(uint64x2_t* r, uint64x2_t* v0, uint64x2_t* v1) { *r = vsha512su0q_u64(*v0, *v1); }
void Vsha512Su1QU64(uint64x2_t* r, uint64x2_t* v0, uint64x2_t* v1, uint64x2_t* v2) { *r = vsha512su1q_u64(*v0, *v1, *v2); }
void VshlS8(int8x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vshl_s8(*v0, *v1); }
void VshlS16(int16x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vshl_s16(*v0, *v1); }
void VshlS32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vshl_s32(*v0, *v1); }
void VshlS64(int64x1_t* r, int64x1_t* v0, int64x1_t* v1) { *r = vshl_s64(*v0, *v1); }
void VshlU8(uint8x8_t* r, uint8x8_t* v0, int8x8_t* v1) { *r = vshl_u8(*v0, *v1); }
void VshlU16(uint16x4_t* r, uint16x4_t* v0, int16x4_t* v1) { *r = vshl_u16(*v0, *v1); }
void VshlU32(uint32x2_t* r, uint32x2_t* v0, int32x2_t* v1) { *r = vshl_u32(*v0, *v1); }
void VshlU64(uint64x1_t* r, uint64x1_t* v0, int64x1_t* v1) { *r = vshl_u64(*v0, *v1); }
void VshldS64(int64_t* r, int64_t* v0, int64_t* v1) { *r = vshld_s64(*v0, *v1); }
void VshldU64(uint64_t* r, uint64_t* v0, int64_t* v1) { *r = vshld_u64(*v0, *v1); }
void VshlqS8(int8x16_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vshlq_s8(*v0, *v1); }
void VshlqS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vshlq_s16(*v0, *v1); }
void VshlqS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vshlq_s32(*v0, *v1); }
void VshlqS64(int64x2_t* r, int64x2_t* v0, int64x2_t* v1) { *r = vshlq_s64(*v0, *v1); }
void VshlqU8(uint8x16_t* r, uint8x16_t* v0, int8x16_t* v1) { *r = vshlq_u8(*v0, *v1); }
void VshlqU16(uint16x8_t* r, uint16x8_t* v0, int16x8_t* v1) { *r = vshlq_u16(*v0, *v1); }
void VshlqU32(uint32x4_t* r, uint32x4_t* v0, int32x4_t* v1) { *r = vshlq_u32(*v0, *v1); }
void VshlqU64(uint64x2_t* r, uint64x2_t* v0, int64x2_t* v1) { *r = vshlq_u64(*v0, *v1); }
void Vsm3Partw1QU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1, uint32x4_t* v2) { *r = vsm3partw1q_u32(*v0, *v1, *v2); }
void Vsm3Partw2QU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1, uint32x4_t* v2) { *r = vsm3partw2q_u32(*v0, *v1, *v2); }
void Vsm3Ss1QU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1, uint32x4_t* v2) { *r = vsm3ss1q_u32(*v0, *v1, *v2); }
void Vsm4EkeyqU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vsm4ekeyq_u32(*v0, *v1); }
void Vsm4EqU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vsm4eq_u32(*v0, *v1); }
void VsqaddU8(uint8x8_t* r, uint8x8_t* v0, int8x8_t* v1) { *r = vsqadd_u8(*v0, *v1); }
void VsqaddU16(uint16x4_t* r, uint16x4_t* v0, int16x4_t* v1) { *r = vsqadd_u16(*v0, *v1); }
void VsqaddU32(uint32x2_t* r, uint32x2_t* v0, int32x2_t* v1) { *r = vsqadd_u32(*v0, *v1); }
void VsqaddU64(uint64x1_t* r, uint64x1_t* v0, int64x1_t* v1) { *r = vsqadd_u64(*v0, *v1); }
void VsqaddbU8(uint8_t* r, uint8_t* v0, int8_t* v1) { *r = vsqaddb_u8(*v0, *v1); }
void VsqadddU64(uint64_t* r, uint64_t* v0, int64_t* v1) { *r = vsqaddd_u64(*v0, *v1); }
void VsqaddhU16(uint16_t* r, uint16_t* v0, int16_t* v1) { *r = vsqaddh_u16(*v0, *v1); }
void VsqaddqU8(uint8x16_t* r, uint8x16_t* v0, int8x16_t* v1) { *r = vsqaddq_u8(*v0, *v1); }
void VsqaddqU16(uint16x8_t* r, uint16x8_t* v0, int16x8_t* v1) { *r = vsqaddq_u16(*v0, *v1); }
void VsqaddqU32(uint32x4_t* r, uint32x4_t* v0, int32x4_t* v1) { *r = vsqaddq_u32(*v0, *v1); }
void VsqaddqU64(uint64x2_t* r, uint64x2_t* v0, int64x2_t* v1) { *r = vsqaddq_u64(*v0, *v1); }
void VsqaddsU32(uint32_t* r, uint32_t* v0, int32_t* v1) { *r = vsqadds_u32(*v0, *v1); }
void VsqrtF32(float32x2_t* r, float32x2_t* v0) { *r = vsqrt_f32(*v0); }
void VsqrtF64(float64x1_t* r, float64x1_t* v0) { *r = vsqrt_f64(*v0); }
void VsqrtqF32(float32x4_t* r, float32x4_t* v0) { *r = vsqrtq_f32(*v0); }
void VsqrtqF64(float64x2_t* r, float64x2_t* v0) { *r = vsqrtq_f64(*v0); }
void VsubS8(int8x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vsub_s8(*v0, *v1); }
void VsubS16(int16x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vsub_s16(*v0, *v1); }
void VsubS32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vsub_s32(*v0, *v1); }
void VsubS64(int64x1_t* r, int64x1_t* v0, int64x1_t* v1) { *r = vsub_s64(*v0, *v1); }
void VsubU8(uint8x8_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vsub_u8(*v0, *v1); }
void VsubU16(uint16x4_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = vsub_u16(*v0, *v1); }
void VsubU32(uint32x2_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = vsub_u32(*v0, *v1); }
void VsubU64(uint64x1_t* r, uint64x1_t* v0, uint64x1_t* v1) { *r = vsub_u64(*v0, *v1); }
void VsubF32(float32x2_t* r, float32x2_t* v0, float32x2_t* v1) { *r = vsub_f32(*v0, *v1); }
void VsubF64(float64x1_t* r, float64x1_t* v0, float64x1_t* v1) { *r = vsub_f64(*v0, *v1); }
void VsubdS64(int64_t* r, int64_t* v0, int64_t* v1) { *r = vsubd_s64(*v0, *v1); }
void VsubdU64(uint64_t* r, uint64_t* v0, uint64_t* v1) { *r = vsubd_u64(*v0, *v1); }
void VsubhnS16(int8x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vsubhn_s16(*v0, *v1); }
void VsubhnS32(int16x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vsubhn_s32(*v0, *v1); }
void VsubhnS64(int32x2_t* r, int64x2_t* v0, int64x2_t* v1) { *r = vsubhn_s64(*v0, *v1); }
void VsubhnU16(uint8x8_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vsubhn_u16(*v0, *v1); }
void VsubhnU32(uint16x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vsubhn_u32(*v0, *v1); }
void VsubhnU64(uint32x2_t* r, uint64x2_t* v0, uint64x2_t* v1) { *r = vsubhn_u64(*v0, *v1); }
void VsubhnHighS16(int8x16_t* r, int8x8_t* v0, int16x8_t* v1, int16x8_t* v2) { *r = vsubhn_high_s16(*v0, *v1, *v2); }
void VsubhnHighS32(int16x8_t* r, int16x4_t* v0, int32x4_t* v1, int32x4_t* v2) { *r = vsubhn_high_s32(*v0, *v1, *v2); }
void VsubhnHighS64(int32x4_t* r, int32x2_t* v0, int64x2_t* v1, int64x2_t* v2) { *r = vsubhn_high_s64(*v0, *v1, *v2); }
void VsubhnHighU16(uint8x16_t* r, uint8x8_t* v0, uint16x8_t* v1, uint16x8_t* v2) { *r = vsubhn_high_u16(*v0, *v1, *v2); }
void VsubhnHighU32(uint16x8_t* r, uint16x4_t* v0, uint32x4_t* v1, uint32x4_t* v2) { *r = vsubhn_high_u32(*v0, *v1, *v2); }
void VsubhnHighU64(uint32x4_t* r, uint32x2_t* v0, uint64x2_t* v1, uint64x2_t* v2) { *r = vsubhn_high_u64(*v0, *v1, *v2); }
void VsublS8(int16x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vsubl_s8(*v0, *v1); }
void VsublS16(int32x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vsubl_s16(*v0, *v1); }
void VsublS32(int64x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vsubl_s32(*v0, *v1); }
void VsublU8(uint16x8_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vsubl_u8(*v0, *v1); }
void VsublU16(uint32x4_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = vsubl_u16(*v0, *v1); }
void VsublU32(uint64x2_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = vsubl_u32(*v0, *v1); }
void VsublHighS8(int16x8_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vsubl_high_s8(*v0, *v1); }
void VsublHighS16(int32x4_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vsubl_high_s16(*v0, *v1); }
void VsublHighS32(int64x2_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vsubl_high_s32(*v0, *v1); }
void VsublHighU8(uint16x8_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vsubl_high_u8(*v0, *v1); }
void VsublHighU16(uint32x4_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vsubl_high_u16(*v0, *v1); }
void VsublHighU32(uint64x2_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vsubl_high_u32(*v0, *v1); }
void VsubqS8(int8x16_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vsubq_s8(*v0, *v1); }
void VsubqS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vsubq_s16(*v0, *v1); }
void VsubqS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vsubq_s32(*v0, *v1); }
void VsubqS64(int64x2_t* r, int64x2_t* v0, int64x2_t* v1) { *r = vsubq_s64(*v0, *v1); }
void VsubqU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vsubq_u8(*v0, *v1); }
void VsubqU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vsubq_u16(*v0, *v1); }
void VsubqU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vsubq_u32(*v0, *v1); }
void VsubqU64(uint64x2_t* r, uint64x2_t* v0, uint64x2_t* v1) { *r = vsubq_u64(*v0, *v1); }
void VsubqF32(float32x4_t* r, float32x4_t* v0, float32x4_t* v1) { *r = vsubq_f32(*v0, *v1); }
void VsubqF64(float64x2_t* r, float64x2_t* v0, float64x2_t* v1) { *r = vsubq_f64(*v0, *v1); }
void VsubwS8(int16x8_t* r, int16x8_t* v0, int8x8_t* v1) { *r = vsubw_s8(*v0, *v1); }
void VsubwS16(int32x4_t* r, int32x4_t* v0, int16x4_t* v1) { *r = vsubw_s16(*v0, *v1); }
void VsubwS32(int64x2_t* r, int64x2_t* v0, int32x2_t* v1) { *r = vsubw_s32(*v0, *v1); }
void VsubwU8(uint16x8_t* r, uint16x8_t* v0, uint8x8_t* v1) { *r = vsubw_u8(*v0, *v1); }
void VsubwU16(uint32x4_t* r, uint32x4_t* v0, uint16x4_t* v1) { *r = vsubw_u16(*v0, *v1); }
void VsubwU32(uint64x2_t* r, uint64x2_t* v0, uint32x2_t* v1) { *r = vsubw_u32(*v0, *v1); }
void VsubwHighS8(int16x8_t* r, int16x8_t* v0, int8x16_t* v1) { *r = vsubw_high_s8(*v0, *v1); }
void VsubwHighS16(int32x4_t* r, int32x4_t* v0, int16x8_t* v1) { *r = vsubw_high_s16(*v0, *v1); }
void VsubwHighS32(int64x2_t* r, int64x2_t* v0, int32x4_t* v1) { *r = vsubw_high_s32(*v0, *v1); }
void VsubwHighU8(uint16x8_t* r, uint16x8_t* v0, uint8x16_t* v1) { *r = vsubw_high_u8(*v0, *v1); }
void VsubwHighU16(uint32x4_t* r, uint32x4_t* v0, uint16x8_t* v1) { *r = vsubw_high_u16(*v0, *v1); }
void VsubwHighU32(uint64x2_t* r, uint64x2_t* v0, uint32x4_t* v1) { *r = vsubw_high_u32(*v0, *v1); }
void Vtbl1S8(int8x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vtbl1_s8(*v0, *v1); }
void Vtbl1U8(uint8x8_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vtbl1_u8(*v0, *v1); }
void Vtbl1P8(poly8x8_t* r, poly8x8_t* v0, uint8x8_t* v1) { *r = vtbl1_p8(*v0, *v1); }
void Vtbl2S8(int8x8_t* r, int8x8x2_t* v0, int8x8_t* v1) { *r = vtbl2_s8(*v0, *v1); }
void Vtbl2U8(uint8x8_t* r, uint8x8x2_t* v0, uint8x8_t* v1) { *r = vtbl2_u8(*v0, *v1); }
void Vtbl2P8(poly8x8_t* r, poly8x8x2_t* v0, uint8x8_t* v1) { *r = vtbl2_p8(*v0, *v1); }
void Vtbl3S8(int8x8_t* r, int8x8x3_t* v0, int8x8_t* v1) { *r = vtbl3_s8(*v0, *v1); }
void Vtbl3U8(uint8x8_t* r, uint8x8x3_t* v0, uint8x8_t* v1) { *r = vtbl3_u8(*v0, *v1); }
void Vtbl3P8(poly8x8_t* r, poly8x8x3_t* v0, uint8x8_t* v1) { *r = vtbl3_p8(*v0, *v1); }
void Vtbl4S8(int8x8_t* r, int8x8x4_t* v0, int8x8_t* v1) { *r = vtbl4_s8(*v0, *v1); }
void Vtbl4U8(uint8x8_t* r, uint8x8x4_t* v0, uint8x8_t* v1) { *r = vtbl4_u8(*v0, *v1); }
void Vtbl4P8(poly8x8_t* r, poly8x8x4_t* v0, uint8x8_t* v1) { *r = vtbl4_p8(*v0, *v1); }
void Vtbx1S8(int8x8_t* r, int8x8_t* v0, int8x8_t* v1, int8x8_t* v2) { *r = vtbx1_s8(*v0, *v1, *v2); }
void Vtbx1U8(uint8x8_t* r, uint8x8_t* v0, uint8x8_t* v1, uint8x8_t* v2) { *r = vtbx1_u8(*v0, *v1, *v2); }
void Vtbx1P8(poly8x8_t* r, poly8x8_t* v0, poly8x8_t* v1, uint8x8_t* v2) { *r = vtbx1_p8(*v0, *v1, *v2); }
void Vtbx2S8(int8x8_t* r, int8x8_t* v0, int8x8x2_t* v1, int8x8_t* v2) { *r = vtbx2_s8(*v0, *v1, *v2); }
void Vtbx2U8(uint8x8_t* r, uint8x8_t* v0, uint8x8x2_t* v1, uint8x8_t* v2) { *r = vtbx2_u8(*v0, *v1, *v2); }
void Vtbx2P8(poly8x8_t* r, poly8x8_t* v0, poly8x8x2_t* v1, uint8x8_t* v2) { *r = vtbx2_p8(*v0, *v1, *v2); }
void Vtbx3S8(int8x8_t* r, int8x8_t* v0, int8x8x3_t* v1, int8x8_t* v2) { *r = vtbx3_s8(*v0, *v1, *v2); }
void Vtbx3U8(uint8x8_t* r, uint8x8_t* v0, uint8x8x3_t* v1, uint8x8_t* v2) { *r = vtbx3_u8(*v0, *v1, *v2); }
void Vtbx3P8(poly8x8_t* r, poly8x8_t* v0, poly8x8x3_t* v1, uint8x8_t* v2) { *r = vtbx3_p8(*v0, *v1, *v2); }
void Vtbx4S8(int8x8_t* r, int8x8_t* v0, int8x8x4_t* v1, int8x8_t* v2) { *r = vtbx4_s8(*v0, *v1, *v2); }
void Vtbx4U8(uint8x8_t* r, uint8x8_t* v0, uint8x8x4_t* v1, uint8x8_t* v2) { *r = vtbx4_u8(*v0, *v1, *v2); }
void Vtbx4P8(poly8x8_t* r, poly8x8_t* v0, poly8x8x4_t* v1, uint8x8_t* v2) { *r = vtbx4_p8(*v0, *v1, *v2); }
void VtrnS8(int8x8x2_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vtrn_s8(*v0, *v1); }
void VtrnS16(int16x4x2_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vtrn_s16(*v0, *v1); }
void VtrnS32(int32x2x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vtrn_s32(*v0, *v1); }
void VtrnU8(uint8x8x2_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vtrn_u8(*v0, *v1); }
void VtrnU16(uint16x4x2_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = vtrn_u16(*v0, *v1); }
void VtrnU32(uint32x2x2_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = vtrn_u32(*v0, *v1); }
void VtrnF32(float32x2x2_t* r, float32x2_t* v0, float32x2_t* v1) { *r = vtrn_f32(*v0, *v1); }
void Vtrn1S8(int8x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vtrn1_s8(*v0, *v1); }
void Vtrn1S16(int16x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vtrn1_s16(*v0, *v1); }
void Vtrn1S32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vtrn1_s32(*v0, *v1); }
void Vtrn1U8(uint8x8_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vtrn1_u8(*v0, *v1); }
void Vtrn1U16(uint16x4_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = vtrn1_u16(*v0, *v1); }
void Vtrn1U32(uint32x2_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = vtrn1_u32(*v0, *v1); }
void Vtrn1F32(float32x2_t* r, float32x2_t* v0, float32x2_t* v1) { *r = vtrn1_f32(*v0, *v1); }
void Vtrn1P16(poly16x4_t* r, poly16x4_t* v0, poly16x4_t* v1) { *r = vtrn1_p16(*v0, *v1); }
void Vtrn1P8(poly8x8_t* r, poly8x8_t* v0, poly8x8_t* v1) { *r = vtrn1_p8(*v0, *v1); }
void Vtrn1QS8(int8x16_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vtrn1q_s8(*v0, *v1); }
void Vtrn1QS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vtrn1q_s16(*v0, *v1); }
void Vtrn1QS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vtrn1q_s32(*v0, *v1); }
void Vtrn1QS64(int64x2_t* r, int64x2_t* v0, int64x2_t* v1) { *r = vtrn1q_s64(*v0, *v1); }
void Vtrn1QU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vtrn1q_u8(*v0, *v1); }
void Vtrn1QU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vtrn1q_u16(*v0, *v1); }
void Vtrn1QU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vtrn1q_u32(*v0, *v1); }
void Vtrn1QU64(uint64x2_t* r, uint64x2_t* v0, uint64x2_t* v1) { *r = vtrn1q_u64(*v0, *v1); }
void Vtrn1QF32(float32x4_t* r, float32x4_t* v0, float32x4_t* v1) { *r = vtrn1q_f32(*v0, *v1); }
void Vtrn1QF64(float64x2_t* r, float64x2_t* v0, float64x2_t* v1) { *r = vtrn1q_f64(*v0, *v1); }
void Vtrn1QP16(poly16x8_t* r, poly16x8_t* v0, poly16x8_t* v1) { *r = vtrn1q_p16(*v0, *v1); }
void Vtrn1QP64(poly64x2_t* r, poly64x2_t* v0, poly64x2_t* v1) { *r = vtrn1q_p64(*v0, *v1); }
void Vtrn1QP8(poly8x16_t* r, poly8x16_t* v0, poly8x16_t* v1) { *r = vtrn1q_p8(*v0, *v1); }
void Vtrn2S8(int8x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vtrn2_s8(*v0, *v1); }
void Vtrn2S16(int16x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vtrn2_s16(*v0, *v1); }
void Vtrn2S32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vtrn2_s32(*v0, *v1); }
void Vtrn2U8(uint8x8_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vtrn2_u8(*v0, *v1); }
void Vtrn2U16(uint16x4_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = vtrn2_u16(*v0, *v1); }
void Vtrn2U32(uint32x2_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = vtrn2_u32(*v0, *v1); }
void Vtrn2F32(float32x2_t* r, float32x2_t* v0, float32x2_t* v1) { *r = vtrn2_f32(*v0, *v1); }
void Vtrn2P16(poly16x4_t* r, poly16x4_t* v0, poly16x4_t* v1) { *r = vtrn2_p16(*v0, *v1); }
void Vtrn2P8(poly8x8_t* r, poly8x8_t* v0, poly8x8_t* v1) { *r = vtrn2_p8(*v0, *v1); }
void Vtrn2QS8(int8x16_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vtrn2q_s8(*v0, *v1); }
void Vtrn2QS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vtrn2q_s16(*v0, *v1); }
void Vtrn2QS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vtrn2q_s32(*v0, *v1); }
void Vtrn2QS64(int64x2_t* r, int64x2_t* v0, int64x2_t* v1) { *r = vtrn2q_s64(*v0, *v1); }
void Vtrn2QU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vtrn2q_u8(*v0, *v1); }
void Vtrn2QU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vtrn2q_u16(*v0, *v1); }
void Vtrn2QU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vtrn2q_u32(*v0, *v1); }
void Vtrn2QU64(uint64x2_t* r, uint64x2_t* v0, uint64x2_t* v1) { *r = vtrn2q_u64(*v0, *v1); }
void Vtrn2QF32(float32x4_t* r, float32x4_t* v0, float32x4_t* v1) { *r = vtrn2q_f32(*v0, *v1); }
void Vtrn2QF64(float64x2_t* r, float64x2_t* v0, float64x2_t* v1) { *r = vtrn2q_f64(*v0, *v1); }
void Vtrn2QP16(poly16x8_t* r, poly16x8_t* v0, poly16x8_t* v1) { *r = vtrn2q_p16(*v0, *v1); }
void Vtrn2QP64(poly64x2_t* r, poly64x2_t* v0, poly64x2_t* v1) { *r = vtrn2q_p64(*v0, *v1); }
void Vtrn2QP8(poly8x16_t* r, poly8x16_t* v0, poly8x16_t* v1) { *r = vtrn2q_p8(*v0, *v1); }
void VtrnP16(poly16x4x2_t* r, poly16x4_t* v0, poly16x4_t* v1) { *r = vtrn_p16(*v0, *v1); }
void VtrnP8(poly8x8x2_t* r, poly8x8_t* v0, poly8x8_t* v1) { *r = vtrn_p8(*v0, *v1); }
void VtrnqS8(int8x16x2_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vtrnq_s8(*v0, *v1); }
void VtrnqS16(int16x8x2_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vtrnq_s16(*v0, *v1); }
void VtrnqS32(int32x4x2_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vtrnq_s32(*v0, *v1); }
void VtrnqU8(uint8x16x2_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vtrnq_u8(*v0, *v1); }
void VtrnqU16(uint16x8x2_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vtrnq_u16(*v0, *v1); }
void VtrnqU32(uint32x4x2_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vtrnq_u32(*v0, *v1); }
void VtrnqF32(float32x4x2_t* r, float32x4_t* v0, float32x4_t* v1) { *r = vtrnq_f32(*v0, *v1); }
void VtrnqP16(poly16x8x2_t* r, poly16x8_t* v0, poly16x8_t* v1) { *r = vtrnq_p16(*v0, *v1); }
void VtrnqP8(poly8x16x2_t* r, poly8x16_t* v0, poly8x16_t* v1) { *r = vtrnq_p8(*v0, *v1); }
void VtstS8(uint8x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vtst_s8(*v0, *v1); }
void VtstS16(uint16x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vtst_s16(*v0, *v1); }
void VtstS32(uint32x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vtst_s32(*v0, *v1); }
void VtstS64(uint64x1_t* r, int64x1_t* v0, int64x1_t* v1) { *r = vtst_s64(*v0, *v1); }
void VtstU8(uint8x8_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vtst_u8(*v0, *v1); }
void VtstU16(uint16x4_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = vtst_u16(*v0, *v1); }
void VtstU32(uint32x2_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = vtst_u32(*v0, *v1); }
void VtstU64(uint64x1_t* r, uint64x1_t* v0, uint64x1_t* v1) { *r = vtst_u64(*v0, *v1); }
void VtstP16(uint16x4_t* r, poly16x4_t* v0, poly16x4_t* v1) { *r = vtst_p16(*v0, *v1); }
void VtstP64(uint64x1_t* r, poly64x1_t* v0, poly64x1_t* v1) { *r = vtst_p64(*v0, *v1); }
void VtstP8(uint8x8_t* r, poly8x8_t* v0, poly8x8_t* v1) { *r = vtst_p8(*v0, *v1); }
void VtstdS64(uint64_t* r, int64_t* v0, int64_t* v1) { *r = vtstd_s64(*v0, *v1); }
void VtstdU64(uint64_t* r, uint64_t* v0, uint64_t* v1) { *r = vtstd_u64(*v0, *v1); }
void VtstqS8(uint8x16_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vtstq_s8(*v0, *v1); }
void VtstqS16(uint16x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vtstq_s16(*v0, *v1); }
void VtstqS32(uint32x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vtstq_s32(*v0, *v1); }
void VtstqS64(uint64x2_t* r, int64x2_t* v0, int64x2_t* v1) { *r = vtstq_s64(*v0, *v1); }
void VtstqU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vtstq_u8(*v0, *v1); }
void VtstqU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vtstq_u16(*v0, *v1); }
void VtstqU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vtstq_u32(*v0, *v1); }
void VtstqU64(uint64x2_t* r, uint64x2_t* v0, uint64x2_t* v1) { *r = vtstq_u64(*v0, *v1); }
void VtstqP16(uint16x8_t* r, poly16x8_t* v0, poly16x8_t* v1) { *r = vtstq_p16(*v0, *v1); }
void VtstqP64(uint64x2_t* r, poly64x2_t* v0, poly64x2_t* v1) { *r = vtstq_p64(*v0, *v1); }
void VtstqP8(uint8x16_t* r, poly8x16_t* v0, poly8x16_t* v1) { *r = vtstq_p8(*v0, *v1); }
void VuqaddS8(int8x8_t* r, int8x8_t* v0, uint8x8_t* v1) { *r = vuqadd_s8(*v0, *v1); }
void VuqaddS16(int16x4_t* r, int16x4_t* v0, uint16x4_t* v1) { *r = vuqadd_s16(*v0, *v1); }
void VuqaddS32(int32x2_t* r, int32x2_t* v0, uint32x2_t* v1) { *r = vuqadd_s32(*v0, *v1); }
void VuqaddS64(int64x1_t* r, int64x1_t* v0, uint64x1_t* v1) { *r = vuqadd_s64(*v0, *v1); }
void VuqaddbS8(int8_t* r, int8_t* v0, uint8_t* v1) { *r = vuqaddb_s8(*v0, *v1); }
void VuqadddS64(int64_t* r, int64_t* v0, uint64_t* v1) { *r = vuqaddd_s64(*v0, *v1); }
void VuqaddhS16(int16_t* r, int16_t* v0, uint16_t* v1) { *r = vuqaddh_s16(*v0, *v1); }
void VuqaddqS8(int8x16_t* r, int8x16_t* v0, uint8x16_t* v1) { *r = vuqaddq_s8(*v0, *v1); }
void VuqaddqS16(int16x8_t* r, int16x8_t* v0, uint16x8_t* v1) { *r = vuqaddq_s16(*v0, *v1); }
void VuqaddqS32(int32x4_t* r, int32x4_t* v0, uint32x4_t* v1) { *r = vuqaddq_s32(*v0, *v1); }
void VuqaddqS64(int64x2_t* r, int64x2_t* v0, uint64x2_t* v1) { *r = vuqaddq_s64(*v0, *v1); }
void VuqaddsS32(int32_t* r, int32_t* v0, uint32_t* v1) { *r = vuqadds_s32(*v0, *v1); }
void VusdotS32(int32x2_t* r, int32x2_t* v0, uint8x8_t* v1, int8x8_t* v2) { *r = vusdot_s32(*v0, *v1, *v2); }
void VusdotqS32(int32x4_t* r, int32x4_t* v0, uint8x16_t* v1, int8x16_t* v2) { *r = vusdotq_s32(*v0, *v1, *v2); }
void VusmmlaqS32(int32x4_t* r, int32x4_t* v0, uint8x16_t* v1, int8x16_t* v2) { *r = vusmmlaq_s32(*v0, *v1, *v2); }
void VuzpS8(int8x8x2_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vuzp_s8(*v0, *v1); }
void VuzpS16(int16x4x2_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vuzp_s16(*v0, *v1); }
void VuzpS32(int32x2x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vuzp_s32(*v0, *v1); }
void VuzpU8(uint8x8x2_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vuzp_u8(*v0, *v1); }
void VuzpU16(uint16x4x2_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = vuzp_u16(*v0, *v1); }
void VuzpU32(uint32x2x2_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = vuzp_u32(*v0, *v1); }
void VuzpF32(float32x2x2_t* r, float32x2_t* v0, float32x2_t* v1) { *r = vuzp_f32(*v0, *v1); }
void Vuzp1S8(int8x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vuzp1_s8(*v0, *v1); }
void Vuzp1S16(int16x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vuzp1_s16(*v0, *v1); }
void Vuzp1S32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vuzp1_s32(*v0, *v1); }
void Vuzp1U8(uint8x8_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vuzp1_u8(*v0, *v1); }
void Vuzp1U16(uint16x4_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = vuzp1_u16(*v0, *v1); }
void Vuzp1U32(uint32x2_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = vuzp1_u32(*v0, *v1); }
void Vuzp1F32(float32x2_t* r, float32x2_t* v0, float32x2_t* v1) { *r = vuzp1_f32(*v0, *v1); }
void Vuzp1P16(poly16x4_t* r, poly16x4_t* v0, poly16x4_t* v1) { *r = vuzp1_p16(*v0, *v1); }
void Vuzp1P8(poly8x8_t* r, poly8x8_t* v0, poly8x8_t* v1) { *r = vuzp1_p8(*v0, *v1); }
void Vuzp1QS8(int8x16_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vuzp1q_s8(*v0, *v1); }
void Vuzp1QS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vuzp1q_s16(*v0, *v1); }
void Vuzp1QS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vuzp1q_s32(*v0, *v1); }
void Vuzp1QS64(int64x2_t* r, int64x2_t* v0, int64x2_t* v1) { *r = vuzp1q_s64(*v0, *v1); }
void Vuzp1QU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vuzp1q_u8(*v0, *v1); }
void Vuzp1QU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vuzp1q_u16(*v0, *v1); }
void Vuzp1QU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vuzp1q_u32(*v0, *v1); }
void Vuzp1QU64(uint64x2_t* r, uint64x2_t* v0, uint64x2_t* v1) { *r = vuzp1q_u64(*v0, *v1); }
void Vuzp1QF32(float32x4_t* r, float32x4_t* v0, float32x4_t* v1) { *r = vuzp1q_f32(*v0, *v1); }
void Vuzp1QF64(float64x2_t* r, float64x2_t* v0, float64x2_t* v1) { *r = vuzp1q_f64(*v0, *v1); }
void Vuzp1QP16(poly16x8_t* r, poly16x8_t* v0, poly16x8_t* v1) { *r = vuzp1q_p16(*v0, *v1); }
void Vuzp1QP64(poly64x2_t* r, poly64x2_t* v0, poly64x2_t* v1) { *r = vuzp1q_p64(*v0, *v1); }
void Vuzp1QP8(poly8x16_t* r, poly8x16_t* v0, poly8x16_t* v1) { *r = vuzp1q_p8(*v0, *v1); }
void Vuzp2S8(int8x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vuzp2_s8(*v0, *v1); }
void Vuzp2S16(int16x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vuzp2_s16(*v0, *v1); }
void Vuzp2S32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vuzp2_s32(*v0, *v1); }
void Vuzp2U8(uint8x8_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vuzp2_u8(*v0, *v1); }
void Vuzp2U16(uint16x4_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = vuzp2_u16(*v0, *v1); }
void Vuzp2U32(uint32x2_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = vuzp2_u32(*v0, *v1); }
void Vuzp2F32(float32x2_t* r, float32x2_t* v0, float32x2_t* v1) { *r = vuzp2_f32(*v0, *v1); }
void Vuzp2P16(poly16x4_t* r, poly16x4_t* v0, poly16x4_t* v1) { *r = vuzp2_p16(*v0, *v1); }
void Vuzp2P8(poly8x8_t* r, poly8x8_t* v0, poly8x8_t* v1) { *r = vuzp2_p8(*v0, *v1); }
void Vuzp2QS8(int8x16_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vuzp2q_s8(*v0, *v1); }
void Vuzp2QS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vuzp2q_s16(*v0, *v1); }
void Vuzp2QS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vuzp2q_s32(*v0, *v1); }
void Vuzp2QS64(int64x2_t* r, int64x2_t* v0, int64x2_t* v1) { *r = vuzp2q_s64(*v0, *v1); }
void Vuzp2QU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vuzp2q_u8(*v0, *v1); }
void Vuzp2QU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vuzp2q_u16(*v0, *v1); }
void Vuzp2QU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vuzp2q_u32(*v0, *v1); }
void Vuzp2QU64(uint64x2_t* r, uint64x2_t* v0, uint64x2_t* v1) { *r = vuzp2q_u64(*v0, *v1); }
void Vuzp2QF32(float32x4_t* r, float32x4_t* v0, float32x4_t* v1) { *r = vuzp2q_f32(*v0, *v1); }
void Vuzp2QF64(float64x2_t* r, float64x2_t* v0, float64x2_t* v1) { *r = vuzp2q_f64(*v0, *v1); }
void Vuzp2QP16(poly16x8_t* r, poly16x8_t* v0, poly16x8_t* v1) { *r = vuzp2q_p16(*v0, *v1); }
void Vuzp2QP64(poly64x2_t* r, poly64x2_t* v0, poly64x2_t* v1) { *r = vuzp2q_p64(*v0, *v1); }
void Vuzp2QP8(poly8x16_t* r, poly8x16_t* v0, poly8x16_t* v1) { *r = vuzp2q_p8(*v0, *v1); }
void VuzpP16(poly16x4x2_t* r, poly16x4_t* v0, poly16x4_t* v1) { *r = vuzp_p16(*v0, *v1); }
void VuzpP8(poly8x8x2_t* r, poly8x8_t* v0, poly8x8_t* v1) { *r = vuzp_p8(*v0, *v1); }
void VuzpqS8(int8x16x2_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vuzpq_s8(*v0, *v1); }
void VuzpqS16(int16x8x2_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vuzpq_s16(*v0, *v1); }
void VuzpqS32(int32x4x2_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vuzpq_s32(*v0, *v1); }
void VuzpqU8(uint8x16x2_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vuzpq_u8(*v0, *v1); }
void VuzpqU16(uint16x8x2_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vuzpq_u16(*v0, *v1); }
void VuzpqU32(uint32x4x2_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vuzpq_u32(*v0, *v1); }
void VuzpqF32(float32x4x2_t* r, float32x4_t* v0, float32x4_t* v1) { *r = vuzpq_f32(*v0, *v1); }
void VuzpqP16(poly16x8x2_t* r, poly16x8_t* v0, poly16x8_t* v1) { *r = vuzpq_p16(*v0, *v1); }
void VuzpqP8(poly8x16x2_t* r, poly8x16_t* v0, poly8x16_t* v1) { *r = vuzpq_p8(*v0, *v1); }
void VzipS8(int8x8x2_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vzip_s8(*v0, *v1); }
void VzipS16(int16x4x2_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vzip_s16(*v0, *v1); }
void VzipS32(int32x2x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vzip_s32(*v0, *v1); }
void VzipU8(uint8x8x2_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vzip_u8(*v0, *v1); }
void VzipU16(uint16x4x2_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = vzip_u16(*v0, *v1); }
void VzipU32(uint32x2x2_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = vzip_u32(*v0, *v1); }
void VzipF32(float32x2x2_t* r, float32x2_t* v0, float32x2_t* v1) { *r = vzip_f32(*v0, *v1); }
void Vzip1S8(int8x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vzip1_s8(*v0, *v1); }
void Vzip1S16(int16x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vzip1_s16(*v0, *v1); }
void Vzip1S32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vzip1_s32(*v0, *v1); }
void Vzip1U8(uint8x8_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vzip1_u8(*v0, *v1); }
void Vzip1U16(uint16x4_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = vzip1_u16(*v0, *v1); }
void Vzip1U32(uint32x2_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = vzip1_u32(*v0, *v1); }
void Vzip1F32(float32x2_t* r, float32x2_t* v0, float32x2_t* v1) { *r = vzip1_f32(*v0, *v1); }
void Vzip1P16(poly16x4_t* r, poly16x4_t* v0, poly16x4_t* v1) { *r = vzip1_p16(*v0, *v1); }
void Vzip1P8(poly8x8_t* r, poly8x8_t* v0, poly8x8_t* v1) { *r = vzip1_p8(*v0, *v1); }
void Vzip1QS8(int8x16_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vzip1q_s8(*v0, *v1); }
void Vzip1QS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vzip1q_s16(*v0, *v1); }
void Vzip1QS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vzip1q_s32(*v0, *v1); }
void Vzip1QS64(int64x2_t* r, int64x2_t* v0, int64x2_t* v1) { *r = vzip1q_s64(*v0, *v1); }
void Vzip1QU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vzip1q_u8(*v0, *v1); }
void Vzip1QU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vzip1q_u16(*v0, *v1); }
void Vzip1QU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vzip1q_u32(*v0, *v1); }
void Vzip1QU64(uint64x2_t* r, uint64x2_t* v0, uint64x2_t* v1) { *r = vzip1q_u64(*v0, *v1); }
void Vzip1QF32(float32x4_t* r, float32x4_t* v0, float32x4_t* v1) { *r = vzip1q_f32(*v0, *v1); }
void Vzip1QF64(float64x2_t* r, float64x2_t* v0, float64x2_t* v1) { *r = vzip1q_f64(*v0, *v1); }
void Vzip1QP16(poly16x8_t* r, poly16x8_t* v0, poly16x8_t* v1) { *r = vzip1q_p16(*v0, *v1); }
void Vzip1QP64(poly64x2_t* r, poly64x2_t* v0, poly64x2_t* v1) { *r = vzip1q_p64(*v0, *v1); }
void Vzip1QP8(poly8x16_t* r, poly8x16_t* v0, poly8x16_t* v1) { *r = vzip1q_p8(*v0, *v1); }
void Vzip2S8(int8x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vzip2_s8(*v0, *v1); }
void Vzip2S16(int16x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vzip2_s16(*v0, *v1); }
void Vzip2S32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vzip2_s32(*v0, *v1); }
void Vzip2U8(uint8x8_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vzip2_u8(*v0, *v1); }
void Vzip2U16(uint16x4_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = vzip2_u16(*v0, *v1); }
void Vzip2U32(uint32x2_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = vzip2_u32(*v0, *v1); }
void Vzip2F32(float32x2_t* r, float32x2_t* v0, float32x2_t* v1) { *r = vzip2_f32(*v0, *v1); }
void Vzip2P16(poly16x4_t* r, poly16x4_t* v0, poly16x4_t* v1) { *r = vzip2_p16(*v0, *v1); }
void Vzip2P8(poly8x8_t* r, poly8x8_t* v0, poly8x8_t* v1) { *r = vzip2_p8(*v0, *v1); }
void Vzip2QS8(int8x16_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vzip2q_s8(*v0, *v1); }
void Vzip2QS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vzip2q_s16(*v0, *v1); }
void Vzip2QS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vzip2q_s32(*v0, *v1); }
void Vzip2QS64(int64x2_t* r, int64x2_t* v0, int64x2_t* v1) { *r = vzip2q_s64(*v0, *v1); }
void Vzip2QU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vzip2q_u8(*v0, *v1); }
void Vzip2QU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vzip2q_u16(*v0, *v1); }
void Vzip2QU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vzip2q_u32(*v0, *v1); }
void Vzip2QU64(uint64x2_t* r, uint64x2_t* v0, uint64x2_t* v1) { *r = vzip2q_u64(*v0, *v1); }
void Vzip2QF32(float32x4_t* r, float32x4_t* v0, float32x4_t* v1) { *r = vzip2q_f32(*v0, *v1); }
void Vzip2QF64(float64x2_t* r, float64x2_t* v0, float64x2_t* v1) { *r = vzip2q_f64(*v0, *v1); }
void Vzip2QP16(poly16x8_t* r, poly16x8_t* v0, poly16x8_t* v1) { *r = vzip2q_p16(*v0, *v1); }
void Vzip2QP64(poly64x2_t* r, poly64x2_t* v0, poly64x2_t* v1) { *r = vzip2q_p64(*v0, *v1); }
void Vzip2QP8(poly8x16_t* r, poly8x16_t* v0, poly8x16_t* v1) { *r = vzip2q_p8(*v0, *v1); }
void VzipP16(poly16x4x2_t* r, poly16x4_t* v0, poly16x4_t* v1) { *r = vzip_p16(*v0, *v1); }
void VzipP8(poly8x8x2_t* r, poly8x8_t* v0, poly8x8_t* v1) { *r = vzip_p8(*v0, *v1); }
void VzipqS8(int8x16x2_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vzipq_s8(*v0, *v1); }
void VzipqS16(int16x8x2_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vzipq_s16(*v0, *v1); }
void VzipqS32(int32x4x2_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vzipq_s32(*v0, *v1); }
void VzipqU8(uint8x16x2_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vzipq_u8(*v0, *v1); }
void VzipqU16(uint16x8x2_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vzipq_u16(*v0, *v1); }
void VzipqU32(uint32x4x2_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vzipq_u32(*v0, *v1); }
void VzipqF32(float32x4x2_t* r, float32x4_t* v0, float32x4_t* v1) { *r = vzipq_f32(*v0, *v1); }
void VzipqP16(poly16x8x2_t* r, poly16x8_t* v0, poly16x8_t* v1) { *r = vzipq_p16(*v0, *v1); }
void VzipqP8(poly8x16x2_t* r, poly8x16_t* v0, poly8x16_t* v1) { *r = vzipq_p8(*v0, *v1); }


================================================
FILE: arm/neon/functions.go
================================================
package neon

import (
	"github.com/alivanz/go-simd/arm"
)

/*
#include <arm_neon.h>
*/
import "C"

// Signed Absolute difference and Accumulate. This instruction subtracts the elements of the vector of the second source SIMD&FP register from the corresponding elements of the first source SIMD&FP register, and accumulates the absolute values of the results into the elements of the vector of the destination SIMD&FP register.
//
//go:linkname VabaS8 VabaS8
//go:noescape
func VabaS8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8, v2 *arm.Int8X8)

// Signed Absolute difference and Accumulate. This instruction subtracts the elements of the vector of the second source SIMD&FP register from the corresponding elements of the first source SIMD&FP register, and accumulates the absolute values of the results into the elements of the vector of the destination SIMD&FP register.
//
//go:linkname VabaS16 VabaS16
//go:noescape
func VabaS16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4, v2 *arm.Int16X4)

// Signed Absolute difference and Accumulate. This instruction subtracts the elements of the vector of the second source SIMD&FP register from the corresponding elements of the first source SIMD&FP register, and accumulates the absolute values of the results into the elements of the vector of the destination SIMD&FP register.
//
//go:linkname VabaS32 VabaS32
//go:noescape
func VabaS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2, v2 *arm.Int32X2)

// Unsigned Absolute difference and Accumulate. This instruction subtracts the elements of the vector of the second source SIMD&FP register from the corresponding elements of the first source SIMD&FP register, and accumulates the absolute values of the results into the elements of the vector of the destination SIMD&FP register.
//
//go:linkname VabaU8 VabaU8
//go:noescape
func VabaU8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8, v2 *arm.Uint8X8)

// Unsigned Absolute difference and Accumulate. This instruction subtracts the elements of the vector of the second source SIMD&FP register from the corresponding elements of the first source SIMD&FP register, and accumulates the absolute values of the results into the elements of the vector of the destination SIMD&FP register.
//
//go:linkname VabaU16 VabaU16
//go:noescape
func VabaU16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4, v2 *arm.Uint16X4)

// Unsigned Absolute difference and Accumulate. This instruction subtracts the elements of the vector of the second source SIMD&FP register from the corresponding elements of the first source SIMD&FP register, and accumulates the absolute values of the results into the elements of the vector of the destination SIMD&FP register.
//
//go:linkname VabaU32 VabaU32
//go:noescape
func VabaU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2, v2 *arm.Uint32X2)

// Signed Absolute difference and Accumulate Long. This instruction subtracts the vector elements in the lower or upper half of the second source SIMD&FP register from the corresponding vector elements of the first source SIMD&FP register, and accumulates the absolute values of the results into the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements.
//
//go:linkname VabalS8 VabalS8
//go:noescape
func VabalS8(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int8X8, v2 *arm.Int8X8)

// Signed Absolute difference and Accumulate Long. This instruction subtracts the vector elements in the lower or upper half of the second source SIMD&FP register from the corresponding vector elements of the first source SIMD&FP register, and accumulates the absolute values of the results into the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements.
//
//go:linkname VabalS16 VabalS16
//go:noescape
func VabalS16(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int16X4, v2 *arm.Int16X4)

// Signed Absolute difference and Accumulate Long. This instruction subtracts the vector elements in the lower or upper half of the second source SIMD&FP register from the corresponding vector elements of the first source SIMD&FP register, and accumulates the absolute values of the results into the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements.
//
//go:linkname VabalS32 VabalS32
//go:noescape
func VabalS32(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int32X2, v2 *arm.Int32X2)

// Unsigned Absolute difference and Accumulate Long. This instruction subtracts the vector elements in the lower or upper half of the second source SIMD&FP register from the corresponding vector elements of the first source SIMD&FP register, and accumulates the absolute values of the results into the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements. All the values in this instruction are unsigned integer values.
//
//go:linkname VabalU8 VabalU8
//go:noescape
func VabalU8(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint8X8, v2 *arm.Uint8X8)

// Unsigned Absolute difference and Accumulate Long. This instruction subtracts the vector elements in the lower or upper half of the second source SIMD&FP register from the corresponding vector elements of the first source SIMD&FP register, and accumulates the absolute values of the results into the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements. All the values in this instruction are unsigned integer values.
//
//go:linkname VabalU16 VabalU16
//go:noescape
func VabalU16(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint16X4, v2 *arm.Uint16X4)

// Unsigned Absolute difference and Accumulate Long. This instruction subtracts the vector elements in the lower or upper half of the second source SIMD&FP register from the corresponding vector elements of the first source SIMD&FP register, and accumulates the absolute values of the results into the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements. All the values in this instruction are unsigned integer values.
//
//go:linkname VabalU32 VabalU32
//go:noescape
func VabalU32(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint32X2, v2 *arm.Uint32X2)

// Signed Absolute difference and Accumulate Long. This instruction subtracts the vector elements in the lower or upper half of the second source SIMD&FP register from the corresponding vector elements of the first source SIMD&FP register, and accumulates the absolute values of the results into the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements.
//
//go:linkname VabalHighS8 VabalHighS8
//go:noescape
func VabalHighS8(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int8X16, v2 *arm.Int8X16)

// Signed Absolute difference and Accumulate Long. This instruction subtracts the vector elements in the lower or upper half of the second source SIMD&FP register from the corresponding vector elements of the first source SIMD&FP register, and accumulates the absolute values of the results into the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements.
//
//go:linkname VabalHighS16 VabalHighS16
//go:noescape
func VabalHighS16(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int16X8, v2 *arm.Int16X8)

// Signed Absolute difference and Accumulate Long. This instruction subtracts the vector elements in the lower or upper half of the second source SIMD&FP register from the corresponding vector elements of the first source SIMD&FP register, and accumulates the absolute values of the results into the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements.
//
//go:linkname VabalHighS32 VabalHighS32
//go:noescape
func VabalHighS32(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int32X4, v2 *arm.Int32X4)

// Unsigned Absolute difference and Accumulate Long. This instruction subtracts the vector elements in the lower or upper half of the second source SIMD&FP register from the corresponding vector elements of the first source SIMD&FP register, and accumulates the absolute values of the results into the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements. All the values in this instruction are unsigned integer values.
//
//go:linkname VabalHighU8 VabalHighU8
//go:noescape
func VabalHighU8(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint8X16, v2 *arm.Uint8X16)

// Unsigned Absolute difference and Accumulate Long. This instruction subtracts the vector elements in the lower or upper half of the second source SIMD&FP register from the corresponding vector elements of the first source SIMD&FP register, and accumulates the absolute values of the results into the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements. All the values in this instruction are unsigned integer values.
//
//go:linkname VabalHighU16 VabalHighU16
//go:noescape
func VabalHighU16(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint16X8, v2 *arm.Uint16X8)

// Unsigned Absolute difference and Accumulate Long. This instruction subtracts the vector elements in the lower or upper half of the second source SIMD&FP register from the corresponding vector elements of the first source SIMD&FP register, and accumulates the absolute values of the results into the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements. All the values in this instruction are unsigned integer values.
//
//go:linkname VabalHighU32 VabalHighU32
//go:noescape
func VabalHighU32(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint32X4, v2 *arm.Uint32X4)

// Signed Absolute difference and Accumulate. This instruction subtracts the elements of the vector of the second source SIMD&FP register from the corresponding elements of the first source SIMD&FP register, and accumulates the absolute values of the results into the elements of the vector of the destination SIMD&FP register.
//
//go:linkname VabaqS8 VabaqS8
//go:noescape
func VabaqS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16, v2 *arm.Int8X16)

// Signed Absolute difference and Accumulate. This instruction subtracts the elements of the vector of the second source SIMD&FP register from the corresponding elements of the first source SIMD&FP register, and accumulates the absolute values of the results into the elements of the vector of the destination SIMD&FP register.
//
//go:linkname VabaqS16 VabaqS16
//go:noescape
func VabaqS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8, v2 *arm.Int16X8)

// Signed Absolute difference and Accumulate. This instruction subtracts the elements of the vector of the second source SIMD&FP register from the corresponding elements of the first source SIMD&FP register, and accumulates the absolute values of the results into the elements of the vector of the destination SIMD&FP register.
//
//go:linkname VabaqS32 VabaqS32
//go:noescape
func VabaqS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4, v2 *arm.Int32X4)

// Unsigned Absolute difference and Accumulate. This instruction subtracts the elements of the vector of the second source SIMD&FP register from the corresponding elements of the first source SIMD&FP register, and accumulates the absolute values of the results into the elements of the vector of the destination SIMD&FP register.
//
//go:linkname VabaqU8 VabaqU8
//go:noescape
func VabaqU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16, v2 *arm.Uint8X16)

// Unsigned Absolute difference and Accumulate. This instruction subtracts the elements of the vector of the second source SIMD&FP register from the corresponding elements of the first source SIMD&FP register, and accumulates the absolute values of the results into the elements of the vector of the destination SIMD&FP register.
//
//go:linkname VabaqU16 VabaqU16
//go:noescape
func VabaqU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8, v2 *arm.Uint16X8)

// Unsigned Absolute difference and Accumulate. This instruction subtracts the elements of the vector of the second source SIMD&FP register from the corresponding elements of the first source SIMD&FP register, and accumulates the absolute values of the results into the elements of the vector of the destination SIMD&FP register.
//
//go:linkname VabaqU32 VabaqU32
//go:noescape
func VabaqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4, v2 *arm.Uint32X4)

// Signed Absolute Difference. This instruction subtracts the elements of the vector of the second source SIMD&FP register from the corresponding elements of the first source SIMD&FP register, places the absolute values of the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VabdS8 VabdS8
//go:noescape
func VabdS8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8)

// Signed Absolute Difference. This instruction subtracts the elements of the vector of the second source SIMD&FP register from the corresponding elements of the first source SIMD&FP register, places the absolute values of the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VabdS16 VabdS16
//go:noescape
func VabdS16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4)

// Signed Absolute Difference. This instruction subtracts the elements of the vector of the second source SIMD&FP register from the corresponding elements of the first source SIMD&FP register, places the absolute values of the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VabdS32 VabdS32
//go:noescape
func VabdS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2)

// Unsigned Absolute Difference (vector). This instruction subtracts the elements of the vector of the second source SIMD&FP register from the corresponding elements of the first source SIMD&FP register, places the absolute values of the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VabdU8 VabdU8
//go:noescape
func VabdU8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8)

// Unsigned Absolute Difference (vector). This instruction subtracts the elements of the vector of the second source SIMD&FP register from the corresponding elements of the first source SIMD&FP register, places the absolute values of the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VabdU16 VabdU16
//go:noescape
func VabdU16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4)

// Unsigned Absolute Difference (vector). This instruction subtracts the elements of the vector of the second source SIMD&FP register from the corresponding elements of the first source SIMD&FP register, places the absolute values of the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VabdU32 VabdU32
//go:noescape
func VabdU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2)

// Floating-point Absolute Difference (vector). This instruction subtracts the floating-point values in the elements of the second source SIMD&FP register, from the corresponding floating-point values in the elements of the first source SIMD&FP register, places the absolute value of each result in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VabdF32 VabdF32
//go:noescape
func VabdF32(r *arm.Float32X2, v0 *arm.Float32X2, v1 *arm.Float32X2)

// Floating-point Absolute Difference (vector). This instruction subtracts the floating-point values in the elements of the second source SIMD&FP register, from the corresponding floating-point values in the elements of the first source SIMD&FP register, places the absolute value of each result in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VabdF64 VabdF64
//go:noescape
func VabdF64(r *arm.Float64X1, v0 *arm.Float64X1, v1 *arm.Float64X1)

// Floating-point Absolute Difference (vector). This instruction subtracts the floating-point values in the elements of the second source SIMD&FP register, from the corresponding floating-point values in the elements of the first source SIMD&FP register, places the absolute value of each result in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VabddF64 VabddF64
//go:noescape
func VabddF64(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64)

// Signed Absolute Difference Long. This instruction subtracts the vector elements of the second source SIMD&FP register from the corresponding vector elements of the first source SIMD&FP register, places the absolute value of the results into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements.
//
//go:linkname VabdlS8 VabdlS8
//go:noescape
func VabdlS8(r *arm.Int16X8, v0 *arm.Int8X8, v1 *arm.Int8X8)

// Signed Absolute Difference Long. This instruction subtracts the vector elements of the second source SIMD&FP register from the corresponding vector elements of the first source SIMD&FP register, places the absolute value of the results into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements.
//
//go:linkname VabdlS16 VabdlS16
//go:noescape
func VabdlS16(r *arm.Int32X4, v0 *arm.Int16X4, v1 *arm.Int16X4)

// Signed Absolute Difference Long. This instruction subtracts the vector elements of the second source SIMD&FP register from the corresponding vector elements of the first source SIMD&FP register, places the absolute value of the results into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements.
//
//go:linkname VabdlS32 VabdlS32
//go:noescape
func VabdlS32(r *arm.Int64X2, v0 *arm.Int32X2, v1 *arm.Int32X2)

// Unsigned Absolute Difference Long. This instruction subtracts the vector elements in the lower or upper half of the second source SIMD&FP register from the corresponding vector elements of the first source SIMD&FP register, places the absolute value of the result into a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements. All the values in this instruction are unsigned integer values.
//
//go:linkname VabdlU8 VabdlU8
//go:noescape
func VabdlU8(r *arm.Uint16X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8)

// Unsigned Absolute Difference Long. This instruction subtracts the vector elements in the lower or upper half of the second source SIMD&FP register from the corresponding vector elements of the first source SIMD&FP register, places the absolute value of the result into a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements. All the values in this instruction are unsigned integer values.
//
//go:linkname VabdlU16 VabdlU16
//go:noescape
func VabdlU16(r *arm.Uint32X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4)

// Unsigned Absolute Difference Long. This instruction subtracts the vector elements in the lower or upper half of the second source SIMD&FP register from the corresponding vector elements of the first source SIMD&FP register, places the absolute value of the result into a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements. All the values in this instruction are unsigned integer values.
//
//go:linkname VabdlU32 VabdlU32
//go:noescape
func VabdlU32(r *arm.Uint64X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2)

// Signed Absolute Difference Long. This instruction subtracts the vector elements of the second source SIMD&FP register from the corresponding vector elements of the first source SIMD&FP register, places the absolute value of the results into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements.
//
//go:linkname VabdlHighS8 VabdlHighS8
//go:noescape
func VabdlHighS8(r *arm.Int16X8, v0 *arm.Int8X16, v1 *arm.Int8X16)

// Signed Absolute Difference Long. This instruction subtracts the vector elements of the second source SIMD&FP register from the corresponding vector elements of the first source SIMD&FP register, places the absolute value of the results into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements.
//
//go:linkname VabdlHighS16 VabdlHighS16
//go:noescape
func VabdlHighS16(r *arm.Int32X4, v0 *arm.Int16X8, v1 *arm.Int16X8)

// Signed Absolute Difference Long. This instruction subtracts the vector elements of the second source SIMD&FP register from the corresponding vector elements of the first source SIMD&FP register, places the absolute value of the results into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements.
//
//go:linkname VabdlHighS32 VabdlHighS32
//go:noescape
func VabdlHighS32(r *arm.Int64X2, v0 *arm.Int32X4, v1 *arm.Int32X4)

// Unsigned Absolute Difference Long. This instruction subtracts the vector elements in the lower or upper half of the second source SIMD&FP register from the corresponding vector elements of the first source SIMD&FP register, places the absolute value of the result into a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements. All the values in this instruction are unsigned integer values.
//
//go:linkname VabdlHighU8 VabdlHighU8
//go:noescape
func VabdlHighU8(r *arm.Uint16X8, v0 *arm.Uint8X16, v1 *arm.Uint8X16)

// Unsigned Absolute Difference Long. This instruction subtracts the vector elements in the lower or upper half of the second source SIMD&FP register from the corresponding vector elements of the first source SIMD&FP register, places the absolute value of the result into a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements. All the values in this instruction are unsigned integer values.
//
//go:linkname VabdlHighU16 VabdlHighU16
//go:noescape
func VabdlHighU16(r *arm.Uint32X4, v0 *arm.Uint16X8, v1 *arm.Uint16X8)

// Unsigned Absolute Difference Long. This instruction subtracts the vector elements in the lower or upper half of the second source SIMD&FP register from the corresponding vector elements of the first source SIMD&FP register, places the absolute value of the result into a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements. All the values in this instruction are unsigned integer values.
//
//go:linkname VabdlHighU32 VabdlHighU32
//go:noescape
func VabdlHighU32(r *arm.Uint64X2, v0 *arm.Uint32X4, v1 *arm.Uint32X4)

// Signed Absolute Difference. This instruction subtracts the elements of the vector of the second source SIMD&FP register from the corresponding elements of the first source SIMD&FP register, places the absolute values of the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VabdqS8 VabdqS8
//go:noescape
func VabdqS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16)

// Signed Absolute Difference. This instruction subtracts the elements of the vector of the second source SIMD&FP register from the corresponding elements of the first source SIMD&FP register, places the absolute values of the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VabdqS16 VabdqS16
//go:noescape
func VabdqS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8)

// Signed Absolute Difference. This instruction subtracts the elements of the vector of the second source SIMD&FP register from the corresponding elements of the first source SIMD&FP register, places the absolute values of the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VabdqS32 VabdqS32
//go:noescape
func VabdqS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4)

// Unsigned Absolute Difference (vector). This instruction subtracts the elements of the vector of the second source SIMD&FP register from the corresponding elements of the first source SIMD&FP register, places the absolute values of the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VabdqU8 VabdqU8
//go:noescape
func VabdqU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16)

// Unsigned Absolute Difference (vector). This instruction subtracts the elements of the vector of the second source SIMD&FP register from the corresponding elements of the first source SIMD&FP register, places the absolute values of the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VabdqU16 VabdqU16
//go:noescape
func VabdqU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8)

// Unsigned Absolute Difference (vector). This instruction subtracts the elements of the vector of the second source SIMD&FP register from the corresponding elements of the first source SIMD&FP register, places the absolute values of the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VabdqU32 VabdqU32
//go:noescape
func VabdqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4)

// Floating-point Absolute Difference (vector). This instruction subtracts the floating-point values in the elements of the second source SIMD&FP register, from the corresponding floating-point values in the elements of the first source SIMD&FP register, places the absolute value of each result in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VabdqF32 VabdqF32
//go:noescape
func VabdqF32(r *arm.Float32X4, v0 *arm.Float32X4, v1 *arm.Float32X4)

// Floating-point Absolute Difference (vector). This instruction subtracts the floating-point values in the elements of the second source SIMD&FP register, from the corresponding floating-point values in the elements of the first source SIMD&FP register, places the absolute value of each result in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VabdqF64 VabdqF64
//go:noescape
func VabdqF64(r *arm.Float64X2, v0 *arm.Float64X2, v1 *arm.Float64X2)

// Floating-point Absolute Difference (vector). This instruction subtracts the floating-point values in the elements of the second source SIMD&FP register, from the corresponding floating-point values in the elements of the first source SIMD&FP register, places the absolute value of each result in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VabdsF32 VabdsF32
//go:noescape
func VabdsF32(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32)

// Absolute value (vector). This instruction calculates the absolute value of each vector element in the source SIMD&FP register, puts the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VabsS8 VabsS8
//go:noescape
func VabsS8(r *arm.Int8X8, v0 *arm.Int8X8)

// Absolute value (vector). This instruction calculates the absolute value of each vector element in the source SIMD&FP register, puts the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VabsS16 VabsS16
//go:noescape
func VabsS16(r *arm.Int16X4, v0 *arm.Int16X4)

// Absolute value (vector). This instruction calculates the absolute value of each vector element in the source SIMD&FP register, puts the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VabsS32 VabsS32
//go:noescape
func VabsS32(r *arm.Int32X2, v0 *arm.Int32X2)

// Absolute value (vector). This instruction calculates the absolute value of each vector element in the source SIMD&FP register, puts the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VabsS64 VabsS64
//go:noescape
func VabsS64(r *arm.Int64X1, v0 *arm.Int64X1)

// Floating-point Absolute value (vector). This instruction calculates the absolute value of each vector element in the source SIMD&FP register, writes the result to a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VabsF32 VabsF32
//go:noescape
func VabsF32(r *arm.Float32X2, v0 *arm.Float32X2)

// Floating-point Absolute value (vector). This instruction calculates the absolute value of each vector element in the source SIMD&FP register, writes the result to a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VabsF64 VabsF64
//go:noescape
func VabsF64(r *arm.Float64X1, v0 *arm.Float64X1)

// Absolute value (vector). This instruction calculates the absolute value of each vector element in the source SIMD&FP register, puts the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VabsdS64 VabsdS64
//go:noescape
func VabsdS64(r *arm.Int64, v0 *arm.Int64)

// Absolute value (vector). This instruction calculates the absolute value of each vector element in the source SIMD&FP register, puts the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VabsqS8 VabsqS8
//go:noescape
func VabsqS8(r *arm.Int8X16, v0 *arm.Int8X16)

// Absolute value (vector). This instruction calculates the absolute value of each vector element in the source SIMD&FP register, puts the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VabsqS16 VabsqS16
//go:noescape
func VabsqS16(r *arm.Int16X8, v0 *arm.Int16X8)

// Absolute value (vector). This instruction calculates the absolute value of each vector element in the source SIMD&FP register, puts the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VabsqS32 VabsqS32
//go:noescape
func VabsqS32(r *arm.Int32X4, v0 *arm.Int32X4)

// Absolute value (vector). This instruction calculates the absolute value of each vector element in the source SIMD&FP register, puts the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VabsqS64 VabsqS64
//go:noescape
func VabsqS64(r *arm.Int64X2, v0 *arm.Int64X2)

// Floating-point Absolute value (vector). This instruction calculates the absolute value of each vector element in the source SIMD&FP register, writes the result to a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VabsqF32 VabsqF32
//go:noescape
func VabsqF32(r *arm.Float32X4, v0 *arm.Float32X4)

// Floating-point Absolute value (vector). This instruction calculates the absolute value of each vector element in the source SIMD&FP register, writes the result to a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VabsqF64 VabsqF64
//go:noescape
func VabsqF64(r *arm.Float64X2, v0 *arm.Float64X2)

// Add (vector). This instruction adds corresponding elements in the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VaddS8 VaddS8
//go:noescape
func VaddS8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8)

// Add (vector). This instruction adds corresponding elements in the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VaddS16 VaddS16
//go:noescape
func VaddS16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4)

// Add (vector). This instruction adds corresponding elements in the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VaddS32 VaddS32
//go:noescape
func VaddS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2)

// Add (vector). This instruction adds corresponding elements in the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VaddS64 VaddS64
//go:noescape
func VaddS64(r *arm.Int64X1, v0 *arm.Int64X1, v1 *arm.Int64X1)

// Add (vector). This instruction adds corresponding elements in the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VaddU8 VaddU8
//go:noescape
func VaddU8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8)

// Add (vector). This instruction adds corresponding elements in the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VaddU16 VaddU16
//go:noescape
func VaddU16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4)

// Add (vector). This instruction adds corresponding elements in the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VaddU32 VaddU32
//go:noescape
func VaddU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2)

// Add (vector). This instruction adds corresponding elements in the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VaddU64 VaddU64
//go:noescape
func VaddU64(r *arm.Uint64X1, v0 *arm.Uint64X1, v1 *arm.Uint64X1)

// Floating-point Add (vector). This instruction adds corresponding vector elements in the two source SIMD&FP registers, writes the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.
//
//go:linkname VaddF32 VaddF32
//go:noescape
func VaddF32(r *arm.Float32X2, v0 *arm.Float32X2, v1 *arm.Float32X2)

// Floating-point Add (vector). This instruction adds corresponding vector elements in the two source SIMD&FP registers, writes the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.
//
//go:linkname VaddF64 VaddF64
//go:noescape
func VaddF64(r *arm.Float64X1, v0 *arm.Float64X1, v1 *arm.Float64X1)

// Polynomial add, which is carry-less and therefore maps to Bitwise Exclusive OR (vector). This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register.
//
//go:linkname VaddP16 VaddP16
//go:noescape
func VaddP16(r *arm.Poly16X4, v0 *arm.Poly16X4, v1 *arm.Poly16X4)

// Polynomial add, which is carry-less and therefore maps to Bitwise Exclusive OR (vector). This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register.
//
//go:linkname VaddP64 VaddP64
//go:noescape
func VaddP64(r *arm.Poly64X1, v0 *arm.Poly64X1, v1 *arm.Poly64X1)

// Polynomial add, which is carry-less and therefore maps to Bitwise Exclusive OR (vector). This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register.
//
//go:linkname VaddP8 VaddP8
//go:noescape
func VaddP8(r *arm.Poly8X8, v0 *arm.Poly8X8, v1 *arm.Poly8X8)

// Add (vector). This instruction adds corresponding elements in the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VadddS64 VadddS64
//go:noescape
func VadddS64(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64)

// Add (vector). This instruction adds corresponding elements in the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VadddU64 VadddU64
//go:noescape
func VadddU64(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64)

// Add returning High Narrow. This instruction adds each vector element in the first source SIMD&FP register to the corresponding vector element in the second source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register.
//
//go:linkname VaddhnS16 VaddhnS16
//go:noescape
func VaddhnS16(r *arm.Int8X8, v0 *arm.Int16X8, v1 *arm.Int16X8)

// Add returning High Narrow. This instruction adds each vector element in the first source SIMD&FP register to the corresponding vector element in the second source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register.
//
//go:linkname VaddhnS32 VaddhnS32
//go:noescape
func VaddhnS32(r *arm.Int16X4, v0 *arm.Int32X4, v1 *arm.Int32X4)

// Add returning High Narrow. This instruction adds each vector element in the first source SIMD&FP register to the corresponding vector element in the second source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register.
//
//go:linkname VaddhnS64 VaddhnS64
//go:noescape
func VaddhnS64(r *arm.Int32X2, v0 *arm.Int64X2, v1 *arm.Int64X2)

// Add returning High Narrow. This instruction adds each vector element in the first source SIMD&FP register to the corresponding vector element in the second source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register.
//
//go:linkname VaddhnU16 VaddhnU16
//go:noescape
func VaddhnU16(r *arm.Uint8X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8)

// Add returning High Narrow. This instruction adds each vector element in the first source SIMD&FP register to the corresponding vector element in the second source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register.
//
//go:linkname VaddhnU32 VaddhnU32
//go:noescape
func VaddhnU32(r *arm.Uint16X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4)

// Add returning High Narrow. This instruction adds each vector element in the first source SIMD&FP register to the corresponding vector element in the second source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register.
//
//go:linkname VaddhnU64 VaddhnU64
//go:noescape
func VaddhnU64(r *arm.Uint32X2, v0 *arm.Uint64X2, v1 *arm.Uint64X2)

// Add returning High Narrow. This instruction adds each vector element in the first source SIMD&FP register to the corresponding vector element in the second source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register.
//
//go:linkname VaddhnHighS16 VaddhnHighS16
//go:noescape
func VaddhnHighS16(r *arm.Int8X16, v0 *arm.Int8X8, v1 *arm.Int16X8, v2 *arm.Int16X8)

// Add returning High Narrow. This instruction adds each vector element in the first source SIMD&FP register to the corresponding vector element in the second source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register.
//
//go:linkname VaddhnHighS32 VaddhnHighS32
//go:noescape
func VaddhnHighS32(r *arm.Int16X8, v0 *arm.Int16X4, v1 *arm.Int32X4, v2 *arm.Int32X4)

// Add returning High Narrow. This instruction adds each vector element in the first source SIMD&FP register to the corresponding vector element in the second source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register.
//
//go:linkname VaddhnHighS64 VaddhnHighS64
//go:noescape
func VaddhnHighS64(r *arm.Int32X4, v0 *arm.Int32X2, v1 *arm.Int64X2, v2 *arm.Int64X2)

// Add returning High Narrow. This instruction adds each vector element in the first source SIMD&FP register to the corresponding vector element in the second source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register.
//
//go:linkname VaddhnHighU16 VaddhnHighU16
//go:noescape
func VaddhnHighU16(r *arm.Uint8X16, v0 *arm.Uint8X8, v1 *arm.Uint16X8, v2 *arm.Uint16X8)

// Add returning High Narrow. This instruction adds each vector element in the first source SIMD&FP register to the corresponding vector element in the second source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register.
//
//go:linkname VaddhnHighU32 VaddhnHighU32
//go:noescape
func VaddhnHighU32(r *arm.Uint16X8, v0 *arm.Uint16X4, v1 *arm.Uint32X4, v2 *arm.Uint32X4)

// Add returning High Narrow. This instruction adds each vector element in the first source SIMD&FP register to the corresponding vector element in the second source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register.
//
//go:linkname VaddhnHighU64 VaddhnHighU64
//go:noescape
func VaddhnHighU64(r *arm.Uint32X4, v0 *arm.Uint32X2, v1 *arm.Uint64X2, v2 *arm.Uint64X2)

// Signed Add Long (vector). This instruction adds each vector element in the lower or upper half of the first source SIMD&FP register to the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements. All the values in this instruction are signed integer values.
//
//go:linkname VaddlS8 VaddlS8
//go:noescape
func VaddlS8(r *arm.Int16X8, v0 *arm.Int8X8, v1 *arm.Int8X8)

// Signed Add Long (vector). This instruction adds each vector element in the lower or upper half of the first source SIMD&FP register to the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements. All the values in this instruction are signed integer values.
//
//go:linkname VaddlS16 VaddlS16
//go:noescape
func VaddlS16(r *arm.Int32X4, v0 *arm.Int16X4, v1 *arm.Int16X4)

// Signed Add Long (vector). This instruction adds each vector element in the lower or upper half of the first source SIMD&FP register to the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements. All the values in this instruction are signed integer values.
//
//go:linkname VaddlS32 VaddlS32
//go:noescape
func VaddlS32(r *arm.Int64X2, v0 *arm.Int32X2, v1 *arm.Int32X2)

// Unsigned Add Long (vector). This instruction adds each vector element in the lower or upper half of the first source SIMD&FP register to the corresponding vector element of the second source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements. All the values in this instruction are unsigned integer values.
//
//go:linkname VaddlU8 VaddlU8
//go:noescape
func VaddlU8(r *arm.Uint16X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8)

// Unsigned Add Long (vector). This instruction adds each vector element in the lower or upper half of the first source SIMD&FP register to the corresponding vector element of the second source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements. All the values in this instruction are unsigned integer values.
//
//go:linkname VaddlU16 VaddlU16
//go:noescape
func VaddlU16(r *arm.Uint32X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4)

// Unsigned Add Long (vector). This instruction adds each vector element in the lower or upper half of the first source SIMD&FP register to the corresponding vector element of the second source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements. All the values in this instruction are unsigned integer values.
//
//go:linkname VaddlU32 VaddlU32
//go:noescape
func VaddlU32(r *arm.Uint64X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2)

// Signed Add Long (vector). This instruction adds each vector element in the lower or upper half of the first source SIMD&FP register to the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements. All the values in this instruction are signed integer values.
//
//go:linkname VaddlHighS8 VaddlHighS8
//go:noescape
func VaddlHighS8(r *arm.Int16X8, v0 *arm.Int8X16, v1 *arm.Int8X16)

// Signed Add Long (vector), upper half. This instruction adds each vector element in the upper half of the first source SIMD&FP register to the corresponding vector element in the upper half of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements. All the values in this instruction are signed integer values.
//
//go:linkname VaddlHighS16 VaddlHighS16
//go:noescape
func VaddlHighS16(r *arm.Int32X4, v0 *arm.Int16X8, v1 *arm.Int16X8)

// Signed Add Long (vector), upper half. This instruction adds each vector element in the upper half of the first source SIMD&FP register to the corresponding vector element in the upper half of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements. All the values in this instruction are signed integer values.
//
//go:linkname VaddlHighS32 VaddlHighS32
//go:noescape
func VaddlHighS32(r *arm.Int64X2, v0 *arm.Int32X4, v1 *arm.Int32X4)

// Unsigned Add Long (vector), upper half. This instruction adds each vector element in the upper half of the first source SIMD&FP register to the corresponding vector element in the upper half of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements. All the values in this instruction are unsigned integer values.
//
//go:linkname VaddlHighU8 VaddlHighU8
//go:noescape
func VaddlHighU8(r *arm.Uint16X8, v0 *arm.Uint8X16, v1 *arm.Uint8X16)

// Unsigned Add Long (vector), upper half. This instruction adds each vector element in the upper half of the first source SIMD&FP register to the corresponding vector element in the upper half of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements. All the values in this instruction are unsigned integer values.
//
//go:linkname VaddlHighU16 VaddlHighU16
//go:noescape
func VaddlHighU16(r *arm.Uint32X4, v0 *arm.Uint16X8, v1 *arm.Uint16X8)

// Unsigned Add Long (vector), upper half. This instruction adds each vector element in the upper half of the first source SIMD&FP register to the corresponding vector element in the upper half of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements. All the values in this instruction are unsigned integer values.
//
//go:linkname VaddlHighU32 VaddlHighU32
//go:noescape
func VaddlHighU32(r *arm.Uint64X2, v0 *arm.Uint32X4, v1 *arm.Uint32X4)

// Signed Add Long across Vector. This instruction adds every vector element in the source SIMD&FP register together, and writes the scalar result to the destination SIMD&FP register. The destination scalar is twice as long as the source vector elements. All the values in this instruction are signed integer values.
//
//go:linkname VaddlvS8 VaddlvS8
//go:noescape
func VaddlvS8(r *arm.Int16, v0 *arm.Int8X8)

// Signed Add Long across Vector. This instruction adds every vector element in the source SIMD&FP register together, and writes the scalar result to the destination SIMD&FP register. The destination scalar is twice as long as the source vector elements. All the values in this instruction are signed integer values.
//
//go:linkname VaddlvS16 VaddlvS16
//go:noescape
func VaddlvS16(r *arm.Int32, v0 *arm.Int16X4)

// Signed Add Long across Vector. Adds both 32-bit elements of the source vector together and writes the 64-bit scalar result to the destination. For a two-element vector this is the same as Signed Add Long Pairwise, which adds pairs of adjacent signed integer values and produces elements twice as long as the source elements. All the values are signed integer values.
//
//go:linkname VaddlvS32 VaddlvS32
//go:noescape
func VaddlvS32(r *arm.Int64, v0 *arm.Int32X2)

// Unsigned sum Long across Vector. This instruction adds every vector element in the source SIMD&FP register together, and writes the scalar result to the destination SIMD&FP register. The destination scalar is twice as long as the source vector elements. All the values in this instruction are unsigned integer values.
//
//go:linkname VaddlvU8 VaddlvU8
//go:noescape
func VaddlvU8(r *arm.Uint16, v0 *arm.Uint8X8)

// Unsigned sum Long across Vector. This instruction adds every vector element in the source SIMD&FP register together, and writes the scalar result to the destination SIMD&FP register. The destination scalar is twice as long as the source vector elements. All the values in this instruction are unsigned integer values.
//
//go:linkname VaddlvU16 VaddlvU16
//go:noescape
func VaddlvU16(r *arm.Uint32, v0 *arm.Uint16X4)

// Unsigned Add Long across Vector. Adds both 32-bit elements of the source vector together and writes the 64-bit scalar result to the destination. For a two-element vector this is the same as Unsigned Add Long Pairwise, which adds pairs of adjacent unsigned integer values and produces elements twice as long as the source elements. All the values are unsigned integer values.
//
//go:linkname VaddlvU32 VaddlvU32
//go:noescape
func VaddlvU32(r *arm.Uint64, v0 *arm.Uint32X2)

// Signed Add Long across Vector. This instruction adds every vector element in the source SIMD&FP register together, and writes the scalar result to the destination SIMD&FP register. The destination scalar is twice as long as the source vector elements. All the values in this instruction are signed integer values.
//
//go:linkname VaddlvqS8 VaddlvqS8
//go:noescape
func VaddlvqS8(r *arm.Int16, v0 *arm.Int8X16)

// Signed Add Long across Vector. This instruction adds every vector element in the source SIMD&FP register together, and writes the scalar result to the destination SIMD&FP register. The destination scalar is twice as long as the source vector elements. All the values in this instruction are signed integer values.
//
//go:linkname VaddlvqS16 VaddlvqS16
//go:noescape
func VaddlvqS16(r *arm.Int32, v0 *arm.Int16X8)

// Signed Add Long across Vector. This instruction adds every vector element in the source SIMD&FP register together, and writes the scalar result to the destination SIMD&FP register. The destination scalar is twice as long as the source vector elements. All the values in this instruction are signed integer values.
//
//go:linkname VaddlvqS32 VaddlvqS32
//go:noescape
func VaddlvqS32(r *arm.Int64, v0 *arm.Int32X4)

// Unsigned sum Long across Vector. This instruction adds every vector element in the source SIMD&FP register together, and writes the scalar result to the destination SIMD&FP register. The destination scalar is twice as long as the source vector elements. All the values in this instruction are unsigned integer values.
//
//go:linkname VaddlvqU8 VaddlvqU8
//go:noescape
func VaddlvqU8(r *arm.Uint16, v0 *arm.Uint8X16)

// Unsigned sum Long across Vector. This instruction adds every vector element in the source SIMD&FP register together, and writes the scalar result to the destination SIMD&FP register. The destination scalar is twice as long as the source vector elements. All the values in this instruction are unsigned integer values.
//
//go:linkname VaddlvqU16 VaddlvqU16
//go:noescape
func VaddlvqU16(r *arm.Uint32, v0 *arm.Uint16X8)

// Unsigned sum Long across Vector. This instruction adds every vector element in the source SIMD&FP register together, and writes the scalar result to the destination SIMD&FP register. The destination scalar is twice as long as the source vector elements. All the values in this instruction are unsigned integer values.
//
//go:linkname VaddlvqU32 VaddlvqU32
//go:noescape
func VaddlvqU32(r *arm.Uint64, v0 *arm.Uint32X4)

// Add (vector). This instruction adds corresponding elements in the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VaddqS8 VaddqS8
//go:noescape
func VaddqS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16)

// Add (vector). This instruction adds corresponding elements in the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VaddqS16 VaddqS16
//go:noescape
func VaddqS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8)

// Add (vector). This instruction adds corresponding elements in the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VaddqS32 VaddqS32
//go:noescape
func VaddqS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4)

// Add (vector). This instruction adds corresponding elements in the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VaddqS64 VaddqS64
//go:noescape
func VaddqS64(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int64X2)

// Add (vector). This instruction adds corresponding elements in the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VaddqU8 VaddqU8
//go:noescape
func VaddqU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16)

// Add (vector). This instruction adds corresponding elements in the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VaddqU16 VaddqU16
//go:noescape
func VaddqU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8)

// Add (vector). This instruction adds corresponding elements in the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VaddqU32 VaddqU32
//go:noescape
func VaddqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4)

// Add (vector). This instruction adds corresponding elements in the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VaddqU64 VaddqU64
//go:noescape
func VaddqU64(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint64X2)

// Floating-point Add (vector). This instruction adds corresponding vector elements in the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.
//
//go:linkname VaddqF32 VaddqF32
//go:noescape
func VaddqF32(r *arm.Float32X4, v0 *arm.Float32X4, v1 *arm.Float32X4)

// Floating-point Add (vector). This instruction adds corresponding vector elements in the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.
//
//go:linkname VaddqF64 VaddqF64
//go:noescape
func VaddqF64(r *arm.Float64X2, v0 *arm.Float64X2, v1 *arm.Float64X2)

// Polynomial add (vector). Addition of polynomials over GF(2) is carry-less, so it is a bitwise Exclusive OR: this instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register.
//
//go:linkname VaddqP128 VaddqP128
//go:noescape
func VaddqP128(r *arm.Poly128, v0 *arm.Poly128, v1 *arm.Poly128)

// Polynomial add (vector). Addition of polynomials over GF(2) is carry-less, so it is a bitwise Exclusive OR: this instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register.
//
//go:linkname VaddqP16 VaddqP16
//go:noescape
func VaddqP16(r *arm.Poly16X8, v0 *arm.Poly16X8, v1 *arm.Poly16X8)

// Polynomial add (vector). Addition of polynomials over GF(2) is carry-less, so it is a bitwise Exclusive OR: this instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register.
//
//go:linkname VaddqP64 VaddqP64
//go:noescape
func VaddqP64(r *arm.Poly64X2, v0 *arm.Poly64X2, v1 *arm.Poly64X2)

// Polynomial add (vector). Addition of polynomials over GF(2) is carry-less, so it is a bitwise Exclusive OR: this instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register.
//
//go:linkname VaddqP8 VaddqP8
//go:noescape
func VaddqP8(r *arm.Poly8X16, v0 *arm.Poly8X16, v1 *arm.Poly8X16)

// Add across Vector. This instruction adds every vector element in the source SIMD&FP register together, and writes the scalar result to the destination SIMD&FP register.
//
//go:linkname VaddvS8 VaddvS8
//go:noescape
func VaddvS8(r *arm.Int8, v0 *arm.Int8X8)

// Add across Vector. This instruction adds every vector element in the source SIMD&FP register together, and writes the scalar result to the destination SIMD&FP register.
//
//go:linkname VaddvS16 VaddvS16
//go:noescape
func VaddvS16(r *arm.Int16, v0 *arm.Int16X4)

// Add across vector. Adds every element of the source vector together and writes the scalar result to the destination.
//
//go:linkname VaddvS32 VaddvS32
//go:noescape
func VaddvS32(r *arm.Int32, v0 *arm.Int32X2)

// Add across Vector. This instruction adds every vector element in the source SIMD&FP register together, and writes the scalar result to the destination SIMD&FP register.
//
//go:linkname VaddvU8 VaddvU8
//go:noescape
func VaddvU8(r *arm.Uint8, v0 *arm.Uint8X8)

// Add across Vector. This instruction adds every vector element in the source SIMD&FP register together, and writes the scalar result to the destination SIMD&FP register.
//
//go:linkname VaddvU16 VaddvU16
//go:noescape
func VaddvU16(r *arm.Uint16, v0 *arm.Uint16X4)

// Add across vector. Adds every element of the source vector together and writes the scalar result to the destination.
//
//go:linkname VaddvU32 VaddvU32
//go:noescape
func VaddvU32(r *arm.Uint32, v0 *arm.Uint32X2)

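// Floating-point add across vector. Adds every element of the source vector together and writes the scalar result to the destination. All the values are floating-point values.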
//
//go:linkname VaddvF32 VaddvF32
//go:noescape
func VaddvF32(r *arm.Float32, v0 *arm.Float32X2)

// Add across Vector. This instruction adds every vector element in the source SIMD&FP register together, and writes the scalar result to the destination SIMD&FP register.
//
//go:linkname VaddvqS8 VaddvqS8
//go:noescape
func VaddvqS8(r *arm.Int8, v0 *arm.Int8X16)

// Add across Vector. This instruction adds every vector element in the source SIMD&FP register together, and writes the scalar result to the destination SIMD&FP register.
//
//go:linkname VaddvqS16 VaddvqS16
//go:noescape
func VaddvqS16(r *arm.Int16, v0 *arm.Int16X8)

// Add across Vector. This instruction adds every vector element in the source SIMD&FP register together, and writes the scalar result to the destination SIMD&FP register.
//
//go:linkname VaddvqS32 VaddvqS32
//go:noescape
func VaddvqS32(r *arm.Int32, v0 *arm.Int32X4)

// Add across vector. Adds every element of the source vector together and writes the scalar result to the destination.
//
//go:linkname VaddvqS64 VaddvqS64
//go:noescape
func VaddvqS64(r *arm.Int64, v0 *arm.Int64X2)

// Add across Vector. This instruction adds every vector element in the source SIMD&FP register together, and writes the scalar result to the destination SIMD&FP register.
//
//go:linkname VaddvqU8 VaddvqU8
//go:noescape
func VaddvqU8(r *arm.Uint8, v0 *arm.Uint8X16)

// Add across Vector. This instruction adds every vector element in the source SIMD&FP register together, and writes the scalar result to the destination SIMD&FP register.
//
//go:linkname VaddvqU16 VaddvqU16
//go:noescape
func VaddvqU16(r *arm.Uint16, v0 *arm.Uint16X8)

// Add across Vector. This instruction adds every vector element in the source SIMD&FP register together, and writes the scalar result to the destination SIMD&FP register.
//
//go:linkname VaddvqU32 VaddvqU32
//go:noescape
func VaddvqU32(r *arm.Uint32, v0 *arm.Uint32X4)

// Add across vector. Adds every element of the source vector together and writes the scalar result to the destination.
//
//go:linkname VaddvqU64 VaddvqU64
//go:noescape
func VaddvqU64(r *arm.Uint64, v0 *arm.Uint64X2)

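// Floating-point add across vector. Adds every element of the source vector together and writes the scalar result to the destination. All the values are floating-point values.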
//
//go:linkname VaddvqF32 VaddvqF32
//go:noescape
func VaddvqF32(r *arm.Float32, v0 *arm.Float32X4)

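// Floating-point add across vector. Adds every element of the source vector together and writes the scalar result to the destination. All the values are floating-point values.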
//
//go:linkname VaddvqF64 VaddvqF64
//go:noescape
func VaddvqF64(r *arm.Float64, v0 *arm.Float64X2)

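// Signed Add Wide. This instruction adds vector elements of the first source SIMD&FP register to the corresponding vector elements in the lower or upper half of the second source SIMD&FP register, places the results in a vector, and writes the vector to the SIMD&FP destination register. The vector elements of the destination register and the first source register are twice as long as the vector elements of the second source register. All the values in this instruction are signed integer values.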
//
//go:linkname VaddwS8 VaddwS8
//go:noescape
func VaddwS8(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int8X8)

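// Signed Add Wide. This instruction adds vector elements of the first source SIMD&FP register to the corresponding vector elements in the lower or upper half of the second source SIMD&FP register, places the results in a vector, and writes the vector to the SIMD&FP destination register. The vector elements of the destination register and the first source register are twice as long as the vector elements of the second source register. All the values in this instruction are signed integer values.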
//
//go:linkname VaddwS16 VaddwS16
//go:noescape
func VaddwS16(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int16X4)

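// Signed Add Wide. This instruction adds vector elements of the first source SIMD&FP register to the corresponding vector elements in the lower or upper half of the second source SIMD&FP register, places the results in a vector, and writes the vector to the SIMD&FP destination register. The vector elements of the destination register and the first source register are twice as long as the vector elements of the second source register. All the values in this instruction are signed integer values.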
//
//go:linkname VaddwS32 VaddwS32
//go:noescape
func VaddwS32(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int32X2)

// Unsigned Add Wide. This instruction adds the vector elements of the first source SIMD&FP register to the corresponding vector elements in the lower or upper half of the second source SIMD&FP register, places the result in a vector, and writes the vector to the SIMD&FP destination register. The vector elements of the destination register and the first source register are twice as long as the vector elements of the second source register. All the values in this instruction are unsigned integer values.
//
//go:linkname VaddwU8 VaddwU8
//go:noescape
func VaddwU8(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint8X8)

// Unsigned Add Wide. This instruction adds the vector elements of the first source SIMD&FP register to the corresponding vector elements in the lower or upper half of the second source SIMD&FP register, places the result in a vector, and writes the vector to the SIMD&FP destination register. The vector elements of the destination register and the first source register are twice as long as the vector elements of the second source register. All the values in this instruction are unsigned integer values.
//
//go:linkname VaddwU16 VaddwU16
//go:noescape
func VaddwU16(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint16X4)

// Unsigned Add Wide. This instruction adds the vector elements of the first source SIMD&FP register to the corresponding vector elements in the lower or upper half of the second source SIMD&FP register, places the result in a vector, and writes the vector to the SIMD&FP destination register. The vector elements of the destination register and the first source register are twice as long as the vector elements of the second source register. All the values in this instruction are unsigned integer values.
//
//go:linkname VaddwU32 VaddwU32
//go:noescape
func VaddwU32(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint32X2)

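// Signed Add Wide, upper half. This instruction adds vector elements of the first source SIMD&FP register to the corresponding vector elements in the upper half of the second source SIMD&FP register, places the results in a vector, and writes the vector to the SIMD&FP destination register. The vector elements of the destination register and the first source register are twice as long as the vector elements of the second source register. All the values in this instruction are signed integer values.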
//
//go:linkname VaddwHighS8 VaddwHighS8
//go:noescape
func VaddwHighS8(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int8X16)

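// Signed Add Wide, upper half. This instruction adds vector elements of the first source SIMD&FP register to the corresponding vector elements in the upper half of the second source SIMD&FP register, places the results in a vector, and writes the vector to the SIMD&FP destination register. The vector elements of the destination register and the first source register are twice as long as the vector elements of the second source register. All the values in this instruction are signed integer values.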
//
//go:linkname VaddwHighS16 VaddwHighS16
//go:noescape
func VaddwHighS16(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int16X8)

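// Signed Add Wide, upper half. This instruction adds vector elements of the first source SIMD&FP register to the corresponding vector elements in the upper half of the second source SIMD&FP register, places the results in a vector, and writes the vector to the SIMD&FP destination register. The vector elements of the destination register and the first source register are twice as long as the vector elements of the second source register. All the values in this instruction are signed integer values.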
//
//go:linkname VaddwHighS32 VaddwHighS32
//go:noescape
func VaddwHighS32(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int32X4)

// Unsigned Add Wide, upper half. This instruction adds the vector elements of the first source SIMD&FP register to the corresponding vector elements in the upper half of the second source SIMD&FP register, places the results in a vector, and writes the vector to the SIMD&FP destination register. The vector elements of the destination register and the first source register are twice as long as the vector elements of the second source register. All the values in this instruction are unsigned integer values.
//
//go:linkname VaddwHighU8 VaddwHighU8
//go:noescape
func VaddwHighU8(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint8X16)

// Unsigned Add Wide, upper half. This instruction adds the vector elements of the first source SIMD&FP register to the corresponding vector elements in the upper half of the second source SIMD&FP register, places the results in a vector, and writes the vector to the SIMD&FP destination register. The vector elements of the destination register and the first source register are twice as long as the vector elements of the second source register. All the values in this instruction are unsigned integer values.
//
//go:linkname VaddwHighU16 VaddwHighU16
//go:noescape
func VaddwHighU16(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint16X8)

// Unsigned Add Wide, upper half. This instruction adds the vector elements of the first source SIMD&FP register to the corresponding vector elements in the upper half of the second source SIMD&FP register, places the results in a vector, and writes the vector to the SIMD&FP destination register. The vector elements of the destination register and the first source register are twice as long as the vector elements of the second source register. All the values in this instruction are unsigned integer values.
//
//go:linkname VaddwHighU32 VaddwHighU32
//go:noescape
func VaddwHighU32(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint32X4)

// AES single round decryption.
//
//go:linkname VaesdqU8 VaesdqU8
//go:noescape
func VaesdqU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16)

// AES single round encryption.
//
//go:linkname VaeseqU8 VaeseqU8
//go:noescape
func VaeseqU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16)

// AES inverse mix columns.
//
//go:linkname VaesimcqU8 VaesimcqU8
//go:noescape
func VaesimcqU8(r *arm.Uint8X16, v0 *arm.Uint8X16)

// AES mix columns.
//
//go:linkname VaesmcqU8 VaesmcqU8
//go:noescape
func VaesmcqU8(r *arm.Uint8X16, v0 *arm.Uint8X16)

// Bitwise AND (vector). This instruction performs a bitwise AND between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.
//
//go:linkname VandS8 VandS8
//go:noescape
func VandS8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8)

// Bitwise AND (vector). This instruction performs a bitwise AND between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.
//
//go:linkname VandS16 VandS16
//go:noescape
func VandS16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4)

// Bitwise AND (vector). This instruction performs a bitwise AND between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.
//
//go:linkname VandS32 VandS32
//go:noescape
func VandS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2)

// Bitwise AND (vector). This instruction performs a bitwise AND between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.
//
//go:linkname VandS64 VandS64
//go:noescape
func VandS64(r *arm.Int64X1, v0 *arm.Int64X1, v1 *arm.Int64X1)

// Bitwise AND (vector). This instruction performs a bitwise AND between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.
//
//go:linkname VandU8 VandU8
//go:noescape
func VandU8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8)

// Bitwise AND (vector). This instruction performs a bitwise AND between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.
//
//go:linkname VandU16 VandU16
//go:noescape
func VandU16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4)

// Bitwise AND (vector). This instruction performs a bitwise AND between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.
//
//go:linkname VandU32 VandU32
//go:noescape
func VandU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2)

// Bitwise AND (vector). This instruction performs a bitwise AND between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.
//
//go:linkname VandU64 VandU64
//go:noescape
func VandU64(r *arm.Uint64X1, v0 *arm.Uint64X1, v1 *arm.Uint64X1)

// Bitwise AND (vector). This instruction performs a bitwise AND between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.
//
//go:linkname VandqS8 VandqS8
//go:noescape
func VandqS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16)

// Bitwise AND (vector). This instruction performs a bitwise AND between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.
//
//go:linkname VandqS16 VandqS16
//go:noescape
func VandqS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8)

// Bitwise AND (vector). This instruction performs a bitwise AND between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.
//
//go:linkname VandqS32 VandqS32
//go:noescape
func VandqS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4)

// Bitwise AND (vector). This instruction performs a bitwise AND between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.
//
//go:linkname VandqS64 VandqS64
//go:noescape
func VandqS64(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int64X2)

// Bitwise AND (vector). This instruction performs a bitwise AND between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.
//
//go:linkname VandqU8 VandqU8
//go:noescape
func VandqU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16)

// Bitwise AND (vector). This instruction performs a bitwise AND between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.
//
//go:linkname VandqU16 VandqU16
//go:noescape
func VandqU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8)

// Bitwise AND (vector). This instruction performs a bitwise AND between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.
//
//go:linkname VandqU32 VandqU32
//go:noescape
func VandqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4)

// Bitwise AND (vector). This instruction performs a bitwise AND between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.
//
//go:linkname VandqU64 VandqU64
//go:noescape
func VandqU64(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint64X2)

// Bit Clear and Exclusive OR performs a bitwise AND of the 128-bit vector in a source SIMD&FP register and the complement of the vector in another source SIMD&FP register, then performs a bitwise exclusive OR of the resulting vector and the vector in a third source SIMD&FP register, and writes the result to the destination SIMD&FP register.
//
//go:linkname VbcaxqS8 VbcaxqS8
//go:noescape
func VbcaxqS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16, v2 *arm.Int8X16)

// Bit Clear and Exclusive OR performs a bitwise AND of the 128-bit vector in a source SIMD&FP register and the complement of the vector in another source SIMD&FP register, then performs a bitwise exclusive OR of the resulting vector and the vector in a third source SIMD&FP register, and writes the result to the destination SIMD&FP register.
//
//go:linkname VbcaxqS16 VbcaxqS16
//go:noescape
func VbcaxqS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8, v2 *arm.Int16X8)

// Bit Clear and Exclusive OR performs a bitwise AND of the 128-bit vector in a source SIMD&FP register and the complement of the vector in another source SIMD&FP register, then performs a bitwise exclusive OR of the resulting vector and the vector in a third source SIMD&FP register, and writes the result to the destination SIMD&FP register.
//
//go:linkname VbcaxqS32 VbcaxqS32
//go:noescape
func VbcaxqS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4, v2 *arm.Int32X4)

// Bit Clear and Exclusive OR performs a bitwise AND of the 128-bit vector in a source SIMD&FP register and the complement of the vector in another source SIMD&FP register, then performs a bitwise exclusive OR of the resulting vector and the vector in a third source SIMD&FP register, and writes the result to the destination SIMD&FP register.
//
//go:linkname VbcaxqS64 VbcaxqS64
//go:noescape
func VbcaxqS64(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int64X2, v2 *arm.Int64X2)

// Bit Clear and Exclusive OR performs a bitwise AND of the 128-bit vector in a source SIMD&FP register and the complement of the vector in another source SIMD&FP register, then performs a bitwise exclusive OR of the resulting vector and the vector in a third source SIMD&FP register, and writes the result to the destination SIMD&FP register.
//
//go:linkname VbcaxqU8 VbcaxqU8
//go:noescape
func VbcaxqU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16, v2 *arm.Uint8X16)

// Bit Clear and Exclusive OR performs a bitwise AND of the 128-bit vector in a source SIMD&FP register and the complement of the vector in another source SIMD&FP register, then performs a bitwise exclusive OR of the resulting vector and the vector in a third source SIMD&FP register, and writes the result to the destination SIMD&FP register.
//
//go:linkname VbcaxqU16 VbcaxqU16
//go:noescape
func VbcaxqU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8, v2 *arm.Uint16X8)

// Bit Clear and Exclusive OR performs a bitwise AND of the 128-bit vector in a source SIMD&FP register and the complement of the vector in another source SIMD&FP register, then performs a bitwise exclusive OR of the resulting vector and the vector in a third source SIMD&FP register, and writes the result to the destination SIMD&FP register.
//
//go:linkname VbcaxqU32 VbcaxqU32
//go:noescape
func VbcaxqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4, v2 *arm.Uint32X4)

// Bit Clear and Exclusive OR performs a bitwise AND of the 128-bit vector in a source SIMD&FP register and the complement of the vector in another source SIMD&FP register, then performs a bitwise exclusive OR of the resulting vector and the vector in a third source SIMD&FP register, and writes the result to the destination SIMD&FP register.
//
//go:linkname VbcaxqU64 VbcaxqU64
//go:noescape
func VbcaxqU64(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint64X2, v2 *arm.Uint64X2)

// Bitwise bit Clear (vector, register). This instruction performs a bitwise AND between the first source SIMD&FP register and the complement of the second source SIMD&FP register, and writes the result to the destination SIMD&FP register.
//
//go:linkname VbicS8 VbicS8
//go:noescape
func VbicS8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8)

// Bitwise bit Clear (vector, register). This instruction performs a bitwise AND between the first source SIMD&FP register and the complement of the second source SIMD&FP register, and writes the result to the destination SIMD&FP register.
//
//go:linkname VbicS16 VbicS16
//go:noescape
func VbicS16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4)

// Bitwise bit Clear (vector, register). This instruction performs a bitwise AND between the first source SIMD&FP register and the complement of the second source SIMD&FP register, and writes the result to the destination SIMD&FP register.
//
//go:linkname VbicS32 VbicS32
//go:noescape
func VbicS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2)

// Bitwise bit Clear (vector, register). This instruction performs a bitwise AND between the first source SIMD&FP register and the complement of the second source SIMD&FP register, and writes the result to the destination SIMD&FP register.
//
//go:linkname VbicS64 VbicS64
//go:noescape
func VbicS64(r *arm.Int64X1, v0 *arm.Int64X1, v1 *arm.Int64X1)

// Bitwise bit Clear (vector, register). This instruction performs a bitwise AND between the first source SIMD&FP register and the complement of the second source SIMD&FP register, and writes the result to the destination SIMD&FP register.
//
//go:linkname VbicU8 VbicU8
//go:noescape
func VbicU8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8)

// Bitwise bit Clear (vector, register). This instruction performs a bitwise AND between the first source SIMD&FP register and the complement of the second source SIMD&FP register, and writes the result to the destination SIMD&FP register.
//
//go:linkname VbicU16 VbicU16
//go:noescape
func VbicU16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4)

// Bitwise bit Clear (vector, register). This instruction performs a bitwise AND between the first source SIMD&FP register and the complement of the second source SIMD&FP register, and writes the result to the destination SIMD&FP register.
//
//go:linkname VbicU32 VbicU32
//go:noescape
func VbicU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2)

// Bitwise bit Clear (vector, register). This instruction performs a bitwise AND between the first source SIMD&FP register and the complement of the second source SIMD&FP register, and writes the result to the destination SIMD&FP register.
//
//go:linkname VbicU64 VbicU64
//go:noescape
func VbicU64(r *arm.Uint64X1, v0 *arm.Uint64X1, v1 *arm.Uint64X1)

// Bitwise bit Clear (vector, register). This instruction performs a bitwise AND between the first source SIMD&FP register and the complement of the second source SIMD&FP register, and writes the result to the destination SIMD&FP register.
//
//go:linkname VbicqS8 VbicqS8
//go:noescape
func VbicqS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16)

// Bitwise bit Clear (vector, register). This instruction performs a bitwise AND between the first source SIMD&FP register and the complement of the second source SIMD&FP register, and writes the result to the destination SIMD&FP register.
//
//go:linkname VbicqS16 VbicqS16
//go:noescape
func VbicqS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8)

// Bitwise bit Clear (vector, register). This instruction performs a bitwise AND between the first source SIMD&FP register and the complement of the second source SIMD&FP register, and writes the result to the destination SIMD&FP register.
//
//go:linkname VbicqS32 VbicqS32
//go:noescape
func VbicqS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4)

// Bitwise bit Clear (vector, register). This instruction performs a bitwise AND between the first source SIMD&FP register and the complement of the second source SIMD&FP register, and writes the result to the destination SIMD&FP register.
//
//go:linkname VbicqS64 VbicqS64
//go:noescape
func VbicqS64(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int64X2)

// Bitwise bit Clear (vector, register). This instruction performs a bitwise AND between the first source SIMD&FP register and the complement of the second source SIMD&FP register, and writes the result to the destination SIMD&FP register.
//
//go:linkname VbicqU8 VbicqU8
//go:noescape
func VbicqU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16)

// Bitwise bit Clear (vector, register). This instruction performs a bitwise AND between the first source SIMD&FP register and the complement of the second source SIMD&FP register, and writes the result to the destination SIMD&FP register.
//
//go:linkname VbicqU16 VbicqU16
//go:noescape
func VbicqU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8)

// Bitwise bit Clear (vector, register). This instruction performs a bitwise AND between the first source SIMD&FP register and the complement of the second source SIMD&FP register, and writes the result to the destination SIMD&FP register.
//
//go:linkname VbicqU32 VbicqU32
//go:noescape
func VbicqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4)

// Bitwise bit Clear (vector, register). This instruction performs a bitwise AND between the first source SIMD&FP register and the complement of the second source SIMD&FP register, and writes the result to the destination SIMD&FP register.
//
//go:linkname VbicqU64 VbicqU64
//go:noescape
func VbicqU64(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint64X2)
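
/*
The Vbic* family computes "first source AND NOT second source", which is what
Go's &^ operator does on scalars. The sketch below is a pure-Go reference
model for the 8x8-bit case. bicRef is a hypothetical name, and the mapping of
v0 and v1 to the first and second source is an assumption about the wrappers.

```go
package main

import "fmt"

// bicRef keeps, lane by lane, the bits of a that are not set in b.
func bicRef(a, b [8]uint8) (r [8]uint8) {
	for i := range a {
		r[i] = a[i] &^ b[i]
	}
	return r
}

func main() {
	a := [8]uint8{0xff, 0xff, 0xf0, 0x0f, 0xaa, 0x55, 0x00, 0x81}
	b := [8]uint8{0x0f, 0xff, 0x30, 0x00, 0x0a, 0x55, 0xff, 0x01}
	fmt.Println(bicRef(a, b))
}
```

A model like this can serve as the expected value when testing VbicU8 on
hardware.
*/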

// Bitwise Select. This instruction sets each bit in the destination SIMD&FP register to the corresponding bit from the first source SIMD&FP register when the original destination bit was 1, otherwise from the second source SIMD&FP register.
//
//go:linkname VbslS8 VbslS8
//go:noescape
func VbslS8(r *arm.Int8X8, v0 *arm.Uint8X8, v1 *arm.Int8X8, v2 *arm.Int8X8)

// Bitwise Select. This instruction sets each bit in the destination SIMD&FP register to the corresponding bit from the first source SIMD&FP register when the original destination bit was 1, otherwise from the second source SIMD&FP register.
//
//go:linkname VbslS16 VbslS16
//go:noescape
func VbslS16(r *arm.Int16X4, v0 *arm.Uint16X4, v1 *arm.Int16X4, v2 *arm.Int16X4)

// Bitwise Select. This instruction sets each bit in the destination SIMD&FP register to the corresponding bit from the first source SIMD&FP register when the original destination bit was 1, otherwise from the second source SIMD&FP register.
//
//go:linkname VbslS32 VbslS32
//go:noescape
func VbslS32(r *arm.Int32X2, v0 *arm.Uint32X2, v1 *arm.Int32X2, v2 *arm.Int32X2)

// Bitwise Select. This instruction sets each bit in the destination SIMD&FP register to the corresponding bit from the first source SIMD&FP register when the original destination bit was 1, otherwise from the second source SIMD&FP register.
//
//go:linkname VbslS64 VbslS64
//go:noescape
func VbslS64(r *arm.Int64X1, v0 *arm.Uint64X1, v1 *arm.Int64X1, v2 *arm.Int64X1)

// Bitwise Select. This instruction sets each bit in the destination SIMD&FP register to the corresponding bit from the first source SIMD&FP register when the original destination bit was 1, otherwise from the second source SIMD&FP register.
//
//go:linkname VbslU8 VbslU8
//go:noescape
func VbslU8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8, v2 *arm.Uint8X8)

// Bitwise Select. This instruction sets each bit in the destination SIMD&FP register to the corresponding bit from the first source SIMD&FP register when the original destination bit was 1, otherwise from the second source SIMD&FP register.
//
//go:linkname VbslU16 VbslU16
//go:noescape
func VbslU16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4, v2 *arm.Uint16X4)

// Bitwise Select. This instruction sets each bit in the destination SIMD&FP register to the corresponding bit from the first source SIMD&FP register when the original destination bit was 1, otherwise from the second source SIMD&FP register.
//
//go:linkname VbslU32 VbslU32
//go:noescape
func VbslU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2, v2 *arm.Uint32X2)

// Bitwise Select. This instruction sets each bit in the destination SIMD&FP register to the corresponding bit from the first source SIMD&FP register when the original destination bit was 1, otherwise from the second source SIMD&FP register.
//
//go:linkname VbslU64 VbslU64
//go:noescape
func VbslU64(r *arm.Uint64X1, v0 *arm.Uint64X1, v1 *arm.Uint64X1, v2 *arm.Uint64X1)

// Bitwise Select. This instruction sets each bit in the destination SIMD&FP register to the corresponding bit from the first source SIMD&FP register when the original destination bit was 1, otherwise from the second source SIMD&FP register.
//
//go:linkname VbslF32 VbslF32
//go:noescape
func VbslF32(r *arm.Float32X2, v0 *arm.Uint32X2, v1 *arm.Float32X2, v2 *arm.Float32X2)

// Bitwise Select. This instruction sets each bit in the destination SIMD&FP register to the corresponding bit from the first source SIMD&FP register when the original destination bit was 1, otherwise from the second source SIMD&FP register.
//
//go:linkname VbslF64 VbslF64
//go:noescape
func VbslF64(r *arm.Float64X1, v0 *arm.Uint64X1, v1 *arm.Float64X1, v2 *arm.Float64X1)

// Bitwise Select. This instruction sets each bit in the destination SIMD&FP register to the corresponding bit from the first source SIMD&FP register when the original destination bit was 1, otherwise from the second source SIMD&FP register.
//
//go:linkname VbslP16 VbslP16
//go:noescape
func VbslP16(r *arm.Poly16X4, v0 *arm.Uint16X4, v1 *arm.Poly16X4, v2 *arm.Poly16X4)

// Bitwise Select. This instruction sets each bit in the destination SIMD&FP register to the corresponding bit from the first source SIMD&FP register when the original destination bit was 1, otherwise from the second source SIMD&FP register.
//
//go:linkname VbslP64 VbslP64
//go:noescape
func VbslP64(r *arm.Poly64X1, v0 *arm.Uint64X1, v1 *arm.Poly64X1, v2 *arm.Poly64X1)

// Bitwise Select. This instruction sets each bit in the destination SIMD&FP register to the corresponding bit from the first source SIMD&FP register when the original destination bit was 1, otherwise from the second source SIMD&FP register.
//
//go:linkname VbslP8 VbslP8
//go:noescape
func VbslP8(r *arm.Poly8X8, v0 *arm.Uint8X8, v1 *arm.Poly8X8, v2 *arm.Poly8X8)

// Bitwise Select. This instruction sets each bit in the destination SIMD&FP register to the corresponding bit from the first source SIMD&FP register when the original destination bit was 1, otherwise from the second source SIMD&FP register.
//
//go:linkname VbslqS8 VbslqS8
//go:noescape
func VbslqS8(r *arm.Int8X16, v0 *arm.Uint8X16, v1 *arm.Int8X16, v2 *arm.Int8X16)

// Bitwise Select. This instruction sets each bit in the destination SIMD&FP register to the corresponding bit from the first source SIMD&FP register when the original destination bit was 1, otherwise from the second source SIMD&FP register.
//
//go:linkname VbslqS16 VbslqS16
//go:noescape
func VbslqS16(r *arm.Int16X8, v0 *arm.Uint16X8, v1 *arm.Int16X8, v2 *arm.Int16X8)

// Bitwise Select. This instruction sets each bit in the destination SIMD&FP register to the corresponding bit from the first source SIMD&FP register when the original destination bit was 1, otherwise from the second source SIMD&FP register.
//
//go:linkname VbslqS32 VbslqS32
//go:noescape
func VbslqS32(r *arm.Int32X4, v0 *arm.Uint32X4, v1 *arm.Int32X4, v2 *arm.Int32X4)

// Bitwise Select. This instruction sets each bit in the destination SIMD&FP register to the corresponding bit from the first source SIMD&FP register when the original destination bit was 1, otherwise from the second source SIMD&FP register.
//
//go:linkname VbslqS64 VbslqS64
//go:noescape
func VbslqS64(r *arm.Int64X2, v0 *arm.Uint64X2, v1 *arm.Int64X2, v2 *arm.Int64X2)

// Bitwise Select. This instruction sets each bit in the destination SIMD&FP register to the corresponding bit from the first source SIMD&FP register when the original destination bit was 1, otherwise from the second source SIMD&FP register.
//
//go:linkname VbslqU8 VbslqU8
//go:noescape
func VbslqU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16, v2 *arm.Uint8X16)

// Bitwise Select. This instruction sets each bit in the destination SIMD&FP register to the corresponding bit from the first source SIMD&FP register when the original destination bit was 1, otherwise from the second source SIMD&FP register.
//
//go:linkname VbslqU16 VbslqU16
//go:noescape
func VbslqU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8, v2 *arm.Uint16X8)

// Bitwise Select. This instruction sets each bit in the destination SIMD&FP register to the corresponding bit from the first source SIMD&FP register when the original destination bit was 1, otherwise from the second source SIMD&FP register.
//
//go:linkname VbslqU32 VbslqU32
//go:noescape
func VbslqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4, v2 *arm.Uint32X4)

// Bitwise Select. This instruction sets each bit in the destination SIMD&FP register to the corresponding bit from the first source SIMD&FP register when the original destination bit was 1, otherwise from the second source SIMD&FP register.
//
//go:linkname VbslqU64 VbslqU64
//go:noescape
func VbslqU64(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint64X2, v2 *arm.Uint64X2)

// Bitwise Select. This instruction sets each bit in the destination SIMD&FP register to the corresponding bit from the first source SIMD&FP register when the original destination bit was 1, otherwise from the second source SIMD&FP register.
//
//go:linkname VbslqF32 VbslqF32
//go:noescape
func VbslqF32(r *arm.Float32X4, v0 *arm.Uint32X4, v1 *arm.Float32X4, v2 *arm.Float32X4)

// Bitwise Select. This instruction sets each bit in the destination SIMD&FP register to the corresponding bit from the first source SIMD&FP register when the original destination bit was 1, otherwise from the second source SIMD&FP register.
//
//go:linkname VbslqF64 VbslqF64
//go:noescape
func VbslqF64(r *arm.Float64X2, v0 *arm.Uint64X2, v1 *arm.Float64X2, v2 *arm.Float64X2)

// Bitwise Select. This instruction sets each bit in the destination SIMD&FP register to the corresponding bit from the first source SIMD&FP register when the original destination bit was 1, otherwise from the second source SIMD&FP register.
//
//go:linkname VbslqP16 VbslqP16
//go:noescape
func VbslqP16(r *arm.Poly16X8, v0 *arm.Uint16X8, v1 *arm.Poly16X8, v2 *arm.Poly16X8)

// Bitwise Select. This instruction sets each bit in the destination SIMD&FP register to the corresponding bit from the first source SIMD&FP register when the original destination bit was 1, otherwise from the second source SIMD&FP register.
//
//go:linkname VbslqP64 VbslqP64
//go:noescape
func VbslqP64(r *arm.Poly64X2, v0 *arm.Uint64X2, v1 *arm.Poly64X2, v2 *arm.Poly64X2)

// Bitwise Select. This instruction sets each bit in the destination SIMD&FP register to the corresponding bit from the first source SIMD&FP register when the original destination bit was 1, otherwise from the second source SIMD&FP register.
//
//go:linkname VbslqP8 VbslqP8
//go:noescape
func VbslqP8(r *arm.Poly8X16, v0 *arm.Uint8X16, v1 *arm.Poly8X16, v2 *arm.Poly8X16)
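
/*
In the Vbsl* wrappers the selection mask appears to be passed as v0 (always an
unsigned vector of the same lane width), and r only receives the result. That
reading is an assumption based on the signatures above. Under it, the
per-bit behaviour can be sketched in plain Go; bslRef is a hypothetical helper.

```go
package main

import "fmt"

// bslRef takes each bit from a where mask is 1 and from b where mask is 0.
func bslRef(mask, a, b uint8) uint8 {
	return (mask & a) | (^mask & b)
}

func main() {
	fmt.Printf("%08b\n", bslRef(0b11110000, 0b10101010, 0b01010101))
}
```

Because the selection is bitwise, the float and polynomial variants operate on
the raw bit pattern of each lane rather than on its numeric value.
*/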

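// Floating-point Complex Add. Each pair of adjacent elements is treated as a complex number (real part in the lower element, imaginary part in the higher element); the value from the second source is rotated counterclockwise by 270 degrees and added to the value from the first source.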
//
//go:linkname VcaddRot270F32 VcaddRot270F32
//go:noescape
func VcaddRot270F32(r *arm.Float32X2, v0 *arm.Float32X2, v1 *arm.Float32X2)

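// Floating-point Complex Add. Each pair of adjacent elements is treated as a complex number (real part in the lower element, imaginary part in the higher element); the value from the second source is rotated counterclockwise by 90 degrees and added to the value from the first source.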
//
//go:linkname VcaddRot90F32 VcaddRot90F32
//go:noescape
func VcaddRot90F32(r *arm.Float32X2, v0 *arm.Float32X2, v1 *arm.Float32X2)

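// Floating-point Complex Add. Each pair of adjacent elements is treated as a complex number (real part in the lower element, imaginary part in the higher element); the value from the second source is rotated counterclockwise by 270 degrees and added to the value from the first source.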
//
//go:linkname VcaddqRot270F32 VcaddqRot270F32
//go:noescape
func VcaddqRot270F32(r *arm.Float32X4, v0 *arm.Float32X4, v1 *arm.Float32X4)

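// Floating-point Complex Add. Each pair of adjacent elements is treated as a complex number (real part in the lower element, imaginary part in the higher element); the value from the second source is rotated counterclockwise by 270 degrees and added to the value from the first source.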
//
//go:linkname VcaddqRot270F64 VcaddqRot270F64
//go:noescape
func VcaddqRot270F64(r *arm.Float64X2, v0 *arm.Float64X2, v1 *arm.Float64X2)

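// Floating-point Complex Add. Each pair of adjacent elements is treated as a complex number (real part in the lower element, imaginary part in the higher element); the value from the second source is rotated counterclockwise by 90 degrees and added to the value from the first source.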
//
//go:linkname VcaddqRot90F32 VcaddqRot90F32
//go:noescape
func VcaddqRot90F32(r *arm.Float32X4, v0 *arm.Float32X4, v1 *arm.Float32X4)

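// Floating-point Complex Add. Each pair of adjacent elements is treated as a complex number (real part in the lower element, imaginary part in the higher element); the value from the second source is rotated counterclockwise by 90 degrees and added to the value from the first source.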
//
//go:linkname VcaddqRot90F64 VcaddqRot90F64
//go:noescape
func VcaddqRot90F64(r *arm.Float64X2, v0 *arm.Float64X2, v1 *arm.Float64X2)
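
/*
The Rot90 and Rot270 variants differ only in how the second operand is rotated
in the complex plane before the addition. The sketch below models one complex
number (two float32 lanes) in plain Go. It assumes lane 0 holds the real part
and lane 1 the imaginary part, and that v0 and v1 are the first and second
source; caddRot90Ref and caddRot270Ref are hypothetical helpers.

```go
package main

import "fmt"

// caddRot90Ref adds b rotated by 90 degrees (b multiplied by i) to a.
func caddRot90Ref(a, b [2]float32) [2]float32 {
	return [2]float32{a[0] - b[1], a[1] + b[0]}
}

// caddRot270Ref adds b rotated by 270 degrees (b multiplied by -i) to a.
func caddRot270Ref(a, b [2]float32) [2]float32 {
	return [2]float32{a[0] + b[1], a[1] - b[0]}
}

func main() {
	a := [2]float32{1, 2} // 1 + 2i
	b := [2]float32{3, 4} // 3 + 4i
	fmt.Println(caddRot90Ref(a, b), caddRot270Ref(a, b))
}
```

The q variants would apply the same computation to each complex pair in the
128-bit vector.
*/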

// Floating-point Absolute Compare Greater than or Equal (vector). This instruction compares the absolute value of each floating-point value in the first source SIMD&FP register with the absolute value of the corresponding floating-point value in the second source SIMD&FP register and if the first value is greater than or equal to the second value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcageF32 VcageF32
//go:noescape
func VcageF32(r *arm.Uint32X2, v0 *arm.Float32X2, v1 *arm.Float32X2)

// Floating-point Absolute Compare Greater than or Equal (vector). This instruction compares the absolute value of each floating-point value in the first source SIMD&FP register with the absolute value of the corresponding floating-point value in the second source SIMD&FP register and if the first value is greater than or equal to the second value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcageF64 VcageF64
//go:noescape
func VcageF64(r *arm.Uint64X1, v0 *arm.Float64X1, v1 *arm.Float64X1)

// Floating-point Absolute Compare Greater than or Equal (scalar). This instruction compares the absolute value of the first source floating-point value with the absolute value of the second source floating-point value and if the first value is greater than or equal to the second value sets every bit of the destination to one, otherwise sets every bit of the destination to zero.
//
//go:linkname VcagedF64 VcagedF64
//go:noescape
func VcagedF64(r *arm.Uint64, v0 *arm.Float64, v1 *arm.Float64)

// Floating-point Absolute Compare Greater than or Equal (vector). This instruction compares the absolute value of each floating-point value in the first source SIMD&FP register with the absolute value of the corresponding floating-point value in the second source SIMD&FP register and if the first value is greater than or equal to the second value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcageqF32 VcageqF32
//go:noescape
func VcageqF32(r *arm.Uint32X4, v0 *arm.Float32X4, v1 *arm.Float32X4)

// Floating-point Absolute Compare Greater than or Equal (vector). This instruction compares the absolute value of each floating-point value in the first source SIMD&FP register with the absolute value of the corresponding floating-point value in the second source SIMD&FP register and if the first value is greater than or equal to the second value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcageqF64 VcageqF64
//go:noescape
func VcageqF64(r *arm.Uint64X2, v0 *arm.Float64X2, v1 *arm.Float64X2)

// Floating-point Absolute Compare Greater than or Equal (scalar). This instruction compares the absolute value of the first source floating-point value with the absolute value of the second source floating-point value and if the first value is greater than or equal to the second value sets every bit of the destination to one, otherwise sets every bit of the destination to zero.
//
//go:linkname VcagesF32 VcagesF32
//go:noescape
func VcagesF32(r *arm.Uint32, v0 *arm.Float32, v1 *arm.Float32)

// Floating-point Absolute Compare Greater than (vector). This instruction compares the absolute value of each vector element in the first source SIMD&FP register with the absolute value of the corresponding vector element in the second source SIMD&FP register and if the first value is greater than the second value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcagtF32 VcagtF32
//go:noescape
func VcagtF32(r *arm.Uint32X2, v0 *arm.Float32X2, v1 *arm.Float32X2)

// Floating-point Absolute Compare Greater than (vector). This instruction compares the absolute value of each vector element in the first source SIMD&FP register with the absolute value of the corresponding vector element in the second source SIMD&FP register and if the first value is greater than the second value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcagtF64 VcagtF64
//go:noescape
func VcagtF64(r *arm.Uint64X1, v0 *arm.Float64X1, v1 *arm.Float64X1)

// Floating-point Absolute Compare Greater than (scalar). This instruction compares the absolute value of the first source floating-point value with the absolute value of the second source floating-point value and if the first value is greater than the second value sets every bit of the destination to one, otherwise sets every bit of the destination to zero.
//
//go:linkname VcagtdF64 VcagtdF64
//go:noescape
func VcagtdF64(r *arm.Uint64, v0 *arm.Float64, v1 *arm.Float64)

// Floating-point Absolute Compare Greater than (vector). This instruction compares the absolute value of each vector element in the first source SIMD&FP register with the absolute value of the corresponding vector element in the second source SIMD&FP register and if the first value is greater than the second value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcagtqF32 VcagtqF32
//go:noescape
func VcagtqF32(r *arm.Uint32X4, v0 *arm.Float32X4, v1 *arm.Float32X4)

// Floating-point Absolute Compare Greater than (vector). This instruction compares the absolute value of each vector element in the first source SIMD&FP register with the absolute value of the corresponding vector element in the second source SIMD&FP register and if the first value is greater than the second value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcagtqF64 VcagtqF64
//go:noescape
func VcagtqF64(r *arm.Uint64X2, v0 *arm.Float64X2, v1 *arm.Float64X2)

// Floating-point Absolute Compare Greater than (scalar). This instruction compares the absolute value of the first source floating-point value with the absolute value of the second source floating-point value and if the first value is greater than the second value sets every bit of the destination to one, otherwise sets every bit of the destination to zero.
//
//go:linkname VcagtsF32 VcagtsF32
//go:noescape
func VcagtsF32(r *arm.Uint32, v0 *arm.Float32, v1 *arm.Float32)
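
/*
The absolute compares return a mask rather than a boolean: all ones when the
predicate holds, all zeros otherwise. The sketch below models the scalar
float32 forms in plain Go. cageRef and cagtRef are hypothetical helpers, and
treating v0 and v1 as the first and second source is an assumption.

```go
package main

import (
	"fmt"
	"math"
)

// cageRef returns an all-ones mask when |a| >= |b|, otherwise zero.
func cageRef(a, b float32) uint32 {
	if math.Abs(float64(a)) >= math.Abs(float64(b)) {
		return 0xffffffff
	}
	return 0
}

// cagtRef returns an all-ones mask when |a| > |b|, otherwise zero.
func cagtRef(a, b float32) uint32 {
	if math.Abs(float64(a)) > math.Abs(float64(b)) {
		return 0xffffffff
	}
	return 0
}

func main() {
	fmt.Printf("%08x %08x\n", cageRef(-2, 2), cagtRef(-2, 2))
}
```

In this model a NaN operand makes both comparisons false, so the result is
zero.
*/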

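// Floating-point Absolute Compare Less than or Equal (vector). Compares the absolute value of each element in the first source with the absolute value of the corresponding element in the second source and, if the first value is less than or equal to the second value, sets every bit of the corresponding destination element to one, otherwise to zero.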
//
//go:linkname VcaleF32 VcaleF32
//go:noescape
func VcaleF32(r *arm.Uint32X2, v0 *arm.Float32X2, v1 *arm.Float32X2)

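// Floating-point Absolute Compare Less than or Equal (vector). Compares the absolute value of each element in the first source with the absolute value of the corresponding element in the second source and, if the first value is less than or equal to the second value, sets every bit of the corresponding destination element to one, otherwise to zero.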
//
//go:linkname VcaleF64 VcaleF64
//go:noescape
func VcaleF64(r *arm.Uint64X1, v0 *arm.Float64X1, v1 *arm.Float64X1)

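// Floating-point Absolute Compare Less than or Equal (scalar). Compares the absolute value of the first source value with the absolute value of the second source value and, if the first value is less than or equal to the second value, sets every bit of the destination to one, otherwise to zero.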
//
//go:linkname VcaledF64 VcaledF64
//go:noescape
func VcaledF64(r *arm.Uint64, v0 *arm.Float64, v1 *arm.Float64)

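// Floating-point Absolute Compare Less than or Equal (vector). Compares the absolute value of each element in the first source with the absolute value of the corresponding element in the second source and, if the first value is less than or equal to the second value, sets every bit of the corresponding destination element to one, otherwise to zero.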
//
//go:linkname VcaleqF32 VcaleqF32
//go:noescape
func VcaleqF32(r *arm.Uint32X4, v0 *arm.Float32X4, v1 *arm.Float32X4)

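// Floating-point Absolute Compare Less than or Equal (vector). Compares the absolute value of each element in the first source with the absolute value of the corresponding element in the second source and, if the first value is less than or equal to the second value, sets every bit of the corresponding destination element to one, otherwise to zero.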
//
//go:linkname VcaleqF64 VcaleqF64
//go:noescape
func VcaleqF64(r *arm.Uint64X2, v0 *arm.Float64X2, v1 *arm.Float64X2)

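// Floating-point Absolute Compare Less than or Equal (scalar). Compares the absolute value of the first source value with the absolute value of the second source value and, if the first value is less than or equal to the second value, sets every bit of the destination to one, otherwise to zero.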
//
//go:linkname VcalesF32 VcalesF32
//go:noescape
func VcalesF32(r *arm.Uint32, v0 *arm.Float32, v1 *arm.Float32)

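// Floating-point Absolute Compare Less than (vector). Compares the absolute value of each element in the first source with the absolute value of the corresponding element in the second source and, if the first value is less than the second value, sets every bit of the corresponding destination element to one, otherwise to zero.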
//
//go:linkname VcaltF32 VcaltF32
//go:noescape
func VcaltF32(r *arm.Uint32X2, v0 *arm.Float32X2, v1 *arm.Float32X2)

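// Floating-point Absolute Compare Less than (vector). Compares the absolute value of each element in the first source with the absolute value of the corresponding element in the second source and, if the first value is less than the second value, sets every bit of the corresponding destination element to one, otherwise to zero.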
//
//go:linkname VcaltF64 VcaltF64
//go:noescape
func VcaltF64(r *arm.Uint64X1, v0 *arm.Float64X1, v1 *arm.Float64X1)

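// Floating-point Absolute Compare Less than (scalar). Compares the absolute value of the first source value with the absolute value of the second source value and, if the first value is less than the second value, sets every bit of the destination to one, otherwise to zero.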
//
//go:linkname VcaltdF64 VcaltdF64
//go:noescape
func VcaltdF64(r *arm.Uint64, v0 *arm.Float64, v1 *arm.Float64)

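// Floating-point Absolute Compare Less than (vector). Compares the absolute value of each element in the first source with the absolute value of the corresponding element in the second source and, if the first value is less than the second value, sets every bit of the corresponding destination element to one, otherwise to zero.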
//
//go:linkname VcaltqF32 VcaltqF32
//go:noescape
func VcaltqF32(r *arm.Uint32X4, v0 *arm.Float32X4, v1 *arm.Float32X4)

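// Floating-point Absolute Compare Less than (vector). Compares the absolute value of each element in the first source with the absolute value of the corresponding element in the second source and, if the first value is less than the second value, sets every bit of the corresponding destination element to one, otherwise to zero.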
//
//go:linkname VcaltqF64 VcaltqF64
//go:noescape
func VcaltqF64(r *arm.Uint64X2, v0 *arm.Float64X2, v1 *arm.Float64X2)

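// Floating-point Absolute Compare Less than (scalar). Compares the absolute value of the first source value with the absolute value of the second source value and, if the first value is less than the second value, sets every bit of the destination to one, otherwise to zero.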
//
//go:linkname VcaltsF32 VcaltsF32
//go:noescape
func VcaltsF32(r *arm.Uint32, v0 *arm.Float32, v1 *arm.Float32)

// Compare bitwise Equal (vector). This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqS8 VceqS8
//go:noescape
func VceqS8(r *arm.Uint8X8, v0 *arm.Int8X8, v1 *arm.Int8X8)

// Compare bitwise Equal (vector). This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqS16 VceqS16
//go:noescape
func VceqS16(r *arm.Uint16X4, v0 *arm.Int16X4, v1 *arm.Int16X4)

// Compare bitwise Equal (vector). This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqS32 VceqS32
//go:noescape
func VceqS32(r *arm.Uint32X2, v0 *arm.Int32X2, v1 *arm.Int32X2)

// Compare bitwise Equal (vector). This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqS64 VceqS64
//go:noescape
func VceqS64(r *arm.Uint64X1, v0 *arm.Int64X1, v1 *arm.Int64X1)

// Compare bitwise Equal (vector). This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqU8 VceqU8
//go:noescape
func VceqU8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8)

// Compare bitwise Equal (vector). This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqU16 VceqU16
//go:noescape
func VceqU16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4)

// Compare bitwise Equal (vector). This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqU32 VceqU32
//go:noescape
func VceqU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2)

// Compare bitwise Equal (vector). This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqU64 VceqU64
//go:noescape
func VceqU64(r *arm.Uint64X1, v0 *arm.Uint64X1, v1 *arm.Uint64X1)

// Floating-point Compare Equal (vector). This instruction compares each floating-point value from the first source SIMD&FP register, with the corresponding floating-point value from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqF32 VceqF32
//go:noescape
func VceqF32(r *arm.Uint32X2, v0 *arm.Float32X2, v1 *arm.Float32X2)

// Floating-point Compare Equal (vector). This instruction compares each floating-point value from the first source SIMD&FP register, with the corresponding floating-point value from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqF64 VceqF64
//go:noescape
func VceqF64(r *arm.Uint64X1, v0 *arm.Float64X1, v1 *arm.Float64X1)

// Compare bitwise Equal (vector). This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqP64 VceqP64
//go:noescape
func VceqP64(r *arm.Uint64X1, v0 *arm.Poly64X1, v1 *arm.Poly64X1)

// Compare bitwise Equal (vector). This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqP8 VceqP8
//go:noescape
func VceqP8(r *arm.Uint8X8, v0 *arm.Poly8X8, v1 *arm.Poly8X8)

// Compare bitwise Equal (scalar). This instruction compares the first source value with the second source value, and if the comparison is equal sets every bit of the destination to one, otherwise sets every bit of the destination to zero.
//
//go:linkname VceqdS64 VceqdS64
//go:noescape
func VceqdS64(r *arm.Uint64, v0 *arm.Int64, v1 *arm.Int64)

// Compare bitwise Equal (vector). This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. This is the scalar form: it operates on a single 64-bit value, and the result written to r is either all ones or zero.
//
//go:linkname VceqdU64 VceqdU64
//go:noescape
func VceqdU64(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64)

// Floating-point Compare Equal (vector). This instruction compares each floating-point value from the first source SIMD&FP register with the corresponding floating-point value from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. This is the scalar form: it operates on a single 64-bit value, and the result written to r is either all ones or zero.
//
//go:linkname VceqdF64 VceqdF64
//go:noescape
func VceqdF64(r *arm.Uint64, v0 *arm.Float64, v1 *arm.Float64)

// Compare bitwise Equal (vector). This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqqS8 VceqqS8
//go:noescape
func VceqqS8(r *arm.Uint8X16, v0 *arm.Int8X16, v1 *arm.Int8X16)

// Compare bitwise Equal (vector). This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqqS16 VceqqS16
//go:noescape
func VceqqS16(r *arm.Uint16X8, v0 *arm.Int16X8, v1 *arm.Int16X8)

// Compare bitwise Equal (vector). This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqqS32 VceqqS32
//go:noescape
func VceqqS32(r *arm.Uint32X4, v0 *arm.Int32X4, v1 *arm.Int32X4)

// Compare bitwise Equal (vector). This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqqS64 VceqqS64
//go:noescape
func VceqqS64(r *arm.Uint64X2, v0 *arm.Int64X2, v1 *arm.Int64X2)

// Compare bitwise Equal (vector). This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqqU8 VceqqU8
//go:noescape
func VceqqU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16)

// Compare bitwise Equal (vector). This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqqU16 VceqqU16
//go:noescape
func VceqqU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8)

// Compare bitwise Equal (vector). This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqqU32 VceqqU32
//go:noescape
func VceqqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4)

// Compare bitwise Equal (vector). This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqqU64 VceqqU64
//go:noescape
func VceqqU64(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint64X2)

// Floating-point Compare Equal (vector). This instruction compares each floating-point value from the first source SIMD&FP register with the corresponding floating-point value from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqqF32 VceqqF32
//go:noescape
func VceqqF32(r *arm.Uint32X4, v0 *arm.Float32X4, v1 *arm.Float32X4)

// Floating-point Compare Equal (vector). This instruction compares each floating-point value from the first source SIMD&FP register with the corresponding floating-point value from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqqF64 VceqqF64
//go:noescape
func VceqqF64(r *arm.Uint64X2, v0 *arm.Float64X2, v1 *arm.Float64X2)

// Compare bitwise Equal (vector). This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqqP64 VceqqP64
//go:noescape
func VceqqP64(r *arm.Uint64X2, v0 *arm.Poly64X2, v1 *arm.Poly64X2)

// Compare bitwise Equal (vector). This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqqP8 VceqqP8
//go:noescape
func VceqqP8(r *arm.Uint8X16, v0 *arm.Poly8X16, v1 *arm.Poly8X16)

// Floating-point Compare Equal (vector). This instruction compares each floating-point value from the first source SIMD&FP register with the corresponding floating-point value from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. This is the scalar form: it operates on a single 32-bit value, and the result written to r is either all ones or zero.
//
//go:linkname VceqsF32 VceqsF32
//go:noescape
func VceqsF32(r *arm.Uint32, v0 *arm.Float32, v1 *arm.Float32)

// Compare bitwise Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqzS8 VceqzS8
//go:noescape
func VceqzS8(r *arm.Uint8X8, v0 *arm.Int8X8)

// Compare bitwise Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqzS16 VceqzS16
//go:noescape
func VceqzS16(r *arm.Uint16X4, v0 *arm.Int16X4)

// Compare bitwise Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqzS32 VceqzS32
//go:noescape
func VceqzS32(r *arm.Uint32X2, v0 *arm.Int32X2)

// Compare bitwise Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqzS64 VceqzS64
//go:noescape
func VceqzS64(r *arm.Uint64X1, v0 *arm.Int64X1)

// Compare bitwise Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqzU8 VceqzU8
//go:noescape
func VceqzU8(r *arm.Uint8X8, v0 *arm.Uint8X8)

// Compare bitwise Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqzU16 VceqzU16
//go:noescape
func VceqzU16(r *arm.Uint16X4, v0 *arm.Uint16X4)

// Compare bitwise Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqzU32 VceqzU32
//go:noescape
func VceqzU32(r *arm.Uint32X2, v0 *arm.Uint32X2)

// Compare bitwise Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqzU64 VceqzU64
//go:noescape
func VceqzU64(r *arm.Uint64X1, v0 *arm.Uint64X1)

// Floating-point Compare Equal to zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqzF32 VceqzF32
//go:noescape
func VceqzF32(r *arm.Uint32X2, v0 *arm.Float32X2)

// Floating-point Compare Equal to zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqzF64 VceqzF64
//go:noescape
func VceqzF64(r *arm.Uint64X1, v0 *arm.Float64X1)

// Compare bitwise Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqzP64 VceqzP64
//go:noescape
func VceqzP64(r *arm.Uint64X1, v0 *arm.Poly64X1)

// Compare bitwise Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqzP8 VceqzP8
//go:noescape
func VceqzP8(r *arm.Uint8X8, v0 *arm.Poly8X8)

// Compare bitwise Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. This is the scalar form: it operates on a single 64-bit value, and the result written to r is either all ones or zero.
//
//go:linkname VceqzdS64 VceqzdS64
//go:noescape
func VceqzdS64(r *arm.Uint64, v0 *arm.Int64)

// Compare bitwise Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. This is the scalar form: it operates on a single 64-bit value, and the result written to r is either all ones or zero.
//
//go:linkname VceqzdU64 VceqzdU64
//go:noescape
func VceqzdU64(r *arm.Uint64, v0 *arm.Uint64)

// Floating-point Compare Equal to zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. This is the scalar form: it operates on a single 64-bit value, and the result written to r is either all ones or zero.
//
//go:linkname VceqzdF64 VceqzdF64
//go:noescape
func VceqzdF64(r *arm.Uint64, v0 *arm.Float64)

// Compare bitwise Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqzqS8 VceqzqS8
//go:noescape
func VceqzqS8(r *arm.Uint8X16, v0 *arm.Int8X16)

// Compare bitwise Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqzqS16 VceqzqS16
//go:noescape
func VceqzqS16(r *arm.Uint16X8, v0 *arm.Int16X8)

// Compare bitwise Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqzqS32 VceqzqS32
//go:noescape
func VceqzqS32(r *arm.Uint32X4, v0 *arm.Int32X4)

// Compare bitwise Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqzqS64 VceqzqS64
//go:noescape
func VceqzqS64(r *arm.Uint64X2, v0 *arm.Int64X2)

// Compare bitwise Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqzqU8 VceqzqU8
//go:noescape
func VceqzqU8(r *arm.Uint8X16, v0 *arm.Uint8X16)

// Compare bitwise Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqzqU16 VceqzqU16
//go:noescape
func VceqzqU16(r *arm.Uint16X8, v0 *arm.Uint16X8)

// Compare bitwise Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqzqU32 VceqzqU32
//go:noescape
func VceqzqU32(r *arm.Uint32X4, v0 *arm.Uint32X4)

// Compare bitwise Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqzqU64 VceqzqU64
//go:noescape
func VceqzqU64(r *arm.Uint64X2, v0 *arm.Uint64X2)

// Floating-point Compare Equal to zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqzqF32 VceqzqF32
//go:noescape
func VceqzqF32(r *arm.Uint32X4, v0 *arm.Float32X4)

// Floating-point Compare Equal to zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqzqF64 VceqzqF64
//go:noescape
func VceqzqF64(r *arm.Uint64X2, v0 *arm.Float64X2)

// Compare bitwise Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqzqP64 VceqzqP64
//go:noescape
func VceqzqP64(r *arm.Uint64X2, v0 *arm.Poly64X2)

// Compare bitwise Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqzqP8 VceqzqP8
//go:noescape
func VceqzqP8(r *arm.Uint8X16, v0 *arm.Poly8X16)

// Floating-point Compare Equal to zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. This is the scalar form: it operates on a single 32-bit value, and the result written to r is either all ones or zero.
//
//go:linkname VceqzsF32 VceqzsF32
//go:noescape
func VceqzsF32(r *arm.Uint32, v0 *arm.Float32)

// Compare signed Greater than or Equal (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first signed integer value is greater than or equal to the second signed integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgeS8 VcgeS8
//go:noescape
func VcgeS8(r *arm.Uint8X8, v0 *arm.Int8X8, v1 *arm.Int8X8)

// Compare signed Greater than or Equal (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first signed integer value is greater than or equal to the second signed integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgeS16 VcgeS16
//go:noescape
func VcgeS16(r *arm.Uint16X4, v0 *arm.Int16X4, v1 *arm.Int16X4)

// Compare signed Greater than or Equal (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first signed integer value is greater than or equal to the second signed integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgeS32 VcgeS32
//go:noescape
func VcgeS32(r *arm.Uint32X2, v0 *arm.Int32X2, v1 *arm.Int32X2)

// Compare signed Greater than or Equal (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first signed integer value is greater than or equal to the second signed integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgeS64 VcgeS64
//go:noescape
func VcgeS64(r *arm.Uint64X1, v0 *arm.Int64X1, v1 *arm.Int64X1)

// Compare unsigned Higher or Same (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first unsigned integer value is greater than or equal to the second unsigned integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgeU8 VcgeU8
//go:noescape
func VcgeU8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8)

// Compare unsigned Higher or Same (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first unsigned integer value is greater than or equal to the second unsigned integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgeU16 VcgeU16
//go:noescape
func VcgeU16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4)

// Compare unsigned Higher or Same (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first unsigned integer value is greater than or equal to the second unsigned integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgeU32 VcgeU32
//go:noescape
func VcgeU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2)

// Compare unsigned Higher or Same (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first unsigned integer value is greater than or equal to the second unsigned integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgeU64 VcgeU64
//go:noescape
func VcgeU64(r *arm.Uint64X1, v0 *arm.Uint64X1, v1 *arm.Uint64X1)

// Floating-point Compare Greater than or Equal (vector). This instruction reads each floating-point value in the first source SIMD&FP register and if the value is greater than or equal to the corresponding floating-point value in the second source SIMD&FP register sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgeF32 VcgeF32
//go:noescape
func VcgeF32(r *arm.Uint32X2, v0 *arm.Float32X2, v1 *arm.Float32X2)

// Floating-point Compare Greater than or Equal (vector). This instruction reads each floating-point value in the first source SIMD&FP register and if the value is greater than or equal to the corresponding floating-point value in the second source SIMD&FP register sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgeF64 VcgeF64
//go:noescape
func VcgeF64(r *arm.Uint64X1, v0 *arm.Float64X1, v1 *arm.Float64X1)

// Compare signed Greater than or Equal (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first signed integer value is greater than or equal to the second signed integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. This is the scalar form: it operates on a single 64-bit value, and the result written to r is either all ones or zero.
//
//go:linkname VcgedS64 VcgedS64
//go:noescape
func VcgedS64(r *arm.Uint64, v0 *arm.Int64, v1 *arm.Int64)

// Compare unsigned Higher or Same (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first unsigned integer value is greater than or equal to the second unsigned integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. This is the scalar form: it operates on a single 64-bit value, and the result written to r is either all ones or zero.
//
//go:linkname VcgedU64 VcgedU64
//go:noescape
func VcgedU64(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64)

// Floating-point Compare Greater than or Equal (vector). This instruction reads each floating-point value in the first source SIMD&FP register and if the value is greater than or equal to the corresponding floating-point value in the second source SIMD&FP register sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. This is the scalar form: it operates on a single 64-bit value, and the result written to r is either all ones or zero.
//
//go:linkname VcgedF64 VcgedF64
//go:noescape
func VcgedF64(r *arm.Uint64, v0 *arm.Float64, v1 *arm.Float64)

// Compare signed Greater than or Equal (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first signed integer value is greater than or equal to the second signed integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgeqS8 VcgeqS8
//go:noescape
func VcgeqS8(r *arm.Uint8X16, v0 *arm.Int8X16, v1 *arm.Int8X16)

// Compare signed Greater than or Equal (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first signed integer value is greater than or equal to the second signed integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgeqS16 VcgeqS16
//go:noescape
func VcgeqS16(r *arm.Uint16X8, v0 *arm.Int16X8, v1 *arm.Int16X8)

// Compare signed Greater than or Equal (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first signed integer value is greater than or equal to the second signed integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgeqS32 VcgeqS32
//go:noescape
func VcgeqS32(r *arm.Uint32X4, v0 *arm.Int32X4, v1 *arm.Int32X4)

// Compare signed Greater than or Equal (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first signed integer value is greater than or equal to the second signed integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgeqS64 VcgeqS64
//go:noescape
func VcgeqS64(r *arm.Uint64X2, v0 *arm.Int64X2, v1 *arm.Int64X2)

// Compare unsigned Higher or Same (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first unsigned integer value is greater than or equal to the second unsigned integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgeqU8 VcgeqU8
//go:noescape
func VcgeqU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16)

// Compare unsigned Higher or Same (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first unsigned integer value is greater than or equal to the second unsigned integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgeqU16 VcgeqU16
//go:noescape
func VcgeqU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8)

// Compare unsigned Higher or Same (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first unsigned integer value is greater than or equal to the second unsigned integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgeqU32 VcgeqU32
//go:noescape
func VcgeqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4)

// Compare unsigned Higher or Same (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first unsigned integer value is greater than or equal to the second unsigned integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgeqU64 VcgeqU64
//go:noescape
func VcgeqU64(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint64X2)

// Floating-point Compare Greater than or Equal (vector). This instruction reads each floating-point value in the first source SIMD&FP register and if the value is greater than or equal to the corresponding floating-point value in the second source SIMD&FP register sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgeqF32 VcgeqF32
//go:noescape
func VcgeqF32(r *arm.Uint32X4, v0 *arm.Float32X4, v1 *arm.Float32X4)

// Floating-point Compare Greater than or Equal (vector). This instruction reads each floating-point value in the first source SIMD&FP register and if the value is greater than or equal to the corresponding floating-point value in the second source SIMD&FP register sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgeqF64 VcgeqF64
//go:noescape
func VcgeqF64(r *arm.Uint64X2, v0 *arm.Float64X2, v1 *arm.Float64X2)

// Floating-point Compare Greater than or Equal (scalar form). This instruction compares the single 32-bit floating-point value in the first source SIMD&FP register with the value in the second source SIMD&FP register and, if the first value is greater than or equal to the second, sets every bit of the destination value to one; otherwise it sets every bit of the destination value to zero.
//
//go:linkname VcgesF32 VcgesF32
//go:noescape
func VcgesF32(r *arm.Uint32, v0 *arm.Float32, v1 *arm.Float32)

// Compare signed Greater than or Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is greater than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgezS8 VcgezS8
//go:noescape
func VcgezS8(r *arm.Uint8X8, v0 *arm.Int8X8)

// Compare signed Greater than or Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is greater than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgezS16 VcgezS16
//go:noescape
func VcgezS16(r *arm.Uint16X4, v0 *arm.Int16X4)

// Compare signed Greater than or Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is greater than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgezS32 VcgezS32
//go:noescape
func VcgezS32(r *arm.Uint32X2, v0 *arm.Int32X2)

// Compare signed Greater than or Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is greater than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgezS64 VcgezS64
//go:noescape
func VcgezS64(r *arm.Uint64X1, v0 *arm.Int64X1)

// Floating-point Compare Greater than or Equal to zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is greater than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgezF32 VcgezF32
//go:noescape
func VcgezF32(r *arm.Uint32X2, v0 *arm.Float32X2)

// Floating-point Compare Greater than or Equal to zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is greater than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgezF64 VcgezF64
//go:noescape
func VcgezF64(r *arm.Uint64X1, v0 *arm.Float64X1)

// Compare signed Greater than or Equal to zero (scalar form). This instruction reads the single 64-bit signed integer value in the source SIMD&FP register and, if the value is greater than or equal to zero, sets every bit of the destination value to one; otherwise it sets every bit of the destination value to zero.
//
//go:linkname VcgezdS64 VcgezdS64
//go:noescape
func VcgezdS64(r *arm.Uint64, v0 *arm.Int64)

// Floating-point Compare Greater than or Equal to zero (scalar form). This instruction reads the single 64-bit floating-point value in the source SIMD&FP register and, if the value is greater than or equal to zero, sets every bit of the destination value to one; otherwise it sets every bit of the destination value to zero.
//
//go:linkname VcgezdF64 VcgezdF64
//go:noescape
func VcgezdF64(r *arm.Uint64, v0 *arm.Float64)

// Compare signed Greater than or Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is greater than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgezqS8 VcgezqS8
//go:noescape
func VcgezqS8(r *arm.Uint8X16, v0 *arm.Int8X16)

// Compare signed Greater than or Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is greater than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgezqS16 VcgezqS16
//go:noescape
func VcgezqS16(r *arm.Uint16X8, v0 *arm.Int16X8)

// Compare signed Greater than or Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is greater than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgezqS32 VcgezqS32
//go:noescape
func VcgezqS32(r *arm.Uint32X4, v0 *arm.Int32X4)

// Compare signed Greater than or Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is greater than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgezqS64 VcgezqS64
//go:noescape
func VcgezqS64(r *arm.Uint64X2, v0 *arm.Int64X2)

// Floating-point Compare Greater than or Equal to zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is greater than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgezqF32 VcgezqF32
//go:noescape
func VcgezqF32(r *arm.Uint32X4, v0 *arm.Float32X4)

// Floating-point Compare Greater than or Equal to zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is greater than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgezqF64 VcgezqF64
//go:noescape
func VcgezqF64(r *arm.Uint64X2, v0 *arm.Float64X2)

// Floating-point Compare Greater than or Equal to zero (scalar form). This instruction reads the single 32-bit floating-point value in the source SIMD&FP register and, if the value is greater than or equal to zero, sets every bit of the destination value to one; otherwise it sets every bit of the destination value to zero.
//
//go:linkname VcgezsF32 VcgezsF32
//go:noescape
func VcgezsF32(r *arm.Uint32, v0 *arm.Float32)

// Compare signed Greater than (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first signed integer value is greater than the second signed integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgtS8 VcgtS8
//go:noescape
func VcgtS8(r *arm.Uint8X8, v0 *arm.Int8X8, v1 *arm.Int8X8)

// Compare signed Greater than (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first signed integer value is greater than the second signed integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgtS16 VcgtS16
//go:noescape
func VcgtS16(r *arm.Uint16X4, v0 *arm.Int16X4, v1 *arm.Int16X4)

// Compare signed Greater than (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first signed integer value is greater than the second signed integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgtS32 VcgtS32
//go:noescape
func VcgtS32(r *arm.Uint32X2, v0 *arm.Int32X2, v1 *arm.Int32X2)

// Compare signed Greater than (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first signed integer value is greater than the second signed integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgtS64 VcgtS64
//go:noescape
func VcgtS64(r *arm.Uint64X1, v0 *arm.Int64X1, v1 *arm.Int64X1)

// Compare unsigned Higher (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first unsigned integer value is greater than the second unsigned integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgtU8 VcgtU8
//go:noescape
func VcgtU8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8)

// Compare unsigned Higher (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first unsigned integer value is greater than the second unsigned integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgtU16 VcgtU16
//go:noescape
func VcgtU16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4)

// Compare unsigned Higher (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first unsigned integer value is greater than the second unsigned integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgtU32 VcgtU32
//go:noescape
func VcgtU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2)

// Compare unsigned Higher (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first unsigned integer value is greater than the second unsigned integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgtU64 VcgtU64
//go:noescape
func VcgtU64(r *arm.Uint64X1, v0 *arm.Uint64X1, v1 *arm.Uint64X1)

// Floating-point Compare Greater than (vector). This instruction reads each floating-point value in the first source SIMD&FP register and if the value is greater than the corresponding floating-point value in the second source SIMD&FP register sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgtF32 VcgtF32
//go:noescape
func VcgtF32(r *arm.Uint32X2, v0 *arm.Float32X2, v1 *arm.Float32X2)

// Floating-point Compare Greater than (vector). This instruction reads each floating-point value in the first source SIMD&FP register and if the value is greater than the corresponding floating-point value in the second source SIMD&FP register sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgtF64 VcgtF64
//go:noescape
func VcgtF64(r *arm.Uint64X1, v0 *arm.Float64X1, v1 *arm.Float64X1)

// Compare signed Greater than (scalar form). This instruction compares the single 64-bit signed integer value in the first source SIMD&FP register with the value in the second source SIMD&FP register and, if the first value is greater than the second, sets every bit of the destination value to one; otherwise it sets every bit of the destination value to zero.
//
//go:linkname VcgtdS64 VcgtdS64
//go:noescape
func VcgtdS64(r *arm.Uint64, v0 *arm.Int64, v1 *arm.Int64)

// Compare unsigned Higher (scalar form). This instruction compares the single 64-bit unsigned integer value in the first source SIMD&FP register with the value in the second source SIMD&FP register and, if the first value is greater than the second, sets every bit of the destination value to one; otherwise it sets every bit of the destination value to zero.
//
//go:linkname VcgtdU64 VcgtdU64
//go:noescape
func VcgtdU64(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64)

// Floating-point Compare Greater than (scalar form). This instruction compares the single 64-bit floating-point value in the first source SIMD&FP register with the value in the second source SIMD&FP register and, if the first value is greater than the second, sets every bit of the destination value to one; otherwise it sets every bit of the destination value to zero.
//
//go:linkname VcgtdF64 VcgtdF64
//go:noescape
func VcgtdF64(r *arm.Uint64, v0 *arm.Float64, v1 *arm.Float64)

// Compare signed Greater than (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first signed integer value is greater than the second signed integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgtqS8 VcgtqS8
//go:noescape
func VcgtqS8(r *arm.Uint8X16, v0 *arm.Int8X16, v1 *arm.Int8X16)

// Compare signed Greater than (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first signed integer value is greater than the second signed integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgtqS16 VcgtqS16
//go:noescape
func VcgtqS16(r *arm.Uint16X8, v0 *arm.Int16X8, v1 *arm.Int16X8)

// Compare signed Greater than (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first signed integer value is greater than the second signed integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgtqS32 VcgtqS32
//go:noescape
func VcgtqS32(r *arm.Uint32X4, v0 *arm.Int32X4, v1 *arm.Int32X4)

// Compare signed Greater than (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first signed integer value is greater than the second signed integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgtqS64 VcgtqS64
//go:noescape
func VcgtqS64(r *arm.Uint64X2, v0 *arm.Int64X2, v1 *arm.Int64X2)

// Compare unsigned Higher (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first unsigned integer value is greater than the second unsigned integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgtqU8 VcgtqU8
//go:noescape
func VcgtqU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16)

// Compare unsigned Higher (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first unsigned integer value is greater than the second unsigned integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgtqU16 VcgtqU16
//go:noescape
func VcgtqU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8)

// Compare unsigned Higher (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first unsigned integer value is greater than the second unsigned integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgtqU32 VcgtqU32
//go:noescape
func VcgtqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4)

// Compare unsigned Higher (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first unsigned integer value is greater than the second unsigned integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgtqU64 VcgtqU64
//go:noescape
func VcgtqU64(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint64X2)

// Floating-point Compare Greater than (vector). This instruction reads each floating-point value in the first source SIMD&FP register and if the value is greater than the corresponding floating-point value in the second source SIMD&FP register sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgtqF32 VcgtqF32
//go:noescape
func VcgtqF32(r *arm.Uint32X4, v0 *arm.Float32X4, v1 *arm.Float32X4)

// Floating-point Compare Greater than (vector). This instruction reads each floating-point value in the first source SIMD&FP register and if the value is greater than the corresponding floating-point value in the second source SIMD&FP register sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgtqF64 VcgtqF64
//go:noescape
func VcgtqF64(r *arm.Uint64X2, v0 *arm.Float64X2, v1 *arm.Float64X2)

// Floating-point Compare Greater than (scalar form). This instruction compares the single 32-bit floating-point value in the first source SIMD&FP register with the value in the second source SIMD&FP register and, if the first value is greater than the second, sets every bit of the destination value to one; otherwise it sets every bit of the destination value to zero.
//
//go:linkname VcgtsF32 VcgtsF32
//go:noescape
func VcgtsF32(r *arm.Uint32, v0 *arm.Float32, v1 *arm.Float32)

// Compare signed Greater than zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is greater than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgtzS8 VcgtzS8
//go:noescape
func VcgtzS8(r *arm.Uint8X8, v0 *arm.Int8X8)

// Compare signed Greater than zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is greater than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgtzS16 VcgtzS16
//go:noescape
func VcgtzS16(r *arm.Uint16X4, v0 *arm.Int16X4)

// Compare signed Greater than zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is greater than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgtzS32 VcgtzS32
//go:noescape
func VcgtzS32(r *arm.Uint32X2, v0 *arm.Int32X2)

// Compare signed Greater than zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is greater than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgtzS64 VcgtzS64
//go:noescape
func VcgtzS64(r *arm.Uint64X1, v0 *arm.Int64X1)

// Floating-point Compare Greater than zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is greater than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgtzF32 VcgtzF32
//go:noescape
func VcgtzF32(r *arm.Uint32X2, v0 *arm.Float32X2)

// Floating-point Compare Greater than zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is greater than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgtzF64 VcgtzF64
//go:noescape
func VcgtzF64(r *arm.Uint64X1, v0 *arm.Float64X1)

// Compare signed Greater than zero (scalar form). This instruction reads the single 64-bit signed integer value in the source SIMD&FP register and, if the value is greater than zero, sets every bit of the destination value to one; otherwise it sets every bit of the destination value to zero.
//
//go:linkname VcgtzdS64 VcgtzdS64
//go:noescape
func VcgtzdS64(r *arm.Uint64, v0 *arm.Int64)

// Floating-point Compare Greater than zero (scalar form). This instruction reads the single 64-bit floating-point value in the source SIMD&FP register and, if the value is greater than zero, sets every bit of the destination value to one; otherwise it sets every bit of the destination value to zero.
//
//go:linkname VcgtzdF64 VcgtzdF64
//go:noescape
func VcgtzdF64(r *arm.Uint64, v0 *arm.Float64)

// Compare signed Greater than zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is greater than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgtzqS8 VcgtzqS8
//go:noescape
func VcgtzqS8(r *arm.Uint8X16, v0 *arm.Int8X16)

// Compare signed Greater than zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is greater than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgtzqS16 VcgtzqS16
//go:noescape
func VcgtzqS16(r *arm.Uint16X8, v0 *arm.Int16X8)

// Compare signed Greater than zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is greater than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgtzqS32 VcgtzqS32
//go:noescape
func VcgtzqS32(r *arm.Uint32X4, v0 *arm.Int32X4)

// Compare signed Greater than zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is greater than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgtzqS64 VcgtzqS64
//go:noescape
func VcgtzqS64(r *arm.Uint64X2, v0 *arm.Int64X2)

// Floating-point Compare Greater than zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is greater than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgtzqF32 VcgtzqF32
//go:noescape
func VcgtzqF32(r *arm.Uint32X4, v0 *arm.Float32X4)

// Floating-point Compare Greater than zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is greater than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgtzqF64 VcgtzqF64
//go:noescape
func VcgtzqF64(r *arm.Uint64X2, v0 *arm.Float64X2)

// Floating-point Compare Greater than zero (scalar form). This instruction reads the single 32-bit floating-point value in the source SIMD&FP register and, if the value is greater than zero, sets every bit of the destination value to one; otherwise it sets every bit of the destination value to zero.
//
//go:linkname VcgtzsF32 VcgtzsF32
//go:noescape
func VcgtzsF32(r *arm.Uint32, v0 *arm.Float32)

// Compare signed less than or equal
//
//go:linkname VcleS8 VcleS8
//go:noescape
func VcleS8(r *arm.Uint8X8, v0 *arm.Int8X8, v1 *arm.Int8X8)

// Compare signed less than or equal
//
//go:linkname VcleS16 VcleS16
//go:noescape
func VcleS16(r *arm.Uint16X4, v0 *arm.Int16X4, v1 *arm.Int16X4)

// Compare signed less than or equal
//
//go:linkname VcleS32 VcleS32
//go:noescape
func VcleS32(r *arm.Uint32X2, v0 *arm.Int32X2, v1 *arm.Int32X2)

// Compare signed less than or equal
//
//go:linkname VcleS64 VcleS64
//go:noescape
func VcleS64(r *arm.Uint64X1, v0 *arm.Int64X1, v1 *arm.Int64X1)

// Compare unsigned less than or equal
//
//go:linkname VcleU8 VcleU8
//go:noescape
func VcleU8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8)

// Compare unsigned less than or equal
//
//go:linkname VcleU16 VcleU16
//go:noescape
func VcleU16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4)

// Compare unsigned less than or equal
//
//go:linkname VcleU32 VcleU32
//go:noescape
func VcleU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2)

// Compare unsigned less than or equal
//
//go:linkname VcleU64 VcleU64
//go:noescape
func VcleU64(r *arm.Uint64X1, v0 *arm.Uint64X1, v1 *arm.Uint64X1)

// Floating-point compare less than or equal
//
//go:linkname VcleF32 VcleF32
//go:noescape
func VcleF32(r *arm.Uint32X2, v0 *arm.Float32X2, v1 *arm.Float32X2)

// Floating-point compare less than or equal
//
//go:linkname VcleF64 VcleF64
//go:noescape
func VcleF64(r *arm.Uint64X1, v0 *arm.Float64X1, v1 *arm.Float64X1)

// Compare signed less than or equal
//
//go:linkname VcledS64 VcledS64
//go:noescape
func VcledS64(r *arm.Uint64, v0 *arm.Int64, v1 *arm.Int64)

// Compare unsigned less than or equal
//
//go:linkname VcledU64 VcledU64
//go:noescape
func VcledU64(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64)

// Floating-point compare less than or equal
//
//go:linkname VcledF64 VcledF64
//go:noescape
func VcledF64(r *arm.Uint64, v0 *arm.Float64, v1 *arm.Float64)

// Compare signed less than or equal
//
//go:linkname VcleqS8 VcleqS8
//go:noescape
func VcleqS8(r *arm.Uint8X16, v0 *arm.Int8X16, v1 *arm.Int8X16)

// Compare signed less than or equal
//
//go:linkname VcleqS16 VcleqS16
//go:noescape
func VcleqS16(r *arm.Uint16X8, v0 *arm.Int16X8, v1 *arm.Int16X8)

// Compare signed less than or equal
//
//go:linkname VcleqS32 VcleqS32
//go:noescape
func VcleqS32(r *arm.Uint32X4, v0 *arm.Int32X4, v1 *arm.Int32X4)

// Compare signed less than or equal
//
//go:linkname VcleqS64 VcleqS64
//go:noescape
func VcleqS64(r *arm.Uint64X2, v0 *arm.Int64X2, v1 *arm.Int64X2)

// Compare unsigned less than or equal
//
//go:linkname VcleqU8 VcleqU8
//go:noescape
func VcleqU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16)

// Compare unsigned less than or equal
//
//go:linkname VcleqU16 VcleqU16
//go:noescape
func VcleqU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8)

// Compare unsigned less than or equal
//
//go:linkname VcleqU32 VcleqU32
//go:noescape
func VcleqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4)

// Compare unsigned less than or equal
//
//go:linkname VcleqU64 VcleqU64
//go:noescape
func VcleqU64(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint64X2)

// Floating-point compare less than or equal
//
//go:linkname VcleqF32 VcleqF32
//go:noescape
func VcleqF32(r *arm.Uint32X4, v0 *arm.Float32X4, v1 *arm.Float32X4)

// Floating-point compare less than or equal
//
//go:linkname VcleqF64 VcleqF64
//go:noescape
func VcleqF64(r *arm.Uint64X2, v0 *arm.Float64X2, v1 *arm.Float64X2)

// Floating-point compare less than or equal
//
//go:linkname VclesF32 VclesF32
//go:noescape
func VclesF32(r *arm.Uint32, v0 *arm.Float32, v1 *arm.Float32)
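
/*
The Vcle* family above produces a per-lane mask rather than a boolean. The
sketch below is a hypothetical pure-Go reference model (cleS8 and cleU8 are
illustrative names, not part of this package) showing the assumed lane
semantics: all ones when v0[i] <= v1[i], all zeros otherwise. It also shows why
the signed and unsigned variants differ for the same bit pattern.

```go
package main

import "fmt"

// cleS8 models the lane semantics assumed for VcleS8: lane i of the result
// is 0xFF when a[i] <= b[i] as signed integers, and 0x00 otherwise.
func cleS8(a, b [8]int8) [8]uint8 {
	var r [8]uint8
	for i := range a {
		if a[i] <= b[i] {
			r[i] = 0xFF
		}
	}
	return r
}

// cleU8 is the unsigned counterpart, modelling VcleU8.
func cleU8(a, b [8]uint8) [8]uint8 {
	var r [8]uint8
	for i := range a {
		if a[i] <= b[i] {
			r[i] = 0xFF
		}
	}
	return r
}

func main() {
	// The byte 0xFF is -1 when signed and 255 when unsigned, so lane 0
	// compares differently in the two models.
	s := cleS8([8]int8{-1, 0, 1, 2, -128, 127, 5, 5}, [8]int8{1, 0, 0, 3, 127, -128, 5, 4})
	u := cleU8([8]uint8{0xFF, 0, 1, 2, 128, 127, 5, 5}, [8]uint8{1, 0, 0, 3, 127, 128, 5, 4})
	fmt.Println(s)
	fmt.Println(u)
}
```

The real functions take pointers to the arm vector types and write the mask
through r; the arrays here only stand in for those types.
*/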

// Compare signed Less than or Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is less than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VclezS8 VclezS8
//go:noescape
func VclezS8(r *arm.Uint8X8, v0 *arm.Int8X8)

// Compare signed Less than or Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is less than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VclezS16 VclezS16
//go:noescape
func VclezS16(r *arm.Uint16X4, v0 *arm.Int16X4)

// Compare signed Less than or Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is less than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VclezS32 VclezS32
//go:noescape
func VclezS32(r *arm.Uint32X2, v0 *arm.Int32X2)

// Compare signed Less than or Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is less than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VclezS64 VclezS64
//go:noescape
func VclezS64(r *arm.Uint64X1, v0 *arm.Int64X1)

// Floating-point Compare Less than or Equal to zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is less than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VclezF32 VclezF32
//go:noescape
func VclezF32(r *arm.Uint32X2, v0 *arm.Float32X2)

// Floating-point Compare Less than or Equal to zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is less than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VclezF64 VclezF64
//go:noescape
func VclezF64(r *arm.Uint64X1, v0 *arm.Float64X1)

// Compare signed Less than or Equal to zero (scalar). Sets every bit of the result r to one if the signed integer value v0 is less than or equal to zero, otherwise sets every bit of r to zero.
//
//go:linkname VclezdS64 VclezdS64
//go:noescape
func VclezdS64(r *arm.Uint64, v0 *arm.Int64)

// Floating-point Compare Less than or Equal to zero (scalar). Sets every bit of the result r to one if the floating-point value v0 is less than or equal to zero, otherwise sets every bit of r to zero.
//
//go:linkname VclezdF64 VclezdF64
//go:noescape
func VclezdF64(r *arm.Uint64, v0 *arm.Float64)

// Compare signed Less than or Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is less than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VclezqS8 VclezqS8
//go:noescape
func VclezqS8(r *arm.Uint8X16, v0 *arm.Int8X16)

// Compare signed Less than or Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is less than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VclezqS16 VclezqS16
//go:noescape
func VclezqS16(r *arm.Uint16X8, v0 *arm.Int16X8)

// Compare signed Less than or Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is less than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VclezqS32 VclezqS32
//go:noescape
func VclezqS32(r *arm.Uint32X4, v0 *arm.Int32X4)

// Compare signed Less than or Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is less than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VclezqS64 VclezqS64
//go:noescape
func VclezqS64(r *arm.Uint64X2, v0 *arm.Int64X2)

// Floating-point Compare Less than or Equal to zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is less than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VclezqF32 VclezqF32
//go:noescape
func VclezqF32(r *arm.Uint32X4, v0 *arm.Float32X4)

// Floating-point Compare Less than or Equal to zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is less than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VclezqF64 VclezqF64
//go:noescape
func VclezqF64(r *arm.Uint64X2, v0 *arm.Float64X2)

// Floating-point Compare Less than or Equal to zero (scalar). Sets every bit of the result r to one if the floating-point value v0 is less than or equal to zero, otherwise sets every bit of r to zero.
//
//go:linkname VclezsF32 VclezsF32
//go:noescape
func VclezsF32(r *arm.Uint32, v0 *arm.Float32)

// Count Leading Sign bits (vector). This instruction counts the number of consecutive bits following the most significant bit that are the same as the most significant bit in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The count does not include the most significant bit itself.
//
//go:linkname VclsS8 VclsS8
//go:noescape
func VclsS8(r *arm.Int8X8, v0 *arm.Int8X8)

// Count Leading Sign bits (vector). This instruction counts the number of consecutive bits following the most significant bit that are the same as the most significant bit in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The count does not include the most significant bit itself.
//
//go:linkname VclsS16 VclsS16
//go:noescape
func VclsS16(r *arm.Int16X4, v0 *arm.Int16X4)

// Count Leading Sign bits (vector). This instruction counts the number of consecutive bits following the most significant bit that are the same as the most significant bit in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The count does not include the most significant bit itself.
//
//go:linkname VclsS32 VclsS32
//go:noescape
func VclsS32(r *arm.Int32X2, v0 *arm.Int32X2)

// Count Leading Sign bits (vector). This instruction counts the number of consecutive bits following the most significant bit that are the same as the most significant bit in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The count does not include the most significant bit itself.
//
//go:linkname VclsU8 VclsU8
//go:noescape
func VclsU8(r *arm.Int8X8, v0 *arm.Uint8X8)

// Count Leading Sign bits (vector). This instruction counts the number of consecutive bits following the most significant bit that are the same as the most significant bit in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The count does not include the most significant bit itself.
//
//go:linkname VclsU16 VclsU16
//go:noescape
func VclsU16(r *arm.Int16X4, v0 *arm.Uint16X4)

// Count Leading Sign bits (vector). This instruction counts the number of consecutive bits following the most significant bit that are the same as the most significant bit in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The count does not include the most significant bit itself.
//
//go:linkname VclsU32 VclsU32
//go:noescape
func VclsU32(r *arm.Int32X2, v0 *arm.Uint32X2)

// Count Leading Sign bits (vector). This instruction counts the number of consecutive bits following the most significant bit that are the same as the most significant bit in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The count does not include the most significant bit itself.
//
//go:linkname VclsqS8 VclsqS8
//go:noescape
func VclsqS8(r *arm.Int8X16, v0 *arm.Int8X16)

// Count Leading Sign bits (vector). This instruction counts the number of consecutive bits following the most significant bit that are the same as the most significant bit in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The count does not include the most significant bit itself.
//
//go:linkname VclsqS16 VclsqS16
//go:noescape
func VclsqS16(r *arm.Int16X8, v0 *arm.Int16X8)

// Count Leading Sign bits (vector). This instruction counts the number of consecutive bits following the most significant bit that are the same as the most significant bit in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The count does not include the most significant bit itself.
//
//go:linkname VclsqS32 VclsqS32
//go:noescape
func VclsqS32(r *arm.Int32X4, v0 *arm.Int32X4)

// Count Leading Sign bits (vector). This instruction counts the number of consecutive bits following the most significant bit that are the same as the most significant bit in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The count does not include the most significant bit itself.
//
//go:linkname VclsqU8 VclsqU8
//go:noescape
func VclsqU8(r *arm.Int8X16, v0 *arm.Uint8X16)

// Count Leading Sign bits (vector). This instruction counts the number of consecutive bits following the most significant bit that are the same as the most significant bit in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The count does not include the most significant bit itself.
//
//go:linkname VclsqU16 VclsqU16
//go:noescape
func VclsqU16(r *arm.Int16X8, v0 *arm.Uint16X8)

// Count Leading Sign bits (vector). This instruction counts the number of consecutive bits following the most significant bit that are the same as the most significant bit in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The count does not include the most significant bit itself.
//
//go:linkname VclsqU32 VclsqU32
//go:noescape
func VclsqU32(r *arm.Int32X4, v0 *arm.Uint32X4)
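
/*
The "count leading sign bits" description above can be checked against a small
reference model. The sketch below is a hypothetical pure-Go version for one
8-bit lane (clsS8 is an illustrative name, not part of this package). It XORs
the value with its own sign fill so that the run of sign-equal bits becomes a
run of leading zeros, then subtracts one because the most significant bit
itself is not counted.

```go
package main

import (
	"fmt"
	"math/bits"
)

// clsS8 returns the number of bits following the most significant bit of x
// that are equal to the most significant bit.
func clsS8(x int8) int8 {
	// x >> 7 is an arithmetic shift: 0 for x >= 0, -1 (all ones) for x < 0.
	// After the XOR, every leading bit that matched the sign bit is zero.
	y := uint8(x ^ (x >> 7))
	return int8(bits.LeadingZeros8(y) - 1)
}

func main() {
	for _, x := range []int8{0, -1, 1, -2, 63, 64, -128} {
		fmt.Printf("cls(%d) = %d\n", x, clsS8(x))
	}
}
```

Under this model the result for an 8-bit lane always lies in the range 0 to 7,
which is why a signed result type can hold it.
*/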

// Compare signed less than
//
//go:linkname VcltS8 VcltS8
//go:noescape
func VcltS8(r *arm.Uint8X8, v0 *arm.Int8X8, v1 *arm.Int8X8)

// Compare signed less than
//
//go:linkname VcltS16 VcltS16
//go:noescape
func VcltS16(r *arm.Uint16X4, v0 *arm.Int16X4, v1 *arm.Int16X4)

// Compare signed less than
//
//go:linkname VcltS32 VcltS32
//go:noescape
func VcltS32(r *arm.Uint32X2, v0 *arm.Int32X2, v1 *arm.Int32X2)

// Compare signed less than
//
//go:linkname VcltS64 VcltS64
//go:noescape
func VcltS64(r *arm.Uint64X1, v0 *arm.Int64X1, v1 *arm.Int64X1)

// Compare unsigned less than
//
//go:linkname VcltU8 VcltU8
//go:noescape
func VcltU8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8)

// Compare unsigned less than
//
//go:linkname VcltU16 VcltU16
//go:noescape
func VcltU16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4)

// Compare unsigned less than
//
//go:linkname VcltU32 VcltU32
//go:noescape
func VcltU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2)

// Compare unsigned less than
//
//go:linkname VcltU64 VcltU64
//go:noescape
func VcltU64(r *arm.Uint64X1, v0 *arm.Uint64X1, v1 *arm.Uint64X1)

// Floating-point compare less than
//
//go:linkname VcltF32 VcltF32
//go:noescape
func VcltF32(r *arm.Uint32X2, v0 *arm.Float32X2, v1 *arm.Float32X2)

// Floating-point compare less than
//
//go:linkname VcltF64 VcltF64
//go:noescape
func VcltF64(r *arm.Uint64X1, v0 *arm.Float64X1, v1 *arm.Float64X1)

// Compare signed less than
//
//go:linkname VcltdS64 VcltdS64
//go:noescape
func VcltdS64(r *arm.Uint64, v0 *arm.Int64, v1 *arm.Int64)

// Compare unsigned less than
//
//go:linkname VcltdU64 VcltdU64
//go:noescape
func VcltdU64(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64)

// Floating-point compare less than
//
//go:linkname VcltdF64 VcltdF64
//go:noescape
func VcltdF64(r *arm.Uint64, v0 *arm.Float64, v1 *arm.Float64)

// Compare signed less than
//
//go:linkname VcltqS8 VcltqS8
//go:noescape
func VcltqS8(r *arm.Uint8X16, v0 *arm.Int8X16, v1 *arm.Int8X16)

// Compare signed less than
//
//go:linkname VcltqS16 VcltqS16
//go:noescape
func VcltqS16(r *arm.Uint16X8, v0 *arm.Int16X8, v1 *arm.Int16X8)

// Compare signed less than
//
//go:linkname VcltqS32 VcltqS32
//go:noescape
func VcltqS32(r *arm.Uint32X4, v0 *arm.Int32X4, v1 *arm.Int32X4)

// Compare signed less than
//
//go:linkname VcltqS64 VcltqS64
//go:noescape
func VcltqS64(r *arm.Uint64X2, v0 *arm.Int64X2, v1 *arm.Int64X2)

// Compare unsigned less than
//
//go:linkname VcltqU8 VcltqU8
//go:noescape
func VcltqU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16)

// Compare unsigned less than
//
//go:linkname VcltqU16 VcltqU16
//go:noescape
func VcltqU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8)

// Compare unsigned less than
//
//go:linkname VcltqU32 VcltqU32
//go:noescape
func VcltqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4)

// Compare unsigned less than
//
//go:linkname VcltqU64 VcltqU64
//go:noescape
func VcltqU64(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint64X2)

// Floating-point compare less than
//
//go:linkname VcltqF32 VcltqF32
//go:noescape
func VcltqF32(r *arm.Uint32X4, v0 *arm.Float32X4, v1 *arm.Float32X4)

// Floating-point compare less than
//
//go:linkname VcltqF64 VcltqF64
//go:noescape
func VcltqF64(r *arm.Uint64X2, v0 *arm.Float64X2, v1 *arm.Float64X2)

// Floating-point compare less than
//
//go:linkname VcltsF32 VcltsF32
//go:noescape
func VcltsF32(r *arm.Uint32, v0 *arm.Float32, v1 *arm.Float32)

// Compare signed Less than zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is less than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcltzS8 VcltzS8
//go:noescape
func VcltzS8(r *arm.Uint8X8, v0 *arm.Int8X8)

// Compare signed Less than zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is less than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcltzS16 VcltzS16
//go:noescape
func VcltzS16(r *arm.Uint16X4, v0 *arm.Int16X4)

// Compare signed Less than zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is less than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcltzS32 VcltzS32
//go:noescape
func VcltzS32(r *arm.Uint32X2, v0 *arm.Int32X2)

// Compare signed Less than zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is less than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcltzS64 VcltzS64
//go:noescape
func VcltzS64(r *arm.Uint64X1, v0 *arm.Int64X1)

// Floating-point Compare Less than zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is less than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcltzF32 VcltzF32
//go:noescape
func VcltzF32(r *arm.Uint32X2, v0 *arm.Float32X2)

// Floating-point Compare Less than zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is less than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcltzF64 VcltzF64
//go:noescape
func VcltzF64(r *arm.Uint64X1, v0 *arm.Float64X1)

// Compare signed Less than zero (scalar). Sets every bit of the result r to one if the signed integer value v0 is less than zero, otherwise sets every bit of r to zero.
//
//go:linkname VcltzdS64 VcltzdS64
//go:noescape
func VcltzdS64(r *arm.Uint64, v0 *arm.Int64)

// Floating-point Compare Less than zero (scalar). Sets every bit of the result r to one if the floating-point value v0 is less than zero, otherwise sets every bit of r to zero.
//
//go:linkname VcltzdF64 VcltzdF64
//go:noescape
func VcltzdF64(r *arm.Uint64, v0 *arm.Float64)

// Compare signed Less than zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is less than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcltzqS8 VcltzqS8
//go:noescape
func VcltzqS8(r *arm.Uint8X16, v0 *arm.Int8X16)

// Compare signed Less than zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is less than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcltzqS16 VcltzqS16
//go:noescape
func VcltzqS16(r *arm.Uint16X8, v0 *arm.Int16X8)

// Compare signed Less than zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is less than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcltzqS32 VcltzqS32
//go:noescape
func VcltzqS32(r *arm.Uint32X4, v0 *arm.Int32X4)

// Compare signed Less than zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is less than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcltzqS64 VcltzqS64
//go:noescape
func VcltzqS64(r *arm.Uint64X2, v0 *arm.Int64X2)

// Floating-point Compare Less than zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is less than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcltzqF32 VcltzqF32
//go:noescape
func VcltzqF32(r *arm.Uint32X4, v0 *arm.Float32X4)

// Floating-point Compare Less than zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is less than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcltzqF64 VcltzqF64
//go:noescape
func VcltzqF64(r *arm.Uint64X2, v0 *arm.Float64X2)

// Floating-point Compare Less than zero (scalar). Sets every bit of the result r to one if the floating-point value v0 is less than zero, otherwise sets every bit of r to zero.
//
//go:linkname VcltzsF32 VcltzsF32
//go:noescape
func VcltzsF32(r *arm.Uint32, v0 *arm.Float32)
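
/*
For the signed integer Vcltz* variants above, the mask is equivalent to
broadcasting the sign bit across the lane. The sketch below is a hypothetical
pure-Go model of VcltzqS32 (cltzqS32 is an illustrative name) that builds the
mask with an arithmetic shift instead of a comparison.

```go
package main

import "fmt"

// cltzqS32 models the lane semantics assumed for VcltzqS32: lane i of the
// result is 0xFFFFFFFF when v[i] < 0 and 0 otherwise.
func cltzqS32(v [4]int32) [4]uint32 {
	var r [4]uint32
	for i, x := range v {
		// x >> 31 is an arithmetic shift: 0 for x >= 0, -1 for x < 0.
		// Converting -1 to uint32 gives the all-ones mask.
		r[i] = uint32(x >> 31)
	}
	return r
}

func main() {
	fmt.Printf("%#x\n", cltzqS32([4]int32{-5, 0, 7, -2147483648}))
}
```
*/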

// Count Leading Zero bits (vector). This instruction counts the number of consecutive zeros, starting from the most significant bit, in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VclzS8 VclzS8
//go:noescape
func VclzS8(r *arm.Int8X8, v0 *arm.Int8X8)

// Count Leading Zero bits (vector). This instruction counts the number of consecutive zeros, starting from the most significant bit, in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VclzS16 VclzS16
//go:noescape
func VclzS16(r *arm.Int16X4, v0 *arm.Int16X4)

// Count Leading Zero bits (vector). This instruction counts the number of consecutive zeros, starting from the most significant bit, in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VclzS32 VclzS32
//go:noescape
func VclzS32(r *arm.Int32X2, v0 *arm.Int32X2)

// Count Leading Zero bits (vector). This instruction counts the number of consecutive zeros, starting from the most significant bit, in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VclzU8 VclzU8
//go:noescape
func VclzU8(r *arm.Uint8X8, v0 *arm.Uint8X8)

// Count Leading Zero bits (vector). This instruction counts the number of consecutive zeros, starting from the most significant bit, in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VclzU16 VclzU16
//go:noescape
func VclzU16(r *arm.Uint16X4, v0 *arm.Uint16X4)

// Count Leading Zero bits (vector). This instruction counts the number of consecutive zeros, starting from the most significant bit, in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VclzU32 VclzU32
//go:noescape
func VclzU32(r *arm.Uint32X2, v0 *arm.Uint32X2)

// Count Leading Zero bits (vector). This instruction counts the number of consecutive zeros, starting from the most significant bit, in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VclzqS8 VclzqS8
//go:noescape
func VclzqS8(r *arm.Int8X16, v0 *arm.Int8X16)

// Count Leading Zero bits (vector). This instruction counts the number of consecutive zeros, starting from the most significant bit, in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VclzqS16 VclzqS16
//go:noescape
func VclzqS16(r *arm.Int16X8, v0 *arm.Int16X8)

// Count Leading Zero bits (vector). This instruction counts the number of consecutive zeros, starting from the most significant bit, in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VclzqS32 VclzqS32
//go:noescape
func VclzqS32(r *arm.Int32X4, v0 *arm.Int32X4)

// Count Leading Zero bits (vector). This instruction counts the number of consecutive zeros, starting from the most significant bit, in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VclzqU8 VclzqU8
//go:noescape
func VclzqU8(r *arm.Uint8X16, v0 *arm.Uint8X16)

// Count Leading Zero bits (vector). This instruction counts the number of consecutive zeros, starting from the most significant bit, in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VclzqU16 VclzqU16
//go:noescape
func VclzqU16(r *arm.Uint16X8, v0 *arm.Uint16X8)

// Count Leading Zero bits (vector). This instruction counts the number of consecutive zeros, starting from the most significant bit, in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VclzqU32 VclzqU32
//go:noescape
func VclzqU32(r *arm.Uint32X4, v0 *arm.Uint32X4)
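
/*
The count-leading-zeros behaviour described above matches what the Go standard
library offers per scalar in math/bits. The sketch below is a hypothetical
pure-Go model of VclzU8 (clzU8 is an illustrative name) that can serve as a
reference when testing the intrinsic. Note that a zero lane yields the full
lane width, 8.

```go
package main

import (
	"fmt"
	"math/bits"
)

// clzU8 models the lane semantics assumed for VclzU8: each result lane holds
// the number of consecutive zero bits starting from the most significant bit.
func clzU8(v [8]uint8) [8]uint8 {
	var r [8]uint8
	for i, x := range v {
		r[i] = uint8(bits.LeadingZeros8(x))
	}
	return r
}

func main() {
	fmt.Println(clzU8([8]uint8{0, 1, 2, 0x80, 0xFF, 0x10, 0x7F, 3}))
}
```
*/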

// Population Count per byte. This instruction counts the number of bits that have a value of one in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VcntS8 VcntS8
//go:noescape
func VcntS8(r *arm.Int8X8, v0 *arm.Int8X8)

// Population Count per byte. This instruction counts the number of bits that have a value of one in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VcntU8 VcntU8
//go:noescape
func VcntU8(r *arm.Uint8X8, v0 *arm.Uint8X8)

// Population Count per byte. This instruction counts the number of bits that have a value of one in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VcntP8 VcntP8
//go:noescape
func VcntP8(r *arm.Poly8X8, v0 *arm.Poly8X8)

// Population Count per byte. This instruction counts the number of bits that have a value of one in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VcntqS8 VcntqS8
//go:noescape
func VcntqS8(r *arm.Int8X16, v0 *arm.Int8X16)

// Population Count per byte. This instruction counts the number of bits that have a value of one in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VcntqU8 VcntqU8
//go:noescape
func VcntqU8(r *arm.Uint8X16, v0 *arm.Uint8X16)

// Population Count per byte. This instruction counts the number of bits that have a value of one in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VcntqP8 VcntqP8
//go:noescape
func VcntqP8(r *arm.Poly8X16, v0 *arm.Poly8X16)

// Join two smaller vectors into a single larger vector. v0 supplies the low half of the result and v1 the high half.
//
//go:linkname VcombineS8 VcombineS8
//go:noescape
func VcombineS8(r *arm.Int8X16, v0 *arm.Int8X8, v1 *arm.Int8X8)

// Join two smaller vectors into a single larger vector. v0 supplies the low half of the result and v1 the high half.
//
//go:linkname VcombineS16 VcombineS16
//go:noescape
func VcombineS16(r *arm.Int16X8, v0 *arm.Int16X4, v1 *arm.Int16X4)

// Join two smaller vectors into a single larger vector. v0 supplies the low half of the result and v1 the high half.
//
//go:linkname VcombineS32 VcombineS32
//go:noescape
func VcombineS32(r *arm.Int32X4, v0 *arm.Int32X2, v1 *arm.Int32X2)

// Join two smaller vectors into a single larger vector. v0 supplies the low half of the result and v1 the high half.
//
//go:linkname VcombineS64 VcombineS64
//go:noescape
func VcombineS64(r *arm.Int64X2, v0 *arm.Int64X1, v1 *arm.Int64X1)

// Join two smaller vectors into a single larger vector. v0 supplies the low half of the result and v1 the high half.
//
//go:linkname VcombineU8 VcombineU8
//go:noescape
func VcombineU8(r *arm.Uint8X16, v0 *arm.Uint8X8, v1 *arm.Uint8X8)

// Join two smaller vectors into a single larger vector. v0 supplies the low half of the result and v1 the high half.
//
//go:linkname VcombineU16 VcombineU16
//go:noescape
func VcombineU16(r *arm.Uint16X8, v0 *arm.Uint16X4, v1 *arm.Uint16X4)

// Join two smaller vectors into a single larger vector. v0 supplies the low half of the result and v1 the high half.
//
//go:linkname VcombineU32 VcombineU32
//go:noescape
func VcombineU32(r *arm.Uint32X4, v0 *arm.Uint32X2, v1 *arm.Uint32X2)

// Join two smaller vectors into a single larger vector. v0 supplies the low half of the result and v1 the high half.
//
//go:linkname VcombineU64 VcombineU64
//go:noescape
func VcombineU64(r *arm.Uint64X2, v0 *arm.Uint64X1, v1 *arm.Uint64X1)

// Join two smaller vectors into a single larger vector. v0 supplies the low half of the result and v1 the high half.
//
//go:linkname VcombineF32 VcombineF32
//go:noescape
func VcombineF32(r *arm.Float32X4, v0 *arm.Float32X2, v1 *arm.Float32X2)

// Join two smaller vectors into a single larger vector. v0 supplies the low half of the result and v1 the high half.
//
//go:linkname VcombineF64 VcombineF64
//go:noescape
func VcombineF64(r *arm.Float64X2, v0 *arm.Float64X1, v1 *arm.Float64X1)

// Join two smaller vectors into a single larger vector. v0 supplies the low half of the result and v1 the high half.
//
//go:linkname VcombineP16 VcombineP16
//go:noescape
func VcombineP16(r *arm.Poly16X8, v0 *arm.Poly16X4, v1 *arm.Poly16X4)

// Join two smaller vectors into a single larger vector. v0 supplies the low half of the result and v1 the high half.
//
//go:linkname VcombineP64 VcombineP64
//go:noescape
func VcombineP64(r *arm.Poly64X2, v0 *arm.Poly64X1, v1 *arm.Poly64X1)

// Join two smaller vectors into a single larger vector. v0 supplies the low half of the result and v1 the high half.
//
//go:linkname VcombineP8 VcombineP8
//go:noescape
func VcombineP8(r *arm.Poly8X16, v0 *arm.Poly8X8, v1 *arm.Poly8X8)

// Signed integer Convert to Floating-point (vector). This instruction converts each element in a vector from a signed integer to floating-point using the rounding mode that is specified by the FPCR, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtF32S32 VcvtF32S32
//go:noescape
func VcvtF32S32(r *arm.Float32X2, v0 *arm.Int32X2)

// Unsigned integer Convert to Floating-point (vector). This instruction converts each element in a vector from an unsigned integer to floating-point using the rounding mode that is specified by the FPCR, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtF32U32 VcvtF32U32
//go:noescape
func VcvtF32U32(r *arm.Float32X2, v0 *arm.Uint32X2)

// Floating-point Convert to lower precision Narrow (vector). This instruction reads each vector element in the SIMD&FP source register, converts each element to half the precision of the source element, and writes the resulting vector to the lower half of the destination SIMD&FP register. The destination vector elements are half as long as the source vector elements. The rounding mode is determined by the FPCR.
//
//go:linkname VcvtF32F64 VcvtF32F64
//go:noescape
func VcvtF32F64(r *arm.Float32X2, v0 *arm.Float64X2)

// Signed integer Convert to Floating-point (vector). This instruction converts each element in a vector from a signed integer to floating-point using the rounding mode that is specified by the FPCR, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtF64S64 VcvtF64S64
//go:noescape
func VcvtF64S64(r *arm.Float64X1, v0 *arm.Int64X1)

// Unsigned integer Convert to Floating-point (vector). This instruction converts each element in a vector from an unsigned integer to floating-point using the rounding mode that is specified by the FPCR, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtF64U64 VcvtF64U64
//go:noescape
func VcvtF64U64(r *arm.Float64X1, v0 *arm.Uint64X1)

// Floating-point Convert to higher precision Long (vector). This instruction reads each element in a vector in the SIMD&FP source register, converts each value to double the precision of the source element using the rounding mode that is determined by the FPCR, and writes each result to the equivalent element of the vector in the SIMD&FP destination register.
//
//go:linkname VcvtF64F32 VcvtF64F32
//go:noescape
func VcvtF64F32(r *arm.Float64X2, v0 *arm.Float32X2)

// Floating-point Convert to lower precision Narrow (vector). This instruction reads each vector element in the SIMD&FP source register v1, converts each element to half the precision of the source element, and writes the resulting vector to the upper half of the destination SIMD&FP register; the lower half of the result is taken from v0. The destination vector elements are half as long as the source vector elements. The rounding mode is determined by the FPCR.
//
//go:linkname VcvtHighF32F64 VcvtHighF32F64
//go:noescape
func VcvtHighF32F64(r *arm.Float32X4, v0 *arm.Float32X2, v1 *arm.Float64X2)

// Floating-point Convert to higher precision Long (vector). This instruction reads each element in the upper half of the vector in the SIMD&FP source register, converts each value to double the precision of the source element using the rounding mode that is determined by the FPCR, and writes each result to the equivalent element of the vector in the SIMD&FP destination register.
//
//go:linkname VcvtHighF64F32 VcvtHighF64F32
//go:noescape
func VcvtHighF64F32(r *arm.Float64X2, v0 *arm.Float32X4)

// Floating-point Convert to Signed integer, rounding toward Zero (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtS32F32 VcvtS32F32
//go:noescape
func VcvtS32F32(r *arm.Int32X2, v0 *arm.Float32X2)

// Floating-point Convert to Signed integer, rounding toward Zero (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtS64F64 VcvtS64F64
//go:noescape
func VcvtS64F64(r *arm.Int64X1, v0 *arm.Float64X1)

// Floating-point Convert to Unsigned integer, rounding toward Zero (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtU32F32 VcvtU32F32
//go:noescape
func VcvtU32F32(r *arm.Uint32X2, v0 *arm.Float32X2)

// Floating-point Convert to Unsigned integer, rounding toward Zero (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtU64F64 VcvtU64F64
//go:noescape
func VcvtU64F64(r *arm.Uint64X1, v0 *arm.Float64X1)

// Floating-point Convert to Signed integer, rounding to nearest with ties to Away (vector). This instruction converts each element in a vector from a floating-point value to a signed integer value using the Round to Nearest with Ties to Away rounding mode and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtaS32F32 VcvtaS32F32
//go:noescape
func VcvtaS32F32(r *arm.Int32X2, v0 *arm.Float32X2)

// Floating-point Convert to Signed integer, rounding to nearest with ties to Away (vector). This instruction converts each element in a vector from a floating-point value to a signed integer value using the Round to Nearest with Ties to Away rounding mode and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtaS64F64 VcvtaS64F64
//go:noescape
func VcvtaS64F64(r *arm.Int64X1, v0 *arm.Float64X1)

// Floating-point Convert to Unsigned integer, rounding to nearest with ties to Away (vector). This instruction converts each element in a vector from a floating-point value to an unsigned integer value using the Round to Nearest with Ties to Away rounding mode and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtaU32F32 VcvtaU32F32
//go:noescape
func VcvtaU32F32(r *arm.Uint32X2, v0 *arm.Float32X2)

// Floating-point Convert to Unsigned integer, rounding to nearest with ties to Away (vector). This instruction converts each element in a vector from a floating-point value to an unsigned integer value using the Round to Nearest with Ties to Away rounding mode and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtaU64F64 VcvtaU64F64
//go:noescape
func VcvtaU64F64(r *arm.Uint64X1, v0 *arm.Float64X1)

// Floating-point Convert to Signed integer, rounding to nearest with ties to Away (scalar form). This instruction converts a floating-point value to a signed integer value using the Round to Nearest with Ties to Away rounding mode and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtadS64F64 VcvtadS64F64
//go:noescape
func VcvtadS64F64(r *arm.Int64, v0 *arm.Float64)

// Floating-point Convert to Unsigned integer, rounding to nearest with ties to Away (scalar form). This instruction converts a floating-point value to an unsigned integer value using the Round to Nearest with Ties to Away rounding mode and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtadU64F64 VcvtadU64F64
//go:noescape
func VcvtadU64F64(r *arm.Uint64, v0 *arm.Float64)

// Floating-point Convert to Signed integer, rounding to nearest with ties to Away (vector). This instruction converts each element in a vector from a floating-point value to a signed integer value using the Round to Nearest with Ties to Away rounding mode and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtaqS32F32 VcvtaqS32F32
//go:noescape
func VcvtaqS32F32(r *arm.Int32X4, v0 *arm.Float32X4)

// Floating-point Convert to Signed integer, rounding to nearest with ties to Away (vector). This instruction converts each element in a vector from a floating-point value to a signed integer value using the Round to Nearest with Ties to Away rounding mode and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtaqS64F64 VcvtaqS64F64
//go:noescape
func VcvtaqS64F64(r *arm.Int64X2, v0 *arm.Float64X2)

// Floating-point Convert to Unsigned integer, rounding to nearest with ties to Away (vector). This instruction converts each element in a vector from a floating-point value to an unsigned integer value using the Round to Nearest with Ties to Away rounding mode and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtaqU32F32 VcvtaqU32F32
//go:noescape
func VcvtaqU32F32(r *arm.Uint32X4, v0 *arm.Float32X4)

// Floating-point Convert to Unsigned integer, rounding to nearest with ties to Away (vector). This instruction converts each element in a vector from a floating-point value to an unsigned integer value using the Round to Nearest with Ties to Away rounding mode and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtaqU64F64 VcvtaqU64F64
//go:noescape
func VcvtaqU64F64(r *arm.Uint64X2, v0 *arm.Float64X2)

// Floating-point Convert to Signed integer, rounding to nearest with ties to Away (scalar form). This instruction converts a floating-point value to a signed integer value using the Round to Nearest with Ties to Away rounding mode and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtasS32F32 VcvtasS32F32
//go:noescape
func VcvtasS32F32(r *arm.Int32, v0 *arm.Float32)

// Floating-point Convert to Unsigned integer, rounding to nearest with ties to Away (scalar form). This instruction converts a floating-point value to an unsigned integer value using the Round to Nearest with Ties to Away rounding mode and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtasU32F32 VcvtasU32F32
//go:noescape
func VcvtasU32F32(r *arm.Uint32, v0 *arm.Float32)

// Signed integer Convert to Floating-point (scalar form). This instruction converts a signed integer value to floating-point using the rounding mode that is specified by the FPCR, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtdF64S64 VcvtdF64S64
//go:noescape
func VcvtdF64S64(r *arm.Float64, v0 *arm.Int64)

// Unsigned integer Convert to Floating-point (scalar form). This instruction converts an unsigned integer value to floating-point using the rounding mode that is specified by the FPCR, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtdF64U64 VcvtdF64U64
//go:noescape
func VcvtdF64U64(r *arm.Float64, v0 *arm.Uint64)

// Floating-point Convert to Signed integer, rounding toward Zero (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtdS64F64 VcvtdS64F64
//go:noescape
func VcvtdS64F64(r *arm.Int64, v0 *arm.Float64)

// Floating-point Convert to Unsigned integer, rounding toward Zero (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtdU64F64 VcvtdU64F64
//go:noescape
func VcvtdU64F64(r *arm.Uint64, v0 *arm.Float64)

// Floating-point Convert to Signed integer, rounding toward Minus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round towards Minus Infinity rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtmS32F32 VcvtmS32F32
//go:noescape
func VcvtmS32F32(r *arm.Int32X2, v0 *arm.Float32X2)

// Floating-point Convert to Signed integer, rounding toward Minus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round towards Minus Infinity rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtmS64F64 VcvtmS64F64
//go:noescape
func VcvtmS64F64(r *arm.Int64X1, v0 *arm.Float64X1)

// Floating-point Convert to Unsigned integer, rounding toward Minus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round towards Minus Infinity rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtmU32F32 VcvtmU32F32
//go:noescape
func VcvtmU32F32(r *arm.Uint32X2, v0 *arm.Float32X2)

// Floating-point Convert to Unsigned integer, rounding toward Minus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round towards Minus Infinity rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtmU64F64 VcvtmU64F64
//go:noescape
func VcvtmU64F64(r *arm.Uint64X1, v0 *arm.Float64X1)

// Floating-point Convert to Signed integer, rounding toward Minus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round towards Minus Infinity rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtmdS64F64 VcvtmdS64F64
//go:noescape
func VcvtmdS64F64(r *arm.Int64, v0 *arm.Float64)

// Floating-point Convert to Unsigned integer, rounding toward Minus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round towards Minus Infinity rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtmdU64F64 VcvtmdU64F64
//go:noescape
func VcvtmdU64F64(r *arm.Uint64, v0 *arm.Float64)

// Floating-point Convert to Signed integer, rounding toward Minus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round towards Minus Infinity rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtmqS32F32 VcvtmqS32F32
//go:noescape
func VcvtmqS32F32(r *arm.Int32X4, v0 *arm.Float32X4)

// Floating-point Convert to Signed integer, rounding toward Minus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round towards Minus Infinity rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtmqS64F64 VcvtmqS64F64
//go:noescape
func VcvtmqS64F64(r *arm.Int64X2, v0 *arm.Float64X2)

// Floating-point Convert to Unsigned integer, rounding toward Minus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round towards Minus Infinity rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtmqU32F32 VcvtmqU32F32
//go:noescape
func VcvtmqU32F32(r *arm.Uint32X4, v0 *arm.Float32X4)

// Floating-point Convert to Unsigned integer, rounding toward Minus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round towards Minus Infinity rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtmqU64F64 VcvtmqU64F64
//go:noescape
func VcvtmqU64F64(r *arm.Uint64X2, v0 *arm.Float64X2)

// Floating-point Convert to Signed integer, rounding toward Minus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round towards Minus Infinity rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtmsS32F32 VcvtmsS32F32
//go:noescape
func VcvtmsS32F32(r *arm.Int32, v0 *arm.Float32)

// Floating-point Convert to Unsigned integer, rounding toward Minus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round towards Minus Infinity rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtmsU32F32 VcvtmsU32F32
//go:noescape
func VcvtmsU32F32(r *arm.Uint32, v0 *arm.Float32)

// Floating-point Convert to Signed integer, rounding to nearest with ties to even (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round to Nearest rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtnS32F32 VcvtnS32F32
//go:noescape
func VcvtnS32F32(r *arm.Int32X2, v0 *arm.Float32X2)

// Floating-point Convert to Signed integer, rounding to nearest with ties to even (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round to Nearest rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtnS64F64 VcvtnS64F64
//go:noescape
func VcvtnS64F64(r *arm.Int64X1, v0 *arm.Float64X1)

// Floating-point Convert to Unsigned integer, rounding to nearest with ties to even (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round to Nearest rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtnU32F32 VcvtnU32F32
//go:noescape
func VcvtnU32F32(r *arm.Uint32X2, v0 *arm.Float32X2)

// Floating-point Convert to Unsigned integer, rounding to nearest with ties to even (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round to Nearest rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtnU64F64 VcvtnU64F64
//go:noescape
func VcvtnU64F64(r *arm.Uint64X1, v0 *arm.Float64X1)

// Floating-point Convert to Signed integer, rounding to nearest with ties to even (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round to Nearest rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtndS64F64 VcvtndS64F64
//go:noescape
func VcvtndS64F64(r *arm.Int64, v0 *arm.Float64)

// Floating-point Convert to Unsigned integer, rounding to nearest with ties to even (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round to Nearest rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtndU64F64 VcvtndU64F64
//go:noescape
func VcvtndU64F64(r *arm.Uint64, v0 *arm.Float64)

// Floating-point Convert to Signed integer, rounding to nearest with ties to even (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round to Nearest rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtnqS32F32 VcvtnqS32F32
//go:noescape
func VcvtnqS32F32(r *arm.Int32X4, v0 *arm.Float32X4)

// Floating-point Convert to Signed integer, rounding to nearest with ties to even (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round to Nearest rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtnqS64F64 VcvtnqS64F64
//go:noescape
func VcvtnqS64F64(r *arm.Int64X2, v0 *arm.Float64X2)

// Floating-point Convert to Unsigned integer, rounding to nearest with ties to even (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round to Nearest rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtnqU32F32 VcvtnqU32F32
//go:noescape
func VcvtnqU32F32(r *arm.Uint32X4, v0 *arm.Float32X4)

// Floating-point Convert to Unsigned integer, rounding to nearest with ties to even (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round to Nearest rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtnqU64F64 VcvtnqU64F64
//go:noescape
func VcvtnqU64F64(r *arm.Uint64X2, v0 *arm.Float64X2)

// Floating-point Convert to Signed integer, rounding to nearest with ties to even (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round to Nearest rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtnsS32F32 VcvtnsS32F32
//go:noescape
func VcvtnsS32F32(r *arm.Int32, v0 *arm.Float32)

// Floating-point Convert to Unsigned integer, rounding to nearest with ties to even (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round to Nearest rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtnsU32F32 VcvtnsU32F32
//go:noescape
func VcvtnsU32F32(r *arm.Uint32, v0 *arm.Float32)

// Floating-point Convert to Signed integer, rounding toward Plus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round towards Plus Infinity rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtpS32F32 VcvtpS32F32
//go:noescape
func VcvtpS32F32(r *arm.Int32X2, v0 *arm.Float32X2)

// Floating-point Convert to Signed integer, rounding toward Plus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round towards Plus Infinity rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtpS64F64 VcvtpS64F64
//go:noescape
func VcvtpS64F64(r *arm.Int64X1, v0 *arm.Float64X1)

// Floating-point Convert to Unsigned integer, rounding toward Plus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round towards Plus Infinity rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtpU32F32 VcvtpU32F32
//go:noescape
func VcvtpU32F32(r *arm.Uint32X2, v0 *arm.Float32X2)

// Floating-point Convert to Unsigned integer, rounding toward Plus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round towards Plus Infinity rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtpU64F64 VcvtpU64F64
//go:noescape
func VcvtpU64F64(r *arm.Uint64X1, v0 *arm.Float64X1)

// Floating-point Convert to Signed integer, rounding toward Plus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round towards Plus Infinity rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtpdS64F64 VcvtpdS64F64
//go:noescape
func VcvtpdS64F64(r *arm.Int64, v0 *arm.Float64)

// Floating-point Convert to Unsigned integer, rounding toward Plus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round towards Plus Infinity rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtpdU64F64 VcvtpdU64F64
//go:noescape
func VcvtpdU64F64(r *arm.Uint64, v0 *arm.Float64)

// Floating-point Convert to Signed integer, rounding toward Plus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round towards Plus Infinity rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtpqS32F32 VcvtpqS32F32
//go:noescape
func VcvtpqS32F32(r *arm.Int32X4, v0 *arm.Float32X4)

// Floating-point Convert to Signed integer, rounding toward Plus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round towards Plus Infinity rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtpqS64F64 VcvtpqS64F64
//go:noescape
func VcvtpqS64F64(r *arm.Int64X2, v0 *arm.Float64X2)

// Floating-point Convert to Unsigned integer, rounding toward Plus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round towards Plus Infinity rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtpqU32F32 VcvtpqU32F32
//go:noescape
func VcvtpqU32F32(r *arm.Uint32X4, v0 *arm.Float32X4)

// Floating-point Convert to Unsigned integer, rounding toward Plus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round towards Plus Infinity rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtpqU64F64 VcvtpqU64F64
//go:noescape
func VcvtpqU64F64(r *arm.Uint64X2, v0 *arm.Float64X2)

// Floating-point Convert to Signed integer, rounding toward Plus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round towards Plus Infinity rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtpsS32F32 VcvtpsS32F32
//go:noescape
func VcvtpsS32F32(r *arm.Int32, v0 *arm.Float32)

// Floating-point Convert to Unsigned integer, rounding toward Plus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round towards Plus Infinity rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtpsU32F32 VcvtpsU32F32
//go:noescape
func VcvtpsU32F32(r *arm.Uint32, v0 *arm.Float32)

// Signed integer Convert to Floating-point (vector). This instruction converts each element in a vector from a signed integer to floating-point using the rounding mode that is specified by the FPCR, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtqF32S32 VcvtqF32S32
//go:noescape
func VcvtqF32S32(r *arm.Float32X4, v0 *arm.Int32X4)

// Unsigned integer Convert to Floating-point (vector). This instruction converts each element in a vector from an unsigned integer to floating-point using the rounding mode that is specified by the FPCR, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtqF32U32 VcvtqF32U32
//go:noescape
func VcvtqF32U32(r *arm.Float32X4, v0 *arm.Uint32X4)

// Signed fixed-point Convert to Floating-point (vector). This instruction converts each element in a vector from fixed-point to floating-point using the rounding mode that is specified by the FPCR, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtqF64S64 VcvtqF64S64
//go:noescape
func VcvtqF64S64(r *arm.Float64X2, v0 *arm.Int64X2)

// Unsigned fixed-point Convert to Floating-point (vector). This instruction converts each element in a vector from fixed-point to floating-point using the rounding mode that is specified by the FPCR, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtqF64U64 VcvtqF64U64
//go:noescape
func VcvtqF64U64(r *arm.Float64X2, v0 *arm.Uint64X2)

// Floating-point Convert to Signed fixed-point, rounding toward Zero (vector). This instruction converts a scalar or each element in a vector from floating-point to fixed-point signed integer using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtqS32F32 VcvtqS32F32
//go:noescape
func VcvtqS32F32(r *arm.Int32X4, v0 *arm.Float32X4)

// Floating-point Convert to Signed fixed-point, rounding toward Zero (vector). This instruction converts a scalar or each element in a vector from floating-point to fixed-point signed integer using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtqS64F64 VcvtqS64F64
//go:noescape
func VcvtqS64F64(r *arm.Int64X2, v0 *arm.Float64X2)

// Floating-point Convert to Unsigned fixed-point, rounding toward Zero (vector). This instruction converts a scalar or each element in a vector from floating-point to fixed-point unsigned integer using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtqU32F32 VcvtqU32F32
//go:noescape
func VcvtqU32F32(r *arm.Uint32X4, v0 *arm.Float32X4)

// Floating-point Convert to Unsigned fixed-point, rounding toward Zero (vector). This instruction converts a scalar or each element in a vector from floating-point to fixed-point unsigned integer using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtqU64F64 VcvtqU64F64
//go:noescape
func VcvtqU64F64(r *arm.Uint64X2, v0 *arm.Float64X2)

// Signed fixed-point Convert to Floating-point (vector). This instruction converts each element in a vector from fixed-point to floating-point using the rounding mode that is specified by the FPCR, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtsF32S32 VcvtsF32S32
//go:noescape
func VcvtsF32S32(r *arm.Float32, v0 *arm.Int32)

// Unsigned fixed-point Convert to Floating-point (vector). This instruction converts each element in a vector from fixed-point to floating-point using the rounding mode that is specified by the FPCR, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtsF32U32 VcvtsF32U32
//go:noescape
func VcvtsF32U32(r *arm.Float32, v0 *arm.Uint32)

// Floating-point Convert to Signed fixed-point, rounding toward Zero (vector). This instruction converts a scalar or each element in a vector from floating-point to fixed-point signed integer using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtsS32F32 VcvtsS32F32
//go:noescape
func VcvtsS32F32(r *arm.Int32, v0 *arm.Float32)

// Floating-point Convert to Unsigned fixed-point, rounding toward Zero (vector). This instruction converts a scalar or each element in a vector from floating-point to fixed-point unsigned integer using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtsU32F32 VcvtsU32F32
//go:noescape
func VcvtsU32F32(r *arm.Uint32, v0 *arm.Float32)

// Floating-point Convert to lower precision Narrow, rounding to odd (vector). This instruction reads each vector element in the source SIMD&FP register, narrows each value to half the precision of the source element using the Round to Odd rounding mode, writes the result to a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VcvtxF32F64 VcvtxF32F64
//go:noescape
func VcvtxF32F64(r *arm.Float32X2, v0 *arm.Float64X2)

// Floating-point Convert to lower precision Narrow, rounding to odd (vector). This instruction reads each vector element in the source SIMD&FP register, narrows each value to half the precision of the source element using the Round to Odd rounding mode, writes the result to a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VcvtxHighF32F64 VcvtxHighF32F64
//go:noescape
func VcvtxHighF32F64(r *arm.Float32X4, v0 *arm.Float32X2, v1 *arm.Float64X2)

// Floating-point Convert to lower precision Narrow, rounding to odd (vector). This instruction reads each vector element in the source SIMD&FP register, narrows each value to half the precision of the source element using the Round to Odd rounding mode, writes the result to a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VcvtxdF32F64 VcvtxdF32F64
//go:noescape
func VcvtxdF32F64(r *arm.Float32, v0 *arm.Float64)

// Floating-point Divide (vector). This instruction divides the floating-point values in the elements in the first source SIMD&FP register, by the floating-point values in the corresponding elements in the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VdivF32 VdivF32
//go:noescape
func VdivF32(r *arm.Float32X2, v0 *arm.Float32X2, v1 *arm.Float32X2)

// Floating-point Divide (vector). This instruction divides the floating-point values in the elements in the first source SIMD&FP register, by the floating-point values in the corresponding elements in the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VdivF64 VdivF64
//go:noescape
func VdivF64(r *arm.Float64X1, v0 *arm.Float64X1, v1 *arm.Float64X1)

// Floating-point Divide (vector). This instruction divides the floating-point values in the elements in the first source SIMD&FP register, by the floating-point values in the corresponding elements in the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VdivqF32 VdivqF32
//go:noescape
func VdivqF32(r *arm.Float32X4, v0 *arm.Float32X4, v1 *arm.Float32X4)

// Floating-point Divide (vector). This instruction divides the floating-point values in the elements in the first source SIMD&FP register, by the floating-point values in the corresponding elements in the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VdivqF64 VdivqF64
//go:noescape
func VdivqF64(r *arm.Float64X2, v0 *arm.Float64X2, v1 *arm.Float64X2)

// Dot Product signed arithmetic (vector). This instruction performs the dot product of the four signed 8-bit elements in each 32-bit element of the first source register with the four signed 8-bit elements of the corresponding 32-bit element in the second source register, accumulating the result into the corresponding 32-bit element of the destination register.
//
//go:linkname VdotS32 VdotS32
//go:noescape
func VdotS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int8X8, v2 *arm.Int8X8)

// Dot Product unsigned arithmetic (vector). This instruction performs the dot product of the four unsigned 8-bit elements in each 32-bit element of the first source register with the four unsigned 8-bit elements of the corresponding 32-bit element in the second source register, accumulating the result into the corresponding 32-bit element of the destination register.
//
//go:linkname VdotU32 VdotU32
//go:noescape
func VdotU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint8X8, v2 *arm.Uint8X8)

// Dot Product signed arithmetic (vector). This instruction performs the dot product of the four signed 8-bit elements in each 32-bit element of the first source register with the four signed 8-bit elements of the corresponding 32-bit element in the second source register, accumulating the result into the corresponding 32-bit element of the destination register.
//
//go:linkname VdotqS32 VdotqS32
//go:noescape
func VdotqS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int8X16, v2 *arm.Int8X16)

// Dot Product unsigned arithmetic (vector). This instruction performs the dot product of the four unsigned 8-bit elements in each 32-bit element of the first source register with the four unsigned 8-bit elements of the corresponding 32-bit element in the second source register, accumulating the result into the corresponding 32-bit element of the destination register.
//
//go:linkname VdotqU32 VdotqU32
//go:noescape
func VdotqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint8X16, v2 *arm.Uint8X16)

// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.
//
//go:linkname VdupNS8 VdupNS8
//go:noescape
func VdupNS8(r *arm.Int8X8, v0 *arm.Int8)

// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.
//
//go:linkname VdupNS16 VdupNS16
//go:noescape
func VdupNS16(r *arm.Int16X4, v0 *arm.Int16)

// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.
//
//go:linkname VdupNS32 VdupNS32
//go:noescape
func VdupNS32(r *arm.Int32X2, v0 *arm.Int32)

// Insert vector element from another vector element. This instruction copies the vector element of the source SIMD&FP register to the specified vector element of the destination SIMD&FP register. For this intrinsic, the scalar v0 is copied into the single lane of r.
//
//go:linkname VdupNS64 VdupNS64
//go:noescape
func VdupNS64(r *arm.Int64X1, v0 *arm.Int64)

// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.
//
//go:linkname VdupNU8 VdupNU8
//go:noescape
func VdupNU8(r *arm.Uint8X8, v0 *arm.Uint8)

// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.
//
//go:linkname VdupNU16 VdupNU16
//go:noescape
func VdupNU16(r *arm.Uint16X4, v0 *arm.Uint16)

// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.
//
//go:linkname VdupNU32 VdupNU32
//go:noescape
func VdupNU32(r *arm.Uint32X2, v0 *arm.Uint32)

// Insert vector element from another vector element. This instruction copies the vector element of the source SIMD&FP register to the specified vector element of the destination SIMD&FP register. For this intrinsic, the scalar v0 is copied into the single lane of r.
//
//go:linkname VdupNU64 VdupNU64
//go:noescape
func VdupNU64(r *arm.Uint64X1, v0 *arm.Uint64)

// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.
//
//go:linkname VdupNF32 VdupNF32
//go:noescape
func VdupNF32(r *arm.Float32X2, v0 *arm.Float32)

// Insert vector element from another vector element. This instruction copies the vector element of the source SIMD&FP register to the specified vector element of the destination SIMD&FP register. For this intrinsic, the scalar v0 is copied into the single lane of r.
//
//go:linkname VdupNF64 VdupNF64
//go:noescape
func VdupNF64(r *arm.Float64X1, v0 *arm.Float64)

// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.
//
//go:linkname VdupNP16 VdupNP16
//go:noescape
func VdupNP16(r *arm.Poly16X4, v0 *arm.Poly16)

// Insert vector element from another vector element. This instruction copies the vector element of the source SIMD&FP register to the specified vector element of the destination SIMD&FP register. For this intrinsic, the scalar v0 is copied into the single lane of r.
//
//go:linkname VdupNP64 VdupNP64
//go:noescape
func VdupNP64(r *arm.Poly64X1, v0 *arm.Poly64)

// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.
//
//go:linkname VdupNP8 VdupNP8
//go:noescape
func VdupNP8(r *arm.Poly8X8, v0 *arm.Poly8)

// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.
//
//go:linkname VdupqNS8 VdupqNS8
//go:noescape
func VdupqNS8(r *arm.Int8X16, v0 *arm.Int8)

// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.
//
//go:linkname VdupqNS16 VdupqNS16
//go:noescape
func VdupqNS16(r *arm.Int16X8, v0 *arm.Int16)

// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.
//
//go:linkname VdupqNS32 VdupqNS32
//go:noescape
func VdupqNS32(r *arm.Int32X4, v0 *arm.Int32)

// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.
//
//go:linkname VdupqNS64 VdupqNS64
//go:noescape
func VdupqNS64(r *arm.Int64X2, v0 *arm.Int64)

// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.
//
//go:linkname VdupqNU8 VdupqNU8
//go:noescape
func VdupqNU8(r *arm.Uint8X16, v0 *arm.Uint8)

// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.
//
//go:linkname VdupqNU16 VdupqNU16
//go:noescape
func VdupqNU16(r *arm.Uint16X8, v0 *arm.Uint16)

// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.
//
//go:linkname VdupqNU32 VdupqNU32
//go:noescape
func VdupqNU32(r *arm.Uint32X4, v0 *arm.Uint32)

// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.
//
//go:linkname VdupqNU64 VdupqNU64
//go:noescape
func VdupqNU64(r *arm.Uint64X2, v0 *arm.Uint64)

// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.
//
//go:linkname VdupqNF32 VdupqNF32
//go:noescape
func VdupqNF32(r *arm.Float32X4, v0 *arm.Float32)

// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.
//
//go:linkname VdupqNF64 VdupqNF64
//go:noescape
func VdupqNF64(r *arm.Float64X2, v0 *arm.Float64)

// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.
//
//go:linkname VdupqNP16 VdupqNP16
//go:noescape
func VdupqNP16(r *arm.Poly16X8, v0 *arm.Poly16)

// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.
//
//go:linkname VdupqNP64 VdupqNP64
//go:noescape
func VdupqNP64(r *arm.Poly64X2, v0 *arm.Poly64)

// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.
//
//go:linkname VdupqNP8 VdupqNP8
//go:noescape
func VdupqNP8(r *arm.Poly8X16, v0 *arm.Poly8)

// Bitwise Exclusive OR (vector). This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register.
//
//go:linkname VeorS8 VeorS8
//go:noescape
func VeorS8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8)

// Bitwise Exclusive OR (vector). This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register.
//
//go:linkname VeorS16 VeorS16
//go:noescape
func VeorS16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4)

// Bitwise Exclusive OR (vector). This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register.
//
//go:linkname VeorS32 VeorS32
//go:noescape
func VeorS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2)

// Bitwise Exclusive OR (vector). This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register.
//
//go:linkname VeorS64 VeorS64
//go:noescape
func VeorS64(r *arm.Int64X1, v0 *arm.Int64X1, v1 *arm.Int64X1)

// Bitwise Exclusive OR (vector). This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register.
//
//go:linkname VeorU8 VeorU8
//go:noescape
func VeorU8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8)

// Bitwise Exclusive OR (vector). This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register.
//
//go:linkname VeorU16 VeorU16
//go:noescape
func VeorU16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4)

// Bitwise Exclusive OR (vector). This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register.
//
//go:linkname VeorU32 VeorU32
//go:noescape
func VeorU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2)

// Bitwise Exclusive OR (vector). This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register.
//
//go:linkname VeorU64 VeorU64
//go:noescape
func VeorU64(r *arm.Uint64X1, v0 *arm.Uint64X1, v1 *arm.Uint64X1)

// Three-way Exclusive OR performs a three-way exclusive OR of the values in the three source SIMD&FP registers, and writes the result to the destination SIMD&FP register.
//
//go:linkname Veor3QS8 Veor3QS8
//go:noescape
func Veor3QS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16, v2 *arm.Int8X16)

// Three-way Exclusive OR performs a three-way exclusive OR of the values in the three source SIMD&FP registers, and writes the result to the destination SIMD&FP register.
//
//go:linkname Veor3QS16 Veor3QS16
//go:noescape
func Veor3QS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8, v2 *arm.Int16X8)

// Three-way Exclusive OR performs a three-way exclusive OR of the values in the three source SIMD&FP registers, and writes the result to the destination SIMD&FP register.
//
//go:linkname Veor3QS32 Veor3QS32
//go:noescape
func Veor3QS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4, v2 *arm.Int32X4)

// Three-way Exclusive OR performs a three-way exclusive OR of the values in the three source SIMD&FP registers, and writes the result to the destination SIMD&FP register.
//
//go:linkname Veor3QS64 Veor3QS64
//go:noescape
func Veor3QS64(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int64X2, v2 *arm.Int64X2)

// Three-way Exclusive OR performs a three-way exclusive OR of the values in the three source SIMD&FP registers, and writes the result to the destination SIMD&FP register.
//
//go:linkname Veor3QU8 Veor3QU8
//go:noescape
func Veor3QU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16, v2 *arm.Uint8X16)

// Three-way Exclusive OR performs a three-way exclusive OR of the values in the three source SIMD&FP registers, and writes the result to the destination SIMD&FP register.
//
//go:linkname Veor3QU16 Veor3QU16
//go:noescape
func Veor3QU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8, v2 *arm.Uint16X8)

// Three-way Exclusive OR performs a three-way exclusive OR of the values in the three source SIMD&FP registers, and writes the result to the destination SIMD&FP register.
//
//go:linkname Veor3QU32 Veor3QU32
//go:noescape
func Veor3QU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4, v2 *arm.Uint32X4)

// Three-way Exclusive OR performs a three-way exclusive OR of the values in the three source SIMD&FP registers, and writes the result to the destination SIMD&FP register.
//
//go:linkname Veor3QU64 Veor3QU64
//go:noescape
func Veor3QU64(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint64X2, v2 *arm.Uint64X2)

// Bitwise Exclusive OR (vector). This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register.
//
//go:linkname VeorqS8 VeorqS8
//go:noescape
func VeorqS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16)

// Bitwise Exclusive OR (vector). This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register.
//
//go:linkname VeorqS16 VeorqS16
//go:noescape
func VeorqS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8)

// Bitwise Exclusive OR (vector). This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register.
//
//go:linkname VeorqS32 VeorqS32
//go:noescape
func VeorqS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4)

// Bitwise Exclusive OR (vector). This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register.
//
//go:linkname VeorqS64 VeorqS64
//go:noescape
func VeorqS64(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int64X2)

// Bitwise Exclusive OR (vector). This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register.
//
//go:linkname VeorqU8 VeorqU8
//go:noescape
func VeorqU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16)

// Bitwise Exclusive OR (vector). This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register.
//
//go:linkname VeorqU16 VeorqU16
//go:noescape
func VeorqU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8)

// Bitwise Exclusive OR (vector). This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register.
//
//go:linkname VeorqU32 VeorqU32
//go:noescape
func VeorqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4)

// Bitwise Exclusive OR (vector). This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register.
//
//go:linkname VeorqU64 VeorqU64
//go:noescape
func VeorqU64(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint64X2)

// Floating-point fused Multiply-Add to accumulator (vector). This instruction multiplies corresponding floating-point values in the vectors in the two source SIMD&FP registers, adds the product to the corresponding vector element of the destination SIMD&FP register, and writes the result to the destination SIMD&FP register.
//
//go:linkname VfmaF32 VfmaF32
//go:noescape
func VfmaF32(r *arm.Float32X2, v0 *arm.Float32X2, v1 *arm.Float32X2, v2 *arm.Float32X2)

// Floating-point fused Multiply-Add (scalar). This instruction multiplies the values of the first two SIMD&FP source registers, adds the product to the value of the third SIMD&FP source register, and writes the result to the SIMD&FP destination register.
//
//go:linkname VfmaF64 VfmaF64
//go:noescape
func VfmaF64(r *arm.Float64X1, v0 *arm.Float64X1, v1 *arm.Float64X1, v2 *arm.Float64X1)

// Floating-point fused Multiply-Add to accumulator (vector). This instruction multiplies corresponding floating-point values in the vectors in the two source SIMD&FP registers, adds the product to the corresponding vector element of the destination SIMD&FP register, and writes the result to the destination SIMD&FP register.
//
//go:linkname VfmaNF32 VfmaNF32
//go:noescape
func VfmaNF32(r *arm.Float32X2, v0 *arm.Float32X2, v1 *arm.Float32X2, v2 *arm.Float32)

// Floating-point fused Multiply-Add (scalar). This instruction multiplies the values of the first two SIMD&FP source registers, adds the product to the value of the third SIMD&FP source register, and writes the result to the SIMD&FP destination register.
//
//go:linkname VfmaNF64 VfmaNF64
//go:noescape
func VfmaNF64(r *arm.Float64X1, v0 *arm.Float64X1, v1 *arm.Float64X1, v2 *arm.Float64)

// Floating-point fused Multiply-Add to accumulator (vector). This instruction multiplies corresponding floating-point values in the vectors in the two source SIMD&FP registers, adds the product to the corresponding vector element of the destination SIMD&FP register, and writes the result to the destination SIMD&FP register.
//
//go:linkname VfmaqF32 VfmaqF32
//go:noescape
func VfmaqF32(r *arm.Float32X4, v0 *arm.Float32X4, v1 *arm.Float32X4, v2 *arm.Float32X4)

// Floating-point fused Multiply-Add to accumulator (vector). This instruction multiplies corresponding floating-point values in the vectors in the two source SIMD&FP registers, adds the product to the corresponding vector element of the destination SIMD&FP register, and writes the result to the destination SIMD&FP register.
//
//go:linkname VfmaqF64 VfmaqF64
//go:noescape
func VfmaqF64(r *arm.Float64X2, v0 *arm.Float64X2, v1 *arm.Float64X2, v2 *arm.Float64X2)

// Floating-point fused Multiply-Add to accumulator (vector). This instruction multiplies corresponding floating-point values in the vectors in the two source SIMD&FP registers, adds the product to the corresponding vector element of the destination SIMD&FP register, and writes the result to the destination SIMD&FP register.
//
//go:linkname VfmaqNF32 VfmaqNF32
//go:noescape
func VfmaqNF32(r *arm.Float32X4, v0 *arm.Float32X4, v1 *arm.Float32X4, v2 *arm.Float32)

// Floating-point fused Multiply-Add to accumulator (vector). This instruction multiplies corresponding floating-point values in the vectors in the two source SIMD&FP registers, adds the product to the corresponding vector element of the destination SIMD&FP register, and writes the result to the destination SIMD&FP register.
//
//go:linkname VfmaqNF64 VfmaqNF64
//go:noescape
func VfmaqNF64(r *arm.Float64X2, v0 *arm.Float64X2, v1 *arm.Float64X2, v2 *arm.Float64)

// Floating-point fused Multiply-Subtract from accumulator (vector). This instruction multiplies corresponding floating-point values in the vectors in the two source SIMD&FP registers, negates the product, adds the result to the corresponding vector element of the destination SIMD&FP register, and writes the result to the destination SIMD&FP register.
//
//go:linkname VfmsF32 VfmsF32
//go:noescape
func VfmsF32(r *arm.Float32X2, v0 *arm.Float32X2, v1 *arm.Float32X2, v2 *arm.Float32X2)

// Floating-point Fused Multiply-Subtract (scalar). This instruction multiplies the values of the first two SIMD&FP source registers, negates the product, adds that to the value of the third SIMD&FP source register, and writes the result to the SIMD&FP destination register.
//
//go:linkname VfmsF64 VfmsF64
//go:noescape
func VfmsF64(r *arm.Float64X1, v0 *arm.Float64X1, v1 *arm.Float64X1, v2 *arm.Float64X1)

// Floating-point fused Multiply-Subtract from accumulator (vector). This instruction multiplies corresponding floating-point values in the vectors in the two source SIMD&FP registers, negates the product, adds the result to the corresponding vector element of the destination SIMD&FP register, and writes the result to the destination SIMD&FP register.
//
//go:linkname VfmsNF32 VfmsNF32
//go:noescape
func VfmsNF32(r *arm.Float32X2, v0 *arm.Float32X2, v1 *arm.Float32X2, v2 *arm.Float32)

// Floating-point Fused Multiply-Subtract (scalar). This instruction multiplies the values of the first two SIMD&FP source registers, negates the product, adds that to the value of the third SIMD&FP source register, and writes the result to the SIMD&FP destination register.
//
//go:linkname VfmsNF64 VfmsNF64
//go:noescape
func VfmsNF64(r *arm.Float64X1, v0 *arm.Float64X1, v1 *arm.Float64X1, v2 *arm.Float64)

// Floating-point fused Multiply-Subtract from accumulator (vector). This instruction multiplies corresponding floating-point values in the vectors in the two source SIMD&FP registers, negates the product, adds the result to the corresponding vector element of the destination SIMD&FP register, and writes the result to the destination SIMD&FP register.
//
//go:linkname VfmsqF32 VfmsqF32
//go:noescape
func VfmsqF32(r *arm.Float32X4, v0 *arm.Float32X4, v1 *arm.Float32X4, v2 *arm.Float32X4)

// Floating-point fused Multiply-Subtract from accumulator (vector). This instruction multiplies corresponding floating-point values in the vectors in the two source SIMD&FP registers, negates the product, adds the result to the corresponding vector element of the destination SIMD&FP register, and writes the result to the destination SIMD&FP register.
//
//go:linkname VfmsqF64 VfmsqF64
//go:noescape
func VfmsqF64(r *arm.Float64X2, v0 *arm.Float64X2, v1 *arm.Float64X2, v2 *arm.Float64X2)

// Floating-point fused Multiply-Subtract from accumulator (vector). This instruction multiplies corresponding floating-point values in the vectors in the two source SIMD&FP registers, negates the product, adds the result to the corresponding vector element of the destination SIMD&FP register, and writes the result to the destination SIMD&FP register.
//
//go:linkname VfmsqNF32 VfmsqNF32
//go:noescape
func VfmsqNF32(r *arm.Float32X4, v0 *arm.Float32X4, v1 *arm.Float32X4, v2 *arm.Float32)

// Floating-point fused Multiply-Subtract from accumulator (vector). VfmsqNF64 multiplies each element of v1 by the scalar v2, negates the product, adds it to the corresponding element of the accumulator v0 in a single fused step, and writes the result to r (r[i] = v0[i] - v1[i]*v2).
//
//go:linkname VfmsqNF64 VfmsqNF64
//go:noescape
func VfmsqNF64(r *arm.Float64X2, v0 *arm.Float64X2, v1 *arm.Float64X2, v2 *arm.Float64)

// VgetHighS8 writes the upper half of v0 (lanes 8 through 15) to r. ARM documents this intrinsic under DUP (Duplicate vector element to vector or scalar): the instruction copies the selected element of the source SIMD&FP register, here the upper 64 bits, to the destination SIMD&FP register.
//
//go:linkname VgetHighS8 VgetHighS8
//go:noescape
func VgetHighS8(r *arm.Int8X8, v0 *arm.Int8X16)

// VgetHighS16 writes the upper half of v0 (lanes 4 through 7) to r. ARM documents this intrinsic under DUP (Duplicate vector element to vector or scalar): the instruction copies the selected element of the source SIMD&FP register, here the upper 64 bits, to the destination SIMD&FP register.
//
//go:linkname VgetHighS16 VgetHighS16
//go:noescape
func VgetHighS16(r *arm.Int16X4, v0 *arm.Int16X8)

// VgetHighS32 writes the upper half of v0 (lanes 2 and 3) to r. ARM documents this intrinsic under DUP (Duplicate vector element to vector or scalar): the instruction copies the selected element of the source SIMD&FP register, here the upper 64 bits, to the destination SIMD&FP register.
//
//go:linkname VgetHighS32 VgetHighS32
//go:noescape
func VgetHighS32(r *arm.Int32X2, v0 *arm.Int32X4)

// VgetHighS64 writes the upper half of v0 (lane 1) to r. ARM documents this intrinsic under DUP (Duplicate vector element to vector or scalar): the instruction copies the selected element of the source SIMD&FP register, here the upper 64 bits, to the destination SIMD&FP register.
//
//go:linkname VgetHighS64 VgetHighS64
//go:noescape
func VgetHighS64(r *arm.Int64X1, v0 *arm.Int64X2)

// VgetHighU8 writes the upper half of v0 (lanes 8 through 15) to r. ARM documents this intrinsic under DUP (Duplicate vector element to vector or scalar): the instruction copies the selected element of the source SIMD&FP register, here the upper 64 bits, to the destination SIMD&FP register.
//
//go:linkname VgetHighU8 VgetHighU8
//go:noescape
func VgetHighU8(r *arm.Uint8X8, v0 *arm.Uint8X16)

// VgetHighU16 writes the upper half of v0 (lanes 4 through 7) to r. ARM documents this intrinsic under DUP (Duplicate vector element to vector or scalar): the instruction copies the selected element of the source SIMD&FP register, here the upper 64 bits, to the destination SIMD&FP register.
//
//go:linkname VgetHighU16 VgetHighU16
//go:noescape
func VgetHighU16(r *arm.Uint16X4, v0 *arm.Uint16X8)

// VgetHighU32 writes the upper half of v0 (lanes 2 and 3) to r. ARM documents this intrinsic under DUP (Duplicate vector element to vector or scalar): the instruction copies the selected element of the source SIMD&FP register, here the upper 64 bits, to the destination SIMD&FP register.
//
//go:linkname VgetHighU32 VgetHighU32
//go:noescape
func VgetHighU32(r *arm.Uint32X2, v0 *arm.Uint32X4)

// VgetHighU64 writes the upper half of v0 (lane 1) to r. ARM documents this intrinsic under DUP (Duplicate vector element to vector or scalar): the instruction copies the selected element of the source SIMD&FP register, here the upper 64 bits, to the destination SIMD&FP register.
//
//go:linkname VgetHighU64 VgetHighU64
//go:noescape
func VgetHighU64(r *arm.Uint64X1, v0 *arm.Uint64X2)

// VgetHighF32 writes the upper half of v0 (lanes 2 and 3) to r. ARM documents this intrinsic under DUP (Duplicate vector element to vector or scalar): the instruction copies the selected element of the source SIMD&FP register, here the upper 64 bits, to the destination SIMD&FP register.
//
//go:linkname VgetHighF32 VgetHighF32
//go:noescape
func VgetHighF32(r *arm.Float32X2, v0 *arm.Float32X4)

// VgetHighF64 writes the upper half of v0 (lane 1) to r. ARM documents this intrinsic under DUP (Duplicate vector element to vector or scalar): the instruction copies the selected element of the source SIMD&FP register, here the upper 64 bits, to the destination SIMD&FP register.
//
//go:linkname VgetHighF64 VgetHighF64
//go:noescape
func VgetHighF64(r *arm.Float64X1, v0 *arm.Float64X2)

// VgetHighP16 writes the upper half of v0 (lanes 4 through 7) to r. ARM documents this intrinsic under DUP (Duplicate vector element to vector or scalar): the instruction copies the selected element of the source SIMD&FP register, here the upper 64 bits, to the destination SIMD&FP register.
//
//go:linkname VgetHighP16 VgetHighP16
//go:noescape
func VgetHighP16(r *arm.Poly16X4, v0 *arm.Poly16X8)

// VgetHighP64 writes the upper half of v0 (lane 1) to r. ARM documents this intrinsic under DUP (Duplicate vector element to vector or scalar): the instruction copies the selected element of the source SIMD&FP register, here the upper 64 bits, to the destination SIMD&FP register.
//
//go:linkname VgetHighP64 VgetHighP64
//go:noescape
func VgetHighP64(r *arm.Poly64X1, v0 *arm.Poly64X2)

// VgetHighP8 writes the upper half of v0 (lanes 8 through 15) to r. ARM documents this intrinsic under DUP (Duplicate vector element to vector or scalar): the instruction copies the selected element of the source SIMD&FP register, here the upper 64 bits, to the destination SIMD&FP register.
//
//go:linkname VgetHighP8 VgetHighP8
//go:noescape
func VgetHighP8(r *arm.Poly8X8, v0 *arm.Poly8X16)

// VgetLowS8 writes the lower half of v0 (lanes 0 through 7) to r. ARM documents this intrinsic under DUP (Duplicate vector element to vector or scalar): the instruction copies the selected element of the source SIMD&FP register, here the lower 64 bits, to the destination SIMD&FP register.
//
//go:linkname VgetLowS8 VgetLowS8
//go:noescape
func VgetLowS8(r *arm.Int8X8, v0 *arm.Int8X16)

// VgetLowS16 writes the lower half of v0 (lanes 0 through 3) to r. ARM documents this intrinsic under DUP (Duplicate vector element to vector or scalar): the instruction copies the selected element of the source SIMD&FP register, here the lower 64 bits, to the destination SIMD&FP register.
//
//go:linkname VgetLowS16 VgetLowS16
//go:noescape
func VgetLowS16(r *arm.Int16X4, v0 *arm.Int16X8)

// VgetLowS32 writes the lower half of v0 (lanes 0 and 1) to r. ARM documents this intrinsic under DUP (Duplicate vector element to vector or scalar): the instruction copies the selected element of the source SIMD&FP register, here the lower 64 bits, to the destination SIMD&FP register.
//
//go:linkname VgetLowS32 VgetLowS32
//go:noescape
func VgetLowS32(r *arm.Int32X2, v0 *arm.Int32X4)

// VgetLowS64 writes the lower half of v0 (lane 0) to r. ARM documents this intrinsic under DUP (Duplicate vector element to vector or scalar): the instruction copies the selected element of the source SIMD&FP register, here the lower 64 bits, to the destination SIMD&FP register.
//
//go:linkname VgetLowS64 VgetLowS64
//go:noescape
func VgetLowS64(r *arm.Int64X1, v0 *arm.Int64X2)

// VgetLowU8 writes the lower half of v0 (lanes 0 through 7) to r. ARM documents this intrinsic under DUP (Duplicate vector element to vector or scalar): the instruction copies the selected element of the source SIMD&FP register, here the lower 64 bits, to the destination SIMD&FP register.
//
//go:linkname VgetLowU8 VgetLowU8
//go:noescape
func VgetLowU8(r *arm.Uint8X8, v0 *arm.Uint8X16)

// VgetLowU16 writes the lower half of v0 (lanes 0 through 3) to r. ARM documents this intrinsic under DUP (Duplicate vector element to vector or scalar): the instruction copies the selected element of the source SIMD&FP register, here the lower 64 bits, to the destination SIMD&FP register.
//
//go:linkname VgetLowU16 VgetLowU16
//go:noescape
func VgetLowU16(r *arm.Uint16X4, v0 *arm.Uint16X8)

// VgetLowU32 writes the lower half of v0 (lanes 0 and 1) to r. ARM documents this intrinsic under DUP (Duplicate vector element to vector or scalar): the instruction copies the selected element of the source SIMD&FP register, here the lower 64 bits, to the destination SIMD&FP register.
//
//go:linkname VgetLowU32 VgetLowU32
//go:noescape
func VgetLowU32(r *arm.Uint32X2, v0 *arm.Uint32X4)

// VgetLowU64 writes the lower half of v0 (lane 0) to r. ARM documents this intrinsic under DUP (Duplicate vector element to vector or scalar): the instruction copies the selected element of the source SIMD&FP register, here the lower 64 bits, to the destination SIMD&FP register.
//
//go:linkname VgetLowU64 VgetLowU64
//go:noescape
func VgetLowU64(r *arm.Uint64X1, v0 *arm.Uint64X2)

// VgetLowF32 writes the lower half of v0 (lanes 0 and 1) to r. ARM documents this intrinsic under DUP (Duplicate vector element to vector or scalar): the instruction copies the selected element of the source SIMD&FP register, here the lower 64 bits, to the destination SIMD&FP register.
//
//go:linkname VgetLowF32 VgetLowF32
//go:noescape
func VgetLowF32(r *arm.Float32X2, v0 *arm.Float32X4)

// VgetLowF64 writes the lower half of v0 (lane 0) to r. ARM documents this intrinsic under DUP (Duplicate vector element to vector or scalar): the instruction copies the selected element of the source SIMD&FP register, here the lower 64 bits, to the destination SIMD&FP register.
//
//go:linkname VgetLowF64 VgetLowF64
//go:noescape
func VgetLowF64(r *arm.Float64X1, v0 *arm.Float64X2)

// VgetLowP16 writes the lower half of v0 (lanes 0 through 3) to r. ARM documents this intrinsic under DUP (Duplicate vector element to vector or scalar): the instruction copies the selected element of the source SIMD&FP register, here the lower 64 bits, to the destination SIMD&FP register.
//
//go:linkname VgetLowP16 VgetLowP16
//go:noescape
func VgetLowP16(r *arm.Poly16X4, v0 *arm.Poly16X8)

// VgetLowP64 writes the lower half of v0 (lane 0) to r. ARM documents this intrinsic under DUP (Duplicate vector element to vector or scalar): the instruction copies the selected element of the source SIMD&FP register, here the lower 64 bits, to the destination SIMD&FP register.
//
//go:linkname VgetLowP64 VgetLowP64
//go:noescape
func VgetLowP64(r *arm.Poly64X1, v0 *arm.Poly64X2)

// VgetLowP8 writes the lower half of v0 (lanes 0 through 7) to r. ARM documents this intrinsic under DUP (Duplicate vector element to vector or scalar): the instruction copies the selected element of the source SIMD&FP register, here the lower 64 bits, to the destination SIMD&FP register.
//
//go:linkname VgetLowP8 VgetLowP8
//go:noescape
func VgetLowP8(r *arm.Poly8X8, v0 *arm.Poly8X16)

// Signed Halving Add. This instruction adds corresponding signed integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VhaddS8 VhaddS8
//go:noescape
func VhaddS8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8)

// Signed Halving Add. This instruction adds corresponding signed integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VhaddS16 VhaddS16
//go:noescape
func VhaddS16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4)

// Signed Halving Add. This instruction adds corresponding signed integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VhaddS32 VhaddS32
//go:noescape
func VhaddS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2)

// Unsigned Halving Add. This instruction adds corresponding unsigned integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VhaddU8 VhaddU8
//go:noescape
func VhaddU8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8)

// Unsigned Halving Add. This instruction adds corresponding unsigned integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VhaddU16 VhaddU16
//go:noescape
func VhaddU16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4)

// Unsigned Halving Add. This instruction adds corresponding unsigned integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VhaddU32 VhaddU32
//go:noescape
func VhaddU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2)

// Signed Halving Add. This instruction adds corresponding signed integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VhaddqS8 VhaddqS8
//go:noescape
func VhaddqS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16)

// Signed Halving Add. This instruction adds corresponding signed integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VhaddqS16 VhaddqS16
//go:noescape
func VhaddqS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8)

// Signed Halving Add. This instruction adds corresponding signed integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VhaddqS32 VhaddqS32
//go:noescape
func VhaddqS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4)

// Unsigned Halving Add. This instruction adds corresponding unsigned integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VhaddqU8 VhaddqU8
//go:noescape
func VhaddqU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16)

// Unsigned Halving Add. This instruction adds corresponding unsigned integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VhaddqU16 VhaddqU16
//go:noescape
func VhaddqU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8)

// Unsigned Halving Add. This instruction adds corresponding unsigned integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VhaddqU32 VhaddqU32
//go:noescape
func VhaddqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4)

// Signed Halving Subtract. This instruction subtracts the elements in the vector in the second source SIMD&FP register from the corresponding elements in the vector in the first source SIMD&FP register, shifts each result right one bit, places each result into elements of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VhsubS8 VhsubS8
//go:noescape
func VhsubS8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8)

// Signed Halving Subtract. This instruction subtracts the elements in the vector in the second source SIMD&FP register from the corresponding elements in the vector in the first source SIMD&FP register, shifts each result right one bit, places each result into elements of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VhsubS16 VhsubS16
//go:noescape
func VhsubS16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4)

// Signed Halving Subtract. This instruction subtracts the elements in the vector in the second source SIMD&FP register from the corresponding elements in the vector in the first source SIMD&FP register, shifts each result right one bit, places each result into elements of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VhsubS32 VhsubS32
//go:noescape
func VhsubS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2)

// Unsigned Halving Subtract. This instruction subtracts the vector elements in the second source SIMD&FP register from the corresponding vector elements in the first source SIMD&FP register, shifts each result right one bit, places each result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VhsubU8 VhsubU8
//go:noescape
func VhsubU8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8)

// Unsigned Halving Subtract. This instruction subtracts the vector elements in the second source SIMD&FP register from the corresponding vector elements in the first source SIMD&FP register, shifts each result right one bit, places each result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VhsubU16 VhsubU16
//go:noescape
func VhsubU16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4)

// Unsigned Halving Subtract. This instruction subtracts the vector elements in the second source SIMD&FP register from the corresponding vector elements in the first source SIMD&FP register, shifts each result right one bit, places each result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VhsubU32 VhsubU32
//go:noescape
func VhsubU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2)

// Signed Halving Subtract. This instruction subtracts the elements in the vector in the second source SIMD&FP register from the corresponding elements in the vector in the first source SIMD&FP register, shifts each result right one bit, places each result into elements of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VhsubqS8 VhsubqS8
//go:noescape
func VhsubqS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16)

// Signed Halving Subtract. This instruction subtracts the elements in the vector in the second source SIMD&FP register from the corresponding elements in the vector in the first source SIMD&FP register, shifts each result right one bit, places each result into elements of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VhsubqS16 VhsubqS16
//go:noescape
func VhsubqS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8)

// Signed Halving Subtract. This instruction subtracts the elements in the vector in the second source SIMD&FP register from the corresponding elements in the vector in the first source SIMD&FP register, shifts each result right one bit, places each result into elements of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VhsubqS32 VhsubqS32
//go:noescape
func VhsubqS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4)

// Unsigned Halving Subtract. This instruction subtracts the vector elements in the second source SIMD&FP register from the corresponding vector elements in the first source SIMD&FP register, shifts each result right one bit, places each result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VhsubqU8 VhsubqU8
//go:noescape
func VhsubqU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16)

// Unsigned Halving Subtract. This instruction subtracts the vector elements in the second source SIMD&FP register from the corresponding vector elements in the first source SIMD&FP register, shifts each result right one bit, places each result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VhsubqU16 VhsubqU16
//go:noescape
func VhsubqU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8)

// Unsigned Halving Subtract. This instruction subtracts the vector elements in the second source SIMD&FP register from the corresponding vector elements in the first source SIMD&FP register, shifts each result right one bit, places each result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VhsubqU32 VhsubqU32
//go:noescape
func VhsubqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4)

// Signed Maximum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the larger of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmaxS8 VmaxS8
//go:noescape
func VmaxS8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8)

// Signed Maximum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the larger of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmaxS16 VmaxS16
//go:noescape
func VmaxS16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4)

// Signed Maximum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the larger of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmaxS32 VmaxS32
//go:noescape
func VmaxS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2)

// Unsigned Maximum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the larger of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmaxU8 VmaxU8
//go:noescape
func VmaxU8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8)

// Unsigned Maximum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the larger of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmaxU16 VmaxU16
//go:noescape
func VmaxU16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4)

// Unsigned Maximum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the larger of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmaxU32 VmaxU32
//go:noescape
func VmaxU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2)

// Floating-point Maximum (vector). This instruction compares corresponding vector elements in the two source SIMD&FP registers, places the larger of each of the two floating-point values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmaxF32 VmaxF32
//go:noescape
func VmaxF32(r *arm.Float32X2, v0 *arm.Float32X2, v1 *arm.Float32X2)

// Floating-point Maximum (vector). This instruction compares corresponding vector elements in the two source SIMD&FP registers, places the larger of each of the two floating-point values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmaxF64 VmaxF64
//go:noescape
func VmaxF64(r *arm.Float64X1, v0 *arm.Float64X1, v1 *arm.Float64X1)

// Floating-point Maximum Number (vector). This instruction compares corresponding vector elements in the two source SIMD&FP registers, writes the larger of the two floating-point values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmaxnmF32 VmaxnmF32
//go:noescape
func VmaxnmF32(r *arm.Float32X2, v0 *arm.Float32X2, v1 *arm.Float32X2)

// Floating-point Maximum Number (vector). This instruction compares corresponding vector elements in the two source SIMD&FP registers, writes the larger of the two floating-point values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmaxnmF64 VmaxnmF64
//go:noescape
func VmaxnmF64(r *arm.Float64X1, v0 *arm.Float64X1, v1 *arm.Float64X1)

// Floating-point Maximum Number (vector). This instruction compares corresponding vector elements in the two source SIMD&FP registers, writes the larger of the two floating-point values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmaxnmqF32 VmaxnmqF32
//go:noescape
func VmaxnmqF32(r *arm.Float32X4, v0 *arm.Float32X4, v1 *arm.Float32X4)

// Floating-point Maximum Number (vector). This instruction compares corresponding vector elements in the two source SIMD&FP registers, writes the larger of the two floating-point values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmaxnmqF64 VmaxnmqF64
//go:noescape
func VmaxnmqF64(r *arm.Float64X2, v0 *arm.Float64X2, v1 *arm.Float64X2)

// Floating-point Maximum Number across Vector. VmaxnmvF32 compares the two elements of v0 and writes the larger to r as a scalar. For a two-element vector ARM documents this intrinsic under the pairwise instruction (Floating-point Maximum Number Pairwise), which here reduces the single adjacent pair. All the values in this instruction are floating-point values.
//
//go:linkname VmaxnmvF32 VmaxnmvF32
//go:noescape
func VmaxnmvF32(r *arm.Float32, v0 *arm.Float32X2)

// Floating-point Maximum Number across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the largest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are floating-point values.
//
//go:linkname VmaxnmvqF32 VmaxnmvqF32
//go:noescape
func VmaxnmvqF32(r *arm.Float32, v0 *arm.Float32X4)

// Floating-point Maximum Number across Vector. VmaxnmvqF64 compares the two elements of v0 and writes the larger to r as a scalar. For a two-element vector ARM documents this intrinsic under the pairwise instruction (Floating-point Maximum Number Pairwise), which here reduces the single adjacent pair. All the values in this instruction are floating-point values.
//
//go:linkname VmaxnmvqF64 VmaxnmvqF64
//go:noescape
func VmaxnmvqF64(r *arm.Float64, v0 *arm.Float64X2)

// Signed Maximum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the larger of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmaxqS8 VmaxqS8
//go:noescape
func VmaxqS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16)

// Signed Maximum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the larger of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmaxqS16 VmaxqS16
//go:noescape
func VmaxqS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8)

// Signed Maximum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the larger of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmaxqS32 VmaxqS32
//go:noescape
func VmaxqS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4)

// Unsigned Maximum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the larger of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmaxqU8 VmaxqU8
//go:noescape
func VmaxqU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16)

// Unsigned Maximum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the larger of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmaxqU16 VmaxqU16
//go:noescape
func VmaxqU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8)

// Unsigned Maximum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the larger of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmaxqU32 VmaxqU32
//go:noescape
func VmaxqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4)

// Floating-point Maximum (vector). This instruction compares corresponding vector elements in the two source SIMD&FP registers, places the larger of each of the two floating-point values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmaxqF32 VmaxqF32
//go:noescape
func VmaxqF32(r *arm.Float32X4, v0 *arm.Float32X4, v1 *arm.Float32X4)

// Floating-point Maximum (vector). This instruction compares corresponding vector elements in the two source SIMD&FP registers, places the larger of each of the two floating-point values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmaxqF64 VmaxqF64
//go:noescape
func VmaxqF64(r *arm.Float64X2, v0 *arm.Float64X2, v1 *arm.Float64X2)

// Signed Maximum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the largest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are signed integer values.
//
//go:linkname VmaxvS8 VmaxvS8
//go:noescape
func VmaxvS8(r *arm.Int8, v0 *arm.Int8X8)

// Signed Maximum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the largest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are signed integer values.
//
//go:linkname VmaxvS16 VmaxvS16
//go:noescape
func VmaxvS16(r *arm.Int16, v0 *arm.Int16X4)

// Signed Maximum across Vector. VmaxvS32 compares the two elements of v0 and writes the larger to r as a scalar. For a two-element vector ARM documents this intrinsic under the pairwise instruction (Signed Maximum Pairwise), which here reduces the single adjacent pair. All the values in this instruction are signed integer values.
//
//go:linkname VmaxvS32 VmaxvS32
//go:noescape
func VmaxvS32(r *arm.Int32, v0 *arm.Int32X2)

// Unsigned Maximum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the largest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are unsigned integer values.
//
//go:linkname VmaxvU8 VmaxvU8
//go:noescape
func VmaxvU8(r *arm.Uint8, v0 *arm.Uint8X8)

// Unsigned Maximum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the largest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are unsigned integer values.
//
//go:linkname VmaxvU16 VmaxvU16
//go:noescape
func VmaxvU16(r *arm.Uint16, v0 *arm.Uint16X4)

// Unsigned Maximum across Vector. VmaxvU32 compares the two elements of v0 and writes the larger to r as a scalar. For a two-element vector ARM documents this intrinsic under the pairwise instruction (Unsigned Maximum Pairwise), which here reduces the single adjacent pair. All the values in this instruction are unsigned integer values.
//
//go:linkname VmaxvU32 VmaxvU32
//go:noescape
func VmaxvU32(r *arm.Uint32, v0 *arm.Uint32X2)

// Floating-point Maximum across Vector. VmaxvF32 compares the two elements of v0 and writes the larger to r as a scalar. For a two-element vector ARM documents this intrinsic under the pairwise instruction (Floating-point Maximum Pairwise), which here reduces the single adjacent pair. All the values in this instruction are floating-point values.
//
//go:linkname VmaxvF32 VmaxvF32
//go:noescape
func VmaxvF32(r *arm.Float32, v0 *arm.Float32X2)

// Signed Maximum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the largest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are signed integer values.
//
//go:linkname VmaxvqS8 VmaxvqS8
//go:noescape
func VmaxvqS8(r *arm.Int8, v0 *arm.Int8X16)

// Signed Maximum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the largest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are signed integer values.
//
//go:linkname VmaxvqS16 VmaxvqS16
//go:noescape
func VmaxvqS16(r *arm.Int16, v0 *arm.Int16X8)

// Signed Maximum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the largest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are signed integer values.
//
//go:linkname VmaxvqS32 VmaxvqS32
//go:noescape
func VmaxvqS32(r *arm.Int32, v0 *arm.Int32X4)

// Unsigned Maximum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the largest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are unsigned integer values.
//
//go:linkname VmaxvqU8 VmaxvqU8
//go:noescape
func VmaxvqU8(r *arm.Uint8, v0 *arm.Uint8X16)

// Unsigned Maximum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the largest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are unsigned integer values.
//
//go:linkname VmaxvqU16 VmaxvqU16
//go:noescape
func VmaxvqU16(r *arm.Uint16, v0 *arm.Uint16X8)

// Unsigned Maximum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the largest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are unsigned integer values.
//
//go:linkname VmaxvqU32 VmaxvqU32
//go:noescape
func VmaxvqU32(r *arm.Uint32, v0 *arm.Uint32X4)

// Floating-point Maximum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the largest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are floating-point values.
//
//go:linkname VmaxvqF32 VmaxvqF32
//go:noescape
func VmaxvqF32(r *arm.Float32, v0 *arm.Float32X4)

// Floating-point Maximum across Vector. This function compares the two vector elements in the source SIMD&FP register and writes the larger of the values as a scalar to the destination. For this two-element form the reduction is carried out by the Floating-point Maximum Pairwise instruction. All the values are floating-point values.
//
//go:linkname VmaxvqF64 VmaxvqF64
//go:noescape
func VmaxvqF64(r *arm.Float64, v0 *arm.Float64X2)
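
// Note on the Vmaxv* and Vmaxvq* reductions above (illustrative, not taken
// from the Arm reference text): each call folds every lane of v0 into one
// scalar written through r. A minimal pure-Go model of the integer case,
// assuming the lanes are viewed as a plain Go array; maxAcross is a
// hypothetical helper and not part of this package:
//
//	func maxAcross(v [4]int32) int32 {
//		m := v[0]
//		for _, x := range v[1:] {
//			if x > m {
//				m = x
//			}
//		}
//		return m
//	}
//
// The floating-point forms follow the instruction's NaN rules, which this
// integer sketch does not model.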

// Signed Minimum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the smaller of each of the two signed integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VminS8 VminS8
//go:noescape
func VminS8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8)

// Signed Minimum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the smaller of each of the two signed integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VminS16 VminS16
//go:noescape
func VminS16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4)

// Signed Minimum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the smaller of each of the two signed integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VminS32 VminS32
//go:noescape
func VminS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2)

// Unsigned Minimum (vector). This instruction compares corresponding vector elements in the two source SIMD&FP registers, places the smaller of each of the two unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VminU8 VminU8
//go:noescape
func VminU8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8)

// Unsigned Minimum (vector). This instruction compares corresponding vector elements in the two source SIMD&FP registers, places the smaller of each of the two unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VminU16 VminU16
//go:noescape
func VminU16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4)

// Unsigned Minimum (vector). This instruction compares corresponding vector elements in the two source SIMD&FP registers, places the smaller of each of the two unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VminU32 VminU32
//go:noescape
func VminU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2)

// Floating-point minimum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the smaller of each of the two floating-point values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VminF32 VminF32
//go:noescape
func VminF32(r *arm.Float32X2, v0 *arm.Float32X2, v1 *arm.Float32X2)

// Floating-point minimum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the smaller of each of the two floating-point values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VminF64 VminF64
//go:noescape
func VminF64(r *arm.Float64X1, v0 *arm.Float64X1, v1 *arm.Float64X1)

// Floating-point Minimum Number (vector). This instruction compares corresponding vector elements in the two source SIMD&FP registers, writes the smaller of the two floating-point values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VminnmF32 VminnmF32
//go:noescape
func VminnmF32(r *arm.Float32X2, v0 *arm.Float32X2, v1 *arm.Float32X2)

// Floating-point Minimum Number (vector). This instruction compares corresponding vector elements in the two source SIMD&FP registers, writes the smaller of the two floating-point values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VminnmF64 VminnmF64
//go:noescape
func VminnmF64(r *arm.Float64X1, v0 *arm.Float64X1, v1 *arm.Float64X1)

// Floating-point Minimum Number (vector). This instruction compares corresponding vector elements in the two source SIMD&FP registers, writes the smaller of the two floating-point values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VminnmqF32 VminnmqF32
//go:noescape
func VminnmqF32(r *arm.Float32X4, v0 *arm.Float32X4, v1 *arm.Float32X4)

// Floating-point Minimum Number (vector). This instruction compares corresponding vector elements in the two source SIMD&FP registers, writes the smaller of the two floating-point values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VminnmqF64 VminnmqF64
//go:noescape
func VminnmqF64(r *arm.Float64X2, v0 *arm.Float64X2, v1 *arm.Float64X2)
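
// Note on Vmin* versus Vminnm* for floating-point lanes (illustrative sketch,
// not taken from the Arm reference text): the two families differ mainly in
// how a NaN operand is treated. A per-lane model in pure Go, assuming quiet
// NaNs only; fmin and fminnm are hypothetical helpers and not part of this
// package, and the sketch needs the standard "math" package:
//
//	// fmin models a Vmin* lane: a NaN in either operand yields NaN.
//	func fmin(a, b float64) float64 {
//		return math.Min(a, b)
//	}
//
//	// fminnm models a Vminnm* lane: a single quiet NaN is ignored and
//	// the numeric operand is returned.
//	func fminnm(a, b float64) float64 {
//		if math.IsNaN(a) {
//			return b
//		}
//		if math.IsNaN(b) {
//			return a
//		}
//		return math.Min(a, b)
//	}
//
// Go does not distinguish signaling from quiet NaNs, so signaling-NaN
// behaviour is outside what this sketch can show.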

// Floating-point Minimum Number across Vector. This function compares the two vector elements in the source SIMD&FP register and writes the smaller of the values as a scalar to the destination. For this two-element form the reduction is carried out by the Floating-point Minimum Number Pairwise instruction. All the values are floating-point values.
//
//go:linkname VminnmvF32 VminnmvF32
//go:noescape
func VminnmvF32(r *arm.Float32, v0 *arm.Float32X2)

// Floating-point Minimum Number across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the smallest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are floating-point values.
//
//go:linkname VminnmvqF32 VminnmvqF32
//go:noescape
func VminnmvqF32(r *arm.Float32, v0 *arm.Float32X4)

// Floating-point Minimum Number across Vector. This function compares the two vector elements in the source SIMD&FP register and writes the smaller of the values as a scalar to the destination. For this two-element form the reduction is carried out by the Floating-point Minimum Number Pairwise instruction. All the values are floating-point values.
//
//go:linkname VminnmvqF64 VminnmvqF64
//go:noescape
func VminnmvqF64(r *arm.Float64, v0 *arm.Float64X2)

// Signed Minimum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the smaller of each of the two signed integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VminqS8 VminqS8
//go:noescape
func VminqS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16)

// Signed Minimum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the smaller of each of the two signed integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VminqS16 VminqS16
//go:noescape
func VminqS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8)

// Signed Minimum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the smaller of each of the two signed integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VminqS32 VminqS32
//go:noescape
func VminqS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4)

// Unsigned Minimum (vector). This instruction compares corresponding vector elements in the two source SIMD&FP registers, places the smaller of each of the two unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VminqU8 VminqU8
//go:noescape
func VminqU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16)

// Unsigned Minimum (vector). This instruction compares corresponding vector elements in the two source SIMD&FP registers, places the smaller of each of the two unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VminqU16 VminqU16
//go:noescape
func VminqU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8)

// Unsigned Minimum (vector). This instruction compares corresponding vector elements in the two source SIMD&FP registers, places the smaller of each of the two unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VminqU32 VminqU32
//go:noescape
func VminqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4)

// Floating-point minimum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the smaller of each of the two floating-point values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VminqF32 VminqF32
//go:noescape
func VminqF32(r *arm.Float32X4, v0 *arm.Float32X4, v1 *arm.Float32X4)

// Floating-point minimum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the smaller of each of the two floating-point values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VminqF64 VminqF64
//go:noescape
func VminqF64(r *arm.Float64X2, v0 *arm.Float64X2, v1 *arm.Float64X2)

// Signed Minimum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the smallest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are signed integer values.
//
//go:linkname VminvS8 VminvS8
//go:noescape
func VminvS8(r *arm.Int8, v0 *arm.Int8X8)

// Signed Minimum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the smallest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are signed integer values.
//
//go:linkname VminvS16 VminvS16
//go:noescape
func VminvS16(r *arm.Int16, v0 *arm.Int16X4)

// Signed Minimum across Vector. This function compares the two vector elements in the source SIMD&FP register and writes the smaller of the values as a scalar to the destination. For this two-element form the reduction is carried out by the Signed Minimum Pairwise instruction. All the values are signed integer values.
//
//go:linkname VminvS32 VminvS32
//go:noescape
func VminvS32(r *arm.Int32, v0 *arm.Int32X2)

// Unsigned Minimum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the smallest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are unsigned integer values.
//
//go:linkname VminvU8 VminvU8
//go:noescape
func VminvU8(r *arm.Uint8, v0 *arm.Uint8X8)

// Unsigned Minimum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the smallest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are unsigned integer values.
//
//go:linkname VminvU16 VminvU16
//go:noescape
func VminvU16(r *arm.Uint16, v0 *arm.Uint16X4)

// Unsigned Minimum across Vector. This function compares the two vector elements in the source SIMD&FP register and writes the smaller of the values as a scalar to the destination. For this two-element form the reduction is carried out by the Unsigned Minimum Pairwise instruction. All the values are unsigned integer values.
//
//go:linkname VminvU32 VminvU32
//go:noescape
func VminvU32(r *arm.Uint32, v0 *arm.Uint32X2)

// Floating-point Minimum across Vector. This function compares the two vector elements in the source SIMD&FP register and writes the smaller of the values as a scalar to the destination. For this two-element form the reduction is carried out by the Floating-point Minimum Pairwise instruction. All the values are floating-point values.
//
//go:linkname VminvF32 VminvF32
//go:noescape
func VminvF32(r *arm.Float32, v0 *arm.Float32X2)

// Signed Minimum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the smallest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are signed integer values.
//
//go:linkname VminvqS8 VminvqS8
//go:noescape
func VminvqS8(r *arm.Int8, v0 *arm.Int8X16)

// Signed Minimum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the smallest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are signed integer values.
//
//go:linkname VminvqS16 VminvqS16
//go:noescape
func VminvqS16(r *arm.Int16, v0 *arm.Int16X8)

// Signed Minimum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the smallest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are signed integer values.
//
//go:linkname VminvqS32 VminvqS32
//go:noescape
func VminvqS32(r *arm.Int32, v0 *arm.Int32X4)

// Unsigned Minimum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the smallest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are unsigned integer values.
//
//go:linkname VminvqU8 VminvqU8
//go:noescape
func VminvqU8(r *arm.Uint8, v0 *arm.Uint8X16)

// Unsigned Minimum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the smallest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are unsigned integer values.
//
//go:linkname VminvqU16 VminvqU16
//go:noescape
func VminvqU16(r *arm.Uint16, v0 *arm.Uint16X8)

// Unsigned Minimum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the smallest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are unsigned integer values.
//
//go:linkname VminvqU32 VminvqU32
//go:noescape
func VminvqU32(r *arm.Uint32, v0 *arm.Uint32X4)

// Floating-point Minimum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the smallest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are floating-point values.
//
//go:linkname VminvqF32 VminvqF32
//go:noescape
func VminvqF32(r *arm.Float32, v0 *arm.Float32X4)

// Floating-point Minimum across Vector. This function compares the two vector elements in the source SIMD&FP register and writes the smaller of the values as a scalar to the destination. For this two-element form the reduction is carried out by the Floating-point Minimum Pairwise instruction. All the values are floating-point values.
//
//go:linkname VminvqF64 VminvqF64
//go:noescape
func VminvqF64(r *arm.Float64, v0 *arm.Float64X2)

// Multiply-Add to accumulator (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, and accumulates the results with the vector elements of the destination SIMD&FP register.
//
//go:linkname VmlaS8 VmlaS8
//go:noescape
func VmlaS8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8, v2 *arm.Int8X8)

// Multiply-Add to accumulator (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, and accumulates the results with the vector elements of the destination SIMD&FP register.
//
//go:linkname VmlaS16 VmlaS16
//go:noescape
func VmlaS16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4, v2 *arm.Int16X4)

// Multiply-Add to accumulator (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, and accumulates the results with the vector elements of the destination SIMD&FP register.
//
//go:linkname VmlaS32 VmlaS32
//go:noescape
func VmlaS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2, v2 *arm.Int32X2)

// Multiply-Add to accumulator (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, and accumulates the results with the vector elements of the destination SIMD&FP register.
//
//go:linkname VmlaU8 VmlaU8
//go:noescape
func VmlaU8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8, v2 *arm.Uint8X8)

// Multiply-Add to accumulator (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, and accumulates the results with the vector elements of the destination SIMD&FP register.
//
//go:linkname VmlaU16 VmlaU16
//go:noescape
func VmlaU16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4, v2 *arm.Uint16X4)

// Multiply-Add to accumulator (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, and accumulates the results with the vector elements of the destination SIMD&FP register.
//
//go:linkname VmlaU32 VmlaU32
//go:noescape
func VmlaU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2, v2 *arm.Uint32X2)

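// Floating-point multiply-add to accumulator. Multiplies corresponding elements of v1 and v2 and adds each product to the corresponding element of v0, writing the result to r.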
//
//go:linkname VmlaF32 VmlaF32
//go:noescape
func VmlaF32(r *arm.Float32X2, v0 *arm.Float32X2, v1 *arm.Float32X2, v2 *arm.Float32X2)

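// Floating-point multiply-add to accumulator. Multiplies corresponding elements of v1 and v2 and adds each product to the corresponding element of v0, writing the result to r.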
//
//go:linkname VmlaF64 VmlaF64
//go:noescape
func VmlaF64(r *arm.Float64X1, v0 *arm.Float64X1, v1 *arm.Float64X1, v2 *arm.Float64X1)

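// Vector multiply accumulate with scalar. Multiplies each element of v1 by the scalar v2 and adds each product to the corresponding element of v0, writing the result to r.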
//
//go:linkname VmlaNS16 VmlaNS16
//go:noescape
func VmlaNS16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4, v2 *arm.Int16)

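// Vector multiply accumulate with scalar. Multiplies each element of v1 by the scalar v2 and adds each product to the corresponding element of v0, writing the result to r.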
//
//go:linkname VmlaNS32 VmlaNS32
//go:noescape
func VmlaNS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2, v2 *arm.Int32)

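// Vector multiply accumulate with scalar. Multiplies each element of v1 by the scalar v2 and adds each product to the corresponding element of v0, writing the result to r.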
//
//go:linkname VmlaNU16 VmlaNU16
//go:noescape
func VmlaNU16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4, v2 *arm.Uint16)

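// Vector multiply accumulate with scalar. Multiplies each element of v1 by the scalar v2 and adds each product to the corresponding element of v0, writing the result to r.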
//
//go:linkname VmlaNU32 VmlaNU32
//go:noescape
func VmlaNU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2, v2 *arm.Uint32)

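// Vector multiply accumulate with scalar. Multiplies each element of v1 by the scalar v2 and adds each product to the corresponding element of v0, writing the result to r.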
//
//go:linkname VmlaNF32 VmlaNF32
//go:noescape
func VmlaNF32(r *arm.Float32X2, v0 *arm.Float32X2, v1 *arm.Float32X2, v2 *arm.Float32)
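
// Note on the Vmla* and VmlaN* functions above (illustrative sketch, not
// taken from the Arm reference text): the first source operand is the
// accumulator, and the remaining two are multiplied lane by lane. A pure-Go
// model of the 8-lane signed case, assuming the lanes are viewed as plain Go
// arrays; mla is a hypothetical helper and not part of this package:
//
//	func mla(acc, a, b [8]int8) [8]int8 {
//		var r [8]int8
//		for i := range r {
//			// int8 arithmetic wraps on overflow, as the integer lanes do.
//			r[i] = acc[i] + a[i]*b[i]
//		}
//		return r
//	}
//
// For the N variants the same model applies with every b[i] replaced by the
// single scalar operand.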

// Signed Multiply-Add Long (vector). This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, and accumulates the results with the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.
//
//go:linkname VmlalS8 VmlalS8
//go:noescape
func VmlalS8(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int8X8, v2 *arm.Int8X8)

// Signed Multiply-Add Long (vector). This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, and accumulates the results with the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.
//
//go:linkname VmlalS16 VmlalS16
//go:noescape
func VmlalS16(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int16X4, v2 *arm.Int16X4)

// Signed Multiply-Add Long (vector). This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, and accumulates the results with the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.
//
//go:linkname VmlalS32 VmlalS32
//go:noescape
func VmlalS32(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int32X2, v2 *arm.Int32X2)

// Unsigned Multiply-Add Long (vector). This instruction multiplies the vector elements in the lower or upper half of the first source SIMD&FP register by the corresponding vector elements of the second source SIMD&FP register, and accumulates the results with the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.
//
//go:linkname VmlalU8 VmlalU8
//go:noescape
func VmlalU8(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint8X8, v2 *arm.Uint8X8)

// Unsigned Multiply-Add Long (vector). This instruction multiplies the vector elements in the lower or upper half of the first source SIMD&FP register by the corresponding vector elements of the second source SIMD&FP register, and accumulates the results with the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.
//
//go:linkname VmlalU16 VmlalU16
//go:noescape
func VmlalU16(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint16X4, v2 *arm.Uint16X4)

// Unsigned Multiply-Add Long (vector). This instruction multiplies the vector elements in the lower or upper half of the first source SIMD&FP register by the corresponding vector elements of the second source SIMD&FP register, and accumulates the results with the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.
//
//go:linkname VmlalU32 VmlalU32
//go:noescape
func VmlalU32(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint32X2, v2 *arm.Uint32X2)
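
// Note on the Vmlal* functions above (illustrative sketch, not taken from
// the Arm reference text): the multiplicands are widened before the multiply,
// so each product is formed at the accumulator's width. A pure-Go model of
// the signed 8-bit case, assuming the lanes are viewed as plain Go arrays;
// mlal is a hypothetical helper and not part of this package:
//
//	func mlal(acc [8]int16, a, b [8]int8) [8]int16 {
//		var r [8]int16
//		for i := range r {
//			r[i] = acc[i] + int16(a[i])*int16(b[i])
//		}
//		return r
//	}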

// Signed Multiply-Add Long (vector). This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, and accumulates the results with the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.
//
//go:linkname VmlalHighS8 VmlalHighS8
//go:noescape
func VmlalHighS8(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int8X16, v2 *arm.Int8X16)

// Signed Multiply-Add Long (vector). This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, and accumulates the results with the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.
//
//go:linkname VmlalHighS16 VmlalHighS16
//go:noescape
func VmlalHighS16(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int16X8, v2 *arm.Int16X8)

// Signed Multiply-Add Long (vector). This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, and accumulates the results with the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.
//
//go:linkname VmlalHighS32 VmlalHighS32
//go:noescape
func VmlalHighS32(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int32X4, v2 *arm.Int32X4)

// Unsigned Multiply-Add Long (vector). This instruction multiplies the vector elements in the lower or upper half of the first source SIMD&FP register by the corresponding vector elements of the second source SIMD&FP register, and accumulates the results with the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.
//
//go:linkname VmlalHighU8 VmlalHighU8
//go:noescape
func VmlalHighU8(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint8X16, v2 *arm.Uint8X16)

// Unsigned Multiply-Add Long (vector). This instruction multiplies the vector elements in the lower or upper half of the first source SIMD&FP register by the corresponding vector elements of the second source SIMD&FP register, and accumulates the results with the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.
//
//go:linkname VmlalHighU16 VmlalHighU16
//go:noescape
func VmlalHighU16(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint16X8, v2 *arm.Uint16X8)

// Unsigned Multiply-Add Long (vector). This instruction multiplies the vector elements in the lower or upper half of the first source SIMD&FP register by the corresponding vector elements of the second source SIMD&FP register, and accumulates the results with the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.
//
//go:linkname VmlalHighU32 VmlalHighU32
//go:noescape
func VmlalHighU32(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint32X4, v2 *arm.Uint32X4)
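
// Note on the VmlalHigh* functions above (illustrative sketch, not taken
// from the Arm reference text): these take full 128-bit multiplicands and
// use only their upper halves. A pure-Go model of the signed 8-bit case,
// assuming the lanes are viewed as plain Go arrays; mlalHigh is a
// hypothetical helper and not part of this package:
//
//	func mlalHigh(acc [8]int16, a, b [16]int8) [8]int16 {
//		var r [8]int16
//		for i := range r {
//			// Lanes 8..15 form the upper half of each source vector.
//			r[i] = acc[i] + int16(a[i+8])*int16(b[i+8])
//		}
//		return r
//	}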

// Signed Multiply-Add Long (by scalar). This function multiplies each signed integer value in the upper half of v1 by the scalar v2, and accumulates the results with the vector elements of v0, writing the result to r. The destination vector elements are twice as long as the elements that are multiplied.
//
//go:linkname VmlalHighNS16 VmlalHighNS16
//go:noescape
func VmlalHighNS16(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int16X8, v2 *arm.Int16)

// Signed Multiply-Add Long (by scalar). This function multiplies each signed integer value in the upper half of v1 by the scalar v2, and accumulates the results with the vector elements of v0, writing the result to r. The destination vector elements are twice as long as the elements that are multiplied.
//
//go:linkname VmlalHighNS32 VmlalHighNS32
//go:noescape
func VmlalHighNS32(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int32X4, v2 *arm.Int32)

// Unsigned Multiply-Add Long (by scalar). This function multiplies each unsigned integer value in the upper half of v1 by the scalar v2, and accumulates the results with the vector elements of v0, writing the result to r. The destination vector elements are twice as long as the elements that are multiplied.
//
//go:linkname VmlalHighNU16 VmlalHighNU16
//go:noescape
func VmlalHighNU16(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint16X8, v2 *arm.Uint16)

// Unsigned Multiply-Add Long (by scalar). This function multiplies each unsigned integer value in the upper half of v1 by the scalar v2, and accumulates the results with the vector elements of v0, writing the result to r. The destination vector elements are twice as long as the elements that are multiplied.
//
//go:linkname VmlalHighNU32 VmlalHighNU32
//go:noescape
func VmlalHighNU32(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint32X4, v2 *arm.Uint32)

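// Vector widening multiply accumulate with scalar. Multiplies each element of v1 by the scalar v2, producing products twice as wide as the source elements, and adds each product to the corresponding element of v0, writing the result to r.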
//
//go:linkname VmlalNS16 VmlalNS16
//go:noescape
func VmlalNS16(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int16X4, v2 *arm.Int16)

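// Vector widening multiply accumulate with scalar. Multiplies each element of v1 by the scalar v2, producing products twice as wide as the source elements, and adds each product to the corresponding element of v0, writing the result to r.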
//
//go:linkname VmlalNS32 VmlalNS32
//go:noescape
func VmlalNS32(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int32X2, v2 *arm.Int32)

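// Vector widening multiply accumulate with scalar. Multiplies each element of v1 by the scalar v2, producing products twice as wide as the source elements, and adds each product to the corresponding element of v0, writing the result to r.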
//
//go:linkname VmlalNU16 VmlalNU16
//go:noescape
func VmlalNU16(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint16X4, v2 *arm.Uint16)

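// Vector widening multiply accumulate with scalar. Multiplies each element of v1 by the scalar v2, producing products twice as wide as the source elements, and adds each product to the corresponding element of v0, writing the result to r.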
//
//go:linkname VmlalNU32 VmlalNU32
//go:noescape
func VmlalNU32(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint32X2, v2 *arm.Uint32)

// Multiply-Add to accumulator (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, and accumulates the results with the vector elements of the destination SIMD&FP register.
//
//go:linkname VmlaqS8 VmlaqS8
//go:noescape
func VmlaqS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16, v2 *arm.Int8X16)

// Multiply-Add to accumulator (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, and accumulates the results with the vector elements of the destination SIMD&FP register.
//
//go:linkname VmlaqS16 VmlaqS16
//go:noescape
func VmlaqS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8, v2 *arm.Int16X8)

// Multiply-Add to accumulator (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, and accumulates the results with the vector elements of the destination SIMD&FP register.
//
//go:linkname VmlaqS32 VmlaqS32
//go:noescape
func VmlaqS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4, v2 *arm.Int32X4)

// Multiply-Add to accumulator (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, and accumulates the results with the vector elements of the destination SIMD&FP register.
//
//go:linkname VmlaqU8 VmlaqU8
//go:noescape
func VmlaqU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16, v2 *arm.Uint8X16)

// Multiply-Add to accumulator (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, and accumulates the results with the vector elements of the destination SIMD&FP register.
//
//go:linkname VmlaqU16 VmlaqU16
//go:noescape
func VmlaqU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8, v2 *arm.Uint16X8)

// Multiply-Add to accumulator (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, and accumulates the results with the vector elements of the destination SIMD&FP register.
//
//go:linkname VmlaqU32 VmlaqU32
//go:noescape
func VmlaqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4, v2 *arm.Uint32X4)

// Floating-point multiply-add to accumulator
//
//go:linkname VmlaqF32 VmlaqF32
//go:noescape
func VmlaqF32(r *arm.Float32X4, v0 *arm.Float32X4, v1 *arm.Float32X4, v2 *arm.Float32X4)

// Floating-point multiply-add to accumulator
//
//go:linkname VmlaqF64 VmlaqF64
//go:noescape
func VmlaqF64(r *arm.Float64X2, v0 *arm.Float64X2, v1 *arm.Float64X2, v2 *arm.Float64X2)

// Vector multiply accumulate with scalar
//
//go:linkname VmlaqNS16 VmlaqNS16
//go:noescape
func VmlaqNS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8, v2 *arm.Int16)

// Vector multiply accumulate with scalar
//
//go:linkname VmlaqNS32 VmlaqNS32
//go:noescape
func VmlaqNS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4, v2 *arm.Int32)

// Vector multiply accumulate with scalar
//
//go:linkname VmlaqNU16 VmlaqNU16
//go:noescape
func VmlaqNU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8, v2 *arm.Uint16)

// Vector multiply accumulate with scalar
//
//go:linkname VmlaqNU32 VmlaqNU32
//go:noescape
func VmlaqNU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4, v2 *arm.Uint32)

// Vector multiply accumulate with scalar
//
//go:linkname VmlaqNF32 VmlaqNF32
//go:noescape
func VmlaqNF32(r *arm.Float32X4, v0 *arm.Float32X4, v1 *arm.Float32X4, v2 *arm.Float32)

// Multiply-Subtract from accumulator (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, and subtracts the results from the vector elements of the destination SIMD&FP register.
//
//go:linkname VmlsS8 VmlsS8
//go:noescape
func VmlsS8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8, v2 *arm.Int8X8)

// Multiply-Subtract from accumulator (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, and subtracts the results from the vector elements of the destination SIMD&FP register.
//
//go:linkname VmlsS16 VmlsS16
//go:noescape
func VmlsS16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4, v2 *arm.Int16X4)

// Multiply-Subtract from accumulator (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, and subtracts the results from the vector elements of the destination SIMD&FP register.
//
//go:linkname VmlsS32 VmlsS32
//go:noescape
func VmlsS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2, v2 *arm.Int32X2)

// Multiply-Subtract from accumulator (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, and subtracts the results from the vector elements of the destination SIMD&FP register.
//
//go:linkname VmlsU8 VmlsU8
//go:noescape
func VmlsU8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8, v2 *arm.Uint8X8)

// Multiply-Subtract from accumulator (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, and subtracts the results from the vector elements of the destination SIMD&FP register.
//
//go:linkname VmlsU16 VmlsU16
//go:noescape
func VmlsU16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4, v2 *arm.Uint16X4)

// Multiply-Subtract from accumulator (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, and subtracts the results from the vector elements of the destination SIMD&FP register.
//
//go:linkname VmlsU32 VmlsU32
//go:noescape
func VmlsU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2, v2 *arm.Uint32X2)
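
/*
For the integer multiply-subtract forms, the result keeps the element width,
so both the product and the subtraction wrap modulo 2^n. The sketch below is a
scalar model only; mlsS8 is a hypothetical name and the code assumes the
lane-wise semantics described for VmlsS8 (acc[i] - b[i]*c[i]).

```go
package main

import "fmt"

// mlsS8 is a hypothetical pure-Go model of an 8-lane multiply-subtract:
// r[i] = acc[i] - b[i]*c[i], with int8 wraparound on every step.
func mlsS8(acc, b, c [8]int8) [8]int8 {
	var r [8]int8
	for i := range r {
		r[i] = acc[i] - b[i]*c[i]
	}
	return r
}

func main() {
	acc := [8]int8{10, 10, 0, -128, 1, 2, 3, 4}
	b := [8]int8{100, 3, 5, 1, 0, 0, 0, 0}
	c := [8]int8{2, 4, -5, 1, 9, 9, 9, 9}
	// Lane 0: 100*2 wraps to -56, so 10 - (-56) = 66.
	// Lane 3: -128 - 1 wraps to 127.
	fmt.Println(mlsS8(acc, b, c)) // [66 -2 25 127 1 2 3 4]
}
```
*/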

// Multiply-subtract from accumulator
//
//go:linkname VmlsF32 VmlsF32
//go:noescape
func VmlsF32(r *arm.Float32X2, v0 *arm.Float32X2, v1 *arm.Float32X2, v2 *arm.Float32X2)

// Multiply-subtract from accumulator
//
//go:linkname VmlsF64 VmlsF64
//go:noescape
func VmlsF64(r *arm.Float64X1, v0 *arm.Float64X1, v1 *arm.Float64X1, v2 *arm.Float64X1)

// Vector multiply subtract with scalar
//
//go:linkname VmlsNS16 VmlsNS16
//go:noescape
func VmlsNS16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4, v2 *arm.Int16)

// Vector multiply subtract with scalar
//
//go:linkname VmlsNS32 VmlsNS32
//go:noescape
func VmlsNS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2, v2 *arm.Int32)

// Vector multiply subtract with scalar
//
//go:linkname VmlsNU16 VmlsNU16
//go:noescape
func VmlsNU16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4, v2 *arm.Uint16)

// Vector multiply subtract with scalar
//
//go:linkname VmlsNU32 VmlsNU32
//go:noescape
func VmlsNU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2, v2 *arm.Uint32)

// Vector multiply subtract with scalar
//
//go:linkname VmlsNF32 VmlsNF32
//go:noescape
func VmlsNF32(r *arm.Float32X2, v0 *arm.Float32X2, v1 *arm.Float32X2, v2 *arm.Float32)

// Signed Multiply-Subtract Long (vector). This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, and subtracts the results from the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.
//
//go:linkname VmlslS8 VmlslS8
//go:noescape
func VmlslS8(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int8X8, v2 *arm.Int8X8)

// Signed Multiply-Subtract Long (vector). This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, and subtracts the results from the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.
//
//go:linkname VmlslS16 VmlslS16
//go:noescape
func VmlslS16(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int16X4, v2 *arm.Int16X4)

// Signed Multiply-Subtract Long (vector). This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, and subtracts the results from the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.
//
//go:linkname VmlslS32 VmlslS32
//go:noescape
func VmlslS32(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int32X2, v2 *arm.Int32X2)

// Unsigned Multiply-Subtract Long (vector). This instruction multiplies corresponding vector elements in the lower or upper half of the two source SIMD&FP registers, and subtracts the results from the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied. All the values in this instruction are unsigned integer values.
//
//go:linkname VmlslU8 VmlslU8
//go:noescape
func VmlslU8(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint8X8, v2 *arm.Uint8X8)

// Unsigned Multiply-Subtract Long (vector). This instruction multiplies corresponding vector elements in the lower or upper half of the two source SIMD&FP registers, and subtracts the results from the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied. All the values in this instruction are unsigned integer values.
//
//go:linkname VmlslU16 VmlslU16
//go:noescape
func VmlslU16(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint16X4, v2 *arm.Uint16X4)

// Unsigned Multiply-Subtract Long (vector). This instruction multiplies corresponding vector elements in the lower or upper half of the two source SIMD&FP registers, and subtracts the results from the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied. All the values in this instruction are unsigned integer values.
//
//go:linkname VmlslU32 VmlslU32
//go:noescape
func VmlslU32(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint32X2, v2 *arm.Uint32X2)

// Signed Multiply-Subtract Long (vector). This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, and subtracts the results from the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.
//
//go:linkname VmlslHighS8 VmlslHighS8
//go:noescape
func VmlslHighS8(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int8X16, v2 *arm.Int8X16)

// Signed Multiply-Subtract Long (vector). This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, and subtracts the results from the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.
//
//go:linkname VmlslHighS16 VmlslHighS16
//go:noescape
func VmlslHighS16(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int16X8, v2 *arm.Int16X8)

// Signed Multiply-Subtract Long (vector). This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, and subtracts the results from the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.
//
//go:linkname VmlslHighS32 VmlslHighS32
//go:noescape
func VmlslHighS32(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int32X4, v2 *arm.Int32X4)

// Unsigned Multiply-Subtract Long (vector). This instruction multiplies corresponding vector elements in the lower or upper half of the two source SIMD&FP registers, and subtracts the results from the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied. All the values in this instruction are unsigned integer values.
//
//go:linkname VmlslHighU8 VmlslHighU8
//go:noescape
func VmlslHighU8(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint8X16, v2 *arm.Uint8X16)

// Unsigned Multiply-Subtract Long (vector). This instruction multiplies corresponding vector elements in the lower or upper half of the two source SIMD&FP registers, and subtracts the results from the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied. All the values in this instruction are unsigned integer values.
//
//go:linkname VmlslHighU16 VmlslHighU16
//go:noescape
func VmlslHighU16(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint16X8, v2 *arm.Uint16X8)

// Unsigned Multiply-Subtract Long (vector). This instruction multiplies corresponding vector elements in the lower or upper half of the two source SIMD&FP registers, and subtracts the results from the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied. All the values in this instruction are unsigned integer values.
//
//go:linkname VmlslHighU32 VmlslHighU32
//go:noescape
func VmlslHighU32(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint32X4, v2 *arm.Uint32X4)

// Signed Multiply-Subtract Long (vector). This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, and subtracts the results from the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.
//
//go:linkname VmlslHighNS16 VmlslHighNS16
//go:noescape
func VmlslHighNS16(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int16X8, v2 *arm.Int16)

// Signed Multiply-Subtract Long (vector). This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, and subtracts the results from the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.
//
//go:linkname VmlslHighNS32 VmlslHighNS32
//go:noescape
func VmlslHighNS32(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int32X4, v2 *arm.Int32)

// Unsigned Multiply-Subtract Long (vector). This instruction multiplies corresponding vector elements in the lower or upper half of the two source SIMD&FP registers, and subtracts the results from the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied. All the values in this instruction are unsigned integer values.
//
//go:linkname VmlslHighNU16 VmlslHighNU16
//go:noescape
func VmlslHighNU16(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint16X8, v2 *arm.Uint16)

// Unsigned Multiply-Subtract Long (vector). This instruction multiplies corresponding vector elements in the lower or upper half of the two source SIMD&FP registers, and subtracts the results from the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied. All the values in this instruction are unsigned integer values.
//
//go:linkname VmlslHighNU32 VmlslHighNU32
//go:noescape
func VmlslHighNU32(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint32X4, v2 *arm.Uint32)

// Vector widening multiply subtract with scalar
//
//go:linkname VmlslNS16 VmlslNS16
//go:noescape
func VmlslNS16(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int16X4, v2 *arm.Int16)

// Vector widening multiply subtract with scalar
//
//go:linkname VmlslNS32 VmlslNS32
//go:noescape
func VmlslNS32(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int32X2, v2 *arm.Int32)

// Vector widening multiply subtract with scalar
//
//go:linkname VmlslNU16 VmlslNU16
//go:noescape
func VmlslNU16(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint16X4, v2 *arm.Uint16)

// Vector widening multiply subtract with scalar
//
//go:linkname VmlslNU32 VmlslNU32
//go:noescape
func VmlslNU32(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint32X2, v2 *arm.Uint32)

// Multiply-Subtract from accumulator (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, and subtracts the results from the vector elements of the destination SIMD&FP register.
//
//go:linkname VmlsqS8 VmlsqS8
//go:noescape
func VmlsqS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16, v2 *arm.Int8X16)

// Multiply-Subtract from accumulator (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, and subtracts the results from the vector elements of the destination SIMD&FP register.
//
//go:linkname VmlsqS16 VmlsqS16
//go:noescape
func VmlsqS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8, v2 *arm.Int16X8)

// Multiply-Subtract from accumulator (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, and subtracts the results from the vector elements of the destination SIMD&FP register.
//
//go:linkname VmlsqS32 VmlsqS32
//go:noescape
func VmlsqS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4, v2 *arm.Int32X4)

// Multiply-Subtract from accumulator (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, and subtracts the results from the vector elements of the destination SIMD&FP register.
//
//go:linkname VmlsqU8 VmlsqU8
//go:noescape
func VmlsqU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16, v2 *arm.Uint8X16)

// Multiply-Subtract from accumulator (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, and subtracts the results from the vector elements of the destination SIMD&FP register.
//
//go:linkname VmlsqU16 VmlsqU16
//go:noescape
func VmlsqU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8, v2 *arm.Uint16X8)

// Multiply-Subtract from accumulator (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, and subtracts the results from the vector elements of the destination SIMD&FP register.
//
//go:linkname VmlsqU32 VmlsqU32
//go:noescape
func VmlsqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4, v2 *arm.Uint32X4)

// Multiply-subtract from accumulator
//
//go:linkname VmlsqF32 VmlsqF32
//go:noescape
func VmlsqF32(r *arm.Float32X4, v0 *arm.Float32X4, v1 *arm.Float32X4, v2 *arm.Float32X4)

// Multiply-subtract from accumulator
//
//go:linkname VmlsqF64 VmlsqF64
//go:noescape
func VmlsqF64(r *arm.Float64X2, v0 *arm.Float64X2, v1 *arm.Float64X2, v2 *arm.Float64X2)

// Vector multiply subtract with scalar
//
//go:linkname VmlsqNS16 VmlsqNS16
//go:noescape
func VmlsqNS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8, v2 *arm.Int16)

// Vector multiply subtract with scalar
//
//go:linkname VmlsqNS32 VmlsqNS32
//go:noescape
func VmlsqNS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4, v2 *arm.Int32)

// Vector multiply subtract with scalar
//
//go:linkname VmlsqNU16 VmlsqNU16
//go:noescape
func VmlsqNU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8, v2 *arm.Uint16)

// Vector multiply subtract with scalar
//
//go:linkname VmlsqNU32 VmlsqNU32
//go:noescape
func VmlsqNU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4, v2 *arm.Uint32)

// Vector multiply subtract with scalar
//
//go:linkname VmlsqNF32 VmlsqNF32
//go:noescape
func VmlsqNF32(r *arm.Float32X4, v0 *arm.Float32X4, v1 *arm.Float32X4, v2 *arm.Float32)

// Signed 8-bit integer matrix multiply-accumulate. This instruction multiplies the 2x8 matrix of signed 8-bit integer values in the first source vector by the 8x2 matrix of signed 8-bit integer values in the second source vector. The resulting 2x2 32-bit integer matrix product is destructively added to the 32-bit integer matrix accumulator in the destination vector. This is equivalent to performing an 8-way dot product per destination element.
//
//go:linkname VmmlaqS32 VmmlaqS32
//go:noescape
func VmmlaqS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int8X16, v2 *arm.Int8X16)

// Unsigned 8-bit integer matrix multiply-accumulate. This instruction multiplies the 2x8 matrix of unsigned 8-bit integer values in the first source vector by the 8x2 matrix of unsigned 8-bit integer values in the second source vector. The resulting 2x2 32-bit integer matrix product is destructively added to the 32-bit integer matrix accumulator in the destination vector. This is equivalent to performing an 8-way dot product per destination element.
//
//go:linkname VmmlaqU32 VmmlaqU32
//go:noescape
func VmmlaqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint8X16, v2 *arm.Uint8X16)
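
/*
The matrix multiply-accumulate description above can be read as four 8-way
dot products. The sketch below is a scalar model, not a call into this
package; mmlaS32 is a hypothetical helper, and the lane layout is an
assumption: the first source holds the 2x8 matrix row by row, the second
source holds the 8x2 matrix column by column, and the accumulator holds the
2x2 result row by row.

```go
package main

import "fmt"

// mmlaS32 is a hypothetical pure-Go model of a signed 8-bit matrix
// multiply-accumulate: r[2*i+j] = acc[2*i+j] + dot(row i of a, column j of b).
func mmlaS32(acc [4]int32, a, b [16]int8) [4]int32 {
	r := acc
	for i := 0; i < 2; i++ {
		for j := 0; j < 2; j++ {
			for k := 0; k < 8; k++ {
				r[2*i+j] += int32(a[8*i+k]) * int32(b[8*j+k])
			}
		}
	}
	return r
}

func main() {
	var a, b [16]int8
	for k := 0; k < 8; k++ {
		a[k] = 1           // row 0 of the 2x8 matrix
		a[8+k] = 2         // row 1 of the 2x8 matrix
		b[k] = int8(k + 1) // column 0 of the 8x2 matrix
		b[8+k] = -1        // column 1 of the 8x2 matrix
	}
	fmt.Println(mmlaS32([4]int32{10, 20, 30, 40}, a, b)) // [46 12 102 24]
}
```
*/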

// Duplicate scalar to vector. This intrinsic copies the scalar operand into every element of the destination vector.
//
//go:linkname VmovNS8 VmovNS8
//go:noescape
func VmovNS8(r *arm.Int8X8, v0 *arm.Int8)

// Duplicate scalar to vector. This intrinsic copies the scalar operand into every element of the destination vector.
//
//go:linkname VmovNS16 VmovNS16
//go:noescape
func VmovNS16(r *arm.Int16X4, v0 *arm.Int16)

// Duplicate scalar to vector. This intrinsic copies the scalar operand into every element of the destination vector.
//
//go:linkname VmovNS32 VmovNS32
//go:noescape
func VmovNS32(r *arm.Int32X2, v0 *arm.Int32)

// Duplicate scalar to vector. This intrinsic copies the scalar operand into every element of the destination vector.
//
//go:linkname VmovNS64 VmovNS64
//go:noescape
func VmovNS64(r *arm.Int64X1, v0 *arm.Int64)

// Duplicate scalar to vector. This intrinsic copies the scalar operand into every element of the destination vector.
//
//go:linkname VmovNU8 VmovNU8
//go:noescape
func VmovNU8(r *arm.Uint8X8, v0 *arm.Uint8)

// Duplicate scalar to vector. This intrinsic copies the scalar operand into every element of the destination vector.
//
//go:linkname VmovNU16 VmovNU16
//go:noescape
func VmovNU16(r *arm.Uint16X4, v0 *arm.Uint16)

// Duplicate scalar to vector. This intrinsic copies the scalar operand into every element of the destination vector.
//
//go:linkname VmovNU32 VmovNU32
//go:noescape
func VmovNU32(r *arm.Uint32X2, v0 *arm.Uint32)

// Duplicate scalar to vector. This intrinsic copies the scalar operand into every element of the destination vector.
//
//go:linkname VmovNU64 VmovNU64
//go:noescape
func VmovNU64(r *arm.Uint64X1, v0 *arm.Uint64)

// Duplicate scalar to vector. This intrinsic copies the scalar operand into every element of the destination vector.
//
//go:linkname VmovNF32 VmovNF32
//go:noescape
func VmovNF32(r *arm.Float32X2, v0 *arm.Float32)

// Duplicate scalar to vector. This intrinsic copies the scalar operand into every element of the destination vector.
//
//go:linkname VmovNF64 VmovNF64
//go:noescape
func VmovNF64(r *arm.Float64X1, v0 *arm.Float64)

// Duplicate scalar to vector. This intrinsic copies the scalar operand into every element of the destination vector.
//
//go:linkname VmovNP16 VmovNP16
//go:noescape
func VmovNP16(r *arm.Poly16X4, v0 *arm.Poly16)

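// Duplicate scalar to vector (vmov_n_p64). This intrinsic copies the scalar operand into every element of the destination vector.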
//
//go:linkname VmovNP64 VmovNP64
//go:noescape
func VmovNP64(r *arm.Poly64X1, v0 *arm.Poly64)

// Duplicate scalar to vector. This intrinsic copies the scalar operand into every element of the destination vector.
//
//go:linkname VmovNP8 VmovNP8
//go:noescape
func VmovNP8(r *arm.Poly8X8, v0 *arm.Poly8)

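// Vector move long. This intrinsic sign-extends each element of the source vector to twice its width and writes the widened elements to the destination vector.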
//
//go:linkname VmovlS8 VmovlS8
//go:noescape
func VmovlS8(r *arm.Int16X8, v0 *arm.Int8X8)

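// Vector move long. This intrinsic sign-extends each element of the source vector to twice its width and writes the widened elements to the destination vector.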
//
//go:linkname VmovlS16 VmovlS16
//go:noescape
func VmovlS16(r *arm.Int32X4, v0 *arm.Int16X4)

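// Vector move long. This intrinsic sign-extends each element of the source vector to twice its width and writes the widened elements to the destination vector.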
//
//go:linkname VmovlS32 VmovlS32
//go:noescape
func VmovlS32(r *arm.Int64X2, v0 *arm.Int32X2)

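// Vector move long. This intrinsic zero-extends each element of the source vector to twice its width and writes the widened elements to the destination vector.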
//
//go:linkname VmovlU8 VmovlU8
//go:noescape
func VmovlU8(r *arm.Uint16X8, v0 *arm.Uint8X8)

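// Vector move long. This intrinsic zero-extends each element of the source vector to twice its width and writes the widened elements to the destination vector.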
//
//go:linkname VmovlU16 VmovlU16
//go:noescape
func VmovlU16(r *arm.Uint32X4, v0 *arm.Uint16X4)

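// Vector move long. This intrinsic zero-extends each element of the source vector to twice its width and writes the widened elements to the destination vector.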
//
//go:linkname VmovlU32 VmovlU32
//go:noescape
func VmovlU32(r *arm.Uint64X2, v0 *arm.Uint32X2)

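// Vector move long (high half). This intrinsic sign-extends each element in the upper half of the source vector to twice its width and writes the widened elements to the destination vector.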
//
//go:linkname VmovlHighS8 VmovlHighS8
//go:noescape
func VmovlHighS8(r *arm.Int16X8, v0 *arm.Int8X16)

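// Vector move long (high half). This intrinsic sign-extends each element in the upper half of the source vector to twice its width and writes the widened elements to the destination vector.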
//
//go:linkname VmovlHighS16 VmovlHighS16
//go:noescape
func VmovlHighS16(r *arm.Int32X4, v0 *arm.Int16X8)

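// Vector move long (high half). This intrinsic sign-extends each element in the upper half of the source vector to twice its width and writes the widened elements to the destination vector.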
//
//go:linkname VmovlHighS32 VmovlHighS32
//go:noescape
func VmovlHighS32(r *arm.Int64X2, v0 *arm.Int32X4)

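// Vector move long (high half). This intrinsic zero-extends each element in the upper half of the source vector to twice its width and writes the widened elements to the destination vector.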
//
//go:linkname VmovlHighU8 VmovlHighU8
//go:noescape
func VmovlHighU8(r *arm.Uint16X8, v0 *arm.Uint8X16)

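// Vector move long (high half). This intrinsic zero-extends each element in the upper half of the source vector to twice its width and writes the widened elements to the destination vector.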
//
//go:linkname VmovlHighU16 VmovlHighU16
//go:noescape
func VmovlHighU16(r *arm.Uint32X4, v0 *arm.Uint16X8)

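// Vector move long (high half). This intrinsic zero-extends each element in the upper half of the source vector to twice its width and writes the widened elements to the destination vector.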
//
//go:linkname VmovlHighU32 VmovlHighU32
//go:noescape
func VmovlHighU32(r *arm.Uint64X2, v0 *arm.Uint32X4)
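
/*
The move-long declarations above widen elements: signed variants sign-extend,
unsigned variants zero-extend, and the High variants read the upper half of a
128-bit source. The sketch below models that in plain Go without calling this
package; movlS8 and movlHighU8 are hypothetical helper names.

```go
package main

import "fmt"

// movlS8 is a hypothetical model of a signed move-long: every int8 lane is
// sign-extended to int16.
func movlS8(v [8]int8) [8]int16 {
	var r [8]int16
	for i := range r {
		r[i] = int16(v[i])
	}
	return r
}

// movlHighU8 is a hypothetical model of an unsigned move-long on the upper
// half: lanes 8..15 of the source are zero-extended to uint16.
func movlHighU8(v [16]uint8) [8]uint16 {
	var r [8]uint16
	for i := range r {
		r[i] = uint16(v[8+i])
	}
	return r
}

func main() {
	fmt.Println(movlS8([8]int8{-1, -128, 127, 0, 1, 2, 3, 4})) // [-1 -128 127 0 1 2 3 4]

	var v [16]uint8
	for i := range v {
		v[i] = uint8(240 + i)
	}
	fmt.Println(movlHighU8(v)) // [248 249 250 251 252 253 254 255]
}
```
*/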

// Extract Narrow. This instruction reads each vector element from the source SIMD&FP register, narrows each value to half the original width, places the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. The destination vector elements are half as long as the source vector elements.
//
//go:linkname VmovnS16 VmovnS16
//go:noescape
func VmovnS16(r *arm.Int8X8, v0 *arm.Int16X8)

// Extract Narrow. This instruction reads each vector element from the source SIMD&FP register, narrows each value to half the original width, places the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. The destination vector elements are half as long as the source vector elements.
//
//go:linkname VmovnS32 VmovnS32
//go:noescape
func VmovnS32(r *arm.Int16X4, v0 *arm.Int32X4)

// Extract Narrow. This instruction reads each vector element from the source SIMD&FP register, narrows each value to half the original width, places the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. The destination vector elements are half as long as the source vector elements.
//
//go:linkname VmovnS64 VmovnS64
//go:noescape
func VmovnS64(r *arm.Int32X2, v0 *arm.Int64X2)

// Extract Narrow. This instruction reads each vector element from the source SIMD&FP register, narrows each value to half the original width, places the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. The destination vector elements are half as long as the source vector elements.
//
//go:linkname VmovnU16 VmovnU16
//go:noescape
func VmovnU16(r *arm.Uint8X8, v0 *arm.Uint16X8)

// Extract Narrow. This instruction reads each vector element from the source SIMD&FP register, narrows each value to half the original width, places the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. The destination vector elements are half as long as the source vector elements.
//
//go:linkname VmovnU32 VmovnU32
//go:noescape
func VmovnU32(r *arm.Uint16X4, v0 *arm.Uint32X4)

// Extract Narrow. This instruction reads each vector element from the source SIMD&FP register, narrows each value to half the original width, places the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. The destination vector elements are half as long as the source vector elements.
//
//go:linkname VmovnU64 VmovnU64
//go:noescape
func VmovnU64(r *arm.Uint32X2, v0 *arm.Uint64X2)

// Extract Narrow. This instruction reads each vector element from the source SIMD&FP register, narrows each value to half the original width, places the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. The destination vector elements are half as long as the source vector elements.
//
//go:linkname VmovnHighS16 VmovnHighS16
//go:noescape
func VmovnHighS16(r *arm.Int8X16, v0 *arm.Int8X8, v1 *arm.Int16X8)

// Extract Narrow. This instruction reads each vector element from the source SIMD&FP register, narrows each value to half the original width, places the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. The destination vector elements are half as long as the source vector elements.
//
//go:linkname VmovnHighS32 VmovnHighS32
//go:noescape
func VmovnHighS32(r *arm.Int16X8, v0 *arm.Int16X4, v1 *arm.Int32X4)

// Extract Narrow. This instruction reads each vector element from the source SIMD&FP register, narrows each value to half the original width, places the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. The destination vector elements are half as long as the source vector elements.
//
//go:linkname VmovnHighS64 VmovnHighS64
//go:noescape
func VmovnHighS64(r *arm.Int32X4, v0 *arm.Int32X2, v1 *arm.Int64X2)

// Extract Narrow. This instruction reads each vector element from the source SIMD&FP register, narrows each value to half the original width, places the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. The destination vector elements are half as long as the source vector elements.
//
//go:linkname VmovnHighU16 VmovnHighU16
//go:noescape
func VmovnHighU16(r *arm.Uint8X16, v0 *arm.Uint8X8, v1 *arm.Uint16X8)

// Extract Narrow. This instruction reads each vector element from the source SIMD&FP register, narrows each value to half the original width, places the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. The destination vector elements are half as long as the source vector elements.
//
//go:linkname VmovnHighU32 VmovnHighU32
//go:noescape
func VmovnHighU32(r *arm.Uint16X8, v0 *arm.Uint16X4, v1 *arm.Uint32X4)

// Extract Narrow. This instruction reads each vector element from the source SIMD&FP register, narrows each value to half the original width, places the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. The destination vector elements are half as long as the source vector elements.
//
//go:linkname VmovnHighU64 VmovnHighU64
//go:noescape
func VmovnHighU64(r *arm.Uint32X4, v0 *arm.Uint32X2, v1 *arm.Uint64X2)
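
/*
Extract Narrow keeps the low half of each element and discards the rest; it
does not saturate. The High variants place an existing narrow vector in the
lower half of the result and the newly narrowed elements in the upper half.
The sketch below is a scalar model with hypothetical helper names (movnU16,
movnHighU16) and does not call this package.

```go
package main

import "fmt"

// movnU16 is a hypothetical model of a narrowing move: each uint16 lane is
// truncated to its low 8 bits.
func movnU16(v [8]uint16) [8]uint8 {
	var r [8]uint8
	for i := range r {
		r[i] = uint8(v[i])
	}
	return r
}

// movnHighU16 is a hypothetical model of the High variant: lo fills lanes
// 0..7 of the result and the truncated lanes of v fill lanes 8..15.
func movnHighU16(lo [8]uint8, v [8]uint16) [16]uint8 {
	var r [16]uint8
	copy(r[:8], lo[:])
	for i, x := range v {
		r[8+i] = uint8(x)
	}
	return r
}

func main() {
	v := [8]uint16{0x1234, 300, 255, 256, 0xFFFF, 1, 2, 3}
	n := movnU16(v)
	fmt.Println(n) // [52 44 255 0 255 1 2 3]
	fmt.Println(movnHighU16([8]uint8{9, 9, 9, 9, 9, 9, 9, 9}, v))
}
```

When out-of-range values must clamp instead of wrap, a saturating narrow is
the operation to model, not this one.
*/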

// Duplicate scalar to vector. This intrinsic copies the scalar operand into every element of the destination vector.
//
//go:linkname VmovqNS8 VmovqNS8
//go:noescape
func VmovqNS8(r *arm.Int8X16, v0 *arm.Int8)

// Duplicate scalar to vector. This intrinsic copies the scalar operand into every element of the destination vector.
//
//go:linkname VmovqNS16 VmovqNS16
//go:noescape
func VmovqNS16(r *arm.Int16X8, v0 *arm.Int16)

// Duplicate scalar to vector. This intrinsic copies the scalar operand into every element of the destination vector.
//
//go:linkname VmovqNS32 VmovqNS32
//go:noescape
func VmovqNS32(r *arm.Int32X4, v0 *arm.Int32)

// Duplicate scalar to vector. This intrinsic copies the scalar operand into every element of the destination vector.
//
//go:linkname VmovqNS64 VmovqNS64
//go:noescape
func VmovqNS64(r *arm.Int64X2, v0 *arm.Int64)

// Duplicate scalar to vector. This function sets every element of the result vector to the scalar value v0.
//
//go:linkname VmovqNU8 VmovqNU8
//go:noescape
func VmovqNU8(r *arm.Uint8X16, v0 *arm.Uint8)

// Duplicate scalar to vector. This function sets every element of the result vector to the scalar value v0.
//
//go:linkname VmovqNU16 VmovqNU16
//go:noescape
func VmovqNU16(r *arm.Uint16X8, v0 *arm.Uint16)

// Duplicate scalar to vector. This function sets every element of the result vector to the scalar value v0.
//
//go:linkname VmovqNU32 VmovqNU32
//go:noescape
func VmovqNU32(r *arm.Uint32X4, v0 *arm.Uint32)

// Duplicate scalar to vector. This function sets every element of the result vector to the scalar value v0.
//
//go:linkname VmovqNU64 VmovqNU64
//go:noescape
func VmovqNU64(r *arm.Uint64X2, v0 *arm.Uint64)

// Duplicate scalar to vector. This function sets every element of the result vector to the scalar value v0.
//
//go:linkname VmovqNF32 VmovqNF32
//go:noescape
func VmovqNF32(r *arm.Float32X4, v0 *arm.Float32)

// Duplicate scalar to vector. This function sets every element of the result vector to the scalar value v0.
//
//go:linkname VmovqNF64 VmovqNF64
//go:noescape
func VmovqNF64(r *arm.Float64X2, v0 *arm.Float64)

// Duplicate scalar to vector. This function sets every element of the result vector to the scalar value v0.
//
//go:linkname VmovqNP16 VmovqNP16
//go:noescape
func VmovqNP16(r *arm.Poly16X8, v0 *arm.Poly16)

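// Duplicate scalar to vector (vmovq_n_p64). This function sets both elements of the result vector to the scalar value v0.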
//
//go:linkname VmovqNP64 VmovqNP64
//go:noescape
func VmovqNP64(r *arm.Poly64X2, v0 *arm.Poly64)

// Duplicate scalar to vector. This function sets every element of the result vector to the scalar value v0.
//
//go:linkname VmovqNP8 VmovqNP8
//go:noescape
func VmovqNP8(r *arm.Poly8X16, v0 *arm.Poly8)

// Multiply (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmulS8 VmulS8
//go:noescape
func VmulS8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8)

// Multiply (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmulS16 VmulS16
//go:noescape
func VmulS16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4)

// Multiply (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmulS32 VmulS32
//go:noescape
func VmulS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2)

// Multiply (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmulU8 VmulU8
//go:noescape
func VmulU8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8)

// Multiply (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmulU16 VmulU16
//go:noescape
func VmulU16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4)

// Multiply (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmulU32 VmulU32
//go:noescape
func VmulU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2)

// Floating-point Multiply (vector). This instruction multiplies corresponding floating-point values in the vectors in the two source SIMD&FP registers, places the result in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmulF32 VmulF32
//go:noescape
func VmulF32(r *arm.Float32X2, v0 *arm.Float32X2, v1 *arm.Float32X2)

// Floating-point Multiply (vector). This instruction multiplies corresponding floating-point values in the vectors in the two source SIMD&FP registers, places the result in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmulF64 VmulF64
//go:noescape
func VmulF64(r *arm.Float64X1, v0 *arm.Float64X1, v1 *arm.Float64X1)

// Vector multiply by scalar
//
//go:linkname VmulNS16 VmulNS16
//go:noescape
func VmulNS16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16)

// Vector multiply by scalar
//
//go:linkname VmulNS32 VmulNS32
//go:noescape
func VmulNS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32)

// Vector multiply by scalar
//
//go:linkname VmulNU16 VmulNU16
//go:noescape
func VmulNU16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16)

// Vector multiply by scalar
//
//go:linkname VmulNU32 VmulNU32
//go:noescape
func VmulNU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32)

// Vector multiply by scalar
//
//go:linkname VmulNF32 VmulNF32
//go:noescape
func VmulNF32(r *arm.Float32X2, v0 *arm.Float32X2, v1 *arm.Float32)

// Floating-point vector multiply by scalar. This instruction multiplies each floating-point value in the source vector v0 by the scalar v1 and writes the result to the destination vector.
//
//go:linkname VmulNF64 VmulNF64
//go:noescape
func VmulNF64(r *arm.Float64X1, v0 *arm.Float64X1, v1 *arm.Float64)

// Polynomial Multiply. This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmulP8 VmulP8
//go:noescape
func VmulP8(r *arm.Poly8X8, v0 *arm.Poly8X8, v1 *arm.Poly8X8)

// Signed Multiply Long (vector). This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmullS8 VmullS8
//go:noescape
func VmullS8(r *arm.Int16X8, v0 *arm.Int8X8, v1 *arm.Int8X8)

// Signed Multiply Long (vector). This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmullS16 VmullS16
//go:noescape
func VmullS16(r *arm.Int32X4, v0 *arm.Int16X4, v1 *arm.Int16X4)

// Signed Multiply Long (vector). This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmullS32 VmullS32
//go:noescape
func VmullS32(r *arm.Int64X2, v0 *arm.Int32X2, v1 *arm.Int32X2)

// Unsigned Multiply Long (vector). This instruction multiplies corresponding vector elements in the lower or upper half of the two source SIMD&FP registers, places the result in a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied. All the values in this instruction are unsigned integer values.
//
//go:linkname VmullU8 VmullU8
//go:noescape
func VmullU8(r *arm.Uint16X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8)

// Unsigned Multiply Long (vector). This instruction multiplies corresponding vector elements in the lower or upper half of the two source SIMD&FP registers, places the result in a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied. All the values in this instruction are unsigned integer values.
//
//go:linkname VmullU16 VmullU16
//go:noescape
func VmullU16(r *arm.Uint32X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4)

// Unsigned Multiply Long (vector). This instruction multiplies corresponding vector elements in the lower or upper half of the two source SIMD&FP registers, places the result in a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied. All the values in this instruction are unsigned integer values.
//
//go:linkname VmullU32 VmullU32
//go:noescape
func VmullU32(r *arm.Uint64X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2)

// Signed Multiply Long (vector, upper half). This instruction multiplies corresponding signed integer values in the upper half of the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.
//
//go:linkname VmullHighS8 VmullHighS8
//go:noescape
func VmullHighS8(r *arm.Int16X8, v0 *arm.Int8X16, v1 *arm.Int8X16)

// Signed Multiply Long (vector, upper half). This instruction multiplies corresponding signed integer values in the upper half of the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.
//
//go:linkname VmullHighS16 VmullHighS16
//go:noescape
func VmullHighS16(r *arm.Int32X4, v0 *arm.Int16X8, v1 *arm.Int16X8)

// Signed Multiply Long (vector, upper half). This instruction multiplies corresponding signed integer values in the upper half of the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.
//
//go:linkname VmullHighS32 VmullHighS32
//go:noescape
func VmullHighS32(r *arm.Int64X2, v0 *arm.Int32X4, v1 *arm.Int32X4)

// Unsigned Multiply Long (vector, upper half). This instruction multiplies corresponding vector elements in the upper half of the two source SIMD&FP registers, places the result in a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied. All the values in this instruction are unsigned integer values.
//
//go:linkname VmullHighU8 VmullHighU8
//go:noescape
func VmullHighU8(r *arm.Uint16X8, v0 *arm.Uint8X16, v1 *arm.Uint8X16)

// Unsigned Multiply Long (vector, upper half). This instruction multiplies corresponding vector elements in the upper half of the two source SIMD&FP registers, places the result in a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied. All the values in this instruction are unsigned integer values.
//
//go:linkname VmullHighU16 VmullHighU16
//go:noescape
func VmullHighU16(r *arm.Uint32X4, v0 *arm.Uint16X8, v1 *arm.Uint16X8)

// Unsigned Multiply Long (vector, upper half). This instruction multiplies corresponding vector elements in the upper half of the two source SIMD&FP registers, places the result in a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied. All the values in this instruction are unsigned integer values.
//
//go:linkname VmullHighU32 VmullHighU32
//go:noescape
func VmullHighU32(r *arm.Uint64X2, v0 *arm.Uint32X4, v1 *arm.Uint32X4)

// Signed Multiply Long by scalar (upper half). This instruction multiplies each signed integer value in the upper half of the source vector v0 by the scalar v1, places the results in a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.
//
//go:linkname VmullHighNS16 VmullHighNS16
//go:noescape
func VmullHighNS16(r *arm.Int32X4, v0 *arm.Int16X8, v1 *arm.Int16)

// Signed Multiply Long by scalar (upper half). This instruction multiplies each signed integer value in the upper half of the source vector v0 by the scalar v1, places the results in a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.
//
//go:linkname VmullHighNS32 VmullHighNS32
//go:noescape
func VmullHighNS32(r *arm.Int64X2, v0 *arm.Int32X4, v1 *arm.Int32)

// Unsigned Multiply Long by scalar (upper half). This instruction multiplies each vector element in the upper half of the source vector v0 by the scalar v1, places the result in a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied. All the values in this instruction are unsigned integer values.
//
//go:linkname VmullHighNU16 VmullHighNU16
//go:noescape
func VmullHighNU16(r *arm.Uint32X4, v0 *arm.Uint16X8, v1 *arm.Uint16)

// Unsigned Multiply Long by scalar (upper half). This instruction multiplies each vector element in the upper half of the source vector v0 by the scalar v1, places the result in a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied. All the values in this instruction are unsigned integer values.
//
//go:linkname VmullHighNU32 VmullHighNU32
//go:noescape
func VmullHighNU32(r *arm.Uint64X2, v0 *arm.Uint32X4, v1 *arm.Uint32)

// Polynomial Multiply Long (upper half). This instruction multiplies the upper 64-bit polynomial elements of the two source vectors and writes the 128-bit product to the destination.
//
//go:linkname VmullHighP64 VmullHighP64
//go:noescape
func VmullHighP64(r *arm.Poly128, v0 *arm.Poly64X2, v1 *arm.Poly64X2)

// Polynomial Multiply Long (upper half). This instruction multiplies corresponding elements in the upper half of the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.
//
//go:linkname VmullHighP8 VmullHighP8
//go:noescape
func VmullHighP8(r *arm.Poly16X8, v0 *arm.Poly8X16, v1 *arm.Poly8X16)

// Vector long multiply with scalar
//
//go:linkname VmullNS16 VmullNS16
//go:noescape
func VmullNS16(r *arm.Int32X4, v0 *arm.Int16X4, v1 *arm.Int16)

// Vector long multiply with scalar
//
//go:linkname VmullNS32 VmullNS32
//go:noescape
func VmullNS32(r *arm.Int64X2, v0 *arm.Int32X2, v1 *arm.Int32)

// Vector long multiply with scalar
//
//go:linkname VmullNU16 VmullNU16
//go:noescape
func VmullNU16(r *arm.Uint32X4, v0 *arm.Uint16X4, v1 *arm.Uint16)

// Vector long multiply with scalar
//
//go:linkname VmullNU32 VmullNU32
//go:noescape
func VmullNU32(r *arm.Uint64X2, v0 *arm.Uint32X2, v1 *arm.Uint32)

// Polynomial Multiply Long (scalar). This instruction multiplies the two 64-bit polynomial values v0 and v1 and writes the 128-bit product to the destination.
//
//go:linkname VmullP64 VmullP64
//go:noescape
func VmullP64(r *arm.Poly128, v0 *arm.Poly64, v1 *arm.Poly64)

// Polynomial Multiply Long. This instruction multiplies corresponding elements in the lower or upper half of the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.
//
//go:linkname VmullP8 VmullP8
//go:noescape
func VmullP8(r *arm.Poly16X8, v0 *arm.Poly8X8, v1 *arm.Poly8X8)

// Multiply (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmulqS8 VmulqS8
//go:noescape
func VmulqS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16)

// Multiply (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmulqS16 VmulqS16
//go:noescape
func VmulqS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8)

// Multiply (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmulqS32 VmulqS32
//go:noescape
func VmulqS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4)

// Multiply (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmulqU8 VmulqU8
//go:noescape
func VmulqU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16)

// Multiply (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmulqU16 VmulqU16
//go:noescape
func VmulqU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8)

// Multiply (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmulqU32 VmulqU32
//go:noescape
func VmulqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4)

// Floating-point Multiply (vector). This instruction multiplies corresponding floating-point values in the vectors in the two source SIMD&FP registers, places the result in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmulqF32 VmulqF32
//go:noescape
func VmulqF32(r *arm.Float32X4, v0 *arm.Float32X4, v1 *arm.Float32X4)

// Floating-point Multiply (vector). This instruction multiplies corresponding floating-point values in the vectors in the two source SIMD&FP registers, places the result in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmulqF64 VmulqF64
//go:noescape
func VmulqF64(r *arm.Float64X2, v0 *arm.Float64X2, v1 *arm.Float64X2)

// Vector multiply by scalar
//
//go:linkname VmulqNS16 VmulqNS16
//go:noescape
func VmulqNS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16)

// Vector multiply by scalar
//
//go:linkname VmulqNS32 VmulqNS32
//go:noescape
func VmulqNS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32)

// Vector multiply by scalar
//
//go:linkname VmulqNU16 VmulqNU16
//go:noescape
func VmulqNU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16)

// Vector multiply by scalar
//
//go:linkname VmulqNU32 VmulqNU32
//go:noescape
func VmulqNU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32)

// Vector multiply by scalar
//
//go:linkname VmulqNF32 VmulqNF32
//go:noescape
func VmulqNF32(r *arm.Float32X4, v0 *arm.Float32X4, v1 *arm.Float32)

// Floating-point vector multiply by scalar. This instruction multiplies each floating-point value in the source vector v0 by the scalar v1 and writes the result to the destination vector.
//
//go:linkname VmulqNF64 VmulqNF64
//go:noescape
func VmulqNF64(r *arm.Float64X2, v0 *arm.Float64X2, v1 *arm.Float64)

// Polynomial Multiply. This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmulqP8 VmulqP8
//go:noescape
func VmulqP8(r *arm.Poly8X16, v0 *arm.Poly8X16, v1 *arm.Poly8X16)

// Floating-point Multiply extended. This instruction multiplies corresponding floating-point values in the vectors of the two source SIMD&FP registers, places the resulting floating-point values in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmulxF32 VmulxF32
//go:noescape
func VmulxF32(r *arm.Float32X2, v0 *arm.Float32X2, v1 *arm.Float32X2)

// Floating-point Multiply extended. This instruction multiplies corresponding floating-point values in the vectors of the two source SIMD&FP registers, places the resulting floating-point values in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmulxF64 VmulxF64
//go:noescape
func VmulxF64(r *arm.Float64X1, v0 *arm.Float64X1, v1 *arm.Float64X1)

// Floating-point Multiply extended (scalar). This instruction multiplies the two floating-point scalar values v0 and v1 and writes the resulting floating-point value to the destination.
//
//go:linkname VmulxdF64 VmulxdF64
//go:noescape
func VmulxdF64(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64)

// Floating-point Multiply extended. This instruction multiplies corresponding floating-point values in the vectors of the two source SIMD&FP registers, places the resulting floating-point values in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmulxqF32 VmulxqF32
//go:noescape
func VmulxqF32(r *arm.Float32X4, v0 *arm.Float32X4, v1 *arm.Float32X4)

// Floating-point Multiply extended. This instruction multiplies corresponding floating-point values in the vectors of the two source SIMD&FP registers, places the resulting floating-point values in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmulxqF64 VmulxqF64
//go:noescape
func VmulxqF64(r *arm.Float64X2, v0 *arm.Float64X2, v1 *arm.Float64X2)

// Floating-point Multiply extended (scalar). This instruction multiplies the two floating-point scalar values v0 and v1 and writes the resulting floating-point value to the destination.
//
//go:linkname VmulxsF32 VmulxsF32
//go:noescape
func VmulxsF32(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32)

// Bitwise NOT (vector). This instruction reads each vector element from the source SIMD&FP register, places the inverse of each value into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmvnS8 VmvnS8
//go:noescape
func VmvnS8(r *arm.Int8X8, v0 *arm.Int8X8)

// Bitwise NOT (vector). This instruction reads each vector element from the source SIMD&FP register, places the inverse of each value into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmvnS16 VmvnS16
//go:noescape
func VmvnS16(r *arm.Int16X4, v0 *arm.Int16X4)

// Bitwise NOT (vector). This instruction reads each vector element from the source SIMD&FP register, places the inverse of each value into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmvnS32 VmvnS32
//go:noescape
func VmvnS32(r *arm.Int32X2, v0 *arm.Int32X2)

// Bitwise NOT (vector). This instruction reads each vector element from the source SIMD&FP register, places the inverse of each value into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmvnU8 VmvnU8
//go:noescape
func VmvnU8(r *arm.Uint8X8, v0 *arm.Uint8X8)

// Bitwise NOT (vector). This instruction reads each vector element from the source SIMD&FP register, places the inverse of each value into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmvnU16 VmvnU16
//go:noescape
func VmvnU16(r *arm.Uint16X4, v0 *arm.Uint16X4)

// Bitwise NOT (vector). This instruction reads each vector element from the source SIMD&FP register, places the inverse of each value into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmvnU32 VmvnU32
//go:noescape
func VmvnU32(r *arm.Uint32X2, v0 *arm.Uint32X2)

// Bitwise NOT (vector). This instruction reads each vector element from the source SIMD&FP register, places the inverse of each value into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmvnP8 VmvnP8
//go:noescape
func VmvnP8(r *arm.Poly8X8, v0 *arm.Poly8X8)

// Bitwise NOT (vector). This instruction reads each vector element from the source SIMD&FP register, places the inverse of each value into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmvnqS8 VmvnqS8
//go:noescape
func VmvnqS8(r *arm.Int8X16, v0 *arm.Int8X16)

// Bitwise NOT (vector). This instruction reads each vector element from the source SIMD&FP register, places the inverse of each value into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmvnqS16 VmvnqS16
//go:noescape
func VmvnqS16(r *arm.Int16X8, v0 *arm.Int16X8)

// Bitwise NOT (vector). This instruction reads each vector element from the source SIMD&FP register, places the inverse of each value into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmvnqS32 VmvnqS32
//go:noescape
func VmvnqS32(r *arm.Int32X4, v0 *arm.Int32X4)

// Bitwise NOT (vector). This instruction reads each vector element from the source SIMD&FP register, places the inverse of each value into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmvnqU8 VmvnqU8
//go:noescape
func VmvnqU8(r *arm.Uint8X16, v0 *arm.Uint8X16)

// Bitwise NOT (vector). This instruction reads each vector element from the source SIMD&FP register, places the inverse of each value into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmvnqU16 VmvnqU16
//go:noescape
func VmvnqU16(r *arm.Uint16X8, v0 *arm.Uint16X8)

// Bitwise NOT (vector). This instruction reads each vector element from the source SIMD&FP register, places the inverse of each value into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmvnqU32 VmvnqU32
//go:noescape
func VmvnqU32(r *arm.Uint32X4, v0 *arm.Uint32X4)

// Bitwise NOT (vector). This instruction reads each vector element from the source SIMD&FP register, places the inverse of each value into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmvnqP8 VmvnqP8
//go:noescape
func VmvnqP8(r *arm.Poly8X16, v0 *arm.Poly8X16)

// Negate (vector). This instruction reads each vector element from the source SIMD&FP register, negates each value, puts the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VnegS8 VnegS8
//go:noescape
func VnegS8(r *arm.Int8X8, v0 *arm.Int8X8)

// Negate (vector). This instruction reads each vector element from the source SIMD&FP register, negates each value, puts the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VnegS16 VnegS16
//go:noescape
func VnegS16(r *arm.Int16X4, v0 *arm.Int16X4)

// Negate (vector). This instruction reads each vector element from the source SIMD&FP register, negates each value, puts the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VnegS32 VnegS32
//go:noescape
func VnegS32(r *arm.Int32X2, v0 *arm.Int32X2)

// Negate (vector). This instruction reads each vector element from the source SIMD&FP register, negates each value, puts the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VnegS64 VnegS64
//go:noescape
func VnegS64(r *arm.Int64X1, v0 *arm.Int64X1)

// Floating-point Negate (vector). This instruction negates the value of each vector element in the source SIMD&FP register, writes the result to a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VnegF32 VnegF32
//go:noescape
func VnegF32(r *arm.Float32X2, v0 *arm.Float32X2)

// Floating-point Negate (vector). This instruction negates the value of each vector element in the source SIMD&FP register, writes the result to a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VnegF64 VnegF64
//go:noescape
func VnegF64(r *arm.Float64X1, v0 *arm.Float64X1)

// Negate (scalar). This instruction negates the 64-bit signed integer value v0 and writes the result to the destination.
//
//go:linkname VnegdS64 VnegdS64
//go:noescape
func VnegdS64(r *arm.Int64, v0 *arm.Int64)

// Negate (vector). This instruction reads each vector element from the source SIMD&FP register, negates each value, puts the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VnegqS8 VnegqS8
//go:noescape
func VnegqS8(r *arm.Int8X16, v0 *arm.Int8X16)

// Negate (vector). This instruction reads each vector element from the source SIMD&FP register, negates each value, puts the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VnegqS16 VnegqS16
//go:noescape
func VnegqS16(r *arm.Int16X8, v0 *arm.Int16X8)

// Negate (vector). This instruction reads each vector element from the source SIMD&FP register, negates each value, puts the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VnegqS32 VnegqS32
//go:noescape
func VnegqS32(r *arm.Int32X4, v0 *arm.Int32X4)

// Negate (vector). This instruction reads each vector element from the source SIMD&FP register, negates each value, puts the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VnegqS64 VnegqS64
//go:noescape
func VnegqS64(r *arm.Int64X2, v0 *arm.Int64X2)

// Floating-point Negate (vector). This instruction negates the value of each vector element in the source SIMD&FP register, writes the result to a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VnegqF32 VnegqF32
//go:noescape
func VnegqF32(r *arm.Float32X4, v0 *arm.Float32X4)

// Floating-point Negate (vector). This instruction negates the value of each vector element in the source SIMD&FP register, writes the result to a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VnegqF64 VnegqF64
//go:noescape
func VnegqF64(r *arm.Float64X2, v0 *arm.Float64X2)

// Bitwise inclusive OR NOT (vector). This instruction performs a bitwise OR NOT between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.
//
//go:linkname VornS8 VornS8
//go:noescape
func VornS8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8)

// Bitwise inclusive OR NOT (vector). This instruction performs a bitwise OR NOT between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.
//
//go:linkname VornS16 VornS16
//go:noescape
func VornS16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4)

// Bitwise inclusive OR NOT (vector). This instruction performs a bitwise OR NOT between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.
//
//go:linkname VornS32 VornS32
//go:noescape
func VornS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2)

// Bitwise inclusive OR NOT (vector). This instruction performs a bitwise OR NOT between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.
//
//go:linkname VornS64 VornS64
//go:noescape
func VornS64(r *arm.Int64X1, v0 *arm.Int64X1, v1 *arm.Int64X1)

// Bitwise inclusive OR NOT (vector). This instruction performs a bitwise OR NOT between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.
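//
// A worked example for a single lane (illustrative values). The second operand is the one that is inverted, so each lane computes v0 | ^v1:
//
//	v0[i]  = 0x0F
//	v1[i]  = 0x3C
//	^v1[i] = 0xC3
//	r[i]   = 0x0F | 0xC3 = 0xCF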
//
//go:linkname VornU8 VornU8
//go:noescape
func VornU8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8)

// Bitwise inclusive OR NOT (vector). This instruction performs a bitwise OR NOT between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.
//
//go:linkname VornU16 VornU16
//go:noescape
func VornU16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4)

// Bitwise inclusive OR NOT (vector). This instruction performs a bitwise OR NOT between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.
//
//go:linkname VornU32 VornU32
//go:noescape
func VornU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2)

// Bitwise inclusive OR NOT (vector). This instruction performs a bitwise OR NOT between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.
//
//go:linkname VornU64 VornU64
//go:noescape
func VornU64(r *arm.Uint64X1, v0 *arm.Uint64X1, v1 *arm.Uint64X1)

// Bitwise inclusive OR NOT (vector). This instruction performs a bitwise OR NOT between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.
//
//go:linkname VornqS8 VornqS8
//go:noescape
func VornqS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16)

// Bitwise inclusive OR NOT (vector). This instruction performs a bitwise OR NOT between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.
//
//go:linkname VornqS16 VornqS16
//go:noescape
func VornqS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8)

// Bitwise inclusive OR NOT (vector). This instruction performs a bitwise OR NOT between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.
//
//go:linkname VornqS32 VornqS32
//go:noescape
func VornqS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4)

// Bitwise inclusive OR NOT (vector). This instruction performs a bitwise OR NOT between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.
//
//go:linkname VornqS64 VornqS64
//go:noescape
func VornqS64(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int64X2)

// Bitwise inclusive OR NOT (vector). This instruction performs a bitwise OR NOT between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.
//
//go:linkname VornqU8 VornqU8
//go:noescape
func VornqU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16)

// Bitwise inclusive OR NOT (vector). This instruction performs a bitwise OR NOT between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.
//
//go:linkname VornqU16 VornqU16
//go:noescape
func VornqU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8)

// Bitwise inclusive OR NOT (vector). This instruction performs a bitwise OR NOT between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.
//
//go:linkname VornqU32 VornqU32
//go:noescape
func VornqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4)

// Bitwise inclusive OR NOT (vector). This instruction performs a bitwise OR NOT between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.
//
//go:linkname VornqU64 VornqU64
//go:noescape
func VornqU64(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint64X2)

// Bitwise inclusive OR (vector, register). This instruction performs a bitwise OR between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.
//
//go:linkname VorrS8 VorrS8
//go:noescape
func VorrS8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8)

// Bitwise inclusive OR (vector, register). This instruction performs a bitwise OR between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.
//
//go:linkname VorrS16 VorrS16
//go:noescape
func VorrS16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4)

// Bitwise inclusive OR (vector, register). This instruction performs a bitwise OR between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.
//
//go:linkname VorrS32 VorrS32
//go:noescape
func VorrS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2)

// Bitwise inclusive OR (vector, register). This instruction performs a bitwise OR between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.
//
//go:linkname VorrS64 VorrS64
//go:noescape
func VorrS64(r *arm.Int64X1, v0 *arm.Int64X1, v1 *arm.Int64X1)

// Bitwise inclusive OR (vector, register). This instruction performs a bitwise OR between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.
//
//go:linkname VorrU8 VorrU8
//go:noescape
func VorrU8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8)

// Bitwise inclusive OR (vector, register). This instruction performs a bitwise OR between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.
//
//go:linkname VorrU16 VorrU16
//go:noescape
func VorrU16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4)

// Bitwise inclusive OR (vector, register). This instruction performs a bitwise OR between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.
//
//go:linkname VorrU32 VorrU32
//go:noescape
func VorrU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2)

// Bitwise inclusive OR (vector, register). This instruction performs a bitwise OR between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.
//
//go:linkname VorrU64 VorrU64
//go:noescape
func VorrU64(r *arm.Uint64X1, v0 *arm.Uint64X1, v1 *arm.Uint64X1)

// Bitwise inclusive OR (vector, register). This instruction performs a bitwise OR between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.
//
//go:linkname VorrqS8 VorrqS8
//go:noescape
func VorrqS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16)

// Bitwise inclusive OR (vector, register). This instruction performs a bitwise OR between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.
//
//go:linkname VorrqS16 VorrqS16
//go:noescape
func VorrqS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8)

// Bitwise inclusive OR (vector, register). This instruction performs a bitwise OR between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.
//
//go:linkname VorrqS32 VorrqS32
//go:noescape
func VorrqS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4)

// Bitwise inclusive OR (vector, register). This instruction performs a bitwise OR between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.
//
//go:linkname VorrqS64 VorrqS64
//go:noescape
func VorrqS64(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int64X2)

// Bitwise inclusive OR (vector, register). This instruction performs a bitwise OR between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.
//
//go:linkname VorrqU8 VorrqU8
//go:noescape
func VorrqU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16)

// Bitwise inclusive OR (vector, register). This instruction performs a bitwise OR between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.
//
//go:linkname VorrqU16 VorrqU16
//go:noescape
func VorrqU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8)

// Bitwise inclusive OR (vector, register). This instruction performs a bitwise OR between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.
//
//go:linkname VorrqU32 VorrqU32
//go:noescape
func VorrqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4)

// Bitwise inclusive OR (vector, register). This instruction performs a bitwise OR between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.
//
//go:linkname VorrqU64 VorrqU64
//go:noescape
func VorrqU64(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint64X2)

// Signed Add and Accumulate Long Pairwise. This instruction adds pairs of adjacent signed integer values from the vector in the source SIMD&FP register and accumulates the results into the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements.
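//
// A worked example of the lane arithmetic (illustrative values, lane 0 first). Each result lane is the accumulator lane plus the widened sum of one adjacent source pair, r[i] = v0[i] + v1[2i] + v1[2i+1]:
//
//	v0 = {100, 200, 300, 400}
//	v1 = {1, 2, 3, 4, 5, 6, 7, 8}
//	r  = {103, 207, 311, 415}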
//
//go:linkname VpadalS8 VpadalS8
//go:noescape
func VpadalS8(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int8X8)

// Signed Add and Accumulate Long Pairwise. This instruction adds pairs of adjacent signed integer values from the vector in the source SIMD&FP register and accumulates the results into the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements.
//
//go:linkname VpadalS16 VpadalS16
//go:noescape
func VpadalS16(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int16X4)

// Signed Add and Accumulate Long Pairwise. This instruction adds pairs of adjacent signed integer values from the vector in the source SIMD&FP register and accumulates the results into the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements.
//
//go:linkname VpadalS32 VpadalS32
//go:noescape
func VpadalS32(r *arm.Int64X1, v0 *arm.Int64X1, v1 *arm.Int32X2)

// Unsigned Add and Accumulate Long Pairwise. This instruction adds pairs of adjacent unsigned integer values from the vector in the source SIMD&FP register and accumulates the results with the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements.
//
//go:linkname VpadalU8 VpadalU8
//go:noescape
func VpadalU8(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint8X8)

// Unsigned Add and Accumulate Long Pairwise. This instruction adds pairs of adjacent unsigned integer values from the vector in the source SIMD&FP register and accumulates the results with the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements.
//
//go:linkname VpadalU16 VpadalU16
//go:noescape
func VpadalU16(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint16X4)

// Unsigned Add and Accumulate Long Pairwise. This instruction adds pairs of adjacent unsigned integer values from the vector in the source SIMD&FP register and accumulates the results with the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements.
//
//go:linkname VpadalU32 VpadalU32
//go:noescape
func VpadalU32(r *arm.Uint64X1, v0 *arm.Uint64X1, v1 *arm.Uint32X2)

// Signed Add and Accumulate Long Pairwise. This instruction adds pairs of adjacent signed integer values from the vector in the source SIMD&FP register and accumulates the results into the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements.
//
//go:linkname VpadalqS8 VpadalqS8
//go:noescape
func VpadalqS8(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int8X16)

// Signed Add and Accumulate Long Pairwise. This instruction adds pairs of adjacent signed integer values from the vector in the source SIMD&FP register and accumulates the results into the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements.
//
//go:linkname VpadalqS16 VpadalqS16
//go:noescape
func VpadalqS16(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int16X8)

// Signed Add and Accumulate Long Pairwise. This instruction adds pairs of adjacent signed integer values from the vector in the source SIMD&FP register and accumulates the results into the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements.
//
//go:linkname VpadalqS32 VpadalqS32
//go:noescape
func VpadalqS32(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int32X4)

// Unsigned Add and Accumulate Long Pairwise. This instruction adds pairs of adjacent unsigned integer values from the vector in the source SIMD&FP register and accumulates the results with the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements.
//
//go:linkname VpadalqU8 VpadalqU8
//go:noescape
func VpadalqU8(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint8X16)

// Unsigned Add and Accumulate Long Pairwise. This instruction adds pairs of adjacent unsigned integer values from the vector in the source SIMD&FP register and accumulates the results with the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements.
//
//go:linkname VpadalqU16 VpadalqU16
//go:noescape
func VpadalqU16(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint16X8)

// Unsigned Add and Accumulate Long Pairwise. This instruction adds pairs of adjacent unsigned integer values from the vector in the source SIMD&FP register and accumulates the results with the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements.
//
//go:linkname VpadalqU32 VpadalqU32
//go:noescape
func VpadalqU32(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint32X4)

// Add Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and writes the vector to the destination SIMD&FP register.
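//
// A worked example of the lane arithmetic (illustrative values, lane 0 first). The lower half of the result holds the pair sums of v0 and the upper half holds the pair sums of v1:
//
//	v0 = {1, 2, 3, 4, 5, 6, 7, 8}
//	v1 = {10, 11, 12, 13, 14, 15, 16, 17}
//	r  = {3, 7, 11, 15, 21, 25, 29, 33}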
//
//go:linkname VpaddS8 VpaddS8
//go:noescape
func VpaddS8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8)

// Add Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VpaddS16 VpaddS16
//go:noescape
func VpaddS16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4)

// Add Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VpaddS32 VpaddS32
//go:noescape
func VpaddS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2)

// Add Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VpaddU8 VpaddU8
//go:noescape
func VpaddU8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8)

// Add Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VpaddU16 VpaddU16
//go:noescape
func VpaddU16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4)

// Add Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VpaddU32 VpaddU32
//go:noescape
func VpaddU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2)

// Floating-point Add Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.
//
//go:linkname VpaddF32 VpaddF32
//go:noescape
func VpaddF32(r *arm.Float32X2, v0 *arm.Float32X2, v1 *arm.Float32X2)

// Add Pair of elements (scalar). This instruction adds two vector elements in the source SIMD&FP register and writes the scalar result into the destination SIMD&FP register.
//
//go:linkname VpadddS64 VpadddS64
//go:noescape
func VpadddS64(r *arm.Int64, v0 *arm.Int64X2)

// Add Pair of elements (scalar). This instruction adds two vector elements in the source SIMD&FP register and writes the scalar result into the destination SIMD&FP register.
//
//go:linkname VpadddU64 VpadddU64
//go:noescape
func VpadddU64(r *arm.Uint64, v0 *arm.Uint64X2)

// Floating-point Add Pair of elements (scalar). This instruction adds two floating-point vector elements in the source SIMD&FP register and writes the scalar result into the destination SIMD&FP register.
//
//go:linkname VpadddF64 VpadddF64
//go:noescape
func VpadddF64(r *arm.Float64, v0 *arm.Float64X2)

// Signed Add Long Pairwise. This instruction adds pairs of adjacent signed integer values from the vector in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements.
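//
// A worked example of the lane arithmetic (illustrative values, lane 0 first). Because the result lanes are twice as wide, sums that would overflow the source type are represented exactly:
//
//	v0 = {127, 127, -128, -128, 1, 2, 3, 4}
//	r  = {254, -256, 3, 7}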
//
//go:linkname VpaddlS8 VpaddlS8
//go:noescape
func VpaddlS8(r *arm.Int16X4, v0 *arm.Int8X8)

// Signed Add Long Pairwise. This instruction adds pairs of adjacent signed integer values from the vector in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements.
//
//go:linkname VpaddlS16 VpaddlS16
//go:noescape
func VpaddlS16(r *arm.Int32X2, v0 *arm.Int16X4)

// Signed Add Long Pairwise. This instruction adds pairs of adjacent signed integer values from the vector in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements.
//
//go:linkname VpaddlS32 VpaddlS32
//go:noescape
func VpaddlS32(r *arm.Int64X1, v0 *arm.Int32X2)

// Unsigned Add Long Pairwise. This instruction adds pairs of adjacent unsigned integer values from the vector in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements.
//
//go:linkname VpaddlU8 VpaddlU8
//go:noescape
func VpaddlU8(r *arm.Uint16X4, v0 *arm.Uint8X8)

// Unsigned Add Long Pairwise. This instruction adds pairs of adjacent unsigned integer values from the vector in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements.
//
//go:linkname VpaddlU16 VpaddlU16
//go:noescape
func VpaddlU16(r *arm.Uint32X2, v0 *arm.Uint16X4)

// Unsigned Add Long Pairwise. This instruction adds pairs of adjacent unsigned integer values from the vector in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements.
//
//go:linkname VpaddlU32 VpaddlU32
//go:noescape
func VpaddlU32(r *arm.Uint64X1, v0 *arm.Uint32X2)

// Signed Add Long Pairwise. This instruction adds pairs of adjacent signed integer values from the vector in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements.
//
//go:linkname VpaddlqS8 VpaddlqS8
//go:noescape
func VpaddlqS8(r *arm.Int16X8, v0 *arm.Int8X16)

// Signed Add Long Pairwise. This instruction adds pairs of adjacent signed integer values from the vector in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements.
//
//go:linkname VpaddlqS16 VpaddlqS16
//go:noescape
func VpaddlqS16(r *arm.Int32X4, v0 *arm.Int16X8)

// Signed Add Long Pairwise. This instruction adds pairs of adjacent signed integer values from the vector in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements.
//
//go:linkname VpaddlqS32 VpaddlqS32
//go:noescape
func VpaddlqS32(r *arm.Int64X2, v0 *arm.Int32X4)

// Unsigned Add Long Pairwise. This instruction adds pairs of adjacent unsigned integer values from the vector in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements.
//
//go:linkname VpaddlqU8 VpaddlqU8
//go:noescape
func VpaddlqU8(r *arm.Uint16X8, v0 *arm.Uint8X16)

// Unsigned Add Long Pairwise. This instruction adds pairs of adjacent unsigned integer values from the vector in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements.
//
//go:linkname VpaddlqU16 VpaddlqU16
//go:noescape
func VpaddlqU16(r *arm.Uint32X4, v0 *arm.Uint16X8)

// Unsigned Add Long Pairwise. This instruction adds pairs of adjacent unsigned integer values from the vector in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements.
//
//go:linkname VpaddlqU32 VpaddlqU32
//go:noescape
func VpaddlqU32(r *arm.Uint64X2, v0 *arm.Uint32X4)

// Add Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VpaddqS8 VpaddqS8
//go:noescape
func VpaddqS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16)

// Add Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VpaddqS16 VpaddqS16
//go:noescape
func VpaddqS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8)

// Add Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VpaddqS32 VpaddqS32
//go:noescape
func VpaddqS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4)

// Add Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VpaddqS64 VpaddqS64
//go:noescape
func VpaddqS64(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int64X2)

// Add Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VpaddqU8 VpaddqU8
//go:noescape
func VpaddqU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16)

// Add Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VpaddqU16 VpaddqU16
//go:noescape
func VpaddqU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8)

// Add Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VpaddqU32 VpaddqU32
//go:noescape
func VpaddqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4)

// Add Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VpaddqU64 VpaddqU64
//go:noescape
func VpaddqU64(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint64X2)

// Floating-point Add Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.
//
//go:linkname VpaddqF32 VpaddqF32
//go:noescape
func VpaddqF32(r *arm.Float32X4, v0 *arm.Float32X4, v1 *arm.Float32X4)

// Floating-point Add Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.
//
//go:linkname VpaddqF64 VpaddqF64
//go:noescape
func VpaddqF64(r *arm.Float64X2, v0 *arm.Float64X2, v1 *arm.Float64X2)

// Floating-point Add Pair of elements (scalar). This instruction adds two floating-point vector elements in the source SIMD&FP register and writes the scalar result into the destination SIMD&FP register.
//
//go:linkname VpaddsF32 VpaddsF32
//go:noescape
func VpaddsF32(r *arm.Float32, v0 *arm.Float32X2)

// Signed Maximum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the largest of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register.
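//
// A worked example of the lane arithmetic (illustrative values, lane 0 first). The lower half of the result holds the maximum of each adjacent pair in v0 and the upper half holds the maximum of each adjacent pair in v1:
//
//	v0 = {1, 5, -3, -7, 0, 0, 9, 2}
//	v1 = {4, 4, 8, -8, -1, -2, 6, 7}
//	r  = {5, -3, 0, 9, 4, 8, -1, 7}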
//
//go:linkname VpmaxS8 VpmaxS8
//go:noescape
func VpmaxS8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8)

// Signed Maximum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the largest of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VpmaxS16 VpmaxS16
//go:noescape
func VpmaxS16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4)

// Signed Maximum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the largest of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VpmaxS32 VpmaxS32
//go:noescape
func VpmaxS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2)

// Unsigned Maximum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the largest of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VpmaxU8 VpmaxU8
//go:noescape
func VpmaxU8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8)

// Unsigned Maximum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the largest of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VpmaxU16 VpmaxU16
//go:noescape
func VpmaxU16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4)

// Unsigned Maximum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the largest of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VpmaxU32 VpmaxU32
//go:noescape
func VpmaxU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2)

// Floating-point Maximum Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, writes the larger of each pair of values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.
//
//go:linkname VpmaxF32 VpmaxF32
//go:noescape
func VpmaxF32(r *arm.Float32X2, v0 *arm.Float32X2, v1 *arm.Float32X2)

// Floating-point Maximum Number Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the largest of each pair of values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.
//
//go:linkname VpmaxnmF32 VpmaxnmF32
//go:noescape
func VpmaxnmF32(r *arm.Float32X2, v0 *arm.Float32X2, v1 *arm.Float32X2)
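
/*
The "Maximum Number" forms differ from the plain "Maximum" forms in how they
treat NaN. The sketch below models that difference in pure Go for the two-lane
case. It assumes quiet NaNs only and ignores signaling-NaN and flag behaviour.
maxNum and pairwiseMaxNum are hypothetical helpers, not part of this package.

```go
package main

import (
	"fmt"
	"math"
)

// maxNum models the "maximum number" rule: when exactly one operand is NaN,
// the other operand is returned. math.Max alone would return NaN instead.
func maxNum(a, b float32) float32 {
	switch {
	case math.IsNaN(float64(a)):
		return b
	case math.IsNaN(float64(b)):
		return a
	}
	return float32(math.Max(float64(a), float64(b)))
}

// pairwiseMaxNum models the two-lane pairwise form: lane 0 is taken from the
// pair in v0 and lane 1 from the pair in v1.
func pairwiseMaxNum(v0, v1 [2]float32) [2]float32 {
	return [2]float32{maxNum(v0[0], v0[1]), maxNum(v1[0], v1[1])}
}

func main() {
	nan := float32(math.NaN())
	fmt.Println(pairwiseMaxNum([2]float32{nan, 1.5}, [2]float32{2, 3})) // [1.5 3]
}
```
*/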

// Floating-point Maximum Number Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the largest of each pair of values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.
//
//go:linkname VpmaxnmqF32 VpmaxnmqF32
//go:noescape
func VpmaxnmqF32(r *arm.Float32X4, v0 *arm.Float32X4, v1 *arm.Float32X4)

// Floating-point Maximum Number Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the largest of each pair of values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.
//
//go:linkname VpmaxnmqF64 VpmaxnmqF64
//go:noescape
func VpmaxnmqF64(r *arm.Float64X2, v0 *arm.Float64X2, v1 *arm.Float64X2)

// Floating-point Maximum Number of Pair of elements (scalar). This instruction compares the two floating-point elements of the source SIMD&FP register and writes the larger value as a scalar to the destination SIMD&FP register.
//
//go:linkname VpmaxnmqdF64 VpmaxnmqdF64
//go:noescape
func VpmaxnmqdF64(r *arm.Float64, v0 *arm.Float64X2)

// Floating-point Maximum Number of Pair of elements (scalar). This instruction compares the two floating-point elements of the source SIMD&FP register and writes the larger value as a scalar to the destination SIMD&FP register.
//
//go:linkname VpmaxnmsF32 VpmaxnmsF32
//go:noescape
func VpmaxnmsF32(r *arm.Float32, v0 *arm.Float32X2)

// Signed Maximum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the largest of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VpmaxqS8 VpmaxqS8
//go:noescape
func VpmaxqS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16)

// Signed Maximum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the largest of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VpmaxqS16 VpmaxqS16
//go:noescape
func VpmaxqS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8)

// Signed Maximum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the largest of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VpmaxqS32 VpmaxqS32
//go:noescape
func VpmaxqS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4)

// Unsigned Maximum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the largest of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VpmaxqU8 VpmaxqU8
//go:noescape
func VpmaxqU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16)

// Unsigned Maximum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the largest of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VpmaxqU16 VpmaxqU16
//go:noescape
func VpmaxqU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8)

// Unsigned Maximum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the largest of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VpmaxqU32 VpmaxqU32
//go:noescape
func VpmaxqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4)

// Floating-point Maximum Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, writes the larger of each pair of values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.
//
//go:linkname VpmaxqF32 VpmaxqF32
//go:noescape
func VpmaxqF32(r *arm.Float32X4, v0 *arm.Float32X4, v1 *arm.Float32X4)

// Floating-point Maximum Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, writes the larger of each pair of values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.
//
//go:linkname VpmaxqF64 VpmaxqF64
//go:noescape
func VpmaxqF64(r *arm.Float64X2, v0 *arm.Float64X2, v1 *arm.Float64X2)

// Floating-point Maximum of Pair of elements (scalar). This instruction compares the two floating-point elements of the source SIMD&FP register and writes the larger value as a scalar to the destination SIMD&FP register.
//
//go:linkname VpmaxqdF64 VpmaxqdF64
//go:noescape
func VpmaxqdF64(r *arm.Float64, v0 *arm.Float64X2)

// Floating-point Maximum of Pair of elements (scalar). This instruction compares the two floating-point elements of the source SIMD&FP register and writes the larger value as a scalar to the destination SIMD&FP register.
//
//go:linkname VpmaxsF32 VpmaxsF32
//go:noescape
func VpmaxsF32(r *arm.Float32, v0 *arm.Float32X2)

// Signed Minimum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the smallest of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VpminS8 VpminS8
//go:noescape
func VpminS8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8)

// Signed Minimum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the smallest of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VpminS16 VpminS16
//go:noescape
func VpminS16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4)

// Signed Minimum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the smallest of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VpminS32 VpminS32
//go:noescape
func VpminS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2)

// Unsigned Minimum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the smallest of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VpminU8 VpminU8
//go:noescape
func VpminU8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8)

// Unsigned Minimum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the smallest of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VpminU16 VpminU16
//go:noescape
func VpminU16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4)

// Unsigned Minimum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the smallest of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VpminU32 VpminU32
//go:noescape
func VpminU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2)

// Floating-point Minimum Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, writes the smaller of each pair of values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.
//
//go:linkname VpminF32 VpminF32
//go:noescape
func VpminF32(r *arm.Float32X2, v0 *arm.Float32X2, v1 *arm.Float32X2)

// Floating-point Minimum Number Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the smallest of each pair of floating-point values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.
//
//go:linkname VpminnmF32 VpminnmF32
//go:noescape
func VpminnmF32(r *arm.Float32X2, v0 *arm.Float32X2, v1 *arm.Float32X2)

// Floating-point Minimum Number Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the smallest of each pair of floating-point values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.
//
//go:linkname VpminnmqF32 VpminnmqF32
//go:noescape
func VpminnmqF32(r *arm.Float32X4, v0 *arm.Float32X4, v1 *arm.Float32X4)

// Floating-point Minimum Number Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the smallest of each pair of floating-point values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.
//
//go:linkname VpminnmqF64 VpminnmqF64
//go:noescape
func VpminnmqF64(r *arm.Float64X2, v0 *arm.Float64X2, v1 *arm.Float64X2)

// Floating-point Minimum Number of Pair of elements (scalar). This instruction compares the two floating-point elements of the source SIMD&FP register and writes the smaller value as a scalar to the destination SIMD&FP register.
//
//go:linkname VpminnmqdF64 VpminnmqdF64
//go:noescape
func VpminnmqdF64(r *arm.Float64, v0 *arm.Float64X2)

// Floating-point Minimum Number of Pair of elements (scalar). This instruction compares the two floating-point elements of the source SIMD&FP register and writes the smaller value as a scalar to the destination SIMD&FP register.
//
//go:linkname VpminnmsF32 VpminnmsF32
//go:noescape
func VpminnmsF32(r *arm.Float32, v0 *arm.Float32X2)

// Signed Minimum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the smallest of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VpminqS8 VpminqS8
//go:noescape
func VpminqS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16)

// Signed Minimum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the smallest of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VpminqS16 VpminqS16
//go:noescape
func VpminqS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8)

// Signed Minimum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the smallest of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VpminqS32 VpminqS32
//go:noescape
func VpminqS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4)

// Unsigned Minimum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the smallest of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VpminqU8 VpminqU8
//go:noescape
func VpminqU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16)

// Unsigned Minimum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the smallest of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VpminqU16 VpminqU16
//go:noescape
func VpminqU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8)

// Unsigned Minimum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the smallest of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VpminqU32 VpminqU32
//go:noescape
func VpminqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4)

// Floating-point Minimum Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, writes the smaller of each pair of values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.
//
//go:linkname VpminqF32 VpminqF32
//go:noescape
func VpminqF32(r *arm.Float32X4, v0 *arm.Float32X4, v1 *arm.Float32X4)

// Floating-point Minimum Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, writes the smaller of each pair of values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.
//
//go:linkname VpminqF64 VpminqF64
//go:noescape
func VpminqF64(r *arm.Float64X2, v0 *arm.Float64X2, v1 *arm.Float64X2)

// Floating-point Minimum of Pair of elements (scalar). This instruction compares the two floating-point elements of the source SIMD&FP register and writes the smaller value as a scalar to the destination SIMD&FP register.
//
//go:linkname VpminqdF64 VpminqdF64
//go:noescape
func VpminqdF64(r *arm.Float64, v0 *arm.Float64X2)

// Floating-point Minimum of Pair of elements (scalar). This instruction compares the two floating-point elements of the source SIMD&FP register and writes the smaller value as a scalar to the destination SIMD&FP register.
//
//go:linkname VpminsF32 VpminsF32
//go:noescape
func VpminsF32(r *arm.Float32, v0 *arm.Float32X2)

// Signed saturating Absolute value. This instruction reads each vector element from the source SIMD&FP register, takes its absolute value, saturating the result if it overflows (the most negative value becomes the most positive value), places the results into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values.
//
//go:linkname VqabsS8 VqabsS8
//go:noescape
func VqabsS8(r *arm.Int8X8, v0 *arm.Int8X8)
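
/*
Saturation matters for exactly one input per element width: the most negative
value, whose magnitude does not fit in the signed type. A minimal pure-Go
sketch of that behaviour for int8 lanes follows. satAbsInt8 and satAbsInt8x8
are hypothetical reference helpers and are not part of this package.

```go
package main

import (
	"fmt"
	"math"
)

// satAbsInt8 models a saturating absolute value for a single int8 lane.
func satAbsInt8(x int8) int8 {
	if x == math.MinInt8 {
		// +128 does not fit in int8, so the result saturates to +127.
		return math.MaxInt8
	}
	if x < 0 {
		return -x
	}
	return x
}

// satAbsInt8x8 applies satAbsInt8 to every lane of an 8-lane vector.
func satAbsInt8x8(v [8]int8) [8]int8 {
	var r [8]int8
	for i, x := range v {
		r[i] = satAbsInt8(x)
	}
	return r
}

func main() {
	fmt.Println(satAbsInt8x8([8]int8{-128, -127, -1, 0, 1, 126, 127, -64}))
	// [127 127 1 0 1 126 127 64]
}
```

A plain (wrapping) absolute value would leave -128 unchanged in the first
lane. The saturating form returns 127 instead.
*/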

// Signed saturating Absolute value. This instruction reads each vector element from the source SIMD&FP register, takes its absolute value, saturating the result if it overflows (the most negative value becomes the most positive value), places the results into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values.
//
//go:linkname VqabsS16 VqabsS16
//go:noescape
func VqabsS16(r *arm.Int16X4, v0 *arm.Int16X4)

// Signed saturating Absolute value. This instruction reads each vector element from the source SIMD&FP register, takes its absolute value, saturating the result if it overflows (the most negative value becomes the most positive value), places the results into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values.
//
//go:linkname VqabsS32 VqabsS32
//go:noescape
func VqabsS32(r *arm.Int32X2, v0 *arm.Int32X2)

// Signed saturating Absolute value. This instruction reads each vector element from the source SIMD&FP register, takes its absolute value, saturating the result if it overflows (the most negative value becomes the most positive value), places the results into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values.
//
//go:linkname VqabsS64 VqabsS64
//go:noescape
func VqabsS64(r *arm.Int64X1, v0 *arm.Int64X1)

// Signed saturating Absolute value (scalar). This instruction reads the scalar value from the source SIMD&FP register, takes its absolute value, saturating the result if it overflows (the most negative value becomes the most positive value), and writes the result to the destination SIMD&FP register. The value is a signed integer.
//
//go:linkname VqabsbS8 VqabsbS8
//go:noescape
func VqabsbS8(r *arm.Int8, v0 *arm.Int8)

// Signed saturating Absolute value (scalar). This instruction reads the scalar value from the source SIMD&FP register, takes its absolute value, saturating the result if it overflows (the most negative value becomes the most positive value), and writes the result to the destination SIMD&FP register. The value is a signed integer.
//
//go:linkname VqabsdS64 VqabsdS64
//go:noescape
func VqabsdS64(r *arm.Int64, v0 *arm.Int64)

// Signed saturating Absolute value (scalar). This instruction reads the scalar value from the source SIMD&FP register, takes its absolute value, saturating the result if it overflows (the most negative value becomes the most positive value), and writes the result to the destination SIMD&FP register. The value is a signed integer.
//
//go:linkname VqabshS16 VqabshS16
//go:noescape
func VqabshS16(r *arm.Int16, v0 *arm.Int16)

// Signed saturating Absolute value. This instruction reads each vector element from the source SIMD&FP register, takes its absolute value, saturating the result if it overflows (the most negative value becomes the most positive value), places the results into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values.
//
//go:linkname VqabsqS8 VqabsqS8
//go:noescape
func VqabsqS8(r *arm.Int8X16, v0 *arm.Int8X16)

// Signed saturating Absolute value. This instruction reads each vector element from the source SIMD&FP register, takes its absolute value, saturating the result if it overflows (the most negative value becomes the most positive value), places the results into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values.
//
//go:linkname VqabsqS16 VqabsqS16
//go:noescape
func VqabsqS16(r *arm.Int16X8, v0 *arm.Int16X8)

// Signed saturating Absolute value. This instruction reads each vector element from the source SIMD&FP register, takes its absolute value, saturating the result if it overflows (the most negative value becomes the most positive value), places the results into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values.
//
//go:linkname VqabsqS32 VqabsqS32
//go:noescape
func VqabsqS32(r *arm.Int32X4, v0 *arm.Int32X4)

// Signed saturating Absolute value. This instruction reads each vector element from the source SIMD&FP register, takes its absolute value, saturating the result if it overflows (the most negative value becomes the most positive value), places the results into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values.
//
//go:linkname VqabsqS64 VqabsqS64
//go:noescape
func VqabsqS64(r *arm.Int64X2, v0 *arm.Int64X2)

// Signed saturating Absolute value (scalar). This instruction reads the scalar value from the source SIMD&FP register, takes its absolute value, saturating the result if it overflows (the most negative value becomes the most positive value), and writes the result to the destination SIMD&FP register. The value is a signed integer.
//
//go:linkname VqabssS32 VqabssS32
//go:noescape
func VqabssS32(r *arm.Int32, v0 *arm.Int32)

// Signed saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, saturates any result that overflows the signed element range, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqaddS8 VqaddS8
//go:noescape
func VqaddS8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8)
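
/*
Saturating addition clamps a result to the limits of the element type instead
of letting it wrap around. The sketch below models this in pure Go for a
single 8-bit lane, signed and unsigned. satAddInt8 and satAddUint8 are
hypothetical reference helpers and are not part of this package.

```go
package main

import (
	"fmt"
	"math"
)

// satAddInt8 models a signed saturating add for one int8 lane.
func satAddInt8(a, b int8) int8 {
	// Widen so the exact sum is representable, then clamp to the int8 range.
	sum := int16(a) + int16(b)
	if sum > math.MaxInt8 {
		return math.MaxInt8
	}
	if sum < math.MinInt8 {
		return math.MinInt8
	}
	return int8(sum)
}

// satAddUint8 models an unsigned saturating add for one uint8 lane.
func satAddUint8(a, b uint8) uint8 {
	sum := uint16(a) + uint16(b)
	if sum > math.MaxUint8 {
		return math.MaxUint8
	}
	return uint8(sum)
}

func main() {
	fmt.Println(satAddInt8(100, 100))   // 127
	fmt.Println(satAddInt8(-100, -100)) // -128
	fmt.Println(satAddUint8(200, 100))  // 255
}
```

The vector forms apply the same rule independently to every lane. The wider
element types behave the same way with their own limits.
*/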

// Signed saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, saturates any result that overflows the signed element range, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqaddS16 VqaddS16
//go:noescape
func VqaddS16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4)

// Signed saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, saturates any result that overflows the signed element range, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqaddS32 VqaddS32
//go:noescape
func VqaddS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2)

// Signed saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, saturates any result that overflows the signed element range, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqaddS64 VqaddS64
//go:noescape
func VqaddS64(r *arm.Int64X1, v0 *arm.Int64X1, v1 *arm.Int64X1)

// Unsigned saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, saturates any result that overflows the unsigned element range, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqaddU8 VqaddU8
//go:noescape
func VqaddU8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8)

// Unsigned saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, saturates any result that overflows the unsigned element range, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqaddU16 VqaddU16
//go:noescape
func VqaddU16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4)

// Unsigned saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, saturates any result that overflows the unsigned element range, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqaddU32 VqaddU32
//go:noescape
func VqaddU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2)

// Unsigned saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, saturates any result that overflows the unsigned element range, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqaddU64 VqaddU64
//go:noescape
func VqaddU64(r *arm.Uint64X1, v0 *arm.Uint64X1, v1 *arm.Uint64X1)

// Signed saturating Add (scalar). This instruction adds the two scalar source values, saturates the result if it overflows the signed range, and writes the result to the destination SIMD&FP register.
//
//go:linkname VqaddbS8 VqaddbS8
//go:noescape
func VqaddbS8(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8)

// Unsigned saturating Add (scalar). This instruction adds the two scalar source values, saturates the result if it overflows the unsigned range, and writes the result to the destination SIMD&FP register.
//
//go:linkname VqaddbU8 VqaddbU8
//go:noescape
func VqaddbU8(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8)

// Signed saturating Add (scalar). This instruction adds the two scalar source values, saturates the result if it overflows the signed range, and writes the result to the destination SIMD&FP register.
//
//go:linkname VqadddS64 VqadddS64
//go:noescape
func VqadddS64(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64)

// Unsigned saturating Add (scalar). This instruction adds the two scalar source values, saturates the result if it overflows the unsigned range, and writes the result to the destination SIMD&FP register.
//
//go:linkname VqadddU64 VqadddU64
//go:noescape
func VqadddU64(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64)

// Signed saturating Add (scalar). This instruction adds the two scalar source values, saturates the result if it overflows the signed range, and writes the result to the destination SIMD&FP register.
//
//go:linkname VqaddhS16 VqaddhS16
//go:noescape
func VqaddhS16(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16)

// Unsigned saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register. If overflow occurs with any of the results, those results are saturated.
//
//go:linkname VqaddhU16 VqaddhU16
//go:noescape
func VqaddhU16(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16)

// Signed saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register. If overflow occurs with any of the results, those results are saturated.
//
//go:linkname VqaddqS8 VqaddqS8
//go:noescape
func VqaddqS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16)

// Signed saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register. If overflow occurs with any of the results, those results are saturated.
//
//go:linkname VqaddqS16 VqaddqS16
//go:noescape
func VqaddqS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8)

// Signed saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register. If overflow occurs with any of the results, those results are saturated.
//
//go:linkname VqaddqS32 VqaddqS32
//go:noescape
func VqaddqS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4)

// Signed saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register. If overflow occurs with any of the results, those results are saturated.
//
//go:linkname VqaddqS64 VqaddqS64
//go:noescape
func VqaddqS64(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int64X2)

// Unsigned saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register. If overflow occurs with any of the results, those results are saturated.
//
//go:linkname VqaddqU8 VqaddqU8
//go:noescape
func VqaddqU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16)

// Unsigned saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register. If overflow occurs with any of the results, those results are saturated.
//
//go:linkname VqaddqU16 VqaddqU16
//go:noescape
func VqaddqU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8)

// Unsigned saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register. If overflow occurs with any of the results, those results are saturated.
//
//go:linkname VqaddqU32 VqaddqU32
//go:noescape
func VqaddqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4)

// Unsigned saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register. If overflow occurs with any of the results, those results are saturated.
//
//go:linkname VqaddqU64 VqaddqU64
//go:noescape
func VqaddqU64(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint64X2)

// Signed saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register. If overflow occurs with any of the results, those results are saturated.
//
//go:linkname VqaddsS32 VqaddsS32
//go:noescape
func VqaddsS32(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32)

// Unsigned saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register. If overflow occurs with any of the results, those results are saturated.
//
//go:linkname VqaddsU32 VqaddsU32
//go:noescape
func VqaddsU32(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32)

// Signed saturating Doubling Multiply-Add Long. This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, doubles the results, and accumulates the final results with the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.
//
//go:linkname VqdmlalS16 VqdmlalS16
//go:noescape
func VqdmlalS16(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int16X4, v2 *arm.Int16X4)

// Signed saturating Doubling Multiply-Add Long. This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, doubles the results, and accumulates the final results with the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.
//
//go:linkname VqdmlalS32 VqdmlalS32
//go:noescape
func VqdmlalS32(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int32X2, v2 *arm.Int32X2)

// Signed saturating Doubling Multiply-Add Long. This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, doubles the results, and accumulates the final results with the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.
//
//go:linkname VqdmlalHighS16 VqdmlalHighS16
//go:noescape
func VqdmlalHighS16(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int16X8, v2 *arm.Int16X8)

// Signed saturating Doubling Multiply-Add Long. This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, doubles the results, and accumulates the final results with the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.
//
//go:linkname VqdmlalHighS32 VqdmlalHighS32
//go:noescape
func VqdmlalHighS32(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int32X4, v2 *arm.Int32X4)

// Signed saturating Doubling Multiply-Add Long. This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, doubles the results, and accumulates the final results with the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.
//
//go:linkname VqdmlalHighNS16 VqdmlalHighNS16
//go:noescape
func VqdmlalHighNS16(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int16X8, v2 *arm.Int16)

// Signed saturating Doubling Multiply-Add Long. This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, doubles the results, and accumulates the final results with the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.
//
//go:linkname VqdmlalHighNS32 VqdmlalHighNS32
//go:noescape
func VqdmlalHighNS32(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int32X4, v2 *arm.Int32)

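// Vector widening saturating doubling multiply accumulate with scalar. Each element of v1 is multiplied by the scalar v2, the product is doubled, and the result is added to the corresponding element of v0 with saturation. The result elements written to r are twice as long as the elements that are multiplied.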
//
//go:linkname VqdmlalNS16 VqdmlalNS16
//go:noescape
func VqdmlalNS16(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int16X4, v2 *arm.Int16)

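// Vector widening saturating doubling multiply accumulate with scalar. Each element of v1 is multiplied by the scalar v2, the product is doubled, and the result is added to the corresponding element of v0 with saturation. The result elements written to r are twice as long as the elements that are multiplied.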
//
//go:linkname VqdmlalNS32 VqdmlalNS32
//go:noescape
func VqdmlalNS32(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int32X2, v2 *arm.Int32)

// Signed saturating Doubling Multiply-Add Long. This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, doubles the results, and accumulates the final results with the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.
//
//go:linkname VqdmlalhS16 VqdmlalhS16
//go:noescape
func VqdmlalhS16(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int16, v2 *arm.Int16)

// Signed saturating Doubling Multiply-Add Long. This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, doubles the results, and accumulates the final results with the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.
//
//go:linkname VqdmlalsS32 VqdmlalsS32
//go:noescape
func VqdmlalsS32(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int32, v2 *arm.Int32)

// Signed saturating Doubling Multiply-Subtract Long. This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, doubles the results, and subtracts the final results from the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.
//
//go:linkname VqdmlslS16 VqdmlslS16
//go:noescape
func VqdmlslS16(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int16X4, v2 *arm.Int16X4)

// Signed saturating Doubling Multiply-Subtract Long. This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, doubles the results, and subtracts the final results from the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.
//
//go:linkname VqdmlslS32 VqdmlslS32
//go:noescape
func VqdmlslS32(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int32X2, v2 *arm.Int32X2)

// Signed saturating Doubling Multiply-Subtract Long. This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, doubles the results, and subtracts the final results from the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.
//
//go:linkname VqdmlslHighS16 VqdmlslHighS16
//go:noescape
func VqdmlslHighS16(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int16X8, v2 *arm.Int16X8)

// Signed saturating Doubling Multiply-Subtract Long. This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, doubles the results, and subtracts the final results from the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.
//
//go:linkname VqdmlslHighS32 VqdmlslHighS32
//go:noescape
func VqdmlslHighS32(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int32X4, v2 *arm.Int32X4)

// Signed saturating Doubling Multiply-Subtract Long. This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, doubles the results, and subtracts the final results from the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.
//
//go:linkname VqdmlslHighNS16 VqdmlslHighNS16
//go:noescape
func VqdmlslHighNS16(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int16X8, v2 *arm.Int16)

// Signed saturating Doubling Multiply-Subtract Long. This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, doubles the results, and subtracts the final results from the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.
//
//go:linkname VqdmlslHighNS32 VqdmlslHighNS32
//go:noescape
func VqdmlslHighNS32(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int32X4, v2 *arm.Int32)

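// Vector widening saturating doubling multiply subtract with scalar. Each element of v1 is multiplied by the scalar v2, the product is doubled, and the result is subtracted from the corresponding element of v0 with saturation. The result elements written to r are twice as long as the elements that are multiplied.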
//
//go:linkname VqdmlslNS16 VqdmlslNS16
//go:noescape
func VqdmlslNS16(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int16X4, v2 *arm.Int16)

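// Vector widening saturating doubling multiply subtract with scalar. Each element of v1 is multiplied by the scalar v2, the product is doubled, and the result is subtracted from the corresponding element of v0 with saturation. The result elements written to r are twice as long as the elements that are multiplied.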
//
//go:linkname VqdmlslNS32 VqdmlslNS32
//go:noescape
func VqdmlslNS32(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int32X2, v2 *arm.Int32)

// Signed saturating Doubling Multiply-Subtract Long. This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, doubles the results, and subtracts the final results from the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.
//
//go:linkname VqdmlslhS16 VqdmlslhS16
//go:noescape
func VqdmlslhS16(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int16, v2 *arm.Int16)

// Signed saturating Doubling Multiply-Subtract Long. This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, doubles the results, and subtracts the final results from the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.
//
//go:linkname VqdmlslsS32 VqdmlslsS32
//go:noescape
func VqdmlslsS32(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int32, v2 *arm.Int32)

// Signed saturating Doubling Multiply returning High half. This instruction multiplies the values of corresponding elements of the two source SIMD&FP registers, doubles the results, places the most significant half of the final results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqdmulhS16 VqdmulhS16
//go:noescape
func VqdmulhS16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4)

// Signed saturating Doubling Multiply returning High half. This instruction multiplies the values of corresponding elements of the two source SIMD&FP registers, doubles the results, places the most significant half of the final results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqdmulhS32 VqdmulhS32
//go:noescape
func VqdmulhS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2)

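// Vector saturating doubling multiply high with scalar. Each element of v0 is multiplied by the scalar v1, the product is doubled with saturation, and the most significant half of each result is written to r.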
//
//go:linkname VqdmulhNS16 VqdmulhNS16
//go:noescape
func VqdmulhNS16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16)

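// Vector saturating doubling multiply high with scalar. Each element of v0 is multiplied by the scalar v1, the product is doubled with saturation, and the most significant half of each result is written to r.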
//
//go:linkname VqdmulhNS32 VqdmulhNS32
//go:noescape
func VqdmulhNS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32)

// Signed saturating Doubling Multiply returning High half. This instruction multiplies the values of corresponding elements of the two source SIMD&FP registers, doubles the results, places the most significant half of the final results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqdmulhhS16 VqdmulhhS16
//go:noescape
func VqdmulhhS16(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16)

// Signed saturating Doubling Multiply returning High half. This instruction multiplies the values of corresponding elements of the two source SIMD&FP registers, doubles the results, places the most significant half of the final results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqdmulhqS16 VqdmulhqS16
//go:noescape
func VqdmulhqS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8)

// Signed saturating Doubling Multiply returning High half. This instruction multiplies the values of corresponding elements of the two source SIMD&FP registers, doubles the results, places the most significant half of the final results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqdmulhqS32 VqdmulhqS32
//go:noescape
func VqdmulhqS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4)

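// Vector saturating doubling multiply high with scalar. Each element of v0 is multiplied by the scalar v1, the product is doubled with saturation, and the most significant half of each result is written to r.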
//
//go:linkname VqdmulhqNS16 VqdmulhqNS16
//go:noescape
func VqdmulhqNS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16)

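// Vector saturating doubling multiply high with scalar. Each element of v0 is multiplied by the scalar v1, the product is doubled with saturation, and the most significant half of each result is written to r.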
//
//go:linkname VqdmulhqNS32 VqdmulhqNS32
//go:noescape
func VqdmulhqNS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32)

// Signed saturating Doubling Multiply returning High half. This instruction multiplies the values of corresponding elements of the two source SIMD&FP registers, doubles the results, places the most significant half of the final results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqdmulhsS32 VqdmulhsS32
//go:noescape
func VqdmulhsS32(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32)

// Signed saturating Doubling Multiply Long. This instruction multiplies corresponding vector elements in the lower or upper half of the two source SIMD&FP registers, doubles the results, places the final results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqdmullS16 VqdmullS16
//go:noescape
func VqdmullS16(r *arm.Int32X4, v0 *arm.Int16X4, v1 *arm.Int16X4)

// Signed saturating Doubling Multiply Long. This instruction multiplies corresponding vector elements in the lower or upper half of the two source SIMD&FP registers, doubles the results, places the final results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqdmullS32 VqdmullS32
//go:noescape
func VqdmullS32(r *arm.Int64X2, v0 *arm.Int32X2, v1 *arm.Int32X2)

// Signed saturating Doubling Multiply Long. This instruction multiplies corresponding vector elements in the lower or upper half of the two source SIMD&FP registers, doubles the results, places the final results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqdmullHighS16 VqdmullHighS16
//go:noescape
func VqdmullHighS16(r *arm.Int32X4, v0 *arm.Int16X8, v1 *arm.Int16X8)

// Signed saturating Doubling Multiply Long. This instruction multiplies corresponding vector elements in the lower or upper half of the two source SIMD&FP registers, doubles the results, places the final results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqdmullHighS32 VqdmullHighS32
//go:noescape
func VqdmullHighS32(r *arm.Int64X2, v0 *arm.Int32X4, v1 *arm.Int32X4)

// Signed saturating Doubling Multiply Long. This instruction multiplies corresponding vector elements in the lower or upper half of the two source SIMD&FP registers, doubles the results, places the final results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqdmullHighNS16 VqdmullHighNS16
//go:noescape
func VqdmullHighNS16(r *arm.Int32X4, v0 *arm.Int16X8, v1 *arm.Int16)

// Signed saturating Doubling Multiply Long. This instruction multiplies corresponding vector elements in the lower or upper half of the two source SIMD&FP registers, doubles the results, places the final results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqdmullHighNS32 VqdmullHighNS32
//go:noescape
func VqdmullHighNS32(r *arm.Int64X2, v0 *arm.Int32X4, v1 *arm.Int32)

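// Vector saturating doubling long multiply with scalar. Each element of v0 is multiplied by the scalar v1 and the product is doubled with saturation. The result elements written to r are twice as long as the elements that are multiplied.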
//
//go:linkname VqdmullNS16 VqdmullNS16
//go:noescape
func VqdmullNS16(r *arm.Int32X4, v0 *arm.Int16X4, v1 *arm.Int16)

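// Vector saturating doubling long multiply with scalar. Each element of v0 is multiplied by the scalar v1 and the product is doubled with saturation. The result elements written to r are twice as long as the elements that are multiplied.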
//
//go:linkname VqdmullNS32 VqdmullNS32
//go:noescape
func VqdmullNS32(r *arm.Int64X2, v0 *arm.Int32X2, v1 *arm.Int32)

// Signed saturating Doubling Multiply Long. This instruction multiplies corresponding vector elements in the lower or upper half of the two source SIMD&FP registers, doubles the results, places the final results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqdmullhS16 VqdmullhS16
//go:noescape
func VqdmullhS16(r *arm.Int32, v0 *arm.Int16, v1 *arm.Int16)

// Signed saturating Doubling Multiply Long. This instruction multiplies corresponding vector elements in the lower or upper half of the two source SIMD&FP registers, doubles the results, places the final results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqdmullsS32 VqdmullsS32
//go:noescape
func VqdmullsS32(r *arm.Int64, v0 *arm.Int32, v1 *arm.Int32)

// Signed saturating extract Narrow. This instruction reads each vector element from the source SIMD&FP register, saturates the value to half the original width, places the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. The destination vector elements are half as long as the source vector elements. All the values in this instruction are signed integer values.
//
//go:linkname VqmovnS16 VqmovnS16
//go:noescape
func VqmovnS16(r *arm.Int8X8, v0 *arm.Int16X8)

// Signed saturating extract Narrow. This instruction reads each vector element from the source SIMD&FP register, saturates the value to half the original width, places the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. The destination vector elements are half as long as the source vector elements. All the values in this instruction are signed integer values.
//
//go:linkname VqmovnS32 VqmovnS32
//go:noescape
func VqmovnS32(r *arm.Int16X4, v0 *arm.Int32X4)

// Signed saturating extract Narrow. This instruction reads each vector element from the source SIMD&FP register, saturates the value to half the original width, places the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. The destination vector elements are half as long as the source vector elements. All the values in this instruction are signed integer values.
//
//go:linkname VqmovnS64 VqmovnS64
//go:noescape
func VqmovnS64(r *arm.Int32X2, v0 *arm.Int64X2)

// Unsigned saturating extract Narrow. This instruction reads each vector element from the source SIMD&FP register, saturates each value to half the original width, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are unsigned integer values.
//
//go:linkname VqmovnU16 VqmovnU16
//go:noescape
func VqmovnU16(r *arm.Uint8X8, v0 *arm.Uint16X8)

// Unsigned saturating extract Narrow. This instruction reads each vector element from the source SIMD&FP register, saturates each value to half the original width, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are unsigned integer values.
//
//go:linkname VqmovnU32 VqmovnU32
//go:noescape
func VqmovnU32(r *arm.Uint16X4, v0 *arm.Uint32X4)

// Unsigned saturating extract Narrow. This instruction reads each vector element from the source SIMD&FP register, saturates each value to half the original width, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are unsigned integer values.
//
//go:linkname VqmovnU64 VqmovnU64
//go:noescape
func VqmovnU64(r *arm.Uint32X2, v0 *arm.Uint64X2)

// Signed saturating extract Narrow. This instruction reads each vector element from the source SIMD&FP register, saturates the value to half the original width, places the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. The destination vector elements are half as long as the source vector elements. All the values in this instruction are signed integer values.
//
//go:linkname VqmovnHighS16 VqmovnHighS16
//go:noescape
func VqmovnHighS16(r *arm.Int8X16, v0 *arm.Int8X8, v1 *arm.Int16X8)

// Signed saturating extract Narrow. This instruction reads each vector element from the source SIMD&FP register, saturates the value to half the original width, places the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. The destination vector elements are half as long as the source vector elements. All the values in this instruction are signed integer values.
//
//go:linkname VqmovnHighS32 VqmovnHighS32
//go:noescape
func VqmovnHighS32(r *arm.Int16X8, v0 *arm.Int16X4, v1 *arm.Int32X4)

// Signed saturating extract Narrow. This instruction reads each vector element from the source SIMD&FP register, saturates the value to half the original width, places the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. The destination vector elements are half as long as the source vector elements. All the values in this instruction are signed integer values.
//
//go:linkname VqmovnHighS64 VqmovnHighS64
//go:noescape
func VqmovnHighS64(r *arm.Int32X4, v0 *arm.Int32X2, v1 *arm.Int64X2)

// Unsigned saturating extract Narrow. This instruction reads each vector element from the source SIMD&FP register, saturates each value to half the original width, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are unsigned integer values. In this High variant, the narrowed elements of v1 are written to the upper half of r and the lower half is copied from v0.
//
//go:linkname VqmovnHighU16 VqmovnHighU16
//go:noescape
func VqmovnHighU16(r *arm.Uint8X16, v0 *arm.Uint8X8, v1 *arm.Uint16X8)

// Unsigned saturating extract Narrow. This instruction reads each vector element from the source SIMD&FP register, saturates each value to half the original width, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are unsigned integer values. In this High variant, the narrowed elements of v1 are written to the upper half of r and the lower half is copied from v0.
//
//go:linkname VqmovnHighU32 VqmovnHighU32
//go:noescape
func VqmovnHighU32(r *arm.Uint16X8, v0 *arm.Uint16X4, v1 *arm.Uint32X4)

// Unsigned saturating extract Narrow. This instruction reads each vector element from the source SIMD&FP register, saturates each value to half the original width, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are unsigned integer values. In this High variant, the narrowed elements of v1 are written to the upper half of r and the lower half is copied from v0.
//
//go:linkname VqmovnHighU64 VqmovnHighU64
//go:noescape
func VqmovnHighU64(r *arm.Uint32X4, v0 *arm.Uint32X2, v1 *arm.Uint64X2)

// Signed saturating extract Narrow. This instruction reads each vector element from the source SIMD&FP register, saturates the value to half the original width, places the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. The destination vector elements are half as long as the source vector elements. All the values in this instruction are signed integer values.
//
//go:linkname VqmovndS64 VqmovndS64
//go:noescape
func VqmovndS64(r *arm.Int32, v0 *arm.Int64)

// Unsigned saturating extract Narrow. This instruction reads each vector element from the source SIMD&FP register, saturates each value to half the original width, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are unsigned integer values.
//
//go:linkname VqmovndU64 VqmovndU64
//go:noescape
func VqmovndU64(r *arm.Uint32, v0 *arm.Uint64)

// Signed saturating extract Narrow. This instruction reads each vector element from the source SIMD&FP register, saturates the value to half the original width, places the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. The destination vector elements are half as long as the source vector elements. All the values in this instruction are signed integer values.
//
//go:linkname VqmovnhS16 VqmovnhS16
//go:noescape
func VqmovnhS16(r *arm.Int8, v0 *arm.Int16)

// Unsigned saturating extract Narrow. This instruction reads each vector element from the source SIMD&FP register, saturates each value to half the original width, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are unsigned integer values.
//
//go:linkname VqmovnhU16 VqmovnhU16
//go:noescape
func VqmovnhU16(r *arm.Uint8, v0 *arm.Uint16)

// Signed saturating extract Narrow. This instruction reads each vector element from the source SIMD&FP register, saturates the value to half the original width, places the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. The destination vector elements are half as long as the source vector elements. All the values in this instruction are signed integer values.
//
//go:linkname VqmovnsS32 VqmovnsS32
//go:noescape
func VqmovnsS32(r *arm.Int16, v0 *arm.Int32)

// Unsigned saturating extract Narrow. This instruction reads each vector element from the source SIMD&FP register, saturates each value to half the original width, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are unsigned integer values.
//
//go:linkname VqmovnsU32 VqmovnsU32
//go:noescape
func VqmovnsU32(r *arm.Uint16, v0 *arm.Uint32)

// Signed saturating extract Unsigned Narrow. This instruction reads each signed integer value in the vector of the source SIMD&FP register, saturates the value to an unsigned integer value that is half the original width, places the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. The destination vector elements are half as long as the source vector elements.
//
//go:linkname VqmovunS16 VqmovunS16
//go:noescape
func VqmovunS16(r *arm.Uint8X8, v0 *arm.Int16X8)

// Signed saturating extract Unsigned Narrow. This instruction reads each signed integer value in the vector of the source SIMD&FP register, saturates the value to an unsigned integer value that is half the original width, places the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. The destination vector elements are half as long as the source vector elements.
//
//go:linkname VqmovunS32 VqmovunS32
//go:noescape
func VqmovunS32(r *arm.Uint16X4, v0 *arm.Int32X4)

// Signed saturating extract Unsigned Narrow. This instruction reads each signed integer value in the vector of the source SIMD&FP register, saturates the value to an unsigned integer value that is half the original width, places the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. The destination vector elements are half as long as the source vector elements.
//
//go:linkname VqmovunS64 VqmovunS64
//go:noescape
func VqmovunS64(r *arm.Uint32X2, v0 *arm.Int64X2)

// Signed saturating extract Unsigned Narrow. This instruction reads each signed integer value in the vector of the source SIMD&FP register, saturates the value to an unsigned integer value that is half the original width, places the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. The destination vector elements are half as long as the source vector elements.
//
//go:linkname VqmovunHighS16 VqmovunHighS16
//go:noescape
func VqmovunHighS16(r *arm.Uint8X16, v0 *arm.Uint8X8, v1 *arm.Int16X8)

// Signed saturating extract Unsigned Narrow. This instruction reads each signed integer value in the vector of the source SIMD&FP register, saturates the value to an unsigned integer value that is half the original width, places the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. The destination vector elements are half as long as the source vector elements.
//
//go:linkname VqmovunHighS32 VqmovunHighS32
//go:noescape
func VqmovunHighS32(r *arm.Uint16X8, v0 *arm.Uint16X4, v1 *arm.Int32X4)

// Signed saturating extract Unsigned Narrow. This instruction reads each signed integer value in the vector of the source SIMD&FP register, saturates the value to an unsigned integer value that is half the original width, places the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. The destination vector elements are half as long as the source vector elements.
//
//go:linkname VqmovunHighS64 VqmovunHighS64
//go:noescape
func VqmovunHighS64(r *arm.Uint32X4, v0 *arm.Uint32X2, v1 *arm.Int64X2)

// Signed saturating extract Unsigned Narrow. This instruction reads each signed integer value in the vector of the source SIMD&FP register, saturates the value to an unsigned integer value that is half the original width, places the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. The destination vector elements are half as long as the source vector elements.
//
//go:linkname VqmovundS64 VqmovundS64
//go:noescape
func VqmovundS64(r *arm.Uint32, v0 *arm.Int64)

// Signed saturating extract Unsigned Narrow. This instruction reads each signed integer value in the vector of the source SIMD&FP register, saturates the value to an unsigned integer value that is half the original width, places the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. The destination vector elements are half as long as the source vector elements.
//
//go:linkname VqmovunhS16 VqmovunhS16
//go:noescape
func VqmovunhS16(r *arm.Uint8, v0 *arm.Int16)

// Signed saturating extract Unsigned Narrow. This instruction reads each signed integer value in the vector of the source SIMD&FP register, saturates the value to an unsigned integer value that is half the original width, places the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. The destination vector elements are half as long as the source vector elements.
//
//go:linkname VqmovunsS32 VqmovunsS32
//go:noescape
func VqmovunsS32(r *arm.Uint16, v0 *arm.Int32)
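
/*
The Vqmovun* bindings above saturate a signed value into an unsigned type of
half the width. A minimal pure-Go sketch of what a single 64-bit to 32-bit
lane is expected to produce, useful as a reference when testing on hardware.
satNarrowU32 is a hypothetical helper for illustration only; it is not part
of this package and does not call the intrinsics.

```go
package main

import (
	"fmt"
	"math"
)

// satNarrowU32 models one lane of a signed-to-unsigned saturating narrow:
// negative inputs clamp to 0, inputs above the uint32 range clamp to
// math.MaxUint32, and everything else is converted unchanged.
func satNarrowU32(v int64) uint32 {
	if v < 0 {
		return 0
	}
	if v > math.MaxUint32 {
		return math.MaxUint32
	}
	return uint32(v)
}

func main() {
	for _, v := range []int64{-5, 42, 1 << 40} {
		fmt.Println(v, "->", satNarrowU32(v))
	}
}
```

The same clamping applies per lane to the 16-bit and 32-bit source variants,
with the bounds adjusted to the narrower unsigned type.
*/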

// Signed saturating Negate. This instruction reads each vector element from the source SIMD&FP register, negates each value, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values.
//
//go:linkname VqnegS8 VqnegS8
//go:noescape
func VqnegS8(r *arm.Int8X8, v0 *arm.Int8X8)

// Signed saturating Negate. This instruction reads each vector element from the source SIMD&FP register, negates each value, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values.
//
//go:linkname VqnegS16 VqnegS16
//go:noescape
func VqnegS16(r *arm.Int16X4, v0 *arm.Int16X4)

// Signed saturating Negate. This instruction reads each vector element from the source SIMD&FP register, negates each value, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values.
//
//go:linkname VqnegS32 VqnegS32
//go:noescape
func VqnegS32(r *arm.Int32X2, v0 *arm.Int32X2)

// Signed saturating Negate. This instruction reads each vector element from the source SIMD&FP register, negates each value, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values.
//
//go:linkname VqnegS64 VqnegS64
//go:noescape
func VqnegS64(r *arm.Int64X1, v0 *arm.Int64X1)

// Signed saturating Negate. This instruction reads each vector element from the source SIMD&FP register, negates each value, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values.
//
//go:linkname VqnegbS8 VqnegbS8
//go:noescape
func VqnegbS8(r *arm.Int8, v0 *arm.Int8)

// Signed saturating Negate. This instruction reads each vector element from the source SIMD&FP register, negates each value, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values.
//
//go:linkname VqnegdS64 VqnegdS64
//go:noescape
func VqnegdS64(r *arm.Int64, v0 *arm.Int64)

// Signed saturating Negate. This instruction reads each vector element from the source SIMD&FP register, negates each value, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values.
//
//go:linkname VqneghS16 VqneghS16
//go:noescape
func VqneghS16(r *arm.Int16, v0 *arm.Int16)

// Signed saturating Negate. This instruction reads each vector element from the source SIMD&FP register, negates each value, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values.
//
//go:linkname VqnegqS8 VqnegqS8
//go:noescape
func VqnegqS8(r *arm.Int8X16, v0 *arm.Int8X16)

// Signed saturating Negate. This instruction reads each vector element from the source SIMD&FP register, negates each value, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values.
//
//go:linkname VqnegqS16 VqnegqS16
//go:noescape
func VqnegqS16(r *arm.Int16X8, v0 *arm.Int16X8)

// Signed saturating Negate. This instruction reads each vector element from the source SIMD&FP register, negates each value, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values.
//
//go:linkname VqnegqS32 VqnegqS32
//go:noescape
func VqnegqS32(r *arm.Int32X4, v0 *arm.Int32X4)

// Signed saturating Negate. This instruction reads each vector element from the source SIMD&FP register, negates each value, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values.
//
//go:linkname VqnegqS64 VqnegqS64
//go:noescape
func VqnegqS64(r *arm.Int64X2, v0 *arm.Int64X2)

// Signed saturating Negate. This instruction reads each vector element from the source SIMD&FP register, negates each value, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values.
//
//go:linkname VqnegsS32 VqnegsS32
//go:noescape
func VqnegsS32(r *arm.Int32, v0 *arm.Int32)
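
/*
The Vqneg* bindings above negate with saturation, which only differs from a
plain negate for the most negative input. A minimal pure-Go sketch of one
int8 lane follows; satNegS8 is a hypothetical reference helper, not part of
this package.

```go
package main

import (
	"fmt"
	"math"
)

// satNegS8 models one int8 lane of a saturating negate. The value 128 does
// not fit in int8, so negating -128 clamps to 127 instead of wrapping.
func satNegS8(v int8) int8 {
	if v == math.MinInt8 {
		return math.MaxInt8
	}
	return -v
}

func main() {
	for _, v := range []int8{-128, -1, 0, 127} {
		fmt.Println(v, "->", satNegS8(v))
	}
}
```
*/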

// Signed Saturating Rounding Doubling Multiply Accumulate returning High Half (vector). This instruction multiplies the vector elements of the first source SIMD&FP register with the corresponding vector elements of the second source SIMD&FP register without saturating the multiply results, doubles the results, and accumulates the most significant half of the final results with the vector elements of the destination SIMD&FP register. The results are rounded.
//
//go:linkname VqrdmlahS16 VqrdmlahS16
//go:noescape
func VqrdmlahS16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4, v2 *arm.Int16X4)

// Signed Saturating Rounding Doubling Multiply Accumulate returning High Half (vector). This instruction multiplies the vector elements of the first source SIMD&FP register with the corresponding vector elements of the second source SIMD&FP register without saturating the multiply results, doubles the results, and accumulates the most significant half of the final results with the vector elements of the destination SIMD&FP register. The results are rounded.
//
//go:linkname VqrdmlahS32 VqrdmlahS32
//go:noescape
func VqrdmlahS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2, v2 *arm.Int32X2)

// Signed Saturating Rounding Doubling Multiply Accumulate returning High Half (vector). This instruction multiplies the vector elements of the first source SIMD&FP register with the corresponding vector elements of the second source SIMD&FP register without saturating the multiply results, doubles the results, and accumulates the most significant half of the final results with the vector elements of the destination SIMD&FP register. The results are rounded.
//
//go:linkname VqrdmlahhS16 VqrdmlahhS16
//go:noescape
func VqrdmlahhS16(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, v2 *arm.Int16)

// Signed Saturating Rounding Doubling Multiply Accumulate returning High Half (vector). This instruction multiplies the vector elements of the first source SIMD&FP register with the corresponding vector elements of the second source SIMD&FP register without saturating the multiply results, doubles the results, and accumulates the most significant half of the final results with the vector elements of the destination SIMD&FP register. The results are rounded.
//
//go:linkname VqrdmlahqS16 VqrdmlahqS16
//go:noescape
func VqrdmlahqS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8, v2 *arm.Int16X8)

// Signed Saturating Rounding Doubling Multiply Accumulate returning High Half (vector). This instruction multiplies the vector elements of the first source SIMD&FP register with the corresponding vector elements of the second source SIMD&FP register without saturating the multiply results, doubles the results, and accumulates the most significant half of the final results with the vector elements of the destination SIMD&FP register. The results are rounded.
//
//go:linkname VqrdmlahqS32 VqrdmlahqS32
//go:noescape
func VqrdmlahqS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4, v2 *arm.Int32X4)

// Signed Saturating Rounding Doubling Multiply Accumulate returning High Half (vector). This instruction multiplies the vector elements of the first source SIMD&FP register with the corresponding vector elements of the second source SIMD&FP register without saturating the multiply results, doubles the results, and accumulates the most significant half of the final results with the vector elements of the destination SIMD&FP register. The results are rounded.
//
//go:linkname VqrdmlahsS32 VqrdmlahsS32
//go:noescape
func VqrdmlahsS32(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, v2 *arm.Int32)

// Signed Saturating Rounding Doubling Multiply Subtract returning High Half (vector). This instruction multiplies the vector elements of the first source SIMD&FP register with the corresponding vector elements of the second source SIMD&FP register without saturating the multiply results, doubles the results, and subtracts the most significant half of the final results from the vector elements of the destination SIMD&FP register. The results are rounded.
//
//go:linkname VqrdmlshS16 VqrdmlshS16
//go:noescape
func VqrdmlshS16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4, v2 *arm.Int16X4)

// Signed Saturating Rounding Doubling Multiply Subtract returning High Half (vector). This instruction multiplies the vector elements of the first source SIMD&FP register with the corresponding vector elements of the second source SIMD&FP register without saturating the multiply results, doubles the results, and subtracts the most significant half of the final results from the vector elements of the destination SIMD&FP register. The results are rounded.
//
//go:linkname VqrdmlshS32 VqrdmlshS32
//go:noescape
func VqrdmlshS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2, v2 *arm.Int32X2)

// Signed Saturating Rounding Doubling Multiply Subtract returning High Half (vector). This instruction multiplies the vector elements of the first source SIMD&FP register with the corresponding vector elements of the second source SIMD&FP register without saturating the multiply results, doubles the results, and subtracts the most significant half of the final results from the vector elements of the destination SIMD&FP register. The results are rounded.
//
//go:linkname VqrdmlshhS16 VqrdmlshhS16
//go:noescape
func VqrdmlshhS16(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, v2 *arm.Int16)

// Signed Saturating Rounding Doubling Multiply Subtract returning High Half (vector). This instruction multiplies the vector elements of the first source SIMD&FP register with the corresponding vector elements of the second source SIMD&FP register without saturating the multiply results, doubles the results, and subtracts the most significant half of the final results from the vector elements of the destination SIMD&FP register. The results are rounded.
//
//go:linkname VqrdmlshqS16 VqrdmlshqS16
//go:noescape
func VqrdmlshqS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8, v2 *arm.Int16X8)

// Signed Saturating Rounding Doubling Multiply Subtract returning High Half (vector). This instruction multiplies the vector elements of the first source SIMD&FP register with the corresponding vector elements of the second source SIMD&FP register without saturating the multiply results, doubles the results, and subtracts the most significant half of the final results from the vector elements of the destination SIMD&FP register. The results are rounded.
//
//go:linkname VqrdmlshqS32 VqrdmlshqS32
//go:noescape
func VqrdmlshqS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4, v2 *arm.Int32X4)

// Signed Saturating Rounding Doubling Multiply Subtract returning High Half (vector). This instruction multiplies the vector elements of the first source SIMD&FP register with the corresponding vector elements of the second source SIMD&FP register without saturating the multiply results, doubles the results, and subtracts the most significant half of the final results from the vector elements of the destination SIMD&FP register. The results are rounded.
//
//go:linkname VqrdmlshsS32 VqrdmlshsS32
//go:noescape
func VqrdmlshsS32(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, v2 *arm.Int32)

// Signed saturating Rounding Doubling Multiply returning High half. This instruction multiplies the values of corresponding elements of the two source SIMD&FP registers, doubles the results, places the most significant half of the final results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqrdmulhS16 VqrdmulhS16
//go:noescape
func VqrdmulhS16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4)

// Signed saturating Rounding Doubling Multiply returning High half. This instruction multiplies the values of corresponding elements of the two source SIMD&FP registers, doubles the results, places the most significant half of the final results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqrdmulhS32 VqrdmulhS32
//go:noescape
func VqrdmulhS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2)

// Vector saturating rounding doubling multiply high with scalar
//
//go:linkname VqrdmulhNS16 VqrdmulhNS16
//go:noescape
func VqrdmulhNS16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16)

// Vector saturating rounding doubling multiply high with scalar
//
//go:linkname VqrdmulhNS32 VqrdmulhNS32
//go:noescape
func VqrdmulhNS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32)

// Signed saturating Rounding Doubling Multiply returning High half. This instruction multiplies the values of corresponding elements of the two source SIMD&FP registers, doubles the results, places the most significant half of the final results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqrdmulhhS16 VqrdmulhhS16
//go:noescape
func VqrdmulhhS16(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16)

// Signed saturating Rounding Doubling Multiply returning High half. This instruction multiplies the values of corresponding elements of the two source SIMD&FP registers, doubles the results, places the most significant half of the final results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqrdmulhqS16 VqrdmulhqS16
//go:noescape
func VqrdmulhqS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8)

// Signed saturating Rounding Doubling Multiply returning High half. This instruction multiplies the values of corresponding elements of the two source SIMD&FP registers, doubles the results, places the most significant half of the final results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqrdmulhqS32 VqrdmulhqS32
//go:noescape
func VqrdmulhqS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4)

// Vector saturating rounding doubling multiply high with scalar
//
//go:linkname VqrdmulhqNS16 VqrdmulhqNS16
//go:noescape
func VqrdmulhqNS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16)

// Vector saturating rounding doubling multiply high with scalar
//
//go:linkname VqrdmulhqNS32 VqrdmulhqNS32
//go:noescape
func VqrdmulhqNS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32)

// Signed saturating Rounding Doubling Multiply returning High half. This instruction multiplies the values of corresponding elements of the two source SIMD&FP registers, doubles the results, places the most significant half of the final results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqrdmulhsS32 VqrdmulhsS32
//go:noescape
func VqrdmulhsS32(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32)
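
/*
The Vqrdmulh* bindings above double the product, round, and keep the high
half, which is the usual Q15/Q31 fixed-point multiply. A minimal pure-Go
sketch of one int16 lane follows, assuming the standard rounding constant of
1<<15 for 16-bit elements. satRoundDoublingMulHighS16 is a hypothetical
reference helper, not part of this package.

```go
package main

import (
	"fmt"
	"math"
)

// satRoundDoublingMulHighS16 models one int16 lane: it computes 2*a*b in a
// wider type, adds the rounding constant 1<<15, keeps the top 16 bits, and
// clamps the result to the int16 range.
func satRoundDoublingMulHighS16(a, b int16) int16 {
	p := 2*int64(a)*int64(b) + (1 << 15)
	r := p >> 16
	if r > math.MaxInt16 {
		return math.MaxInt16
	}
	if r < math.MinInt16 {
		return math.MinInt16
	}
	return int16(r)
}

func main() {
	// 0.5 * 0.5 in Q15 is 0.25, that is 16384 * 16384 -> 8192.
	fmt.Println(satRoundDoublingMulHighS16(16384, 16384))
	// -1.0 * -1.0 would be +1.0, which Q15 cannot hold, so it saturates.
	fmt.Println(satRoundDoublingMulHighS16(math.MinInt16, math.MinInt16))
}
```

Saturation is only reached when both inputs are the minimum value; every
other pair fits after the shift.
*/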

// Signed saturating Rounding Shift Left (register). This instruction takes each vector element in the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqrshlS8 VqrshlS8
//go:noescape
func VqrshlS8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8)

// Signed saturating Rounding Shift Left (register). This instruction takes each vector element in the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqrshlS16 VqrshlS16
//go:noescape
func VqrshlS16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4)

// Signed saturating Rounding Shift Left (register). This instruction takes each vector element in the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqrshlS32 VqrshlS32
//go:noescape
func VqrshlS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2)

// Signed saturating Rounding Shift Left (register). This instruction takes each vector element in the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqrshlS64 VqrshlS64
//go:noescape
func VqrshlS64(r *arm.Int64X1, v0 *arm.Int64X1, v1 *arm.Int64X1)

// Unsigned saturating Rounding Shift Left (register). This instruction takes each vector element of the first source SIMD&FP register, shifts the vector element by a value from the least significant byte of the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqrshlU8 VqrshlU8
//go:noescape
func VqrshlU8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Int8X8)

// Unsigned saturating Rounding Shift Left (register). This instruction takes each vector element of the first source SIMD&FP register, shifts the vector element by a value from the least significant byte of the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqrshlU16 VqrshlU16
//go:noescape
func VqrshlU16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Int16X4)

// Unsigned saturating Rounding Shift Left (register). This instruction takes each vector element of the first source SIMD&FP register, shifts the vector element by a value from the least significant byte of the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqrshlU32 VqrshlU32
//go:noescape
func VqrshlU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Int32X2)

// Unsigned saturating Rounding Shift Left (register). This instruction takes each vector element of the first source SIMD&FP register, shifts the vector element by a value from the least significant byte of the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqrshlU64 VqrshlU64
//go:noescape
func VqrshlU64(r *arm.Uint64X1, v0 *arm.Uint64X1, v1 *arm.Int64X1)

// Signed saturating Rounding Shift Left (register). This instruction takes each vector element in the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqrshlbS8 VqrshlbS8
//go:noescape
func VqrshlbS8(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8)

// Unsigned saturating Rounding Shift Left (register). This instruction takes each vector element of the first source SIMD&FP register, shifts the vector element by a value from the least significant byte of the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqrshlbU8 VqrshlbU8
//go:noescape
func VqrshlbU8(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Int8)

// Signed saturating Rounding Shift Left (register). This instruction takes each vector element in the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqrshldS64 VqrshldS64
//go:noescape
func VqrshldS64(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64)

// Unsigned saturating Rounding Shift Left (register). This instruction takes each vector element of the first source SIMD&FP register, shifts the vector element by a value from the least significant byte of the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqrshldU64 VqrshldU64
//go:noescape
func VqrshldU64(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Int64)

// Signed saturating Rounding Shift Left (register). This instruction takes each vector element in the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqrshlhS16 VqrshlhS16
//go:noescape
func VqrshlhS16(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16)

// Unsigned saturating Rounding Shift Left (register). This instruction takes each vector element of the first source SIMD&FP register, shifts the vector element by a value from the least significant byte of the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqrshlhU16 VqrshlhU16
//go:noescape
func VqrshlhU16(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Int16)

// Signed saturating Rounding Shift Left (register). This instruction takes each vector element in the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqrshlqS8 VqrshlqS8
//go:noescape
func VqrshlqS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16)

// Signed saturating Rounding Shift Left (register). This instruction takes each vector element in the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqrshlqS16 VqrshlqS16
//go:noescape
func VqrshlqS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8)

// Signed saturating Rounding Shift Left (register). This instruction takes each vector element in the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqrshlqS32 VqrshlqS32
//go:noescape
func VqrshlqS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4)

// Signed saturating Rounding Shift Left (register). This instruction takes each vector element in the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqrshlqS64 VqrshlqS64
//go:noescape
func VqrshlqS64(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int64X2)

// Unsigned saturating Rounding Shift Left (register). This instruction takes each vector element of the first source SIMD&FP register, shifts the vector element by a value from the least significant byte of the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqrshlqU8 VqrshlqU8
//go:noescape
func VqrshlqU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Int8X16)

// Unsigned saturating Rounding Shift Left (register). This instruction takes each vector element of the first source SIMD&FP register, shifts the vector element by a value from the least significant byte of the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqrshlqU16 VqrshlqU16
//go:noescape
func VqrshlqU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Int16X8)

// Unsigned saturating Rounding Shift Left (register). This instruction takes each vector element of the first source SIMD&FP register, shifts the vector element by a value from the least significant byte of the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqrshlqU32 VqrshlqU32
//go:noescape
func VqrshlqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Int32X4)

// Unsigned saturating Rounding Shift Left (register). This instruction takes each vector element of the first source SIMD&FP register, shifts the vector element by a value from the least significant byte of the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqrshlqU64 VqrshlqU64
//go:noescape
func VqrshlqU64(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Int64X2)

// Signed saturating Rounding Shift Left (register). This instruction takes each vector element in the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqrshlsS32 VqrshlsS32
//go:noescape
func VqrshlsS32(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32)

// Unsigned saturating Rounding Shift Left (register). This instruction takes each vector element of the first source SIMD&FP register, shifts the vector element by a value from the least significant byte of the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqrshlsU32 VqrshlsU32
//go:noescape
func VqrshlsU32(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Int32)

// Signed saturating Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts each element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqshlS8 VqshlS8
//go:noescape
func VqshlS8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8)

// Signed saturating Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts each element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqshlS16 VqshlS16
//go:noescape
func VqshlS16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4)

// Signed saturating Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts each element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqshlS32 VqshlS32
//go:noescape
func VqshlS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2)

// Signed saturating Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts each element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqshlS64 VqshlS64
//go:noescape
func VqshlS64(r *arm.Int64X1, v0 *arm.Int64X1, v1 *arm.Int64X1)

// Unsigned saturating Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts the element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqshlU8 VqshlU8
//go:noescape
func VqshlU8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Int8X8)

// Unsigned saturating Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts the element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqshlU16 VqshlU16
//go:noescape
func VqshlU16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Int16X4)

// Unsigned saturating Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts the element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqshlU32 VqshlU32
//go:noescape
func VqshlU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Int32X2)

// Unsigned saturating Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts the element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqshlU64 VqshlU64
//go:noescape
func VqshlU64(r *arm.Uint64X1, v0 *arm.Uint64X1, v1 *arm.Int64X1)

// Signed saturating Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts each element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqshlbS8 VqshlbS8
//go:noescape
func VqshlbS8(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8)

// Unsigned saturating Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts the element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqshlbU8 VqshlbU8
//go:noescape
func VqshlbU8(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Int8)

// Signed saturating Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts each element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqshldS64 VqshldS64
//go:noescape
func VqshldS64(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64)

// Unsigned saturating Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts the element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqshldU64 VqshldU64
//go:noescape
func VqshldU64(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Int64)

// Signed saturating Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts each element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqshlhS16 VqshlhS16
//go:noescape
func VqshlhS16(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16)

// Unsigned saturating Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts the element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqshlhU16 VqshlhU16
//go:noescape
func VqshlhU16(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Int16)

// Signed saturating Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts each element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqshlqS8 VqshlqS8
//go:noescape
func VqshlqS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16)

// Signed saturating Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts each element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqshlqS16 VqshlqS16
//go:noescape
func VqshlqS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8)

// Signed saturating Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts each element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqshlqS32 VqshlqS32
//go:noescape
func VqshlqS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4)

// Signed saturating Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts each element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqshlqS64 VqshlqS64
//go:noescape
func VqshlqS64(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int64X2)

// Unsigned saturating Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts the element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqshlqU8 VqshlqU8
//go:noescape
func VqshlqU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Int8X16)

// Unsigned saturating Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts the element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqshlqU16 VqshlqU16
//go:noescape
func VqshlqU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Int16X8)

// Unsigned saturating Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts the element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqshlqU32 VqshlqU32
//go:noescape
func VqshlqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Int32X4)

// Unsigned saturating Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts the element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqshlqU64 VqshlqU64
//go:noescape
func VqshlqU64(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Int64X2)

// Signed saturating Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts each element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqshlsS32 VqshlsS32
//go:noescape
func VqshlsS32(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32)

// Unsigned saturating Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts the element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqshlsU32 VqshlsU32
//go:noescape
func VqshlsU32(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Int32)

// Signed saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
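//
// A scalar sketch of the per-lane result for 8-bit elements (sqsub8 is a hypothetical helper for illustration only, not part of this package):
//
//	func sqsub8(a, b int8) int8 {
//		d := int(a) - int(b) // widen so the difference cannot wrap
//		if d > 127 {
//			return 127
//		}
//		if d < -128 {
//			return -128
//		}
//		return int8(d)
//	}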
//
//go:linkname VqsubS8 VqsubS8
//go:noescape
func VqsubS8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8)

// Signed saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqsubS16 VqsubS16
//go:noescape
func VqsubS16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4)

// Signed saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqsubS32 VqsubS32
//go:noescape
func VqsubS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2)

// Signed saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqsubS64 VqsubS64
//go:noescape
func VqsubS64(r *arm.Int64X1, v0 *arm.Int64X1, v1 *arm.Int64X1)

// Unsigned saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
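//
// A scalar sketch of the per-lane result for 8-bit elements (uqsub8 is a hypothetical helper for illustration only, not part of this package):
//
//	func uqsub8(a, b uint8) uint8 {
//		if a < b {
//			return 0 // the difference would be negative, so saturate at zero
//		}
//		return a - b
//	}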
//
//go:linkname VqsubU8 VqsubU8
//go:noescape
func VqsubU8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8)

// Unsigned saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqsubU16 VqsubU16
//go:noescape
func VqsubU16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4)

// Unsigned saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqsubU32 VqsubU32
//go:noescape
func VqsubU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2)

// Unsigned saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqsubU64 VqsubU64
//go:noescape
func VqsubU64(r *arm.Uint64X1, v0 *arm.Uint64X1, v1 *arm.Uint64X1)

// Signed saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqsubbS8 VqsubbS8
//go:noescape
func VqsubbS8(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8)

// Unsigned saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqsubbU8 VqsubbU8
//go:noescape
func VqsubbU8(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8)

// Signed saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqsubdS64 VqsubdS64
//go:noescape
func VqsubdS64(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64)

// Unsigned saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqsubdU64 VqsubdU64
//go:noescape
func VqsubdU64(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64)

// Signed saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqsubhS16 VqsubhS16
//go:noescape
func VqsubhS16(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16)

// Unsigned saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqsubhU16 VqsubhU16
//go:noescape
func VqsubhU16(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16)

// Signed saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqsubqS8 VqsubqS8
//go:noescape
func VqsubqS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16)

// Signed saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqsubqS16 VqsubqS16
//go:noescape
func VqsubqS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8)

// Signed saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqsubqS32 VqsubqS32
//go:noescape
func VqsubqS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4)

// Signed saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqsubqS64 VqsubqS64
//go:noescape
func VqsubqS64(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int64X2)

// Unsigned saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqsubqU8 VqsubqU8
//go:noescape
func VqsubqU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16)

// Unsigned saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqsubqU16 VqsubqU16
//go:noescape
func VqsubqU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8)

// Unsigned saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqsubqU32 VqsubqU32
//go:noescape
func VqsubqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4)

// Unsigned saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqsubqU64 VqsubqU64
//go:noescape
func VqsubqU64(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint64X2)

// Signed saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqsubsS32 VqsubsS32
//go:noescape
func VqsubsS32(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32)

// Unsigned saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqsubsU32 VqsubsU32
//go:noescape
func VqsubsU32(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32)

// Table vector Lookup. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the result for that lookup is 0. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
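//
// A scalar sketch of the lookup rule described above (tbl is a hypothetical helper for illustration only, not part of this package). Going by the signature, v0 appears to hold the table bytes and v1 the indices, but that mapping is an assumption:
//
//	func tbl(table, idx []byte) []byte {
//		out := make([]byte, len(idx)) // lanes start at 0
//		for i, j := range idx {
//			if int(j) < len(table) {
//				out[i] = table[j]
//			} // an out-of-range index leaves the lane at 0
//		}
//		return out
//	}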
//
//go:linkname Vqtbl1S8 Vqtbl1S8
//go:noescape
func Vqtbl1S8(r *arm.Int8X8, v0 *arm.Int8X16, v1 *arm.Uint8X8)

// Table vector Lookup. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the result for that lookup is 0. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vqtbl1U8 Vqtbl1U8
//go:noescape
func Vqtbl1U8(r *arm.Uint8X8, v0 *arm.Uint8X16, v1 *arm.Uint8X8)

// Table vector Lookup. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the result for that lookup is 0. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vqtbl1P8 Vqtbl1P8
//go:noescape
func Vqtbl1P8(r *arm.Poly8X8, v0 *arm.Poly8X16, v1 *arm.Uint8X8)

// Table vector Lookup. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the result for that lookup is 0. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vqtbl1QS8 Vqtbl1QS8
//go:noescape
func Vqtbl1QS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Uint8X16)

// Table vector Lookup. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the result for that lookup is 0. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vqtbl1QU8 Vqtbl1QU8
//go:noescape
func Vqtbl1QU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16)

// Table vector Lookup. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the result for that lookup is 0. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vqtbl1QP8 Vqtbl1QP8
//go:noescape
func Vqtbl1QP8(r *arm.Poly8X16, v0 *arm.Poly8X16, v1 *arm.Uint8X16)

// Table vector Lookup. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the result for that lookup is 0. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vqtbl2S8 Vqtbl2S8
//go:noescape
func Vqtbl2S8(r *arm.Int8X8, v0 *arm.Int8X16X2, v1 *arm.Uint8X8)

// Table vector Lookup. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the result for that lookup is 0. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vqtbl2U8 Vqtbl2U8
//go:noescape
func Vqtbl2U8(r *arm.Uint8X8, v0 *arm.Uint8X16X2, v1 *arm.Uint8X8)

// Table vector Lookup. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the result for that lookup is 0. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vqtbl2P8 Vqtbl2P8
//go:noescape
func Vqtbl2P8(r *arm.Poly8X8, v0 *arm.Poly8X16X2, v1 *arm.Uint8X8)

// Table vector Lookup. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the result for that lookup is 0. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vqtbl2QS8 Vqtbl2QS8
//go:noescape
func Vqtbl2QS8(r *arm.Int8X16, v0 *arm.Int8X16X2, v1 *arm.Uint8X16)

// Table vector Lookup. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the result for that lookup is 0. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vqtbl2QU8 Vqtbl2QU8
//go:noescape
func Vqtbl2QU8(r *arm.Uint8X16, v0 *arm.Uint8X16X2, v1 *arm.Uint8X16)

// Table vector Lookup. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the result for that lookup is 0. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vqtbl2QP8 Vqtbl2QP8
//go:noescape
func Vqtbl2QP8(r *arm.Poly8X16, v0 *arm.Poly8X16X2, v1 *arm.Uint8X16)

// Table vector Lookup. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the result for that lookup is 0. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vqtbl3S8 Vqtbl3S8
//go:noescape
func Vqtbl3S8(r *arm.Int8X8, v0 *arm.Int8X16X3, v1 *arm.Uint8X8)

// Table vector Lookup. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the result for that lookup is 0. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vqtbl3U8 Vqtbl3U8
//go:noescape
func Vqtbl3U8(r *arm.Uint8X8, v0 *arm.Uint8X16X3, v1 *arm.Uint8X8)

// Table vector Lookup. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the result for that lookup is 0. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vqtbl3P8 Vqtbl3P8
//go:noescape
func Vqtbl3P8(r *arm.Poly8X8, v0 *arm.Poly8X16X3, v1 *arm.Uint8X8)

// Table vector Lookup. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the result for that lookup is 0. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vqtbl3QS8 Vqtbl3QS8
//go:noescape
func Vqtbl3QS8(r *arm.Int8X16, v0 *arm.Int8X16X3, v1 *arm.Uint8X16)

// Table vector Lookup. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the result for that lookup is 0. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vqtbl3QU8 Vqtbl3QU8
//go:noescape
func Vqtbl3QU8(r *arm.Uint8X16, v0 *arm.Uint8X16X3, v1 *arm.Uint8X16)

// Table vector Lookup. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the result for that lookup is 0. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vqtbl3QP8 Vqtbl3QP8
//go:noescape
func Vqtbl3QP8(r *arm.Poly8X16, v0 *arm.Poly8X16X3, v1 *arm.Uint8X16)

// Table vector Lookup. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the result for that lookup is 0. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vqtbl4S8 Vqtbl4S8
//go:noescape
func Vqtbl4S8(r *arm.Int8X8, v0 *arm.Int8X16X4, v1 *arm.Uint8X8)

// Table vector Lookup. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the result for that lookup is 0. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vqtbl4U8 Vqtbl4U8
//go:noescape
func Vqtbl4U8(r *arm.Uint8X8, v0 *arm.Uint8X16X4, v1 *arm.Uint8X8)

// Table vector Lookup. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the result for that lookup is 0. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vqtbl4P8 Vqtbl4P8
//go:noescape
func Vqtbl4P8(r *arm.Poly8X8, v0 *arm.Poly8X16X4, v1 *arm.Uint8X8)

// Table vector Lookup. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the result for that lookup is 0. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vqtbl4QS8 Vqtbl4QS8
//go:noescape
func Vqtbl4QS8(r *arm.Int8X16, v0 *arm.Int8X16X4, v1 *arm.Uint8X16)

// Table vector Lookup. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the result for that lookup is 0. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vqtbl4QU8 Vqtbl4QU8
//go:noescape
func Vqtbl4QU8(r *arm.Uint8X16, v0 *arm.Uint8X16X4, v1 *arm.Uint8X16)

// Table vector Lookup. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the result for that lookup is 0. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vqtbl4QP8 Vqtbl4QP8
//go:noescape
func Vqtbl4QP8(r *arm.Poly8X16, v0 *arm.Poly8X16X4, v1 *arm.Uint8X16)
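
// refTbl4 is a hypothetical pure-Go reference model of the table lookup
// semantics described above. It is not part of the generated bindings and, as
// an assumption for illustration, works on plain arrays rather than the arm
// vector types. The four table registers are treated as one 64-byte table in
// which the first register supplies the lowest bytes, and any index of 64 or
// more produces 0.
func refTbl4(table *[4][16]uint8, idx *[16]uint8) (r [16]uint8) {
	for i, ix := range idx {
		if int(ix) < 64 {
			// Register ix/16 holds byte ix%16 of the combined table.
			r[i] = table[ix/16][ix%16]
		}
		// Out-of-range indices leave r[i] at its zero value.
	}
	return r
}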

// Table vector lookup extension. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the existing value in the vector element of the destination register is left unchanged. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vqtbx1S8 Vqtbx1S8
//go:noescape
func Vqtbx1S8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X16, v2 *arm.Uint8X8)

// Table vector lookup extension. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the existing value in the vector element of the destination register is left unchanged. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vqtbx1U8 Vqtbx1U8
//go:noescape
func Vqtbx1U8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X16, v2 *arm.Uint8X8)

// Table vector lookup extension. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the existing value in the vector element of the destination register is left unchanged. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vqtbx1P8 Vqtbx1P8
//go:noescape
func Vqtbx1P8(r *arm.Poly8X8, v0 *arm.Poly8X8, v1 *arm.Poly8X16, v2 *arm.Uint8X8)

// Table vector lookup extension. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the existing value in the vector element of the destination register is left unchanged. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vqtbx1QS8 Vqtbx1QS8
//go:noescape
func Vqtbx1QS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16, v2 *arm.Uint8X16)

// Table vector lookup extension. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the existing value in the vector element of the destination register is left unchanged. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vqtbx1QU8 Vqtbx1QU8
//go:noescape
func Vqtbx1QU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16, v2 *arm.Uint8X16)

// Table vector lookup extension. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the existing value in the vector element of the destination register is left unchanged. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vqtbx1QP8 Vqtbx1QP8
//go:noescape
func Vqtbx1QP8(r *arm.Poly8X16, v0 *arm.Poly8X16, v1 *arm.Poly8X16, v2 *arm.Uint8X16)

// Table vector lookup extension. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the existing value in the vector element of the destination register is left unchanged. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vqtbx2S8 Vqtbx2S8
//go:noescape
func Vqtbx2S8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X16X2, v2 *arm.Uint8X8)

// Table vector lookup extension. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the existing value in the vector element of the destination register is left unchanged. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vqtbx2U8 Vqtbx2U8
//go:noescape
func Vqtbx2U8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X16X2, v2 *arm.Uint8X8)

// Table vector lookup extension. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the existing value in the vector element of the destination register is left unchanged. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vqtbx2P8 Vqtbx2P8
//go:noescape
func Vqtbx2P8(r *arm.Poly8X8, v0 *arm.Poly8X8, v1 *arm.Poly8X16X2, v2 *arm.Uint8X8)

// Table vector lookup extension. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the existing value in the vector element of the destination register is left unchanged. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vqtbx2QS8 Vqtbx2QS8
//go:noescape
func Vqtbx2QS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16X2, v2 *arm.Uint8X16)

// Table vector lookup extension. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the existing value in the vector element of the destination register is left unchanged. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vqtbx2QU8 Vqtbx2QU8
//go:noescape
func Vqtbx2QU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16X2, v2 *arm.Uint8X16)

// Table vector lookup extension. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the existing value in the vector element of the destination register is left unchanged. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vqtbx2QP8 Vqtbx2QP8
//go:noescape
func Vqtbx2QP8(r *arm.Poly8X16, v0 *arm.Poly8X16, v1 *arm.Poly8X16X2, v2 *arm.Uint8X16)

// Table vector lookup extension. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the existing value in the vector element of the destination register is left unchanged. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vqtbx3S8 Vqtbx3S8
//go:noescape
func Vqtbx3S8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X16X3, v2 *arm.Uint8X8)

// Table vector lookup extension. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the existing value in the vector element of the destination register is left unchanged. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vqtbx3U8 Vqtbx3U8
//go:noescape
func Vqtbx3U8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X16X3, v2 *arm.Uint8X8)

// Table vector lookup extension. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the existing value in the vector element of the destination register is left unchanged. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vqtbx3P8 Vqtbx3P8
//go:noescape
func Vqtbx3P8(r *arm.Poly8X8, v0 *arm.Poly8X8, v1 *arm.Poly8X16X3, v2 *arm.Uint8X8)

// Table vector lookup extension. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the existing value in the vector element of the destination register is left unchanged. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vqtbx3QS8 Vqtbx3QS8
//go:noescape
func Vqtbx3QS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16X3, v2 *arm.Uint8X16)

// Table vector lookup extension. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the existing value in the vector element of the destination register is left unchanged. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vqtbx3QU8 Vqtbx3QU8
//go:noescape
func Vqtbx3QU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16X3, v2 *arm.Uint8X16)

// Table vector lookup extension. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the existing value in the vector element of the destination register is left unchanged. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vqtbx3QP8 Vqtbx3QP8
//go:noescape
func Vqtbx3QP8(r *arm.Poly8X16, v0 *arm.Poly8X16, v1 *arm.Poly8X16X3, v2 *arm.Uint8X16)

// Table vector lookup extension. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the existing value in the vector element of the destination register is left unchanged. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vqtbx4S8 Vqtbx4S8
//go:noescape
func Vqtbx4S8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X16X4, v2 *arm.Uint8X8)

// Table vector lookup extension. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the existing value in the vector element of the destination register is left unchanged. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vqtbx4U8 Vqtbx4U8
//go:noescape
func Vqtbx4U8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X16X4, v2 *arm.Uint8X8)

// Table vector lookup extension. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the existing value in the vector element of the destination register is left unchanged. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vqtbx4P8 Vqtbx4P8
//go:noescape
func Vqtbx4P8(r *arm.Poly8X8, v0 *arm.Poly8X8, v1 *arm.Poly8X16X4, v2 *arm.Uint8X8)

// Table vector lookup extension. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the existing value in the vector element of the destination register is left unchanged. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vqtbx4QS8 Vqtbx4QS8
//go:noescape
func Vqtbx4QS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16X4, v2 *arm.Uint8X16)

// Table vector lookup extension. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the existing value in the vector element of the destination register is left unchanged. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vqtbx4QU8 Vqtbx4QU8
//go:noescape
func Vqtbx4QU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16X4, v2 *arm.Uint8X16)

// Table vector lookup extension. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the existing value in the vector element of the destination register is left unchanged. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vqtbx4QP8 Vqtbx4QP8
//go:noescape
func Vqtbx4QP8(r *arm.Poly8X16, v0 *arm.Poly8X16, v1 *arm.Poly8X16X4, v2 *arm.Uint8X16)
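
// refTbx1 is a hypothetical pure-Go reference model of the table lookup
// extension semantics described above, for a single 16-byte table register.
// It is a sketch on plain arrays, not part of the generated bindings. The
// fallback argument plays the role of the existing destination contents
// (assumed here to correspond to v0 in the bindings above): lanes whose index
// is out of range keep the fallback value instead of becoming 0.
func refTbx1(fallback, table, idx *[16]uint8) (r [16]uint8) {
	r = *fallback
	for i, ix := range idx {
		if int(ix) < len(table) {
			r[i] = table[ix]
		}
	}
	return r
}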

// Rounding Add returning High Narrow. This instruction adds each vector element in the first source SIMD&FP register to the corresponding vector element in the second source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register.
//
//go:linkname VraddhnS16 VraddhnS16
//go:noescape
func VraddhnS16(r *arm.Int8X8, v0 *arm.Int16X8, v1 *arm.Int16X8)

// Rounding Add returning High Narrow. This instruction adds each vector element in the first source SIMD&FP register to the corresponding vector element in the second source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register.
//
//go:linkname VraddhnS32 VraddhnS32
//go:noescape
func VraddhnS32(r *arm.Int16X4, v0 *arm.Int32X4, v1 *arm.Int32X4)

// Rounding Add returning High Narrow. This instruction adds each vector element in the first source SIMD&FP register to the corresponding vector element in the second source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register.
//
//go:linkname VraddhnS64 VraddhnS64
//go:noescape
func VraddhnS64(r *arm.Int32X2, v0 *arm.Int64X2, v1 *arm.Int64X2)

// Rounding Add returning High Narrow. This instruction adds each vector element in the first source SIMD&FP register to the corresponding vector element in the second source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register.
//
//go:linkname VraddhnU16 VraddhnU16
//go:noescape
func VraddhnU16(r *arm.Uint8X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8)

// Rounding Add returning High Narrow. This instruction adds each vector element in the first source SIMD&FP register to the corresponding vector element in the second source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register.
//
//go:linkname VraddhnU32 VraddhnU32
//go:noescape
func VraddhnU32(r *arm.Uint16X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4)

// Rounding Add returning High Narrow. This instruction adds each vector element in the first source SIMD&FP register to the corresponding vector element in the second source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register.
//
//go:linkname VraddhnU64 VraddhnU64
//go:noescape
func VraddhnU64(r *arm.Uint32X2, v0 *arm.Uint64X2, v1 *arm.Uint64X2)

// Rounding Add returning High Narrow. This instruction adds each vector element in the first source SIMD&FP register to the corresponding vector element in the second source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register.
//
//go:linkname VraddhnHighS16 VraddhnHighS16
//go:noescape
func VraddhnHighS16(r *arm.Int8X16, v0 *arm.Int8X8, v1 *arm.Int16X8, v2 *arm.Int16X8)

// Rounding Add returning High Narrow. This instruction adds each vector element in the first source SIMD&FP register to the corresponding vector element in the second source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register.
//
//go:linkname VraddhnHighS32 VraddhnHighS32
//go:noescape
func VraddhnHighS32(r *arm.Int16X8, v0 *arm.Int16X4, v1 *arm.Int32X4, v2 *arm.Int32X4)

// Rounding Add returning High Narrow. This instruction adds each vector element in the first source SIMD&FP register to the corresponding vector element in the second source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register.
//
//go:linkname VraddhnHighS64 VraddhnHighS64
//go:noescape
func VraddhnHighS64(r *arm.Int32X4, v0 *arm.Int32X2, v1 *arm.Int64X2, v2 *arm.Int64X2)

// Rounding Add returning High Narrow. This instruction adds each vector element in the first source SIMD&FP register to the corresponding vector element in the second source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register.
//
//go:linkname VraddhnHighU16 VraddhnHighU16
//go:noescape
func VraddhnHighU16(r *arm.Uint8X16, v0 *arm.Uint8X8, v1 *arm.Uint16X8, v2 *arm.Uint16X8)

// Rounding Add returning High Narrow. This instruction adds each vector element in the first source SIMD&FP register to the corresponding vector element in the second source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register.
//
//go:linkname VraddhnHighU32 VraddhnHighU32
//go:noescape
func VraddhnHighU32(r *arm.Uint16X8, v0 *arm.Uint16X4, v1 *arm.Uint32X4, v2 *arm.Uint32X4)

// Rounding Add returning High Narrow. This instruction adds each vector element in the first source SIMD&FP register to the corresponding vector element in the second source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register.
//
//go:linkname VraddhnHighU64 VraddhnHighU64
//go:noescape
func VraddhnHighU64(r *arm.Uint32X4, v0 *arm.Uint32X2, v1 *arm.Uint64X2, v2 *arm.Uint64X2)
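
// refRaddhnU16 is a hypothetical pure-Go reference model of the rounding add
// returning high narrow operation described above, for unsigned 16-bit lanes
// narrowed to 8 bits. It is a sketch on plain arrays, not part of the
// generated bindings. The rounding is modeled by adding half of the discarded
// low part (0x80) before taking the most significant half of the 16-bit sum;
// the sum wraps modulo 1<<16, as Go's uint16 arithmetic does.
func refRaddhnU16(a, b *[8]uint16) (r [8]uint8) {
	for i := range a {
		sum := a[i] + b[i] + 0x80
		r[i] = uint8(sum >> 8)
	}
	return r
}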

// Rotate and Exclusive OR rotates each 64-bit element of the 128-bit vector in a source SIMD&FP register left by 1, performs a bitwise exclusive OR of the resulting 128-bit vector and the vector in another source SIMD&FP register, and writes the result to the destination SIMD&FP register.
//
//go:linkname Vrax1QU64 Vrax1QU64
//go:noescape
func Vrax1QU64(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint64X2)
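
// refRax1 is a hypothetical pure-Go sketch of the Rotate and Exclusive OR
// operation documented above, written on plain arrays instead of the arm
// vector types. It assumes the architectural operand order, in which the
// second source is the one rotated left by 1 before the exclusive OR with the
// first source.
func refRax1(a, b *[2]uint64) (r [2]uint64) {
	for i := range a {
		r[i] = a[i] ^ (b[i]<<1 | b[i]>>63)
	}
	return r
}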

// Reverse Bit order (vector). This instruction reads each vector element from the source SIMD&FP register, reverses the bits of the element, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VrbitS8 VrbitS8
//go:noescape
func VrbitS8(r *arm.Int8X8, v0 *arm.Int8X8)

// Reverse Bit order (vector). This instruction reads each vector element from the source SIMD&FP register, reverses the bits of the element, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VrbitU8 VrbitU8
//go:noescape
func VrbitU8(r *arm.Uint8X8, v0 *arm.Uint8X8)

// Reverse Bit order (vector). This instruction reads each vector element from the source SIMD&FP register, reverses the bits of the element, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VrbitP8 VrbitP8
//go:noescape
func VrbitP8(r *arm.Poly8X8, v0 *arm.Poly8X8)

// Reverse Bit order (vector). This instruction reads each vector element from the source SIMD&FP register, reverses the bits of the element, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VrbitqS8 VrbitqS8
//go:noescape
func VrbitqS8(r *arm.Int8X16, v0 *arm.Int8X16)

// Reverse Bit order (vector). This instruction reads each vector element from the source SIMD&FP register, reverses the bits of the element, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VrbitqU8 VrbitqU8
//go:noescape
func VrbitqU8(r *arm.Uint8X16, v0 *arm.Uint8X16)

// Reverse Bit order (vector). This instruction reads each vector element from the source SIMD&FP register, reverses the bits of the element, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VrbitqP8 VrbitqP8
//go:noescape
func VrbitqP8(r *arm.Poly8X16, v0 *arm.Poly8X16)
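
// refRbit8 is a hypothetical pure-Go reference model of what the reverse bit
// order operation documented above does to one 8-bit lane. It is a sketch for
// checking results, not part of the generated bindings: bit 0 of the input
// becomes bit 7 of the result, bit 1 becomes bit 6, and so on.
func refRbit8(b uint8) (r uint8) {
	for i := 0; i < 8; i++ {
		r = r<<1 | b&1 // shift the lowest remaining input bit into the result
		b >>= 1
	}
	return r
}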

// Unsigned Reciprocal Estimate. This instruction reads each vector element from the source SIMD&FP register, calculates an approximate inverse for the unsigned integer value, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VrecpeU32 VrecpeU32
//go:noescape
func VrecpeU32(r *arm.Uint32X2, v0 *arm.Uint32X2)

// Floating-point Reciprocal Estimate. This instruction finds an approximate reciprocal estimate for each vector element in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VrecpeF32 VrecpeF32
//go:noescape
func VrecpeF32(r *arm.Float32X2, v0 *arm.Float32X2)

// Floating-point Reciprocal Estimate. This instruction finds an approximate reciprocal estimate for each vector element in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VrecpeF64 VrecpeF64
//go:noescape
func VrecpeF64(r *arm.Float64X1, v0 *arm.Float64X1)

// Floating-point Reciprocal Estimate. This instruction finds an approximate reciprocal estimate for each vector element in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VrecpedF64 VrecpedF64
//go:noescape
func VrecpedF64(r *arm.Float64, v0 *arm.Float64)

// Unsigned Reciprocal Estimate. This instruction reads each vector element from the source SIMD&FP register, calculates an approximate inverse for the unsigned integer value, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VrecpeqU32 VrecpeqU32
//go:noescape
func VrecpeqU32(r *arm.Uint32X4, v0 *arm.Uint32X4)

// Floating-point Reciprocal Estimate. This instruction finds an approximate reciprocal estimate for each vector element in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VrecpeqF32 VrecpeqF32
//go:noescape
func VrecpeqF32(r *arm.Float32X4, v0 *arm.Float32X4)

// Floating-point Reciprocal Estimate. This instruction finds an approximate reciprocal estimate for each vector element in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VrecpeqF64 VrecpeqF64
//go:noescape
func VrecpeqF64(r *arm.Float64X2, v0 *arm.Float64X2)

// Floating-point Reciprocal Estimate. This instruction finds an approximate reciprocal estimate for each vector element in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VrecpesF32 VrecpesF32
//go:noescape
func VrecpesF32(r *arm.Float32, v0 *arm.Float32)

// Floating-point Reciprocal Step. This instruction multiplies the corresponding floating-point values in the vectors of the two source SIMD&FP registers, subtracts each of the products from 2.0, places the resulting floating-point values in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VrecpsF32 VrecpsF32
//go:noescape
func VrecpsF32(r *arm.Float32X2, v0 *arm.Float32X2, v1 *arm.Float32X2)

// Floating-point Reciprocal Step. This instruction multiplies the corresponding floating-point values in the vectors of the two source SIMD&FP registers, subtracts each of the products from 2.0, places the resulting floating-point values in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VrecpsF64 VrecpsF64
//go:noescape
func VrecpsF64(r *arm.Float64X1, v0 *arm.Float64X1, v1 *arm.Float64X1)

// Floating-point Reciprocal Step. This instruction multiplies the corresponding floating-point values in the vectors of the two source SIMD&FP registers, subtracts each of the products from 2.0, places the resulting floating-point values in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VrecpsdF64 VrecpsdF64
//go:noescape
func VrecpsdF64(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64)

// Floating-point Reciprocal Step. This instruction multiplies the corresponding floating-point values in the vectors of the two source SIMD&FP registers, subtracts each of the products from 2.0, places the resulting floating-point values in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VrecpsqF32 VrecpsqF32
//go:noescape
func VrecpsqF32(r *arm.Float32X4, v0 *arm.Float32X4, v1 *arm.Float32X4)

// Floating-point Reciprocal Step. This instruction multiplies the corresponding floating-point values in the vectors of the two source SIMD&FP registers, subtracts each of the products from 2.0, places the resulting floating-point values in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VrecpsqF64 VrecpsqF64
//go:noescape
func VrecpsqF64(r *arm.Float64X2, v0 *arm.Float64X2, v1 *arm.Float64X2)

// Floating-point Reciprocal Step (scalar). This instruction multiplies the two source floating-point values, subtracts the product from 2.0, and writes the result to the destination SIMD&FP register.
// In terms of the arguments, r receives 2.0 minus the product of v0 and v1.
//
//go:linkname VrecpssF32 VrecpssF32
//go:noescape
func VrecpssF32(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32)

// Floating-point Reciprocal exponent (scalar). This instruction finds an approximate reciprocal exponent for the floating-point value in the source SIMD&FP register (v0) and writes the result to the destination SIMD&FP register (r).
//
//go:linkname VrecpxdF64 VrecpxdF64
//go:noescape
func VrecpxdF64(r *arm.Float64, v0 *arm.Float64)

// Floating-point Reciprocal exponent (scalar). This instruction finds an approximate reciprocal exponent for the floating-point value in the source SIMD&FP register (v0) and writes the result to the destination SIMD&FP register (r).
//
//go:linkname VrecpxsF32 VrecpxsF32
//go:noescape
func VrecpxsF32(r *arm.Float32, v0 *arm.Float32)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretF32S8 VreinterpretF32S8
//go:noescape
func VreinterpretF32S8(r *arm.Float32X2, v0 *arm.Int8X8)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretF32S16 VreinterpretF32S16
//go:noescape
func VreinterpretF32S16(r *arm.Float32X2, v0 *arm.Int16X4)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretF32S32 VreinterpretF32S32
//go:noescape
func VreinterpretF32S32(r *arm.Float32X2, v0 *arm.Int32X2)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretF32S64 VreinterpretF32S64
//go:noescape
func VreinterpretF32S64(r *arm.Float32X2, v0 *arm.Int64X1)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretF32U8 VreinterpretF32U8
//go:noescape
func VreinterpretF32U8(r *arm.Float32X2, v0 *arm.Uint8X8)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretF32U16 VreinterpretF32U16
//go:noescape
func VreinterpretF32U16(r *arm.Float32X2, v0 *arm.Uint16X4)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretF32U32 VreinterpretF32U32
//go:noescape
func VreinterpretF32U32(r *arm.Float32X2, v0 *arm.Uint32X2)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretF32U64 VreinterpretF32U64
//go:noescape
func VreinterpretF32U64(r *arm.Float32X2, v0 *arm.Uint64X1)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretF32F64 VreinterpretF32F64
//go:noescape
func VreinterpretF32F64(r *arm.Float32X2, v0 *arm.Float64X1)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretF32P16 VreinterpretF32P16
//go:noescape
func VreinterpretF32P16(r *arm.Float32X2, v0 *arm.Poly16X4)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretF32P64 VreinterpretF32P64
//go:noescape
func VreinterpretF32P64(r *arm.Float32X2, v0 *arm.Poly64X1)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretF32P8 VreinterpretF32P8
//go:noescape
func VreinterpretF32P8(r *arm.Float32X2, v0 *arm.Poly8X8)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretF64S8 VreinterpretF64S8
//go:noescape
func VreinterpretF64S8(r *arm.Float64X1, v0 *arm.Int8X8)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretF64S16 VreinterpretF64S16
//go:noescape
func VreinterpretF64S16(r *arm.Float64X1, v0 *arm.Int16X4)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretF64S32 VreinterpretF64S32
//go:noescape
func VreinterpretF64S32(r *arm.Float64X1, v0 *arm.Int32X2)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretF64S64 VreinterpretF64S64
//go:noescape
func VreinterpretF64S64(r *arm.Float64X1, v0 *arm.Int64X1)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretF64U8 VreinterpretF64U8
//go:noescape
func VreinterpretF64U8(r *arm.Float64X1, v0 *arm.Uint8X8)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretF64U16 VreinterpretF64U16
//go:noescape
func VreinterpretF64U16(r *arm.Float64X1, v0 *arm.Uint16X4)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretF64U32 VreinterpretF64U32
//go:noescape
func VreinterpretF64U32(r *arm.Float64X1, v0 *arm.Uint32X2)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretF64U64 VreinterpretF64U64
//go:noescape
func VreinterpretF64U64(r *arm.Float64X1, v0 *arm.Uint64X1)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretF64F32 VreinterpretF64F32
//go:noescape
func VreinterpretF64F32(r *arm.Float64X1, v0 *arm.Float32X2)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretF64P16 VreinterpretF64P16
//go:noescape
func VreinterpretF64P16(r *arm.Float64X1, v0 *arm.Poly16X4)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretF64P64 VreinterpretF64P64
//go:noescape
func VreinterpretF64P64(r *arm.Float64X1, v0 *arm.Poly64X1)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretF64P8 VreinterpretF64P8
//go:noescape
func VreinterpretF64P8(r *arm.Float64X1, v0 *arm.Poly8X8)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretP16S8 VreinterpretP16S8
//go:noescape
func VreinterpretP16S8(r *arm.Poly16X4, v0 *arm.Int8X8)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretP16S16 VreinterpretP16S16
//go:noescape
func VreinterpretP16S16(r *arm.Poly16X4, v0 *arm.Int16X4)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretP16S32 VreinterpretP16S32
//go:noescape
func VreinterpretP16S32(r *arm.Poly16X4, v0 *arm.Int32X2)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretP16S64 VreinterpretP16S64
//go:noescape
func VreinterpretP16S64(r *arm.Poly16X4, v0 *arm.Int64X1)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretP16U8 VreinterpretP16U8
//go:noescape
func VreinterpretP16U8(r *arm.Poly16X4, v0 *arm.Uint8X8)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretP16U16 VreinterpretP16U16
//go:noescape
func VreinterpretP16U16(r *arm.Poly16X4, v0 *arm.Uint16X4)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretP16U32 VreinterpretP16U32
//go:noescape
func VreinterpretP16U32(r *arm.Poly16X4, v0 *arm.Uint32X2)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretP16U64 VreinterpretP16U64
//go:noescape
func VreinterpretP16U64(r *arm.Poly16X4, v0 *arm.Uint64X1)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretP16F32 VreinterpretP16F32
//go:noescape
func VreinterpretP16F32(r *arm.Poly16X4, v0 *arm.Float32X2)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretP16F64 VreinterpretP16F64
//go:noescape
func VreinterpretP16F64(r *arm.Poly16X4, v0 *arm.Float64X1)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretP16P64 VreinterpretP16P64
//go:noescape
func VreinterpretP16P64(r *arm.Poly16X4, v0 *arm.Poly64X1)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretP16P8 VreinterpretP16P8
//go:noescape
func VreinterpretP16P8(r *arm.Poly16X4, v0 *arm.Poly8X8)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretP64S8 VreinterpretP64S8
//go:noescape
func VreinterpretP64S8(r *arm.Poly64X1, v0 *arm.Int8X8)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretP64S16 VreinterpretP64S16
//go:noescape
func VreinterpretP64S16(r *arm.Poly64X1, v0 *arm.Int16X4)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretP64S32 VreinterpretP64S32
//go:noescape
func VreinterpretP64S32(r *arm.Poly64X1, v0 *arm.Int32X2)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretP64S64 VreinterpretP64S64
//go:noescape
func VreinterpretP64S64(r *arm.Poly64X1, v0 *arm.Int64X1)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretP64U8 VreinterpretP64U8
//go:noescape
func VreinterpretP64U8(r *arm.Poly64X1, v0 *arm.Uint8X8)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretP64U16 VreinterpretP64U16
//go:noescape
func VreinterpretP64U16(r *arm.Poly64X1, v0 *arm.Uint16X4)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretP64U32 VreinterpretP64U32
//go:noescape
func VreinterpretP64U32(r *arm.Poly64X1, v0 *arm.Uint32X2)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretP64U64 VreinterpretP64U64
//go:noescape
func VreinterpretP64U64(r *arm.Poly64X1, v0 *arm.Uint64X1)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretP64F32 VreinterpretP64F32
//go:noescape
func VreinterpretP64F32(r *arm.Poly64X1, v0 *arm.Float32X2)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretP64F64 VreinterpretP64F64
//go:noescape
func VreinterpretP64F64(r *arm.Poly64X1, v0 *arm.Float64X1)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretP64P16 VreinterpretP64P16
//go:noescape
func VreinterpretP64P16(r *arm.Poly64X1, v0 *arm.Poly16X4)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretP64P8 VreinterpretP64P8
//go:noescape
func VreinterpretP64P8(r *arm.Poly64X1, v0 *arm.Poly8X8)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretP8S8 VreinterpretP8S8
//go:noescape
func VreinterpretP8S8(r *arm.Poly8X8, v0 *arm.Int8X8)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretP8S16 VreinterpretP8S16
//go:noescape
func VreinterpretP8S16(r *arm.Poly8X8, v0 *arm.Int16X4)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretP8S32 VreinterpretP8S32
//go:noescape
func VreinterpretP8S32(r *arm.Poly8X8, v0 *arm.Int32X2)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretP8S64 VreinterpretP8S64
//go:noescape
func VreinterpretP8S64(r *arm.Poly8X8, v0 *arm.Int64X1)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretP8U8 VreinterpretP8U8
//go:noescape
func VreinterpretP8U8(r *arm.Poly8X8, v0 *arm.Uint8X8)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretP8U16 VreinterpretP8U16
//go:noescape
func VreinterpretP8U16(r *arm.Poly8X8, v0 *arm.Uint16X4)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretP8U32 VreinterpretP8U32
//go:noescape
func VreinterpretP8U32(r *arm.Poly8X8, v0 *arm.Uint32X2)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretP8U64 VreinterpretP8U64
//go:noescape
func VreinterpretP8U64(r *arm.Poly8X8, v0 *arm.Uint64X1)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretP8F32 VreinterpretP8F32
//go:noescape
func VreinterpretP8F32(r *arm.Poly8X8, v0 *arm.Float32X2)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretP8F64 VreinterpretP8F64
//go:noescape
func VreinterpretP8F64(r *arm.Poly8X8, v0 *arm.Float64X1)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretP8P16 VreinterpretP8P16
//go:noescape
func VreinterpretP8P16(r *arm.Poly8X8, v0 *arm.Poly16X4)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretP8P64 VreinterpretP8P64
//go:noescape
func VreinterpretP8P64(r *arm.Poly8X8, v0 *arm.Poly64X1)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretS16S8 VreinterpretS16S8
//go:noescape
func VreinterpretS16S8(r *arm.Int16X4, v0 *arm.Int8X8)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretS16S32 VreinterpretS16S32
//go:noescape
func VreinterpretS16S32(r *arm.Int16X4, v0 *arm.Int32X2)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretS16S64 VreinterpretS16S64
//go:noescape
func VreinterpretS16S64(r *arm.Int16X4, v0 *arm.Int64X1)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretS16U8 VreinterpretS16U8
//go:noescape
func VreinterpretS16U8(r *arm.Int16X4, v0 *arm.Uint8X8)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretS16U16 VreinterpretS16U16
//go:noescape
func VreinterpretS16U16(r *arm.Int16X4, v0 *arm.Uint16X4)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretS16U32 VreinterpretS16U32
//go:noescape
func VreinterpretS16U32(r *arm.Int16X4, v0 *arm.Uint32X2)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretS16U64 VreinterpretS16U64
//go:noescape
func VreinterpretS16U64(r *arm.Int16X4, v0 *arm.Uint64X1)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretS16F32 VreinterpretS16F32
//go:noescape
func VreinterpretS16F32(r *arm.Int16X4, v0 *arm.Float32X2)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretS16F64 VreinterpretS16F64
//go:noescape
func VreinterpretS16F64(r *arm.Int16X4, v0 *arm.Float64X1)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretS16P16 VreinterpretS16P16
//go:noescape
func VreinterpretS16P16(r *arm.Int16X4, v0 *arm.Poly16X4)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretS16P64 VreinterpretS16P64
//go:noescape
func VreinterpretS16P64(r *arm.Int16X4, v0 *arm.Poly64X1)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretS16P8 VreinterpretS16P8
//go:noescape
func VreinterpretS16P8(r *arm.Int16X4, v0 *arm.Poly8X8)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretS32S8 VreinterpretS32S8
//go:noescape
func VreinterpretS32S8(r *arm.Int32X2, v0 *arm.Int8X8)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretS32S16 VreinterpretS32S16
//go:noescape
func VreinterpretS32S16(r *arm.Int32X2, v0 *arm.Int16X4)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretS32S64 VreinterpretS32S64
//go:noescape
func VreinterpretS32S64(r *arm.Int32X2, v0 *arm.Int64X1)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretS32U8 VreinterpretS32U8
//go:noescape
func VreinterpretS32U8(r *arm.Int32X2, v0 *arm.Uint8X8)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretS32U16 VreinterpretS32U16
//go:noescape
func VreinterpretS32U16(r *arm.Int32X2, v0 *arm.Uint16X4)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretS32U32 VreinterpretS32U32
//go:noescape
func VreinterpretS32U32(r *arm.Int32X2, v0 *arm.Uint32X2)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretS32U64 VreinterpretS32U64
//go:noescape
func VreinterpretS32U64(r *arm.Int32X2, v0 *arm.Uint64X1)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretS32F32 VreinterpretS32F32
//go:noescape
func VreinterpretS32F32(r *arm.Int32X2, v0 *arm.Float32X2)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretS32F64 VreinterpretS32F64
//go:noescape
func VreinterpretS32F64(r *arm.Int32X2, v0 *arm.Float64X1)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretS32P16 VreinterpretS32P16
//go:noescape
func VreinterpretS32P16(r *arm.Int32X2, v0 *arm.Poly16X4)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretS32P64 VreinterpretS32P64
//go:noescape
func VreinterpretS32P64(r *arm.Int32X2, v0 *arm.Poly64X1)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretS32P8 VreinterpretS32P8
//go:noescape
func VreinterpretS32P8(r *arm.Int32X2, v0 *arm.Poly8X8)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretS64S8 VreinterpretS64S8
//go:noescape
func VreinterpretS64S8(r *arm.Int64X1, v0 *arm.Int8X8)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretS64S16 VreinterpretS64S16
//go:noescape
func VreinterpretS64S16(r *arm.Int64X1, v0 *arm.Int16X4)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretS64S32 VreinterpretS64S32
//go:noescape
func VreinterpretS64S32(r *arm.Int64X1, v0 *arm.Int32X2)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretS64U8 VreinterpretS64U8
//go:noescape
func VreinterpretS64U8(r *arm.Int64X1, v0 *arm.Uint8X8)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretS64U16 VreinterpretS64U16
//go:noescape
func VreinterpretS64U16(r *arm.Int64X1, v0 *arm.Uint16X4)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretS64U32 VreinterpretS64U32
//go:noescape
func VreinterpretS64U32(r *arm.Int64X1, v0 *arm.Uint32X2)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretS64U64 VreinterpretS64U64
//go:noescape
func VreinterpretS64U64(r *arm.Int64X1, v0 *arm.Uint64X1)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretS64F32 VreinterpretS64F32
//go:noescape
func VreinterpretS64F32(r *arm.Int64X1, v0 *arm.Float32X2)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretS64F64 VreinterpretS64F64
//go:noescape
func VreinterpretS64F64(r *arm.Int64X1, v0 *arm.Float64X1)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretS64P16 VreinterpretS64P16
//go:noescape
func VreinterpretS64P16(r *arm.Int64X1, v0 *arm.Poly16X4)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretS64P64 VreinterpretS64P64
//go:noescape
func VreinterpretS64P64(r *arm.Int64X1, v0 *arm.Poly64X1)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretS64P8 VreinterpretS64P8
//go:noescape
func VreinterpretS64P8(r *arm.Int64X1, v0 *arm.Poly8X8)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretS8S16 VreinterpretS8S16
//go:noescape
func VreinterpretS8S16(r *arm.Int8X8, v0 *arm.Int16X4)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretS8S32 VreinterpretS8S32
//go:noescape
func VreinterpretS8S32(r *arm.Int8X8, v0 *arm.Int32X2)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretS8S64 VreinterpretS8S64
//go:noescape
func VreinterpretS8S64(r *arm.Int8X8, v0 *arm.Int64X1)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretS8U8 VreinterpretS8U8
//go:noescape
func VreinterpretS8U8(r *arm.Int8X8, v0 *arm.Uint8X8)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretS8U16 VreinterpretS8U16
//go:noescape
func VreinterpretS8U16(r *arm.Int8X8, v0 *arm.Uint16X4)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretS8U32 VreinterpretS8U32
//go:noescape
func VreinterpretS8U32(r *arm.Int8X8, v0 *arm.Uint32X2)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretS8U64 VreinterpretS8U64
//go:noescape
func VreinterpretS8U64(r *arm.Int8X8, v0 *arm.Uint64X1)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretS8F32 VreinterpretS8F32
//go:noescape
func VreinterpretS8F32(r *arm.Int8X8, v0 *arm.Float32X2)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretS8F64 VreinterpretS8F64
//go:noescape
func VreinterpretS8F64(r *arm.Int8X8, v0 *arm.Float64X1)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretS8P16 VreinterpretS8P16
//go:noescape
func VreinterpretS8P16(r *arm.Int8X8, v0 *arm.Poly16X4)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretS8P64 VreinterpretS8P64
//go:noescape
func VreinterpretS8P64(r *arm.Int8X8, v0 *arm.Poly64X1)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretS8P8 VreinterpretS8P8
//go:noescape
func VreinterpretS8P8(r *arm.Int8X8, v0 *arm.Poly8X8)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretU16S8 VreinterpretU16S8
//go:noescape
func VreinterpretU16S8(r *arm.Uint16X4, v0 *arm.Int8X8)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretU16S16 VreinterpretU16S16
//go:noescape
func VreinterpretU16S16(r *arm.Uint16X4, v0 *arm.Int16X4)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretU16S32 VreinterpretU16S32
//go:noescape
func VreinterpretU16S32(r *arm.Uint16X4, v0 *arm.Int32X2)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretU16S64 VreinterpretU16S64
//go:noescape
func VreinterpretU16S64(r *arm.Uint16X4, v0 *arm.Int64X1)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretU16U8 VreinterpretU16U8
//go:noescape
func VreinterpretU16U8(r *arm.Uint16X4, v0 *arm.Uint8X8)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretU16U32 VreinterpretU16U32
//go:noescape
func VreinterpretU16U32(r *arm.Uint16X4, v0 *arm.Uint32X2)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretU16U64 VreinterpretU16U64
//go:noescape
func VreinterpretU16U64(r *arm.Uint16X4, v0 *arm.Uint64X1)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretU16F32 VreinterpretU16F32
//go:noescape
func VreinterpretU16F32(r *arm.Uint16X4, v0 *arm.Float32X2)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretU16F64 VreinterpretU16F64
//go:noescape
func VreinterpretU16F64(r *arm.Uint16X4, v0 *arm.Float64X1)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretU16P16 VreinterpretU16P16
//go:noescape
func VreinterpretU16P16(r *arm.Uint16X4, v0 *arm.Poly16X4)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretU16P64 VreinterpretU16P64
//go:noescape
func VreinterpretU16P64(r *arm.Uint16X4, v0 *arm.Poly64X1)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretU16P8 VreinterpretU16P8
//go:noescape
func VreinterpretU16P8(r *arm.Uint16X4, v0 *arm.Poly8X8)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretU32S8 VreinterpretU32S8
//go:noescape
func VreinterpretU32S8(r *arm.Uint32X2, v0 *arm.Int8X8)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretU32S16 VreinterpretU32S16
//go:noescape
func VreinterpretU32S16(r *arm.Uint32X2, v0 *arm.Int16X4)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretU32S32 VreinterpretU32S32
//go:noescape
func VreinterpretU32S32(r *arm.Uint32X2, v0 *arm.Int32X2)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretU32S64 VreinterpretU32S64
//go:noescape
func VreinterpretU32S64(r *arm.Uint32X2, v0 *arm.Int64X1)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretU32U8 VreinterpretU32U8
//go:noescape
func VreinterpretU32U8(r *arm.Uint32X2, v0 *arm.Uint8X8)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretU32U16 VreinterpretU32U16
//go:noescape
func VreinterpretU32U16(r *arm.Uint32X2, v0 *arm.Uint16X4)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretU32U64 VreinterpretU32U64
//go:noescape
func VreinterpretU32U64(r *arm.Uint32X2, v0 *arm.Uint64X1)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretU32F32 VreinterpretU32F32
//go:noescape
func VreinterpretU32F32(r *arm.Uint32X2, v0 *arm.Float32X2)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretU32F64 VreinterpretU32F64
//go:noescape
func VreinterpretU32F64(r *arm.Uint32X2, v0 *arm.Float64X1)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretU32P16 VreinterpretU32P16
//go:noescape
func VreinterpretU32P16(r *arm.Uint32X2, v0 *arm.Poly16X4)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretU32P64 VreinterpretU32P64
//go:noescape
func VreinterpretU32P64(r *arm.Uint32X2, v0 *arm.Poly64X1)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretU32P8 VreinterpretU32P8
//go:noescape
func VreinterpretU32P8(r *arm.Uint32X2, v0 *arm.Poly8X8)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretU64S8 VreinterpretU64S8
//go:noescape
func VreinterpretU64S8(r *arm.Uint64X1, v0 *arm.Int8X8)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretU64S16 VreinterpretU64S16
//go:noescape
func VreinterpretU64S16(r *arm.Uint64X1, v0 *arm.Int16X4)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretU64S32 VreinterpretU64S32
//go:noescape
func VreinterpretU64S32(r *arm.Uint64X1, v0 *arm.Int32X2)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretU64S64 VreinterpretU64S64
//go:noescape
func VreinterpretU64S64(r *arm.Uint64X1, v0 *arm.Int64X1)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretU64U8 VreinterpretU64U8
//go:noescape
func VreinterpretU64U8(r *arm.Uint64X1, v0 *arm.Uint8X8)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretU64U16 VreinterpretU64U16
//go:noescape
func VreinterpretU64U16(r *arm.Uint64X1, v0 *arm.Uint16X4)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretU64U32 VreinterpretU64U32
//go:noescape
func VreinterpretU64U32(r *arm.Uint64X1, v0 *arm.Uint32X2)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretU64F32 VreinterpretU64F32
//go:noescape
func VreinterpretU64F32(r *arm.Uint64X1, v0 *arm.Float32X2)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretU64F64 VreinterpretU64F64
//go:noescape
func VreinterpretU64F64(r *arm.Uint64X1, v0 *arm.Float64X1)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretU64P16 VreinterpretU64P16
//go:noescape
func VreinterpretU64P16(r *arm.Uint64X1, v0 *arm.Poly16X4)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretU64P64 VreinterpretU64P64
//go:noescape
func VreinterpretU64P64(r *arm.Uint64X1, v0 *arm.Poly64X1)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretU64P8 VreinterpretU64P8
//go:noescape
func VreinterpretU64P8(r *arm.Uint64X1, v0 *arm.Poly8X8)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretU8S8 VreinterpretU8S8
//go:noescape
func VreinterpretU8S8(r *arm.Uint8X8, v0 *arm.Int8X8)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretU8S16 VreinterpretU8S16
//go:noescape
func VreinterpretU8S16(r *arm.Uint8X8, v0 *arm.Int16X4)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretU8S32 VreinterpretU8S32
//go:noescape
func VreinterpretU8S32(r *arm.Uint8X8, v0 *arm.Int32X2)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretU8S64 VreinterpretU8S64
//go:noescape
func VreinterpretU8S64(r *arm.Uint8X8, v0 *arm.Int64X1)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretU8U16 VreinterpretU8U16
//go:noescape
func VreinterpretU8U16(r *arm.Uint8X8, v0 *arm.Uint16X4)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretU8U32 VreinterpretU8U32
//go:noescape
func VreinterpretU8U32(r *arm.Uint8X8, v0 *arm.Uint32X2)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretU8U64 VreinterpretU8U64
//go:noescape
func VreinterpretU8U64(r *arm.Uint8X8, v0 *arm.Uint64X1)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretU8F32 VreinterpretU8F32
//go:noescape
func VreinterpretU8F32(r *arm.Uint8X8, v0 *arm.Float32X2)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretU8F64 VreinterpretU8F64
//go:noescape
func VreinterpretU8F64(r *arm.Uint8X8, v0 *arm.Float64X1)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretU8P16 VreinterpretU8P16
//go:noescape
func VreinterpretU8P16(r *arm.Uint8X8, v0 *arm.Poly16X4)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretU8P64 VreinterpretU8P64
//go:noescape
func VreinterpretU8P64(r *arm.Uint8X8, v0 *arm.Poly64X1)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretU8P8 VreinterpretU8P8
//go:noescape
func VreinterpretU8P8(r *arm.Uint8X8, v0 *arm.Poly8X8)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqF32S8 VreinterpretqF32S8
//go:noescape
func VreinterpretqF32S8(r *arm.Float32X4, v0 *arm.Int8X16)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqF32S16 VreinterpretqF32S16
//go:noescape
func VreinterpretqF32S16(r *arm.Float32X4, v0 *arm.Int16X8)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqF32S32 VreinterpretqF32S32
//go:noescape
func VreinterpretqF32S32(r *arm.Float32X4, v0 *arm.Int32X4)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqF32S64 VreinterpretqF32S64
//go:noescape
func VreinterpretqF32S64(r *arm.Float32X4, v0 *arm.Int64X2)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqF32U8 VreinterpretqF32U8
//go:noescape
func VreinterpretqF32U8(r *arm.Float32X4, v0 *arm.Uint8X16)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqF32U16 VreinterpretqF32U16
//go:noescape
func VreinterpretqF32U16(r *arm.Float32X4, v0 *arm.Uint16X8)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqF32U32 VreinterpretqF32U32
//go:noescape
func VreinterpretqF32U32(r *arm.Float32X4, v0 *arm.Uint32X4)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqF32U64 VreinterpretqF32U64
//go:noescape
func VreinterpretqF32U64(r *arm.Float32X4, v0 *arm.Uint64X2)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqF32F64 VreinterpretqF32F64
//go:noescape
func VreinterpretqF32F64(r *arm.Float32X4, v0 *arm.Float64X2)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqF32P128 VreinterpretqF32P128
//go:noescape
func VreinterpretqF32P128(r *arm.Float32X4, v0 *arm.Poly128)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqF32P16 VreinterpretqF32P16
//go:noescape
func VreinterpretqF32P16(r *arm.Float32X4, v0 *arm.Poly16X8)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqF32P64 VreinterpretqF32P64
//go:noescape
func VreinterpretqF32P64(r *arm.Float32X4, v0 *arm.Poly64X2)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqF32P8 VreinterpretqF32P8
//go:noescape
func VreinterpretqF32P8(r *arm.Float32X4, v0 *arm.Poly8X16)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqF64S8 VreinterpretqF64S8
//go:noescape
func VreinterpretqF64S8(r *arm.Float64X2, v0 *arm.Int8X16)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqF64S16 VreinterpretqF64S16
//go:noescape
func VreinterpretqF64S16(r *arm.Float64X2, v0 *arm.Int16X8)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqF64S32 VreinterpretqF64S32
//go:noescape
func VreinterpretqF64S32(r *arm.Float64X2, v0 *arm.Int32X4)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqF64S64 VreinterpretqF64S64
//go:noescape
func VreinterpretqF64S64(r *arm.Float64X2, v0 *arm.Int64X2)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqF64U8 VreinterpretqF64U8
//go:noescape
func VreinterpretqF64U8(r *arm.Float64X2, v0 *arm.Uint8X16)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqF64U16 VreinterpretqF64U16
//go:noescape
func VreinterpretqF64U16(r *arm.Float64X2, v0 *arm.Uint16X8)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqF64U32 VreinterpretqF64U32
//go:noescape
func VreinterpretqF64U32(r *arm.Float64X2, v0 *arm.Uint32X4)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqF64U64 VreinterpretqF64U64
//go:noescape
func VreinterpretqF64U64(r *arm.Float64X2, v0 *arm.Uint64X2)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqF64F32 VreinterpretqF64F32
//go:noescape
func VreinterpretqF64F32(r *arm.Float64X2, v0 *arm.Float32X4)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqF64P128 VreinterpretqF64P128
//go:noescape
func VreinterpretqF64P128(r *arm.Float64X2, v0 *arm.Poly128)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqF64P16 VreinterpretqF64P16
//go:noescape
func VreinterpretqF64P16(r *arm.Float64X2, v0 *arm.Poly16X8)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqF64P64 VreinterpretqF64P64
//go:noescape
func VreinterpretqF64P64(r *arm.Float64X2, v0 *arm.Poly64X2)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqF64P8 VreinterpretqF64P8
//go:noescape
func VreinterpretqF64P8(r *arm.Float64X2, v0 *arm.Poly8X16)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqP128S8 VreinterpretqP128S8
//go:noescape
func VreinterpretqP128S8(r *arm.Poly128, v0 *arm.Int8X16)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqP128S16 VreinterpretqP128S16
//go:noescape
func VreinterpretqP128S16(r *arm.Poly128, v0 *arm.Int16X8)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqP128S32 VreinterpretqP128S32
//go:noescape
func VreinterpretqP128S32(r *arm.Poly128, v0 *arm.Int32X4)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqP128S64 VreinterpretqP128S64
//go:noescape
func VreinterpretqP128S64(r *arm.Poly128, v0 *arm.Int64X2)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqP128U8 VreinterpretqP128U8
//go:noescape
func VreinterpretqP128U8(r *arm.Poly128, v0 *arm.Uint8X16)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqP128U16 VreinterpretqP128U16
//go:noescape
func VreinterpretqP128U16(r *arm.Poly128, v0 *arm.Uint16X8)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqP128U32 VreinterpretqP128U32
//go:noescape
func VreinterpretqP128U32(r *arm.Poly128, v0 *arm.Uint32X4)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqP128U64 VreinterpretqP128U64
//go:noescape
func VreinterpretqP128U64(r *arm.Poly128, v0 *arm.Uint64X2)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqP128F32 VreinterpretqP128F32
//go:noescape
func VreinterpretqP128F32(r *arm.Poly128, v0 *arm.Float32X4)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqP128F64 VreinterpretqP128F64
//go:noescape
func VreinterpretqP128F64(r *arm.Poly128, v0 *arm.Float64X2)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqP128P16 VreinterpretqP128P16
//go:noescape
func VreinterpretqP128P16(r *arm.Poly128, v0 *arm.Poly16X8)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqP128P64 VreinterpretqP128P64
//go:noescape
func VreinterpretqP128P64(r *arm.Poly128, v0 *arm.Poly64X2)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqP128P8 VreinterpretqP128P8
//go:noescape
func VreinterpretqP128P8(r *arm.Poly128, v0 *arm.Poly8X16)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqP16S8 VreinterpretqP16S8
//go:noescape
func VreinterpretqP16S8(r *arm.Poly16X8, v0 *arm.Int8X16)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqP16S16 VreinterpretqP16S16
//go:noescape
func VreinterpretqP16S16(r *arm.Poly16X8, v0 *arm.Int16X8)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqP16S32 VreinterpretqP16S32
//go:noescape
func VreinterpretqP16S32(r *arm.Poly16X8, v0 *arm.Int32X4)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqP16S64 VreinterpretqP16S64
//go:noescape
func VreinterpretqP16S64(r *arm.Poly16X8, v0 *arm.Int64X2)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqP16U8 VreinterpretqP16U8
//go:noescape
func VreinterpretqP16U8(r *arm.Poly16X8, v0 *arm.Uint8X16)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqP16U16 VreinterpretqP16U16
//go:noescape
func VreinterpretqP16U16(r *arm.Poly16X8, v0 *arm.Uint16X8)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqP16U32 VreinterpretqP16U32
//go:noescape
func VreinterpretqP16U32(r *arm.Poly16X8, v0 *arm.Uint32X4)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqP16U64 VreinterpretqP16U64
//go:noescape
func VreinterpretqP16U64(r *arm.Poly16X8, v0 *arm.Uint64X2)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqP16F32 VreinterpretqP16F32
//go:noescape
func VreinterpretqP16F32(r *arm.Poly16X8, v0 *arm.Float32X4)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqP16F64 VreinterpretqP16F64
//go:noescape
func VreinterpretqP16F64(r *arm.Poly16X8, v0 *arm.Float64X2)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqP16P128 VreinterpretqP16P128
//go:noescape
func VreinterpretqP16P128(r *arm.Poly16X8, v0 *arm.Poly128)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqP16P64 VreinterpretqP16P64
//go:noescape
func VreinterpretqP16P64(r *arm.Poly16X8, v0 *arm.Poly64X2)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqP16P8 VreinterpretqP16P8
//go:noescape
func VreinterpretqP16P8(r *arm.Poly16X8, v0 *arm.Poly8X16)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqP64S8 VreinterpretqP64S8
//go:noescape
func VreinterpretqP64S8(r *arm.Poly64X2, v0 *arm.Int8X16)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqP64S16 VreinterpretqP64S16
//go:noescape
func VreinterpretqP64S16(r *arm.Poly64X2, v0 *arm.Int16X8)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqP64S32 VreinterpretqP64S32
//go:noescape
func VreinterpretqP64S32(r *arm.Poly64X2, v0 *arm.Int32X4)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqP64S64 VreinterpretqP64S64
//go:noescape
func VreinterpretqP64S64(r *arm.Poly64X2, v0 *arm.Int64X2)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqP64U8 VreinterpretqP64U8
//go:noescape
func VreinterpretqP64U8(r *arm.Poly64X2, v0 *arm.Uint8X16)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqP64U16 VreinterpretqP64U16
//go:noescape
func VreinterpretqP64U16(r *arm.Poly64X2, v0 *arm.Uint16X8)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqP64U32 VreinterpretqP64U32
//go:noescape
func VreinterpretqP64U32(r *arm.Poly64X2, v0 *arm.Uint32X4)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqP64U64 VreinterpretqP64U64
//go:noescape
func VreinterpretqP64U64(r *arm.Poly64X2, v0 *arm.Uint64X2)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqP64F32 VreinterpretqP64F32
//go:noescape
func VreinterpretqP64F32(r *arm.Poly64X2, v0 *arm.Float32X4)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqP64F64 VreinterpretqP64F64
//go:noescape
func VreinterpretqP64F64(r *arm.Poly64X2, v0 *arm.Float64X2)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqP64P128 VreinterpretqP64P128
//go:noescape
func VreinterpretqP64P128(r *arm.Poly64X2, v0 *arm.Poly128)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqP64P16 VreinterpretqP64P16
//go:noescape
func VreinterpretqP64P16(r *arm.Poly64X2, v0 *arm.Poly16X8)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqP64P8 VreinterpretqP64P8
//go:noescape
func VreinterpretqP64P8(r *arm.Poly64X2, v0 *arm.Poly8X16)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqP8S8 VreinterpretqP8S8
//go:noescape
func VreinterpretqP8S8(r *arm.Poly8X16, v0 *arm.Int8X16)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqP8S16 VreinterpretqP8S16
//go:noescape
func VreinterpretqP8S16(r *arm.Poly8X16, v0 *arm.Int16X8)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqP8S32 VreinterpretqP8S32
//go:noescape
func VreinterpretqP8S32(r *arm.Poly8X16, v0 *arm.Int32X4)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqP8S64 VreinterpretqP8S64
//go:noescape
func VreinterpretqP8S64(r *arm.Poly8X16, v0 *arm.Int64X2)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqP8U8 VreinterpretqP8U8
//go:noescape
func VreinterpretqP8U8(r *arm.Poly8X16, v0 *arm.Uint8X16)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqP8U16 VreinterpretqP8U16
//go:noescape
func VreinterpretqP8U16(r *arm.Poly8X16, v0 *arm.Uint16X8)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqP8U32 VreinterpretqP8U32
//go:noescape
func VreinterpretqP8U32(r *arm.Poly8X16, v0 *arm.Uint32X4)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqP8U64 VreinterpretqP8U64
//go:noescape
func VreinterpretqP8U64(r *arm.Poly8X16, v0 *arm.Uint64X2)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqP8F32 VreinterpretqP8F32
//go:noescape
func VreinterpretqP8F32(r *arm.Poly8X16, v0 *arm.Float32X4)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqP8F64 VreinterpretqP8F64
//go:noescape
func VreinterpretqP8F64(r *arm.Poly8X16, v0 *arm.Float64X2)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqP8P128 VreinterpretqP8P128
//go:noescape
func VreinterpretqP8P128(r *arm.Poly8X16, v0 *arm.Poly128)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqP8P16 VreinterpretqP8P16
//go:noescape
func VreinterpretqP8P16(r *arm.Poly8X16, v0 *arm.Poly16X8)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqP8P64 VreinterpretqP8P64
//go:noescape
func VreinterpretqP8P64(r *arm.Poly8X16, v0 *arm.Poly64X2)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqS16S8 VreinterpretqS16S8
//go:noescape
func VreinterpretqS16S8(r *arm.Int16X8, v0 *arm.Int8X16)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqS16S32 VreinterpretqS16S32
//go:noescape
func VreinterpretqS16S32(r *arm.Int16X8, v0 *arm.Int32X4)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqS16S64 VreinterpretqS16S64
//go:noescape
func VreinterpretqS16S64(r *arm.Int16X8, v0 *arm.Int64X2)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqS16U8 VreinterpretqS16U8
//go:noescape
func VreinterpretqS16U8(r *arm.Int16X8, v0 *arm.Uint8X16)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqS16U16 VreinterpretqS16U16
//go:noescape
func VreinterpretqS16U16(r *arm.Int16X8, v0 *arm.Uint16X8)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqS16U32 VreinterpretqS16U32
//go:noescape
func VreinterpretqS16U32(r *arm.Int16X8, v0 *arm.Uint32X4)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqS16U64 VreinterpretqS16U64
//go:noescape
func VreinterpretqS16U64(r *arm.Int16X8, v0 *arm.Uint64X2)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqS16F32 VreinterpretqS16F32
//go:noescape
func VreinterpretqS16F32(r *arm.Int16X8, v0 *arm.Float32X4)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqS16F64 VreinterpretqS16F64
//go:noescape
func VreinterpretqS16F64(r *arm.Int16X8, v0 *arm.Float64X2)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqS16P128 VreinterpretqS16P128
//go:noescape
func VreinterpretqS16P128(r *arm.Int16X8, v0 *arm.Poly128)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqS16P16 VreinterpretqS16P16
//go:noescape
func VreinterpretqS16P16(r *arm.Int16X8, v0 *arm.Poly16X8)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqS16P64 VreinterpretqS16P64
//go:noescape
func VreinterpretqS16P64(r *arm.Int16X8, v0 *arm.Poly64X2)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqS16P8 VreinterpretqS16P8
//go:noescape
func VreinterpretqS16P8(r *arm.Int16X8, v0 *arm.Poly8X16)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqS32S8 VreinterpretqS32S8
//go:noescape
func VreinterpretqS32S8(r *arm.Int32X4, v0 *arm.Int8X16)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqS32S16 VreinterpretqS32S16
//go:noescape
func VreinterpretqS32S16(r *arm.Int32X4, v0 *arm.Int16X8)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqS32S64 VreinterpretqS32S64
//go:noescape
func VreinterpretqS32S64(r *arm.Int32X4, v0 *arm.Int64X2)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqS32U8 VreinterpretqS32U8
//go:noescape
func VreinterpretqS32U8(r *arm.Int32X4, v0 *arm.Uint8X16)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqS32U16 VreinterpretqS32U16
//go:noescape
func VreinterpretqS32U16(r *arm.Int32X4, v0 *arm.Uint16X8)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqS32U32 VreinterpretqS32U32
//go:noescape
func VreinterpretqS32U32(r *arm.Int32X4, v0 *arm.Uint32X4)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqS32U64 VreinterpretqS32U64
//go:noescape
func VreinterpretqS32U64(r *arm.Int32X4, v0 *arm.Uint64X2)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqS32F32 VreinterpretqS32F32
//go:noescape
func VreinterpretqS32F32(r *arm.Int32X4, v0 *arm.Float32X4)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqS32F64 VreinterpretqS32F64
//go:noescape
func VreinterpretqS32F64(r *arm.Int32X4, v0 *arm.Float64X2)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqS32P128 VreinterpretqS32P128
//go:noescape
func VreinterpretqS32P128(r *arm.Int32X4, v0 *arm.Poly128)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqS32P16 VreinterpretqS32P16
//go:noescape
func VreinterpretqS32P16(r *arm.Int32X4, v0 *arm.Poly16X8)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqS32P64 VreinterpretqS32P64
//go:noescape
func VreinterpretqS32P64(r *arm.Int32X4, v0 *arm.Poly64X2)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqS32P8 VreinterpretqS32P8
//go:noescape
func VreinterpretqS32P8(r *arm.Int32X4, v0 *arm.Poly8X16)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqS64S8 VreinterpretqS64S8
//go:noescape
func VreinterpretqS64S8(r *arm.Int64X2, v0 *arm.Int8X16)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqS64S16 VreinterpretqS64S16
//go:noescape
func VreinterpretqS64S16(r *arm.Int64X2, v0 *arm.Int16X8)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqS64S32 VreinterpretqS64S32
//go:noescape
func VreinterpretqS64S32(r *arm.Int64X2, v0 *arm.Int32X4)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqS64U8 VreinterpretqS64U8
//go:noescape
func VreinterpretqS64U8(r *arm.Int64X2, v0 *arm.Uint8X16)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqS64U16 VreinterpretqS64U16
//go:noescape
func VreinterpretqS64U16(r *arm.Int64X2, v0 *arm.Uint16X8)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqS64U32 VreinterpretqS64U32
//go:noescape
func VreinterpretqS64U32(r *arm.Int64X2, v0 *arm.Uint32X4)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqS64U64 VreinterpretqS64U64
//go:noescape
func VreinterpretqS64U64(r *arm.Int64X2, v0 *arm.Uint64X2)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqS64F32 VreinterpretqS64F32
//go:noescape
func VreinterpretqS64F32(r *arm.Int64X2, v0 *arm.Float32X4)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqS64F64 VreinterpretqS64F64
//go:noescape
func VreinterpretqS64F64(r *arm.Int64X2, v0 *arm.Float64X2)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqS64P128 VreinterpretqS64P128
//go:noescape
func VreinterpretqS64P128(r *arm.Int64X2, v0 *arm.Poly128)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqS64P16 VreinterpretqS64P16
//go:noescape
func VreinterpretqS64P16(r *arm.Int64X2, v0 *arm.Poly16X8)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqS64P64 VreinterpretqS64P64
//go:noescape
func VreinterpretqS64P64(r *arm.Int64X2, v0 *arm.Poly64X2)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqS64P8 VreinterpretqS64P8
//go:noescape
func VreinterpretqS64P8(r *arm.Int64X2, v0 *arm.Poly8X16)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqS8S16 VreinterpretqS8S16
//go:noescape
func VreinterpretqS8S16(r *arm.Int8X16, v0 *arm.Int16X8)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqS8S32 VreinterpretqS8S32
//go:noescape
func VreinterpretqS8S32(r *arm.Int8X16, v0 *arm.Int32X4)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqS8S64 VreinterpretqS8S64
//go:noescape
func VreinterpretqS8S64(r *arm.Int8X16, v0 *arm.Int64X2)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqS8U8 VreinterpretqS8U8
//go:noescape
func VreinterpretqS8U8(r *arm.Int8X16, v0 *arm.Uint8X16)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqS8U16 VreinterpretqS8U16
//go:noescape
func VreinterpretqS8U16(r *arm.Int8X16, v0 *arm.Uint16X8)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqS8U32 VreinterpretqS8U32
//go:noescape
func VreinterpretqS8U32(r *arm.Int8X16, v0 *arm.Uint32X4)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqS8U64 VreinterpretqS8U64
//go:noescape
func VreinterpretqS8U64(r *arm.Int8X16, v0 *arm.Uint64X2)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqS8F32 VreinterpretqS8F32
//go:noescape
func VreinterpretqS8F32(r *arm.Int8X16, v0 *arm.Float32X4)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqS8F64 VreinterpretqS8F64
//go:noescape
func VreinterpretqS8F64(r *arm.Int8X16, v0 *arm.Float64X2)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqS8P128 VreinterpretqS8P128
//go:noescape
func VreinterpretqS8P128(r *arm.Int8X16, v0 *arm.Poly128)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqS8P16 VreinterpretqS8P16
//go:noescape
func VreinterpretqS8P16(r *arm.Int8X16, v0 *arm.Poly16X8)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqS8P64 VreinterpretqS8P64
//go:noescape
func VreinterpretqS8P64(r *arm.Int8X16, v0 *arm.Poly64X2)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqS8P8 VreinterpretqS8P8
//go:noescape
func VreinterpretqS8P8(r *arm.Int8X16, v0 *arm.Poly8X16)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqU16S8 VreinterpretqU16S8
//go:noescape
func VreinterpretqU16S8(r *arm.Uint16X8, v0 *arm.Int8X16)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqU16S16 VreinterpretqU16S16
//go:noescape
func VreinterpretqU16S16(r *arm.Uint16X8, v0 *arm.Int16X8)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqU16S32 VreinterpretqU16S32
//go:noescape
func VreinterpretqU16S32(r *arm.Uint16X8, v0 *arm.Int32X4)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqU16S64 VreinterpretqU16S64
//go:noescape
func VreinterpretqU16S64(r *arm.Uint16X8, v0 *arm.Int64X2)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqU16U8 VreinterpretqU16U8
//go:noescape
func VreinterpretqU16U8(r *arm.Uint16X8, v0 *arm.Uint8X16)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqU16U32 VreinterpretqU16U32
//go:noescape
func VreinterpretqU16U32(r *arm.Uint16X8, v0 *arm.Uint32X4)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqU16U64 VreinterpretqU16U64
//go:noescape
func VreinterpretqU16U64(r *arm.Uint16X8, v0 *arm.Uint64X2)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqU16F32 VreinterpretqU16F32
//go:noescape
func VreinterpretqU16F32(r *arm.Uint16X8, v0 *arm.Float32X4)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqU16F64 VreinterpretqU16F64
//go:noescape
func VreinterpretqU16F64(r *arm.Uint16X8, v0 *arm.Float64X2)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqU16P128 VreinterpretqU16P128
//go:noescape
func VreinterpretqU16P128(r *arm.Uint16X8, v0 *arm.Poly128)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqU16P16 VreinterpretqU16P16
//go:noescape
func VreinterpretqU16P16(r *arm.Uint16X8, v0 *arm.Poly16X8)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqU16P64 VreinterpretqU16P64
//go:noescape
func VreinterpretqU16P64(r *arm.Uint16X8, v0 *arm.Poly64X2)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqU16P8 VreinterpretqU16P8
//go:noescape
func VreinterpretqU16P8(r *arm.Uint16X8, v0 *arm.Poly8X16)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqU32S8 VreinterpretqU32S8
//go:noescape
func VreinterpretqU32S8(r *arm.Uint32X4, v0 *arm.Int8X16)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqU32S16 VreinterpretqU32S16
//go:noescape
func VreinterpretqU32S16(r *arm.Uint32X4, v0 *arm.Int16X8)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqU32S32 VreinterpretqU32S32
//go:noescape
func VreinterpretqU32S32(r *arm.Uint32X4, v0 *arm.Int32X4)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqU32S64 VreinterpretqU32S64
//go:noescape
func VreinterpretqU32S64(r *arm.Uint32X4, v0 *arm.Int64X2)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqU32U8 VreinterpretqU32U8
//go:noescape
func VreinterpretqU32U8(r *arm.Uint32X4, v0 *arm.Uint8X16)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqU32U16 VreinterpretqU32U16
//go:noescape
func VreinterpretqU32U16(r *arm.Uint32X4, v0 *arm.Uint16X8)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqU32U64 VreinterpretqU32U64
//go:noescape
func VreinterpretqU32U64(r *arm.Uint32X4, v0 *arm.Uint64X2)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqU32F32 VreinterpretqU32F32
//go:noescape
func VreinterpretqU32F32(r *arm.Uint32X4, v0 *arm.Float32X4)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqU32F64 VreinterpretqU32F64
//go:noescape
func VreinterpretqU32F64(r *arm.Uint32X4, v0 *arm.Float64X2)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqU32P128 VreinterpretqU32P128
//go:noescape
func VreinterpretqU32P128(r *arm.Uint32X4, v0 *arm.Poly128)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqU32P16 VreinterpretqU32P16
//go:noescape
func VreinterpretqU32P16(r *arm.Uint32X4, v0 *arm.Poly16X8)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqU32P64 VreinterpretqU32P64
//go:noescape
func VreinterpretqU32P64(r *arm.Uint32X4, v0 *arm.Poly64X2)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqU32P8 VreinterpretqU32P8
//go:noescape
func VreinterpretqU32P8(r *arm.Uint32X4, v0 *arm.Poly8X16)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqU64S8 VreinterpretqU64S8
//go:noescape
func VreinterpretqU64S8(r *arm.Uint64X2, v0 *arm.Int8X16)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqU64S16 VreinterpretqU64S16
//go:noescape
func VreinterpretqU64S16(r *arm.Uint64X2, v0 *arm.Int16X8)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqU64S32 VreinterpretqU64S32
//go:noescape
func VreinterpretqU64S32(r *arm.Uint64X2, v0 *arm.Int32X4)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqU64S64 VreinterpretqU64S64
//go:noescape
func VreinterpretqU64S64(r *arm.Uint64X2, v0 *arm.Int64X2)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqU64U8 VreinterpretqU64U8
//go:noescape
func VreinterpretqU64U8(r *arm.Uint64X2, v0 *arm.Uint8X16)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqU64U16 VreinterpretqU64U16
//go:noescape
func VreinterpretqU64U16(r *arm.Uint64X2, v0 *arm.Uint16X8)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqU64U32 VreinterpretqU64U32
//go:noescape
func VreinterpretqU64U32(r *arm.Uint64X2, v0 *arm.Uint32X4)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqU64F32 VreinterpretqU64F32
//go:noescape
func VreinterpretqU64F32(r *arm.Uint64X2, v0 *arm.Float32X4)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqU64F64 VreinterpretqU64F64
//go:noescape
func VreinterpretqU64F64(r *arm.Uint64X2, v0 *arm.Float64X2)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqU64P128 VreinterpretqU64P128
//go:noescape
func VreinterpretqU64P128(r *arm.Uint64X2, v0 *arm.Poly128)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqU64P16 VreinterpretqU64P16
//go:noescape
func VreinterpretqU64P16(r *arm.Uint64X2, v0 *arm.Poly16X8)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqU64P64 VreinterpretqU64P64
//go:noescape
func VreinterpretqU64P64(r *arm.Uint64X2, v0 *arm.Poly64X2)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqU64P8 VreinterpretqU64P8
//go:noescape
func VreinterpretqU64P8(r *arm.Uint64X2, v0 *arm.Poly8X16)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqU8S8 VreinterpretqU8S8
//go:noescape
func VreinterpretqU8S8(r *arm.Uint8X16, v0 *arm.Int8X16)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqU8S16 VreinterpretqU8S16
//go:noescape
func VreinterpretqU8S16(r *arm.Uint8X16, v0 *arm.Int16X8)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqU8S32 VreinterpretqU8S32
//go:noescape
func VreinterpretqU8S32(r *arm.Uint8X16, v0 *arm.Int32X4)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqU8S64 VreinterpretqU8S64
//go:noescape
func VreinterpretqU8S64(r *arm.Uint8X16, v0 *arm.Int64X2)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqU8U16 VreinterpretqU8U16
//go:noescape
func VreinterpretqU8U16(r *arm.Uint8X16, v0 *arm.Uint16X8)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqU8U32 VreinterpretqU8U32
//go:noescape
func VreinterpretqU8U32(r *arm.Uint8X16, v0 *arm.Uint32X4)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqU8U64 VreinterpretqU8U64
//go:noescape
func VreinterpretqU8U64(r *arm.Uint8X16, v0 *arm.Uint64X2)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqU8F32 VreinterpretqU8F32
//go:noescape
func VreinterpretqU8F32(r *arm.Uint8X16, v0 *arm.Float32X4)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqU8F64 VreinterpretqU8F64
//go:noescape
func VreinterpretqU8F64(r *arm.Uint8X16, v0 *arm.Float64X2)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqU8P128 VreinterpretqU8P128
//go:noescape
func VreinterpretqU8P128(r *arm.Uint8X16, v0 *arm.Poly128)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqU8P16 VreinterpretqU8P16
//go:noescape
func VreinterpretqU8P16(r *arm.Uint8X16, v0 *arm.Poly16X8)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqU8P64 VreinterpretqU8P64
//go:noescape
func VreinterpretqU8P64(r *arm.Uint8X16, v0 *arm.Poly64X2)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqU8P8 VreinterpretqU8P8
//go:noescape
func VreinterpretqU8P8(r *arm.Uint8X16, v0 *arm.Poly8X16)

// Reverse elements in 16-bit halfwords (vector). This instruction reverses the order of 8-bit elements in each halfword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vrev16S8 Vrev16S8
//go:noescape
func Vrev16S8(r *arm.Int8X8, v0 *arm.Int8X8)

// Reverse elements in 16-bit halfwords (vector). This instruction reverses the order of 8-bit elements in each halfword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vrev16U8 Vrev16U8
//go:noescape
func Vrev16U8(r *arm.Uint8X8, v0 *arm.Uint8X8)

// Reverse elements in 16-bit halfwords (vector). This instruction reverses the order of 8-bit elements in each halfword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vrev16P8 Vrev16P8
//go:noescape
func Vrev16P8(r *arm.Poly8X8, v0 *arm.Poly8X8)

// Reverse elements in 16-bit halfwords (vector). This instruction reverses the order of 8-bit elements in each halfword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vrev16QS8 Vrev16QS8
//go:noescape
func Vrev16QS8(r *arm.Int8X16, v0 *arm.Int8X16)

// Reverse elements in 16-bit halfwords (vector). This instruction reverses the order of 8-bit elements in each halfword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vrev16QU8 Vrev16QU8
//go:noescape
func Vrev16QU8(r *arm.Uint8X16, v0 *arm.Uint8X16)

// Reverse elements in 16-bit halfwords (vector). This instruction reverses the order of 8-bit elements in each halfword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vrev16QP8 Vrev16QP8
//go:noescape
func Vrev16QP8(r *arm.Poly8X16, v0 *arm.Poly8X16)

// Reverse elements in 32-bit words (vector). This instruction reverses the order of 8-bit or 16-bit elements in each word of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vrev32S8 Vrev32S8
//go:noescape
func Vrev32S8(r *arm.Int8X8, v0 *arm.Int8X8)

// Reverse elements in 32-bit words (vector). This instruction reverses the order of 8-bit or 16-bit elements in each word of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vrev32S16 Vrev32S16
//go:noescape
func Vrev32S16(r *arm.Int16X4, v0 *arm.Int16X4)

// Reverse elements in 32-bit words (vector). This instruction reverses the order of 8-bit or 16-bit elements in each word of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vrev32U8 Vrev32U8
//go:noescape
func Vrev32U8(r *arm.Uint8X8, v0 *arm.Uint8X8)

// Reverse elements in 32-bit words (vector). This instruction reverses the order of 8-bit or 16-bit elements in each word of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vrev32U16 Vrev32U16
//go:noescape
func Vrev32U16(r *arm.Uint16X4, v0 *arm.Uint16X4)

// Reverse elements in 32-bit words (vector). This instruction reverses the order of 8-bit or 16-bit elements in each word of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vrev32P16 Vrev32P16
//go:noescape
func Vrev32P16(r *arm.Poly16X4, v0 *arm.Poly16X4)

// Reverse elements in 32-bit words (vector). This instruction reverses the order of 8-bit or 16-bit elements in each word of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vrev32P8 Vrev32P8
//go:noescape
func Vrev32P8(r *arm.Poly8X8, v0 *arm.Poly8X8)

// Reverse elements in 32-bit words (vector). This instruction reverses the order of 8-bit or 16-bit elements in each word of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vrev32QS8 Vrev32QS8
//go:noescape
func Vrev32QS8(r *arm.Int8X16, v0 *arm.Int8X16)

// Reverse elements in 32-bit words (vector). This instruction reverses the order of 8-bit or 16-bit elements in each word of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vrev32QS16 Vrev32QS16
//go:noescape
func Vrev32QS16(r *arm.Int16X8, v0 *arm.Int16X8)

// Reverse elements in 32-bit words (vector). This instruction reverses the order of 8-bit or 16-bit elements in each word of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vrev32QU8 Vrev32QU8
//go:noescape
func Vrev32QU8(r *arm.Uint8X16, v0 *arm.Uint8X16)

// Reverse elements in 32-bit words (vector). This instruction reverses the order of 8-bit or 16-bit elements in each word of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vrev32QU16 Vrev32QU16
//go:noescape
func Vrev32QU16(r *arm.Uint16X8, v0 *arm.Uint16X8)

// Reverse elements in 32-bit words (vector). This instruction reverses the order of 8-bit or 16-bit elements in each word of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vrev32QP16 Vrev32QP16
//go:noescape
func Vrev32QP16(r *arm.Poly16X8, v0 *arm.Poly16X8)

// Reverse elements in 32-bit words (vector). This instruction reverses the order of 8-bit or 16-bit elements in each word of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vrev32QP8 Vrev32QP8
//go:noescape
func Vrev32QP8(r *arm.Poly8X16, v0 *arm.Poly8X16)

// Reverse elements in 64-bit doublewords (vector). This instruction reverses the order of 8-bit, 16-bit, or 32-bit elements in each doubleword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vrev64S8 Vrev64S8
//go:noescape
func Vrev64S8(r *arm.Int8X8, v0 *arm.Int8X8)

// Reverse elements in 64-bit doublewords (vector). This instruction reverses the order of 8-bit, 16-bit, or 32-bit elements in each doubleword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vrev64S16 Vrev64S16
//go:noescape
func Vrev64S16(r *arm.Int16X4, v0 *arm.Int16X4)

// Reverse elements in 64-bit doublewords (vector). This instruction reverses the order of 8-bit, 16-bit, or 32-bit elements in each doubleword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vrev64S32 Vrev64S32
//go:noescape
func Vrev64S32(r *arm.Int32X2, v0 *arm.Int32X2)

// Reverse elements in 64-bit doublewords (vector). This instruction reverses the order of 8-bit, 16-bit, or 32-bit elements in each doubleword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vrev64U8 Vrev64U8
//go:noescape
func Vrev64U8(r *arm.Uint8X8, v0 *arm.Uint8X8)

// Reverse elements in 64-bit doublewords (vector). This instruction reverses the order of 8-bit, 16-bit, or 32-bit elements in each doubleword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vrev64U16 Vrev64U16
//go:noescape
func Vrev64U16(r *arm.Uint16X4, v0 *arm.Uint16X4)

// Reverse elements in 64-bit doublewords (vector). This instruction reverses the order of 8-bit, 16-bit, or 32-bit elements in each doubleword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vrev64U32 Vrev64U32
//go:noescape
func Vrev64U32(r *arm.Uint32X2, v0 *arm.Uint32X2)

// Reverse elements in 64-bit doublewords (vector). This instruction reverses the order of 8-bit, 16-bit, or 32-bit elements in each doubleword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vrev64F32 Vrev64F32
//go:noescape
func Vrev64F32(r *arm.Float32X2, v0 *arm.Float32X2)

// Reverse elements in 64-bit doublewords (vector). This instruction reverses the order of 8-bit, 16-bit, or 32-bit elements in each doubleword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vrev64P16 Vrev64P16
//go:noescape
func Vrev64P16(r *arm.Poly16X4, v0 *arm.Poly16X4)

// Reverse elements in 64-bit doublewords (vector). This instruction reverses the order of 8-bit, 16-bit, or 32-bit elements in each doubleword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vrev64P8 Vrev64P8
//go:noescape
func Vrev64P8(r *arm.Poly8X8, v0 *arm.Poly8X8)

// Reverse elements in 64-bit doublewords (vector). This instruction reverses the order of 8-bit, 16-bit, or 32-bit elements in each doubleword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vrev64QS8 Vrev64QS8
//go:noescape
func Vrev64QS8(r *arm.Int8X16, v0 *arm.Int8X16)

// Reverse elements in 64-bit doublewords (vector). This instruction reverses the order of 8-bit, 16-bit, or 32-bit elements in each doubleword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vrev64QS16 Vrev64QS16
//go:noescape
func Vrev64QS16(r *arm.Int16X8, v0 *arm.Int16X8)

// Reverse elements in 64-bit doublewords (vector). This instruction reverses the order of 8-bit, 16-bit, or 32-bit elements in each doubleword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vrev64QS32 Vrev64QS32
//go:noescape
func Vrev64QS32(r *arm.Int32X4, v0 *arm.Int32X4)

// Reverse elements in 64-bit doublewords (vector). This instruction reverses the order of 8-bit, 16-bit, or 32-bit elements in each doubleword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vrev64QU8 Vrev64QU8
//go:noescape
func Vrev64QU8(r *arm.Uint8X16, v0 *arm.Uint8X16)

// Reverse elements in 64-bit doublewords (vector). This instruction reverses the order of 8-bit, 16-bit, or 32-bit elements in each doubleword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vrev64QU16 Vrev64QU16
//go:noescape
func Vrev64QU16(r *arm.Uint16X8, v0 *arm.Uint16X8)

// Reverse elements in 64-bit doublewords (vector). This instruction reverses the order of 8-bit, 16-bit, or 32-bit elements in each doubleword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vrev64QU32 Vrev64QU32
//go:noescape
func Vrev64QU32(r *arm.Uint32X4, v0 *arm.Uint32X4)

// Reverse elements in 64-bit doublewords (vector). This instruction reverses the order of 8-bit, 16-bit, or 32-bit elements in each doubleword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vrev64QF32 Vrev64QF32
//go:noescape
func Vrev64QF32(r *arm.Float32X4, v0 *arm.Float32X4)

// Reverse elements in 64-bit doublewords (vector). This instruction reverses the order of 8-bit, 16-bit, or 32-bit elements in each doubleword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vrev64QP16 Vrev64QP16
//go:noescape
func Vrev64QP16(r *arm.Poly16X8, v0 *arm.Poly16X8)

// Reverse elements in 64-bit doublewords (vector). This instruction reverses the order of 8-bit, 16-bit, or 32-bit elements in each doubleword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vrev64QP8 Vrev64QP8
//go:noescape
func Vrev64QP8(r *arm.Poly8X16, v0 *arm.Poly8X16)
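
// Illustration only: the Vrev64 family applies the same idea per 64-bit
// doubleword, for 8-bit, 16-bit, or 32-bit elements. The sketch below is a
// hypothetical generic helper (reverseGroups, not part of this package, and
// it assumes Go 1.18+ for generics) that reverses elements inside fixed-size
// groups; a group of 4 uint16 values or 2 int32 values corresponds to one
// doubleword.

```go
package main

import "fmt"

// reverseGroups reverses the element order inside each consecutive run of
// `group` elements. len(in) must be a multiple of group, otherwise the
// function panics with an index out of range.
func reverseGroups[T any](in []T, group int) []T {
	out := make([]T, len(in))
	for i, v := range in {
		base := i - i%group // first index of the group containing i
		out[base+group-1-i%group] = v
	}
	return out
}

func main() {
	// Eight 16-bit lanes: two doublewords of four elements each.
	fmt.Println(reverseGroups([]uint16{0, 1, 2, 3, 4, 5, 6, 7}, 4)) // [3 2 1 0 7 6 5 4]
	// Four 32-bit lanes: two doublewords of two elements each.
	fmt.Println(reverseGroups([]int32{10, 20, 30, 40}, 2)) // [20 10 40 30]
}
```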

// Signed Rounding Halving Add. This instruction adds corresponding signed integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VrhaddS8 VrhaddS8
//go:noescape
func VrhaddS8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8)

// Signed Rounding Halving Add. This instruction adds corresponding signed integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VrhaddS16 VrhaddS16
//go:noescape
func VrhaddS16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4)

// Signed Rounding Halving Add. This instruction adds corresponding signed integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VrhaddS32 VrhaddS32
//go:noescape
func VrhaddS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2)

// Unsigned Rounding Halving Add. This instruction adds corresponding unsigned integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VrhaddU8 VrhaddU8
//go:noescape
func VrhaddU8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8)

// Unsigned Rounding Halving Add. This instruction adds corresponding unsigned integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VrhaddU16 VrhaddU16
//go:noescape
func VrhaddU16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4)

// Unsigned Rounding Halving Add. This instruction adds corresponding unsigned integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VrhaddU32 VrhaddU32
//go:noescape
func VrhaddU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2)

// Signed Rounding Halving Add. This instruction adds corresponding signed integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VrhaddqS8 VrhaddqS8
//go:noescape
func VrhaddqS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16)

// Signed Rounding Halving Add. This instruction adds corresponding signed integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VrhaddqS16 VrhaddqS16
//go:noescape
func VrhaddqS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8)

// Signed Rounding Halving Add. This instruction adds corresponding signed integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VrhaddqS32 VrhaddqS32
//go:noescape
func VrhaddqS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4)

// Unsigned Rounding Halving Add. This instruction adds corresponding unsigned integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VrhaddqU8 VrhaddqU8
//go:noescape
func VrhaddqU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16)

// Unsigned Rounding Halving Add. This instruction adds corresponding unsigned integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VrhaddqU16 VrhaddqU16
//go:noescape
func VrhaddqU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8)

// Unsigned Rounding Halving Add. This instruction adds corresponding unsigned integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VrhaddqU32 VrhaddqU32
//go:noescape
func VrhaddqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4)
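
// Illustration only: a scalar sketch of one lane of a rounding halving add,
// under the assumption that the operation computes (a + b + 1) >> 1 in a
// wider intermediate so the sum cannot overflow. The helper names rhaddU8
// and rhaddS8 are hypothetical and not part of this package.

```go
package main

import "fmt"

// rhaddU8 models one unsigned 8-bit lane: the sum is widened to 16 bits,
// 1 is added for rounding, and the result is halved.
func rhaddU8(a, b uint8) uint8 {
	return uint8((uint16(a) + uint16(b) + 1) >> 1)
}

// rhaddS8 models one signed 8-bit lane; >> on a signed Go integer is an
// arithmetic shift, so negative sums keep their sign.
func rhaddS8(a, b int8) int8 {
	return int8((int16(a) + int16(b) + 1) >> 1)
}

func main() {
	fmt.Println(rhaddU8(1, 2))     // 2
	fmt.Println(rhaddU8(255, 255)) // 255
	fmt.Println(rhaddS8(-3, -4))   // -3
	fmt.Println(rhaddS8(127, 127)) // 127
}
```

// The widened intermediate is the point of the sketch: a plain uint8
// addition of 255 and 255 would wrap before the halving step.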

// Floating-point Round to Integral, toward Zero (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VrndF32 VrndF32
//go:noescape
func VrndF32(r *arm.Float32X2, v0 *arm.Float32X2)

// Floating-point Round to Integral, toward Zero (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VrndF64 VrndF64
//go:noescape
func VrndF64(r *arm.Float64X1, v0 *arm.Float64X1)

// Floating-point Round to 32-bit Integer, using current rounding mode (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values that fit into a 32-bit integer size using the rounding mode that is determined by the FPCR, and writes the result to the SIMD&FP destination register.
//
//go:linkname Vrnd32XF32 Vrnd32XF32
//go:noescape
func Vrnd32XF32(r *arm.Float32X2, v0 *arm.Float32X2)

// Floating-point Round to 32-bit Integer, using current rounding mode (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values that fit into a 32-bit integer size using the rounding mode that is determined by the FPCR, and writes the result to the SIMD&FP destination register.
//
//go:linkname Vrnd32XF64 Vrnd32XF64
//go:noescape
func Vrnd32XF64(r *arm.Float64X1, v0 *arm.Float64X1)

// Floating-point Round to 32-bit Integer, using current rounding mode (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values that fit into a 32-bit integer size using the rounding mode that is determined by the FPCR, and writes the result to the SIMD&FP destination register.
//
//go:linkname Vrnd32XqF32 Vrnd32XqF32
//go:noescape
func Vrnd32XqF32(r *arm.Float32X4, v0 *arm.Float32X4)

// Floating-point Round to 32-bit Integer, using current rounding mode (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values that fit into a 32-bit integer size using the rounding mode that is determined by the FPCR, and writes the result to the SIMD&FP destination register.
//
//go:linkname Vrnd32XqF64 Vrnd32XqF64
//go:noescape
func Vrnd32XqF64(r *arm.Float64X2, v0 *arm.Float64X2)

// Floating-point Round to 32-bit Integer toward Zero (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values that fit into a 32-bit integer size using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname Vrnd32ZF32 Vrnd32ZF32
//go:noescape
func Vrnd32ZF32(r *arm.Float32X2, v0 *arm.Float32X2)

// Floating-point Round to 32-bit Integer toward Zero (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values that fit into a 32-bit integer size using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname Vrnd32ZF64 Vrnd32ZF64
//go:noescape
func Vrnd32ZF64(r *arm.Float64X1, v0 *arm.Float64X1)

// Floating-point Round to 32-bit Integer toward Zero (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values that fit into a 32-bit integer size using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname Vrnd32ZqF32 Vrnd32ZqF32
//go:noescape
func Vrnd32ZqF32(r *arm.Float32X4, v0 *arm.Float32X4)

// Floating-point Round to 32-bit Integer toward Zero (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values that fit into a 32-bit integer size using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname Vrnd32ZqF64 Vrnd32ZqF64
//go:noescape
func Vrnd32ZqF64(r *arm.Float64X2, v0 *arm.Float64X2)

// Floating-point Round to 64-bit Integer, using current rounding mode (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values that fit into a 64-bit integer size using the rounding mode that is determined by the FPCR, and writes the result to the SIMD&FP destination register.
//
//go:linkname Vrnd64XF32 Vrnd64XF32
//go:noescape
func Vrnd64XF32(r *arm.Float32X2, v0 *arm.Float32X2)

// Floating-point Round to 64-bit Integer, using current rounding mode (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values that fit into a 64-bit integer size using the rounding mode that is determined by the FPCR, and writes the result to the SIMD&FP destination register.
//
//go:linkname Vrnd64XF64 Vrnd64XF64
//go:noescape
func Vrnd64XF64(r *arm.Float64X1, v0 *arm.Float64X1)

// Floating-point Round to 64-bit Integer, using current rounding mode (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values that fit into a 64-bit integer size using the rounding mode that is determined by the FPCR, and writes the result to the SIMD&FP destination register.
//
//go:linkname Vrnd64XqF32 Vrnd64XqF32
//go:noescape
func Vrnd64XqF32(r *arm.Float32X4, v0 *arm.Float32X4)

// Floating-point Round to 64-bit Integer, using current rounding mode (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values that fit into a 64-bit integer size using the rounding mode that is determined by the FPCR, and writes the result to the SIMD&FP destination register.
//
//go:linkname Vrnd64XqF64 Vrnd64XqF64
//go:noescape
func Vrnd64XqF64(r *arm.Float64X2, v0 *arm.Float64X2)

// Floating-point Round to 64-bit Integer toward Zero (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values that fit into a 64-bit integer size using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname Vrnd64ZF32 Vrnd64ZF32
//go:noescape
func Vrnd64ZF32(r *arm.Float32X2, v0 *arm.Float32X2)

// Floating-point Round to 64-bit Integer toward Zero (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values that fit into a 64-bit integer size using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname Vrnd64ZF64 Vrnd64ZF64
//go:noescape
func Vrnd64ZF64(r *arm.Float64X1, v0 *arm.Float64X1)

// Floating-point Round to 64-bit Integer toward Zero (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values that fit into a 64-bit integer size using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname Vrnd64ZqF32 Vrnd64ZqF32
//go:noescape
func Vrnd64ZqF32(r *arm.Float32X4, v0 *arm.Float32X4)

// Floating-point Round to 64-bit Integer toward Zero (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values that fit into a 64-bit integer size using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname Vrnd64ZqF64 Vrnd64ZqF64
//go:noescape
func Vrnd64ZqF64(r *arm.Float64X2, v0 *arm.Float64X2)
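
// Illustration only: a scalar sketch of the "round toward zero to a value
// that fits a 32-bit integer" behaviour described for the Vrnd32Z variants.
// ASSUMPTION (not stated in the comments above): for NaN, infinite, or
// out-of-range inputs the result is taken to be the most negative 32-bit
// integer, expressed as a float. Check the Arm instruction description
// before relying on that detail. The helper name round32Z is hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// round32Z models one float64 lane. The out-of-range handling is an
// assumption labelled in the comment above, not a documented guarantee
// of this package.
func round32Z(x float64) float64 {
	const minInt32 = -2147483648.0
	const maxInt32 = 2147483647.0
	if math.IsNaN(x) || math.IsInf(x, 0) {
		return minInt32
	}
	t := math.Trunc(x) // round toward zero
	if t < minInt32 || t > maxInt32 {
		return minInt32
	}
	return t
}

func main() {
	for _, x := range []float64{2.9, -2.9, 3e9, math.NaN()} {
		fmt.Printf("%.0f\n", round32Z(x))
	}
}
```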

// Floating-point Round to Integral, to nearest with ties to Away (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round to Nearest with Ties to Away rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VrndaF32 VrndaF32
//go:noescape
func VrndaF32(r *arm.Float32X2, v0 *arm.Float32X2)

// Floating-point Round to Integral, to nearest with ties to Away (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round to Nearest with Ties to Away rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VrndaF64 VrndaF64
//go:noescape
func VrndaF64(r *arm.Float64X1, v0 *arm.Float64X1)

// Floating-point Round to Integral, to nearest with ties to Away (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round to Nearest with Ties to Away rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VrndaqF32 VrndaqF32
//go:noescape
func VrndaqF32(r *arm.Float32X4, v0 *arm.Float32X4)

// Floating-point Round to Integral, to nearest with ties to Away (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round to Nearest with Ties to Away rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VrndaqF64 VrndaqF64
//go:noescape
func VrndaqF64(r *arm.Float64X2, v0 *arm.Float64X2)
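
// Illustration only: Round to Nearest with Ties to Away is the behaviour of
// Go's math.Round, so a four-lane float32 reference can be sketched as
// below. The helper name rndaF32x4 is hypothetical and operates on a plain
// array, not on arm.Float32X4.

```go
package main

import (
	"fmt"
	"math"
)

// rndaF32x4 rounds each lane to the nearest integral value, with halfway
// cases rounded away from zero.
func rndaF32x4(v [4]float32) [4]float32 {
	var r [4]float32
	for i, x := range v {
		r[i] = float32(math.Round(float64(x)))
	}
	return r
}

func main() {
	fmt.Println(rndaF32x4([4]float32{0.5, 1.5, 2.5, -2.5})) // [1 2 3 -3]
}
```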

// Floating-point Round to Integral, using current rounding mode (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the rounding mode that is determined by the FPCR, and writes the result to the SIMD&FP destination register.
//
//go:linkname VrndiF32 VrndiF32
//go:noescape
func VrndiF32(r *arm.Float32X2, v0 *arm.Float32X2)

// Floating-point Round to Integral, using current rounding mode (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the rounding mode that is determined by the FPCR, and writes the result to the SIMD&FP destination register.
//
//go:linkname VrndiF64 VrndiF64
//go:noescape
func VrndiF64(r *arm.Float64X1, v0 *arm.Float64X1)

// Floating-point Round to Integral, using current rounding mode (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the rounding mode that is determined by the FPCR, and writes the result to the SIMD&FP destination register.
//
//go:linkname VrndiqF32 VrndiqF32
//go:noescape
func VrndiqF32(r *arm.Float32X4, v0 *arm.Float32X4)

// Floating-point Round to Integral, using current rounding mode (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the rounding mode that is determined by the FPCR, and writes the result to the SIMD&FP destination register.
//
//go:linkname VrndiqF64 VrndiqF64
//go:noescape
func VrndiqF64(r *arm.Float64X2, v0 *arm.Float64X2)

// Floating-point Round to Integral, toward Minus infinity (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round towards Minus Infinity rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VrndmF32 VrndmF32
//go:noescape
func VrndmF32(r *arm.Float32X2, v0 *arm.Float32X2)

// Floating-point Round to Integral, toward Minus infinity (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round towards Minus Infinity rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VrndmF64 VrndmF64
//go:noescape
func VrndmF64(r *arm.Float64X1, v0 *arm.Float64X1)

// Floating-point Round to Integral, toward Minus infinity (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round towards Minus Infinity rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VrndmqF32 VrndmqF32
//go:noescape
func VrndmqF32(r *arm.Float32X4, v0 *arm.Float32X4)

// Floating-point Round to Integral, toward Minus infinity (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round towards Minus Infinity rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VrndmqF64 VrndmqF64
//go:noescape
func VrndmqF64(r *arm.Float64X2, v0 *arm.Float64X2)

// Floating-point Round to Integral, to nearest with ties to even (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round to Nearest rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VrndnF32 VrndnF32
//go:noescape
func VrndnF32(r *arm.Float32X2, v0 *arm.Float32X2)

// Floating-point Round to Integral, to nearest with ties to even (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round to Nearest rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VrndnF64 VrndnF64
//go:noescape
func VrndnF64(r *arm.Float64X1, v0 *arm.Float64X1)

// Floating-point Round to Integral, to nearest with ties to even (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round to Nearest rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VrndnqF32 VrndnqF32
//go:noescape
func VrndnqF32(r *arm.Float32X4, v0 *arm.Float32X4)

// Floating-point Round to Integral, to nearest with ties to even (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round to Nearest rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VrndnqF64 VrndnqF64
//go:noescape
func VrndnqF64(r *arm.Float64X2, v0 *arm.Float64X2)

// Floating-point Round to Integral, to nearest with ties to even (scalar). This instruction rounds a floating-point value in the SIMD&FP source register to an integral floating-point value of the same size using the Round to Nearest rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VrndnsF32 VrndnsF32
//go:noescape
func VrndnsF32(r *arm.Float32, v0 *arm.Float32)
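
// Illustration only: ties-to-even rounding matches Go's math.RoundToEven.
// The sketch below uses a hypothetical helper (rndnF32x4, not part of this
// package) to show how halfway cases land on the even neighbour.

```go
package main

import (
	"fmt"
	"math"
)

// rndnF32x4 rounds each lane to the nearest integral value; halfway cases
// go to the neighbour whose integer value is even.
func rndnF32x4(v [4]float32) [4]float32 {
	var r [4]float32
	for i, x := range v {
		r[i] = float32(math.RoundToEven(float64(x)))
	}
	return r
}

func main() {
	fmt.Println(rndnF32x4([4]float32{0.5, 1.5, 2.5, -2.5})) // [0 2 2 -2]
}
```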

// Floating-point Round to Integral, toward Plus infinity (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round towards Plus Infinity rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VrndpF32 VrndpF32
//go:noescape
func VrndpF32(r *arm.Float32X2, v0 *arm.Float32X2)

// Floating-point Round to Integral, toward Plus infinity (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round towards Plus Infinity rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VrndpF64 VrndpF64
//go:noescape
func VrndpF64(r *arm.Float64X1, v0 *arm.Float64X1)

// Floating-point Round to Integral, toward Plus infinity (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round towards Plus Infinity rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VrndpqF32 VrndpqF32
//go:noescape
func VrndpqF32(r *arm.Float32X4, v0 *arm.Float32X4)

// Floating-point Round to Integral, toward Plus infinity (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round towards Plus Infinity rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VrndpqF64 VrndpqF64
//go:noescape
func VrndpqF64(r *arm.Float64X2, v0 *arm.Float64X2)

// Floating-point Round to Integral, toward Zero (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VrndqF32 VrndqF32
//go:noescape
func VrndqF32(r *arm.Float32X4, v0 *arm.Float32X4)

// Floating-point Round to Integral, toward Zero (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VrndqF64 VrndqF64
//go:noescape
func VrndqF64(r *arm.Float64X2, v0 *arm.Float64X2)

// Floating-point Round to Integral exact, using current rounding mode (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the rounding mode that is determined by the FPCR, and writes the result to the SIMD&FP destination register.
//
//go:linkname VrndxF32 VrndxF32
//go:noescape
func VrndxF32(r *arm.Float32X2, v0 *arm.Float32X2)

// Floating-point Round to Integral exact, using current rounding mode (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the rounding mode that is determined by the FPCR, and writes the result to the SIMD&FP destination register.
//
//go:linkname VrndxF64 VrndxF64
//go:noescape
func VrndxF64(r *arm.Float64X1, v0 *arm.Float64X1)

// Floating-point Round to Integral exact, using current rounding mode (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the rounding mode that is determined by the FPCR, and writes the result to the SIMD&FP destination register.
//
//go:linkname VrndxqF32 VrndxqF32
//go:noescape
func VrndxqF32(r *arm.Float32X4, v0 *arm.Float32X4)

// Floating-point Round to Integral exact, using current rounding mode (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the rounding mode that is determined by the FPCR, and writes the result to the SIMD&FP destination register.
//
//go:linkname VrndxqF64 VrndxqF64
//go:noescape
func VrndxqF64(r *arm.Float64X2, v0 *arm.Float64X2)

// Signed Rounding Shift Left (register). This instruction takes each signed integer value in the vector of the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VrshlS8 VrshlS8
//go:noescape
func VrshlS8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8)

// Signed Rounding Shift Left (register). This instruction takes each signed integer value in the vector of the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VrshlS16 VrshlS16
//go:noescape
func VrshlS16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4)

// Signed Rounding Shift Left (register). This instruction takes each signed integer value in the vector of the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VrshlS32 VrshlS32
//go:noescape
func VrshlS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2)

// Signed Rounding Shift Left (register). This instruction takes each signed integer value in the vector of the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. If the shift value is positive, the operation is a left shift; if it is negative, it is a rounding right shift.
//
//go:linkname VrshlS64 VrshlS64
//go:noescape
func VrshlS64(r *arm.Int64X1, v0 *arm.Int64X1, v1 *arm.Int64X1)

// Unsigned Rounding Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts the vector element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. If the shift value is positive, the operation is a left shift; if it is negative, it is a rounding right shift.
//
//go:linkname VrshlU8 VrshlU8
//go:noescape
func VrshlU8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Int8X8)

// Unsigned Rounding Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts the vector element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. If the shift value is positive, the operation is a left shift; if it is negative, it is a rounding right shift.
//
//go:linkname VrshlU16 VrshlU16
//go:noescape
func VrshlU16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Int16X4)

// Unsigned Rounding Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts the vector element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. If the shift value is positive, the operation is a left shift; if it is negative, it is a rounding right shift.
//
//go:linkname VrshlU32 VrshlU32
//go:noescape
func VrshlU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Int32X2)

// Unsigned Rounding Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts the vector element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. If the shift value is positive, the operation is a left shift; if it is negative, it is a rounding right shift.
//
//go:linkname VrshlU64 VrshlU64
//go:noescape
func VrshlU64(r *arm.Uint64X1, v0 *arm.Uint64X1, v1 *arm.Int64X1)

// Signed Rounding Shift Left (register). This instruction takes each signed integer value in the vector of the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. If the shift value is positive, the operation is a left shift; if it is negative, it is a rounding right shift.
//
//go:linkname VrshldS64 VrshldS64
//go:noescape
func VrshldS64(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64)

// Unsigned Rounding Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts the vector element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. If the shift value is positive, the operation is a left shift; if it is negative, it is a rounding right shift.
//
//go:linkname VrshldU64 VrshldU64
//go:noescape
func VrshldU64(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Int64)

// Signed Rounding Shift Left (register). This instruction takes each signed integer value in the vector of the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. If the shift value is positive, the operation is a left shift; if it is negative, it is a rounding right shift.
//
//go:linkname VrshlqS8 VrshlqS8
//go:noescape
func VrshlqS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16)

// Signed Rounding Shift Left (register). This instruction takes each signed integer value in the vector of the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. If the shift value is positive, the operation is a left shift; if it is negative, it is a rounding right shift.
//
//go:linkname VrshlqS16 VrshlqS16
//go:noescape
func VrshlqS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8)

// Signed Rounding Shift Left (register). This instruction takes each signed integer value in the vector of the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. If the shift value is positive, the operation is a left shift; if it is negative, it is a rounding right shift.
//
//go:linkname VrshlqS32 VrshlqS32
//go:noescape
func VrshlqS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4)

// Signed Rounding Shift Left (register). This instruction takes each signed integer value in the vector of the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. If the shift value is positive, the operation is a left shift; if it is negative, it is a rounding right shift.
//
//go:linkname VrshlqS64 VrshlqS64
//go:noescape
func VrshlqS64(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int64X2)

// Unsigned Rounding Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts the vector element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. If the shift value is positive, the operation is a left shift; if it is negative, it is a rounding right shift.
//
//go:linkname VrshlqU8 VrshlqU8
//go:noescape
func VrshlqU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Int8X16)

// Unsigned Rounding Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts the vector element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. If the shift value is positive, the operation is a left shift; if it is negative, it is a rounding right shift.
//
//go:linkname VrshlqU16 VrshlqU16
//go:noescape
func VrshlqU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Int16X8)

// Unsigned Rounding Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts the vector element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. If the shift value is positive, the operation is a left shift; if it is negative, it is a rounding right shift.
//
//go:linkname VrshlqU32 VrshlqU32
//go:noescape
func VrshlqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Int32X4)

// Unsigned Rounding Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts the vector element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. If the shift value is positive, the operation is a left shift; if it is negative, it is a rounding right shift.
//
//go:linkname VrshlqU64 VrshlqU64
//go:noescape
func VrshlqU64(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Int64X2)

// Unsigned Reciprocal Square Root Estimate. This instruction reads each vector element from the source SIMD&FP register, calculates an approximate inverse square root for each value, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are unsigned integer values.
//
//go:linkname VrsqrteU32 VrsqrteU32
//go:noescape
func VrsqrteU32(r *arm.Uint32X2, v0 *arm.Uint32X2)

// Floating-point Reciprocal Square Root Estimate. This instruction calculates an approximate reciprocal square root for each vector element in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VrsqrteF32 VrsqrteF32
//go:noescape
func VrsqrteF32(r *arm.Float32X2, v0 *arm.Float32X2)

// Floating-point Reciprocal Square Root Estimate. This instruction calculates an approximate reciprocal square root for each vector element in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VrsqrteF64 VrsqrteF64
//go:noescape
func VrsqrteF64(r *arm.Float64X1, v0 *arm.Float64X1)

// Floating-point Reciprocal Square Root Estimate. This instruction calculates an approximate reciprocal square root for each vector element in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VrsqrtedF64 VrsqrtedF64
//go:noescape
func VrsqrtedF64(r *arm.Float64, v0 *arm.Float64)

// Unsigned Reciprocal Square Root Estimate. This instruction reads each vector element from the source SIMD&FP register, calculates an approximate inverse square root for each value, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are unsigned integer values.
//
//go:linkname VrsqrteqU32 VrsqrteqU32
//go:noescape
func VrsqrteqU32(r *arm.Uint32X4, v0 *arm.Uint32X4)

// Floating-point Reciprocal Square Root Estimate. This instruction calculates an approximate reciprocal square root for each vector element in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VrsqrteqF32 VrsqrteqF32
//go:noescape
func VrsqrteqF32(r *arm.Float32X4, v0 *arm.Float32X4)

// Floating-point Reciprocal Square Root Estimate. This instruction calculates an approximate reciprocal square root for each vector element in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VrsqrteqF64 VrsqrteqF64
//go:noescape
func VrsqrteqF64(r *arm.Float64X2, v0 *arm.Float64X2)

// Floating-point Reciprocal Square Root Estimate. This instruction calculates an approximate reciprocal square root for each vector element in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VrsqrtesF32 VrsqrtesF32
//go:noescape
func VrsqrtesF32(r *arm.Float32, v0 *arm.Float32)

// Floating-point Reciprocal Square Root Step. This instruction multiplies corresponding floating-point values in the vectors of the two source SIMD&FP registers, subtracts each of the products from 3.0, divides these results by 2.0, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VrsqrtsF32 VrsqrtsF32
//go:noescape
func VrsqrtsF32(r *arm.Float32X2, v0 *arm.Float32X2, v1 *arm.Float32X2)

// Floating-point Reciprocal Square Root Step. This instruction multiplies corresponding floating-point values in the vectors of the two source SIMD&FP registers, subtracts each of the products from 3.0, divides these results by 2.0, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VrsqrtsF64 VrsqrtsF64
//go:noescape
func VrsqrtsF64(r *arm.Float64X1, v0 *arm.Float64X1, v1 *arm.Float64X1)

// Floating-point Reciprocal Square Root Step. This instruction multiplies corresponding floating-point values in the vectors of the two source SIMD&FP registers, subtracts each of the products from 3.0, divides these results by 2.0, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VrsqrtsdF64 VrsqrtsdF64
//go:noescape
func VrsqrtsdF64(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64)

// Floating-point Reciprocal Square Root Step. This instruction multiplies corresponding floating-point values in the vectors of the two source SIMD&FP registers, subtracts each of the products from 3.0, divides these results by 2.0, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VrsqrtsqF32 VrsqrtsqF32
//go:noescape
func VrsqrtsqF32(r *arm.Float32X4, v0 *arm.Float32X4, v1 *arm.Float32X4)

// Floating-point Reciprocal Square Root Step. This instruction multiplies corresponding floating-point values in the vectors of the two source SIMD&FP registers, subtracts each of the products from 3.0, divides these results by 2.0, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VrsqrtsqF64 VrsqrtsqF64
//go:noescape
func VrsqrtsqF64(r *arm.Float64X2, v0 *arm.Float64X2, v1 *arm.Float64X2)

// Floating-point Reciprocal Square Root Step. This instruction multiplies corresponding floating-point values in the vectors of the two source SIMD&FP registers, subtracts each of the products from 3.0, divides these results by 2.0, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VrsqrtssF32 VrsqrtssF32
//go:noescape
func VrsqrtssF32(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32)

// Rounding Subtract returning High Narrow. This instruction subtracts each vector element of the second source SIMD&FP register from the corresponding vector element of the first source SIMD&FP register, places the most significant half of each rounded result into a vector, and writes the vector to the lower half of the destination SIMD&FP register.
//
//go:linkname VrsubhnS16 VrsubhnS16
//go:noescape
func VrsubhnS16(r *arm.Int8X8, v0 *arm.Int16X8, v1 *arm.Int16X8)

// Rounding Subtract returning High Narrow. This instruction subtracts each vector element of the second source SIMD&FP register from the corresponding vector element of the first source SIMD&FP register, places the most significant half of each rounded result into a vector, and writes the vector to the lower half of the destination SIMD&FP register.
//
//go:linkname VrsubhnS32 VrsubhnS32
//go:noescape
func VrsubhnS32(r *arm.Int16X4, v0 *arm.Int32X4, v1 *arm.Int32X4)

// Rounding Subtract returning High Narrow. This instruction subtracts each vector element of the second source SIMD&FP register from the corresponding vector element of the first source SIMD&FP register, places the most significant half of each rounded result into a vector, and writes the vector to the lower half of the destination SIMD&FP register.
//
//go:linkname VrsubhnS64 VrsubhnS64
//go:noescape
func VrsubhnS64(r *arm.Int32X2, v0 *arm.Int64X2, v1 *arm.Int64X2)

// Rounding Subtract returning High Narrow. This instruction subtracts each vector element of the second source SIMD&FP register from the corresponding vector element of the first source SIMD&FP register, places the most significant half of each rounded result into a vector, and writes the vector to the lower half of the destination SIMD&FP register.
//
//go:linkname VrsubhnU16 VrsubhnU16
//go:noescape
func VrsubhnU16(r *arm.Uint8X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8)

// Rounding Subtract returning High Narrow. This instruction subtracts each vector element of the second source SIMD&FP register from the corresponding vector element of the first source SIMD&FP register, places the most significant half of each rounded result into a vector, and writes the vector to the lower half of the destination SIMD&FP register.
//
//go:linkname VrsubhnU32 VrsubhnU32
//go:noescape
func VrsubhnU32(r *arm.Uint16X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4)

// Rounding Subtract returning High Narrow. This instruction subtracts each vector element of the second source SIMD&FP register from the corresponding vector element of the first source SIMD&FP register, places the most significant half of each rounded result into a vector, and writes the vector to the lower half of the destination SIMD&FP register.
//
//go:linkname VrsubhnU64 VrsubhnU64
//go:noescape
func VrsubhnU64(r *arm.Uint32X2, v0 *arm.Uint64X2, v1 *arm.Uint64X2)

// Rounding Subtract returning High Narrow. This instruction subtracts each vector element of the second source SIMD&FP register from the corresponding vector element of the first source SIMD&FP register, places the most significant half of each rounded result into a vector, and writes the vector to the upper half of the destination SIMD&FP register. The lower half of the result is taken from v0, and v2 is subtracted from v1.
//
//go:linkname VrsubhnHighS16 VrsubhnHighS16
//go:noescape
func VrsubhnHighS16(r *arm.Int8X16, v0 *arm.Int8X8, v1 *arm.Int16X8, v2 *arm.Int16X8)

// Rounding Subtract returning High Narrow. This instruction subtracts each vector element of the second source SIMD&FP register from the corresponding vector element of the first source SIMD&FP register, places the most significant half of each rounded result into a vector, and writes the vector to the upper half of the destination SIMD&FP register. The lower half of the result is taken from v0, and v2 is subtracted from v1.
//
//go:linkname VrsubhnHighS32 VrsubhnHighS32
//go:noescape
func VrsubhnHighS32(r *arm.Int16X8, v0 *arm.Int16X4, v1 *arm.Int32X4, v2 *arm.Int32X4)

// Rounding Subtract returning High Narrow. This instruction subtracts each vector element of the second source SIMD&FP register from the corresponding vector element of the first source SIMD&FP register, places the most significant half of each rounded result into a vector, and writes the vector to the upper half of the destination SIMD&FP register. The lower half of the result is taken from v0, and v2 is subtracted from v1.
//
//go:linkname VrsubhnHighS64 VrsubhnHighS64
//go:noescape
func VrsubhnHighS64(r *arm.Int32X4, v0 *arm.Int32X2, v1 *arm.Int64X2, v2 *arm.Int64X2)

// Rounding Subtract returning High Narrow. This instruction subtracts each vector element of the second source SIMD&FP register from the corresponding vector element of the first source SIMD&FP register, places the most significant half of each rounded result into a vector, and writes the vector to the upper half of the destination SIMD&FP register. The lower half of the result is taken from v0, and v2 is subtracted from v1.
//
//go:linkname VrsubhnHighU16 VrsubhnHighU16
//go:noescape
func VrsubhnHighU16(r *arm.Uint8X16, v0 *arm.Uint8X8, v1 *arm.Uint16X8, v2 *arm.Uint16X8)

// Rounding Subtract returning High Narrow. This instruction subtracts each vector element of the second source SIMD&FP register from the corresponding vector element of the first source SIMD&FP register, places the most significant half of each rounded result into a vector, and writes the vector to the upper half of the destination SIMD&FP register. The lower half of the result is taken from v0, and v2 is subtracted from v1.
//
//go:linkname VrsubhnHighU32 VrsubhnHighU32
//go:noescape
func VrsubhnHighU32(r *arm.Uint16X8, v0 *arm.Uint16X4, v1 *arm.Uint32X4, v2 *arm.Uint32X4)

// Rounding Subtract returning High Narrow. This instruction subtracts each vector element of the second source SIMD&FP register from the corresponding vector element of the first source SIMD&FP register, places the most significant half of each rounded result into a vector, and writes the vector to the upper half of the destination SIMD&FP register. The lower half of the result is taken from v0, and v2 is subtracted from v1.
//
//go:linkname VrsubhnHighU64 VrsubhnHighU64
//go:noescape
func VrsubhnHighU64(r *arm.Uint32X4, v0 *arm.Uint32X2, v1 *arm.Uint64X2, v2 *arm.Uint64X2)

// SHA1 hash update (choose).
//
//go:linkname Vsha1CqU32 Vsha1CqU32
//go:noescape
func Vsha1CqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32, v2 *arm.Uint32X4)

// SHA1 fixed rotate.
//
//go:linkname Vsha1HU32 Vsha1HU32
//go:noescape
func Vsha1HU32(r *arm.Uint32, v0 *arm.Uint32)

// SHA1 hash update (majority).
//
//go:linkname Vsha1MqU32 Vsha1MqU32
//go:noescape
func Vsha1MqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32, v2 *arm.Uint32X4)

// SHA1 hash update (parity).
//
//go:linkname Vsha1PqU32 Vsha1PqU32
//go:noescape
func Vsha1PqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32, v2 *arm.Uint32X4)

// SHA1 schedule update 0.
//
//go:linkname Vsha1Su0QU32 Vsha1Su0QU32
//go:noescape
func Vsha1Su0QU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4, v2 *arm.Uint32X4)

// SHA1 schedule update 1.
//
//go:linkname Vsha1Su1QU32 Vsha1Su1QU32
//go:noescape
func Vsha1Su1QU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4)

// SHA256 hash update (part 2).
//
//go:linkname Vsha256H2QU32 Vsha256H2QU32
//go:noescape
func Vsha256H2QU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4, v2 *arm.Uint32X4)

// SHA256 hash update (part 1).
//
//go:linkname Vsha256HqU32 Vsha256HqU32
//go:noescape
func Vsha256HqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4, v2 *arm.Uint32X4)

// SHA256 schedule update 0.
//
//go:linkname Vsha256Su0QU32 Vsha256Su0QU32
//go:noescape
func Vsha256Su0QU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4)

// SHA256 schedule update 1.
//
//go:linkname Vsha256Su1QU32 Vsha256Su1QU32
//go:noescape
func Vsha256Su1QU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4, v2 *arm.Uint32X4)

// SHA512 Hash update part 2 takes the values from the three 128-bit source SIMD&FP registers and produces a 128-bit output value that combines the sigma0 and majority functions of two iterations of the SHA512 computation. It returns this value to the destination SIMD&FP register.
//
//go:linkname Vsha512H2QU64 Vsha512H2QU64
//go:noescape
func Vsha512H2QU64(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint64X2, v2 *arm.Uint64X2)

// SHA512 Hash update part 1 takes the values from the three 128-bit source SIMD&FP registers and produces a 128-bit output value that combines the sigma1 and chi functions of two iterations of the SHA512 computation. It returns this value to the destination SIMD&FP register.
//
//go:linkname Vsha512HqU64 Vsha512HqU64
//go:noescape
func Vsha512HqU64(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint64X2, v2 *arm.Uint64X2)

// SHA512 Schedule Update 0 takes the values from the two 128-bit source SIMD&FP registers and produces a 128-bit output value that combines the gamma0 functions of two iterations of the SHA512 schedule update that are performed after the first 16 iterations within a block. It returns this value to the destination SIMD&FP register.
//
//go:linkname Vsha512Su0QU64 Vsha512Su0QU64
//go:noescape
func Vsha512Su0QU64(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint64X2)

// SHA512 Schedule Update 1 takes the values from the three source SIMD&FP registers and produces a 128-bit output value that combines the gamma1 functions of two iterations of the SHA512 schedule update that are performed after the first 16 iterations within a block. It returns this value to the destination SIMD&FP register.
//
//go:linkname Vsha512Su1QU64 Vsha512Su1QU64
//go:noescape
func Vsha512Su1QU64(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint64X2, v2 *arm.Uint64X2)

// Signed Shift Left (register). This instruction takes each signed integer value in the vector of the first source SIMD&FP register, shifts each value by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. If the shift value is positive, the operation is a left shift; if it is negative, it is a truncating right shift.
//
//go:linkname VshlS8 VshlS8
//go:noescape
func VshlS8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8)

// Signed Shift Left (register). This instruction takes each signed integer value in the vector of the first source SIMD&FP register, shifts each value by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. If the shift value is positive, the operation is a left shift; if it is negative, it is a truncating right shift.
//
//go:linkname VshlS16 VshlS16
//go:noescape
func VshlS16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4)

// Signed Shift Left (register). This instruction takes each signed integer value in the vector of the first source SIMD&FP register, shifts each value by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. If the shift value is positive, the operation is a left shift; if it is negative, it is a truncating right shift.
//
//go:linkname VshlS32 VshlS32
//go:noescape
func VshlS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2)

// Signed Shift Left (register). This instruction takes each signed integer value in the vector of the first source SIMD&FP register, shifts each value by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. If the shift value is positive, the operation is a left shift; if it is negative, it is a truncating right shift.
//
//go:linkname VshlS64 VshlS64
//go:noescape
func VshlS64(r *arm.Int64X1, v0 *arm.Int64X1, v1 *arm.Int64X1)

// Unsigned Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts each element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. If the shift value is positive, the operation is a left shift; if it is negative, it is a truncating right shift.
//
//go:linkname VshlU8 VshlU8
//go:noescape
func VshlU8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Int8X8)

// Unsigned Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts each element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. If the shift value is positive, the operation is a left shift; if it is negative, it is a truncating right shift.
//
//go:linkname VshlU16 VshlU16
//go:noescape
func VshlU16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Int16X4)

// Unsigned Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts each element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. If the shift value is positive, the operation is a left shift; if it is negative, it is a truncating right shift.
//
//go:linkname VshlU32 VshlU32
//go:noescape
func VshlU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Int32X2)

// Unsigned Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts each element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. If the shift value is positive, the operation is a left shift; if it is negative, it is a truncating right shift.
//
//go:linkname VshlU64 VshlU64
//go:noescape
func VshlU64(r *arm.Uint64X1, v0 *arm.Uint64X1, v1 *arm.Int64X1)

// Signed Shift Left (register). This instruction takes each signed integer value in the vector of the first source SIMD&FP register, shifts each value by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. If the shift value is positive, the operation is a left shift; if it is negative, it is a truncating right shift.
//
//go:linkname VshldS64 VshldS64
//go:noescape
func VshldS64(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64)

// Unsigned Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts each element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. If the shift value is positive, the operation is a left shift; if it is negative, it is a truncating right shift.
//
//go:linkname VshldU64 VshldU64
//go:noescape
func VshldU64(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Int64)

// Signed Shift Left (register). This instruction takes each signed integer value in the vector of the first source SIMD&FP register, shifts each value by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. If the shift value is positive, the operation is a left shift; if it is negative, it is a truncating right shift.
//
//go:linkname VshlqS8 VshlqS8
//go:noescape
func VshlqS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16)

// Signed Shift Left (register). This instruction takes each signed integer value in the vector of the first source SIMD&FP register, shifts each value by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. If the shift value is positive, the operation is a left shift; if it is negative, it is a truncating right shift.
//
//go:linkname VshlqS16 VshlqS16
//go:noescape
func VshlqS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8)

// Signed Shift Left (register). This instruction takes each signed integer value in the vector of the first source SIMD&FP register, shifts each value by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. If the shift value is positive, the operation is a left shift; if it is negative, it is a truncating right shift.
//
//go:linkname VshlqS32 VshlqS32
//go:noescape
func VshlqS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4)

// Signed Shift Left (register). This instruction takes each signed integer value in the vector of the first source SIMD&FP register, shifts each value by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. If the shift value is positive, the operation is a left shift; if it is negative, it is a truncating right shift.
//
//go:linkname VshlqS64 VshlqS64
//go:noescape
func VshlqS64(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int64X2)

// Unsigned Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts each element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. If the shift value is positive, the operation is a left shift; if it is negative, it is a truncating right shift.
//
//go:linkname VshlqU8 VshlqU8
//go:noescape
func VshlqU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Int8X16)

// Unsigned Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts each element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. If the shift value is positive, the operation is a left shift; if it is negative, it is a truncating right shift.
//
//go:linkname VshlqU16 VshlqU16
//go:noescape
func VshlqU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Int16X8)

// Unsigned Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts each element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. If the shift value is positive, the operation is a left shift; if it is negative, it is a truncating right shift.
//
//go:linkname VshlqU32 VshlqU32
//go:noescape
func VshlqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Int32X4)

// Unsigned Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts each element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. If the shift value is positive, the operation is a left shift; if it is negative, it is a truncating right shift.
//
//go:linkname VshlqU64 VshlqU64
//go:noescape
func VshlqU64(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Int64X2)

// SM3PARTW1 takes three 128-bit vectors from the three source SIMD&FP registers and returns a 128-bit result in the destination SIMD&FP register. The result is obtained by a three-way exclusive OR of the elements within the input vectors with some fixed rotations; see the Operation pseudocode in the Arm instruction documentation for details.
//
//go:linkname Vsm3Partw1QU32 Vsm3Partw1QU32
//go:noescape
func Vsm3Partw1QU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4, v2 *arm.Uint32X4)

// SM3PARTW2 takes three 128-bit vectors from the three source SIMD&FP registers and returns a 128-bit result in the destination SIMD&FP register. The result is obtained by a three-way exclusive OR of the elements within the input vectors with some fixed rotations; see the Operation pseudocode in the Arm instruction documentation for details.
//
//go:linkname Vsm3Partw2QU32 Vsm3Partw2QU32
//go:noescape
func Vsm3Partw2QU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4, v2 *arm.Uint32X4)

// SM3SS1 rotates the top 32 bits of the 128-bit vector in the first source SIMD&FP register by 12, and adds that 32-bit value to the two other 32-bit values held in the top 32 bits of each of the 128-bit vectors in the second and third source SIMD&FP registers, rotating this result left by 7 and writing the final result into the top 32 bits of the vector in the destination SIMD&FP register, with the bottom 96 bits of the vector being written to 0.
//
//go:linkname Vsm3Ss1QU32 Vsm3Ss1QU32
//go:noescape
func Vsm3Ss1QU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4, v2 *arm.Uint32X4)

// SM4 Key takes an input as a 128-bit vector from the first source SIMD&FP register and a 128-bit constant from the second SIMD&FP register. It derives four iterations of the output key, in accordance with the SM4 standard, returning the 128-bit result to the destination SIMD&FP register.
//
//go:linkname Vsm4EkeyqU32 Vsm4EkeyqU32
//go:noescape
func Vsm4EkeyqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4)

// SM4 Encode takes input data as a 128-bit vector from the first source SIMD&FP register, and four iterations of the round key held as the elements of the 128-bit vector in the second source SIMD&FP register. It encrypts the data by four rounds, in accordance with the SM4 standard, returning the 128-bit result to the destination SIMD&FP register.
//
//go:linkname Vsm4EqU32 Vsm4EqU32
//go:noescape
func Vsm4EqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4)

// Unsigned saturating Accumulate of Signed value. This instruction adds the signed integer values of the vector elements in the source SIMD&FP register to corresponding unsigned integer values of the vector elements in the destination SIMD&FP register, and accumulates the resulting unsigned integer values with the vector elements of the destination SIMD&FP register.
//
//go:linkname VsqaddU8 VsqaddU8
//go:noescape
func VsqaddU8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Int8X8)

// Unsigned saturating Accumulate of Signed value. This instruction adds the signed integer values of the vector elements in the source SIMD&FP register to corresponding unsigned integer values of the vector elements in the destination SIMD&FP register, and accumulates the resulting unsigned integer values with the vector elements of the destination SIMD&FP register.
//
//go:linkname VsqaddU16 VsqaddU16
//go:noescape
func VsqaddU16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Int16X4)

// Unsigned saturating Accumulate of Signed value. This instruction adds the signed integer values of the vector elements in the source SIMD&FP register to corresponding unsigned integer values of the vector elements in the destination SIMD&FP register, and accumulates the resulting unsigned integer values with the vector elements of the destination SIMD&FP register.
//
//go:linkname VsqaddU32 VsqaddU32
//go:noescape
func VsqaddU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Int32X2)

// Unsigned saturating Accumulate of Signed value. This instruction adds the signed integer values of the vector elements in the source SIMD&FP register to corresponding unsigned integer values of the vector elements in the destination SIMD&FP register, and accumulates the resulting unsigned integer values with the vector elements of the destination SIMD&FP register.
//
//go:linkname VsqaddU64 VsqaddU64
//go:noescape
func VsqaddU64(r *arm.Uint64X1, v0 *arm.Uint64X1, v1 *arm.Int64X1)

// Unsigned saturating Accumulate of Signed value. This instruction adds the signed integer values of the vector elements in the source SIMD&FP register to corresponding unsigned integer values of the vector elements in the destination SIMD&FP register, and accumulates the resulting unsigned integer values with the vector elements of the destination SIMD&FP register.
//
//go:linkname VsqaddbU8 VsqaddbU8
//go:noescape
func VsqaddbU8(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Int8)

// Unsigned saturating Accumulate of Signed value. This instruction adds the signed integer values of the vector elements in the source SIMD&FP register to corresponding unsigned integer values of the vector elements in the destination SIMD&FP register, and accumulates the resulting unsigned integer values with the vector elements of the destination SIMD&FP register.
//
//go:linkname VsqadddU64 VsqadddU64
//go:noescape
func VsqadddU64(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Int64)

// Unsigned saturating Accumulate of Signed value. This instruction adds the signed integer values of the vector elements in the source SIMD&FP register to corresponding unsigned integer values of the vector elements in the destination SIMD&FP register, and accumulates the resulting unsigned integer values with the vector elements of the destination SIMD&FP register.
//
//go:linkname VsqaddhU16 VsqaddhU16
//go:noescape
func VsqaddhU16(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Int16)

// Unsigned saturating Accumulate of Signed value. This instruction adds the signed integer values of the vector elements in the source SIMD&FP register to corresponding unsigned integer values of the vector elements in the destination SIMD&FP register, and accumulates the resulting unsigned integer values with the vector elements of the destination SIMD&FP register.
//
//go:linkname VsqaddqU8 VsqaddqU8
//go:noescape
func VsqaddqU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Int8X16)

// Unsigned saturating Accumulate of Signed value. This instruction adds the signed integer values of the vector elements in the source SIMD&FP register to corresponding unsigned integer values of the vector elements in the destination SIMD&FP register, and accumulates the resulting unsigned integer values with the vector elements of the destination SIMD&FP register.
//
//go:linkname VsqaddqU16 VsqaddqU16
//go:noescape
func VsqaddqU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Int16X8)

// Unsigned saturating Accumulate of Signed value. This instruction adds the signed integer values of the vector elements in the source SIMD&FP register to corresponding unsigned integer values of the vector elements in the destination SIMD&FP register, and accumulates the resulting unsigned integer values with the vector elements of the destination SIMD&FP register.
//
//go:linkname VsqaddqU32 VsqaddqU32
//go:noescape
func VsqaddqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Int32X4)

// Unsigned saturating Accumulate of Signed value. This instruction adds the signed integer values of the vector elements in the source SIMD&FP register to corresponding unsigned integer values of the vector elements in the destination SIMD&FP register, and accumulates the resulting unsigned integer values with the vector elements of the destination SIMD&FP register.
//
//go:linkname VsqaddqU64 VsqaddqU64
//go:noescape
func VsqaddqU64(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Int64X2)

// Unsigned saturating Accumulate of Signed value. This instruction adds the signed integer values of the vector elements in the source SIMD&FP register to corresponding unsigned integer values of the vector elements in the destination SIMD&FP register, and accumulates the resulting unsigned integer values with the vector elements of the destination SIMD&FP register.
//
//go:linkname VsqaddsU32 VsqaddsU32
//go:noescape
func VsqaddsU32(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Int32)

// Floating-point Square Root (vector). This instruction calculates the square root for each vector element in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VsqrtF32 VsqrtF32
//go:noescape
func VsqrtF32(r *arm.Float32X2, v0 *arm.Float32X2)

// Floating-point Square Root (vector). This instruction calculates the square root for each vector element in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VsqrtF64 VsqrtF64
//go:noescape
func VsqrtF64(r *arm.Float64X1, v0 *arm.Float64X1)

// Floating-point Square Root (vector). This instruction calculates the square root for each vector element in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VsqrtqF32 VsqrtqF32
//go:noescape
func VsqrtqF32(r *arm.Float32X4, v0 *arm.Float32X4)

// Floating-point Square Root (vector). This instruction calculates the square root for each vector element in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VsqrtqF64 VsqrtqF64
//go:noescape
func VsqrtqF64(r *arm.Float64X2, v0 *arm.Float64X2)

// Subtract (vector). This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VsubS8 VsubS8
//go:noescape
func VsubS8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8)

// Subtract (vector). This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VsubS16 VsubS16
//go:noescape
func VsubS16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4)

// Subtract (vector). This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VsubS32 VsubS32
//go:noescape
func VsubS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2)

// Subtract (vector). This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VsubS64 VsubS64
//go:noescape
func VsubS64(r *arm.Int64X1, v0 *arm.Int64X1, v1 *arm.Int64X1)

// Subtract (vector). This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VsubU8 VsubU8
//go:noescape
func VsubU8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8)

// Subtract (vector). This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VsubU16 VsubU16
//go:noescape
func VsubU16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4)

// Subtract (vector). This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VsubU32 VsubU32
//go:noescape
func VsubU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2)

// Subtract (vector). This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VsubU64 VsubU64
//go:noescape
func VsubU64(r *arm.Uint64X1, v0 *arm.Uint64X1, v1 *arm.Uint64X1)

// Floating-point Subtract (vector). This instruction subtracts the elements in the vector in the second source SIMD&FP register, from the corresponding elements in the vector in the first source SIMD&FP register, places each result into elements of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VsubF32 VsubF32
//go:noescape
func VsubF32(r *arm.Float32X2, v0 *arm.Float32X2, v1 *arm.Float32X2)

// Floating-point Subtract (vector). This instruction subtracts the elements in the vector in the second source SIMD&FP register, from the corresponding elements in the vector in the first source SIMD&FP register, places each result into elements of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VsubF64 VsubF64
//go:noescape
func VsubF64(r *arm.Float64X1, v0 *arm.Float64X1, v1 *arm.Float64X1)

// Subtract (vector). This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VsubdS64 VsubdS64
//go:noescape
func VsubdS64(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64)

// Subtract (vector). This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VsubdU64 VsubdU64
//go:noescape
func VsubdU64(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64)

// Subtract returning High Narrow. This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. All the values in this instruction are signed integer values.
//
//go:linkname VsubhnS16 VsubhnS16
//go:noescape
func VsubhnS16(r *arm.Int8X8, v0 *arm.Int16X8, v1 *arm.Int16X8)

// Subtract returning High Narrow. This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. All the values in this instruction are signed integer values.
//
//go:linkname VsubhnS32 VsubhnS32
//go:noescape
func VsubhnS32(r *arm.Int16X4, v0 *arm.Int32X4, v1 *arm.Int32X4)

// Subtract returning High Narrow. This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. All the values in this instruction are signed integer values.
//
//go:linkname VsubhnS64 VsubhnS64
//go:noescape
func VsubhnS64(r *arm.Int32X2, v0 *arm.Int64X2, v1 *arm.Int64X2)

// Subtract returning High Narrow. This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. All the values in this variant are unsigned integer values.
//
//go:linkname VsubhnU16 VsubhnU16
//go:noescape
func VsubhnU16(r *arm.Uint8X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8)

// Subtract returning High Narrow. This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. All the values in this variant are unsigned integer values.
//
//go:linkname VsubhnU32 VsubhnU32
//go:noescape
func VsubhnU32(r *arm.Uint16X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4)

// Subtract returning High Narrow. This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. All the values in this variant are unsigned integer values.
//
//go:linkname VsubhnU64 VsubhnU64
//go:noescape
func VsubhnU64(r *arm.Uint32X2, v0 *arm.Uint64X2, v1 *arm.Uint64X2)

// Subtract returning High Narrow. This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. All the values in this instruction are signed integer values.
//
//go:linkname VsubhnHighS16 VsubhnHighS16
//go:noescape
func VsubhnHighS16(r *arm.Int8X16, v0 *arm.Int8X8, v1 *arm.Int16X8, v2 *arm.Int16X8)

// Subtract returning High Narrow. This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. All the values in this instruction are signed integer values.
//
//go:linkname VsubhnHighS32 VsubhnHighS32
//go:noescape
func VsubhnHighS32(r *arm.Int16X8, v0 *arm.Int16X4, v1 *arm.Int32X4, v2 *arm.Int32X4)

// Subtract returning High Narrow. This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. All the values in this instruction are signed integer values.
//
//go:linkname VsubhnHighS64 VsubhnHighS64
//go:noescape
func VsubhnHighS64(r *arm.Int32X4, v0 *arm.Int32X2, v1 *arm.Int64X2, v2 *arm.Int64X2)

// Subtract returning High Narrow. This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. All the values in this variant are unsigned integer values.
//
//go:linkname VsubhnHighU16 VsubhnHighU16
//go:noescape
func VsubhnHighU16(r *arm.Uint8X16, v0 *arm.Uint8X8, v1 *arm.Uint16X8, v2 *arm.Uint16X8)

// Subtract returning High Narrow. This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. All the values in this variant are unsigned integer values.
//
//go:linkname VsubhnHighU32 VsubhnHighU32
//go:noescape
func VsubhnHighU32(r *arm.Uint16X8, v0 *arm.Uint16X4, v1 *arm.Uint32X4, v2 *arm.Uint32X4)

// Subtract returning High Narrow. This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. All the values in this variant are unsigned integer values.
//
//go:linkname VsubhnHighU64 VsubhnHighU64
//go:noescape
func VsubhnHighU64(r *arm.Uint32X4, v0 *arm.Uint32X2, v1 *arm.Uint64X2, v2 *arm.Uint64X2)

// Signed Subtract Long. This instruction subtracts each vector element in the lower or upper half of the second source SIMD&FP register from the corresponding vector element of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values. The destination vector elements are twice as long as the source vector elements.
//
//go:linkname VsublS8 VsublS8
//go:noescape
func VsublS8(r *arm.Int16X8, v0 *arm.Int8X8, v1 *arm.Int8X8)

// Signed Subtract Long. This instruction subtracts each vector element in the lower or upper half of the second source SIMD&FP register from the corresponding vector element of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values. The destination vector elements are twice as long as the source vector elements.
//
//go:linkname VsublS16 VsublS16
//go:noescape
func VsublS16(r *arm.Int32X4, v0 *arm.Int16X4, v1 *arm.Int16X4)

// Signed Subtract Long. This instruction subtracts each vector element in the lower or upper half of the second source SIMD&FP register from the corresponding vector element of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values. The destination vector elements are twice as long as the source vector elements.
//
//go:linkname VsublS32 VsublS32
//go:noescape
func VsublS32(r *arm.Int64X2, v0 *arm.Int32X2, v1 *arm.Int32X2)

// Unsigned Subtract Long. This instruction subtracts each vector element in the lower or upper half of the second source SIMD&FP register from the corresponding vector element of the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are unsigned integer values. The destination vector elements are twice as long as the source vector elements.
//
//go:linkname VsublU8 VsublU8
//go:noescape
func VsublU8(r *arm.Uint16X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8)

// Unsigned Subtract Long. This instruction subtracts each vector element in the lower or upper half of the second source SIMD&FP register from the corresponding vector element of the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are unsigned integer values. The destination vector elements are twice as long as the source vector elements.
//
//go:linkname VsublU16 VsublU16
//go:noescape
func VsublU16(r *arm.Uint32X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4)

// Unsigned Subtract Long. This instruction subtracts each vector element in the lower or upper half of the second source SIMD&FP register from the corresponding vector element of the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are unsigned integer values. The destination vector elements are twice as long as the source vector elements.
//
//go:linkname VsublU32 VsublU32
//go:noescape
func VsublU32(r *arm.Uint64X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2)

// Signed Subtract Long. This instruction subtracts each vector element in the lower or upper half of the second source SIMD&FP register from the corresponding vector element of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values. The destination vector elements are twice as long as the source vector elements.
//
//go:linkname VsublHighS8 VsublHighS8
//go:noescape
func VsublHighS8(r *arm.Int16X8, v0 *arm.Int8X16, v1 *arm.Int8X16)

// Signed Subtract Long. This instruction subtracts each vector element in the lower or upper half of the second source SIMD&FP register from the corresponding vector element of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values. The destination vector elements are twice as long as the source vector elements.
//
//go:linkname VsublHighS16 VsublHighS16
//go:noescape
func VsublHighS16(r *arm.Int32X4, v0 *arm.Int16X8, v1 *arm.Int16X8)

// Signed Subtract Long. This instruction subtracts each vector element in the lower or upper half of the second source SIMD&FP register from the corresponding vector element of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values. The destination vector elements are twice as long as the source vector elements.
//
//go:linkname VsublHighS32 VsublHighS32
//go:noescape
func VsublHighS32(r *arm.Int64X2, v0 *arm.Int32X4, v1 *arm.Int32X4)

// Unsigned Subtract Long. This instruction subtracts each vector element in the lower or upper half of the second source SIMD&FP register from the corresponding vector element of the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are unsigned integer values. The destination vector elements are twice as long as the source vector elements.
//
//go:linkname VsublHighU8 VsublHighU8
//go:noescape
func VsublHighU8(r *arm.Uint16X8, v0 *arm.Uint8X16, v1 *arm.Uint8X16)

// Unsigned Subtract Long. This instruction subtracts each vector element in the lower or upper half of the second source SIMD&FP register from the corresponding vector element of the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are unsigned integer values. The destination vector elements are twice as long as the source vector elements.
//
//go:linkname VsublHighU16 VsublHighU16
//go:noescape
func VsublHighU16(r *arm.Uint32X4, v0 *arm.Uint16X8, v1 *arm.Uint16X8)

// Unsigned Subtract Long. This instruction subtracts each vector element in the lower or upper half of the second source SIMD&FP register from the corresponding vector element of the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are unsigned integer values. The destination vector elements are twice as long as the source vector elements.
//
//go:linkname VsublHighU32 VsublHighU32
//go:noescape
func VsublHighU32(r *arm.Uint64X2, v0 *arm.Uint32X4, v1 *arm.Uint32X4)

// Subtract (vector). This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VsubqS8 VsubqS8
//go:noescape
func VsubqS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16)

// Subtract (vector). This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VsubqS16 VsubqS16
//go:noescape
func VsubqS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8)

// Subtract (vector). This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VsubqS32 VsubqS32
//go:noescape
func VsubqS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4)

// Subtract (vector). This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VsubqS64 VsubqS64
//go:noescape
func VsubqS64(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int64X2)

// Subtract (vector). This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VsubqU8 VsubqU8
//go:noescape
func VsubqU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16)

// Subtract (vector). This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VsubqU16 VsubqU16
//go:noescape
func VsubqU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8)

// Subtract (vector). This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VsubqU32 VsubqU32
//go:noescape
func VsubqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4)

// Subtract (vector). This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VsubqU64 VsubqU64
//go:noescape
func VsubqU64(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint64X2)

// Floating-point Subtract (vector). This instruction subtracts the elements in the vector in the second source SIMD&FP register, from the corresponding elements in the vector in the first source SIMD&FP register, places each result into elements of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VsubqF32 VsubqF32
//go:noescape
func VsubqF32(r *arm.Float32X4, v0 *arm.Float32X4, v1 *arm.Float32X4)

// Floating-point Subtract (vector). This instruction subtracts the elements in the vector in the second source SIMD&FP register, from the corresponding elements in the vector in the first source SIMD&FP register, places each result into elements of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VsubqF64 VsubqF64
//go:noescape
func VsubqF64(r *arm.Float64X2, v0 *arm.Float64X2, v1 *arm.Float64X2)

// Signed Subtract Wide. This instruction subtracts each vector element in the lower or upper half of the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result in a vector, and writes the vector to the SIMD&FP destination register. All the values in this instruction are signed integer values.
//
//go:linkname VsubwS8 VsubwS8
//go:noescape
func VsubwS8(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int8X8)

// Signed Subtract Wide. This instruction subtracts each vector element in the lower or upper half of the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result in a vector, and writes the vector to the SIMD&FP destination register. All the values in this instruction are signed integer values.
//
//go:linkname VsubwS16 VsubwS16
//go:noescape
func VsubwS16(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int16X4)

// Signed Subtract Wide. This instruction subtracts each vector element in the lower or upper half of the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result in a vector, and writes the vector to the SIMD&FP destination register. All the values in this instruction are signed integer values.
//
//go:linkname VsubwS32 VsubwS32
//go:noescape
func VsubwS32(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int32X2)

// Unsigned Subtract Wide. This instruction subtracts each vector element in the lower or upper half of the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result in a vector, and writes the vector to the SIMD&FP destination register. All the values in this instruction are unsigned integer values.
//
//go:linkname VsubwU8 VsubwU8
//go:noescape
func VsubwU8(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint8X8)

// Unsigned Subtract Wide. This instruction subtracts each vector element in the lower or upper half of the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result in a vector, and writes the vector to the SIMD&FP destination register. All the values in this instruction are unsigned integer values.
//
//go:linkname VsubwU16 VsubwU16
//go:noescape
func VsubwU16(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint16X4)

// Unsigned Subtract Wide. This instruction subtracts each vector element in the lower or upper half of the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result in a vector, and writes the vector to the SIMD&FP destination register. All the values in this instruction are unsigned integer values.
//
//go:linkname VsubwU32 VsubwU32
//go:noescape
func VsubwU32(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint32X2)

// Signed Subtract Wide. This instruction subtracts each vector element in the lower or upper half of the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result in a vector, and writes the vector to the SIMD&FP destination register. All the values in this instruction are signed integer values.
//
//go:linkname VsubwHighS8 VsubwHighS8
//go:noescape
func VsubwHighS8(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int8X16)

// Signed Subtract Wide. This instruction subtracts each vector element in the lower or upper half of the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result in a vector, and writes the vector to the SIMD&FP destination register. All the values in this instruction are signed integer values.
//
//go:linkname VsubwHighS16 VsubwHighS16
//go:noescape
func VsubwHighS16(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int16X8)

// Signed Subtract Wide. This instruction subtracts each vector element in the lower or upper half of the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result in a vector, and writes the vector to the SIMD&FP destination register. All the values in this instruction are signed integer values.
//
//go:linkname VsubwHighS32 VsubwHighS32
//go:noescape
func VsubwHighS32(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int32X4)

// Unsigned Subtract Wide. This instruction subtracts each vector element in the lower or upper half of the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result in a vector, and writes the vector to the SIMD&FP destination register. All the values in this instruction are unsigned integer values.
//
//go:linkname VsubwHighU8 VsubwHighU8
//go:noescape
func VsubwHighU8(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint8X16)

// Unsigned Subtract Wide. This instruction subtracts each vector element in the lower or upper half of the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result in a vector, and writes the vector to the SIMD&FP destination register. All the values in this instruction are unsigned integer values.
//
//go:linkname VsubwHighU16 VsubwHighU16
//go:noescape
func VsubwHighU16(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint16X8)

// Unsigned Subtract Wide. This instruction subtracts each vector element in the lower or upper half of the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result in a vector, and writes the vector to the SIMD&FP destination register. All the values in this instruction are unsigned integer values.
//
//go:linkname VsubwHighU32 VsubwHighU32
//go:noescape
func VsubwHighU32(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint32X4)

// Table vector Lookup. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the result for that lookup is 0. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vtbl1S8 Vtbl1S8
//go:noescape
func Vtbl1S8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8)

// Table vector Lookup. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the result for that lookup is 0. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vtbl1U8 Vtbl1U8
//go:noescape
func Vtbl1U8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8)

// Table vector Lookup. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the result for that lookup is 0. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vtbl1P8 Vtbl1P8
//go:noescape
func Vtbl1P8(r *arm.Poly8X8, v0 *arm.Poly8X8, v1 *arm.Uint8X8)

// Table vector Lookup. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the result for that lookup is 0. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vtbl2S8 Vtbl2S8
//go:noescape
func Vtbl2S8(r *arm.Int8X8, v0 *arm.Int8X8X2, v1 *arm.Int8X8)

// Table vector Lookup. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the result for that lookup is 0. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vtbl2U8 Vtbl2U8
//go:noescape
func Vtbl2U8(r *arm.Uint8X8, v0 *arm.Uint8X8X2, v1 *arm.Uint8X8)

// Table vector Lookup. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the result for that lookup is 0. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vtbl2P8 Vtbl2P8
//go:noescape
func Vtbl2P8(r *arm.Poly8X8, v0 *arm.Poly8X8X2, v1 *arm.Uint8X8)

// Table vector Lookup. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the result for that lookup is 0. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vtbl3S8 Vtbl3S8
//go:noescape
func Vtbl3S8(r *arm.Int8X8, v0 *arm.Int8X8X3, v1 *arm.Int8X8)

// Table vector Lookup. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the result for that lookup is 0. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vtbl3U8 Vtbl3U8
//go:noescape
func Vtbl3U8(r *arm.Uint8X8, v0 *arm.Uint8X8X3, v1 *arm.Uint8X8)

// Table vector Lookup. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the result for that lookup is 0. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vtbl3P8 Vtbl3P8
//go:noescape
func Vtbl3P8(r *arm.Poly8X8, v0 *arm.Poly8X8X3, v1 *arm.Uint8X8)

// Table vector Lookup. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the result for that lookup is 0. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vtbl4S8 Vtbl4S8
//go:noescape
func Vtbl4S8(r *arm.Int8X8, v0 *arm.Int8X8X4, v1 *arm.Int8X8)

// Table vector Lookup. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the result for that lookup is 0. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vtbl4U8 Vtbl4U8
//go:noescape
func Vtbl4U8(r *arm.Uint8X8, v0 *arm.Uint8X8X4, v1 *arm.Uint8X8)

// Table vector Lookup. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the result for that lookup is 0. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vtbl4P8 Vtbl4P8
//go:noescape
func Vtbl4P8(r *arm.Poly8X8, v0 *arm.Poly8X8X4, v1 *arm.Uint8X8)

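// Table vector lookup extension. Each element of v2 is used as an index into the 8-byte table v1, and the looked-up byte is written to the corresponding element of the result. If an index is out of range for the table, the corresponding element of v0 is written instead.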
//
//go:linkname Vtbx1S8 Vtbx1S8
//go:noescape
func Vtbx1S8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8, v2 *arm.Int8X8)

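// Table vector lookup extension. Each element of v2 is used as an index into the 8-byte table v1, and the looked-up byte is written to the corresponding element of the result. If an index is out of range for the table, the corresponding element of v0 is written instead.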
//
//go:linkname Vtbx1U8 Vtbx1U8
//go:noescape
func Vtbx1U8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8, v2 *arm.Uint8X8)

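// Table vector lookup extension. Each element of v2 is used as an index into the 8-byte table v1, and the looked-up byte is written to the corresponding element of the result. If an index is out of range for the table, the corresponding element of v0 is written instead.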
//
//go:linkname Vtbx1P8 Vtbx1P8
//go:noescape
func Vtbx1P8(r *arm.Poly8X8, v0 *arm.Poly8X8, v1 *arm.Poly8X8, v2 *arm.Uint8X8)

// Table vector lookup extension. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the existing value in the vector element of the destination register is left unchanged. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vtbx2S8 Vtbx2S8
//go:noescape
func Vtbx2S8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8X2, v2 *arm.Int8X8)

// Table vector lookup extension. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the existing value in the vector element of the destination register is left unchanged. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vtbx2U8 Vtbx2U8
//go:noescape
func Vtbx2U8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8X2, v2 *arm.Uint8X8)

// Table vector lookup extension. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the existing value in the vector element of the destination register is left unchanged. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vtbx2P8 Vtbx2P8
//go:noescape
func Vtbx2P8(r *arm.Poly8X8, v0 *arm.Poly8X8, v1 *arm.Poly8X8X2, v2 *arm.Uint8X8)

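// Table vector lookup extension. Each element of v2 is used as an index into the 24-byte table described by the three vectors of v1, and the looked-up byte is written to the corresponding element of the result. If an index is out of range for the table, the corresponding element of v0 is written instead. The first vector of v1 describes the lowest bytes of the table.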
//
//go:linkname Vtbx3S8 Vtbx3S8
//go:noescape
func Vtbx3S8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8X3, v2 *arm.Int8X8)

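// Table vector lookup extension. Each element of v2 is used as an index into the 24-byte table described by the three vectors of v1, and the looked-up byte is written to the corresponding element of the result. If an index is out of range for the table, the corresponding element of v0 is written instead. The first vector of v1 describes the lowest bytes of the table.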
//
//go:linkname Vtbx3U8 Vtbx3U8
//go:noescape
func Vtbx3U8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8X3, v2 *arm.Uint8X8)

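// Table vector lookup extension. Each element of v2 is used as an index into the 24-byte table described by the three vectors of v1, and the looked-up byte is written to the corresponding element of the result. If an index is out of range for the table, the corresponding element of v0 is written instead. The first vector of v1 describes the lowest bytes of the table.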
//
//go:linkname Vtbx3P8 Vtbx3P8
//go:noescape
func Vtbx3P8(r *arm.Poly8X8, v0 *arm.Poly8X8, v1 *arm.Poly8X8X3, v2 *arm.Uint8X8)

// Table vector lookup extension. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the existing value in the vector element of the destination register is left unchanged. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vtbx4S8 Vtbx4S8
//go:noescape
func Vtbx4S8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8X4, v2 *arm.Int8X8)

// Table vector lookup extension. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the existing value in the vector element of the destination register is left unchanged. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vtbx4U8 Vtbx4U8
//go:noescape
func Vtbx4U8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8X4, v2 *arm.Uint8X8)

// Table vector lookup extension. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the existing value in the vector element of the destination register is left unchanged. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vtbx4P8 Vtbx4P8
//go:noescape
func Vtbx4P8(r *arm.Poly8X8, v0 *arm.Poly8X8, v1 *arm.Poly8X8X4, v2 *arm.Uint8X8)

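// Transpose elements. The result holds two vectors: the first interleaves the even-numbered elements of v0 and v1, starting at zero, and the second interleaves their odd-numbered elements. In both, elements from v0 go to even-numbered positions and elements from v1 go to odd-numbered positions.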
//
//go:linkname VtrnS8 VtrnS8
//go:noescape
func VtrnS8(r *arm.Int8X8X2, v0 *arm.Int8X8, v1 *arm.Int8X8)

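// Transpose elements. The result holds two vectors: the first interleaves the even-numbered elements of v0 and v1, starting at zero, and the second interleaves their odd-numbered elements. In both, elements from v0 go to even-numbered positions and elements from v1 go to odd-numbered positions.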
//
//go:linkname VtrnS16 VtrnS16
//go:noescape
func VtrnS16(r *arm.Int16X4X2, v0 *arm.Int16X4, v1 *arm.Int16X4)

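// Transpose elements. The result holds two vectors: the first interleaves the even-numbered elements of v0 and v1, starting at zero, and the second interleaves their odd-numbered elements. In both, elements from v0 go to even-numbered positions and elements from v1 go to odd-numbered positions.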
//
//go:linkname VtrnS32 VtrnS32
//go:noescape
func VtrnS32(r *arm.Int32X2X2, v0 *arm.Int32X2, v1 *arm.Int32X2)

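// Transpose elements. The result holds two vectors: the first interleaves the even-numbered elements of v0 and v1, starting at zero, and the second interleaves their odd-numbered elements. In both, elements from v0 go to even-numbered positions and elements from v1 go to odd-numbered positions.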
//
//go:linkname VtrnU8 VtrnU8
//go:noescape
func VtrnU8(r *arm.Uint8X8X2, v0 *arm.Uint8X8, v1 *arm.Uint8X8)

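// Transpose elements. The result holds two vectors: the first interleaves the even-numbered elements of v0 and v1, starting at zero, and the second interleaves their odd-numbered elements. In both, elements from v0 go to even-numbered positions and elements from v1 go to odd-numbered positions.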
//
//go:linkname VtrnU16 VtrnU16
//go:noescape
func VtrnU16(r *arm.Uint16X4X2, v0 *arm.Uint16X4, v1 *arm.Uint16X4)

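// Transpose elements. The result holds two vectors: the first interleaves the even-numbered elements of v0 and v1, starting at zero, and the second interleaves their odd-numbered elements. In both, elements from v0 go to even-numbered positions and elements from v1 go to odd-numbered positions.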
//
//go:linkname VtrnU32 VtrnU32
//go:noescape
func VtrnU32(r *arm.Uint32X2X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2)

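// Transpose elements. The result holds two vectors: the first interleaves the even-numbered elements of v0 and v1, starting at zero, and the second interleaves their odd-numbered elements. In both, elements from v0 go to even-numbered positions and elements from v1 go to odd-numbered positions.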
//
//go:linkname VtrnF32 VtrnF32
//go:noescape
func VtrnF32(r *arm.Float32X2X2, v0 *arm.Float32X2, v1 *arm.Float32X2)

// Transpose vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn1S8 Vtrn1S8
//go:noescape
func Vtrn1S8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8)

// Transpose vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn1S16 Vtrn1S16
//go:noescape
func Vtrn1S16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4)

// Transpose vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn1S32 Vtrn1S32
//go:noescape
func Vtrn1S32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2)

// Transpose vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn1U8 Vtrn1U8
//go:noescape
func Vtrn1U8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8)

// Transpose vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn1U16 Vtrn1U16
//go:noescape
func Vtrn1U16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4)

// Transpose vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn1U32 Vtrn1U32
//go:noescape
func Vtrn1U32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2)

// Transpose vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn1F32 Vtrn1F32
//go:noescape
func Vtrn1F32(r *arm.Float32X2, v0 *arm.Float32X2, v1 *arm.Float32X2)

// Transpose vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn1P16 Vtrn1P16
//go:noescape
func Vtrn1P16(r *arm.Poly16X4, v0 *arm.Poly16X4, v1 *arm.Poly16X4)

// Transpose vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn1P8 Vtrn1P8
//go:noescape
func Vtrn1P8(r *arm.Poly8X8, v0 *arm.Poly8X8, v1 *arm.Poly8X8)

// Transpose vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn1QS8 Vtrn1QS8
//go:noescape
func Vtrn1QS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16)

// Transpose vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn1QS16 Vtrn1QS16
//go:noescape
func Vtrn1QS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8)

// Transpose vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn1QS32 Vtrn1QS32
//go:noescape
func Vtrn1QS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4)

// Transpose vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn1QS64 Vtrn1QS64
//go:noescape
func Vtrn1QS64(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int64X2)

// Transpose vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn1QU8 Vtrn1QU8
//go:noescape
func Vtrn1QU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16)

// Transpose vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn1QU16 Vtrn1QU16
//go:noescape
func Vtrn1QU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8)

// Transpose vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn1QU32 Vtrn1QU32
//go:noescape
func Vtrn1QU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4)

// Transpose vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn1QU64 Vtrn1QU64
//go:noescape
func Vtrn1QU64(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint64X2)

// Transpose vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn1QF32 Vtrn1QF32
//go:noescape
func Vtrn1QF32(r *arm.Float32X4, v0 *arm.Float32X4, v1 *arm.Float32X4)

// Transpose vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn1QF64 Vtrn1QF64
//go:noescape
func Vtrn1QF64(r *arm.Float64X2, v0 *arm.Float64X2, v1 *arm.Float64X2)

// Transpose vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn1QP16 Vtrn1QP16
//go:noescape
func Vtrn1QP16(r *arm.Poly16X8, v0 *arm.Poly16X8, v1 *arm.Poly16X8)

// Transpose vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn1QP64 Vtrn1QP64
//go:noescape
func Vtrn1QP64(r *arm.Poly64X2, v0 *arm.Poly64X2, v1 *arm.Poly64X2)

// Transpose vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn1QP8 Vtrn1QP8
//go:noescape
func Vtrn1QP8(r *arm.Poly8X16, v0 *arm.Poly8X16, v1 *arm.Poly8X16)

// Transpose vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
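//
// Worked example (illustrative lane labels, not taken from the ARM text), shown for a 4-lane vector for brevity; wider vectors follow the same pattern:
//
//	v0    = [a0 a1 a2 a3]
//	v1    = [b0 b1 b2 b3]
//	Vtrn1 = [a0 b0 a2 b2] // even-numbered source lanes
//	Vtrn2 = [a1 b1 a3 b3] // odd-numbered source lanes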
//
//go:linkname Vtrn2S8 Vtrn2S8
//go:noescape
func Vtrn2S8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8)

// Transpose vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn2S16 Vtrn2S16
//go:noescape
func Vtrn2S16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4)

// Transpose vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn2S32 Vtrn2S32
//go:noescape
func Vtrn2S32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2)

// Transpose vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn2U8 Vtrn2U8
//go:noescape
func Vtrn2U8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8)

// Transpose vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn2U16 Vtrn2U16
//go:noescape
func Vtrn2U16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4)

// Transpose vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn2U32 Vtrn2U32
//go:noescape
func Vtrn2U32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2)

// Transpose vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn2F32 Vtrn2F32
//go:noescape
func Vtrn2F32(r *arm.Float32X2, v0 *arm.Float32X2, v1 *arm.Float32X2)

// Transpose vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn2P16 Vtrn2P16
//go:noescape
func Vtrn2P16(r *arm.Poly16X4, v0 *arm.Poly16X4, v1 *arm.Poly16X4)

// Transpose vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn2P8 Vtrn2P8
//go:noescape
func Vtrn2P8(r *arm.Poly8X8, v0 *arm.Poly8X8, v1 *arm.Poly8X8)

// Transpose vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn2QS8 Vtrn2QS8
//go:noescape
func Vtrn2QS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16)

// Transpose vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn2QS16 Vtrn2QS16
//go:noescape
func Vtrn2QS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8)

// Transpose vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn2QS32 Vtrn2QS32
//go:noescape
func Vtrn2QS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4)

// Transpose vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn2QS64 Vtrn2QS64
//go:noescape
func Vtrn2QS64(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int64X2)

// Transpose vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn2QU8 Vtrn2QU8
//go:noescape
func Vtrn2QU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16)

// Transpose vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn2QU16 Vtrn2QU16
//go:noescape
func Vtrn2QU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8)

// Transpose vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn2QU32 Vtrn2QU32
//go:noescape
func Vtrn2QU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4)

// Transpose vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn2QU64 Vtrn2QU64
//go:noescape
func Vtrn2QU64(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint64X2)

// Transpose vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn2QF32 Vtrn2QF32
//go:noescape
func Vtrn2QF32(r *arm.Float32X4, v0 *arm.Float32X4, v1 *arm.Float32X4)

// Transpose vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn2QF64 Vtrn2QF64
//go:noescape
func Vtrn2QF64(r *arm.Float64X2, v0 *arm.Float64X2, v1 *arm.Float64X2)

// Transpose vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn2QP16 Vtrn2QP16
//go:noescape
func Vtrn2QP16(r *arm.Poly16X8, v0 *arm.Poly16X8, v1 *arm.Poly16X8)

// Transpose vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn2QP64 Vtrn2QP64
//go:noescape
func Vtrn2QP64(r *arm.Poly64X2, v0 *arm.Poly64X2, v1 *arm.Poly64X2)

// Transpose vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn2QP8 Vtrn2QP8
//go:noescape
func Vtrn2QP8(r *arm.Poly8X16, v0 *arm.Poly8X16, v1 *arm.Poly8X16)

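// Transpose elements. The result holds two vectors: the first interleaves the even-numbered elements of v0 and v1 (as the Vtrn1 functions do), and the second interleaves the odd-numbered elements (as the Vtrn2 functions do).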
//
//go:linkname VtrnP16 VtrnP16
//go:noescape
func VtrnP16(r *arm.Poly16X4X2, v0 *arm.Poly16X4, v1 *arm.Poly16X4)

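// Transpose elements. The result holds two vectors: the first interleaves the even-numbered elements of v0 and v1 (as the Vtrn1 functions do), and the second interleaves the odd-numbered elements (as the Vtrn2 functions do).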
//
//go:linkname VtrnP8 VtrnP8
//go:noescape
func VtrnP8(r *arm.Poly8X8X2, v0 *arm.Poly8X8, v1 *arm.Poly8X8)

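// Transpose elements. The result holds two vectors: the first interleaves the even-numbered elements of v0 and v1 (as the Vtrn1 functions do), and the second interleaves the odd-numbered elements (as the Vtrn2 functions do).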
//
//go:linkname VtrnqS8 VtrnqS8
//go:noescape
func VtrnqS8(r *arm.Int8X16X2, v0 *arm.Int8X16, v1 *arm.Int8X16)

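// Transpose elements. The result holds two vectors: the first interleaves the even-numbered elements of v0 and v1 (as the Vtrn1 functions do), and the second interleaves the odd-numbered elements (as the Vtrn2 functions do).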
//
//go:linkname VtrnqS16 VtrnqS16
//go:noescape
func VtrnqS16(r *arm.Int16X8X2, v0 *arm.Int16X8, v1 *arm.Int16X8)

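// Transpose elements. The result holds two vectors: the first interleaves the even-numbered elements of v0 and v1 (as the Vtrn1 functions do), and the second interleaves the odd-numbered elements (as the Vtrn2 functions do).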
//
//go:linkname VtrnqS32 VtrnqS32
//go:noescape
func VtrnqS32(r *arm.Int32X4X2, v0 *arm.Int32X4, v1 *arm.Int32X4)

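// Transpose elements. The result holds two vectors: the first interleaves the even-numbered elements of v0 and v1 (as the Vtrn1 functions do), and the second interleaves the odd-numbered elements (as the Vtrn2 functions do).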
//
//go:linkname VtrnqU8 VtrnqU8
//go:noescape
func VtrnqU8(r *arm.Uint8X16X2, v0 *arm.Uint8X16, v1 *arm.Uint8X16)

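// Transpose elements. The result holds two vectors: the first interleaves the even-numbered elements of v0 and v1 (as the Vtrn1 functions do), and the second interleaves the odd-numbered elements (as the Vtrn2 functions do).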
//
//go:linkname VtrnqU16 VtrnqU16
//go:noescape
func VtrnqU16(r *arm.Uint16X8X2, v0 *arm.Uint16X8, v1 *arm.Uint16X8)

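// Transpose elements. The result holds two vectors: the first interleaves the even-numbered elements of v0 and v1 (as the Vtrn1 functions do), and the second interleaves the odd-numbered elements (as the Vtrn2 functions do).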
//
//go:linkname VtrnqU32 VtrnqU32
//go:noescape
func VtrnqU32(r *arm.Uint32X4X2, v0 *arm.Uint32X4, v1 *arm.Uint32X4)

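// Transpose elements. The result holds two vectors: the first interleaves the even-numbered elements of v0 and v1 (as the Vtrn1 functions do), and the second interleaves the odd-numbered elements (as the Vtrn2 functions do).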
//
//go:linkname VtrnqF32 VtrnqF32
//go:noescape
func VtrnqF32(r *arm.Float32X4X2, v0 *arm.Float32X4, v1 *arm.Float32X4)

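// Transpose elements. The result holds two vectors: the first interleaves the even-numbered elements of v0 and v1 (as the Vtrn1 functions do), and the second interleaves the odd-numbered elements (as the Vtrn2 functions do).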
//
//go:linkname VtrnqP16 VtrnqP16
//go:noescape
func VtrnqP16(r *arm.Poly16X8X2, v0 *arm.Poly16X8, v1 *arm.Poly16X8)

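// Transpose elements. The result holds two vectors: the first interleaves the even-numbered elements of v0 and v1 (as the Vtrn1 functions do), and the second interleaves the odd-numbered elements (as the Vtrn2 functions do).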
//
//go:linkname VtrnqP8 VtrnqP8
//go:noescape
func VtrnqP8(r *arm.Poly8X16X2, v0 *arm.Poly8X16, v1 *arm.Poly8X16)

// Compare bitwise Test bits nonzero (vector). This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
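//
// As a sketch of the per-lane rule (pseudocode for illustration, not part of this package), for 8-bit lanes:
//
//	if v0[i]&v1[i] != 0 {
//		r[i] = 0xFF // every bit set; the mask is as wide as the lane
//	} else {
//		r[i] = 0x00
//	}
//
// For example, lanes 0x0F and 0xF0 share no set bits and give 0x00, while lanes 0x0F and 0x01 give 0xFF.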
//
//go:linkname VtstS8 VtstS8
//go:noescape
func VtstS8(r *arm.Uint8X8, v0 *arm.Int8X8, v1 *arm.Int8X8)

// Compare bitwise Test bits nonzero (vector). This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VtstS16 VtstS16
//go:noescape
func VtstS16(r *arm.Uint16X4, v0 *arm.Int16X4, v1 *arm.Int16X4)

// Compare bitwise Test bits nonzero (vector). This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VtstS32 VtstS32
//go:noescape
func VtstS32(r *arm.Uint32X2, v0 *arm.Int32X2, v1 *arm.Int32X2)

// Compare bitwise Test bits nonzero (vector). This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VtstS64 VtstS64
//go:noescape
func VtstS64(r *arm.Uint64X1, v0 *arm.Int64X1, v1 *arm.Int64X1)

// Compare bitwise Test bits nonzero (vector). This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VtstU8 VtstU8
//go:noescape
func VtstU8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8)

// Compare bitwise Test bits nonzero (vector). This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VtstU16 VtstU16
//go:noescape
func VtstU16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4)

// Compare bitwise Test bits nonzero (vector). This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VtstU32 VtstU32
//go:noescape
func VtstU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2)

// Compare bitwise Test bits nonzero (vector). This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VtstU64 VtstU64
//go:noescape
func VtstU64(r *arm.Uint64X1, v0 *arm.Uint64X1, v1 *arm.Uint64X1)

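// Compare bitwise Test bits nonzero (vector). This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.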
//
//go:linkname VtstP16 VtstP16
//go:noescape
func VtstP16(r *arm.Uint16X4, v0 *arm.Poly16X4, v1 *arm.Poly16X4)

// Compare bitwise Test bits nonzero (vector). This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VtstP64 VtstP64
//go:noescape
func VtstP64(r *arm.Uint64X1, v0 *arm.Poly64X1, v1 *arm.Poly64X1)

// Compare bitwise Test bits nonzero (vector). This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VtstP8 VtstP8
//go:noescape
func VtstP8(r *arm.Uint8X8, v0 *arm.Poly8X8, v1 *arm.Poly8X8)

// Compare bitwise Test bits nonzero (vector). This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VtstdS64 VtstdS64
//go:noescape
func VtstdS64(r *arm.Uint64, v0 *arm.Int64, v1 *arm.Int64)

// Compare bitwise Test bits nonzero (vector). This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VtstdU64 VtstdU64
//go:noescape
func VtstdU64(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64)

// Compare bitwise Test bits nonzero (vector). This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VtstqS8 VtstqS8
//go:noescape
func VtstqS8(r *arm.Uint8X16, v0 *arm.Int8X16, v1 *arm.Int8X16)

// Compare bitwise Test bits nonzero (vector). This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VtstqS16 VtstqS16
//go:noescape
func VtstqS16(r *arm.Uint16X8, v0 *arm.Int16X8, v1 *arm.Int16X8)

// Compare bitwise Test bits nonzero (vector). This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VtstqS32 VtstqS32
//go:noescape
func VtstqS32(r *arm.Uint32X4, v0 *arm.Int32X4, v1 *arm.Int32X4)

// Compare bitwise Test bits nonzero (vector). This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VtstqS64 VtstqS64
//go:noescape
func VtstqS64(r *arm.Uint64X2, v0 *arm.Int64X2, v1 *arm.Int64X2)

// Compare bitwise Test bits nonzero (vector). This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VtstqU8 VtstqU8
//go:noescape
func VtstqU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16)

// Compare bitwise Test bits nonzero (vector). This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VtstqU16 VtstqU16
//go:noescape
func VtstqU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8)

// Compare bitwise Test bits nonzero (vector). This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VtstqU32 VtstqU32
//go:noescape
func VtstqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4)

// Compare bitwise Test bits nonzero (vector). This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VtstqU64 VtstqU64
//go:noescape
func VtstqU64(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint64X2)

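// Compare bitwise Test bits nonzero (vector). This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.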
//
//go:linkname VtstqP16 VtstqP16
//go:noescape
func VtstqP16(r *arm.Uint16X8, v0 *arm.Poly16X8, v1 *arm.Poly16X8)

// Compare bitwise Test bits nonzero (vector). This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VtstqP64 VtstqP64
//go:noescape
func VtstqP64(r *arm.Uint64X2, v0 *arm.Poly64X2, v1 *arm.Poly64X2)

// Compare bitwise Test bits nonzero (vector). This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VtstqP8 VtstqP8
//go:noescape
func VtstqP8(r *arm.Uint8X16, v0 *arm.Poly8X16, v1 *arm.Poly8X16)

// Signed saturating Accumulate of Unsigned value. This instruction adds the unsigned integer values of the vector elements in the source SIMD&FP register to corresponding signed integer values of the vector elements in the destination SIMD&FP register, and writes the resulting signed integer values to the destination SIMD&FP register.
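//
// Worked example with 8-bit lanes, assuming v0 carries the signed accumulator and v1 the unsigned addend (as the parameter types suggest); the lane values are illustrative:
//
//	v0[i] =  100, v1[i] = 200  ->  r[i] =  127 // 300 saturates to the int8 maximum
//	v0[i] = -128, v1[i] = 255  ->  r[i] =  127 // exact, no saturation needed
//	v0[i] = -100, v1[i] =  50  ->  r[i] =  -50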
//
//go:linkname VuqaddS8 VuqaddS8
//go:noescape
func VuqaddS8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Uint8X8)

// Signed saturating Accumulate of Unsigned value. This instruction adds the unsigned integer values of the vector elements in the source SIMD&FP register to corresponding signed integer values of the vector elements in the destination SIMD&FP register, and writes the resulting signed integer values to the destination SIMD&FP register.
//
//go:linkname VuqaddS16 VuqaddS16
//go:noescape
func VuqaddS16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Uint16X4)

// Signed saturating Accumulate of Unsigned value. This instruction adds the unsigned integer values of the vector elements in the source SIMD&FP register to corresponding signed integer values of the vector elements in the destination SIMD&FP register, and writes the resulting signed integer values to the destination SIMD&FP register.
//
//go:linkname VuqaddS32 VuqaddS32
//go:noescape
func VuqaddS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Uint32X2)

// Signed saturating Accumulate of Unsigned value. This instruction adds the unsigned integer values of the vector elements in the source SIMD&FP register to corresponding signed integer values of the vector elements in the destination SIMD&FP register, and writes the resulting signed integer values to the destination SIMD&FP register.
//
//go:linkname VuqaddS64 VuqaddS64
//go:noescape
func VuqaddS64(r *arm.Int64X1, v0 *arm.Int64X1, v1 *arm.Uint64X1)

// Signed saturating Accumulate of Unsigned value. This instruction adds the unsigned integer values of the vector elements in the source SIMD&FP register to corresponding signed integer values of the vector elements in the destination SIMD&FP register, and writes the resulting signed integer values to the destination SIMD&FP register.
//
//go:linkname VuqaddbS8 VuqaddbS8
//go:noescape
func VuqaddbS8(r *arm.Int8, v0 *arm.Int8, v1 *arm.Uint8)

// Signed saturating Accumulate of Unsigned value. This instruction adds the unsigned integer values of the vector elements in the source SIMD&FP register to corresponding signed integer values of the vector elements in the destination SIMD&FP register, and writes the resulting signed integer values to the destination SIMD&FP register.
//
//go:linkname VuqadddS64 VuqadddS64
//go:noescape
func VuqadddS64(r *arm.Int64, v0 *arm.Int64, v1 *arm.Uint64)

// Signed saturating Accumulate of Unsigned value. This instruction adds the unsigned integer values of the vector elements in the source SIMD&FP register to corresponding signed integer values of the vector elements in the destination SIMD&FP register, and writes the resulting signed integer values to the destination SIMD&FP register.
//
//go:linkname VuqaddhS16 VuqaddhS16
//go:noescape
func VuqaddhS16(r *arm.Int16, v0 *arm.Int16, v1 *arm.Uint16)

// Signed saturating Accumulate of Unsigned value. This instruction adds the unsigned integer values of the vector elements in the source SIMD&FP register to corresponding signed integer values of the vector elements in the destination SIMD&FP register, and writes the resulting signed integer values to the destination SIMD&FP register.
//
//go:linkname VuqaddqS8 VuqaddqS8
//go:noescape
func VuqaddqS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Uint8X16)

// Signed saturating Accumulate of Unsigned value. This instruction adds the unsigned integer values of the vector elements in the source SIMD&FP register to corresponding signed integer values of the vector elements in the destination SIMD&FP register, and writes the resulting signed integer values to the destination SIMD&FP register.
//
//go:linkname VuqaddqS16 VuqaddqS16
//go:noescape
func VuqaddqS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Uint16X8)

// Signed saturating Accumulate of Unsigned value. This instruction adds the unsigned integer values of the vector elements in the source SIMD&FP register (v1) to the corresponding signed integer values of the vector elements in the destination SIMD&FP register (v0), and writes the resulting signed integer values to r. Results that overflow are saturated to the signed range of the element type.
//
//go:linkname VuqaddqS32 VuqaddqS32
//go:noescape
func VuqaddqS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Uint32X4)

// Signed saturating Accumulate of Unsigned value. This instruction adds the unsigned integer values of the vector elements in the source SIMD&FP register (v1) to the corresponding signed integer values of the vector elements in the destination SIMD&FP register (v0), and writes the resulting signed integer values to r. Results that overflow are saturated to the signed range of the element type.
//
//go:linkname VuqaddqS64 VuqaddqS64
//go:noescape
func VuqaddqS64(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Uint64X2)

// Signed saturating Accumulate of Unsigned value. This instruction adds the unsigned integer value in the source SIMD&FP register (v1) to the signed integer value in the destination SIMD&FP register (v0), and writes the resulting signed integer value to r. A result that overflows is saturated to the signed range of the element type.
//
//go:linkname VuqaddsS32 VuqaddsS32
//go:noescape
func VuqaddsS32(r *arm.Int32, v0 *arm.Int32, v1 *arm.Uint32)

// Dot Product vector form with unsigned and signed integers. This instruction performs the dot product of the four unsigned 8-bit integer values in each 32-bit element of the first source register with the four signed 8-bit integer values in the corresponding 32-bit element of the second source register, accumulating the result into the corresponding 32-bit element of the destination register.
//
//go:linkname VusdotS32 VusdotS32
//go:noescape
func VusdotS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Uint8X8, v2 *arm.Int8X8)

// Dot Product vector form with unsigned and signed integers. This instruction performs the dot product of the four unsigned 8-bit integer values in each 32-bit element of the first source register with the four signed 8-bit integer values in the corresponding 32-bit element of the second source register, accumulating the result into the corresponding 32-bit element of the destination register.
//
//go:linkname VusdotqS32 VusdotqS32
//go:noescape
func VusdotqS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Uint8X16, v2 *arm.Int8X16)

// Unsigned and signed 8-bit integer matrix multiply-accumulate. This instruction multiplies the 2x8 matrix of unsigned 8-bit integer values in the first source vector by the 8x2 matrix of signed 8-bit integer values in the second source vector. The resulting 2x2 32-bit integer matrix product is destructively added to the 32-bit integer matrix accumulator in the destination vector. This is equivalent to performing an 8-way dot product per destination element.
//
//go:linkname VusmmlaqS32 VusmmlaqS32
//go:noescape
func VusmmlaqS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Uint8X16, v2 *arm.Int8X16)

// Unzip vectors. This function de-interleaves the two source vectors and writes both results to r: the first result vector holds the even-numbered elements of v0 followed by the even-numbered elements of v1 (UZP1, primary), and the second result vector holds the odd-numbered elements of v0 followed by the odd-numbered elements of v1 (UZP2, secondary).
//
//go:linkname VuzpS8 VuzpS8
//go:noescape
func VuzpS8(r *arm.Int8X8X2, v0 *arm.Int8X8, v1 *arm.Int8X8)

// Unzip vectors. This function de-interleaves the two source vectors and writes both results to r: the first result vector holds the even-numbered elements of v0 followed by the even-numbered elements of v1 (UZP1, primary), and the second result vector holds the odd-numbered elements of v0 followed by the odd-numbered elements of v1 (UZP2, secondary).
//
//go:linkname VuzpS16 VuzpS16
//go:noescape
func VuzpS16(r *arm.Int16X4X2, v0 *arm.Int16X4, v1 *arm.Int16X4)

// Unzip vectors. This function de-interleaves the two source vectors and writes both results to r: the first result vector holds the even-numbered elements of v0 followed by the even-numbered elements of v1 (UZP1, primary), and the second result vector holds the odd-numbered elements of v0 followed by the odd-numbered elements of v1 (UZP2, secondary).
//
//go:linkname VuzpS32 VuzpS32
//go:noescape
func VuzpS32(r *arm.Int32X2X2, v0 *arm.Int32X2, v1 *arm.Int32X2)

// Unzip vectors. This function de-interleaves the two source vectors and writes both results to r: the first result vector holds the even-numbered elements of v0 followed by the even-numbered elements of v1 (UZP1, primary), and the second result vector holds the odd-numbered elements of v0 followed by the odd-numbered elements of v1 (UZP2, secondary).
//
//go:linkname VuzpU8 VuzpU8
//go:noescape
func VuzpU8(r *arm.Uint8X8X2, v0 *arm.Uint8X8, v1 *arm.Uint8X8)

// Unzip vectors. This function de-interleaves the two source vectors and writes both results to r: the first result vector holds the even-numbered elements of v0 followed by the even-numbered elements of v1 (UZP1, primary), and the second result vector holds the odd-numbered elements of v0 followed by the odd-numbered elements of v1 (UZP2, secondary).
//
//go:linkname VuzpU16 VuzpU16
//go:noescape
func VuzpU16(r *arm.Uint16X4X2, v0 *arm.Uint16X4, v1 *arm.Uint16X4)

// Unzip vectors. This function de-interleaves the two source vectors and writes both results to r: the first result vector holds the even-numbered elements of v0 followed by the even-numbered elements of v1 (UZP1, primary), and the second result vector holds the odd-numbered elements of v0 followed by the odd-numbered elements of v1 (UZP2, secondary).
//
//go:linkname VuzpU32 VuzpU32
//go:noescape
func VuzpU32(r *arm.Uint32X2X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2)

// Unzip vectors. This function de-interleaves the two source vectors and writes both results to r: the first result vector holds the even-numbered elements of v0 followed by the even-numbered elements of v1 (UZP1, primary), and the second result vector holds the odd-numbered elements of v0 followed by the odd-numbered elements of v1 (UZP2, secondary).
//
//go:linkname VuzpF32 VuzpF32
//go:noescape
func VuzpF32(r *arm.Float32X2X2, v0 *arm.Float32X2, v1 *arm.Float32X2)

// Unzip vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp1S8 Vuzp1S8
//go:noescape
func Vuzp1S8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8)

// Unzip vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp1S16 Vuzp1S16
//go:noescape
func Vuzp1S16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4)

// Unzip vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp1S32 Vuzp1S32
//go:noescape
func Vuzp1S32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2)

// Unzip vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp1U8 Vuzp1U8
//go:noescape
func Vuzp1U8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8)

// Unzip vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp1U16 Vuzp1U16
//go:noescape
func Vuzp1U16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4)

// Unzip vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp1U32 Vuzp1U32
//go:noescape
func Vuzp1U32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2)

// Unzip vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp1F32 Vuzp1F32
//go:noescape
func Vuzp1F32(r *arm.Float32X2, v0 *arm.Float32X2, v1 *arm.Float32X2)

// Unzip vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp1P16 Vuzp1P16
//go:noescape
func Vuzp1P16(r *arm.Poly16X4, v0 *arm.Poly16X4, v1 *arm.Poly16X4)

// Unzip vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp1P8 Vuzp1P8
//go:noescape
func Vuzp1P8(r *arm.Poly8X8, v0 *arm.Poly8X8, v1 *arm.Poly8X8)

// Unzip vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp1QS8 Vuzp1QS8
//go:noescape
func Vuzp1QS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16)

// Unzip vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp1QS16 Vuzp1QS16
//go:noescape
func Vuzp1QS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8)

// Unzip vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp1QS32 Vuzp1QS32
//go:noescape
func Vuzp1QS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4)

// Unzip vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp1QS64 Vuzp1QS64
//go:noescape
func Vuzp1QS64(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int64X2)

// Unzip vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp1QU8 Vuzp1QU8
//go:noescape
func Vuzp1QU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16)

// Unzip vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp1QU16 Vuzp1QU16
//go:noescape
func Vuzp1QU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8)

// Unzip vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp1QU32 Vuzp1QU32
//go:noescape
func Vuzp1QU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4)

// Unzip vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp1QU64 Vuzp1QU64
//go:noescape
func Vuzp1QU64(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint64X2)

// Unzip vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp1QF32 Vuzp1QF32
//go:noescape
func Vuzp1QF32(r *arm.Float32X4, v0 *arm.Float32X4, v1 *arm.Float32X4)

// Unzip vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp1QF64 Vuzp1QF64
//go:noescape
func Vuzp1QF64(r *arm.Float64X2, v0 *arm.Float64X2, v1 *arm.Float64X2)

// Unzip vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp1QP16 Vuzp1QP16
//go:noescape
func Vuzp1QP16(r *arm.Poly16X8, v0 *arm.Poly16X8, v1 *arm.Poly16X8)

// Unzip vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp1QP64 Vuzp1QP64
//go:noescape
func Vuzp1QP64(r *arm.Poly64X2, v0 *arm.Poly64X2, v1 *arm.Poly64X2)

// Unzip vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp1QP8 Vuzp1QP8
//go:noescape
func Vuzp1QP8(r *arm.Poly8X16, v0 *arm.Poly8X16, v1 *arm.Poly8X16)

// Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp2S8 Vuzp2S8
//go:noescape
func Vuzp2S8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8)

// Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp2S16 Vuzp2S16
//go:noescape
func Vuzp2S16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4)

// Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp2S32 Vuzp2S32
//go:noescape
func Vuzp2S32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2)

// Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp2U8 Vuzp2U8
//go:noescape
func Vuzp2U8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8)

// Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp2U16 Vuzp2U16
//go:noescape
func Vuzp2U16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4)

// Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp2U32 Vuzp2U32
//go:noescape
func Vuzp2U32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2)

// Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp2F32 Vuzp2F32
//go:noescape
func Vuzp2F32(r *arm.Float32X2, v0 *arm.Float32X2, v1 *arm.Float32X2)

// Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp2P16 Vuzp2P16
//go:noescape
func Vuzp2P16(r *arm.Poly16X4, v0 *arm.Poly16X4, v1 *arm.Poly16X4)

// Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp2P8 Vuzp2P8
//go:noescape
func Vuzp2P8(r *arm.Poly8X8, v0 *arm.Poly8X8, v1 *arm.Poly8X8)

// Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp2QS8 Vuzp2QS8
//go:noescape
func Vuzp2QS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16)

// Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp2QS16 Vuzp2QS16
//go:noescape
func Vuzp2QS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8)

// Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp2QS32 Vuzp2QS32
//go:noescape
func Vuzp2QS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4)

// Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp2QS64 Vuzp2QS64
//go:noescape
func Vuzp2QS64(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int64X2)

// Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp2QU8 Vuzp2QU8
//go:noescape
func Vuzp2QU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16)

// Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp2QU16 Vuzp2QU16
//go:noescape
func Vuzp2QU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8)

// Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp2QU32 Vuzp2QU32
//go:noescape
func Vuzp2QU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4)

// Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp2QU64 Vuzp2QU64
//go:noescape
func Vuzp2QU64(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint64X2)

// Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp2QF32 Vuzp2QF32
//go:noescape
func Vuzp2QF32(r *arm.Float32X4, v0 *arm.Float32X4, v1 *arm.Float32X4)

// Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp2QF64 Vuzp2QF64
//go:noescape
func Vuzp2QF64(r *arm.Float64X2, v0 *arm.Float64X2, v1 *arm.Float64X2)

// Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp2QP16 Vuzp2QP16
//go:noescape
func Vuzp2QP16(r *arm.Poly16X8, v0 *arm.Poly16X8, v1 *arm.Poly16X8)

// Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp2QP64 Vuzp2QP64
//go:noescape
func Vuzp2QP64(r *arm.Poly64X2, v0 *arm.Poly64X2, v1 *arm.Poly64X2)

// Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp2QP8 Vuzp2QP8
//go:noescape
func Vuzp2QP8(r *arm.Poly8X16, v0 *arm.Poly8X16, v1 *arm.Poly8X16)

// Unzip vectors. This function de-interleaves the two source vectors and writes both results to r: the first result vector holds the even-numbered elements of v0 followed by the even-numbered elements of v1 (UZP1, primary), and the second result vector holds the odd-numbered elements of v0 followed by the odd-numbered elements of v1 (UZP2, secondary).
//
//go:linkname VuzpP16 VuzpP16
//go:noescape
func VuzpP16(r *arm.Poly16X4X2, v0 *arm.Poly16X4, v1 *arm.Poly16X4)

// Unzip vectors. This function de-interleaves the two source vectors and writes both results to r: the first result vector holds the even-numbered elements of v0 followed by the even-numbered elements of v1 (UZP1, primary), and the second result vector holds the odd-numbered elements of v0 followed by the odd-numbered elements of v1 (UZP2, secondary).
//
//go:linkname VuzpP8 VuzpP8
//go:noescape
func VuzpP8(r *arm.Poly8X8X2, v0 *arm.Poly8X8, v1 *arm.Poly8X8)

// Unzip vectors. This function de-interleaves the two source vectors and writes both results to r: the first result vector holds the even-numbered elements of v0 followed by the even-numbered elements of v1 (UZP1, primary), and the second result vector holds the odd-numbered elements of v0 followed by the odd-numbered elements of v1 (UZP2, secondary).
//
//go:linkname VuzpqS8 VuzpqS8
//go:noescape
func VuzpqS8(r *arm.Int8X16X2, v0 *arm.Int8X16, v1 *arm.Int8X16)

// Unzip vectors. This function de-interleaves the two source vectors and writes both results to r: the first result vector holds the even-numbered elements of v0 followed by the even-numbered elements of v1 (UZP1, primary), and the second result vector holds the odd-numbered elements of v0 followed by the odd-numbered elements of v1 (UZP2, secondary).
//
//go:linkname VuzpqS16 VuzpqS16
//go:noescape
func VuzpqS16(r *arm.Int16X8X2, v0 *arm.Int16X8, v1 *arm.Int16X8)

// Unzip vectors. This intrinsic de-interleaves the two source vectors and writes a pair of vectors to r: the first holds the even-numbered elements of v0 followed by the even-numbered elements of v1, and the second holds the odd-numbered elements in the same order.
//
//go:linkname VuzpqS32 VuzpqS32
//go:noescape
func VuzpqS32(r *arm.Int32X4X2, v0 *arm.Int32X4, v1 *arm.Int32X4)

// Unzip vectors. This intrinsic de-interleaves the two source vectors and writes a pair of vectors to r: the first holds the even-numbered elements of v0 followed by the even-numbered elements of v1, and the second holds the odd-numbered elements in the same order.
//
//go:linkname VuzpqU8 VuzpqU8
//go:noescape
func VuzpqU8(r *arm.Uint8X16X2, v0 *arm.Uint8X16, v1 *arm.Uint8X16)

// Unzip vectors. This intrinsic de-interleaves the two source vectors and writes a pair of vectors to r: the first holds the even-numbered elements of v0 followed by the even-numbered elements of v1, and the second holds the odd-numbered elements in the same order.
//
//go:linkname VuzpqU16 VuzpqU16
//go:noescape
func VuzpqU16(r *arm.Uint16X8X2, v0 *arm.Uint16X8, v1 *arm.Uint16X8)

// Unzip vectors. This intrinsic de-interleaves the two source vectors and writes a pair of vectors to r: the first holds the even-numbered elements of v0 followed by the even-numbered elements of v1, and the second holds the odd-numbered elements in the same order.
//
//go:linkname VuzpqU32 VuzpqU32
//go:noescape
func VuzpqU32(r *arm.Uint32X4X2, v0 *arm.Uint32X4, v1 *arm.Uint32X4)

// Unzip vectors. This intrinsic de-interleaves the two source vectors and writes a pair of vectors to r: the first holds the even-numbered elements of v0 followed by the even-numbered elements of v1, and the second holds the odd-numbered elements in the same order.
//
//go:linkname VuzpqF32 VuzpqF32
//go:noescape
func VuzpqF32(r *arm.Float32X4X2, v0 *arm.Float32X4, v1 *arm.Float32X4)

// Unzip vectors. This intrinsic de-interleaves the two source vectors and writes a pair of vectors to r: the first holds the even-numbered elements of v0 followed by the even-numbered elements of v1, and the second holds the odd-numbered elements in the same order.
//
//go:linkname VuzpqP16 VuzpqP16
//go:noescape
func VuzpqP16(r *arm.Poly16X8X2, v0 *arm.Poly16X8, v1 *arm.Poly16X8)

// Unzip vectors. This intrinsic de-interleaves the two source vectors and writes a pair of vectors to r: the first holds the even-numbered elements of v0 followed by the even-numbered elements of v1, and the second holds the odd-numbered elements in the same order.
//
//go:linkname VuzpqP8 VuzpqP8
//go:noescape
func VuzpqP8(r *arm.Poly8X16X2, v0 *arm.Poly8X16, v1 *arm.Poly8X16)
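
/*
Illustration only, not part of the generated bindings. The pair-returning
unzip wrappers above (Vuzp*, Vuzpq*) are expected to split their inputs into
even-numbered and odd-numbered lanes. The scalar sketch below models that for
8 lanes so the lane order can be checked without NEON hardware. The helper
name unzip8 is hypothetical and does not exist in this package.

```go
package main

import "fmt"

// unzip8 is a scalar reference model of an 8-lane unzip:
// even gets the even-numbered lanes of a, then those of b;
// odd gets the odd-numbered lanes of a, then those of b.
func unzip8(a, b [8]int8) (even, odd [8]int8) {
	for i := 0; i < 4; i++ {
		even[i], even[i+4] = a[2*i], b[2*i]
		odd[i], odd[i+4] = a[2*i+1], b[2*i+1]
	}
	return even, odd
}

func main() {
	a := [8]int8{0, 1, 2, 3, 4, 5, 6, 7}
	b := [8]int8{10, 11, 12, 13, 14, 15, 16, 17}
	even, odd := unzip8(a, b)
	fmt.Println(even) // [0 2 4 6 10 12 14 16]
	fmt.Println(odd)  // [1 3 5 7 11 13 15 17]
}
```

A model like this could serve as the expected value in a test that calls
VuzpS8 or a sibling wrapper on an arm64 machine.
*/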

// Interleave two vectors. This intrinsic reads corresponding elements from the two source vectors as pairs, interleaves the pairs, and returns the resulting interleaved vector.
//
//go:linkname VzipS8 VzipS8
//go:noescape
func VzipS8(r *arm.Int8X8X2, v0 *arm.Int8X8, v1 *arm.Int8X8)

// Interleave two vectors. This intrinsic reads corresponding elements from the two source vectors as pairs, interleaves the pairs, and returns the resulting interleaved vector.
//
//go:linkname VzipS16 VzipS16
//go:noescape
func VzipS16(r *arm.Int16X4X2, v0 *arm.Int16X4, v1 *arm.Int16X4)

// Interleave two vectors. This intrinsic reads corresponding elements from the two source vectors as pairs, interleaves the pairs, and returns the resulting interleaved vector.
//
//go:linkname VzipS32 VzipS32
//go:noescape
func VzipS32(r *arm.Int32X2X2, v0 *arm.Int32X2, v1 *arm.Int32X2)

// Interleave two vectors. This intrinsic reads corresponding elements from the two source vectors as pairs, interleaves the pairs, and returns the resulting interleaved vector.
//
//go:linkname VzipU8 VzipU8
//go:noescape
func VzipU8(r *arm.Uint8X8X2, v0 *arm.Uint8X8, v1 *arm.Uint8X8)

// Interleave two vectors. This intrinsic reads corresponding elements from the two source vectors as pairs, interleaves the pairs, and returns the resulting interleaved vector.
//
//go:linkname VzipU16 VzipU16
//go:noescape
func VzipU16(r *arm.Uint16X4X2, v0 *arm.Uint16X4, v1 *arm.Uint16X4)

// Interleave two vectors. This intrinsic reads corresponding elements from the two source vectors as pairs, interleaves the pairs, and returns the resulting interleaved vector.
//
//go:linkname VzipU32 VzipU32
//go:noescape
func VzipU32(r *arm.Uint32X2X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2)

// Interleave two vectors. This intrinsic reads corresponding elements from the two source vectors as pairs, interleaves the pairs, and returns the resulting interleaved vector.
//
//go:linkname VzipF32 VzipF32
//go:noescape
func VzipF32(r *arm.Float32X2X2, v0 *arm.Float32X2, v1 *arm.Float32X2)

// Zip vectors (primary). This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.
//
//go:linkname Vzip1S8 Vzip1S8
//go:noescape
func Vzip1S8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8)

// Zip vectors (primary). This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.
//
//go:linkname Vzip1S16 Vzip1S16
//go:noescape
func Vzip1S16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4)

// Zip vectors (primary). This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.
//
//go:linkname Vzip1S32 Vzip1S32
//go:noescape
func Vzip1S32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2)

// Zip vectors (primary). This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.
//
//go:linkname Vzip1U8 Vzip1U8
//go:noescape
func Vzip1U8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8)

// Zip vectors (primary). This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.
//
//go:linkname Vzip1U16 Vzip1U16
//go:noescape
func Vzip1U16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4)

// Zip vectors (primary). This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.
//
//go:linkname Vzip1U32 Vzip1U32
//go:noescape
func Vzip1U32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2)

// Zip vectors (primary). This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.
//
//go:linkname Vzip1F32 Vzip1F32
//go:noescape
func Vzip1F32(r *arm.Float32X2, v0 *arm.Float32X2, v1 *arm.Float32X2)

// Zip vectors (primary). This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.
//
//go:linkname Vzip1P16 Vzip1P16
//go:noescape
func Vzip1P16(r *arm.Poly16X4, v0 *arm.Poly16X4, v1 *arm.Poly16X4)

// Zip vectors (primary). This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.
//
//go:linkname Vzip1P8 Vzip1P8
//go:noescape
func Vzip1P8(r *arm.Poly8X8, v0 *arm.Poly8X8, v1 *arm.Poly8X8)

// Zip vectors (primary). This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.
//
//go:linkname Vzip1QS8 Vzip1QS8
//go:noescape
func Vzip1QS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16)

// Zip vectors (primary). This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.
//
//go:linkname Vzip1QS16 Vzip1QS16
//go:noescape
func Vzip1QS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8)

// Zip vectors (primary). This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.
//
//go:linkname Vzip1QS32 Vzip1QS32
//go:noescape
func Vzip1QS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4)

// Zip vectors (primary). This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.
//
//go:linkname Vzip1QS64 Vzip1QS64
//go:noescape
func Vzip1QS64(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int64X2)

// Zip vectors (primary). This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.
//
//go:linkname Vzip1QU8 Vzip1QU8
//go:noescape
func Vzip1QU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16)

// Zip vectors (primary). This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.
//
//go:linkname Vzip1QU16 Vzip1QU16
//go:noescape
func Vzip1QU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8)

// Zip vectors (primary). This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.
//
//go:linkname Vzip1QU32 Vzip1QU32
//go:noescape
func Vzip1QU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4)

// Zip vectors (primary). This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.
//
//go:linkname Vzip1QU64 Vzip1QU64
//go:noescape
func Vzip1QU64(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint64X2)

// Zip vectors (primary). This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.
//
//go:linkname Vzip1QF32 Vzip1QF32
//go:noescape
func Vzip1QF32(r *arm.Float32X4, v0 *arm.Float32X4, v1 *arm.Float32X4)

// Zip vectors (primary). This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.
//
//go:linkname Vzip1QF64 Vzip1QF64
//go:noescape
func Vzip1QF64(r *arm.Float64X2, v0 *arm.Float64X2, v1 *arm.Float64X2)

// Zip vectors (primary). This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.
//
//go:linkname Vzip1QP16 Vzip1QP16
//go:noescape
func Vzip1QP16(r *arm.Poly16X8, v0 *arm.Poly16X8, v1 *arm.Poly16X8)

// Zip vectors (primary). This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.
//
//go:linkname Vzip1QP64 Vzip1QP64
//go:noescape
func Vzip1QP64(r *arm.Poly64X2, v0 *arm.Poly64X2, v1 *arm.Poly64X2)

// Zip vectors (primary). This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.
//
//go:linkname Vzip1QP8 Vzip1QP8
//go:noescape
func Vzip1QP8(r *arm.Poly8X16, v0 *arm.Poly8X16, v1 *arm.Poly8X16)
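
/*
Illustration only, not part of the generated bindings. The Vzip1* wrappers
above interleave the lower halves of their two sources. The scalar sketch
below models that for 8 lanes; the helper name zip1 is hypothetical.

```go
package main

import "fmt"

// zip1 models the primary zip for 8 lanes: it interleaves the lower
// halves of a and b, starting with a[0].
func zip1(a, b [8]int8) (r [8]int8) {
	for i := 0; i < 4; i++ {
		r[2*i], r[2*i+1] = a[i], b[i]
	}
	return r
}

func main() {
	a := [8]int8{0, 1, 2, 3, 4, 5, 6, 7}
	b := [8]int8{10, 11, 12, 13, 14, 15, 16, 17}
	fmt.Println(zip1(a, b)) // [0 10 1 11 2 12 3 13]
}
```
*/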

// Zip vectors (secondary). This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.
//
//go:linkname Vzip2S8 Vzip2S8
//go:noescape
func Vzip2S8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8)

// Zip vectors (secondary). This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.
//
//go:linkname Vzip2S16 Vzip2S16
//go:noescape
func Vzip2S16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4)

// Zip vectors (secondary). This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.
//
//go:linkname Vzip2S32 Vzip2S32
//go:noescape
func Vzip2S32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2)

// Zip vectors (secondary). This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.
//
//go:linkname Vzip2U8 Vzip2U8
//go:noescape
func Vzip2U8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8)

// Zip vectors (secondary). This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.
//
//go:linkname Vzip2U16 Vzip2U16
//go:noescape
func Vzip2U16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4)

// Zip vectors (secondary). This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.
//
//go:linkname Vzip2U32 Vzip2U32
//go:noescape
func Vzip2U32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2)

// Zip vectors (secondary). This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.
//
//go:linkname Vzip2F32 Vzip2F32
//go:noescape
func Vzip2F32(r *arm.Float32X2, v0 *arm.Float32X2, v1 *arm.Float32X2)

// Zip vectors (secondary). This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.
//
//go:linkname Vzip2P16 Vzip2P16
//go:noescape
func Vzip2P16(r *arm.Poly16X4, v0 *arm.Poly16X4, v1 *arm.Poly16X4)

// Zip vectors (secondary). This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.
//
//go:linkname Vzip2P8 Vzip2P8
//go:noescape
func Vzip2P8(r *arm.Poly8X8, v0 *arm.Poly8X8, v1 *arm.Poly8X8)

// Zip vectors (secondary). This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.
//
//go:linkname Vzip2QS8 Vzip2QS8
//go:noescape
func Vzip2QS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16)

// Zip vectors (secondary). This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.
//
//go:linkname Vzip2QS16 Vzip2QS16
//go:noescape
func Vzip2QS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8)

// Zip vectors (secondary). This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.
//
//go:linkname Vzip2QS32 Vzip2QS32
//go:noescape
func Vzip2QS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4)

// Zip vectors (secondary). This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.
//
//go:linkname Vzip2QS64 Vzip2QS64
//go:noescape
func Vzip2QS64(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int64X2)

// Zip vectors (secondary). This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.
//
//go:linkname Vzip2QU8 Vzip2QU8
//go:noescape
func Vzip2QU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16)

// Zip vectors (secondary). This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.
//
//go:linkname Vzip2QU16 Vzip2QU16
//go:noescape
func Vzip2QU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8)

// Zip vectors (secondary). This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.
//
//go:linkname Vzip2QU32 Vzip2QU32
//go:noescape
func Vzip2QU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4)

// Zip vectors (secondary). This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.
//
//go:linkname Vzip2QU64 Vzip2QU64
//go:noescape
func Vzip2QU64(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint64X2)

// Zip vectors (secondary). This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.
//
//go:linkname Vzip2QF32 Vzip2QF32
//go:noescape
func Vzip2QF32(r *arm.Float32X4, v0 *arm.Float32X4, v1 *arm.Float32X4)

// Zip vectors (secondary). This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.
//
//go:linkname Vzip2QF64 Vzip2QF64
//go:noescape
func Vzip2QF64(r *arm.Float64X2, v0 *arm.Float64X2, v1 *arm.Float64X2)

// Zip vectors (secondary). This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.
//
//go:linkname Vzip2QP16 Vzip2QP16
//go:noescape
func Vzip2QP16(r *arm.Poly16X8, v0 *arm.Poly16X8, v1 *arm.Poly16X8)

// Zip vectors (secondary). This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.
//
//go:linkname Vzip2QP64 Vzip2QP64
//go:noescape
func Vzip2QP64(r *arm.Poly64X2, v0 *arm.Poly64X2, v1 *arm.Poly64X2)

// Zip vectors (secondary). This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.
//
//go:linkname Vzip2QP8 Vzip2QP8
//go:noescape
func Vzip2QP8(r *arm.Poly8X16, v0 *arm.Poly8X16, v1 *arm.Poly8X16)
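
/*
Illustration only, not part of the generated bindings. The Vzip2* wrappers
above interleave the upper halves of their two sources. The scalar sketch
below models that for 4 lanes of int32 (the shape used by Vzip2QS32); the
helper name zip2 is hypothetical.

```go
package main

import "fmt"

// zip2 models the secondary zip for 4 lanes: it interleaves the upper
// halves of a and b, starting with a[2].
func zip2(a, b [4]int32) (r [4]int32) {
	for i := 0; i < 2; i++ {
		r[2*i], r[2*i+1] = a[i+2], b[i+2]
	}
	return r
}

func main() {
	a := [4]int32{1, 2, 3, 4}
	b := [4]int32{10, 20, 30, 40}
	fmt.Println(zip2(a, b)) // [3 30 4 40]
}
```
*/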

// Interleave two vectors. This intrinsic reads corresponding elements from the two source vectors as pairs, interleaves the pairs, and returns the resulting interleaved vector.
//
//go:linkname VzipP16 VzipP16
//go:noescape
func VzipP16(r *arm.Poly16X4X2, v0 *arm.Poly16X4, v1 *arm.Poly16X4)

// Interleave two vectors. This intrinsic reads corresponding elements from the two source vectors as pairs, interleaves the pairs, and returns the resulting interleaved vector.
//
//go:linkname VzipP8 VzipP8
//go:noescape
func VzipP8(r *arm.Poly8X8X2, v0 *arm.Poly8X8, v1 *arm.Poly8X8)

// Interleave two vectors. This intrinsic reads corresponding elements from the two source vectors as pairs, interleaves the pairs, and returns the resulting interleaved vector.
//
//go:linkname VzipqS8 VzipqS8
//go:noescape
func VzipqS8(r *arm.Int8X16X2, v0 *arm.Int8X16, v1 *arm.Int8X16)

// Interleave two vectors. This intrinsic reads corresponding elements from the two source vectors as pairs, interleaves the pairs, and returns the resulting interleaved vector.
//
//go:linkname VzipqS16 VzipqS16
//go:noescape
func VzipqS16(r *arm.Int16X8X2, v0 *arm.Int16X8, v1 *arm.Int16X8)

// Interleave two vectors. This intrinsic reads corresponding elements from the two source vectors as pairs, interleaves the pairs, and returns the resulting interleaved vector.
//
//go:linkname VzipqS32 VzipqS32
//go:noescape
func VzipqS32(r *arm.Int32X4X2, v0 *arm.Int32X4, v1 *arm.Int32X4)

// Interleave two vectors. This intrinsic reads corresponding elements from the two source vectors as pairs, interleaves the pairs, and returns the resulting interleaved vector.
//
//go:linkname VzipqU8 VzipqU8
//go:noescape
func VzipqU8(r *arm.Uint8X16X2, v0 *arm.Uint8X16, v1 *arm.Uint8X16)

// Interleave two vectors. This intrinsic reads corresponding elements from the two source vectors as pairs, interleaves the pairs, and returns the resulting interleaved vector.
//
//go:linkname VzipqU16 VzipqU16
//go:noescape
func VzipqU16(r *arm.Uint16X8X2, v0 *arm.Uint16X8, v1 *arm.Uint16X8)

// Interleave two vectors. This intrinsic reads corresponding elements from the two source vectors as pairs, interleaves the pairs, and returns the resulting interleaved vector.
//
//go:linkname VzipqU32 VzipqU32
//go:noescape
func VzipqU32(r *arm.Uint32X4X2, v0 *arm.Uint32X4, v1 *arm.Uint32X4)

// Interleave two vectors. This intrinsic reads corresponding elements from the two source vectors as pairs, interleaves the pairs, and returns the resulting interleaved vector.
//
//go:linkname VzipqF32 VzipqF32
//go:noescape
func VzipqF32(r *arm.Float32X4X2, v0 *arm.Float32X4, v1 *arm.Float32X4)

// Interleave two vectors. This intrinsic reads corresponding elements from the two source vectors as pairs, interleaves the pairs, and returns the resulting interleaved vector.
//
//go:linkname VzipqP16 VzipqP16
//go:noescape
func VzipqP16(r *arm.Poly16X8X2, v0 *arm.Poly16X8, v1 *arm.Poly16X8)

// Interleave two vectors. This intrinsic reads corresponding elements from the two source vectors as pairs, interleaves the pairs, and returns the resulting interleaved vector.
//
//go:linkname VzipqP8 VzipqP8
//go:noescape
func VzipqP8(r *arm.Poly8X16X2, v0 *arm.Poly8X16, v1 *arm.Poly8X16)
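
/*
The Vzipq* comments above describe the interleave only in words. As a rough
illustration, here is a scalar model of a 4-lane zip in plain Go. It does not
call NEON, and zip4 is a hypothetical helper that is not part of this package.
It assumes the usual Arm behaviour: the first result vector interleaves the low
halves of the inputs and the second interleaves the high halves. How those two
vectors are laid out inside arm.Int32X4X2 is not shown here.

```go
package main

import "fmt"

// zip4 is a hypothetical scalar model of a 4-lane zip.
// lo interleaves the low halves of a and b, hi interleaves the high halves.
func zip4(a, b [4]int32) (lo, hi [4]int32) {
	for i := 0; i < 2; i++ {
		lo[2*i], lo[2*i+1] = a[i], b[i]
		hi[2*i], hi[2*i+1] = a[i+2], b[i+2]
	}
	return lo, hi
}

func main() {
	a := [4]int32{0, 1, 2, 3}
	b := [4]int32{10, 11, 12, 13}
	lo, hi := zip4(a, b)
	fmt.Println(lo, hi) // [0 10 1 11] [2 12 3 13]
}
```
*/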


================================================
FILE: arm/neon/functions_bypass.go
================================================
package neon

/*
#include <arm_neon.h>
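// vmulS8_bypass multiplies one pair of 8-lane int8 vectors: *r = *v0 * *v1.
void vmulS8_bypass(int8x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vmul_s8(*v0, *v1); }
// vmulS8_full multiplies n int8 elements in blocks of 8.
// Only whole blocks are processed; a tail of n % 8 elements is left untouched.
void vmulS8_full(int8_t* r, int8_t* v0, int8_t* v1, int n) {
	int8x8_t* pr = (int8x8_t*)r;
	int8x8_t* pa = (int8x8_t*)v0;
	int8x8_t* pb = (int8x8_t*)v1;
	// Stop before a partial block so the loop never reads or writes past n.
	for (int i=0; i+8<=n; i+=8) {
		*pr = vmul_s8(*pa, *pb);
		pr += 1;
		pa += 1;
		pb += 1;
	}
}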
*/
import "C"
import "github.com/alivanz/go-simd/arm"

//go:linkname vmulS8_bypass vmulS8_bypass
//go:noescape
func vmulS8_bypass(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8)

//go:linkname vmulS8_full vmulS8_full
//go:noescape
func vmulS8_full(r *int8, v0 *int8, v1 *int8, n int)
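
/*
vmulS8_full works in blocks of eight int8 values, so a length that is not a
multiple of 8 needs separate handling for the leftover elements. The sketch
below shows one possible way a caller might split the work. It is plain Go so
that it runs on any platform: mulChunks is a hypothetical stand-in for the
kernel and mulAll is a hypothetical wrapper. Neither exists in this package.

```go
package main

import "fmt"

// mulChunks stands in for a block kernel such as vmulS8_full.
// It multiplies n elements and expects n to be a multiple of 8.
func mulChunks(r, a, b []int8, n int) {
	for i := 0; i < n; i += 8 {
		for j := i; j < i+8; j++ {
			r[j] = a[j] * b[j]
		}
	}
}

// mulAll handles any length: whole 8-element blocks go to the kernel and
// the remaining 0 to 7 elements are multiplied one at a time.
func mulAll(r, a, b []int8) {
	n := len(r)
	whole := n &^ 7 // largest multiple of 8 that is <= n
	if whole > 0 {
		mulChunks(r, a, b, whole)
	}
	for i := whole; i < n; i++ {
		r[i] = a[i] * b[i]
	}
}

func main() {
	a := []int8{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11}
	b := []int8{2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2}
	r := make([]int8, len(a))
	mulAll(r, a, b)
	fmt.Println(r) // [2 4 6 8 10 12 14 16 18 20 22]
}
```

Keeping the tail in scalar code means the kernel never touches memory beyond
the last whole block.
*/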


================================================
FILE: arm/neon/functions_cgo.go
================================================
package neon

/*
#cgo CFLAGS: -march=armv8.5-a+crypto+i8mm
#include <arm_neon.h>
*/
import "C"

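// int8x8 aliases the C NEON vector type so Go code in this package can name it.
type int8x8 = C.int8x8_t

// vmulS8_cgo calls vmul_s8 through an ordinary cgo call.
// BenchmarkMultSimdCgo uses it for comparison with the linkname-based variants.
func vmulS8_cgo(r, v0, v1 *int8x8) {
	*r = C.vmul_s8(*v0, *v1)
}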


================================================
FILE: arm/neon/functions_test.go
================================================
package neon

import (
	"math/rand"
	"reflect"
	"runtime"
	"testing"
	"unsafe"

	"github.com/alivanz/go-simd/arm"
)

func TestMult(t *testing.T) {
	var (
		a      = arm.Int8X8{0, 1, 2, 3, 4, 5, 6, 7}
		b      = arm.Int8X8{7, 6, 5, 4, 3, 2, 1, 0}
		r      = arm.Int8X8{0, 6, 10, 12, 12, 10, 6, 0}
		result arm.Int8X8
	)
	VmulS8(&result, &a, &b)
	if !reflect.DeepEqual(result, r) {
		t.Fatalf("VmulS8 = %v, want %v", result, r)
	}
}

func TestMultFull(t *testing.T) {
	const N = 1024 * 16
	var (
		a      [N]int8
		b      [N]int8
		ref    [N]int8
		result [N]int8
	)
	for i := 0; i < N; i++ {
		a[i] = int8(rand.Int())
		b[i] = int8(rand.Int())
		ref[i] = a[i] * b[i]
	}
	vmulS8_full(&result[0], &a[0], &b[0], N)
	if !reflect.DeepEqual(result, ref) {
		t.Error("vmulS8_full result does not match the scalar reference")
	}
}

func BenchmarkMultRef(t *testing.B) {
	const N = 1024 * 16
	var (
		a      [N]int8
		b      [N]int8
		result [N]int8
	)
	for j := range a[:] {
		a[j] = int8(rand.Int())
		b[j] = int8(rand.Int())
	}
	t.ResetTimer()
	for i := 0; i < t.N; i++ {
		for j := 0; j < N; j++ {
			result[j] = a[j] * b[j]
		}
	}
	runtime.KeepAlive(&result)
}

func BenchmarkMultSimd(t *testing.B) {
	const N = 1024 * 16
	var (
		a      [N]int8
		b      [N]int8
		result [N]int8
	)
	for i := 0; i < t.N; i++ {
		for j := 0; j < N; j += 8 {
			VmulS8(
				(*arm.Int8X8)(unsafe.Pointer(&result[j])),
				(*arm.Int8X8)(unsafe.Pointer(&a[j])),
				(*arm.Int8X8)(unsafe.Pointer(&b[j])),
			)
		}
	}
}

func BenchmarkMultSimdBypass(t *testing.B) {
	const N = 1024 * 16
	var (
		a      [N]int8
		b      [N]int8
		result [N]int8
	)
	for i := 0; i < t.N; i++ {
		for j := 0; j < N; j += 8 {
			vmulS8_bypass(
				(*arm.Int8X8)(unsafe.Pointer(&result[j])),
				(*arm.Int8X8)(unsafe.Pointer(&a[j])),
				(*arm.Int8X8)(unsafe.Pointer(&b[j])),
			)
		}
	}
}

func BenchmarkMultSimdFull(t *testing.B) {
	const N = 1024 * 16
	var (
		a      [N]int8
		b      [N]int8
		result [N]int8
	)
	for i := 0; i < t.N; i++ {
		vmulS8_full(
			&result[0],
			&a[0],
			&b[0],
			N,
		)
	}
}

func BenchmarkMultSimdCgo(t *testing.B) {
	const N = 1024 * 16
	var (
		a      [N]int8
		b      [N]int8
		result [N]int8
	)
	for i := 0; i < t.N; i++ {
		for j := 0; j < N; j += 8 {
			vmulS8_cgo(
				(*int8x8)(unsafe.Pointer(&result[j])),
				(*int8x8)(unsafe.Pointer(&a[j])),
				(*int8x8)(unsafe.Pointer(&b[j])),
			)
		}
	}
}


================================================
FILE: arm/neon/loops.c
================================================
#include <arm_neon.h>

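// save and load are the scalar counterparts of the vst1/vld1 intrinsics.
#define save(dst, src) *(dst) = (src)
#define load(src) (*(src))
// LOOP1 defines name(r, v, n), which applies the unary intrinsic f across an array.
// n counts result elements. Each iteration loads from v, stores rstep results to r,
// then advances r by rstep and v by istep elements.
// A tail of fewer than rstep result elements is left unprocessed.
#define LOOP1(name, rtype, itype, f, set, load, rstep, istep) \
    void name(rtype *r, itype *v, int32_t n)                  \
    {                                                         \
        while (n >= rstep)                                    \
        {                                                     \
            set(r, f(load(v)));                               \
            r += rstep;                                       \
            n -= rstep;                                       \
            v += istep;                                       \
        }                                                     \
    }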

LOOP1(VabsS8N, int8_t, int8_t, vabs_s8, vst1_s8, vld1_s8, 8, 8)
LOOP1(VabsS16N, int16_t, int16_t, vabs_s16, vst1_s16, vld1_s16, 4, 4)
LOOP1(VabsS32N, int32_t, int32_t, vabs_s32, vst1_s32, vld1_s32, 2, 2)
LOOP1(VabsS64N, int64_t, int64_t, vabs_s64, vst1_s64, vld1_s64, 1, 1)
LOOP1(VabsF32N, float32_t, float32_t, vabs_f32, vst1_f32, vld1_f32, 2, 2)
LOOP1(VabsF64N, float64_t, float64_t, vabs_f64, vst1_f64, vld1_f64, 1, 1)
LOOP1(VabsdS64N, int64_t, int64_t, vabsd_s64, save, load, 1, 1)
LOOP1(VabsqS8N, int8_t, int8_t, vabsq_s8, vst1q_s8, vld1q_s8, 16, 16)
LOOP1(VabsqS16N, int16_t, int16_t, vabsq_s16, vst1q_s16, vld1q_s16, 8, 8)
LOOP1(VabsqS32N, int32_t, int32_t, vabsq_s32, vst1q_s32, vld1q_s32, 4, 4)
LOOP1(VabsqS64N, int64_t, int64_t, vabsq_s64, vst1q_s64, vld1q_s64, 2, 2)
LOOP1(VabsqF32N, float32_t, float32_t, vabsq_f32, vst1q_f32, vld1q_f32, 4, 4)
LOOP1(VabsqF64N, float64_t, float64_t, vabsq_f64, vst1q_f64, vld1q_f64, 2, 2)
LOOP1(VaddvS8N, int8_t, int8_t, vaddv_s8, save, vld1_s8, 1, 8)
LOOP1(VaddvS16N, int16_t, int16_t, vaddv_s16, save, vld1_s16, 1, 4)
LOOP1(VaddvS32N, int32_t, int32_t, vaddv_s32, save, vld1_s32, 1, 2)
LOOP1(VaddvU8N, uint8_t, uint8_t, vaddv_u8, save, vld1_u8, 1, 8)
LOOP1(VaddvU16N, uint16_t, uint16_t, vaddv_u16, save, vld1_u16, 1, 4)
LOOP1(VaddvU32N, uint32_t, uint32_t, vaddv_u32, save, vld1_u32, 1, 2)
LOOP1(VaddvF32N, float32_t, float32_t, vaddv_f32, save, vld1_f32, 1, 2)
LOOP1(VaddvqS8N, int8_t, int8_t, vaddvq_s8, save, vld1q_s8, 1, 16)
LOOP1(VaddvqS16N, int16_t, int16_t, vaddvq_s16, save, vld1q_s16, 1, 8)
LOOP1(VaddvqS32N, int32_t, int32_t, vaddvq_s32, save, vld1q_s32, 1, 4)
LOOP1(VaddvqS64N, int64_t, int64_t, vaddvq_s64, save, vld1q_s64, 1, 2)
LOOP1(VaddvqU8N, uint8_t, uint8_t, vaddvq_u8, save, vld1q_u8, 1, 16)
LOOP1(VaddvqU16N, uint16_t, uint16_t, vaddvq_u16, save, vld1q_u16, 1, 8)
LOOP1(VaddvqU32N, uint32_t, uint32_t, vaddvq_u32, save, vld1q_u32, 1, 4)
LOOP1(VaddvqU64N, uint64_t, uint64_t, vaddvq_u64, save, vld1q_u64, 1, 2)
LOOP1(VaddvqF32N, float32_t, float32_t, vaddvq_f32, save, vld1q_f32, 1, 4)
LOOP1(VaddvqF64N, float64_t, float64_t, vaddvq_f64, save, vld1q_f64, 1, 2)
LOOP1(VaesimcqU8N, uint8_t, uint8_t, vaesimcq_u8, vst1q_u8, vld1q_u8, 16, 16)
LOOP1(VaesmcqU8N, uint8_t, uint8_t, vaesmcq_u8, vst1q_u8, vld1q_u8, 16, 16)
LOOP1(VceqzS8N, uint8_t, int8_t, vceqz_s8, vst1_u8, vld1_s8, 8, 8)
LOOP1(VceqzS16N, uint16_t, int16_t, vceqz_s16, vst1_u16, vld1_s16, 4, 4)
LOOP1(VceqzS32N, uint32_t, int32_t, vceqz_s32, vst1_u32, vld1_s32, 2, 2)
LOOP1(VceqzS64N, uint64_t, int64_t, vceqz_s64, vst1_u64, vld1_s64, 1, 1)
LOOP1(VceqzU8N, uint8_t, uint8_t, vceqz_u8, vst1_u8, vld1_u8, 8, 8)
LOOP1(VceqzU16N, uint16_t, uint16_t, vceqz_u16, vst1_u16, vld1_u16, 4, 4)
LOOP1(VceqzU32N, uint32_t, uint32_t, vceqz_u32, vst1_u32, vld1_u32, 2, 2)
LOOP1(VceqzU64N, uint64_t, uint64_t, vceqz_u64, vst1_u64, vld1_u64, 1, 1)
LOOP1(VceqzF32N, uint32_t, float32_t, vceqz_f32, vst1_u32, vld1_f32, 2, 2)
LOOP1(VceqzF64N, uint64_t, float64_t, vceqz_f64, vst1_u64, vld1_f64, 1, 1)
LOOP1(VceqzdS64N, uint64_t, int64_t, vceqzd_s64, save, load, 1, 1)
LOOP1(VceqzdU64N, uint64_t, uint64_t, vceqzd_u64, save, load, 1, 1)
LOOP1(VceqzdF64N, uint64_t, float64_t, vceqzd_f64, save, load, 1, 1)
LOOP1(VceqzqS8N, uint8_t, int8_t, vceqzq_s8, vst1q_u8, vld1q_s8, 16, 16)
LOOP1(VceqzqS16N, uint16_t, int16_t, vceqzq_s16, vst1q_u16, vld1q_s16, 8, 8)
LOOP1(VceqzqS32N, uint32_t, int32_t, vceqzq_s32, vst1q_u32, vld1q_s32, 4, 4)
LOOP1(VceqzqS64N, uint64_t, int64_t, vceqzq_s64, vst1q_u64, vld1q_s64, 2, 2)
LOOP1(VceqzqU8N, uint8_t, uint8_t, vceqzq_u8, vst1q_u8, vld1q_u8, 16, 16)
LOOP1(VceqzqU16N, uint16_t, uint16_t, vceqzq_u16, vst1q_u16, vld1q_u16, 8, 8)
LOOP1(VceqzqU32N, uint32_t, uint32_t, vceqzq_u32, vst1q_u32, vld1q_u32, 4, 4)
LOOP1(VceqzqU64N, uint64_t, uint64_t, vceqzq_u64, vst1q_u64, vld1q_u64, 2, 2)
LOOP1(VceqzqF32N, uint32_t, float32_t, vceqzq_f32, vst1q_u32, vld1q_f32, 4, 4)
LOOP1(VceqzqF64N, uint64_t, float64_t, vceqzq_f64, vst1q_u64, vld1q_f64, 2, 2)
LOOP1(VceqzsF32N, uint32_t, float32_t, vceqzs_f32, save, load, 1, 1)
LOOP1(VcgezS8N, uint8_t, int8_t, vcgez_s8, vst1_u8, vld1_s8, 8, 8)
LOOP1(VcgezS16N, uint16_t, int16_t, vcgez_s16, vst1_u16, vld1_s16, 4, 4)
LOOP1(VcgezS32N, uint32_t, int32_t, vcgez_s32, vst1_u32, vld1_s32, 2, 2)
LOOP1(VcgezS64N, uint64_t, int64_t, vcgez_s64, vst1_u64, vld1_s64, 1, 1)
LOOP1(VcgezF32N, uint32_t, float32_t, vcgez_f32, vst1_u32, vld1_f32, 2, 2)
LOOP1(VcgezF64N, uint64_t, float64_t, vcgez_f64, vst1_u64, vld1_f64, 1, 1)
LOOP1(VcgezdS64N, uint64_t, int64_t, vcgezd_s64, save, load, 1, 1)
LOOP1(VcgezdF64N, uint64_t, float64_t, vcgezd_f64, save, load, 1, 1)
LOOP1(VcgezqS8N, uint8_t, int8_t, vcgezq_s8, vst1q_u8, vld1q_s8, 16, 16)
LOOP1(VcgezqS16N, uint16_t, int16_t, vcgezq_s16, vst1q_u16, vld1q_s16, 8, 8)
LOOP1(VcgezqS32N, uint32_t, int32_t, vcgezq_s32, vst1q_u32, vld1q_s32, 4, 4)
LOOP1(VcgezqS64N, uint64_t, int64_t, vcgezq_s64, vst1q_u64, vld1q_s64, 2, 2)
LOOP1(VcgezqF32N, uint32_t, float32_t, vcgezq_f32, vst1q_u32, vld1q_f32, 4, 4)
LOOP1(VcgezqF64N, uint64_t, float64_t, vcgezq_f64, vst1q_u64, vld1q_f64, 2, 2)
LOOP1(VcgezsF32N, uint32_t, float32_t, vcgezs_f32, save, load, 1, 1)
LOOP1(VcgtzS8N, uint8_t, int8_t, vcgtz_s8, vst1_u8, vld1_s8, 8, 8)
LOOP1(VcgtzS16N, uint16_t, int16_t, vcgtz_s16, vst1_u16, vld1_s16, 4, 4)
LOOP1(VcgtzS32N, uint32_t, int32_t, vcgtz_s32, vst1_u32, vld1_s32, 2, 2)
LOOP1(VcgtzS64N, uint64_t, int64_t, vcgtz_s64, vst1_u64, vld1_s64, 1, 1)
LOOP1(VcgtzF32N, uint32_t, float32_t, vcgtz_f32, vst1_u32, vld1_f32, 2, 2)
LOOP1(VcgtzF64N, uint64_t, float64_t, vcgtz_f64, vst1_u64, vld1_f64, 1, 1)
LOOP1(VcgtzdS64N, uint64_t, int64_t, vcgtzd_s64, save, load, 1, 1)
LOOP1(VcgtzdF64N, uint64_t, float64_t, vcgtzd_f64, save, load, 1, 1)
LOOP1(VcgtzqS8N, uint8_t, int8_t, vcgtzq_s8, vst1q_u8, vld1q_s8, 16, 16)
LOOP1(VcgtzqS16N, uint16_t, int16_t, vcgtzq_s16, vst1q_u16, vld1q_s16, 8, 8)
LOOP1(VcgtzqS32N, uint32_t, int32_t, vcgtzq_s32, vst1q_u32, vld1q_s32, 4, 4)
LOOP1(VcgtzqS64N, uint64_t, int64_t, vcgtzq_s64, vst1q_u64, vld1q_s64, 2, 2)
LOOP1(VcgtzqF32N, uint32_t, float32_t, vcgtzq_f32, vst1q_u32, vld1q_f32, 4, 4)
LOOP1(VcgtzqF64N, uint64_t, float64_t, vcgtzq_f64, vst1q_u64, vld1q_f64, 2, 2)
LOOP1(VcgtzsF32N, uint32_t, float32_t, vcgtzs_f32, save, load, 1, 1)
LOOP1(VclezS8N, uint8_t, int8_t, vclez_s8, vst1_u8, vld1_s8, 8, 8)
LOOP1(VclezS16N, uint16_t, int16_t, vclez_s16, vst1_u16, vld1_s16, 4, 4)
LOOP1(VclezS32N, uint32_t, int32_t, vclez_s32, vst1_u32, vld1_s32, 2, 2)
LOOP1(VclezS64N, uint64_t, int64_t, vclez_s64, vst1_u64, vld1_s64, 1, 1)
LOOP1(VclezF32N, uint32_t, float32_t, vclez_f32, vst1_u32, vld1_f32, 2, 2)
LOOP1(VclezF64N, uint64_t, float64_t, vclez_f64, vst1_u64, vld1_f64, 1, 1)
LOOP1(VclezdS64N, uint64_t, int64_t, vclezd_s64, save, load, 1, 1)
LOOP1(VclezdF64N, uint64_t, float64_t, vclezd_f64, save, load, 1, 1)
LOOP1(VclezqS8N, uint8_t, int8_t, vclezq_s8, vst1q_u8, vld1q_s8, 16, 16)
LOOP1(VclezqS16N, uint16_t, int16_t, vclezq_s16, vst1q_u16, vld1q_s16, 8, 8)
LOOP1(VclezqS32N, uint32_t, int32_t, vclezq_s32, vst1q_u32, vld1q_s32, 4, 4)
LOOP1(VclezqS64N, uint64_t, int64_t, vclezq_s64, vst1q_u64, vld1q_s64, 2, 2)
LOOP1(VclezqF32N, uint32_t, float32_t, vclezq_f32, vst1q_u32, vld1q_f32, 4, 4)
LOOP1(VclezqF64N, uint64_t, float64_t, vclezq_f64, vst1q_u64, vld1q_f64, 2, 2)
LOOP1(VclezsF32N, uint32_t, float32_t, vclezs_f32, save, load, 1, 1)
LOOP1(VclsS8N, int8_t, int8_t, vcls_s8, vst1_s8, vld1_s8, 8, 8)
LOOP1(VclsS16N, int16_t, int16_t, vcls_s16, vst1_s16, vld1_s16, 4, 4)
LOOP1(VclsS32N, int32_t, int32_t, vcls_s32, vst1_s32, vld1_s32, 2, 2)
LOOP1(VclsU8N, int8_t, uint8_t, vcls_u8, vst1_s8, vld1_u8, 8, 8)
LOOP1(VclsU16N, int16_t, uint16_t, vcls_u16, vst1_s16, vld1_u16, 4, 4)
LOOP1(VclsU32N, int32_t, uint32_t, vcls_u32, vst1_s32, vld1_u32, 2, 2)
LOOP1(VclsqS8N, int8_t, int8_t, vclsq_s8, vst1q_s8, vld1q_s8, 16, 16)
LOOP1(VclsqS16N, int16_t, int16_t, vclsq_s16, vst1q_s16, vld1q_s16, 8, 8)
LOOP1(VclsqS32N, int32_t, int32_t, vclsq_s32, vst1q_s32, vld1q_s32, 4, 4)
LOOP1(VclsqU8N, int8_t, uint8_t, vclsq_u8, vst1q_s8, vld1q_u8, 16, 16)
LOOP1(VclsqU16N, int16_t, uint16_t, vclsq_u16, vst1q_s16, vld1q_u16, 8, 8)
LOOP1(VclsqU32N, int32_t, uint32_t, vclsq_u32, vst1q_s32, vld1q_u32, 4, 4)
LOOP1(VcltzS8N, uint8_t, int8_t, vcltz_s8, vst1_u8, vld1_s8, 8, 8)
LOOP1(VcltzS16N, uint16_t, int16_t, vcltz_s16, vst1_u16, vld1_s16, 4, 4)
LOOP1(VcltzS32N, uint32_t, int32_t, vcltz_s32, vst1_u32, vld1_s32, 2, 2)
LOOP1(VcltzS64N, uint64_t, int64_t, vcltz_s64, vst1_u64, vld1_s64, 1, 1)
LOOP1(VcltzF32N, uint32_t, float32_t, vcltz_f32, vst1_u32, vld1_f32, 2, 2)
LOOP1(VcltzF64N, uint64_t, float64_t, vcltz_f64, vst1_u64, vld1_f64, 1, 1)
LOOP1(VcltzdS64N, uint64_t, int64_t, vcltzd_s64, save, load, 1, 1)
LOOP1(VcltzdF64N, uint64_t, float64_t, vcltzd_f64, save, load, 1, 1)
LOOP1(VcltzqS8N, uint8_t, int8_t, vcltzq_s8, vst1q_u8, vld1q_s8, 16, 16)
LOOP1(VcltzqS16N, uint16_t, int16_t, vcltzq_s16, vst1q_u16, vld1q_s16, 8, 8)
LOOP1(VcltzqS32N, uint32_t, int32_t, vcltzq_s32, vst1q_u32, vld1q_s32, 4, 4)
LOOP1(VcltzqS64N, uint64_t, int64_t, vcltzq_s64, vst1q_u64, vld1q_s64, 2, 2)
LOOP1(VcltzqF32N, uint32_t, float32_t, vcltzq_f32, vst1q_u32, vld1q_f32, 4, 4)
LOOP1(VcltzqF64N, uint64_t, float64_t, vcltzq_f64, vst1q_u64, vld1q_f64, 2, 2)
LOOP1(VcltzsF32N, uint32_t, float32_t, vcltzs_f32, save, load, 1, 1)
LOOP1(VclzS8N, int8_t, int8_t, vclz_s8, vst1_s8, vld1_s8, 8, 8)
LOOP1(VclzS16N, int16_t, int16_t, vclz_s16, vst1_s16, vld1_s16, 4, 4)
LOOP1(VclzS32N, int32_t, int32_t, vclz_s32, vst1_s32, vld1_s32, 2, 2)
LOOP1(VclzU8N, uint8_t, uint8_t, vclz_u8, vst1_u8, vld1_u8, 8, 8)
LOOP1(VclzU16N, uint16_t, uint16_t, vclz_u16, vst1_u16, vld1_u16, 4, 4)
LOOP1(VclzU32N, uint32_t, uint32_t, vclz_u32, vst1_u32, vld1_u32, 2, 2)
LOOP1(VclzqS8N, int8_t, int8_t, vclzq_s8, vst1q_s8, vld1q_s8, 16, 16)
LOOP1(VclzqS16N, int16_t, int16_t, vclzq_s16, vst1q_s16, vld1q_s16, 8, 8)
LOOP1(VclzqS32N, int32_t, int32_t, vclzq_s32, vst1q_s32, vld1q_s32, 4, 4)
LOOP1(VclzqU8N, uint8_t, uint8_t, vclzq_u8, vst1q_u8, vld1q_u8, 16, 16)
LOOP1(VclzqU16N, uint16_t, uint16_t, vclzq_u16, vst1q_u16, vld1q_u16, 8, 8)
LOOP1(VclzqU32N, uint32_t, uint32_t, vclzq_u32, vst1q_u32, vld1q_u32, 4, 4)
LOOP1(VcntS8N, int8_t, int8_t, vcnt_s8, vst1_s8, vld1_s8, 8, 8)
LOOP1(VcntU8N, uint8_t, uint8_t, vcnt_u8, vst1_u8, vld1_u8, 8, 8)
LOOP1(VcntqS8N, int8_t, int8_t, vcntq_s8, vst1q_s8, vld1q_s8, 16, 16)
LOOP1(VcntqU8N, uint8_t, uint8_t, vcntq_u8, vst1q_u8, vld1q_u8, 16, 16)
LOOP1(VcvtF32S32N, float32_t, int32_t, vcvt_f32_s32, vst1_f32, vld1_s32, 2, 2)
LOOP1(VcvtF32U32N, float32_t, uint32_t, vcvt_f32_u32, vst1_f32, vld1_u32, 2, 2)
LOOP1(VcvtF64S64N, float64_t, int64_t, vcvt_f64_s64, vst1_f64, vld1_s64, 1, 1)
LOOP1(VcvtF64U64N, float64_t, uint64_t, vcvt_f64_u64, vst1_f64, vld1_u64, 1, 1)
LOOP1(VcvtS32F32N, int32_t, float32_t, vcvt_s32_f32, vst1_s32, vld1_f32, 2, 2)
LOOP1(VcvtS64F64N, int64_t, float64_t, vcvt_s64_f64, vst1_s64, vld1_f64, 1, 1)
LOOP1(VcvtU32F32N, uint32_t, float32_t, vcvt_u32_f32, vst1_u32, vld1_f32, 2, 2)
LOOP1(VcvtU64F64N, uint64_t, float64_t, vcvt_u64_f64, vst1_u64, vld1_f64, 1, 1)
LOOP1(VcvtaS32F32N, int32_t, float32_t, vcvta_s32_f32, vst1_s32, vld1_f32, 2, 2)
LOOP1(VcvtaS64F64N, int64_t, float64_t, vcvta_s64_f64, vst1_s64, vld1_f64, 1, 1)
LOOP1(VcvtaU32F32N, uint32_t, float32_t, vcvta_u32_f32, vst1_u32, vld1_f32, 2, 2)
LOOP1(VcvtaU64F64N, uint64_t, float64_t, vcvta_u64_f64, vst1_u64, vld1_f64, 1, 1)
LOOP1(VcvtadS64F64N, int64_t, float64_t, vcvtad_s64_f64, save, load, 1, 1)
LOOP1(VcvtadU64F64N, uint64_t, float64_t, vcvtad_u64_f64, save, load, 1, 1)
LOOP1(VcvtaqS32F32N, int32_t, float32_t, vcvtaq_s32_f32, vst1q_s32, vld1q_f32, 4, 4)
LOOP1(VcvtaqS64F64N, int64_t, float64_t, vcvtaq_s64_f64, vst1q_s64, vld1q_f64, 2, 2)
LOOP1(VcvtaqU32F32N, uint32_t, float32_t, vcvtaq_u32_f32, vst1q_u32, vld1q_f32, 4, 4)
LOOP1(VcvtaqU64F64N, uint64_t, float64_t, vcvtaq_u64_f64, vst1q_u64, vld1q_f64, 2, 2)
LOOP1(VcvtasS32F32N, int32_t, float32_t, vcvtas_s32_f32, save, load, 1, 1)
LOOP1(VcvtasU32F32N, uint32_t, float32_t, vcvtas_u32_f32, save, load, 1, 1)
LOOP1(VcvtdF64S64N, float64_t, int64_t, vcvtd_f64_s64, save, load, 1, 1)
LOOP1(VcvtdF64U64N, float64_t, uint64_t, vcvtd_f64_u64, save, load, 1, 1)
LOOP1(VcvtdS64F64N, int64_t, float64_t, vcvtd_s64_f64, save, load, 1, 1)
LOOP1(VcvtdU64F64N, uint64_t, float64_t, vcvtd_u64_f64, save, load, 1, 1)
LOOP1(VcvtmS32F32N, int32_t, float32_t, vcvtm_s32_f32, vst1_s32, vld1_f32, 2, 2)
LOOP1(VcvtmS64F64N, int64_t, float64_t, vcvtm_s64_f64, vst1_s64, vld1_f64, 1, 1)
LOOP1(VcvtmU32F32N, uint32_t, float32_t, vcvtm_u32_f32, vst1_u32, vld1_f32, 2, 2)
LOOP1(VcvtmU64F64N, uint64_t, float64_t, vcvtm_u64_f64, vst1_u64, vld1_f64, 1, 1)
LOOP1(VcvtmdS64F64N, int64_t, float64_t, vcvtmd_s64_f64, save, load, 1, 1)
LOOP1(VcvtmdU64F64N, uint64_t, float64_t, vcvtmd_u64_f64, save, load, 1, 1)
LOOP1(VcvtmqS32F32N, int32_t, float32_t, vcvtmq_s32_f32, vst1q_s32, vld1q_f32, 4, 4)
LOOP1(VcvtmqS64F64N, int64_t, float64_t, vcvtmq_s64_f64, vst1q_s64, vld1q_f64, 2, 2)
LOOP1(VcvtmqU32F32N, uint32_t, float32_t, vcvtmq_u32_f32, vst1q_u32, vld1q_f32, 4, 4)
LOOP1(VcvtmqU64F64N, uint64_t, float64_t, vcvtmq_u64_f64, vst1q_u64, vld1q_f64, 2, 2)
LOOP1(VcvtmsS32F32N, int32_t, float32_t, vcvtms_s32_f32, save, load, 1, 1)
LOOP1(VcvtmsU32F32N, uint32_t, float32_t, vcvtms_u32_f32, save, load, 1, 1)
LOOP1(VcvtnS32F32N, int32_t, float32_t, vcvtn_s32_f32, vst1_s32, vld1_f32, 2, 2)
LOOP1(VcvtnS64F64N, int64_t, float64_t, vcvtn_s64_f64, vst1_s64, vld1_f64, 1, 1)
LOOP1(VcvtnU32F32N, uint32_t, float32_t, vcvtn_u32_f32, vst1_u32, vld1_f32, 2, 2)
LOOP1(VcvtnU64F64N, uint64_t, float64_t, vcvtn_u64_f64, vst1_u64, vld1_f64, 1, 1)
LOOP1(VcvtndS64F64N, int64_t, float64_t, vcvtnd_s64_f64, save, load, 1, 1)
LOOP1(VcvtndU64F64N, uint64_t, float64_t, vcvtnd_u64_f64, save, load, 1, 1)
LOOP1(VcvtnqS32F32N, int32_t, float32_t, vcvtnq_s32_f32, vst1q_s32, vld1q_f32, 4, 4)
LOOP1(VcvtnqS64F64N, int64_t, float64_t, vcvtnq_s64_f64, vst1q_s64, vld1q_f64, 2, 2)
LOOP1(VcvtnqU32F32N, uint32_t, float32_t, vcvtnq_u32_f32, vst1q_u32, vld1q_f32, 4, 4)
LOOP1(VcvtnqU64F64N, uint64_t, float64_t, vcvtnq_u64_f64, vst1q_u64, vld1q_f64, 2, 2)
LOOP1(VcvtnsS32F32N, int32_t, float32_t, vcvtns_s32_f32, save, load, 1, 1)
LOOP1(VcvtnsU32F32N, uint32_t, float32_t, vcvtns_u32_f32, save, load, 1, 1)
LOOP1(VcvtpS32F32N, int32_t, float32_t, vcvtp_s32_f32, vst1_s32, vld1_f32, 2, 2)
LOOP1(VcvtpS64F64N, int64_t, float64_t, vcvtp_s64_f64, vst1_s64, vld1_f64, 1, 1)
LOOP1(VcvtpU32F32N, uint32_t, float32_t, vcvtp_u32_f32, vst1_u32, vld1_f32, 2, 2)
LOOP1(VcvtpU64F64N, uint64_t, float64_t, vcvtp_u64_f64, vst1_u64, vld1_f64, 1, 1)
LOOP1(VcvtpdS64F64N, int64_t, float64_t, vcvtpd_s64_f64, save, load, 1, 1)
LOOP1(VcvtpdU64F64N, uint64_t, float64_t, vcvtpd_u64_f64, save, load, 1, 1)
LOOP1(VcvtpqS32F32N, int32_t, float32_t, vcvtpq_s32_f32, vst1q_s32, vld1q_f32, 4, 4)
LOOP1(VcvtpqS64F64N, int64_t, float64_t, vcvtpq_s64_f64, vst1q_s64, vld1q_f64, 2, 2)
LOOP1(VcvtpqU32F32N, uint32_t, float32_t, vcvtpq_u32_f32, vst1q_u32, vld1q_f32, 4, 4)
LOOP1(VcvtpqU64F64N, uint64_t, float64_t, vcvtpq_u64_f64, vst1q_u64, vld1q_f64, 2, 2)
LOOP1(VcvtpsS32F32N, int32_t, float32_t, vcvtps_s32_f32, save, load, 1, 1)
LOOP1(VcvtpsU32F32N, uint32_t, float32_t, vcvtps_u32_f32, save, load, 1, 1)
LOOP1(VcvtqF32S32N, float32_t, int32_t, vcvtq_f32_s32, vst1q_f32, vld1q_s32, 4, 4)
LOOP1(VcvtqF32U32N, float32_t, uint32_t, vcvtq_f32_u32, vst1q_f32, vld1q_u32, 4, 4)
LOOP1(VcvtqF64S64N, float64_t, int64_t, vcvtq_f64_s64, vst1q_f64, vld1q_s64, 2, 2)
LOOP1(VcvtqF64U64N, float64_t, uint64_t, vcvtq_f64_u64, vst1q_f64, vld1q_u64, 2, 2)
LOOP1(VcvtqS32F32N, int32_t, float32_t, vcvtq_s32_f32, vst1q_s32, vld1q_f32, 4, 4)
LOOP1(VcvtqS64F64N, int64_t, float64_t, vcvtq_s64_f64, vst1q_s64, vld1q_f64, 2, 2)
LOOP1(VcvtqU32F32N, uint32_t, float32_t, vcvtq_u32_f32, vst1q_u32, vld1q_f32, 4, 4)
LOOP1(VcvtqU64F64N, uint64_t, float64_t, vcvtq_u64_f64, vst1q_u64, vld1q_f64, 2, 2)
LOOP1(VcvtsF32S32N, float32_t, int32_t, vcvts_f32_s32, save, load, 1, 1)
LOOP1(VcvtsF32U32N, float32_t, uint32_t, vcvts_f32_u32, save, load, 1, 1)
LOOP1(VcvtsS32F32N, int32_t, float32_t, vcvts_s32_f32, save, load, 1, 1)
LOOP1(VcvtsU32F32N, uint32_t, float32_t, vcvts_u32_f32, save, load, 1, 1)
LOOP1(VdupNS8N, int8_t, int8_t, vdup_n_s8, vst1_s8, load, 8, 1)
LOOP1(VdupNS16N, int16_t, int16_t, vdup_n_s16, vst1_s16, load, 4, 1)
LOOP1(VdupNS32N, int32_t, int32_t, vdup_n_s32, vst1_s32, load, 2, 1)
LOOP1(VdupNS64N, int64_t, int64_t, vdup_n_s64, vst1_s64, load, 1, 1)
LOOP1(VdupNU8N, uint8_t, uint8_t, vdup_n_u8, vst1_u8, load, 8, 1)
LOOP1(VdupNU16N, uint16_t, uint16_t, vdup_n_u16, vst1_u16, load, 4, 1)
LOOP1(VdupNU32N, uint32_t, uint32_t, vdup_n_u32, vst1_u32, load, 2, 1)
LOOP1(VdupNU64N, uint64_t, uint64_t, vdup_n_u64, vst1_u64, load, 1, 1)
LOOP1(VdupNF32N, float32_t, float32_t, vdup_n_f32, vst1_f32, load, 2, 1)
LOOP1(VdupNF64N, float64_t, float64_t, vdup_n_f64, vst1_f64, load, 1, 1)
LOOP1(VdupqNS8N, int8_t, int8_t, vdupq_n_s8, vst1q_s8, load, 16, 1)
LOOP1(VdupqNS16N, int16_t, int16_t, vdupq_n_s16, vst1q_s16, load, 8, 1)
LOOP1(VdupqNS32N, int32_t, int32_t, vdupq_n_s32, vst1q_s32, load, 4, 1)
LOOP1(VdupqNS64N, int64_t, int64_t, vdupq_n_s64, vst1q_s64, load, 2, 1)
LOOP1(VdupqNU8N, uint8_t, uint8_t, vdupq_n_u8, vst1q_u8, load, 16, 1)
LOOP1(VdupqNU16N, uint16_t, uint16_t, vdupq_n_u16, vst1q_u16, load, 8, 1)
LOOP1(VdupqNU32N, uint32_t, uint32_t, vdupq_n_u32, vst1q_u32, load, 4, 1)
LOOP1(VdupqNU64N, uint64_t, uint64_t, vdupq_n_u64, vst1q_u64, load, 2, 1)
LOOP1(VdupqNF32N, float32_t, float32_t, vdupq_n_f32, vst1q_f32, load, 4, 1)
LOOP1(VdupqNF64N, float64_t, float64_t, vdupq_n_f64, vst1q_f64, load, 2, 1)
LOOP1(VgetHighS8N, int8_t, int8_t, vget_high_s8, vst1_s8, vld1q_s8, 8, 16)
LOOP1(VgetHighS16N, int16_t, int16_t, vget_high_s16, vst1_s16, vld1q_s16, 4, 8)
LOOP1(VgetHighS32N, int32_t, int32_t, vget_high_s32, vst1_s32, vld1q_s32, 2, 4)
LOOP1(VgetHighS64N, int64_t, int64_t, vget_high_s64, vst1_s64, vld1q_s64, 1, 2)
LOOP1(VgetHighU8N, uint8_t, uint8_t, vget_high_u8, vst1_u8, vld1q_u8, 8, 16)
LOOP1(VgetHighU16N, uint16_t, uint16_t, vget_high_u16, vst1_u16, vld1q_u16, 4, 8)
LOOP1(VgetHighU32N, uint32_t, uint32_t, vget_high_u32, vst1_u32, vld1q_u32, 2, 4)
LOOP1(VgetHighU64N, uint64_t, uint64_t, vget_high_u64, vst1_u64, vld1q_u64, 1, 2)
LOOP1(VgetHighF32N, float32_t, float32_t, vget_high_f32, vst1_f32, vld1q_f32, 2, 4)
LOOP1(VgetHighF64N, float64_t, float64_t, vget_high_f64, vst1_f64, vld1q_f64, 1, 2)
LOOP1(VgetLowS8N, int8_t, int8_t, vget_low_s8, vst1_s8, vld1q_s8, 8, 16)
LOOP1(VgetLowS16N, int16_t, int16_t, vget_low_s16, vst1_s16, vld1q_s16, 4, 8)
LOOP1(VgetLowS32N, int32_t, int32_t, vget_low_s32, vst1_s32, vld1q_s32, 2, 4)
LOOP1(VgetLowS64N, int64_t, int64_t, vget_low_s64, vst1_s64, vld1q_s64, 1, 2)
LOOP1(VgetLowU8N, uint8_t, uint8_t, vget_low_u8, vst1_u8, vld1q_u8, 8, 16)
LOOP1(VgetLowU16N, uint16_t, uint16_t, vget_low_u16, vst1_u16, vld1q_u16, 4, 8)
LOOP1(VgetLowU32N, uint32_t, uint32_t, vget_low_u32, vst1_u32, vld1q_u32, 2, 4)
LOOP1(VgetLowU64N, uint64_t, uint64_t, vget_low_u64, vst1_u64, vld1q_u64, 1, 2)
LOOP1(VgetLowF32N, float32_t, float32_t, vget_low_f32, vst1_f32, vld1q_f32, 2, 4)
LOOP1(VgetLowF64N, float64_t, float64_t, vget_low_f64, vst1_f64, vld1q_f64, 1, 2)
LOOP1(VmaxnmvF32N, float32_t, float32_t, vmaxnmv_f32, save, vld1_f32, 1, 2)
LOOP1(VmaxnmvqF32N, float32_t, float32_t, vmaxnmvq_f32, save, vld1q_f32, 1, 4)
LOOP1(VmaxnmvqF64N, float64_t, float64_t, vmaxnmvq_f64, save, vld1q_f64, 1, 2)
LOOP1(VmaxvS8N, int8_t, int8_t, vmaxv_s8, save, vld1_s8, 1, 8)
LOOP1(VmaxvS16N, int16_t, int16_t, vmaxv_s16, save, vld1_s16, 1, 4)
LOOP1(VmaxvS32N, int32_t, int32_t, vmaxv_s32, save, vld1_s32, 1, 2)
LOOP1(VmaxvU8N, uint8_t, uint8_t, vmaxv_u8, save, vld1_u8, 1, 8)
LOOP1(VmaxvU16N, uint16_t, uint16_t, vmaxv_u16, save, vld1_u16, 1, 4)
LOOP1(VmaxvU32N, uint32_t, uint32_t, vmaxv_u32, save, vld1_u32, 1, 2)
LOOP1(VmaxvF32N, float32_t, float32_t, vmaxv_f32, save, vld1_f32, 1, 2)
LOOP1(VmaxvqS8N, int8_t, int8_t, vmaxvq_s8, save, vld1q_s8, 1, 16)
LOOP1(VmaxvqS16N, int16_t, int16_t, vmaxvq_s16, save, vld1q_s16, 1, 8)
LOOP1(VmaxvqS32N, int32_t, int32_t, vmaxvq_s32, save, vld1q_s32, 1, 4)
LOOP1(VmaxvqU8N, uint8_t, uint8_t, vmaxvq_u8, save, vld1q_u8, 1, 16)
LOOP1(VmaxvqU16N, uint16_t, uint16_t, vmaxvq_u16, save, vld1q_u16, 1, 8)
LOOP1(VmaxvqU32N, uint32_t, uint32_t, vmaxvq_u32, save, vld1q_u32, 1, 4)
LOOP1(VmaxvqF32N, float32_t, float32_t, vmaxvq_f32, save, vld1q_f32, 1, 4)
LOOP1(VmaxvqF64N, float64_t, float64_t, vmaxvq_f64, save, vld1q_f64, 1, 2)
LOOP1(VminnmvF32N, float32_t, float32_t, vminnmv_f32, save, vld1_f32, 1, 2)
LOOP1(VminnmvqF32N, float32_t, float32_t, vminnmvq_f32, save, vld1q_f32, 1, 4)
LOOP1(VminnmvqF64N, float64_t, float64_t, vminnmvq_f64, save, vld1q_f64, 1, 2)
LOOP1(VminvS8N, int8_t, int8_t, vminv_s8, save, vld1_s8, 1, 8)
LOOP1(VminvS16N, int16_t, int16_t, vminv_s16, save, vld1_s16, 1, 4)
LOOP1(VminvS32N, int32_t, int32_t, vminv_s32, save, vld1_s32, 1, 2)
LOOP1(VminvU8N, uint8_t, uint8_t, vminv_u8, save, vld1_u8, 1, 8)
LOOP1(VminvU16N, uint16_t, uint16_t, vminv_u16, save, vld1_u16, 1, 4)
LOOP1(VminvU32N, uint32_t, uint32_t, vminv_u32, save, vld1_u32, 1, 2)
LOOP1(VminvF32N, float32_t, float32_t, vminv_f32, save, vld1_f32, 1, 2)
LOOP1(VminvqS8N, int8_t, int8_t, vminvq_s8, save, vld1q_s8, 1, 16)
LOOP1(VminvqS16N, int16_t, int16_t, vminvq_s16, save, vld1q_s16, 1, 8)
LOOP1(VminvqS32N, int32_t, int32_t, vminvq_s32, save, vld1q_s32, 1, 4)
LOOP1(VminvqU8N, uint8_t, uint8_t, vminvq_u8, save, vld1q_u8, 1, 16)
LOOP1(VminvqU16N, uint16_t, uint16_t, vminvq_u16, save, vld1q_u16, 1, 8)
LOOP1(VminvqU32N, uint32_t, uint32_t, vminvq_u32, save, vld1q_u32, 1, 4)
LOOP1(VminvqF32N, float32_t, float32_t, vminvq_f32, save, vld1q_f32, 1, 4)
LOOP1(VminvqF64N, float64_t, float64_t, vminvq_f64, save, vld1q_f64, 1, 2)
LOOP1(VmovNS8N, int8_t, int8_t, vmov_n_s8, vst1_s8, load, 8, 1)
LOOP1(VmovNS16N, int16_t, int16_t, vmov_n_s16, vst1_s16, load, 4, 1)
LOOP1(VmovNS32N, int32_t, int32_t, vmov_n_s32, vst1_s32, load, 2, 1)
LOOP1(VmovNS64N, int64_t, int64_t, vmov_n_s64, vst1_s64, load, 1, 1)
LOOP1(VmovNU8N, uint8_t, uint8_t, vmov_n_u8, vst1_u8, load, 8, 1)
LOOP1(VmovNU16N, uint16_t, uint16_t, vmov_n_u16, vst1_u16, load, 4, 1)
LOOP1(VmovNU32N, uint32_t, uint32_t, vmov_n_u32, vst1_u32, load, 2, 1)
LOOP1(VmovNU64N, uint64_t, uint64_t, vmov_n_u64, vst1_u64, load, 1, 1)
LOOP1(VmovNF32N, float32_t, float32_t, vmov_n_f32, vst1_f32, load, 2, 1)
LOOP1(VmovNF64N, float64_t, float64_t, vmov_n_f64, vst1_f64, load, 1, 1)
LOOP1(VmovqNS8N, int8_t, int8_t, vmovq_n_s8, vst1q_s8, load, 16, 1)
LOOP1(VmovqNS16N, int16_t, int16_t, vmovq_n_s16, vst1q_s16, load, 8, 1)
LOOP1(VmovqNS32N, int32_t, int32_t, vmovq_n_s32, vst1q_s32, load, 4, 1)
LOOP1(VmovqNS64N, int64_t, int64_t, vmovq_n_s64, vst1q_s64, load, 2, 1)
LOOP1(VmovqNU8N, uint8_t, uint8_t, vmovq_n_u8, vst1q_u8, load, 16, 1)
LOOP1(VmovqNU16N, uint16_t, uint16_t, vmovq_n_u16, vst1q_u16, load, 8, 1)
LOOP1(VmovqNU32N, uint32_t, uint32_t, vmovq_n_u32, vst1q_u32, load, 4, 1)
LOOP1(VmovqNU64N, uint64_t, uint64_t, vmovq_n_u64, vst1q_u64, load, 2, 1)
LOOP1(VmovqNF32N, float32_t, float32_t, vmovq_n_f32, vst1q_f32, load, 4, 1)
LOOP1(VmovqNF64N, float64_t, float64_t, vmovq_n_f64, vst1q_f64, load, 2, 1)
LOOP1(VmvnS8N, int8_t, int8_t, vmvn_s8, vst1_s8, vld1_s8, 8, 8)
LOOP1(VmvnS16N, int16_t, int16_t, vmvn_s16, vst1_s16, vld1_s16, 4, 4)
LOOP1(VmvnS32N, int32_t, int32_t, vmvn_s32, vst1_s32, vld1_s32, 2, 2)
LOOP1(VmvnU8N, uint8_t, uint8_t, vmvn_u8, vst1_u8, vld1_u8, 8, 8)
LOOP1(VmvnU16N, uint16_t, uint16_t, vmvn_u16, vst1_u16, vld1_u16, 4, 4)
LOOP1(VmvnU32N, uint32_t, uint32_t, vmvn_u32, vst1_u32, vld1_u32, 2, 2)
LOOP1(VmvnqS8N, int8_t, int8_t, vmvnq_s8, vst1q_s8, vld1q_s8, 16, 16)
LOOP1(VmvnqS16N, int16_t, int16_t, vmvnq_s16, vst1q_s16, vld1q_s16, 8, 8)
LOOP1(VmvnqS32N, int32_t, int32_t, vmvnq_s32, vst1q_s32, vld1q_s32, 4, 4)
LOOP1(VmvnqU8N, uint8_t, uint8_t, vmvnq_u8, vst1q_u8, vld1q_u8, 16, 16)
LOOP1(VmvnqU16N, uint16_t, uint16_t, vmvnq_u16, vst1q_u16, vld1q_u16, 8, 8)
LOOP1(VmvnqU32N, uint32_t, uint32_t, vmvnq_u32, vst1q_u32, vld1q_u32, 4, 4)
LOOP1(VnegS8N, int8_t, int8_t, vneg_s8, vst1_s8, vld1_s8, 8, 8)
LOOP1(VnegS16N, int16_t, int16_t, vneg_s16, vst1_s16, vld1_s16, 4, 4)
LOOP1(VnegS32N, int32_t, int32_t, vneg_s32, vst1_s32, vld1_s32, 2, 2)
LOOP1(VnegS64N, int64_t, int64_t, vneg_s64, vst1_s64, vld1_s64, 1, 1)
LOOP1(VnegF32N, float32_t, float32_t, vneg_f32, vst1_f32, vld1_f32, 2, 2)
LOOP1(VnegF64N, float64_t, float64_t, vneg_f64, vst1_f64, vld1_f64, 1, 1)
LOOP1(VnegdS64N, int64_t, int64_t, vnegd_s64, save, load, 1, 1)
LOOP1(VnegqS8N, int8_t, int8_t, vnegq_s8, vst1q_s8, vld1q_s8, 16, 16)
LOOP1(VnegqS16N, int16_t, int16_t, vnegq_s16, vst1q_s16, vld1q_s16, 8, 8)
LOOP1(VnegqS32N, int32_t, int32_t, vnegq_s32, vst1q_s32, vld1q_s32, 4, 4)
LOOP1(VnegqS64N, int64_t, int64_t, vnegq_s64, vst1q_s64, vld1q_s64, 2, 2)
LOOP1(VnegqF32N, float32_t, float32_t, vnegq_f32, vst1q_f32, vld1q_f32, 4, 4)
LOOP1(VnegqF64N, float64_t, float64_t, vnegq_f64, vst1q_f64, vld1q_f64, 2, 2)
LOOP1(VpadddS64N, int64_t, int64_t, vpaddd_s64, save, vld1q_s64, 1, 2)
LOOP1(VpadddU64N, uint64_t, uint64_t, vpaddd_u64, save, vld1q_u64, 1, 2)
LOOP1(VpadddF64N, float64_t, float64_t, vpaddd_f64, save, vld1q_f64, 1, 2)
LOOP1(VpaddsF32N, float32_t, float32_t, vpadds_f32, save, vld1_f32, 1, 2)
LOOP1(VpmaxnmqdF64N, float64_t, float64_t, vpmaxnmqd_f64, save, vld1q_f64, 1, 2)
LOOP1(VpmaxnmsF32N, float32_t, float32_t, vpmaxnms_f32, save, vld1_f32, 1, 2)
LOOP1(VpmaxqdF64N, float64_t, float64_t, vpmaxqd_f64, save, vld1q_f64, 1, 2)
LOOP1(VpmaxsF32N, float32_t, float32_t, vpmaxs_f32, save, vld1_f32, 1, 2)
LOOP1(VpminnmqdF64N, float64_t, float64_t, vpminnmqd_f64, save, vld1q_f64, 1, 2)
LOOP1(VpminnmsF32N, float32_t, float32_t, vpminnms_f32, save, vld1_f32, 1, 2)
LOOP1(VpminqdF64N, float64_t, float64_t, vpminqd_f64, save, vld1q_f64, 1, 2)
LOOP1(VpminsF32N, float32_t, float32_t, vpmins_f32, save, vld1_f32, 1, 2)
LOOP1(VqabsS8N, int8_t, int8_t, vqabs_s8, vst1_s8, vld1_s8, 8, 8)
LOOP1(VqabsS16N, int16_t, int16_t, vqabs_s16, vst1_s16, vld1_s16, 4, 4)
LOOP1(VqabsS32N, int32_t, int32_t, vqabs_s32, vst1_s32, vld1_s32, 2, 2)
LOOP1(VqabsS64N, int64_t, int64_t, vqabs_s64, vst1_s64, vld1_s64, 1, 1)
LOOP1(VqabsbS8N, int8_t, int8_t, vqabsb_s8, save, load, 1, 1)
LOOP1(VqabsdS64N, int64_t, int64_t, vqabsd_s64, save, load, 1, 1)
LOOP1(VqabshS16N, int16_t, int16_t, vqabsh_s16, save, load, 1, 1)
LOOP1(VqabsqS8N, int8_t, int8_t, vqabsq_s8, vst1q_s8, vld1q_s8, 16, 16)
LOOP1(VqabsqS16N, int16_t, int16_t, vqabsq_s16, vst1q_s16, vld1q_s16, 8, 8)
LOOP1(VqabsqS32N, int32_t, int32_t, vqabsq_s32, vst1q_s32, vld1q_s32, 4, 4)
LOOP1(VqabsqS64N, int64_t, int64_t, vqabsq_s64, vst1q_s64, vld1q_s64, 2, 2)
LOOP1(VqabssS32N, int32_t, int32_t, vqabss_s32, save, load, 1, 1)
LOOP1(VqnegS8N, int8_t, int8_t, vqneg_s8, vst1_s8, vld1_s8, 8, 8)
LOOP1(VqnegS16N, int16_t, int16_t, vqneg_s16, vst1_s16, vld1_s16, 4, 4)
LOOP1(VqnegS32N, int32_t, int32_t, vqneg_s32, vst1_s32, vld1_s32, 2, 2)
LOOP1(VqnegS64N, int64_t, int64_t, vqneg_s64, vst1_s64, vld1_s64, 1, 1)
LOOP1(VqnegbS8N, int8_t, int8_t, vqnegb_s8, save, load, 1, 1)
LOOP1(VqnegdS64N, int64_t, int64_t, vqnegd_s64, save, load, 1, 1)
LOOP1(VqneghS16N, int16_t, int16_t, vqnegh_s16, save, load, 1, 1)
LOOP1(VqnegqS8N, int8_t, int8_t, vqnegq_s8, vst1q_s8, vld1q_s8, 16, 16)
LOOP1(VqnegqS16N, int16_t, int16_t, vqnegq_s16, vst1q_s16, vld1q_s16, 8, 8)
LOOP1(VqnegqS32N, int32_t, int32_t, vqnegq_s32, vst1q_s32, vld1q_s32, 4, 4)
LOOP1(VqnegqS64N, int64_t, int64_t, vqnegq_s64, vst1q_s64, vld1q_s64, 2, 2)
LOOP1(VqnegsS32N, int32_t, int32_t, vqnegs_s32, save, load, 1, 1)
LOOP1(VrbitS8N, int8_t, int8_t, vrbit_s8, vst1_s8, vld1_s8, 8, 8)
LOOP1(VrbitU8N, uint8_t, uint8_t, vrbit_u8, vst1_u8, vld1_u8, 8, 8)
LOOP1(VrbitqS8N, int8_t, int8_t, vrbitq_s8, vst1q_s8, vld1q_s8, 16, 16)
LOOP1(VrbitqU8N, uint8_t, uint8_t, vrbitq_u8, vst1q_u8, vld1q_u8, 16, 16)
LOOP1(VrecpeU32N, uint32_t, uint32_t, vrecpe_u32, vst1_u32, vld1_u32, 2, 2)
LOOP1(VrecpeF32N, float32_t, float32_t, vrecpe_f32, vst1_f32, vld1_f32, 2, 2)
LOOP1(VrecpeF64N, float64_t, float64_t, vrecpe_f64, vst1_f64, vld1_f64, 1, 1)
LOOP1(VrecpedF64N, float64_t, float64_t, vrecped_f64, save, load, 1, 1)
LOOP1(VrecpeqU32N, uint32_t, uint32_t, vrecpeq_u32, vst1q_u32, vld1q_u32, 4, 4)
LOOP1(VrecpeqF32N, float32_t, float32_t, vrecpeq_f32, vst1q_f32, vld1q_f32, 4, 4)
LOOP1(VrecpeqF64N, float64_t, float64_t, vrecpeq_f64, vst1q_f64, vld1q_f64, 2, 2)
LOOP1(VrecpesF32N, float32_t, float32_t, vrecpes_f32, save, load, 1, 1)
LOOP1(VrecpxdF64N, float64_t, float64_t, vrecpxd_f64, save, load, 1, 1)
LOOP1(VrecpxsF32N, float32_t, float32_t, vrecpxs_f32, save, load, 1, 1)
LOOP1(VreinterpretF32S32N, float32_t, int32_t, vreinterpret_f32_s32, vst1_f32, vld1_s32, 2, 2)
LOOP1(VreinterpretF32U32N, float32_t, uint32_t, vreinterpret_f32_u32, vst1_f32, vld1_u32, 2, 2)
LOOP1(VreinterpretF64S64N, float64_t, int64_t, vreinterpret_f64_s64, vst1_f64, vld1_s64, 1, 1)
LOOP1(VreinterpretF64U64N, float64_t, uint64_t, vreinterpret_f64_u64, vst1_f64, vld1_u64, 1, 1)
LOOP1(VreinterpretS16U16N, int16_t, uint16_t, vreinterpret_s16_u16, vst1_s16, vld1_u16, 4, 4)
LOOP1(VreinterpretS32U32N, int32_t, uint32_t, vreinterpret_s32_u32, vst1_s32, vld1_u32, 2, 2)
LOOP1(VreinterpretS32F32N, int32_t, float32_t, vreinterpret_s32_f32, vst1_s32, vld1_f32, 2, 2)
LOOP1(VreinterpretS64U64N, int64_t, uint64_t, vreinterpret_s64_u64, vst1_s64, vld1_u64, 1, 1)
LOOP1(VreinterpretS64F64N, int64_t, float64_t, vreinterpret_s64_f64, vst1_s64, vld1_f64, 1, 1)
LOOP1(VreinterpretS8U8N, int8_t, uint8_t, vreinterpret_s8_u8, vst1_s8, vld1_u8, 8, 8)
LOOP1(VreinterpretU16S16N, uint16_t, int16_t, vreinterpret_u16_s16, vst1_u16, vld1_s16, 4, 4)
LOOP1(VreinterpretU32S32N, uint32_t, int32_t, vreinterpret_u32_s32, vst1_u32, vld1_s32, 2, 2)
LOOP1(VreinterpretU32F32N, uint32_t, float32_t, vreinterpret_u32_f32, vst1_u32, vld1_f32, 2, 2)
LOOP1(VreinterpretU64S64N, uint64_t, int64_t, vreinterpret_u64_s64, vst1_u64, vld1_s64, 1, 1)
LOOP1(VreinterpretU64F64N, uint64_t, float64_t, vreinterpret_u64_f64, vst1_u64, vld1_f64, 1, 1)
LOOP1(VreinterpretU8S8N, uint8_t, int8_t, vreinterpret_u8_s8, vst1_u8, vld1_s8, 8, 8)
LOOP1(VreinterpretqF32S32N, float32_t, int32_t, vreinterpretq_f32_s32, vst1q_f32, vld1q_s32, 4, 4)
LOOP1(VreinterpretqF32U32N, float32_t, uint32_t, vreinterpretq_f32_u32, vst1q_f32, vld1q_u32, 4, 4)
LOOP1(VreinterpretqF64S64N, float64_t, int64_t, vreinterpretq_f64_s64, vst1q_f64, vld1q_s64, 2, 2)
LOOP1(VreinterpretqF64U64N, float64_t, uint64_t, vreinterpretq_f64_u64, vst1q_f64, vld1q_u64, 2, 2)
LOOP1(VreinterpretqS16U16N, int16_t, uint16_t, vreinterpretq_s16_u16, vst1q_s16, vld1q_u16, 8, 8)
LOOP1(VreinterpretqS32U32N, int32_t, uint32_t, vreinterpretq_s32_u32, vst1q_s32, vld1q_u32, 4, 4)
LOOP1(VreinterpretqS32F32N, int32_t, float32_t, vreinterpretq_s32_f32, vst1q_s32, vld1q_f32, 4, 4)
LOOP1(VreinterpretqS64U64N, int64_t, uint64_t, vreinterpretq_s64_u64, vst1q_s64, vld1q_u64, 2, 2)
LOOP1(VreinterpretqS64F64N, int64_t, float64_t, vreinterpretq_s64_f64, vst1q_s64, vld1q_f64, 2, 2)
LOOP1(VreinterpretqS8U8N, int8_t, uint8_t, vreinterpretq_s8_u8, vst1q_s8, vld1q_u8, 16, 16)
LOOP1(VreinterpretqU16S16N, uint16_t, int16_t, vreinterpretq_u16_s16, vst1q_u16, vld1q_s16, 8, 8)
LOOP1(VreinterpretqU32S32N, uint32_t, int32_t, vreinterpretq_u32_s32, vst1q_u32, vld1q_s32, 4, 4)
LOOP1(VreinterpretqU32F32N, uint32_t, float32_t, vreinterpretq_u32_f32, vst1q_u32, vld1q_f32, 4, 4)
LOOP1(VreinterpretqU64S64N, uint64_t, int64_t, vreinterpretq_u64_s64, vst1q_u64, vld1q_s64, 2, 2)
LOOP1(VreinterpretqU64F64N, uint64_t, float64_t, vreinterpretq_u64_f64, vst1q_u64, vld1q_f64, 2, 2)
LOOP1(VreinterpretqU8S8N, uint8_t, int8_t, vreinterpretq_u8_s8, vst1q_u8, vld1q_s8, 16, 16)
LOOP1(Vrev16S8N, int8_t, int8_t, vrev16_s8, vst1_s8, vld1_s8, 8, 8)
LOOP1(Vrev16U8N, uint8_t, uint8_t, vrev16_u8, vst1_u8, vld1_u8, 8, 8)
LOOP1(Vrev16QS8N, int8_t, int8_t, vrev16q_s8, vst1q_s8, vld1q_s8, 16, 16)
LOOP1(Vrev16QU8N, uint8_t, uint8_t, vrev16q_u8, vst1q_u8, vld1q_u8, 16, 16)
LOOP1(Vrev32S8N, int8_t, int8_t, vrev32_s8, vst1_s8, vld1_s8, 8, 8)
LOOP1(Vrev32S16N, int16_t, int16_t, vrev32_s16, vst1_s16, vld1_s16, 4, 4)
LOOP1(Vrev32U8N, uint8_t, uint8_t, vrev32_u8, vst1_u8, vld1_u8, 8, 8)
LOOP1(Vrev32U16N, uint16_t, uint16_t, vrev32_u16, vst1_u16, vld1_u16, 4, 4)
LOOP1(Vrev32QS8N, int8_t, int8_t, vrev32q_s8, vst1q_s8, vld1q_s8, 16, 16)
LOOP1(Vrev32QS16N, int16_t, int16_t, vrev32q_s16, vst1q_s16, vld1q_s16, 8, 8)
LOOP1(Vrev32QU8N, uint8_t, uint8_t, vrev32q_u8, vst1q_u8, vld1q_u8, 16, 16)
LOOP1(Vrev32QU16N, uint16_t, uint16_t, vrev32q_u16, vst1q_u16, vld1q_u16, 8, 8)
LOOP1(Vrev64S8N, int8_t, int8_t, vrev64_s8, vst1_s8, vld1_s8, 8, 8)
LOOP1(Vrev64S16N, int16_t, int16_t, vrev64_s16, vst1_s16, vld1_s16, 4, 4)
LOOP1(Vrev64S32N, int32_t, int32_t, vrev64_s32, vst1_s32, vld1_s32, 2, 2)
LOOP1(Vrev64U8N, uint8_t, uint8_t, vrev64_u8, vst1_u8, vld1_u8, 8, 8)
LOOP1(Vrev64U16N, uint16_t, uint16_t, vrev64_u16, vst1_u16, vld1_u16, 4, 4)
LOOP1(Vrev64U32N, uint32_t, uint32_t, vrev64_u32, vst1_u32, vld1_u32, 2, 2)
LOOP1(Vrev64F32N, float32_t, float32_t, vrev64_f32, vst1_f32, vld1_f32, 2, 2)
LOOP1(Vrev64QS8N, int8_t, int8_t, vrev64q_s8, vst1q_s8, vld1q_s8, 16, 16)
LOOP1(Vrev64QS16N, int16_t, int16_t, vrev64q_s16, vst1q_s16, vld1q_s16, 8, 8)
LOOP1(Vrev64QS32N, int32_t, int32_t, vrev64q_s32, vst1q_s32, vld1q_s32, 4, 4)
LOOP1(Vrev64QU8N, uint8_t, uint8_t, vrev64q_u8, vst1q_u8, vld1q_u8, 16, 16)
LOOP1(Vrev64QU16N, uint16_t, uint16_t, vrev64q_u16, vst1q_u16, vld1q_u16, 8, 8)
LOOP1(Vrev64QU32N, uint32_t, uint32_t, vrev64q_u32, vst1q_u32, vld1q_u32, 4, 4)
LOOP1(Vrev64QF32N, float32_t, float32_t, vrev64q_f32, vst1q_f32, vld1q_f32, 4, 4)
LOOP1(VrndF32N, float32_t, float32_t, vrnd_f32, vst1_f32, vld1_f32, 2, 2)
LOOP1(VrndF64N, float64_t, float64_t, vrnd_f64, vst1_f64, vld1_f64, 1, 1)
LOOP1(Vrnd32XF32N, float32_t, float32_t, vrnd32x_f32, vst1_f32, vld1_f32, 2, 2)
LOOP1(Vrnd32XF64N, float64_t, float64_t, vrnd32x_f64, vst1_f64, vld1_f64, 1, 1)
LOOP1(Vrnd32XqF32N, float32_t, float32_t, vrnd32xq_f32, vst1q_f32, vld1q_f32, 4, 4)
LOOP1(Vrnd32XqF64N, float64_t, float64_t, vrnd32xq_f64, vst1q_f64, vld1q_f64, 2, 2)
LOOP1(Vrnd32ZF32N, float32_t, float32_t, vrnd32z_f32, vst1_f32, vld1_f32, 2, 2)
LOOP1(Vrnd32ZF64N, float64_t, float64_t, vrnd32z_f64, vst1_f64, vld1_f64, 1, 1)
LOOP1(Vrnd32ZqF32N, float32_t, float32_t, vrnd32zq_f32, vst1q_f32, vld1q_f32, 4, 4)
LOOP1(Vrnd32ZqF64N, float64_t, float64_t, vrnd32zq_f64, vst1q_f64, vld1q_f64, 2, 2)
LOOP1(Vrnd64XF32N, float32_t, float32_t, vrnd64x_f32, vst1_f32, vld1_f32, 2, 2)
LOOP1(Vrnd64XF64N, float64_t, float64_t, vrnd64x_f64, vst1_f64, vld1_f64, 1, 1)
LOOP1(Vrnd64XqF32N, float32_t, float32_t, vrnd64xq_f32, vst1q_f32, vld1q_f32, 4, 4)
LOOP1(Vrnd64XqF64N, float64_t, float64_t, vrnd64xq_f64, vst1q_f64, vld1q_f64, 2, 2)
LOOP1(Vrnd64ZF32N, float32_t, float32_t, vrnd64z_f32, vst1_f32, vld1_f32, 2, 2)
LOOP1(Vrnd64ZF64N, float64_t, float64_t, vrnd64z_f64, vst1_f64, vld1_f64, 1, 1)
LOOP1(Vrnd64ZqF32N, float32_t, float32_t, vrnd64zq_f32, vst1q_f32, vld1q_f32, 4, 4)
LOOP1(Vrnd64ZqF64N, float64_t, float64_t, vrnd64zq_f64, vst1q_f64, vld1q_f64, 2, 2)
LOOP1(VrndaF32N, float32_t, float32_t, vrnda_f32, vst1_f32, vld1_f32, 2, 2)
LOOP1(VrndaF64N, float64_t, float64_t, vrnda_f64, vst1_f64, vld1_f64, 1, 1)
LOOP1(VrndaqF32N, float32_t, float32_t, vrndaq_f32, vst1q_f32, vld1q_f32, 4, 4)
LOOP1(VrndaqF64N, float64_t, float64_t, vrndaq_f64, vst1q_f64, vld1q_f64, 2, 2)
LOOP1(VrndiF32N, float32_t, float32_t, vrndi_f32, vst1_f32, vld1_f32, 2, 2)
LOOP1(VrndiF64N, float64_t, float64_t, vrndi_f64, vst1_f64, vld1_f64, 1, 1)
LOOP1(VrndiqF32N, float32_t, float32_t, vrndiq_f32, vst1q_f32, vld1q_f32, 4, 4)
LOOP1(VrndiqF64N, float64_t, float64_t, vrndiq_f64, vst1q_f64, vld1q_f64, 2, 2)
LOOP1(VrndmF32N, float32_t, float32_t, vrndm_f32, vst1_f32, vld1_f32, 2, 2)
LOOP1(VrndmF64N, float64_t, float64_t, vrndm_f64, vst1_f64, vld1_f64, 1, 1)
LOOP1(VrndmqF32N, float32_t, float32_t, vrndmq_f32, vst1q_f32, vld1q_f32, 4, 4)
LOOP1(VrndmqF64N, float64_t, float64_t, vrndmq_f64, vst1q_f64, vld1q_f64, 2, 2)
LOOP1(VrndnF32N, float32_t, float32_t, vrndn_f32, vst1_f32, vld1_f32, 2, 2)
LOOP1(VrndnF64N, float64_t, float64_t, vrndn_f64, vst1_f64, vld1_f64, 1, 1)
LOOP1(VrndnqF32N, float32_t, float32_t, vrndnq_f32, vst1q_f32, vld1q_f32, 4, 4)
LOOP1(VrndnqF64N, float64_t, float64_t, vrndnq_f64, vst1q_f64, vld1q_f64, 2, 2)
LOOP1(VrndnsF32N, float32_t, float32_t, vrndns_f32, save, load, 1, 1)
LOOP1(VrndpF32N, float32_t, float32_t, vrndp_f32, vst1_f32, vld1_f32, 2, 2)
LOOP1(VrndpF64N, float64_t, float64_t, vrndp_f64, vst1_f64, vld1_f64, 1, 1)
LOOP1(VrndpqF32N, float32_t, float32_t, vrndpq_f32, vst1q_f32, vld1q_f32, 4, 4)
LOOP1(VrndpqF64N, float64_t, float64_t, vrndpq_f64, vst1q_f64, vld1q_f64, 2, 2)
LOOP1(VrndqF32N, float32_t, float32_t, vrndq_f32, vst1q_f32, vld1q_f32, 4, 4)
LOOP1(VrndqF64N, float64_t, float64_t, vrndq_f64, vst1q_f64, vld1q_f64, 2, 2)
LOOP1(VrndxF32N, float32_t, float32_t, vrndx_f32, vst1_f32, vld1_f32, 2, 2)
LOOP1(VrndxF64N, float64_t, float64_t, vrndx_f64, vst1_f64, vld1_f64, 1, 1)
LOOP1(VrndxqF32N, float32_t, float32_t, vrndxq_f32, vst1q_f32, vld1q_f32, 4, 4)
LOOP1(VrndxqF64N, float64_t, float64_t, vrndxq_f64, vst1q_f64, vld1q_f64, 2, 2)
LOOP1(VrsqrteU32N, uint32_t, uint32_t, vrsqrte_u32, vst1_u32, vld1_u32, 2, 2)
LOOP1(VrsqrteF32N, float32_t, float32_t, vrsqrte_f32, vst1_f32, vld1_f32, 2, 2)
LOOP1(VrsqrteF64N, float64_t, float64_t, vrsqrte_f64, vst1_f64, vld1_f64, 1, 1)
LOOP1(VrsqrtedF64N, float64_t, float64_t, vrsqrted_f64, save, load, 1, 1)
LOOP1(VrsqrteqU32N, uint32_t, uint32_t, vrsqrteq_u32, vst1q_u32, vld1q_u32, 4, 4)
LOOP1(VrsqrteqF32N, float32_t, float32_t, vrsqrteq_f32, vst1q_f32, vld1q_f32, 4, 4)
LOOP1(VrsqrteqF64N, float64_t, float64_t, vrsqrteq_f64, vst1q_f64, vld1q_f64, 2, 2)
LOOP1(VrsqrtesF32N, float32_t, float32_t, vrsqrtes_f32, save, load, 1, 1)
LOOP1(Vsha1HU32N, uint32_t, uint32_t, vsha1h_u32, save, load, 1, 1)
LOOP1(VsqrtF32N, float32_t, float32_t, vsqrt_f32, vst1_f32, vld1_f32, 2, 2)
LOOP1(VsqrtF64N, float64_t, float64_t, vsqrt_f64, vst1_f64, vld1_f64, 1, 1)
LOOP1(VsqrtqF32N, float32_t, float32_t, vsqrtq_f32, vst1q_f32, vld1q_f32, 4, 4)
LOOP1(VsqrtqF64N, float64_t, float64_t, vsqrtq_f64, vst1q_f64, vld1q_f64, 2, 2)

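/* LOOP2 defines `name`, which applies the two-operand intrinsic `f` block by
   block: each iteration loads one block from v1 and one from v2, stores the
   result of f to r, then advances r by rstep and both inputs by istep.
   n counts result elements; a tail shorter than rstep is left unwritten. */
#define LOOP2(name, rtype, itype, f, set, load, rstep, istep) \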
    void name(rtype *r, itype *v1, itype *v2, int32_t n)      \
    {                                                         \
        while (n >= rstep)                                    \
        {                                                     \
            set(r, f(load(v1), load(v2)));                    \
            r += rstep;                                       \
            n -= rstep;                                       \
            v1 += istep;                                      \
            v2 += istep;                                      \
        }                                                     \
    }
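
/*
Illustration only, not generated code. Because the loop above stops as soon as
fewer than rstep result elements remain, a caller that needs every element has
to finish the tail itself. The sketch below models that with plain scalar C so
it can be read and run without NEON; it assumes rstep == istep == 2, as in
VaddS32N. The names model_add_s32_n and model_add_s32_all are hypothetical and
are not part of this package.

```c
#include <stdint.h>

// Scalar stand-in for a 2-lane block loop such as VaddS32N.
// Returns how many result elements were written.
static int32_t model_add_s32_n(int32_t *r, const int32_t *v1, const int32_t *v2, int32_t n)
{
    int32_t done = 0;
    while (n >= 2)
    {
        r[0] = v1[0] + v2[0];
        r[1] = v1[1] + v2[1];
        r += 2;
        v1 += 2;
        v2 += 2;
        n -= 2;
        done += 2;
    }
    return done;
}

// Hypothetical caller-side wrapper: block loop first, then a scalar tail.
static void model_add_s32_all(int32_t *r, const int32_t *v1, const int32_t *v2, int32_t n)
{
    int32_t done = model_add_s32_n(r, v1, v2, n);
    for (int32_t i = done; i < n; i++)
    {
        r[i] = v1[i] + v2[i];
    }
}
```

With n = 5 the block loop writes four results and leaves the fifth slot as it
was; the wrapper then fills it in. The generated functions return void, so a
real caller would compute the tail start as n - (n % rstep).
*/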

LOOP2(VabdS8N, int8_t, int8_t, vabd_s8, vst1_s8, vld1_s8, 8, 8)
LOOP2(VabdS16N, int16_t, int16_t, vabd_s16, vst1_s16, vld1_s16, 4, 4)
LOOP2(VabdS32N, int32_t, int32_t, vabd_s32, vst1_s32, vld1_s32, 2, 2)
LOOP2(VabdU8N, uint8_t, uint8_t, vabd_u8, vst1_u8, vld1_u8, 8, 8)
LOOP2(VabdU16N, uint16_t, uint16_t, vabd_u16, vst1_u16, vld1_u16, 4, 4)
LOOP2(VabdU32N, uint32_t, uint32_t, vabd_u32, vst1_u32, vld1_u32, 2, 2)
LOOP2(VabdF32N, float32_t, float32_t, vabd_f32, vst1_f32, vld1_f32, 2, 2)
LOOP2(VabdF64N, float64_t, float64_t, vabd_f64, vst1_f64, vld1_f64, 1, 1)
LOOP2(VabddF64N, float64_t, float64_t, vabdd_f64, save, load, 1, 1)
LOOP2(VabdqS8N, int8_t, int8_t, vabdq_s8, vst1q_s8, vld1q_s8, 16, 16)
LOOP2(VabdqS16N, int16_t, int16_t, vabdq_s16, vst1q_s16, vld1q_s16, 8, 8)
LOOP2(VabdqS32N, int32_t, int32_t, vabdq_s32, vst1q_s32, vld1q_s32, 4, 4)
LOOP2(VabdqU8N, uint8_t, uint8_t, vabdq_u8, vst1q_u8, vld1q_u8, 16, 16)
LOOP2(VabdqU16N, uint16_t, uint16_t, vabdq_u16, vst1q_u16, vld1q_u16, 8, 8)
LOOP2(VabdqU32N, uint32_t, uint32_t, vabdq_u32, vst1q_u32, vld1q_u32, 4, 4)
LOOP2(VabdqF32N, float32_t, float32_t, vabdq_f32, vst1q_f32, vld1q_f32, 4, 4)
LOOP2(VabdqF64N, float64_t, float64_t, vabdq_f64, vst1q_f64, vld1q_f64, 2, 2)
LOOP2(VabdsF32N, float32_t, float32_t, vabds_f32, save, load, 1, 1)
LOOP2(VaddS8N, int8_t, int8_t, vadd_s8, vst1_s8, vld1_s8, 8, 8)
LOOP2(VaddS16N, int16_t, int16_t, vadd_s16, vst1_s16, vld1_s16, 4, 4)
LOOP2(VaddS32N, int32_t, int32_t, vadd_s32, vst1_s32, vld1_s32, 2, 2)
LOOP2(VaddS64N, int64_t, int64_t, vadd_s64, vst1_s64, vld1_s64, 1, 1)
LOOP2(VaddU8N, uint8_t, uint8_t, vadd_u8, vst1_u8, vld1_u8, 8, 8)
LOOP2(VaddU16N, uint16_t, uint16_t, vadd_u16, vst1_u16, vld1_u16, 4, 4)
LOOP2(VaddU32N, uint32_t, uint32_t, vadd_u32, vst1_u32, vld1_u32, 2, 2)
LOOP2(VaddU64N, uint64_t, uint64_t, vadd_u64, vst1_u64, vld1_u64, 1, 1)
LOOP2(VaddF32N, float32_t, float32_t, vadd_f32, vst1_f32, vld1_f32, 2, 2)
LOOP2(VaddF64N, float64_t, float64_t, vadd_f64, vst1_f64, vld1_f64, 1, 1)
LOOP2(VadddS64N, int64_t, int64_t, vaddd_s64, save, load, 1, 1)
LOOP2(VadddU64N, uint64_t, uint64_t, vaddd_u64, save, load, 1, 1)
LOOP2(VaddqS8N, int8_t, int8_t, vaddq_s8, vst1q_s8, vld1q_s8, 16, 16)
LOOP2(VaddqS16N, int16_t, int16_t, vaddq_s16, vst1q_s16, vld1q_s16, 8, 8)
LOOP2(VaddqS32N, int32_t, int32_t, vaddq_s32, vst1q_s32, vld1q_s32, 4, 4)
LOOP2(VaddqS64N, int64_t, int64_t, vaddq_s64, vst1q_s64, vld1q_s64, 2, 2)
LOOP2(VaddqU8N, uint8_t, uint8_t, vaddq_u8, vst1q_u8, vld1q_u8, 16, 16)
LOOP2(VaddqU16N, uint16_t, uint16_t, vaddq_u16, vst1q_u16, vld1q_u16, 8, 8)
LOOP2(VaddqU32N, uint32_t, uint32_t, vaddq_u32, vst1q_u32, vld1q_u32, 4, 4)
LOOP2(VaddqU64N, uint64_t, uint64_t, vaddq_u64, vst1q_u64, vld1q_u64, 2, 2)
LOOP2(VaddqF32N, float32_t, float32_t, vaddq_f32, vst1q_f32, vld1q_f32, 4, 4)
LOOP2(VaddqF64N, float64_t, float64_t, vaddq_f64, vst1q_f64, vld1q_f64, 2, 2)
LOOP2(VaesdqU8N, uint8_t, uint8_t, vaesdq_u8, vst1q_u8, vld1q_u8, 16, 16)
LOOP2(VaeseqU8N, uint8_t, uint8_t, vaeseq_u8, vst1q_u8, vld1q_u8, 16, 16)
LOOP2(VandS8N, int8_t, int8_t, vand_s8, vst1_s8, vld1_s8, 8, 8)
LOOP2(VandS16N, int16_t, int16_t, vand_s16, vst1_s16, vld1_s16, 4, 4)
LOOP2(VandS32N, int32_t, int32_t, vand_s32, vst1_s32, vld1_s32, 2, 2)
LOOP2(VandS64N, int64_t, int64_t, vand_s64, vst1_s64, vld1_s64, 1, 1)
LOOP2(VandU8N, uint8_t, uint8_t, vand_u8, vst1_u8, vld1_u8, 8, 8)
LOOP2(VandU16N, uint16_t, uint16_t, vand_u16, vst1_u16, vld1_u16, 4, 4)
LOOP2(VandU32N, uint32_t, uint32_t, vand_u32, vst1_u32, vld1_u32, 2, 2)
LOOP2(VandU64N, uint64_t, uint64_t, vand_u64, vst1_u64, vld1_u64, 1, 1)
LOOP2(VandqS8N, int8_t, int8_t, vandq_s8, vst1q_s8, vld1q_s8, 16, 16)
LOOP2(VandqS16N, int16_t, int16_t, vandq_s16, vst1q_s16, vld1q_s16, 8, 8)
LOOP2(VandqS32N, int32_t, int32_t, vandq_s32, vst1q_s32, vld1q_s32, 4, 4)
LOOP2(VandqS64N, int64_t, int64_t, vandq_s64, vst1q_s64, vld1q_s64, 2, 2)
LOOP2(VandqU8N, uint8_t, uint8_t, vandq_u8, vst1q_u8, vld1q_u8, 16, 16)
LOOP2(VandqU16N, uint16_t, uint16_t, vandq_u16, vst1q_u16, vld1q_u16, 8, 8)
LOOP2(VandqU32N, uint32_t, uint32_t, vandq_u32, vst1q_u32, vld1q_u32, 4, 4)
LOOP2(VandqU64N, uint64_t, uint64_t, vandq_u64, vst1q_u64, vld1q_u64, 2, 2)
LOOP2(VbicS8N, int8_t, int8_t, vbic_s8, vst1_s8, vld1_s8, 8, 8)
LOOP2(VbicS16N, int16_t, int16_t, vbic_s16, vst1_s16, vld1_s16, 4, 4)
LOOP2(VbicS32N, int32_t, int32_t, vbic_s32, vst1_s32, vld1_s32, 2, 2)
LOOP2(VbicS64N, int64_t, int64_t, vbic_s64, vst1_s64, vld1_s64, 1, 1)
LOOP2(VbicU8N, uint8_t, uint8_t, vbic_u8, vst1_u8, vld1_u8, 8, 8)
LOOP2(VbicU16N, uint16_t, uint16_t, vbic_u16, vst1_u16, vld1_u16, 4, 4)
LOOP2(VbicU32N, uint32_t, uint32_t, vbic_u32, vst1_u32, vld1_u32, 2, 2)
LOOP2(VbicU64N, uint64_t, uint64_t, vbic_u64, vst1_u64, vld1_u64, 1, 1)
LOOP2(VbicqS8N, int8_t, int8_t, vbicq_s8, vst1q_s8, vld1q_s8, 16, 16)
LOOP2(VbicqS16N, int16_t, int16_t, vbicq_s16, vst1q_s16, vld1q_s16, 8, 8)
LOOP2(VbicqS32N, int32_t, int32_t, vbicq_s32, vst1q_s32, vld1q_s32, 4, 4)
LOOP2(VbicqS64N, int64_t, int64_t, vbicq_s64, vst1q_s64, vld1q_s64, 2, 2)
LOOP2(VbicqU8N, uint8_t, uint8_t, vbicq_u8, vst1q_u8, vld1q_u8, 16, 16)
LOOP2(VbicqU16N, uint16_t, uint16_t, vbicq_u16, vst1q_u16, vld1q_u16, 8, 8)
LOOP2(VbicqU32N, uint32_t, uint32_t, vbicq_u32, vst1q_u32, vld1q_u32, 4, 4)
LOOP2(VbicqU64N, uint64_t, uint64_t, vbicq_u64, vst1q_u64, vld1q_u64, 2, 2)
LOOP2(VcaddRot270F32N, float32_t, float32_t, vcadd_rot270_f32, vst1_f32, vld1_f32, 2, 2)
LOOP2(VcaddRot90F32N, float32_t, float32_t, vcadd_rot90_f32, vst1_f32, vld1_f32, 2, 2)
LOOP2(VcaddqRot270F32N, float32_t, float32_t, vcaddq_rot270_f32, vst1q_f32, vld1q_f32, 4, 4)
LOOP2(VcaddqRot270F64N, float64_t, float64_t, vcaddq_rot270_f64, vst1q_f64, vld1q_f64, 2, 2)
LOOP2(VcaddqRot90F32N, float32_t, float32_t, vcaddq_rot90_f32, vst1q_f32, vld1q_f32, 4, 4)
LOOP2(VcaddqRot90F64N, float64_t, float64_t, vcaddq_rot90_f64, vst1q_f64, vld1q_f64, 2, 2)
LOOP2(VcageF32N, uint32_t, float32_t, vcage_f32, vst1_u32, vld1_f32, 2, 2)
LOOP2(VcageF64N, uint64_t, float64_t, vcage_f64, vst1_u64, vld1_f64, 1, 1)
LOOP2(VcagedF64N, uint64_t, float64_t, vcaged_f64, save, load, 1, 1)
LOOP2(VcageqF32N, uint32_t, float32_t, vcageq_f32, vst1q_u32, vld1q_f32, 4, 4)
LOOP2(VcageqF64N, uint64_t, float64_t, vcageq_f64, vst1q_u64, vld1q_f64, 2, 2)
LOOP2(VcagesF32N, uint32_t, float32_t, vcages_f32, save, load, 1, 1)
LOOP2(VcagtF32N, uint32_t, float32_t, vcagt_f32, vst1_u32, vld1_f32, 2, 2)
LOOP2(VcagtF64N, uint64_t, float64_t, vcagt_f64, vst1_u64, vld1_f64, 1, 1)
LOOP2(VcagtdF64N, uint64_t, float64_t, vcagtd_f64, save, load, 1, 1)
LOOP2(VcagtqF32N, uint32_t, float32_t, vcagtq_f32, vst1q_u32, vld1q_f32, 4, 4)
LOOP2(VcagtqF64N, uint64_t, float64_t, vcagtq_f64, vst1q_u64, vld1q_f64, 2, 2)
LOOP2(VcagtsF32N, uint32_t, float32_t, vcagts_f32, save, load, 1, 1)
LOOP2(VcaleF32N, uint32_t, float32_t, vcale_f32, vst1_u32, vld1_f32, 2, 2)
LOOP2(VcaleF64N, uint64_t, float64_t, vcale_f64, vst1_u64, vld1_f64, 1, 1)
LOOP2(VcaledF64N, uint64_t, float64_t, vcaled_f64, save, load, 1, 1)
LOOP2(VcaleqF32N, uint32_t, float32_t, vcaleq_f32, vst1q_u32, vld1q_f32, 4, 4)
LOOP2(VcaleqF64N, uint64_t, float64_t, vcaleq_f64, vst1q_u64, vld1q_f64, 2, 2)
LOOP2(VcalesF32N, uint32_t, float32_t, vcales_f32, save, load, 1, 1)
LOOP2(VcaltF32N, uint32_t, float32_t, vcalt_f32, vst1_u32, vld1_f32, 2, 2)
LOOP2(VcaltF64N, uint64_t, float64_t, vcalt_f64, vst1_u64, vld1_f64, 1, 1)
LOOP2(VcaltdF64N, uint64_t, float64_t, vcaltd_f64, save, load, 1, 1)
LOOP2(VcaltqF32N, uint32_t, float32_t, vcaltq_f32, vst1q_u32, vld1q_f32, 4, 4)
LOOP2(VcaltqF64N, uint64_t, float64_t, vcaltq_f64, vst1q_u64, vld1q_f64, 2, 2)
LOOP2(VcaltsF32N, uint32_t, float32_t, vcalts_f32, save, load, 1, 1)
LOOP2(VceqS8N, uint8_t, int8_t, vceq_s8, vst1_u8, vld1_s8, 8, 8)
LOOP2(VceqS16N, uint16_t, int16_t, vceq_s16, vst1_u16, vld1_s16, 4, 4)
LOOP2(VceqS32N, uint32_t, int32_t, vceq_s32, vst1_u32, vld1_s32, 2, 2)
LOOP2(VceqS64N, uint64_t, int64_t, vceq_s64, vst1_u64, vld1_s64, 1, 1)
LOOP2(VceqU8N, uint8_t, uint8_t, vceq_u8, vst1_u8, vld1_u8, 8, 8)
LOOP2(VceqU16N, uint16_t, uint16_t, vceq_u16, vst1_u16, vld1_u16, 4, 4)
LOOP2(VceqU32N, uint32_t, uint32_t, vceq_u32, vst1_u32, vld1_u32, 2, 2)
LOOP2(VceqU64N, uint64_t, uint64_t, vceq_u64, vst1_u64, vld1_u64, 1, 1)
LOOP2(VceqF32N, uint32_t, float32_t, vceq_f32, vst1_u32, vld1_f32, 2, 2)
LOOP2(VceqF64N, uint64_t, float64_t, vceq_f64, vst1_u64, vld1_f64, 1, 1)
LOOP2(VceqdS64N, uint64_t, int64_t, vceqd_s64, save, load, 1, 1)
LOOP2(VceqdU64N, uint64_t, uint64_t, vceqd_u64, save, load, 1, 1)
LOOP2(VceqdF64N, uint64_t, float64_t, vceqd_f64, save, load, 1, 1)
LOOP2(VceqqS8N, uint8_t, int8_t, vceqq_s8, vst1q_u8, vld1q_s8, 16, 16)
LOOP2(VceqqS16N, uint16_t, int16_t, vceqq_s16, vst1q_u16, vld1q_s16, 8, 8)
LOOP2(VceqqS32N, uint32_t, int32_t, vceqq_s32, vst1q_u32, vld1q_s32, 4, 4)
LOOP2(VceqqS64N, uint64_t, int64_t, vceqq_s64, vst1q_u64, vld1q_s64, 2, 2)
LOOP2(VceqqU8N, uint8_t, uint8_t, vceqq_u8, vst1q_u8, vld1q_u8, 16, 16)
LOOP2(VceqqU16N, uint16_t, uint16_t, vceqq_u16, vst1q_u16, vld1q_u16, 8, 8)
LOOP2(VceqqU32N, uint32_t, uint32_t, vceqq_u32, vst1q_u32, vld1q_u32, 4, 4)
LOOP2(VceqqU64N, uint64_t, uint64_t, vceqq_u64, vst1q_u64, vld1q_u64, 2, 2)
LOOP2(VceqqF32N, uint32_t, float32_t, vceqq_f32, vst1q_u32, vld1q_f32, 4, 4)
LOOP2(VceqqF64N, uint64_t, float64_t, vceqq_f64, vst1q_u64, vld1q_f64, 2, 2)
LOOP2(VceqsF32N, uint32_t, float32_t, vceqs_f32, save, load, 1, 1)
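
/*
Illustration only, not generated code. The comparison loops in this group take
signed or floating-point inputs but write unsigned results (rtype differs from
itype), because NEON comparisons produce a per-lane mask: all bits set where
the comparison holds, zero elsewhere. The scalar sketch below models that for
the shape of VceqS32N (rstep == istep == 2); model_ceq_s32_n is a hypothetical
name used only here.

```c
#include <stdint.h>

// Scalar stand-in for VceqS32N: writes 0xFFFFFFFF for equal lanes, 0 otherwise.
static void model_ceq_s32_n(uint32_t *r, const int32_t *v1, const int32_t *v2, int32_t n)
{
    while (n >= 2)
    {
        r[0] = (v1[0] == v2[0]) ? UINT32_MAX : 0u;
        r[1] = (v1[1] == v2[1]) ? UINT32_MAX : 0u;
        r += 2;
        v1 += 2;
        v2 += 2;
        n -= 2;
    }
}
```

A mask in this form can be passed straight to a bitwise select or AND, which is
why the result is not a 0/1 boolean.
*/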
LOOP2(VcgeS8N, uint8_t, int8_t, vcge_s8, vst1_u8, vld1_s8, 8, 8)
LOOP2(VcgeS16N, uint16_t, int16_t, vcge_s16, vst1_u16, vld1_s16, 4, 4)
LOOP2(VcgeS32N, uint32_t, int32_t, vcge_s32, vst1_u32, vld1_s32, 2, 2)
LOOP2(VcgeS64N, uint64_t, int64_t, vcge_s64, vst1_u64, vld1_s64, 1, 1)
LOOP2(VcgeU8N, uint8_t, uint8_t, vcge_u8, vst1_u8, vld1_u8, 8, 8)
LOOP2(VcgeU16N, uint16_t, uint16_t, vcge_u16, vst1_u16, vld1_u16, 4, 4)
LOOP2(VcgeU32N, uint32_t, uint32_t, vcge_u32, vst1_u32, vld1_u32, 2, 2)
LOOP2(VcgeU64N, uint64_t, uint64_t, vcge_u64, vst1_u64, vld1_u64, 1, 1)
LOOP2(VcgeF32N, uint32_t, float32_t, vcge_f32, vst1_u32, vld1_f32, 2, 2)
LOOP2(VcgeF64N, uint64_t, float64_t, vcge_f64, vst1_u64, vld1_f64, 1, 1)
LOOP2(VcgedS64N, uint64_t, int64_t, vcged_s64, save, load, 1, 1)
LOOP2(VcgedU64N, uint64_t, uint64_t, vcged_u64, save, load, 1, 1)
LOOP2(VcgedF64N, uint64_t, float64_t, vcged_f64, save, load, 1, 1)
LOOP2(VcgeqS8N, uint8_t, int8_t, vcgeq_s8, vst1q_u8, vld1q_s8, 16, 16)
LOOP2(VcgeqS16N, uint16_t, int16_t, vcgeq_s16, vst1q_u16, vld1q_s16, 8, 8)
LOOP2(VcgeqS32N, uint32_t, int32_t, vcgeq_s32, vst1q_u32, vld1q_s32, 4, 4)
LOOP2(VcgeqS64N, uint64_t, int64_t, vcgeq_s64, vst1q_u64, vld1q_s64, 2, 2)
LOOP2(VcgeqU8N, uint8_t, uint8_t, vcgeq_u8, vst1q_u8, vld1q_u8, 16, 16)
LOOP2(VcgeqU16N, uint16_t, uint16_t, vcgeq_u16, vst1q_u16, vld1q_u16, 8, 8)
LOOP2(VcgeqU32N, uint32_t, uint32_t, vcgeq_u32, vst1q_u32, vld1q_u32, 4, 4)
LOOP2(VcgeqU64N, uint64_t, uint64_t, vcgeq_u64, vst1q_u64, vld1q_u64, 2, 2)
LOOP2(VcgeqF32N, uint32_t, float32_t, vcgeq_f32, vst1q_u32, vld1q_f32, 4, 4)
LOOP2(VcgeqF64N, uint64_t, float64_t, vcgeq_f64, vst1q_u64, vld1q_f64, 2, 2)
LOOP2(VcgesF32N, uint32_t, float32_t, vcges_f32, save, load, 1, 1)
LOOP2(VcgtS8N, uint8_t, int8_t, vcgt_s8, vst1_u8, vld1_s8, 8, 8)
LOOP2(VcgtS16N, uint16_t, int16_t, vcgt_s16, vst1_u16, vld1_s16, 4, 4)
LOOP2(VcgtS32N, uint32_t, int32_t, vcgt_s32, vst1_u32, vld1_s32, 2, 2)
LOOP2(VcgtS64N, uint64_t, int64_t, vcgt_s64, vst1_u64, vld1_s64, 1, 1)
LOOP2(VcgtU8N, uint8_t, uint8_t, vcgt_u8, vst1_u8, vld1_u8, 8, 8)
LOOP2(VcgtU16N, uint16_t, uint16_t, vcgt_u16, vst1_u16, vld1_u16, 4, 4)
LOOP2(VcgtU32N, uint32_t, uint32_t, vcgt_u32, vst1_u32, vld1_u32, 2, 2)
LOOP2(VcgtU64N, uint64_t, uint64_t, vcgt_u64, vst1_u64, vld1_u64, 1, 1)
LOOP2(VcgtF32N, uint32_t, float32_t, vcgt_f32, vst1_u32, vld1_f32, 2, 2)
LOOP2(VcgtF64N, uint64_t, float64_t, vcgt_f64, vst1_u64, vld1_f64, 1, 1)
LOOP2(VcgtdS64N, uint64_t, int64_t, vcgtd_s64, save, load, 1, 1)
LOOP2(VcgtdU64N, uint64_t, uint64_t, vcgtd_u64, save, load, 1, 1)
LOOP2(VcgtdF64N, uint64_t, float64_t, vcgtd_f64, save, load, 1, 1)
LOOP2(VcgtqS8N, uint8_t, int8_t, vcgtq_s8, vst1q_u8, vld1q_s8, 16, 16)
LOOP2(VcgtqS16N, uint16_t, int16_t, vcgtq_s16, vst1q_u16, vld1q_s16, 8, 8)
LOOP2(VcgtqS32N, uint32_t, int32_t, vcgtq_s32, vst1q_u32, vld1q_s32, 4, 4)
LOOP2(VcgtqS64N, uint64_t, int64_t, vcgtq_s64, vst1q_u64, vld1q_s64, 2, 2)
LOOP2(VcgtqU8N, uint8_t, uint8_t, vcgtq_u8, vst1q_u8, vld1q_u8, 16, 16)
LOOP2(VcgtqU16N, uint16_t, uint16_t, vcgtq_u16, vst1q_u16, vld1q_u16, 8, 8)
LOOP2(VcgtqU32N, uint32_t, uint32_t, vcgtq_u32, vst1q_u32, vld1q_u32, 4, 4)
LOOP2(VcgtqU64N, uint64_t, uint64_t, vcgtq_u64, vst1q_u64, vld1q_u64, 2, 2)
LOOP2(VcgtqF32N, uint32_t, float32_t, vcgtq_f32, vst1q_u32, vld1q_f32, 4, 4)
LOOP2(VcgtqF64N, uint64_t, float64_t, vcgtq_f64, vst1q_u64, vld1q_f64, 2, 2)
LOOP2(VcgtsF32N, uint32_t, float32_t, vcgts_f32, save, load, 1, 1)
LOOP2(VcleS8N, uint8_t, int8_t, vcle_s8, vst1_u8, vld1_s8, 8, 8)
LOOP2(VcleS16N, uint16_t, int16_t, vcle_s16, vst1_u16, vld1_s16, 4, 4)
LOOP2(VcleS32N, uint32_t, int32_t, vcle_s32, vst1_u32, vld1_s32, 2, 2)
LOOP2(VcleS64N, uint64_t, int64_t, vcle_s64, vst1_u64, vld1_s64, 1, 1)
LOOP2(VcleU8N, uint8_t, uint8_t, vcle_u8, vst1_u8, vld1_u8, 8, 8)
LOOP2(VcleU16N, uint16_t, uint16_t, vcle_u16, vst1_u16, vld1_u16, 4, 4)
LOOP2(VcleU32N, uint32_t, uint32_t, vcle_u32, vst1_u32, vld1_u32, 2, 2)
LOOP2(VcleU64N, uint64_t, uint64_t, vcle_u64, vst1_u64, vld1_u64, 1, 1)
LOOP2(VcleF32N, uint32_t, float32_t, vcle_f32, vst1_u32, vld1_f32, 2, 2)
LOOP2(VcleF64N, uint64_t, float64_t, vcle_f64, vst1_u64, vld1_f64, 1, 1)
LOOP2(VcledS64N, uint64_t, int64_t, vcled_s64, save, load, 1, 1)
LOOP2(VcledU64N, uint64_t, uint64_t, vcled_u64, save, load, 1, 1)
LOOP2(VcledF64N, uint64_t, float64_t, vcled_f64, save, load, 1, 1)
LOOP2(VcleqS8N, uint8_t, int8_t, vcleq_s8, vst1q_u8, vld1q_s8, 16, 16)
LOOP2(VcleqS16N, uint16_t, int16_t, vcleq_s16, vst1q_u16, vld1q_s16, 8, 8)
LOOP2(VcleqS32N, uint32_t, int32_t, vcleq_s32, vst1q_u32, vld1q_s32, 4, 4)
LOOP2(VcleqS64N, uint64_t, int64_t, vcleq_s64, vst1q_u64, vld1q_s64, 2, 2)
LOOP2(VcleqU8N, uint8_t, uint8_t, vcleq_u8, vst1q_u8, vld1q_u8, 16, 16)
LOOP2(VcleqU16N, uint16_t, uint16_t, vcleq_u16, vst1q_u16, vld1q_u16, 8, 8)
LOOP2(VcleqU32N, uint32_t, uint32_t, vcleq_u32, vst1q_u32, vld1q_u32, 4, 4)
LOOP2(VcleqU64N, uint64_t, uint64_t, vcleq_u64, vst1q_u64, vld1q_u64, 2, 2)
LOOP2(VcleqF32N, uint32_t, float32_t, vcleq_f32, vst1q_u32, vld1q_f32, 4, 4)
LOOP2(VcleqF64N, uint64_t, float64_t, vcleq_f64, vst1q_u64, vld1q_f64, 2, 2)
LOOP2(VclesF32N, uint32_t, float32_t, vcles_f32, save, load, 1, 1)
LOOP2(VcltS8N, uint8_t, int8_t, vclt_s8, vst1_u8, vld1_s8, 8, 8)
LOOP2(VcltS16N, uint16_t, int16_t, vclt_s16, vst1_u16, vld1_s16, 4, 4)
LOOP2(VcltS32N, uint32_t, int32_t, vclt_s32, vst1_u32, vld1_s32, 2, 2)
LOOP2(VcltS64N, uint64_t, int64_t, vclt_s64, vst1_u64, vld1_s64, 1, 1)
LOOP2(VcltU8N, uint8_t, uint8_t, vclt_u8, vst1_u8, vld1_u8, 8, 8)
LOOP2(VcltU16N, uint16_t, uint16_t, vclt_u16, vst1_u16, vld1_u16, 4, 4)
LOOP2(VcltU32N, uint32_t, uint32_t, vclt_u32, vst1_u32, vld1_u32, 2, 2)
LOOP2(VcltU64N, uint64_t, uint64_t, vclt_u64, vst1_u64, vld1_u64, 1, 1)
LOOP2(VcltF32N, uint32_t, float32_t, vclt_f32, vst1_u32, vld1_f32, 2, 2)
LOOP2(VcltF64N, uint64_t, float64_t, vclt_f64, vst1_u64, vld1_f64, 1, 1)
LOOP2(VcltdS64N, uint64_t, int64_t, vcltd_s64, save, load, 1, 1)
LOOP2(VcltdU64N, uint64_t, uint64_t, vcltd_u64, save, load, 1, 1)
LOOP2(VcltdF64N, uint64_t, float64_t, vcltd_f64, save, load, 1, 1)
LOOP2(VcltqS8N, uint8_t, int8_t, vcltq_s8, vst1q_u8, vld1q_s8, 16, 16)
LOOP2(VcltqS16N, uint16_t, int16_t, vcltq_s16, vst1q_u16, vld1q_s16, 8, 8)
LOOP2(VcltqS32N, uint32_t, int32_t, vcltq_s32, vst1q_u32, vld1q_s32, 4, 4)
LOOP2(VcltqS64N, uint64_t, int64_t, vcltq_s64, vst1q_u64, vld1q_s64, 2, 2)
LOOP2(VcltqU8N, uint8_t, uint8_t, vcltq_u8, vst1q_u8, vld1q_u8, 16, 16)
LOOP2(VcltqU16N, uint16_t, uint16_t, vcltq_u16, vst1q_u16, vld1q_u16, 8, 8)
LOOP2(VcltqU32N, uint32_t, uint32_t, vcltq_u32, vst1q_u32, vld1q_u32, 4, 4)
LOOP2(VcltqU64N, uint64_t, uint64_t, vcltq_u64, vst1q_u64, vld1q_u64, 2, 2)
LOOP2(VcltqF32N, uint32_t, float32_t, vcltq_f32, vst1q_u32, vld1q_f32, 4, 4)
LOOP2(VcltqF64N, uint64_t, float64_t, vcltq_f64, vst1q_u64, vld1q_f64, 2, 2)
LOOP2(VcltsF32N, uint32_t, float32_t, vclts_f32, save, load, 1, 1)
LOOP2(VcombineS8N, int8_t, int8_t, vcombine_s8, vst1q_s8, vld1_s8, 16, 8)
LOOP2(VcombineS16N, int16_t, int16_t, vcombine_s16, vst1q_s16, vld1_s16, 8, 4)
LOOP2(VcombineS32N, int32_t, int32_t, vcombine_s32, vst1q_s32, vld1_s32, 4, 2)
LOOP2(VcombineS64N, int64_t, int64_t, vcombine_s64, vst1q_s64, vld1_s64, 2, 1)
LOOP2(VcombineU8N, uint8_t, uint8_t, vcombine_u8, vst1q_u8, vld1_u8, 16, 8)
LOOP2(VcombineU16N, uint16_t, uint16_t, vcombine_u16, vst1q_u16, vld1_u16, 8, 4)
LOOP2(VcombineU32N, uint32_t, uint32_t, vcombine_u32, vst1q_u32, vld1_u32, 4, 2)
LOOP2(VcombineU64N, uint64_t, uint64_t, vcombine_u64, vst1q_u64, vld1_u64, 2, 1)
LOOP2(VcombineF32N, float32_t, float32_t, vcombine_f32, vst1q_f32, vld1_f32, 4, 2)
LOOP2(VcombineF64N, float64_t, float64_t, vcombine_f64, vst1q_f64, vld1_f64, 2, 1)
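
/*
Illustration only, not generated code. The vcombine loops are the one group
here where rstep and istep differ: each iteration reads one 64-bit vector from
each input and writes one 128-bit vector, with v1 supplying the low half and v2
the high half. Since n counts result elements, each input only needs n / 2
elements. The scalar sketch below models the shape of VcombineS32N (rstep 4,
istep 2); model_combine_s32_n is a hypothetical name used only here.

```c
#include <stdint.h>

// Scalar stand-in for VcombineS32N: low half from v1, high half from v2.
static void model_combine_s32_n(int32_t *r, const int32_t *v1, const int32_t *v2, int32_t n)
{
    while (n >= 4)
    {
        r[0] = v1[0];
        r[1] = v1[1];
        r[2] = v2[0];
        r[3] = v2[1];
        r += 4;  // result advances by rstep
        v1 += 2; // inputs advance by istep
        v2 += 2;
        n -= 4;
    }
}
```

The output is therefore interleaved in blocks of two, not a concatenation of
the two input arrays.
*/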
LOOP2(VdivF32N, float32_t, float32_t, vdiv_f32, vst1_f32, vld1_f32, 2, 2)
LOOP2(VdivF64N, float64_t, float64_t, vdiv_f64, vst1_f64, vld1_f64, 1, 1)
LOOP2(VdivqF32N, float32_t, float32_t, vdivq_f32, vst1q_f32, vld1q_f32, 4, 4)
LOOP2(VdivqF64N, float64_t, float64_t, vdivq_f64, vst1q_f64, vld1q_f64, 2, 2)
LOOP2(VeorS8N, int8_t, int8_t, veor_s8, vst1_s8, vld1_s8, 8, 8)
LOOP2(VeorS16N, int16_t, int16_t, veor_s16, vst1_s16, vld1_s16, 4, 4)
LOOP2(VeorS32N, int32_t, int32_t, veor_s32, vst1_s32, vld1_s32, 2, 2)
LOOP2(VeorS64N, int64_t, int64_t, veor_s64, vst1_s64, vld1_s64, 1, 1)
LOOP2(VeorU8N, uint8_t, uint8_t, veor_u8, vst1_u8, vld1_u8, 8, 8)
LOOP2(VeorU16N, uint16_t, uint16_t, veor_u16, vst1_u16, vld1_u16, 4, 4)
LOOP2(VeorU32N, uint32_t, uint32_t, veor_u32, vst1_u32, vld1_u32, 2, 2)
LOOP2(VeorU64N, uint64_t, uint64_t, veor_u64, vst1_u64, vld1_u64, 1, 1)
LOOP2(VeorqS8N, int8_t, int8_t, veorq_s8, vst1q_s8, vld1q_s8, 16, 16)
LOOP2(VeorqS16N, int16_t, int16_t, veorq_s16, vst1q_s16, vld1q_s16, 8, 8)
LOOP2(VeorqS32N, int32_t, int32_t, veorq_s32, vst1q_s32, vld1q_s32, 4, 4)
LOOP2(VeorqS64N, int64_t, int64_t, veorq_s64, vst1q_s64, vld1q_s64, 2, 2)
LOOP2(VeorqU8N, uint8_t, uint8_t, veorq_u8, vst1q_u8, vld1q_u8, 16, 16)
LOOP2(VeorqU16N, uint16_t, uint16_t, veorq_u16, vst1q_u16, vld1q_u16, 8, 8)
LOOP2(VeorqU32N, uint32_t, uint32_t, veorq_u32, vst1q_u32, vld1q_u32, 4, 4)
LOOP2(VeorqU64N, uint64_t, uint64_t, veorq_u64, vst1q_u64, vld1q_u64, 2, 2)
LOOP2(VhaddS8N, int8_t, int8_t, vhadd_s8, vst1_s8, vld1_s8, 8, 8)
LOOP2(VhaddS16N, int16_t, int16_t, vhadd_s16, vst1_s16, vld1_s16, 4, 4)
LOOP2(VhaddS32N, int32_t, int32_t, vhadd_s32, vst1_s32, vld1_s32, 2, 2)
LOOP2(VhaddU8N, uint8_t, uint8_t, vhadd_u8, vst1_u8, vld1_u8, 8, 8)
LOOP2(VhaddU16N, uint16_t, uint16_t, vhadd_u16, vst1_u16, vld1_u16, 4, 4)
LOOP2(VhaddU32N, uint32_t, uint32_t, vhadd_u32, vst1_u32, vld1_u32, 2, 2)
LOOP2(VhaddqS8N, int8_t, int8_t, vhaddq_s8, vst1q_s8, vld1q_s8, 16, 16)
LOOP2(VhaddqS16N, int16_t, int16_t, vhaddq_s16, vst1q_s16, vld1q_s16, 8, 8)
LOOP2(VhaddqS32N, int32_t, int32_t, vhaddq_s32, vst1q_s32, vld1q_s32, 4, 4)
LOOP2(VhaddqU8N, uint8_t, uint8_t, vhaddq_u8, vst1q_u8, vld1q_u8, 16, 16)
LOOP2(VhaddqU16N, uint16_t, uint16_t, vhaddq_u16, vst1q_u16, vld1q_u16, 8, 8)
LOOP2(VhaddqU32N, uint32_t, uint32_t, vhaddq_u32, vst1q_u32, vld1q_u32, 4, 4)
LOOP2(VhsubS8N, int8_t, int8_t, vhsub_s8, vst1_s8, vld1_s8, 8, 8)
LOOP2(VhsubS16N, int16_t, int16_t, vhsub_s16, vst1_s16, vld1_s16, 4, 4)
LOOP2(VhsubS32N, int32_t, int32_t, vhsub_s32, vst1_s32, vld1_s32, 2, 2)
LOOP2(VhsubU8N, uint8_t, uint8_t, vhsub_u8, vst1_u8, vld1_u8, 8, 8)
LOOP2(VhsubU16N, uint16_t, uint16_t, vhsub_u16, vst1_u16, vld1_u16, 4, 4)
LOOP2(VhsubU32N, uint32_t, uint32_t, vhsub_u32, vst1_u32, vld1_u32, 2, 2)
LOOP2(VhsubqS8N, int8_t, int8_t, vhsubq_s8, vst1q_s8, vld1q_s8, 16, 16)
LOOP2(VhsubqS16N, int16_t, int16_t, vhsubq_s16, vst1q_s16, vld1q_s16, 8, 8)
LOOP2(VhsubqS32N, int32_t, int32_t, vhsubq_s32, vst1q_s32, vld1q_s32, 4, 4)
LOOP2(VhsubqU8N, uint8_t, uint8_t, vhsubq_u8, vst1q_u8, vld1q_u8, 16, 16)
LOOP2(VhsubqU16N, uint16_t, uint16_t, vhsubq_u16, vst1q_u16, vld1q_u16, 8, 8)
LOOP2(VhsubqU32N, uint32_t, uint32_t, vhsubq_u32, vst1q_u32, vld1q_u32, 4, 4)
LOOP2(VmaxS8N, int8_t, int8_t, vmax_s8, vst1_s8, vld1_s8, 8, 8)
LOOP2(VmaxS16N, int16_t, int16_t, vmax_s16, vst1_s16, vld1_s16, 4, 4)
LOOP2(VmaxS32N, int32_t, int32_t, vmax_s32, vst1_s32, vld1_s32, 2, 2)
LOOP2(VmaxU8N, uint8_t, uint8_t, vmax_u8, vst1_u8, vld1_u8, 8, 8)
LOOP2(VmaxU16N, uint16_t, uint16_t, vmax_u16, vst1_u16, vld1_u16, 4, 4)
LOOP2(VmaxU32N, uint32_t, uint32_t, vmax_u32, vst1_u32, vld1_u32, 2, 2)
LOOP2(VmaxF32N, float32_t, float32_t, vmax_f32, vst1_f32, vld1_f32, 2, 2)
LOOP2(VmaxF64N, float64_t, float64_t, vmax_f64, vst1_f64, vld1_f64, 1, 1)
LOOP2(VmaxnmF32N, float32_t, float32_t, vmaxnm_f32, vst1_f32, vld1_f32, 2, 2)
LOOP2(VmaxnmF64N, float64_t, float64_t, vmaxnm_f64, vst1_f64, vld1_f64, 1, 1)
LOOP2(VmaxnmqF32N, float32_t, float32_t, vmaxnmq_f32, vst1q_f32, vld1q_f32, 4, 4)
LOOP2(VmaxnmqF64N, float64_t, float64_t, vmaxnmq_f64, vst1q_f64, vld1q_f64, 2, 2)
LOOP2(VmaxqS8N, int8_t, int8_t, vmaxq_s8, vst1q_s8, vld1q_s8, 16, 16)
LOOP2(VmaxqS16N, int16_t, int16_t, vmaxq_s16, vst1q_s16, vld1q_s16, 8, 8)
LOOP2(VmaxqS32N, int32_t, int32_t, vmaxq_s32, vst1q_s32, vld1q_s32, 4, 4)
LOOP2(VmaxqU8N, uint8_t, uint8_t, vmaxq_u8, vst1q_u8, vld1q_u8, 16, 16)
LOOP2(VmaxqU16N, uint16_t, uint16_t, vmaxq_u16, vst1q_u16, vld1q_u16, 8, 8)
LOOP2(VmaxqU32N, uint32_t, uint32_t, vmaxq_u32, vst1q_u32, vld1q_u32, 4, 4)
LOOP2(VmaxqF32N, float32_t, float32_t, vmaxq_f32, vst1q_f32, vld1q_f32, 4, 4)
LOOP2(VmaxqF64N, float64_t, float64_t, vmaxq_f64, vst1q_f64, vld1q_f64, 2, 2)
LOOP2(VminS8N, int8_t, int8_t, vmin_s8, vst1_s8, vld1_s8, 8, 8)
LOOP2(VminS16N, int16_t, int16_t, vmin_s16, vst1_s16, vld1_s16, 4, 4)
LOOP2(VminS32N, int32_t, int32_t, vmin_s32, vst1_s32, vld1_s32, 2, 2)
LOOP2(VminU8N, uint8_t, uint8_t, vmin_u8, vst1_u8, vld1_u8, 8, 8)
LOOP2(VminU16N, uint16_t, uint16_t, vmin_u16, vst1_u16, vld1_u16, 4, 4)
LOOP2(VminU32N, uint32_t, uint32_t, vmin_u32, vst1_u32, vld1_u32, 2, 2)
LOOP2(VminF32N, float32_t, float32_t, vmin_f32, vst1_f32, vld1_f32, 2, 2)
LOOP2(VminF64N, float64_t, float64_t, vmin_f64, vst1_f64, vld1_f64, 1, 1)
LOOP2(VminnmF32N, float32_t, float32_t, vminnm_f32, vst1_f32, vld1_f32, 2, 2)
LOOP2(VminnmF64N, float64_t, float64_t, vminnm_f64, vst1_f64, vld1_f64, 1, 1)
LOOP2(VminnmqF32N, float32_t, float32_t, vminnmq_f32, vst1q_f32, vld1q_f32, 4, 4)
LOOP2(VminnmqF64N, float64_t, float64_t, vminnmq_f64, vst1q_f64, vld1q_f64, 2, 2)
LOOP2(VminqS8N, int8_t, int8_t, vminq_s8, vst1q_s8, vld1q_s8, 16, 16)
LOOP2(VminqS16N, int16_t, int16_t, vminq_s16, vst1q_s16, vld1q_s16, 8, 8)
LOOP2(VminqS32N, int32_t, int32_t, vminq_s32, vst1q_s32, vld1q_s32, 4, 4)
LOOP2(VminqU8N, uint8_t, uint8_t, vminq_u8, vst1q_u8, vld1q_u8, 16, 16)
LOOP2(VminqU16N, uint16_t, uint16_t, vminq_u16, vst1q_u16, vld1q_u16, 8, 8)
LOOP2(VminqU32N, uint32_t, uint32_t, vminq_u32, vst1q_u32, vld1q_u32, 4, 4)
LOOP2(VminqF32N, float32_t, float32_t, vminq_f32, vst1q_f32, vld1q_f32, 4, 4)
LOOP2(VminqF64N, float64_t, float64_t, vminq_f64, vst1q_f64, vld1q_f64, 2, 2)
LOOP2(VmulS8N, int8_t, int8_t, vmul_s8, vst1_s8, vld1_s8, 8, 8)
LOOP2(VmulS16N, int16_t, int16_t, vmul_s16, vst1_s16, vld1_s16, 4, 4)
LOOP2(VmulS32N, int32_t, int32_t, vmul_s32, vst1_s32, vld1_s32, 2, 2)
LOOP2(VmulU8N, uint8_t, uint8_t, vmul_u8, vst1_u8, vld1_u8, 8, 8)
LOOP2(VmulU16N, uint16_t, uint16_t, vmul_u16, vst1_u16, vld1_u16, 4, 4)
LOOP2(VmulU32N, uint32_t, uint32_t, vmul_u32, vst1_u32, vld1_u32, 2, 2)
LOOP2(VmulF32N, float32_t, float32_t, vmul_f32, vst1_f32, vld1_f32, 2, 2)
LOOP2(VmulF64N, float64_t, float64_t, vmul_f64, vst1_f64, vld1_f64, 1, 1)
LOOP2(VmulqS8N, int8_t, int8_t, vmulq_s8, vst1q_s8, vld1q_s8, 16, 16)
LOOP2(VmulqS16N, int16_t, int16_t, vmulq_s16, vst1q_s16, vld1q_s16, 8, 8)
LOOP2(VmulqS32N, int32_t, int32_t, vmulq_s32, vst1q_s32, vld1q_s32, 4, 4)
LOOP2(VmulqU8N, uint8_t, uint8_t, vmulq_u8, vst1q_u8, vld1q_u8, 16, 16)
LOOP2(VmulqU16N, uint16_t, uint16_t, vmulq_u16, vst1q_u16, vld1q_u16, 8, 8)
LOOP2(VmulqU32N, uint32_t, uint32_t, vmulq_u32, vst1q_u32, vld1q_u32, 4, 4)
LOOP2(VmulqF32N, float32_t, float32_t, vmulq_f32, vst1q_f32, vld1q_f32, 4, 4)
LOOP2(VmulqF64N, float64_t, float64_t, vmulq_f64, vst1q_f64, vld1q_f64, 2, 2)
LOOP2(VmulxF32N, float32_t, float32_t, vmulx_f32, vst1_f32, vld1_f32, 2, 2)
LOOP2(VmulxF64N, float64_t, float64_t, vmulx_f64, vst1_f64, vld1_f64, 1, 1)
LOOP2(VmulxdF64N, float64_t, float64_t, vmulxd_f64, save, load, 1, 1)
LOOP2(VmulxqF32N, float32_t, float32_t, vmulxq_f32, vst1q_f32, vld1q_f32, 4, 4)
LOOP2(VmulxqF64N, float64_t, float64_t, vmulxq_f64, vst1q_f64, vld1q_f64, 2, 2)
LOOP2(VmulxsF32N, float32_t, float32_t, vmulxs_f32, save, load, 1, 1)
LOOP2(VornS8N, int8_t, int8_t, vorn_s8, vst1_s8, vld1_s8, 8, 8)
LOOP2(VornS16N, int16_t, int16_t, vorn_s16, vst1_s16, vld1_s16, 4, 4)
LOOP2(VornS32N, int32_t, int32_t, vorn_s32, vst1_s32, vld1_s32, 2, 2)
LOOP2(VornS64N, int64_t, int64_t, vorn_s64, vst1_s64, vld1_s64, 1, 1)
LOOP2(VornU8N, uint8_t, uint8_t, vorn_u8, vst1_u8, vld1_u8, 8, 8)
LOOP2(VornU16N, uint16_t, uint16_t, vorn_u16, vst1_u16, vld1_u16, 4, 4)
LOOP2(VornU32N, uint32_t, uint32_t, vorn_u32, vst1_u32, vld1_u32, 2, 2)
LOOP2(VornU64N, uint64_t, uint64_t, vorn_u64, vst1_u64, vld1_u64, 1, 1)
LOOP2(VornqS8N, int8_t, int8_t, vornq_s8, vst1q_s8, vld1q_s8, 16, 16)
LOOP2(VornqS16N, int16_t, int16_t, vornq_s16, vst1q_s16, vld1q_s16, 8, 8)
LOOP2(VornqS32N, int32_t, int32_t, vornq_s32, vst1q_s32, vld1q_s32, 4, 4)
LOOP2(VornqS64N, int64_t, int64_t, vornq_s64, vst1q_s64, vld1q_s64, 2, 2)
LOOP2(VornqU8N, uint8_t, uint8_t, vornq_u8, vst1q_u8, vld1q_u8, 16, 16)
LOOP2(VornqU16N, uint16_t, uint16_t, vornq_u16, vst1q_u16, vld1q_u16, 8, 8)
LOOP2(VornqU32N, uint32_t, uint32_t, vornq_u32, vst1q_u32, vld1q_u32, 4, 4)
LOOP2(VornqU64N, uint64_t, uint64_t, vornq_u64, vst1q_u64, vld1q_u64, 2, 2)
LOOP2(VorrS8N, int8_t, int8_t, vorr_s8, vst1_s8, vld1_s8, 8, 8)
LOOP2(VorrS16N, int16_t, int16_t, vorr_s16, vst1_s16, vld1_s16, 4, 4)
LOOP2(VorrS32N, int32_t, int32_t, vorr_s32, vst1_s32, vld1_s32, 2, 2)
LOOP2(VorrS64N, int64_t, int64_t, vorr_s64, vst1_s64, vld1_s64, 1, 1)
LOOP2(VorrU8N, uint8_t, uint8_t, vorr_u8, vst1_u8, vld1_u8, 8, 8)
LOOP2(VorrU16N, uint16_t, uint16_t, vorr_u16, vst1_u16, vld1_u16, 4, 4)
LOOP2(VorrU32N, uint32_t, uint32_t, vorr_u32, vst1_u32, vld1_u32, 2, 2)
LOOP2(VorrU64N, uint64_t, uint64_t, vorr_u64, vst1_u64, vld1_u64, 1, 1)
LOOP2(VorrqS8N, int8_t, int8_t, vorrq_s8, vst1q_s8, vld1q_s8, 16, 16)
LOOP2(VorrqS16N, int16_t, int16_t, vorrq_s16, vst1q_s16, vld1q_s16, 8, 8)
LOOP2(VorrqS32N, int32_t, int32_t, vorrq_s32, vst1q_s32, vld1q_s32, 4, 4)
LOOP2(VorrqS64N, int64_t, int64_t, vorrq_s64, vst1q_s64, vld1q_s64, 2, 2)
LOOP2(VorrqU8N, uint8_t, uint8_t, vorrq_u8, vst1q_u8, vld1q_u8, 16, 16)
LOOP2(VorrqU16N, uint16_t, uint16_t, vorrq_u16, vst1q_u16, vld1q_u16, 8, 8)
LOOP2(VorrqU32N, uint32_t, uint32_t, vorrq_u32, vst1q_u32, vld1q_u32, 4, 4)
LOOP2(VorrqU64N, uint64_t, uint64_t, vorrq_u64, vst1q_u64, vld1q_u64, 2, 2)
LOOP2(VpaddS8N, int8_t, int8_t, vpadd_s8, vst1_s8, vld1_s8, 8, 8)
LOOP2(VpaddS16N, int16_t, int16_t, vpadd_s16, vst1_s16, vld1_s16, 4, 4)
LOOP2(VpaddS32N, int32_t, int32_t, vpadd_s32, vst1_s32, vld1_s32, 2, 2)
LOOP2(VpaddU8N, uint8_t, uint8_t, vpadd_u8, vst1_u8, vld1_u8, 8, 8)
LOOP2(VpaddU16N, uint16_t, uint16_t, vpadd_u16, vst1_u16, vld1_u16, 4, 4)
LOOP2(VpaddU32N, uint32_t, uint32_t, vpadd_u32, vst1_u32, vld1_u32, 2, 2)
LOOP2(VpaddF32N, float32_t, float32_t, vpadd_f32, vst1_f32, vld1_f32, 2, 2)
LOOP2(VpaddqS8N, int8_t, int8_t, vpaddq_s8, vst1q_s8, vld1q_s8, 16, 16)
LOOP2(VpaddqS16N, int16_t, int16_t, vpaddq_s16, vst1q_s16, vld1q_s16, 8, 8)
LOOP2(VpaddqS32N, int32_t, int32_t, vpaddq_s32, vst1q_s32, vld1q_s32, 4, 4)
LOOP2(VpaddqS64N, int64_t, int64_t, vpaddq_s64, vst1q_s64, vld1q_s64, 2, 2)
LOOP2(VpaddqU8N, uint8_t, uint8_t, vpaddq_u8, vst1q_u8, vld1q_u8, 16, 16)
LOOP2(VpaddqU16N, uint16_t, uint16_t, vpaddq_u16, vst1q_u16, vld1q_u16, 8, 8)
LOOP2(VpaddqU32N, uint32_t, uint32_t, vpaddq_u32, vst1q_u32, vld1q_u32, 4, 4)
LOOP2(VpaddqU64N, uint64_t, uint64_t, vpaddq_u64, vst1q_u64, vld1q_u64, 2, 2)
LOOP2(VpaddqF32N, float32_t, float32_t, vpaddq_f32, vst1q_f32, vld1q_f32, 4, 4)
LOOP2(VpaddqF64N, float64_t, float64_t, vpaddq_f64, vst1q_f64, vld1q_f64, 2, 2)
LOOP2(VpmaxS8N, int8_t, int8_t, vpmax_s8, vst1_s8, vld1_s8, 8, 8)
LOOP2(VpmaxS16N, int16_t, int16_t, vpmax_s16, vst1_s16, vld1_s16, 4, 4)
LOOP2(VpmaxS32N, int32_t, int32_t, vpmax_s32, vst1_s32, vld1_s32, 2, 2)
LOOP2(VpmaxU8N, uint8_t, uint8_t, vpmax_u8, vst1_u8, vld1_u8, 8, 8)
LOOP2(VpmaxU16N, uint16_t, uint16_t, vpmax_u16, vst1_u16, vld1_u16, 4, 4)
LOOP2(VpmaxU32N, uint32_t, uint32_t, vpmax_u32, vst1_u32, vld1_u32, 2, 2)
LOOP2(VpmaxF32N, float32_t, float32_t, vpmax_f32, vst1_f32, vld1_f32, 2, 2)
LOOP2(VpmaxnmF32N, float32_t, float32_t, vpmaxnm_f32, vst1_f32, vld1_f32, 2, 2)
LOOP2(VpmaxnmqF32N, float32_t, float32_t, vpmaxnmq_f32, vst1q_f32, vld1q_f32, 4, 4)
LOOP2(VpmaxnmqF64N, float64_t, float64_t, vpmaxnmq_f64, vst1q_f64, vld1q_f64, 2, 2)
LOOP2(VpmaxqS8N, int8_t, int8_t, vpmaxq_s8, vst1q_s8, vld1q_s8, 16, 16)
LOOP2(VpmaxqS16N, int16_t, int16_t, vpmaxq_s16, vst1q_s16, vld1q_s16, 8, 8)
LOOP2(VpmaxqS32N, int32_t, int32_t, vpmaxq_s32, vst1q_s32, vld1q_s32, 4, 4)
LOOP2(VpmaxqU8N, uint8_t, uint8_t, vpmaxq_u8, vst1q_u8, vld1q_u8, 16, 16)
LOOP2(VpmaxqU16N, uint16_t, uint16_t, vpmaxq_u16, vst1q_u16, vld1q_u16, 8, 8)
LOOP2(VpmaxqU32N, uint32_t, uint32_t, vpmaxq_u32, vst1q_u32, vld1q_u32, 4, 4)
LOOP2(VpmaxqF32N, float32_t, float32_t, vpmaxq_f32, vst1q_f32, vld1q_f32, 4, 4)
LOOP2(VpmaxqF64N, float64_t, float64_t, vpmaxq_f64, vst1q_f64, vld1q_f64, 2, 2)
LOOP2(VpminS8N, int8_t, int8_t, vpmin_s8, vst1_s8, vld1_s8, 8, 8)
LOOP2(VpminS16N, int16_t, int16_t, vpmin_s16, vst1_s16, vld1_s16, 4, 4)
LOOP2(VpminS32N, int32_t, int32_t, vpmin_s32, vst1_s32, vld1_s32, 2, 2)
LOOP2(VpminU8N, uint8_t, uint8_t, vpmin_u8, vst1_u8, vld1_u8, 8, 8)
LOOP2(VpminU16N, uint16_t, uint16_t, vpmin_u16, vst1_u16, vld1_u16, 4, 4)
LOOP2(VpminU32N, uint32_t, uint32_t, vpmin_u32, vst1_u32, vld1_u32, 2, 2)
LOOP2(VpminF32N, float32_t, float32_t, vpmin_f32, vst1_f32, vld1_f32, 2, 2)
LOOP2(VpminnmF32N, float32_t, float32_t, vpminnm_f32, vst1_f32, vld1_f32, 2, 2)
LOOP2(VpminnmqF32N, float32_t, float32_t, vpminnmq_f32, vst1q_f32, vld1q_f32, 4, 4)
LOOP2(VpminnmqF64N, float64_t, float64_t, vpminnmq_f64, vst1q_f64, vld1q_f64, 2, 2)
LOOP2(VpminqS8N, int8_t, int8_t, vpminq_s8, vst1q_s8, vld1q_s8, 16, 16)
LOOP2(VpminqS16N, int16_t, int16_t, vpminq_s16, vst1q_s16, vld1q_s16, 8, 8)
LOOP2(VpminqS32N, int32_t, int32_t, vpminq_s32, vst1q_s32, vld1q_s32, 4, 4)
LOOP2(VpminqU8N, uint8_t, uint8_t, vpminq_u8, vst1q_u8, vld1q_u8, 16, 16)
LOOP2(VpminqU16N, uint16_t, uint16_t, vpminq_u16, vst1q_u16, vld1q_u16, 8, 8)
LOOP2(VpminqU32N, uint32_t, uint32_t, vpminq_u32, vst1q_u32, vld1q_u32, 4, 4)
LOOP2(VpminqF32N, float32_t, float32_t, vpminq_f32, vst1q_f32, vld1q_f32, 4, 4)
LOOP2(VpminqF64N, float64_t, float64_t, vpminq_f64, vst1q_f64, vld1q_f64, 2, 2)
LOOP2(VqaddS8N, int8_t, int8_t, vqadd_s8, vst1_s8, vld1_s8, 8, 8)
LOOP2(VqaddS16N, int16_t, int16_t, vqadd_s16, vst1_s16, vld1_s16, 4, 4)
LOOP2(VqaddS32N, int32_t, int32_t, vqadd_s32, vst1_s32, vld1_s32, 2, 2)
LOOP2(VqaddS64N, int64_t, int64_t, vqadd_s64, vst1_s64, vld1_s64, 1, 1)
LOOP2(VqaddU8N, uint8_t, uint8_t, vqadd_u8, vst1_u8, vld1_u8, 8, 8)
LOOP2(VqaddU16N, uint16_t, uint16_t, vqadd_u16, vst1_u16, vld1_u16, 4, 4)
LOOP2(VqaddU32N, uint32_t, uint32_t, vqadd_u32, vst1_u32, vld1_u32, 2, 2)
LOOP2(VqaddU64N, uint64_t, uint64_t, vqadd_u64, vst1_u64, vld1_u64, 1, 1)
LOOP2(VqaddbS8N, int8_t, int8_t, vqaddb_s8, save, load, 1, 1)
LOOP2(VqaddbU8N, uint8_t, uint8_t, vqaddb_u8, save, load, 1, 1)
LOOP2(VqadddS64N, int64_t, int64_t, vqaddd_s64, save, load, 1, 1)
LOOP2(VqadddU64N, uint64_t, uint64_t, vqaddd_u64, save, load, 1, 1)
LOOP2(VqaddhS16N, int16_t, int16_t, vqaddh_s16, save, load, 1, 1)
LOOP2(VqaddhU16N, uint16_t, uint16_t, vqaddh_u16, save, load, 1, 1)
LOOP2(VqaddqS8N, int8_t, int8_t, vqaddq_s8, vst1q_s8, vld1q_s8, 16, 16)
LOOP2(VqaddqS16N, int16_t, int16_t, vqaddq_s16, vst1q_s16, vld1q_s16, 8, 8)
LOOP2(VqaddqS32N, int32_t, int32_t, vqaddq_s32, vst1q_s32, vld1q_s32, 4, 4)
LOOP2(VqaddqS64N, int64_t, int64_t, vqaddq_s64, vst1q_s64, vld1q_s64, 2, 2)
LOOP2(VqaddqU8N, uint8_t, uint8_t, vqaddq_u8, vst1q_u8, vld1q_u8, 16, 16)
LOOP2(VqaddqU16N, uint16_t, uint16_t, vqaddq_u16, vst1q_u16, vld1q_u16, 8, 8)
LOOP2(VqaddqU32N, uint32_t, uint32_t, vqaddq_u32, vst1q_u32, vld1q_u32, 4, 4)
LOOP2(VqaddqU64N, uint64_t, uint64_t, vqaddq_u64, vst1q_u64, vld1q_u64, 2, 2)
LOOP2(VqaddsS32N, int32_t, int32_t, vqadds_s32, save, load, 1, 1)
LOOP2(VqaddsU32N, uint32_t, uint32_t, vqadds_u32, save, load, 1, 1)
LOOP2(VqdmulhS16N, int16_t, int16_t, vqdmulh_s16, vst1_s16, vld1_s16, 4, 4)
LOOP2(VqdmulhS32N, int32_t, int32_t, vqdmulh_s32, vst1_s32, vld1_s32, 2, 2)
LOOP2(VqdmulhhS16N, int16_t, int16_t, vqdmulhh_s16, save, load, 1, 1)
LOOP2(VqdmulhqS16N, int16_t, int16_t, vqdmulhq_s16, vst1q_s16, vld1q_s16, 8, 8)
LOOP2(VqdmulhqS32N, int32_t, int32_t, vqdmulhq_s32, vst1q_s32, vld1q_s32, 4, 4)
LOOP2(VqdmulhsS32N, int32_t, int32_t, vqdmulhs_s32, save, load, 1, 1)
LOOP2(VqrdmulhS16N, int16_t, int16_t, vqrdmulh_s16, vst1_s16, vld1_s16, 4, 4)
LOOP2(VqrdmulhS32N, int32_t, int32_t, vqrdmulh_s32, vst1_s32, vld1_s32, 2, 2)
LOOP2(VqrdmulhhS16N, int16_t, int16_t, vqrdmulhh_s16, save, load, 1, 1)
LOOP2(VqrdmulhqS16N, int16_t, int16_t, vqrdmulhq_s16, vst1q_s16, vld1q_s16, 8, 8)
LOOP2(VqrdmulhqS32N, int32_t, int32_t, vqrdmulhq_s32, vst1q_s32, vld1q_s32, 4, 4)
LOOP2(VqrdmulhsS32N, int32_t, int32_t, vqrdmulhs_s32, save, load, 1, 1)
LOOP2(VqrshlS8N, int8_t, int8_t, vqrshl_s8, vst1_s8, vld1_s8, 8, 8)
LOOP2(VqrshlS16N, int16_t, int16_t, vqrshl_s16, vst1_s16, vld1_s16, 4, 4)
LOOP2(VqrshlS32N, int32_t, int32_t, vqrshl_s32, vst1_s32, vld1_s32, 2, 2)
LOOP2(VqrshlS64N, int64_t, int64_t, vqrshl_s64, vst1_s64, vld1_s64, 1, 1)
LOOP2(VqrshlbS8N, int8_t, int8_t, vqrshlb_s8, save, load, 1, 1)
LOOP2(VqrshldS64N, int64_t, int64_t, vqrshld_s64, save, load, 1, 1)
LOOP2(VqrshlhS16N, int16_t, int16_t, vqrshlh_s16, save, load, 1, 1)
LOOP2(VqrshlqS8N, int8_t, int8_t, vqrshlq_s8, vst1q_s8, vld1q_s8, 16, 16)
LOOP2(VqrshlqS16N, int16_t, int16_t, vqrshlq_s16, vst1q_s16, vld1q_s16, 8, 8)
LOOP2(VqrshlqS32N, int32_t, int32_t, vqrshlq_s32, vst1q_s32, vld1q_s32, 4, 4)
LOOP2(VqrshlqS64N, int64_t, int64_t, vqrshlq_s64, vst1q_s64, vld1q_s64, 2, 2)
LOOP2(VqrshlsS32N, int32_t, int32_t, vqrshls_s32, save, load, 1, 1)
LOOP2(VqshlS8N, int8_t, int8_t, vqshl_s8, vst1_s8, vld1_s8, 8, 8)
LOOP2(VqshlS16N, int16_t, int16_t, vqshl_s16, vst1_s16, vld1_s16, 4, 4)
LOOP2(VqshlS32N, int32_t, int32_t, vqshl_s32, vst1_s32, vld1_s32, 2, 2)
LOOP2(VqshlS64N, int64_t, int64_t, vqshl_s64, vst1_s64, vld1_s64, 1, 1)
LOOP2(VqshlbS8N, int8_t, int8_t, vqshlb_s8, save, load, 1, 1)
LOOP2(VqshldS64N, int64_t, int64_t, vqshld_s64, save, load, 1, 1)
LOOP2(VqshlhS16N, int16_t, int16_t, vqshlh_s16, save, load, 1, 1)
LOOP2(VqshlqS8N, int8_t, int8_t, vqshlq_s8, vst1q_s8, vld1q_s8, 16, 16)
LOOP2(VqshlqS16N, int16_t, int16_t, vqshlq_s16, vst1q_s16, vld1q_s16, 8, 8)
LOOP2(VqshlqS32N, int32_t, int32_t, vqshlq_s32, vst1q_s32, vld1q_s32, 4, 4)
LOOP2(VqshlqS64N, int64_t, int64_t, vqshlq_s64, vst1q_s64, vld1q_s64, 2, 2)
LOOP2(VqshlsS32N, int32_t, int32_t, vqshls_s32, save, load, 1, 1)
LOOP2(VqsubS8N, int8_t, int8_t, vqsub_s8, vst1_s8, vld1_s8, 8, 8)
LOOP2(VqsubS16N, int16_t, int16_t, vqsub_s16, vst1_s16, vld1_s16, 4, 4)
LOOP2(VqsubS32N, int32_t, int32_t, vqsub_s32, vst1_s32, vld1_s32, 2, 2)
LOOP2(VqsubS64N, int64_t, int64_t, vqsub_s64, vst1_s64, vld1_s64, 1, 1)
LOOP2(VqsubU8N, uint8_t, uint8_t, vqsub_u8, vst1_u8, vld1_u8, 8, 8)
LOOP2(VqsubU16N, uint16_t, uint16_t, vqsub_u16, vst1_u16, vld1_u16, 4, 4)
LOOP2(VqsubU32N, uint32_t, uint32_t, vqsub_u32, vst1_u32, vld1_u32, 2, 2)
LOOP2(VqsubU64N, uint64_t, uint64_t, vqsub_u64, vst1_u64, vld1_u64, 1, 1)
LOOP2(VqsubbS8N, int8_t, int8_t, vqsubb_s8, save, load, 1, 1)
LOOP2(VqsubbU8N, uint8_t, uint8_t, vqsubb_u8, save, load, 1, 1)
LOOP2(VqsubdS64N, int64_t, int64_t, vqsubd_s64, save, load, 1, 1)
LOOP2(VqsubdU64N, uint64_t, uint64_t, vqsubd_u64, save, load, 1, 1)
LOOP2(VqsubhS16N, int16_t, int16_t, vqsubh_s16, save, load, 1, 1)
LOOP2(VqsubhU16N, uint16_t, uint16_t, vqsubh_u16, save, load, 1, 1)
LOOP2(VqsubqS8N, int8_t, int8_t, vqsubq_s8, vst1q_s8, vld1q_s8, 16, 16)
LOOP2(VqsubqS16N, int16_t, int16_t, vqsubq_s16, vst1q_s16, vld1q_s16, 8, 8)
LOOP2(VqsubqS32N, int32_t, int32_t, vqsubq_s32, vst1q_s32, vld1q_s32, 4, 4)
LOOP2(VqsubqS64N, int64_t, int64_t, vqsubq_s64, vst1q_s64, vld1q_s64, 2, 2)
LOOP2(VqsubqU8N, uint8_t, uint8_t, vqsubq_u8, vst1q_u8, vld1q_u8, 16, 16)
LOOP2(VqsubqU16N, uint16_t, uint16_t, vqsubq_u16, vst1q_u16, vld1q_u16, 8, 8)
LOOP2(VqsubqU32N, uint32_t, uint32_t, vqsubq_u32, vst1q_u32, vld1q_u32, 4, 4)
LOOP2(VqsubqU64N, uint64_t, uint64_t, vqsubq_u64, vst1q_u64, vld1q_u64, 2, 2)
LOOP2(VqsubsS32N, int32_t, int32_t, vqsubs_s32, save, load, 1, 1)
LOOP2(VqsubsU32N, uint32_t, uint32_t, vqsubs_u32, save, load, 1, 1)
LOOP2(Vqtbl1QU8N, uint8_t, uint8_t, vqtbl1q_u8, vst1q_u8, vld1q_u8, 16, 16)
LOOP2(Vrax1QU64N, uint64_t, uint64_t, vrax1q_u64, vst1q_u64, vld1q_u64, 2, 2)
LOOP2(VrecpsF32N, float32_t, float32_t, vrecps_f32, vst1_f32, vld1_f32, 2, 2)
LOOP2(VrecpsF64N, float64_t, float64_t, vrecps_f64, vst1_f64, vld1_f64, 1, 1)
LOOP2(VrecpsdF64N, float64_t, float64_t, vrecpsd_f64, save, load, 1, 1)
LOOP2(VrecpsqF32N, float32_t, float32_t, vrecpsq_f32, vst1q_f32, vld1q_f32, 4, 4)
LOOP2(VrecpsqF64N, float64_t, float64_t, vrecpsq_f64, vst1q_f64, vld1q_f64, 2, 2)
LOOP2(VrecpssF32N, float32_t, float32_t, vrecpss_f32, save, load, 1, 1)
LOOP2(VrhaddS8N, int8_t, int8_t, vrhadd_s8, vst1_s8, vld1_s8, 8, 8)
LOOP2(VrhaddS16N, int16_t, int16_t, vrhadd_s16, vst1_s16, vld1_s16, 4, 4)
LOOP2(VrhaddS32N, int32_t, int32_t, vrhadd_s32, vst1_s32, vld1_s32, 2, 2)
LOOP2(VrhaddU8N, uint8_t, uint8_t, vrhadd_u8, vst1_u8, vld1_u8, 8, 8)
LOOP2(VrhaddU16N, uint16_t, uint16_t, vrhadd_u16, vst1_u16, vld1_u16, 4, 4)
LOOP2(VrhaddU32N, uint32_t, uint32_t, vrhadd_u32, vst1_u32, vld1_u32, 2, 2)
LOOP2(VrhaddqS8N, int8_t, int8_t, vrhaddq_s8, vst1q_s8, vld1q_s8, 16, 16)
LOOP2(VrhaddqS16N, int16_t, int16_t, vrhaddq_s16, vst1q_s16, vld1q_s16, 8, 8)
LOOP2(VrhaddqS32N, int32_t, int32_t, vrhaddq_s32, vst1q_s32, vld1q_s32, 4, 4)
LOOP2(VrhaddqU8N, uint8_t, uint8_t, vrhaddq_u8, vst1q_u8, vld1q_u8, 16, 16)
LOOP2(VrhaddqU16N, uint16_t, uint16_t, vrhaddq_u16, vst1q_u16, vld1q_u16, 8, 8)
LOOP2(VrhaddqU32N, uint32_t, uint32_t, vrhaddq_u32, vst1q_u32, vld1q_u32, 4, 4)
LOOP2(VrshlS8N, int8_t, int8_t, vrshl_s8, vst1_s8, vld1_s8, 8, 8)
LOOP2(VrshlS16N, int16_t, int16_t, vrshl_s16, vst1_s16, vld1_s16, 4, 4)
LOOP2(VrshlS32N, int32_t, int32_t, vrshl_s32, vst1_s32, vld1_s32, 2, 2)
LOOP2(VrshlS64N, int64_t, int64_t, vrshl_s64, vst1_s64, vld1_s64, 1, 1)
LOOP2(VrshldS64N, int64_t, int64_t, vrshld_s64, save, load, 1, 1)
LOOP2(VrshlqS8N, int8_t, int8_t, vrshlq_s8, vst1q_s8, vld1q_s8, 16, 16)
LOOP2(VrshlqS16N, int16_t, int16_t, vrshlq_s16, vst1q_s16, vld1q_s16, 8, 8)
LOOP2(VrshlqS32N, int32_t, int32_t, vrshlq_s32, vst1q_s32, vld1q_s32, 4, 4)
LOOP2(VrshlqS64N, int64_t, int64_t, vrshlq_s64, vst1q_s64, vld1q_s64, 2, 2)
LOOP2(VrsqrtsF32N, float32_t, float32_t, vrsqrts_f32, vst1_f32, vld1_f32, 2, 2)
LOOP2(VrsqrtsF64N, float64_t, float64_t, vrsqrts_f64, vst1_f64, vld1_f64, 1, 1)
LOOP2(VrsqrtsdF64N, float64_t, float64_t, vrsqrtsd_f64, save, load, 1, 1)
LOOP2(VrsqrtsqF32N, float32_t, float32_t, vrsqrtsq_f32, vst1q_f32, vld1q_f32, 4, 4)
LOOP2(VrsqrtsqF64N, float64_t, float64_t, vrsqrtsq_f64, vst1q_f64, vld1q_f64, 2, 2)
LOOP2(VrsqrtssF32N, float32_t, float32_t, vrsqrtss_f32, save, load, 1, 1)
LOOP2(Vsha1Su1QU32N, uint32_t, uint32_t, vsha1su1q_u32, vst1q_u32, vld1q_u32, 4, 4)
LOOP2(Vsha256Su0QU32N, uint32_t, uint32_t, vsha256su0q_u32, vst1q_u32, vld1q_u32, 4, 4)
LOOP2(Vsha512Su0QU64N, uint64_t, uint64_t, vsha512su0q_u64, vst1q_u64, vld1q_u64, 2, 2)
LOOP2(VshlS8N, int8_t, int8_t, vshl_s8, vst1_s8, vld1_s8, 8, 8)
LOOP2(VshlS16N, int16_t, int16_t, vshl_s16, vst1_s16, vld1_s16, 4, 4)
LOOP2(VshlS32N, int32_t, int32_t, vshl_s32, vst1_s32, vld1_s32, 2, 2)
LOOP2(VshlS64N, int64_t, int64_t, vshl_s64, vst1_s64, vld1_s64, 1, 1)
LOOP2(VshldS64N, int64_t, int64_t, vshld_s64, save, load, 1, 1)
LOOP2(VshlqS8N, int8_t, int8_t, vshlq_s8, vst1q_s8, vld1q_s8, 16, 16)
LOOP2(VshlqS16N, int16_t, int16_t, vshlq_s16, vst1q_s16, vld1q_s16, 8, 8)
LOOP2(VshlqS32N, int32_t, int32_t, vshlq_s32, vst1q_s32, vld1q_s32, 4, 4)
LOOP2(VshlqS64N, int64_t, int64_t, vshlq_s64, vst1q_s64, vld1q_s64, 2, 2)
LOOP2(Vsm4EkeyqU32N, uint32_t, uint32_t, vsm4ekeyq_u32, vst1q_u32, vld1q_u32, 4, 4)
LOOP2(Vsm4EqU32N, uint32_t, uint32_t, vsm4eq_u32, vst1q_u32, vld1q_u32, 4, 4)
LOOP2(VsubS8N, int8_t, int8_t, vsub_s8, vst1_s8, vld1_s8, 8, 8)
LOOP2(VsubS16N, int16_t, int16_t, vsub_s16, vst1_s16, vld1_s16, 4, 4)
LOOP2(VsubS32N, int32_t, int32_t, vsub_s32, vst1_s32, vld1_s32, 2, 2)
LOOP2(VsubS64N, int64_t, int64_t, vsub_s64, vst1_s64, vld1_s64, 1, 1)
LOOP2(VsubU8N, uint8_t, uint8_t, vsub_u8, vst1_u8, vld1_u8, 8, 8)
LOOP2(VsubU16N, uint16_t, uint16_t, vsub_u16, vst1_u16, vld1_u16, 4, 4)
LOOP2(VsubU32N, uint32_t, uint32_t, vsub_u32, vst1_u32, vld1_u32, 2, 2)
LOOP2(VsubU64N, uint64_t, uint64_t, vsub_u64, vst1_u64, vld1_u64, 1, 1)
LOOP2(VsubF32N, float32_t, float32_t, vsub_f32, vst1_f32, vld1_f32, 2, 2)
LOOP2(VsubF64N, float64_t, float64_t, vsub_f64, vst1_f64, vld1_f64, 1, 1)
LOOP2(VsubdS64N, int64_t, int64_t, vsubd_s64, save, load, 1, 1)
LOOP2(VsubdU64N, uint64_t, uint64_t, vsubd_u64, save, load, 1, 1)
LOOP2(VsubqS8N, int8_t, int8_t, vsubq_s8, vst1q_s8, vld1q_s8, 16, 16)
LOOP2(VsubqS16N, int16_t, int16_t, vsubq_s16, vst1q_s16, vld1q_s16, 8, 8)
LOOP2(VsubqS32N, int32_t, int32_t, vsubq_s32, vst1q_s32, vld1q_s32, 4, 4)
LOOP2(VsubqS64N, int64_t, int64_t, vsubq_s64, vst1q_s64, vld1q_s64, 2, 2)
LOOP2(VsubqU8N, uint8_t, uint8_t, vsubq_u8, vst1q_u8, vld1q_u8, 16, 16)
LOOP2(VsubqU16N, uint16_t, uint16_t, vsubq_u16, vst1q_u16, vld1q_u16, 8, 8)
LOOP2(VsubqU32N, uint32_t, uint32_t, vsubq_u32, vst1q_u32, vld1q_u32, 4, 4)
LOOP2(VsubqU64N, uint64_t, uint64_t, vsubq_u64, vst1q_u64, vld1q_u64, 2, 2)
LOOP2(VsubqF32N, float32_t, float32_t, vsubq_f32, vst1q_f32, vld1q_f32, 4, 4)
LOOP2(VsubqF64N, float64_t, float64_t, vsubq_f64, vst1q_f64, vld1q_f64, 2, 2)
LOOP2(Vtbl1S8N, int8_t, int8_t, vtbl1_s8, vst1_s8, vld1_s8, 8, 8)
LOOP2(Vtbl1U8N, uint8_t, uint8_t, vtbl1_u8, vst1_u8, vld1_u8, 8, 8)
LOOP2(Vtrn1S8N, int8_t, int8_t, vtrn1_s8, vst1_s8, vld1_s8, 8, 8)
LOOP2(Vtrn1S16N, int16_t, int16_t, vtrn1_s16, vst1_s16, vld1_s16, 4, 4)
LOOP2(Vtrn1S32N, int32_t, int32_t, vtrn1_s32, vst1_s32, vld1_s32, 2, 2)
LOOP2(Vtrn1U8N, uint8_t, uint8_t, vtrn1_u8, vst1_u8, vld1_u8, 8, 8)
LOOP2(Vtrn1U16N, uint16_t, uint16_t, vtrn1_u16, vst1_u16, vld1_u16, 4, 4)
LOOP2(Vtrn1U32N, uint32_t, uint32_t, vtrn1_u32, vst1_u32, vld1_u32, 2, 2)
LOOP2(Vtrn1F32N, float32_t, float32_t, vtrn1_f32, vst1_f32, vld1_f32, 2, 2)
LOOP2(Vtrn1QS8N, int8_t, int8_t, vtrn1q_s8, vst1q_s8, vld1q_s8, 16, 16)
LOOP2(Vtrn1QS16N, int16_t, int16_t, vtrn1q_s16, vst1q_s16, vld1q_s16, 8, 8)
LOOP2(Vtrn1QS32N, int32_t, int32_t, vtrn1q_s32, vst1q_s32, vld1q_s32, 4, 4)
LOOP2(Vtrn1QS64N, int64_t, int64_t, vtrn1q_s64, vst1q_s64, vld1q_s64, 2, 2)
LOOP2(Vtrn1QU8N, uint8_t, uint8_t, vtrn1q_u8, vst1q_u8, vld1q_u8, 16, 16)
LOOP2(Vtrn1QU16N, uint16_t, uint16_t, vtrn1q_u16, vst1q_u16, vld1q_u16, 8, 8)
LOOP2(Vtrn1QU32N, uint32_t, uint32_t, vtrn1q_u32, vst1q_u32, vld1q_u32, 4, 4)
LOOP2(Vtrn1QU64N, uint64_t, uint64_t, vtrn1q_u64, vst1q_u64, vld1q_u64, 2, 2)
LOOP2(Vtrn1QF32N, float32_t, float32_t, vtrn1q_f32, vst1q_f32, vld1q_f32, 4, 4)
LOOP2(Vtrn1QF64N, float64_t, float64_t, vtrn1q_f64, vst1q_f64, vld1q_f64, 2, 2)
LOOP2(Vtrn2S8N, int8_t, int8_t, vtrn2_s8, vst1_s8, vld1_s8, 8, 8)
LOOP2(Vtrn2S16N, int16_t, int16_t, vtrn2_s16, vst1_s16, vld1_s16, 4, 4)
LOOP2(Vtrn2S32N, int32_t, int32_t, vtrn2_s32, vst1_s32, vld1_s32, 2, 2)
LOOP2(Vtrn2U8N, uint8_t, uint8_t, vtrn2_u8, vst1_u8, vld1_u8, 8, 8)
LOOP2(Vtrn2U16N, uint16_t, uint16_t, vtrn2_u16, vst1_u16, vld1_u16, 4, 4)
LOOP2(Vtrn2U32N, uint32_t, uint32_t, vtrn2_u32, vst1_u32, vld1_u32, 2, 2)
LOOP2(Vtrn2F32N, float32_t, float32_t, vtrn2_f32, vst1_f32, vld1_f32, 2, 2)
LOOP2(Vtrn2QS8N, int8_t, int8_t, vtrn2q_s8, vst1q_s8, vld1q_s8, 16, 16)
LOOP2(Vtrn2QS16N, int16_t, int16_t, vtrn2q_s16, vst1q_s16, vld1q_s16, 8, 8)
LOOP2(Vtrn2QS32N, int32_t, int32_t, vtrn2q_s32, vst1q_s32, vld1q_s32, 4, 4)
LOOP2(Vtrn2QS64N, int64_t, int64_t, vtrn2q_s64, vst1q_s64, vld1q_s64, 2, 2)
LOOP2(Vtrn2QU8N, uint8_t, uint8_t, vtrn2q_u8, vst1q_u8, vld1q_u8, 16, 16)
LOOP2(Vtrn2QU16N, uint16_t, uint16_t, vtrn2q_u16, vst1q_u16, vld1q_u16, 8, 8)
LOOP2(Vtrn2QU32N, uint32_t, uint32_t, vtrn2q_u32, vst1q_u32, vld1q_u32, 4, 4)
LOOP2(Vtrn2QU64N, uint64_t, uint64_t, vtrn2q_u64, vst1q_u64, vld1q_u64, 2, 2)
LOOP2(Vtrn2QF32N, float32_t, float32_t, vtrn2q_f32, vst1q_f32, vld1q_f32, 4, 4)
LOOP2(Vtrn2QF64N, float64_t, float64_t, vtrn2q_f64, vst1q_f64, vld1q_f64, 2, 2)
LOOP2(VtstS8N, uint8_t, int8_t, vtst_s8, vst1_u8, vld1_s8, 8, 8)
LOOP2(VtstS16N, uint16_t, int16_t, vtst_s16, vst1_u16, vld1_s16, 4, 4)
LOOP2(VtstS32N, uint32_t, int32_t, vtst_s32, vst1_u32, vld1_s32, 2, 2)
LOOP2(VtstS64N, uint64_t, int64_t, vtst_s64, vst1_u64, vld1_s64, 1, 1)
LOOP2(VtstU8N, uint8_t, uint8_t, vtst_u8, vst1_u8, vld1_u8, 8, 8)
LOOP2(VtstU16N, uint16_t, uint16_t, vtst_u16, vst1_u16, vld1_u16, 4, 4)
LOOP2(VtstU32N, uint32_t, uint32_t, vtst_u32, vst1_u32, vld1_u32, 2, 2)
LOOP2(VtstU64N, uint64_t, uint64_t, vtst_u64, vst1_u64, vld1_u64, 1, 1)
LOOP2(VtstdS64N, uint64_t, int64_t, vtstd_s64, save, load, 1, 1)
LOOP2(VtstdU64N, uint64_t, uint64_t, vtstd_u64, save, load, 1, 1)
LOOP2(VtstqS8N, uint8_t, int8_t, vtstq_s8, vst1q_u8, vld1q_s8, 16, 16)
LOOP2(VtstqS16N, uint16_t, int16_t, vtstq_s16, vst1q_u16, vld1q_s16, 8, 8)
LOOP2(VtstqS32N, uint32_t, int32_t, vtstq_s32, vst1q_u32, vld1q_s32, 4, 4)
LOOP2(VtstqS64N, uint64_t, int64_t, vtstq_s64, vst1q_u64, vld1q_s64, 2, 2)
LOOP2(VtstqU8N, uint8_t, uint8_t, vtstq_u8, vst1q_u8, vld1q_u8, 16, 16)
LOOP2(VtstqU16N, uint16_t, uint16_t, vtstq_u16, vst1q_u16, vld1q_u16, 8, 8)
LOOP2(VtstqU32N, uint32_t, uint32_t, vtstq_u32, vst1q_u32, vld1q_u32, 4, 4)
LOOP2(VtstqU64N, uint64_t, uint64_t, vtstq_u64, vst1q_u64, vld1q_u64, 2, 2)
LOOP2(Vuzp1S8N, int8_t, int8_t, vuzp1_s8, vst1_s8, vld1_s8, 8, 8)
LOOP2(Vuzp1S16N, int16_t, int16_t, vuzp1_s16, vst1_s16, vld1_s16, 4, 4)
LOOP2(Vuzp1S32N, int32_t, int32_t, vuzp1_s32, vst1_s32, vld1_s32, 2, 2)
LOOP2(Vuzp1U8N, uint8_t, uint8_t, vuzp1_u8, vst1_u8, vld1_u8, 8, 8)
LOOP2(Vuzp1U16N, uint16_t, uint16_t, vuzp1_u16, vst1_u16, vld1_u16, 4, 4)
LOOP2(Vuzp1U32N, uint32_t, uint32_t, vuzp1_u32, vst1_u32, vld1_u32, 2, 2)
LOOP2(Vuzp1F32N, float32_t, float32_t, vuzp1_f32, vst1_f32, vld1_f32, 2, 2)
LOOP2(Vuzp1QS8N, int8_t, int8_t, vuzp1q_s8, vst1q_s8, vld1q_s8, 16, 16)
LOOP2(Vuzp1QS16N, int16_t, int16_t, vuzp1q_s16, vst1q_s16, vld1q_s16, 8, 8)
LOOP2(Vuzp1QS32N, int32_t, int32_t, vuzp1q_s32, vst1q_s32, vld1q_s32, 4, 4)
LOOP2(Vuzp1QS64N, int64_t, int64_t, vuzp1q_s64, vst1q_s64, vld1q_s64, 2, 2)
LOOP2(Vuzp1QU8N, uint8_t, uint8_t, vuzp1q_u8, vst1q_u8, vld1q_u8, 16, 16)
LOOP2(Vuzp1QU16N, uint16_t, uint16_t, vuzp1q_u16, vst1q_u16, vld1q_u16, 8, 8)
LOOP2(Vuzp1QU32N, uint32_t, uint32_t, vuzp1q_u32, vst1q_u32, vld1q_u32, 4, 4)
LOOP2(Vuzp1QU64N, uint64_t, uint64_t, vuzp1q_u64, vst1q_u64, vld1q_u64, 2, 2)
LOOP2(Vuzp1QF32N, float32_t, float32_t, vuzp1q_f32, vst1q_f32, vld1q_f32, 4, 4)
LOOP2(Vuzp1QF64N, float64_t, float64_t, vuzp1q_f64, vst1q_f64, vld1q_f64, 2, 2)
LOOP2(Vuzp2S8N, int8_t, int8_t, vuzp2_s8, vst1_s8, vld1_s8, 8, 8)
LOOP2(Vuzp2S16N, int16_t, int16_t, vuzp2_s16, vst1_s16, vld1_s16, 4, 4)
LOOP2(Vuzp2S32N, int32_t, int32_t, vuzp2_s32, vst1_s32, vld1_s32, 2, 2)
LOOP2(Vuzp2U8N, uint8_t, uint8_t, vuzp2_u8, vst1_u8, vld1_u8, 8, 8)
LOOP2(Vuzp2U16N, uint16_t, uint16_t, vuzp2_u16, vst1_u16, vld1_u16, 4, 4)
LOOP2(Vuzp2U32N, uint32_t, uint32_t, vuzp2_u32, vst1_u32, vld1_u32, 2, 2)
LOOP2(Vuzp2F32N, float32_t, float32_t, vuzp2_f32, vst1_f32, vld1_f32, 2, 2)
LOOP2(Vuzp2QS8N, int8_t, int8_t, vuzp2q_s8, vst1q_s8, vld1q_s8, 16, 16)
LOOP2(Vuzp2QS16N, int16_t, int16_t, vuzp2q_s16, vst1q_s16, vld1q_s16, 8, 8)
LOOP2(Vuzp2QS32N, int32_t, int32_t, vuzp2q_s32, vst1q_s32, vld1q_s32, 4, 4)
LOOP2(Vuzp2QS64N, int64_t, int64_t, vuzp2q_s64, vst1q_s64, vld1q_s64, 2, 2)
LOOP2(Vuzp2QU8N, uint8_t, uint8_t, vuzp2q_u8, vst1q_u8, vld1q_u8, 16, 16)
LOOP2(Vuzp2QU16N, uint16_t, uint16_t, vuzp2q_u16, vst1q_u16, vld1q_u16, 8, 8)
LOOP2(Vuzp2QU32N, uint32_t, uint32_t, vuzp2q_u32, vst1q_u32, vld1q_u32, 4, 4)
LOOP2(Vuzp2QU64N, uint64_t, uint64_t, vuzp2q_u64, vst1q_u64, vld1q_u64, 2, 2)
LOOP2(Vuzp2QF32N, float32_t, float32_t, vuzp2q_f32, vst1q_f32, vld1q_f32, 4, 4)
LOOP2(Vuzp2QF64N, float64_t, float64_t, vuzp2q_f64, vst1q_f64, vld1q_f64, 2, 2)
LOOP2(Vzip1S8N, int8_t, int8_t, vzip1_s8, vst1_s8, vld1_s8, 8, 8)
LOOP2(Vzip1S16N, int16_t, int16_t, vzip1_s16, vst1_s16, vld1_s16, 4, 4)
LOOP2(Vzip1S32N, int32_t, int32_t, vzip1_s32, vst1_s32, vld1_s32, 2, 2)
LOOP2(Vzip1U8N, uint8_t, uint8_t, vzip1_u8, vst1_u8, vld1_u8, 8, 8)
LOOP2(Vzip1U16N, uint16_t, uint16_t, vzip1_u16, vst1_u16, vld1_u16, 4, 4)
LOOP2(Vzip1U32N, uint32_t, uint32_t, vzip1_u32, vst1_u32, vld1_u32, 2, 2)
LOOP2(Vzip1F32N, float32_t, float32_t, vzip1_f32, vst1_f32, vld1_f32, 2, 2)
LOOP2(Vzip1QS8N, int8_t, int8_t, vzip1q_s8, vst1q_s8, vld1q_s8, 16, 16)
LOOP2(Vzip1QS16N, int16_t, int16_t, vzip1q_s16, vst1q_s16, vld1q_s16, 8, 8)
LOOP2(Vzip1QS32N, int32_t, int32_t, vzip1q_s32, vst1q_s32, vld1q_s32, 4, 4)
LOOP2(Vzip1QS64N, int64_t, int64_t, vzip1q_s64, vst1q_s64, vld1q_s64, 2, 2)
LOOP2(Vzip1QU8N, uint8_t, uint8_t, vzip1q_u8, vst1q_u8, vld1q_u8, 16, 16)
LOOP2(Vzip1QU16N, uint16_t, uint16_t, vzip1q_u16, vst1q_u16, vld1q_u16, 8, 8)
LOOP2(Vzip1QU32N, uint32_t, uint32_t, vzip1q_u32, vst1q_u32, vld1q_u32, 4, 4)
LOOP2(Vzip1QU64N, uint64_t, uint64_t, vzip1q_u64, vst1q_u64, vld1q_u64, 2, 2)
LOOP2(Vzip1QF32N, float32_t, float32_t, vzip1q_f32, vst1q_f32, vld1q_f32, 4, 4)
LOOP2(Vzip1QF64N, float64_t, float64_t, vzip1q_f64, vst1q_f64, vld1q_f64, 2, 2)
LOOP2(Vzip2S8N, int8_t, int8_t, vzip2_s8, vst1_s8, vld1_s8, 8, 8)
LOOP2(Vzip2S16N, int16_t, int16_t, vzip2_s16, vst1_s16, vld1_s16, 4, 4)
LOOP2(Vzip2S32N, int32_t, int32_t, vzip2_s32, vst1_s32, vld1_s32, 2, 2)
LOOP2(Vzip2U8N, uint8_t, uint8_t, vzip2_u8, vst1_u8, vld1_u8, 8, 8)
LOOP2(Vzip2U16N, uint16_t, uint16_t, vzip2_u16, vst1_u16, vld1_u16, 4, 4)
LOOP2(Vzip2U32N, uint32_t, uint32_t, vzip2_u32, vst1_u32, vld1_u32, 2, 2)
LOOP2(Vzip2F32N, float32_t, float32_t, vzip2_f32, vst1_f32, vld1_f32, 2, 2)
LOOP2(Vzip2QS8N, int8_t, int8_t, vzip2q_s8, vst1q_s8, vld1q_s8, 16, 16)
LOOP2(Vzip2QS16N, int16_t, int16_t, vzip2q_s16, vst1q_s16, vld1q_s16, 8, 8)
LOOP2(Vzip2QS32N, int32_t, int32_t, vzip2q_s32, vst1q_s32, vld1q_s32, 4, 4)
LOOP2(Vzip2QS64N, int64_t, int64_t, vzip2q_s64, vst1q_s64, vld1q_s64, 2, 2)
LOOP2(Vzip2QU8N, uint8_t, uint8_t, vzip2q_u8, vst1q_u8, vld1q_u8, 16, 16)
LOOP2(Vzip2QU16N, uint16_t, uint16_t, vzip2q_u16, vst1q_u16, vld1q_u16, 8, 8)
LOOP2(Vzip2QU32N, uint32_t, uint32_t, vzip2q_u32, vst1q_u32, vld1q_u32, 4, 4)
LOOP2(Vzip2QU64N, uint64_t, uint64_t, vzip2q_u64, vst1q_u64, vld1q_u64, 2, 2)
LOOP2(Vzip2QF32N, float32_t, float32_t, vzip2q_f32, vst1q_f32, vld1q_f32, 4, 4)
LOOP2(Vzip2QF64N, float64_t, float64_t, vzip2q_f64, vst1q_f64, vld1q_f64, 2, 2)


================================================
FILE: arm/neon/loops.go
================================================
package neon

import (
	"github.com/alivanz/go-simd/arm"
)

/*
#include <arm_neon.h>
*/
import "C"

// Signed Absolute Difference. This instruction subtracts the elements of the vector of the second source SIMD&FP register from the corresponding elements of the first source SIMD&FP register, places the absolute values of the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VabdS8N VabdS8N
//go:noescape
func VabdS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)

// Signed Absolute Difference. This instruction subtracts the elements of the vector of the second source SIMD&FP register from the corresponding elements of the first source SIMD&FP register, places the absolute values of the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VabdS16N VabdS16N
//go:noescape
func VabdS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)

// Signed Absolute Difference. This instruction subtracts the elements of the vector of the second source SIMD&FP register from the corresponding elements of the first source SIMD&FP register, places the absolute values of the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VabdS32N VabdS32N
//go:noescape
func VabdS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)

// Unsigned Absolute Difference (vector). This instruction subtracts the elements of the vector of the second source SIMD&FP register from the corresponding elements of the first source SIMD&FP register, places the absolute values of the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VabdU8N VabdU8N
//go:noescape
func VabdU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)

// Unsigned Absolute Difference (vector). This instruction subtracts the elements of the vector of the second source SIMD&FP register from the corresponding elements of the first source SIMD&FP register, places the absolute values of the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VabdU16N VabdU16N
//go:noescape
func VabdU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)

// Unsigned Absolute Difference (vector). This instruction subtracts the elements of the vector of the second source SIMD&FP register from the corresponding elements of the first source SIMD&FP register, places the absolute values of the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VabdU32N VabdU32N
//go:noescape
func VabdU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)

// Floating-point Absolute Difference (vector). This instruction subtracts the floating-point values in the elements of the second source SIMD&FP register, from the corresponding floating-point values in the elements of the first source SIMD&FP register, places the absolute value of each result in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VabdF32N VabdF32N
//go:noescape
func VabdF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)

// Floating-point Absolute Difference (vector). This instruction subtracts the floating-point values in the elements of the second source SIMD&FP register, from the corresponding floating-point values in the elements of the first source SIMD&FP register, places the absolute value of each result in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VabdF64N VabdF64N
//go:noescape
func VabdF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32)

// Floating-point Absolute Difference (vector). This instruction subtracts the floating-point values in the elements of the second source SIMD&FP register, from the corresponding floating-point values in the elements of the first source SIMD&FP register, places the absolute value of each result in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VabddF64N VabddF64N
//go:noescape
func VabddF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32)

// Signed Absolute Difference. This instruction subtracts the elements of the vector of the second source SIMD&FP register from the corresponding elements of the first source SIMD&FP register, places the absolute values of the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VabdqS8N VabdqS8N
//go:noescape
func VabdqS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)

// Signed Absolute Difference. This instruction subtracts the elements of the vector of the second source SIMD&FP register from the corresponding elements of the first source SIMD&FP register, places the absolute values of the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VabdqS16N VabdqS16N
//go:noescape
func VabdqS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)

// Signed Absolute Difference. This instruction subtracts the elements of the vector of the second source SIMD&FP register from the corresponding elements of the first source SIMD&FP register, places the absolute values of the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VabdqS32N VabdqS32N
//go:noescape
func VabdqS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)

// Unsigned Absolute Difference (vector). This instruction subtracts the elements of the vector of the second source SIMD&FP register from the corresponding elements of the first source SIMD&FP register, places the absolute values of the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VabdqU8N VabdqU8N
//go:noescape
func VabdqU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)

// Unsigned Absolute Difference (vector). This instruction subtracts the elements of the vector of the second source SIMD&FP register from the corresponding elements of the first source SIMD&FP register, places the absolute values of the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VabdqU16N VabdqU16N
//go:noescape
func VabdqU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)

// Unsigned Absolute Difference (vector). This instruction subtracts the elements of the vector of the second source SIMD&FP register from the corresponding elements of the first source SIMD&FP register, places the absolute values of the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VabdqU32N VabdqU32N
//go:noescape
func VabdqU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)

// Floating-point Absolute Difference (vector). This instruction subtracts the floating-point values in the elements of the second source SIMD&FP register, from the corresponding floating-point values in the elements of the first source SIMD&FP register, places the absolute value of each result in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VabdqF32N VabdqF32N
//go:noescape
func VabdqF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)

// Floating-point Absolute Difference (vector). This instruction subtracts the floating-point values in the elements of the second source SIMD&FP register, from the corresponding floating-point values in the elements of the first source SIMD&FP register, places the absolute value of each result in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VabdqF64N VabdqF64N
//go:noescape
func VabdqF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32)

// Floating-point Absolute Difference (vector). This instruction subtracts the floating-point values in the elements of the second source SIMD&FP register, from the corresponding floating-point values in the elements of the first source SIMD&FP register, places the absolute value of each result in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VabdsF32N VabdsF32N
//go:noescape
func VabdsF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)

// Absolute value (vector). This instruction calculates the absolute value of each vector element in the source SIMD&FP register, puts the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VabsS8N VabsS8N
//go:noescape
func VabsS8N(r *arm.Int8, v0 *arm.Int8, n int32)

// Absolute value (vector). This instruction calculates the absolute value of each vector element in the source SIMD&FP register, puts the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VabsS16N VabsS16N
//go:noescape
func VabsS16N(r *arm.Int16, v0 *arm.Int16, n int32)

// Absolute value (vector). This instruction calculates the absolute value of each vector element in the source SIMD&FP register, puts the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VabsS32N VabsS32N
//go:noescape
func VabsS32N(r *arm.Int32, v0 *arm.Int32, n int32)

// Absolute value (vector). This instruction calculates the absolute value of each vector element in the source SIMD&FP register, puts the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VabsS64N VabsS64N
//go:noescape
func VabsS64N(r *arm.Int64, v0 *arm.Int64, n int32)

// Floating-point Absolute value (vector). This instruction calculates the absolute value of each vector element in the source SIMD&FP register, writes the result to a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VabsF32N VabsF32N
//go:noescape
func VabsF32N(r *arm.Float32, v0 *arm.Float32, n int32)

// Floating-point Absolute value (vector). This instruction calculates the absolute value of each vector element in the source SIMD&FP register, writes the result to a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VabsF64N VabsF64N
//go:noescape
func VabsF64N(r *arm.Float64, v0 *arm.Float64, n int32)

// Absolute value (vector). This instruction calculates the absolute value of each vector element in the source SIMD&FP register, puts the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VabsdS64N VabsdS64N
//go:noescape
func VabsdS64N(r *arm.Int64, v0 *arm.Int64, n int32)

// Absolute value (vector). This instruction calculates the absolute value of each vector element in the source SIMD&FP register, puts the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VabsqS8N VabsqS8N
//go:noescape
func VabsqS8N(r *arm.Int8, v0 *arm.Int8, n int32)

// Absolute value (vector). This instruction calculates the absolute value of each vector element in the source SIMD&FP register, puts the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VabsqS16N VabsqS16N
//go:noescape
func VabsqS16N(r *arm.Int16, v0 *arm.Int16, n int32)

// Absolute value (vector). This instruction calculates the absolute value of each vector element in the source SIMD&FP register, puts the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VabsqS32N VabsqS32N
//go:noescape
func VabsqS32N(r *arm.Int32, v0 *arm.Int32, n int32)

// Absolute value (vector). This instruction calculates the absolute value of each vector element in the source SIMD&FP register, puts the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VabsqS64N VabsqS64N
//go:noescape
func VabsqS64N(r *arm.Int64, v0 *arm.Int64, n int32)

// Floating-point Absolute value (vector). This instruction calculates the absolute value of each vector element in the source SIMD&FP register, writes the result to a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VabsqF32N VabsqF32N
//go:noescape
func VabsqF32N(r *arm.Float32, v0 *arm.Float32, n int32)

// Floating-point Absolute value (vector). This instruction calculates the absolute value of each vector element in the source SIMD&FP register, writes the result to a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VabsqF64N VabsqF64N
//go:noescape
func VabsqF64N(r *arm.Float64, v0 *arm.Float64, n int32)

// Add (vector). This instruction adds corresponding elements in the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VaddS8N VaddS8N
//go:noescape
func VaddS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)

// Add (vector). This instruction adds corresponding elements in the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VaddS16N VaddS16N
//go:noescape
func VaddS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)

// Add (vector). This instruction adds corresponding elements in the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VaddS32N VaddS32N
//go:noescape
func VaddS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)

// Add (vector). This instruction adds corresponding elements in the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VaddS64N VaddS64N
//go:noescape
func VaddS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32)

// Add (vector). This instruction adds corresponding elements in the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VaddU8N VaddU8N
//go:noescape
func VaddU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)

// Add (vector). This instruction adds corresponding elements in the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VaddU16N VaddU16N
//go:noescape
func VaddU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)

// Add (vector). This instruction adds corresponding elements in the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VaddU32N VaddU32N
//go:noescape
func VaddU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)

// Add (vector). This instruction adds corresponding elements in the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VaddU64N VaddU64N
//go:noescape
func VaddU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32)

// Floating-point Add (vector). This instruction adds corresponding vector elements in the two source SIMD&FP registers, writes the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.
//
//go:linkname VaddF32N VaddF32N
//go:noescape
func VaddF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)

// Floating-point Add (vector). This instruction adds corresponding vector elements in the two source SIMD&FP registers, writes the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.
//
//go:linkname VaddF64N VaddF64N
//go:noescape
func VaddF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32)

// Add (vector). This instruction adds corresponding elements in the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VadddS64N VadddS64N
//go:noescape
func VadddS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32)

// Add (vector). This instruction adds corresponding elements in the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VadddU64N VadddU64N
//go:noescape
func VadddU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32)

// Add (vector). This instruction adds corresponding elements in the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VaddqS8N VaddqS8N
//go:noescape
func VaddqS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)

// Add (vector). This instruction adds corresponding elements in the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VaddqS16N VaddqS16N
//go:noescape
func VaddqS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)

// Add (vector). This instruction adds corresponding elements in the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VaddqS32N VaddqS32N
//go:noescape
func VaddqS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)

// Add (vector). This instruction adds corresponding elements in the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VaddqS64N VaddqS64N
//go:noescape
func VaddqS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32)

// Add (vector). This instruction adds corresponding elements in the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VaddqU8N VaddqU8N
//go:noescape
func VaddqU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)

// Add (vector). This instruction adds corresponding elements in the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VaddqU16N VaddqU16N
//go:noescape
func VaddqU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)

// Add (vector). This instruction adds corresponding elements in the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VaddqU32N VaddqU32N
//go:noescape
func VaddqU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)

// Add (vector). This instruction adds corresponding elements in the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VaddqU64N VaddqU64N
//go:noescape
func VaddqU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32)

// Floating-point Add (vector). This instruction adds corresponding vector elements in the two source SIMD&FP registers, writes the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.
//
//go:linkname VaddqF32N VaddqF32N
//go:noescape
func VaddqF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)

// Floating-point Add (vector). This instruction adds corresponding vector elements in the two source SIMD&FP registers, writes the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.
//
//go:linkname VaddqF64N VaddqF64N
//go:noescape
func VaddqF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32)

// Add across Vector. This instruction adds every vector element in the source SIMD&FP register together, and writes the scalar result to the destination SIMD&FP register.
//
//go:linkname VaddvS8N VaddvS8N
//go:noescape
func VaddvS8N(r *arm.Int8, v0 *arm.Int8, n int32)

// Add across Vector. This instruction adds every vector element in the source SIMD&FP register together, and writes the scalar result to the destination SIMD&FP register.
//
//go:linkname VaddvS16N VaddvS16N
//go:noescape
func VaddvS16N(r *arm.Int16, v0 *arm.Int16, n int32)

// Add across vector
//
//go:linkname VaddvS32N VaddvS32N
//go:noescape
func VaddvS32N(r *arm.Int32, v0 *arm.Int32, n int32)

// Add across Vector. This instruction adds every vector element in the source SIMD&FP register together, and writes the scalar result to the destination SIMD&FP register.
//
//go:linkname VaddvU8N VaddvU8N
//go:noescape
func VaddvU8N(r *arm.Uint8, v0 *arm.Uint8, n int32)

// Add across Vector. This instruction adds every vector element in the source SIMD&FP register together, and writes the scalar result to the destination SIMD&FP register.
//
//go:linkname VaddvU16N VaddvU16N
//go:noescape
func VaddvU16N(r *arm.Uint16, v0 *arm.Uint16, n int32)

// Add across vector
//
//go:linkname VaddvU32N VaddvU32N
//go:noescape
func VaddvU32N(r *arm.Uint32, v0 *arm.Uint32, n int32)

// Floating-point add across vector
//
//go:linkname VaddvF32N VaddvF32N
//go:noescape
func VaddvF32N(r *arm.Float32, v0 *arm.Float32, n int32)

// Add across Vector. This instruction adds every vector element in the source SIMD&FP register together, and writes the scalar result to the destination SIMD&FP register.
//
//go:linkname VaddvqS8N VaddvqS8N
//go:noescape
func VaddvqS8N(r *arm.Int8, v0 *arm.Int8, n int32)

// Add across Vector. This instruction adds every vector element in the source SIMD&FP register together, and writes the scalar result to the destination SIMD&FP register.
//
//go:linkname VaddvqS16N VaddvqS16N
//go:noescape
func VaddvqS16N(r *arm.Int16, v0 *arm.Int16, n int32)

// Add across Vector. This instruction adds every vector element in the source SIMD&FP register together, and writes the scalar result to the destination SIMD&FP register.
//
//go:linkname VaddvqS32N VaddvqS32N
//go:noescape
func VaddvqS32N(r *arm.Int32, v0 *arm.Int32, n int32)

// Add across vector
//
//go:linkname VaddvqS64N VaddvqS64N
//go:noescape
func VaddvqS64N(r *arm.Int64, v0 *arm.Int64, n int32)

// Add across Vector. This instruction adds every vector element in the source SIMD&FP register together, and writes the scalar result to the destination SIMD&FP register.
//
//go:linkname VaddvqU8N VaddvqU8N
//go:noescape
func VaddvqU8N(r *arm.Uint8, v0 *arm.Uint8, n int32)

// Add across Vector. This instruction adds every vector element in the source SIMD&FP register together, and writes the scalar result to the destination SIMD&FP register.
//
//go:linkname VaddvqU16N VaddvqU16N
//go:noescape
func VaddvqU16N(r *arm.Uint16, v0 *arm.Uint16, n int32)

// Add across Vector. This instruction adds every vector element in the source SIMD&FP register together, and writes the scalar result to the destination SIMD&FP register.
//
//go:linkname VaddvqU32N VaddvqU32N
//go:noescape
func VaddvqU32N(r *arm.Uint32, v0 *arm.Uint32, n int32)

// Add across vector
//
//go:linkname VaddvqU64N VaddvqU64N
//go:noescape
func VaddvqU64N(r *arm.Uint64, v0 *arm.Uint64, n int32)

// Floating-point add across vector
//
//go:linkname VaddvqF32N VaddvqF32N
//go:noescape
func VaddvqF32N(r *arm.Float32, v0 *arm.Float32, n int32)

// Floating-point add across vector
//
//go:linkname VaddvqF64N VaddvqF64N
//go:noescape
func VaddvqF64N(r *arm.Float64, v0 *arm.Float64, n int32)

// AES single round decryption.
//
//go:linkname VaesdqU8N VaesdqU8N
//go:noescape
func VaesdqU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)

// AES single round encryption.
//
//go:linkname VaeseqU8N VaeseqU8N
//go:noescape
func VaeseqU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)

// AES inverse mix columns.
//
//go:linkname VaesimcqU8N VaesimcqU8N
//go:noescape
func VaesimcqU8N(r *arm.Uint8, v0 *arm.Uint8, n int32)

// AES mix columns.
//
//go:linkname VaesmcqU8N VaesmcqU8N
//go:noescape
func VaesmcqU8N(r *arm.Uint8, v0 *arm.Uint8, n int32)

// Bitwise AND (vector). This instruction performs a bitwise AND between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.
//
//go:linkname VandS8N VandS8N
//go:noescape
func VandS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)

// Bitwise AND (vector). This instruction performs a bitwise AND between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.
//
//go:linkname VandS16N VandS16N
//go:noescape
func VandS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)

// Bitwise AND (vector). This instruction performs a bitwise AND between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.
//
//go:linkname VandS32N VandS32N
//go:noescape
func VandS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)

// Bitwise AND (vector). This instruction performs a bitwise AND between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.
//
//go:linkname VandS64N VandS64N
//go:noescape
func VandS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32)

// Bitwise AND (vector). This instruction performs a bitwise AND between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.
//
//go:linkname VandU8N VandU8N
//go:noescape
func VandU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)

// Bitwise AND (vector). This instruction performs a bitwise AND between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.
//
//go:linkname VandU16N VandU16N
//go:noescape
func VandU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)

// Bitwise AND (vector). This instruction performs a bitwise AND between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.
//
//go:linkname VandU32N VandU32N
//go:noescape
func VandU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)

// Bitwise AND (vector). This instruction performs a bitwise AND between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.
//
//go:linkname VandU64N VandU64N
//go:noescape
func VandU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32)

// Bitwise AND (vector). This instruction performs a bitwise AND between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.
//
//go:linkname VandqS8N VandqS8N
//go:noescape
func VandqS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)

// Bitwise AND (vector). This instruction performs a bitwise AND between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.
//
//go:linkname VandqS16N VandqS16N
//go:noescape
func VandqS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)

// Bitwise AND (vector). This instruction performs a bitwise AND between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.
//
//go:linkname VandqS32N VandqS32N
//go:noescape
func VandqS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)

// Bitwise AND (vector). This instruction performs a bitwise AND between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.
//
//go:linkname VandqS64N VandqS64N
//go:noescape
func VandqS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32)

// Bitwise AND (vector). This instruction performs a bitwise AND between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.
//
//go:linkname VandqU8N VandqU8N
//go:noescape
func VandqU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)

// Bitwise AND (vector). This instruction performs a bitwise AND between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.
//
//go:linkname VandqU16N VandqU16N
//go:noescape
func VandqU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)

// Bitwise AND (vector). This instruction performs a bitwise AND between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.
//
//go:linkname VandqU32N VandqU32N
//go:noescape
func VandqU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)

// Bitwise AND (vector). This instruction performs a bitwise AND between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.
//
//go:linkname VandqU64N VandqU64N
//go:noescape
func VandqU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32)

// Bitwise bit Clear (vector, register). This instruction performs a bitwise AND between the first source SIMD&FP register and the complement of the second source SIMD&FP register, and writes the result to the destination SIMD&FP register.
//
//go:linkname VbicS8N VbicS8N
//go:noescape
func VbicS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)

// Bitwise bit Clear (vector, register). This instruction performs a bitwise AND between the first source SIMD&FP register and the complement of the second source SIMD&FP register, and writes the result to the destination SIMD&FP register.
//
//go:linkname VbicS16N VbicS16N
//go:noescape
func VbicS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)

// Bitwise bit Clear (vector, register). This instruction performs a bitwise AND between the first source SIMD&FP register and the complement of the second source SIMD&FP register, and writes the result to the destination SIMD&FP register.
//
//go:linkname VbicS32N VbicS32N
//go:noescape
func VbicS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)

// Bitwise bit Clear (vector, register). This instruction performs a bitwise AND between the first source SIMD&FP register and the complement of the second source SIMD&FP register, and writes the result to the destination SIMD&FP register.
//
//go:linkname VbicS64N VbicS64N
//go:noescape
func VbicS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32)

// Bitwise bit Clear (vector, register). This instruction performs a bitwise AND between the first source SIMD&FP register and the complement of the second source SIMD&FP register, and writes the result to the destination SIMD&FP register.
//
//go:linkname VbicU8N VbicU8N
//go:noescape
func VbicU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)

// Bitwise bit Clear (vector, register). This instruction performs a bitwise AND between the first source SIMD&FP register and the complement of the second source SIMD&FP register, and writes the result to the destination SIMD&FP register.
//
//go:linkname VbicU16N VbicU16N
//go:noescape
func VbicU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)

// Bitwise bit Clear (vector, register). This instruction performs a bitwise AND between the first source SIMD&FP register and the complement of the second source SIMD&FP register, and writes the result to the destination SIMD&FP register.
//
//go:linkname VbicU32N VbicU32N
//go:noescape
func VbicU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)

// Bitwise bit Clear (vector, register). This instruction performs a bitwise AND between the first source SIMD&FP register and the complement of the second source SIMD&FP register, and writes the result to the destination SIMD&FP register.
//
//go:linkname VbicU64N VbicU64N
//go:noescape
func VbicU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32)

// Bitwise bit Clear (vector, register). This instruction performs a bitwise AND between the first source SIMD&FP register and the complement of the second source SIMD&FP register, and writes the result to the destination SIMD&FP register.
//
//go:linkname VbicqS8N VbicqS8N
//go:noescape
func VbicqS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)

// Bitwise bit Clear (vector, register). This instruction performs a bitwise AND between the first source SIMD&FP register and the complement of the second source SIMD&FP register, and writes the result to the destination SIMD&FP register.
//
//go:linkname VbicqS16N VbicqS16N
//go:noescape
func VbicqS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)

// Bitwise bit Clear (vector, register). This instruction performs a bitwise AND between the first source SIMD&FP register and the complement of the second source SIMD&FP register, and writes the result to the destination SIMD&FP register.
//
//go:linkname VbicqS32N VbicqS32N
//go:noescape
func VbicqS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)

// Bitwise bit Clear (vector, register). This instruction performs a bitwise AND between the first source SIMD&FP register and the complement of the second source SIMD&FP register, and writes the result to the destination SIMD&FP register.
//
//go:linkname VbicqS64N VbicqS64N
//go:noescape
func VbicqS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32)

// Bitwise bit Clear (vector, register). This instruction performs a bitwise AND between the first source SIMD&FP register and the complement of the second source SIMD&FP register, and writes the result to the destination SIMD&FP register.
//
//go:linkname VbicqU8N VbicqU8N
//go:noescape
func VbicqU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)

// Bitwise bit Clear (vector, register). This instruction performs a bitwise AND between the first source SIMD&FP register and the complement of the second source SIMD&FP register, and writes the result to the destination SIMD&FP register.
//
//go:linkname VbicqU16N VbicqU16N
//go:noescape
func VbicqU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)

// Bitwise bit Clear (vector, register). This instruction performs a bitwise AND between the first source SIMD&FP register and the complement of the second source SIMD&FP register, and writes the result to the destination SIMD&FP register.
//
//go:linkname VbicqU32N VbicqU32N
//go:noescape
func VbicqU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)

// Bitwise bit Clear (vector, register). This instruction performs a bitwise AND between the first source SIMD&FP register and the complement of the second source SIMD&FP register, and writes the result to the destination SIMD&FP register.
//
//go:linkname VbicqU64N VbicqU64N
//go:noescape
func VbicqU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32)

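// Floating-point Complex Add. Elements are treated as (real, imaginary) pairs; each complex number in v1 is rotated counterclockwise by 270 degrees and added to the corresponding complex number in v0.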
//
//go:linkname VcaddRot270F32N VcaddRot270F32N
//go:noescape
func VcaddRot270F32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)

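// Floating-point Complex Add. Elements are treated as (real, imaginary) pairs; each complex number in v1 is rotated counterclockwise by 90 degrees and added to the corresponding complex number in v0.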
//
//go:linkname VcaddRot90F32N VcaddRot90F32N
//go:noescape
func VcaddRot90F32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)

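// Floating-point Complex Add. Elements are treated as (real, imaginary) pairs; each complex number in v1 is rotated counterclockwise by 270 degrees and added to the corresponding complex number in v0.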
//
//go:linkname VcaddqRot270F32N VcaddqRot270F32N
//go:noescape
func VcaddqRot270F32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)

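// Floating-point Complex Add. Elements are treated as (real, imaginary) pairs; each complex number in v1 is rotated counterclockwise by 270 degrees and added to the corresponding complex number in v0.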
//
//go:linkname VcaddqRot270F64N VcaddqRot270F64N
//go:noescape
func VcaddqRot270F64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32)

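// Floating-point Complex Add. Elements are treated as (real, imaginary) pairs; each complex number in v1 is rotated counterclockwise by 90 degrees and added to the corresponding complex number in v0.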
//
//go:linkname VcaddqRot90F32N VcaddqRot90F32N
//go:noescape
func VcaddqRot90F32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)

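// Floating-point Complex Add. Elements are treated as (real, imaginary) pairs; each complex number in v1 is rotated counterclockwise by 90 degrees and added to the corresponding complex number in v0.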
//
//go:linkname VcaddqRot90F64N VcaddqRot90F64N
//go:noescape
func VcaddqRot90F64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32)

// Floating-point Absolute Compare Greater than or Equal (vector). This instruction compares the absolute value of each floating-point value in the first source SIMD&FP register with the absolute value of the corresponding floating-point value in the second source SIMD&FP register and if the first value is greater than or equal to the second value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcageF32N VcageF32N
//go:noescape
func VcageF32N(r *arm.Uint32, v0 *arm.Float32, v1 *arm.Float32, n int32)

// Floating-point Absolute Compare Greater than or Equal (vector). This instruction compares the absolute value of each floating-point value in the first source SIMD&FP register with the absolute value of the corresponding floating-point value in the second source SIMD&FP register and if the first value is greater than or equal to the second value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcageF64N VcageF64N
//go:noescape
func VcageF64N(r *arm.Uint64, v0 *arm.Float64, v1 *arm.Float64, n int32)

// Floating-point Absolute Compare Greater than or Equal (vector). This instruction compares the absolute value of each floating-point value in the first source SIMD&FP register with the absolute value of the corresponding floating-point value in the second source SIMD&FP register and if the first value is greater than or equal to the second value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcagedF64N VcagedF64N
//go:noescape
func VcagedF64N(r *arm.Uint64, v0 *arm.Float64, v1 *arm.Float64, n int32)

// Floating-point Absolute Compare Greater than or Equal (vector). This instruction compares the absolute value of each floating-point value in the first source SIMD&FP register with the absolute value of the corresponding floating-point value in the second source SIMD&FP register and if the first value is greater than or equal to the second value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcageqF32N VcageqF32N
//go:noescape
func VcageqF32N(r *arm.Uint32, v0 *arm.Float32, v1 *arm.Float32, n int32)

// Floating-point Absolute Compare Greater than or Equal (vector). This instruction compares the absolute value of each floating-point value in the first source SIMD&FP register with the absolute value of the corresponding floating-point value in the second source SIMD&FP register and if the first value is greater than or equal to the second value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcageqF64N VcageqF64N
//go:noescape
func VcageqF64N(r *arm.Uint64, v0 *arm.Float64, v1 *arm.Float64, n int32)

// Floating-point Absolute Compare Greater than or Equal (vector). This instruction compares the absolute value of each floating-point value in the first source SIMD&FP register with the absolute value of the corresponding floating-point value in the second source SIMD&FP register and if the first value is greater than or equal to the second value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcagesF32N VcagesF32N
//go:noescape
func VcagesF32N(r *arm.Uint32, v0 *arm.Float32, v1 *arm.Float32, n int32)

// Floating-point Absolute Compare Greater than (vector). This instruction compares the absolute value of each vector element in the first source SIMD&FP register with the absolute value of the corresponding vector element in the second source SIMD&FP register and if the first value is greater than the second value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcagtF32N VcagtF32N
//go:noescape
func VcagtF32N(r *arm.Uint32, v0 *arm.Float32, v1 *arm.Float32, n int32)

// Floating-point Absolute Compare Greater than (vector). This instruction compares the absolute value of each vector element in the first source SIMD&FP register with the absolute value of the corresponding vector element in the second source SIMD&FP register and if the first value is greater than the second value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcagtF64N VcagtF64N
//go:noescape
func VcagtF64N(r *arm.Uint64, v0 *arm.Float64, v1 *arm.Float64, n int32)

// Floating-point Absolute Compare Greater than (vector). This instruction compares the absolute value of each vector element in the first source SIMD&FP register with the absolute value of the corresponding vector element in the second source SIMD&FP register and if the first value is greater than the second value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcagtdF64N VcagtdF64N
//go:noescape
func VcagtdF64N(r *arm.Uint64, v0 *arm.Float64, v1 *arm.Float64, n int32)

// Floating-point Absolute Compare Greater than (vector). This instruction compares the absolute value of each vector element in the first source SIMD&FP register with the absolute value of the corresponding vector element in the second source SIMD&FP register and if the first value is greater than the second value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcagtqF32N VcagtqF32N
//go:noescape
func VcagtqF32N(r *arm.Uint32, v0 *arm.Float32, v1 *arm.Float32, n int32)

// Floating-point Absolute Compare Greater than (vector). This instruction compares the absolute value of each vector element in the first source SIMD&FP register with the absolute value of the corresponding vector element in the second source SIMD&FP register and if the first value is greater than the second value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcagtqF64N VcagtqF64N
//go:noescape
func VcagtqF64N(r *arm.Uint64, v0 *arm.Float64, v1 *arm.Float64, n int32)

// Floating-point Absolute Compare Greater than (vector). This instruction compares the absolute value of each vector element in the first source SIMD&FP register with the absolute value of the corresponding vector element in the second source SIMD&FP register and if the first value is greater than the second value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcagtsF32N VcagtsF32N
//go:noescape
func VcagtsF32N(r *arm.Uint32, v0 *arm.Float32, v1 *arm.Float32, n int32)

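// Floating-point Absolute Compare Less than or Equal. Each result element has every bit set to one if the absolute value of the v0 element is less than or equal to the absolute value of the corresponding v1 element, and every bit set to zero otherwise.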
//
//go:linkname VcaleF32N VcaleF32N
//go:noescape
func VcaleF32N(r *arm.Uint32, v0 *arm.Float32, v1 *arm.Float32, n int32)

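// Floating-point Absolute Compare Less than or Equal. Each result element has every bit set to one if the absolute value of the v0 element is less than or equal to the absolute value of the corresponding v1 element, and every bit set to zero otherwise.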
//
//go:linkname VcaleF64N VcaleF64N
//go:noescape
func VcaleF64N(r *arm.Uint64, v0 *arm.Float64, v1 *arm.Float64, n int32)

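// Floating-point Absolute Compare Less than or Equal. Each result element has every bit set to one if the absolute value of the v0 element is less than or equal to the absolute value of the corresponding v1 element, and every bit set to zero otherwise.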
//
//go:linkname VcaledF64N VcaledF64N
//go:noescape
func VcaledF64N(r *arm.Uint64, v0 *arm.Float64, v1 *arm.Float64, n int32)

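// Floating-point Absolute Compare Less than or Equal. Each result element has every bit set to one if the absolute value of the v0 element is less than or equal to the absolute value of the corresponding v1 element, and every bit set to zero otherwise.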
//
//go:linkname VcaleqF32N VcaleqF32N
//go:noescape
func VcaleqF32N(r *arm.Uint32, v0 *arm.Float32, v1 *arm.Float32, n int32)

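// Floating-point Absolute Compare Less than or Equal. Each result element has every bit set to one if the absolute value of the v0 element is less than or equal to the absolute value of the corresponding v1 element, and every bit set to zero otherwise.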
//
//go:linkname VcaleqF64N VcaleqF64N
//go:noescape
func VcaleqF64N(r *arm.Uint64, v0 *arm.Float64, v1 *arm.Float64, n int32)

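// Floating-point Absolute Compare Less than or Equal. Each result element has every bit set to one if the absolute value of the v0 element is less than or equal to the absolute value of the corresponding v1 element, and every bit set to zero otherwise.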
//
//go:linkname VcalesF32N VcalesF32N
//go:noescape
func VcalesF32N(r *arm.Uint32, v0 *arm.Float32, v1 *arm.Float32, n int32)

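// Floating-point Absolute Compare Less than. Each result element has every bit set to one if the absolute value of the v0 element is less than the absolute value of the corresponding v1 element, and every bit set to zero otherwise.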
//
//go:linkname VcaltF32N VcaltF32N
//go:noescape
func VcaltF32N(r *arm.Uint32, v0 *arm.Float32, v1 *arm.Float32, n int32)

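// Floating-point Absolute Compare Less than. Each result element has every bit set to one if the absolute value of the v0 element is less than the absolute value of the corresponding v1 element, and every bit set to zero otherwise.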
//
//go:linkname VcaltF64N VcaltF64N
//go:noescape
func VcaltF64N(r *arm.Uint64, v0 *arm.Float64, v1 *arm.Float64, n int32)

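// Floating-point Absolute Compare Less than. Each result element has every bit set to one if the absolute value of the v0 element is less than the absolute value of the corresponding v1 element, and every bit set to zero otherwise.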
//
//go:linkname VcaltdF64N VcaltdF64N
//go:noescape
func VcaltdF64N(r *arm.Uint64, v0 *arm.Float64, v1 *arm.Float64, n int32)

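// Floating-point Absolute Compare Less than. Each result element has every bit set to one if the absolute value of the v0 element is less than the absolute value of the corresponding v1 element, and every bit set to zero otherwise.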
//
//go:linkname VcaltqF32N VcaltqF32N
//go:noescape
func VcaltqF32N(r *arm.Uint32, v0 *arm.Float32, v1 *arm.Float32, n int32)

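// Floating-point Absolute Compare Less than. Each result element has every bit set to one if the absolute value of the v0 element is less than the absolute value of the corresponding v1 element, and every bit set to zero otherwise.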
//
//go:linkname VcaltqF64N VcaltqF64N
//go:noescape
func VcaltqF64N(r *arm.Uint64, v0 *arm.Float64, v1 *arm.Float64, n int32)

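// Floating-point Absolute Compare Less than. Each result element has every bit set to one if the absolute value of the v0 element is less than the absolute value of the corresponding v1 element, and every bit set to zero otherwise.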
//
//go:linkname VcaltsF32N VcaltsF32N
//go:noescape
func VcaltsF32N(r *arm.Uint32, v0 *arm.Float32, v1 *arm.Float32, n int32)

// Compare bitwise Equal (vector). This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqS8N VceqS8N
//go:noescape
func VceqS8N(r *arm.Uint8, v0 *arm.Int8, v1 *arm.Int8, n int32)

// Compare bitwise Equal (vector). This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqS16N VceqS16N
//go:noescape
func VceqS16N(r *arm.Uint16, v0 *arm.Int16, v1 *arm.Int16, n int32)

// Compare bitwise Equal (vector). This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqS32N VceqS32N
//go:noescape
func VceqS32N(r *arm.Uint32, v0 *arm.Int32, v1 *arm.Int32, n int32)

// Compare bitwise Equal (vector). This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqS64N VceqS64N
//go:noescape
func VceqS64N(r *arm.Uint64, v0 *arm.Int64, v1 *arm.Int64, n int32)

// Compare bitwise Equal (vector). This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqU8N VceqU8N
//go:noescape
func VceqU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)

// Compare bitwise Equal (vector). This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqU16N VceqU16N
//go:noescape
func VceqU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)

// Compare bitwise Equal (vector). This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqU32N VceqU32N
//go:noescape
func VceqU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)

// Compare bitwise Equal (vector). This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqU64N VceqU64N
//go:noescape
func VceqU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32)

// Floating-point Compare Equal (vector). This instruction compares each floating-point value from the first source SIMD&FP register with the corresponding floating-point value from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqF32N VceqF32N
//go:noescape
func VceqF32N(r *arm.Uint32, v0 *arm.Float32, v1 *arm.Float32, n int32)

// Floating-point Compare Equal (vector). This instruction compares each floating-point value from the first source SIMD&FP register with the corresponding floating-point value from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqF64N VceqF64N
//go:noescape
func VceqF64N(r *arm.Uint64, v0 *arm.Float64, v1 *arm.Float64, n int32)

// Compare bitwise Equal (vector). This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqdS64N VceqdS64N
//go:noescape
func VceqdS64N(r *arm.Uint64, v0 *arm.Int64, v1 *arm.Int64, n int32)

// Compare bitwise Equal (vector). This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqdU64N VceqdU64N
//go:noescape
func VceqdU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32)

// Floating-point Compare Equal (vector). This instruction compares each floating-point value from the first source SIMD&FP register with the corresponding floating-point value from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqdF64N VceqdF64N
//go:noescape
func VceqdF64N(r *arm.Uint64, v0 *arm.Float64, v1 *arm.Float64, n int32)

// Compare bitwise Equal (vector). This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqqS8N VceqqS8N
//go:noescape
func VceqqS8N(r *arm.Uint8, v0 *arm.Int8, v1 *arm.Int8, n int32)

// Compare bitwise Equal (vector). This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqqS16N VceqqS16N
//go:noescape
func VceqqS16N(r *arm.Uint16, v0 *arm.Int16, v1 *arm.Int16, n int32)

// Compare bitwise Equal (vector). This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqqS32N VceqqS32N
//go:noescape
func VceqqS32N(r *arm.Uint32, v0 *arm.Int32, v1 *arm.Int32, n int32)

// Compare bitwise Equal (vector). This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqqS64N VceqqS64N
//go:noescape
func VceqqS64N(r *arm.Uint64, v0 *arm.Int64, v1 *arm.Int64, n int32)

// Compare bitwise Equal (vector). This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqqU8N VceqqU8N
//go:noescape
func VceqqU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)

// Compare bitwise Equal (vector). This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqqU16N VceqqU16N
//go:noescape
func VceqqU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)

// Compare bitwise Equal (vector). This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqqU32N VceqqU32N
//go:noescape
func VceqqU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)

// Compare bitwise Equal (vector). This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqqU64N VceqqU64N
//go:noescape
func VceqqU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32)

// Floating-point Compare Equal (vector). This instruction compares each floating-point value from the first source SIMD&FP register with the corresponding floating-point value from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqqF32N VceqqF32N
//go:noescape
func VceqqF32N(r *arm.Uint32, v0 *arm.Float32, v1 *arm.Float32, n int32)

// Floating-point Compare Equal (vector). This instruction compares each floating-point value from the first source SIMD&FP register with the corresponding floating-point value from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqqF64N VceqqF64N
//go:noescape
func VceqqF64N(r *arm.Uint64, v0 *arm.Float64, v1 *arm.Float64, n int32)

// Floating-point Compare Equal (vector). This instruction compares each floating-point value from the first source SIMD&FP register with the corresponding floating-point value from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqsF32N VceqsF32N
//go:noescape
func VceqsF32N(r *arm.Uint32, v0 *arm.Float32, v1 *arm.Float32, n int32)

// Compare bitwise Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqzS8N VceqzS8N
//go:noescape
func VceqzS8N(r *arm.Uint8, v0 *arm.Int8, n int32)

// Compare bitwise Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqzS16N VceqzS16N
//go:noescape
func VceqzS16N(r *arm.Uint16, v0 *arm.Int16, n int32)

// Compare bitwise Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqzS32N VceqzS32N
//go:noescape
func VceqzS32N(r *arm.Uint32, v0 *arm.Int32, n int32)

// Compare bitwise Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqzS64N VceqzS64N
//go:noescape
func VceqzS64N(r *arm.Uint64, v0 *arm.Int64, n int32)

// Compare bitwise Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqzU8N VceqzU8N
//go:noescape
func VceqzU8N(r *arm.Uint8, v0 *arm.Uint8, n int32)

// Compare bitwise Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqzU16N VceqzU16N
//go:noescape
func VceqzU16N(r *arm.Uint16, v0 *arm.Uint16, n int32)

// Compare bitwise Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqzU32N VceqzU32N
//go:noescape
func VceqzU32N(r *arm.Uint32, v0 *arm.Uint32, n int32)

// Compare bitwise Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqzU64N VceqzU64N
//go:noescape
func VceqzU64N(r *arm.Uint64, v0 *arm.Uint64, n int32)

// Floating-point Compare Equal to zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqzF32N VceqzF32N
//go:noescape
func VceqzF32N(r *arm.Uint32, v0 *arm.Float32, n int32)

// Floating-point Compare Equal to zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqzF64N VceqzF64N
//go:noescape
func VceqzF64N(r *arm.Uint64, v0 *arm.Float64, n int32)

// Compare bitwise Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqzdS64N VceqzdS64N
//go:noescape
func VceqzdS64N(r *arm.Uint64, v0 *arm.Int64, n int32)

// Compare bitwise Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqzdU64N VceqzdU64N
//go:noescape
func VceqzdU64N(r *arm.Uint64, v0 *arm.Uint64, n int32)

// Floating-point Compare Equal to zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqzdF64N VceqzdF64N
//go:noescape
func VceqzdF64N(r *arm.Uint64, v0 *arm.Float64, n int32)

// Compare bitwise Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqzqS8N VceqzqS8N
//go:noescape
func VceqzqS8N(r *arm.Uint8, v0 *arm.Int8, n int32)

// Compare bitwise Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqzqS16N VceqzqS16N
//go:noescape
func VceqzqS16N(r *arm.Uint16, v0 *arm.Int16, n int32)

// Compare bitwise Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and, if the value is equal to zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one; otherwise it sets every bit of that element to zero.
//
//go:linkname VceqzqS32N VceqzqS32N
//go:noescape
func VceqzqS32N(r *arm.Uint32, v0 *arm.Int32, n int32)

// Compare bitwise Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and, if the value is equal to zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one; otherwise it sets every bit of that element to zero.
//
//go:linkname VceqzqS64N VceqzqS64N
//go:noescape
func VceqzqS64N(r *arm.Uint64, v0 *arm.Int64, n int32)

// Compare bitwise Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and, if the value is equal to zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one; otherwise it sets every bit of that element to zero.
//
//go:linkname VceqzqU8N VceqzqU8N
//go:noescape
func VceqzqU8N(r *arm.Uint8, v0 *arm.Uint8, n int32)

// Compare bitwise Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and, if the value is equal to zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one; otherwise it sets every bit of that element to zero.
//
//go:linkname VceqzqU16N VceqzqU16N
//go:noescape
func VceqzqU16N(r *arm.Uint16, v0 *arm.Uint16, n int32)

// Compare bitwise Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and, if the value is equal to zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one; otherwise it sets every bit of that element to zero.
//
//go:linkname VceqzqU32N VceqzqU32N
//go:noescape
func VceqzqU32N(r *arm.Uint32, v0 *arm.Uint32, n int32)

// Compare bitwise Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and, if the value is equal to zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one; otherwise it sets every bit of that element to zero.
//
//go:linkname VceqzqU64N VceqzqU64N
//go:noescape
func VceqzqU64N(r *arm.Uint64, v0 *arm.Uint64, n int32)

// Floating-point Compare Equal to zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and, if the value is equal to zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one; otherwise it sets every bit of that element to zero.
//
//go:linkname VceqzqF32N VceqzqF32N
//go:noescape
func VceqzqF32N(r *arm.Uint32, v0 *arm.Float32, n int32)

// Floating-point Compare Equal to zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and, if the value is equal to zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one; otherwise it sets every bit of that element to zero.
//
//go:linkname VceqzqF64N VceqzqF64N
//go:noescape
func VceqzqF64N(r *arm.Uint64, v0 *arm.Float64, n int32)

// Floating-point Compare Equal to zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and, if the value is equal to zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one; otherwise it sets every bit of that element to zero.
//
//go:linkname VceqzsF32N VceqzsF32N
//go:noescape
func VceqzsF32N(r *arm.Uint32, v0 *arm.Float32, n int32)

// Compare signed Greater than or Equal (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and, if the first signed integer value is greater than or equal to the second, sets every bit of the corresponding vector element in the destination SIMD&FP register to one; otherwise it sets every bit of that element to zero.
//
//go:linkname VcgeS8N VcgeS8N
//go:noescape
func VcgeS8N(r *arm.Uint8, v0 *arm.Int8, v1 *arm.Int8, n int32)

// Compare signed Greater than or Equal (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and, if the first signed integer value is greater than or equal to the second, sets every bit of the corresponding vector element in the destination SIMD&FP register to one; otherwise it sets every bit of that element to zero.
//
//go:linkname VcgeS16N VcgeS16N
//go:noescape
func VcgeS16N(r *arm.Uint16, v0 *arm.Int16, v1 *arm.Int16, n int32)

// Compare signed Greater than or Equal (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and, if the first signed integer value is greater than or equal to the second, sets every bit of the corresponding vector element in the destination SIMD&FP register to one; otherwise it sets every bit of that element to zero.
//
//go:linkname VcgeS32N VcgeS32N
//go:noescape
func VcgeS32N(r *arm.Uint32, v0 *arm.Int32, v1 *arm.Int32, n int32)

// Compare signed Greater than or Equal (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and, if the first signed integer value is greater than or equal to the second, sets every bit of the corresponding vector element in the destination SIMD&FP register to one; otherwise it sets every bit of that element to zero.
//
//go:linkname VcgeS64N VcgeS64N
//go:noescape
func VcgeS64N(r *arm.Uint64, v0 *arm.Int64, v1 *arm.Int64, n int32)

// Compare unsigned Higher or Same (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and, if the first unsigned integer value is greater than or equal to the second, sets every bit of the corresponding vector element in the destination SIMD&FP register to one; otherwise it sets every bit of that element to zero.
//
//go:linkname VcgeU8N VcgeU8N
//go:noescape
func VcgeU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)

// Compare unsigned Higher or Same (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and, if the first unsigned integer value is greater than or equal to the second, sets every bit of the corresponding vector element in the destination SIMD&FP register to one; otherwise it sets every bit of that element to zero.
//
//go:linkname VcgeU16N VcgeU16N
//go:noescape
func VcgeU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)

// Compare unsigned Higher or Same (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and, if the first unsigned integer value is greater than or equal to the second, sets every bit of the corresponding vector element in the destination SIMD&FP register to one; otherwise it sets every bit of that element to zero.
//
//go:linkname VcgeU32N VcgeU32N
//go:noescape
func VcgeU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)

// Compare unsigned Higher or Same (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and, if the first unsigned integer value is greater than or equal to the second, sets every bit of the corresponding vector element in the destination SIMD&FP register to one; otherwise it sets every bit of that element to zero.
//
//go:linkname VcgeU64N VcgeU64N
//go:noescape
func VcgeU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32)

// Floating-point Compare Greater than or Equal (vector). This instruction reads each floating-point value in the first source SIMD&FP register and, if the value is greater than or equal to the corresponding floating-point value in the second source SIMD&FP register, sets every bit of the corresponding vector element in the destination SIMD&FP register to one; otherwise it sets every bit of that element to zero.
//
//go:linkname VcgeF32N VcgeF32N
//go:noescape
func VcgeF32N(r *arm.Uint32, v0 *arm.Float32, v1 *arm.Float32, n int32)

// Floating-point Compare Greater than or Equal (vector). This instruction reads each floating-point value in the first source SIMD&FP register and, if the value is greater than or equal to the corresponding floating-point value in the second source SIMD&FP register, sets every bit of the corresponding vector element in the destination SIMD&FP register to one; otherwise it sets every bit of that element to zero.
//
//go:linkname VcgeF64N VcgeF64N
//go:noescape
func VcgeF64N(r *arm.Uint64, v0 *arm.Float64, v1 *arm.Float64, n int32)

// Compare signed Greater than or Equal (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and, if the first signed integer value is greater than or equal to the second, sets every bit of the corresponding vector element in the destination SIMD&FP register to one; otherwise it sets every bit of that element to zero.
//
//go:linkname VcgedS64N VcgedS64N
//go:noescape
func VcgedS64N(r *arm.Uint64, v0 *arm.Int64, v1 *arm.Int64, n int32)

// Compare unsigned Higher or Same (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and, if the first unsigned integer value is greater than or equal to the second, sets every bit of the corresponding vector element in the destination SIMD&FP register to one; otherwise it sets every bit of that element to zero.
//
//go:linkname VcgedU64N VcgedU64N
//go:noescape
func VcgedU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32)

// Floating-point Compare Greater than or Equal (vector). This instruction reads each floating-point value in the first source SIMD&FP register and, if the value is greater than or equal to the corresponding floating-point value in the second source SIMD&FP register, sets every bit of the corresponding vector element in the destination SIMD&FP register to one; otherwise it sets every bit of that element to zero.
//
//go:linkname VcgedF64N VcgedF64N
//go:noescape
func VcgedF64N(r *arm.Uint64, v0 *arm.Float64, v1 *arm.Float64, n int32)

// Compare signed Greater than or Equal (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and, if the first signed integer value is greater than or equal to the second, sets every bit of the corresponding vector element in the destination SIMD&FP register to one; otherwise it sets every bit of that element to zero.
//
//go:linkname VcgeqS8N VcgeqS8N
//go:noescape
func VcgeqS8N(r *arm.Uint8, v0 *arm.Int8, v1 *arm.Int8, n int32)

// Compare signed Greater than or Equal (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and, if the first signed integer value is greater than or equal to the second, sets every bit of the corresponding vector element in the destination SIMD&FP register to one; otherwise it sets every bit of that element to zero.
//
//go:linkname VcgeqS16N VcgeqS16N
//go:noescape
func VcgeqS16N(r *arm.Uint16, v0 *arm.Int16, v1 *arm.Int16, n int32)

// Compare signed Greater than or Equal (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and, if the first signed integer value is greater than or equal to the second, sets every bit of the corresponding vector element in the destination SIMD&FP register to one; otherwise it sets every bit of that element to zero.
//
//go:linkname VcgeqS32N VcgeqS32N
//go:noescape
func VcgeqS32N(r *arm.Uint32, v0 *arm.Int32, v1 *arm.Int32, n int32)

// Compare signed Greater than or Equal (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and, if the first signed integer value is greater than or equal to the second, sets every bit of the corresponding vector element in the destination SIMD&FP register to one; otherwise it sets every bit of that element to zero.
//
//go:linkname VcgeqS64N VcgeqS64N
//go:noescape
func VcgeqS64N(r *arm.Uint64, v0 *arm.Int64, v1 *arm.Int64, n int32)

// Compare unsigned Higher or Same (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and, if the first unsigned integer value is greater than or equal to the second, sets every bit of the corresponding vector element in the destination SIMD&FP register to one; otherwise it sets every bit of that element to zero.
//
//go:linkname VcgeqU8N VcgeqU8N
//go:noescape
func VcgeqU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)

// Compare unsigned Higher or Same (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and, if the first unsigned integer value is greater than or equal to the second, sets every bit of the corresponding vector element in the destination SIMD&FP register to one; otherwise it sets every bit of that element to zero.
//
//go:linkname VcgeqU16N VcgeqU16N
//go:noescape
func VcgeqU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)

// Compare unsigned Higher or Same (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and, if the first unsigned integer value is greater than or equal to the second, sets every bit of the corresponding vector element in the destination SIMD&FP register to one; otherwise it sets every bit of that element to zero.
//
//go:linkname VcgeqU32N VcgeqU32N
//go:noescape
func VcgeqU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)

// Compare unsigned Higher or Same (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and, if the first unsigned integer value is greater than or equal to the second, sets every bit of the corresponding vector element in the destination SIMD&FP register to one; otherwise it sets every bit of that element to zero.
//
//go:linkname VcgeqU64N VcgeqU64N
//go:noescape
func VcgeqU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32)

// Floating-point Compare Greater than or Equal (vector). This instruction reads each floating-point value in the first source SIMD&FP register and, if the value is greater than or equal to the corresponding floating-point value in the second source SIMD&FP register, sets every bit of the corresponding vector element in the destination SIMD&FP register to one; otherwise it sets every bit of that element to zero.
//
//go:linkname VcgeqF32N VcgeqF32N
//go:noescape
func VcgeqF32N(r *arm.Uint32, v0 *arm.Float32, v1 *arm.Float32, n int32)

// Floating-point Compare Greater than or Equal (vector). This instruction reads each floating-point value in the first source SIMD&FP register and, if the value is greater than or equal to the corresponding floating-point value in the second source SIMD&FP register, sets every bit of the corresponding vector element in the destination SIMD&FP register to one; otherwise it sets every bit of that element to zero.
//
//go:linkname VcgeqF64N VcgeqF64N
//go:noescape
func VcgeqF64N(r *arm.Uint64, v0 *arm.Float64, v1 *arm.Float64, n int32)

// Floating-point Compare Greater than or Equal (vector). This instruction reads each floating-point value in the first source SIMD&FP register and, if the value is greater than or equal to the corresponding floating-point value in the second source SIMD&FP register, sets every bit of the corresponding vector element in the destination SIMD&FP register to one; otherwise it sets every bit of that element to zero.
//
//go:linkname VcgesF32N VcgesF32N
//go:noescape
func VcgesF32N(r *arm.Uint32, v0 *arm.Float32, v1 *arm.Float32, n int32)

// Compare signed Greater than or Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and, if the signed integer value is greater than or equal to zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one; otherwise it sets every bit of that element to zero.
//
//go:linkname VcgezS8N VcgezS8N
//go:noescape
func VcgezS8N(r *arm.Uint8, v0 *arm.Int8, n int32)

// Compare signed Greater than or Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and, if the signed integer value is greater than or equal to zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one; otherwise it sets every bit of that element to zero.
//
//go:linkname VcgezS16N VcgezS16N
//go:noescape
func VcgezS16N(r *arm.Uint16, v0 *arm.Int16, n int32)

// Compare signed Greater than or Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and, if the signed integer value is greater than or equal to zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one; otherwise it sets every bit of that element to zero.
//
//go:linkname VcgezS32N VcgezS32N
//go:noescape
func VcgezS32N(r *arm.Uint32, v0 *arm.Int32, n int32)

// Compare signed Greater than or Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and, if the signed integer value is greater than or equal to zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one; otherwise it sets every bit of that element to zero.
//
//go:linkname VcgezS64N VcgezS64N
//go:noescape
func VcgezS64N(r *arm.Uint64, v0 *arm.Int64, n int32)

// Floating-point Compare Greater than or Equal to zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and, if the value is greater than or equal to zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one; otherwise it sets every bit of that element to zero.
//
//go:linkname VcgezF32N VcgezF32N
//go:noescape
func VcgezF32N(r *arm.Uint32, v0 *arm.Float32, n int32)

// Floating-point Compare Greater than or Equal to zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and, if the value is greater than or equal to zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one; otherwise it sets every bit of that element to zero.
//
//go:linkname VcgezF64N VcgezF64N
//go:noescape
func VcgezF64N(r *arm.Uint64, v0 *arm.Float64, n int32)

// Compare signed Greater than or Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and, if the signed integer value is greater than or equal to zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one; otherwise it sets every bit of that element to zero.
//
//go:linkname VcgezdS64N VcgezdS64N
//go:noescape
func VcgezdS64N(r *arm.Uint64, v0 *arm.Int64, n int32)

// Floating-point Compare Greater than or Equal to zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and, if the value is greater than or equal to zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one; otherwise it sets every bit of that element to zero.
//
//go:linkname VcgezdF64N VcgezdF64N
//go:noescape
func VcgezdF64N(r *arm.Uint64, v0 *arm.Float64, n int32)

// Compare signed Greater than or Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and, if the signed integer value is greater than or equal to zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one; otherwise it sets every bit of that element to zero.
//
//go:linkname VcgezqS8N VcgezqS8N
//go:noescape
func VcgezqS8N(r *arm.Uint8, v0 *arm.Int8, n int32)

// Compare signed Greater than or Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and, if the signed integer value is greater than or equal to zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one; otherwise it sets every bit of that element to zero.
//
//go:linkname VcgezqS16N VcgezqS16N
//go:noescape
func VcgezqS16N(r *arm.Uint16, v0 *arm.Int16, n int32)

// Compare signed Greater than or Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and, if the signed integer value is greater than or equal to zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one; otherwise it sets every bit of that element to zero.
//
//go:linkname VcgezqS32N VcgezqS32N
//go:noescape
func VcgezqS32N(r *arm.Uint32, v0 *arm.Int32, n int32)

// Compare signed Greater than or Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and, if the signed integer value is greater than or equal to zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one; otherwise it sets every bit of that element to zero.
//
//go:linkname VcgezqS64N VcgezqS64N
//go:noescape
func VcgezqS64N(r *arm.Uint64, v0 *arm.Int64, n int32)

// Floating-point Compare Greater than or Equal to zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and, if the value is greater than or equal to zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one; otherwise it sets every bit of that element to zero.
//
//go:linkname VcgezqF32N VcgezqF32N
//go:noescape
func VcgezqF32N(r *arm.Uint32, v0 *arm.Float32, n int32)

// Floating-point Compare Greater than or Equal to zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and, if the value is greater than or equal to zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one; otherwise it sets every bit of that element to zero.
//
//go:linkname VcgezqF64N VcgezqF64N
//go:noescape
func VcgezqF64N(r *arm.Uint64, v0 *arm.Float64, n int32)

// Floating-point Compare Greater than or Equal to zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and, if the value is greater than or equal to zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one; otherwise it sets every bit of that element to zero.
//
//go:linkname VcgezsF32N VcgezsF32N
//go:noescape
func VcgezsF32N(r *arm.Uint32, v0 *arm.Float32, n int32)

// Compare signed Greater than (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and, if the first signed integer value is greater than the second, sets every bit of the corresponding vector element in the destination SIMD&FP register to one; otherwise it sets every bit of that element to zero.
//
//go:linkname VcgtS8N VcgtS8N
//go:noescape
func VcgtS8N(r *arm.Uint8, v0 *arm.Int8, v1 *arm.Int8, n int32)

// Compare signed Greater than (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and, if the first signed integer value is greater than the second, sets every bit of the corresponding vector element in the destination SIMD&FP register to one; otherwise it sets every bit of that element to zero.
//
//go:linkname VcgtS16N VcgtS16N
//go:noescape
func VcgtS16N(r *arm.Uint16, v0 *arm.Int16, v1 *arm.Int16, n int32)

// Compare signed Greater than (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and, if the first signed integer value is greater than the second, sets every bit of the corresponding vector element in the destination SIMD&FP register to one; otherwise it sets every bit of that element to zero.
//
//go:linkname VcgtS32N VcgtS32N
//go:noescape
func VcgtS32N(r *arm.Uint32, v0 *arm.Int32, v1 *arm.Int32, n int32)

// Compare signed Greater than (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and, if the first signed integer value is greater than the second, sets every bit of the corresponding vector element in the destination SIMD&FP register to one; otherwise it sets every bit of that element to zero.
//
//go:linkname VcgtS64N VcgtS64N
//go:noescape
func VcgtS64N(r *arm.Uint64, v0 *arm.Int64, v1 *arm.Int64, n int32)

// Compare unsigned Higher (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and, if the first unsigned integer value is greater than the second, sets every bit of the corresponding vector element in the destination SIMD&FP register to one; otherwise it sets every bit of that element to zero.
//
//go:linkname VcgtU8N VcgtU8N
//go:noescape
func VcgtU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)

// Compare unsigned Higher (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and, if the first unsigned integer value is greater than the second, sets every bit of the corresponding vector element in the destination SIMD&FP register to one; otherwise it sets every bit of that element to zero.
//
//go:linkname VcgtU16N VcgtU16N
//go:noescape
func VcgtU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)

// Compare unsigned Higher (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and, if the first unsigned integer value is greater than the second, sets every bit of the corresponding vector element in the destination SIMD&FP register to one; otherwise it sets every bit of that element to zero.
//
//go:linkname VcgtU32N VcgtU32N
//go:noescape
func VcgtU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)

// Compare unsigned Higher (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and, if the first unsigned integer value is greater than the second, sets every bit of the corresponding vector element in the destination SIMD&FP register to one; otherwise it sets every bit of that element to zero.
//
//go:linkname VcgtU64N VcgtU64N
//go:noescape
func VcgtU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32)

// Floating-point Compare Greater than (vector). This instruction reads each floating-point value in the first source SIMD&FP register and, if the value is greater than the corresponding floating-point value in the second source SIMD&FP register, sets every bit of the corresponding vector element in the destination SIMD&FP register to one; otherwise it sets every bit of that element to zero.
//
//go:linkname VcgtF32N VcgtF32N
//go:noescape
func VcgtF32N(r *arm.Uint32, v0 *arm.Float32, v1 *arm.Float32, n int32)

// Floating-point Compare Greater than (vector). This instruction reads each floating-point value in the first source SIMD&FP register and if the value is greater than the corresponding floating-point value in the second source SIMD&FP register sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgtF64N VcgtF64N
//go:noescape
func VcgtF64N(r *arm.Uint64, v0 *arm.Float64, v1 *arm.Float64, n int32)

// Compare signed Greater than (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first signed integer value is greater than the second signed integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgtdS64N VcgtdS64N
//go:noescape
func VcgtdS64N(r *arm.Uint64, v0 *arm.Int64, v1 *arm.Int64, n int32)

// Compare unsigned Higher (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first unsigned integer value is greater than the second unsigned integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgtdU64N VcgtdU64N
//go:noescape
func VcgtdU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32)

// Floating-point Compare Greater than (vector). This instruction reads each floating-point value in the first source SIMD&FP register and if the value is greater than the corresponding floating-point value in the second source SIMD&FP register sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgtdF64N VcgtdF64N
//go:noescape
func VcgtdF64N(r *arm.Uint64, v0 *arm.Float64, v1 *arm.Float64, n int32)

// Compare signed Greater than (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first signed integer value is greater than the second signed integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgtqS8N VcgtqS8N
//go:noescape
func VcgtqS8N(r *arm.Uint8, v0 *arm.Int8, v1 *arm.Int8, n int32)

// Compare signed Greater than (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first signed integer value is greater than the second signed integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgtqS16N VcgtqS16N
//go:noescape
func VcgtqS16N(r *arm.Uint16, v0 *arm.Int16, v1 *arm.Int16, n int32)

// Compare signed Greater than (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first signed integer value is greater than the second signed integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgtqS32N VcgtqS32N
//go:noescape
func VcgtqS32N(r *arm.Uint32, v0 *arm.Int32, v1 *arm.Int32, n int32)

// Compare signed Greater than (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first signed integer value is greater than the second signed integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgtqS64N VcgtqS64N
//go:noescape
func VcgtqS64N(r *arm.Uint64, v0 *arm.Int64, v1 *arm.Int64, n int32)

// Compare unsigned Higher (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first unsigned integer value is greater than the second unsigned integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgtqU8N VcgtqU8N
//go:noescape
func VcgtqU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)

// Compare unsigned Higher (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first unsigned integer value is greater than the second unsigned integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgtqU16N VcgtqU16N
//go:noescape
func VcgtqU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)

// Compare unsigned Higher (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first unsigned integer value is greater than the second unsigned integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgtqU32N VcgtqU32N
//go:noescape
func VcgtqU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)

// Compare unsigned Higher (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first unsigned integer value is greater than the second unsigned integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgtqU64N VcgtqU64N
//go:noescape
func VcgtqU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32)

// Floating-point Compare Greater than (vector). This instruction reads each floating-point value in the first source SIMD&FP register and if the value is greater than the corresponding floating-point value in the second source SIMD&FP register sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgtqF32N VcgtqF32N
//go:noescape
func VcgtqF32N(r *arm.Uint32, v0 *arm.Float32, v1 *arm.Float32, n int32)

// Floating-point Compare Greater than (vector). This instruction reads each floating-point value in the first source SIMD&FP register and if the value is greater than the corresponding floating-point value in the second source SIMD&FP register sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgtqF64N VcgtqF64N
//go:noescape
func VcgtqF64N(r *arm.Uint64, v0 *arm.Float64, v1 *arm.Float64, n int32)

// Floating-point Compare Greater than (vector). This instruction reads each floating-point value in the first source SIMD&FP register and if the value is greater than the corresponding floating-point value in the second source SIMD&FP register sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgtsF32N VcgtsF32N
//go:noescape
func VcgtsF32N(r *arm.Uint32, v0 *arm.Float32, v1 *arm.Float32, n int32)

// Compare signed Greater than zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is greater than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgtzS8N VcgtzS8N
//go:noescape
func VcgtzS8N(r *arm.Uint8, v0 *arm.Int8, n int32)

// Compare signed Greater than zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is greater than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgtzS16N VcgtzS16N
//go:noescape
func VcgtzS16N(r *arm.Uint16, v0 *arm.Int16, n int32)

// Compare signed Greater than zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is greater than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgtzS32N VcgtzS32N
//go:noescape
func VcgtzS32N(r *arm.Uint32, v0 *arm.Int32, n int32)

// Compare signed Greater than zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is greater than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgtzS64N VcgtzS64N
//go:noescape
func VcgtzS64N(r *arm.Uint64, v0 *arm.Int64, n int32)

// Floating-point Compare Greater than zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is greater than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgtzF32N VcgtzF32N
//go:noescape
func VcgtzF32N(r *arm.Uint32, v0 *arm.Float32, n int32)

// Floating-point Compare Greater than zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is greater than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgtzF64N VcgtzF64N
//go:noescape
func VcgtzF64N(r *arm.Uint64, v0 *arm.Float64, n int32)

// Compare signed Greater than zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is greater than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgtzdS64N VcgtzdS64N
//go:noescape
func VcgtzdS64N(r *arm.Uint64, v0 *arm.Int64, n int32)

// Floating-point Compare Greater than zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is greater than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgtzdF64N VcgtzdF64N
//go:noescape
func VcgtzdF64N(r *arm.Uint64, v0 *arm.Float64, n int32)

// Compare signed Greater than zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is greater than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgtzqS8N VcgtzqS8N
//go:noescape
func VcgtzqS8N(r *arm.Uint8, v0 *arm.Int8, n int32)

// Compare signed Greater than zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is greater than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgtzqS16N VcgtzqS16N
//go:noescape
func VcgtzqS16N(r *arm.Uint16, v0 *arm.Int16, n int32)

// Compare signed Greater than zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is greater than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgtzqS32N VcgtzqS32N
//go:noescape
func VcgtzqS32N(r *arm.Uint32, v0 *arm.Int32, n int32)

// Compare signed Greater than zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is greater than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgtzqS64N VcgtzqS64N
//go:noescape
func VcgtzqS64N(r *arm.Uint64, v0 *arm.Int64, n int32)

// Floating-point Compare Greater than zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is greater than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgtzqF32N VcgtzqF32N
//go:noescape
func VcgtzqF32N(r *arm.Uint32, v0 *arm.Float32, n int32)

// Floating-point Compare Greater than zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is greater than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgtzqF64N VcgtzqF64N
//go:noescape
func VcgtzqF64N(r *arm.Uint64, v0 *arm.Float64, n int32)

// Floating-point Compare Greater than zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is greater than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgtzsF32N VcgtzsF32N
//go:noescape
func VcgtzsF32N(r *arm.Uint32, v0 *arm.Float32, n int32)

// Compare signed less than or equal
//
//go:linkname VcleS8N VcleS8N
//go:noescape
func VcleS8N(r *arm.Uint8, v0 *arm.Int8, v1 *arm.Int8, n int32)

// Compare signed less than or equal
//
//go:linkname VcleS16N VcleS16N
//go:noescape
func VcleS16N(r *arm.Uint16, v0 *arm.Int16, v1 *arm.Int16, n int32)

// Compare signed less than or equal
//
//go:linkname VcleS32N VcleS32N
//go:noescape
func VcleS32N(r *arm.Uint32, v0 *arm.Int32, v1 *arm.Int32, n int32)

// Compare signed less than or equal
//
//go:linkname VcleS64N VcleS64N
//go:noescape
func VcleS64N(r *arm.Uint64, v0 *arm.Int64, v1 *arm.Int64, n int32)

// Compare unsigned less than or equal
//
//go:linkname VcleU8N VcleU8N
//go:noescape
func VcleU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)

// Compare unsigned less than or equal
//
//go:linkname VcleU16N VcleU16N
//go:noescape
func VcleU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)

// Compare unsigned less than or equal
//
//go:linkname VcleU32N VcleU32N
//go:noescape
func VcleU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)

// Compare unsigned less than or equal
//
//go:linkname VcleU64N VcleU64N
//go:noescape
func VcleU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32)

// Floating-point compare less than or equal
//
//go:linkname VcleF32N VcleF32N
//go:noescape
func VcleF32N(r *arm.Uint32, v0 *arm.Float32, v1 *arm.Float32, n int32)

// Floating-point compare less than or equal
//
//go:linkname VcleF64N VcleF64N
//go:noescape
func VcleF64N(r *arm.Uint64, v0 *arm.Float64, v1 *arm.Float64, n int32)

// Compare signed less than or equal
//
//go:linkname VcledS64N VcledS64N
//go:noescape
func VcledS64N(r *arm.Uint64, v0 *arm.Int64, v1 *arm.Int64, n int32)

// Compare unsigned less than or equal
//
//go:linkname VcledU64N VcledU64N
//go:noescape
func VcledU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32)

// Floating-point compare less than or equal
//
//go:linkname VcledF64N VcledF64N
//go:noescape
func VcledF64N(r *arm.Uint64, v0 *arm.Float64, v1 *arm.Float64, n int32)

// Compare signed less than or equal
//
//go:linkname VcleqS8N VcleqS8N
//go:noescape
func VcleqS8N(r *arm.Uint8, v0 *arm.Int8, v1 *arm.Int8, n int32)

// Compare signed less than or equal
//
//go:linkname VcleqS16N VcleqS16N
//go:noescape
func VcleqS16N(r *arm.Uint16, v0 *arm.Int16, v1 *arm.Int16, n int32)

// Compare signed less than or equal
//
//go:linkname VcleqS32N VcleqS32N
//go:noescape
func VcleqS32N(r *arm.Uint32, v0 *arm.Int32, v1 *arm.Int32, n int32)

// Compare signed less than or equal
//
//go:linkname VcleqS64N VcleqS64N
//go:noescape
func VcleqS64N(r *arm.Uint64, v0 *arm.Int64, v1 *arm.Int64, n int32)

// Compare unsigned less than or equal
//
//go:linkname VcleqU8N VcleqU8N
//go:noescape
func VcleqU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)

// Compare unsigned less than or equal
//
//go:linkname VcleqU16N VcleqU16N
//go:noescape
func VcleqU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)

// Compare unsigned less than or equal
//
//go:linkname VcleqU32N VcleqU32N
//go:noescape
func VcleqU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)

// Compare unsigned less than or equal
//
//go:linkname VcleqU64N VcleqU64N
//go:noescape
func VcleqU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32)

// Floating-point compare less than or equal
//
//go:linkname VcleqF32N VcleqF32N
//go:noescape
func VcleqF32N(r *arm.Uint32, v0 *arm.Float32, v1 *arm.Float32, n int32)

// Floating-point compare less than or equal
//
//go:linkname VcleqF64N VcleqF64N
//go:noescape
func VcleqF64N(r *arm.Uint64, v0 *arm.Float64, v1 *arm.Float64, n int32)

// Floating-point compare less than or equal
//
//go:linkname VclesF32N VclesF32N
//go:noescape
func VclesF32N(r *arm.Uint32, v0 *arm.Float32, v1 *arm.Float32, n int32)

// Compare signed Less than or Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is less than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VclezS8N VclezS8N
//go:noescape
func VclezS8N(r *arm.Uint8, v0 *arm.Int8, n int32)

// Compare signed Less than or Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is less than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VclezS16N VclezS16N
//go:noescape
func VclezS16N(r *arm.Uint16, v0 *arm.Int16, n int32)

// Compare signed Less than or Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is less than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VclezS32N VclezS32N
//go:noescape
func VclezS32N(r *arm.Uint32, v0 *arm.Int32, n int32)

// Compare signed Less than or Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is less than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VclezS64N VclezS64N
//go:noescape
func VclezS64N(r *arm.Uint64, v0 *arm.Int64, n int32)

// Floating-point Compare Less than or Equal to zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is less than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VclezF32N VclezF32N
//go:noescape
func VclezF32N(r *arm.Uint32, v0 *arm.Float32, n int32)

// Floating-point Compare Less than or Equal to zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is less than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VclezF64N VclezF64N
//go:noescape
func VclezF64N(r *arm.Uint64, v0 *arm.Float64, n int32)

// Compare signed Less than or Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is less than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VclezdS64N VclezdS64N
//go:noescape
func VclezdS64N(r *arm.Uint64, v0 *arm.Int64, n int32)

// Floating-point Compare Less than or Equal to zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is less than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VclezdF64N VclezdF64N
//go:noescape
func VclezdF64N(r *arm.Uint64, v0 *arm.Float64, n int32)

// Compare signed Less than or Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is less than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VclezqS8N VclezqS8N
//go:noescape
func VclezqS8N(r *arm.Uint8, v0 *arm.Int8, n int32)

// Compare signed Less than or Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is less than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VclezqS16N VclezqS16N
//go:noescape
func VclezqS16N(r *arm.Uint16, v0 *arm.Int16, n int32)

// Compare signed Less than or Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is less than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VclezqS32N VclezqS32N
//go:noescape
func VclezqS32N(r *arm.Uint32, v0 *arm.Int32, n int32)

// Compare signed Less than or Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is less than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VclezqS64N VclezqS64N
//go:noescape
func VclezqS64N(r *arm.Uint64, v0 *arm.Int64, n int32)

// Floating-point Compare Less than or Equal to zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is less than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VclezqF32N VclezqF32N
//go:noescape
func VclezqF32N(r *arm.Uint32, v0 *arm.Float32, n int32)

// Floating-point Compare Less than or Equal to zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is less than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VclezqF64N VclezqF64N
//go:noescape
func VclezqF64N(r *arm.Uint64, v0 *arm.Float64, n int32)

// Floating-point Compare Less than or Equal to zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is less than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VclezsF32N VclezsF32N
//go:noescape
func VclezsF32N(r *arm.Uint32, v0 *arm.Float32, n int32)

// Count Leading Sign bits (vector). This instruction counts the number of consecutive bits following the most significant bit that are the same as the most significant bit in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The count does not include the most significant bit itself.
//
//go:linkname VclsS8N VclsS8N
//go:noescape
func VclsS8N(r *arm.Int8, v0 *arm.Int8, n int32)

// Count Leading Sign bits (vector). This instruction counts the number of consecutive bits following the most significant bit that are the same as the most significant bit in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The count does not include the most significant bit itself.
//
//go:linkname VclsS16N VclsS16N
//go:noescape
func VclsS16N(r *arm.Int16, v0 *arm.Int16, n int32)

// Count Leading Sign bits (vector). This instruction counts the number of consecutive bits following the most significant bit that are the same as the most significant bit in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The count does not include the most significant bit itself.
//
//go:linkname VclsS32N VclsS32N
//go:noescape
func VclsS32N(r *arm.Int32, v0 *arm.Int32, n int32)

// Count Leading Sign bits (vector). This instruction counts the number of consecutive bits following the most significant bit that are the same as the most significant bit in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The count does not include the most significant bit itself.
//
//go:linkname VclsU8N VclsU8N
//go:noescape
func VclsU8N(r *arm.Int8, v0 *arm.Uint8, n int32)

// Count Leading Sign bits (vector). This instruction counts the number of consecutive bits following the most significant bit that are the same as the most significant bit in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The count does not include the most significant bit itself.
//
//go:linkname VclsU16N VclsU16N
//go:noescape
func VclsU16N(r *arm.Int16, v0 *arm.Uint16, n int32)

// Count Leading Sign bits (vector). This instruction counts the number of consecutive bits following the most significant bit that are the same as the most significant bit in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The count does not include the most significant bit itself.
//
//go:linkname VclsU32N VclsU32N
//go:noescape
func VclsU32N(r *arm.Int32, v0 *arm.Uint32, n int32)

// Count Leading Sign bits (vector). This instruction counts the number of consecutive bits following the most significant bit that are the same as the most significant bit in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The count does not include the most significant bit itself.
//
//go:linkname VclsqS8N VclsqS8N
//go:noescape
func VclsqS8N(r *arm.Int8, v0 *arm.Int8, n int32)

// Count Leading Sign bits (vector). This instruction counts the number of consecutive bits following the most significant bit that are the same as the most significant bit in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The count does not include the most significant bit itself.
//
//go:linkname VclsqS16N VclsqS16N
//go:noescape
func VclsqS16N(r *arm.Int16, v0 *arm.Int16, n int32)

// Count Leading Sign bits (vector). This instruction counts the number of consecutive bits following the most significant bit that are the same as the most significant bit in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The count does not include the most significant bit itself.
//
//go:linkname VclsqS32N VclsqS32N
//go:noescape
func VclsqS32N(r *arm.Int32, v0 *arm.Int32, n int32)

// Count Leading Sign bits (vector). This instruction counts the number of consecutive bits following the most significant bit that are the same as the most significant bit in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The count does not include the most significant bit itself.
//
//go:linkname VclsqU8N VclsqU8N
//go:noescape
func VclsqU8N(r *arm.Int8, v0 *arm.Uint8, n int32)

// Count Leading Sign bits (vector). This instruction counts the number of consecutive bits following the most significant bit that are the same as the most significant bit in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The count does not include the most significant bit itself.
//
//go:linkname VclsqU16N VclsqU16N
//go:noescape
func VclsqU16N(r *arm.Int16, v0 *arm.Uint16, n int32)

// Count Leading Sign bits (vector). This instruction counts the number of consecutive bits following the most significant bit that are the same as the most significant bit in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The count does not include the most significant bit itself.
//
//go:linkname VclsqU32N VclsqU32N
//go:noescape
func VclsqU32N(r *arm.Int32, v0 *arm.Uint32, n int32)
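
/*
The Count Leading Sign bits description above can be checked against a scalar model. This is a minimal sketch in plain Go that does not call this package; clsInt8 is a hypothetical helper name and it models a single 8-bit lane only.

```go
package main

import "fmt"

// clsInt8 is a scalar reference model of CLS for one 8-bit lane: it counts
// the consecutive bits below the sign bit that are equal to the sign bit.
// The sign bit itself is not counted.
func clsInt8(x int8) int8 {
	u := uint8(x)
	sign := u >> 7
	var n int8
	for bit := 6; bit >= 0; bit-- {
		if (u>>uint(bit))&1 != sign {
			break
		}
		n++
	}
	return n
}

func main() {
	for _, x := range []int8{0, -1, 1, -2, 64, -128} {
		fmt.Printf("cls(%4d) = %d\n", x, clsInt8(x))
	}
}
```

For the unsigned-input variants such as VclsU32N the count is still taken on the raw bit pattern, which is presumably why their result pointer is a signed type.
*/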

// Compare signed less than
//
//go:linkname VcltS8N VcltS8N
//go:noescape
func VcltS8N(r *arm.Uint8, v0 *arm.Int8, v1 *arm.Int8, n int32)

// Compare signed less than
//
//go:linkname VcltS16N VcltS16N
//go:noescape
func VcltS16N(r *arm.Uint16, v0 *arm.Int16, v1 *arm.Int16, n int32)

// Compare signed less than
//
//go:linkname VcltS32N VcltS32N
//go:noescape
func VcltS32N(r *arm.Uint32, v0 *arm.Int32, v1 *arm.Int32, n int32)

// Compare signed less than
//
//go:linkname VcltS64N VcltS64N
//go:noescape
func VcltS64N(r *arm.Uint64, v0 *arm.Int64, v1 *arm.Int64, n int32)

// Compare unsigned less than
//
//go:linkname VcltU8N VcltU8N
//go:noescape
func VcltU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)

// Compare unsigned less than
//
//go:linkname VcltU16N VcltU16N
//go:noescape
func VcltU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)

// Compare unsigned less than
//
//go:linkname VcltU32N VcltU32N
//go:noescape
func VcltU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)

// Compare unsigned less than
//
//go:linkname VcltU64N VcltU64N
//go:noescape
func VcltU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32)

// Floating-point compare less than
//
//go:linkname VcltF32N VcltF32N
//go:noescape
func VcltF32N(r *arm.Uint32, v0 *arm.Float32, v1 *arm.Float32, n int32)

// Floating-point compare less than
//
//go:linkname VcltF64N VcltF64N
//go:noescape
func VcltF64N(r *arm.Uint64, v0 *arm.Float64, v1 *arm.Float64, n int32)

// Compare signed less than
//
//go:linkname VcltdS64N VcltdS64N
//go:noescape
func VcltdS64N(r *arm.Uint64, v0 *arm.Int64, v1 *arm.Int64, n int32)

// Compare unsigned less than
//
//go:linkname VcltdU64N VcltdU64N
//go:noescape
func VcltdU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32)

// Floating-point compare less than
//
//go:linkname VcltdF64N VcltdF64N
//go:noescape
func VcltdF64N(r *arm.Uint64, v0 *arm.Float64, v1 *arm.Float64, n int32)

// Compare signed less than
//
//go:linkname VcltqS8N VcltqS8N
//go:noescape
func VcltqS8N(r *arm.Uint8, v0 *arm.Int8, v1 *arm.Int8, n int32)

// Compare signed less than
//
//go:linkname VcltqS16N VcltqS16N
//go:noescape
func VcltqS16N(r *arm.Uint16, v0 *arm.Int16, v1 *arm.Int16, n int32)

// Compare signed less than
//
//go:linkname VcltqS32N VcltqS32N
//go:noescape
func VcltqS32N(r *arm.Uint32, v0 *arm.Int32, v1 *arm.Int32, n int32)

// Compare signed less than
//
//go:linkname VcltqS64N VcltqS64N
//go:noescape
func VcltqS64N(r *arm.Uint64, v0 *arm.Int64, v1 *arm.Int64, n int32)

// Compare unsigned less than
//
//go:linkname VcltqU8N VcltqU8N
//go:noescape
func VcltqU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)

// Compare unsigned less than
//
//go:linkname VcltqU16N VcltqU16N
//go:noescape
func VcltqU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)

// Compare unsigned less than
//
//go:linkname VcltqU32N VcltqU32N
//go:noescape
func VcltqU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)

// Compare unsigned less than
//
//go:linkname VcltqU64N VcltqU64N
//go:noescape
func VcltqU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32)

// Floating-point compare less than
//
//go:linkname VcltqF32N VcltqF32N
//go:noescape
func VcltqF32N(r *arm.Uint32, v0 *arm.Float32, v1 *arm.Float32, n int32)

// Floating-point compare less than
//
//go:linkname VcltqF64N VcltqF64N
//go:noescape
func VcltqF64N(r *arm.Uint64, v0 *arm.Float64, v1 *arm.Float64, n int32)

// Floating-point compare less than
//
//go:linkname VcltsF32N VcltsF32N
//go:noescape
func VcltsF32N(r *arm.Uint32, v0 *arm.Float32, v1 *arm.Float32, n int32)
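
/*
The less-than compares above produce a mask rather than a boolean: each result element has every bit set when the comparison holds and is zero otherwise. The sketch below models that in plain Go. cltInt8 and cltInt8N are hypothetical names, and treating n as an element count is an assumption about the N-suffixed wrappers, not something this file states.

```go
package main

import "fmt"

// cltInt8 models one lane of a signed less-than compare:
// all ones if a < b, otherwise zero.
func cltInt8(a, b int8) uint8 {
	if a < b {
		return 0xFF
	}
	return 0
}

// cltInt8N applies cltInt8 to the first n elements of a and b.
// Assumption: n counts elements, mirroring the shape of VcltS8N.
func cltInt8N(r []uint8, a, b []int8, n int) {
	for i := 0; i < n; i++ {
		r[i] = cltInt8(a[i], b[i])
	}
}

func main() {
	a := []int8{-3, 0, 5, 7}
	b := []int8{2, 0, -5, 8}
	r := make([]uint8, len(a))
	cltInt8N(r, a, b, len(a))
	fmt.Println(r)
}
```

The mask form is convenient because it can be combined with bitwise operations to select between two values per lane without branching.
*/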

// Compare signed Less than zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is less than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcltzS8N VcltzS8N
//go:noescape
func VcltzS8N(r *arm.Uint8, v0 *arm.Int8, n int32)

// Compare signed Less than zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is less than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcltzS16N VcltzS16N
//go:noescape
func VcltzS16N(r *arm.Uint16, v0 *arm.Int16, n int32)

// Compare signed Less than zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is less than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcltzS32N VcltzS32N
//go:noescape
func VcltzS32N(r *arm.Uint32, v0 *arm.Int32, n int32)

// Compare signed Less than zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is less than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcltzS64N VcltzS64N
//go:noescape
func VcltzS64N(r *arm.Uint64, v0 *arm.Int64, n int32)

// Floating-point Compare Less than zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is less than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcltzF32N VcltzF32N
//go:noescape
func VcltzF32N(r *arm.Uint32, v0 *arm.Float32, n int32)

// Floating-point Compare Less than zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is less than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcltzF64N VcltzF64N
//go:noescape
func VcltzF64N(r *arm.Uint64, v0 *arm.Float64, n int32)

// Compare signed Less than zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is less than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcltzdS64N VcltzdS64N
//go:noescape
func VcltzdS64N(r *arm.Uint64, v0 *arm.Int64, n int32)

// Floating-point Compare Less than zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is less than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcltzdF64N VcltzdF64N
//go:noescape
func VcltzdF64N(r *arm.Uint64, v0 *arm.Float64, n int32)

// Compare signed Less than zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is less than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcltzqS8N VcltzqS8N
//go:noescape
func VcltzqS8N(r *arm.Uint8, v0 *arm.Int8, n int32)

// Compare signed Less than zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is less than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcltzqS16N VcltzqS16N
//go:noescape
func VcltzqS16N(r *arm.Uint16, v0 *arm.Int16, n int32)

// Compare signed Less than zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is less than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcltzqS32N VcltzqS32N
//go:noescape
func VcltzqS32N(r *arm.Uint32, v0 *arm.Int32, n int32)

// Compare signed Less than zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is less than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcltzqS64N VcltzqS64N
//go:noescape
func VcltzqS64N(r *arm.Uint64, v0 *arm.Int64, n int32)

// Floating-point Compare Less than zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is less than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcltzqF32N VcltzqF32N
//go:noescape
func VcltzqF32N(r *arm.Uint32, v0 *arm.Float32, n int32)

// Floating-point Compare Less than zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is less than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcltzqF64N VcltzqF64N
//go:noescape
func VcltzqF64N(r *arm.Uint64, v0 *arm.Float64, n int32)

// Floating-point Compare Less than zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is less than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcltzsF32N VcltzsF32N
//go:noescape
func VcltzsF32N(r *arm.Uint32, v0 *arm.Float32, n int32)

// Count Leading Zero bits (vector). This instruction counts the number of consecutive zeros, starting from the most significant bit, in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VclzS8N VclzS8N
//go:noescape
func VclzS8N(r *arm.Int8, v0 *arm.Int8, n int32)

// Count Leading Zero bits (vector). This instruction counts the number of consecutive zeros, starting from the most significant bit, in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VclzS16N VclzS16N
//go:noescape
func VclzS16N(r *arm.Int16, v0 *arm.Int16, n int32)

// Count Leading Zero bits (vector). This instruction counts the number of consecutive zeros, starting from the most significant bit, in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VclzS32N VclzS32N
//go:noescape
func VclzS32N(r *arm.Int32, v0 *arm.Int32, n int32)

// Count Leading Zero bits (vector). This instruction counts the number of consecutive zeros, starting from the most significant bit, in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VclzU8N VclzU8N
//go:noescape
func VclzU8N(r *arm.Uint8, v0 *arm.Uint8, n int32)

// Count Leading Zero bits (vector). This instruction counts the number of consecutive zeros, starting from the most significant bit, in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VclzU16N VclzU16N
//go:noescape
func VclzU16N(r *arm.Uint16, v0 *arm.Uint16, n int32)

// Count Leading Zero bits (vector). This instruction counts the number of consecutive zeros, starting from the most significant bit, in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VclzU32N VclzU32N
//go:noescape
func VclzU32N(r *arm.Uint32, v0 *arm.Uint32, n int32)

// Count Leading Zero bits (vector). This instruction counts the number of consecutive zeros, starting from the most significant bit, in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VclzqS8N VclzqS8N
//go:noescape
func VclzqS8N(r *arm.Int8, v0 *arm.Int8, n int32)

// Count Leading Zero bits (vector). This instruction counts the number of consecutive zeros, starting from the most significant bit, in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VclzqS16N VclzqS16N
//go:noescape
func VclzqS16N(r *arm.Int16, v0 *arm.Int16, n int32)

// Count Leading Zero bits (vector). This instruction counts the number of consecutive zeros, starting from the most significant bit, in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VclzqS32N VclzqS32N
//go:noescape
func VclzqS32N(r *arm.Int32, v0 *arm.Int32, n int32)

// Count Leading Zero bits (vector). This instruction counts the number of consecutive zeros, starting from the most significant bit, in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VclzqU8N VclzqU8N
//go:noescape
func VclzqU8N(r *arm.Uint8, v0 *arm.Uint8, n int32)

// Count Leading Zero bits (vector). This instruction counts the number of consecutive zeros, starting from the most significant bit, in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VclzqU16N VclzqU16N
//go:noescape
func VclzqU16N(r *arm.Uint16, v0 *arm.Uint16, n int32)

// Count Leading Zero bits (vector). This instruction counts the number of consecutive zeros, starting from the most significant bit, in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VclzqU32N VclzqU32N
//go:noescape
func VclzqU32N(r *arm.Uint32, v0 *arm.Uint32, n int32)
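
/*
A scalar model of Count Leading Zero bits, as described above, can be written with the standard library. This is a sketch for one 8-bit lane; clzUint8 and clzInt8 are hypothetical names and do not call this package.

```go
package main

import (
	"fmt"
	"math/bits"
)

// clzUint8 counts consecutive zero bits starting from the most significant
// bit of one 8-bit lane. An all-zero lane yields the lane width, 8.
func clzUint8(x uint8) uint8 {
	return uint8(bits.LeadingZeros8(x))
}

// clzInt8 does the same on the two's-complement bit pattern of a signed lane,
// so any negative value has zero leading zeros.
func clzInt8(x int8) int8 {
	return int8(bits.LeadingZeros8(uint8(x)))
}

func main() {
	for _, x := range []uint8{0, 1, 0x10, 0x80} {
		fmt.Printf("clz(%#x) = %d\n", x, clzUint8(x))
	}
	fmt.Println("clz(int8(-1)) =", clzInt8(-1))
}
```
*/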

// Population Count per byte. This instruction counts the number of bits that have a value of one in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VcntS8N VcntS8N
//go:noescape
func VcntS8N(r *arm.Int8, v0 *arm.Int8, n int32)

// Population Count per byte. This instruction counts the number of bits that have a value of one in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VcntU8N VcntU8N
//go:noescape
func VcntU8N(r *arm.Uint8, v0 *arm.Uint8, n int32)

// Population Count per byte. This instruction counts the number of bits that have a value of one in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VcntqS8N VcntqS8N
//go:noescape
func VcntqS8N(r *arm.Int8, v0 *arm.Int8, n int32)

// Population Count per byte. This instruction counts the number of bits that have a value of one in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VcntqU8N VcntqU8N
//go:noescape
func VcntqU8N(r *arm.Uint8, v0 *arm.Uint8, n int32)
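
/*
Population count works per byte, so a common use is to count the set bits of a whole buffer by summing the per-byte results. The sketch below shows that idea in plain Go; cntUint8 and popcountBytes are hypothetical helpers and do not call this package.

```go
package main

import (
	"fmt"
	"math/bits"
)

// cntUint8 models one lane of CNT: the number of one bits in a byte.
func cntUint8(x uint8) uint8 {
	return uint8(bits.OnesCount8(x))
}

// popcountBytes sums the per-byte counts over a buffer.
func popcountBytes(buf []byte) int {
	total := 0
	for _, b := range buf {
		total += int(cntUint8(b))
	}
	return total
}

func main() {
	buf := []byte{0xFF, 0x01, 0xA5}
	fmt.Println("set bits:", popcountBytes(buf))
}
```
*/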

// Join two smaller vectors into a single larger vector
//
//go:linkname VcombineS8N VcombineS8N
//go:noescape
func VcombineS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)

// Join two smaller vectors into a single larger vector
//
//go:linkname VcombineS16N VcombineS16N
//go:noescape
func VcombineS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)

// Join two smaller vectors into a single larger vector
//
//go:linkname VcombineS32N VcombineS32N
//go:noescape
func VcombineS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)

// Join two smaller vectors into a single larger vector
//
//go:linkname VcombineS64N VcombineS64N
//go:noescape
func VcombineS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32)

// Join two smaller vectors into a single larger vector
//
//go:linkname VcombineU8N VcombineU8N
//go:noescape
func VcombineU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)

// Join two smaller vectors into a single larger vector
//
//go:linkname VcombineU16N VcombineU16N
//go:noescape
func VcombineU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)

// Join two smaller vectors into a single larger vector
//
//go:linkname VcombineU32N VcombineU32N
//go:noescape
func VcombineU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)

// Join two smaller vectors into a single larger vector
//
//go:linkname VcombineU64N VcombineU64N
//go:noescape
func VcombineU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32)

// Join two smaller vectors into a single larger vector
//
//go:linkname VcombineF32N VcombineF32N
//go:noescape
func VcombineF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)

// Join two smaller vectors into a single larger vector
//
//go:linkname VcombineF64N VcombineF64N
//go:noescape
func VcombineF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32)
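
/*
The underlying vcombine intrinsic joins two 64-bit vectors into one 128-bit vector, with the first operand in the low half and the second in the high half. The sketch below models that lane layout with Go arrays. combineInt8 is a hypothetical helper; how the N-suffixed wrappers walk their input and output buffers is not shown in this file and is not modelled here.

```go
package main

import "fmt"

// combineInt8 models the lane layout of combining two 8-lane vectors:
// result lanes 0..7 come from low, lanes 8..15 come from high.
func combineInt8(low, high [8]int8) [16]int8 {
	var r [16]int8
	copy(r[:8], low[:])
	copy(r[8:], high[:])
	return r
}

func main() {
	low := [8]int8{0, 1, 2, 3, 4, 5, 6, 7}
	high := [8]int8{-1, -2, -3, -4, -5, -6, -7, -8}
	fmt.Println(combineInt8(low, high))
}
```
*/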

// Signed integer Convert to Floating-point (vector). This instruction converts each element in a vector from a signed integer value to a floating-point value using the rounding mode that is specified by the FPCR, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtF32S32N VcvtF32S32N
//go:noescape
func VcvtF32S32N(r *arm.Float32, v0 *arm.Int32, n int32)

// Unsigned integer Convert to Floating-point (vector). This instruction converts each element in a vector from an unsigned integer value to a floating-point value using the rounding mode that is specified by the FPCR, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtF32U32N VcvtF32U32N
//go:noescape
func VcvtF32U32N(r *arm.Float32, v0 *arm.Uint32, n int32)

// Signed integer Convert to Floating-point (vector). This instruction converts each element in a vector from a signed integer value to a floating-point value using the rounding mode that is specified by the FPCR, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtF64S64N VcvtF64S64N
//go:noescape
func VcvtF64S64N(r *arm.Float64, v0 *arm.Int64, n int32)

// Unsigned integer Convert to Floating-point (vector). This instruction converts each element in a vector from an unsigned integer value to a floating-point value using the rounding mode that is specified by the FPCR, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtF64U64N VcvtF64U64N
//go:noescape
func VcvtF64U64N(r *arm.Float64, v0 *arm.Uint64, n int32)

// Floating-point Convert to Signed integer, rounding toward Zero (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtS32F32N VcvtS32F32N
//go:noescape
func VcvtS32F32N(r *arm.Int32, v0 *arm.Float32, n int32)

// Floating-point Convert to Signed integer, rounding toward Zero (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtS64F64N VcvtS64F64N
//go:noescape
func VcvtS64F64N(r *arm.Int64, v0 *arm.Float64, n int32)

// Floating-point Convert to Unsigned integer, rounding toward Zero (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtU32F32N VcvtU32F32N
//go:noescape
func VcvtU32F32N(r *arm.Uint32, v0 *arm.Float32, n int32)

// Floating-point Convert to Unsigned integer, rounding toward Zero (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtU64F64N VcvtU64F64N
//go:noescape
func VcvtU64F64N(r *arm.Uint64, v0 *arm.Float64, n int32)

// Floating-point Convert to Signed integer, rounding to nearest with ties to Away (vector). This instruction converts each element in a vector from a floating-point value to a signed integer value using the Round to Nearest with Ties to Away rounding mode and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtaS32F32N VcvtaS32F32N
//go:noescape
func VcvtaS32F32N(r *arm.Int32, v0 *arm.Float32, n int32)

// Floating-point Convert to Signed integer, rounding to nearest with ties to Away (vector). This instruction converts each element in a vector from a floating-point value to a signed integer value using the Round to Nearest with Ties to Away rounding mode and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtaS64F64N VcvtaS64F64N
//go:noescape
func VcvtaS64F64N(r *arm.Int64, v0 *arm.Float64, n int32)

// Floating-point Convert to Unsigned integer, rounding to nearest with ties to Away (vector). This instruction converts each element in a vector from a floating-point value to an unsigned integer value using the Round to Nearest with Ties to Away rounding mode and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtaU32F32N VcvtaU32F32N
//go:noescape
func VcvtaU32F32N(r *arm.Uint32, v0 *arm.Float32, n int32)

// Floating-point Convert to Unsigned integer, rounding to nearest with ties to Away (vector). This instruction converts each element in a vector from a floating-point value to an unsigned integer value using the Round to Nearest with Ties to Away rounding mode and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtaU64F64N VcvtaU64F64N
//go:noescape
func VcvtaU64F64N(r *arm.Uint64, v0 *arm.Float64, n int32)

// Floating-point Convert to Signed integer, rounding to nearest with ties to Away (vector). This instruction converts each element in a vector from a floating-point value to a signed integer value using the Round to Nearest with Ties to Away rounding mode and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtadS64F64N VcvtadS64F64N
//go:noescape
func VcvtadS64F64N(r *arm.Int64, v0 *arm.Float64, n int32)

// Floating-point Convert to Unsigned integer, rounding to nearest with ties to Away (vector). This instruction converts each element in a vector from a floating-point value to an unsigned integer value using the Round to Nearest with Ties to Away rounding mode and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtadU64F64N VcvtadU64F64N
//go:noescape
func VcvtadU64F64N(r *arm.Uint64, v0 *arm.Float64, n int32)

// Floating-point Convert to Signed integer, rounding to nearest with ties to Away (vector). This instruction converts each element in a vector from a floating-point value to a signed integer value using the Round to Nearest with Ties to Away rounding mode and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtaqS32F32N VcvtaqS32F32N
//go:noescape
func VcvtaqS32F32N(r *arm.Int32, v0 *arm.Float32, n int32)

// Floating-point Convert to Signed integer, rounding to nearest with ties to Away (vector). This instruction converts each element in a vector from a floating-point value to a signed integer value using the Round to Nearest with Ties to Away rounding mode and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtaqS64F64N VcvtaqS64F64N
//go:noescape
func VcvtaqS64F64N(r *arm.Int64, v0 *arm.Float64, n int32)

// Floating-point Convert to Unsigned integer, rounding to nearest with ties to Away (vector). This instruction converts each element in a vector from a floating-point value to an unsigned integer value using the Round to Nearest with Ties to Away rounding mode and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtaqU32F32N VcvtaqU32F32N
//go:noescape
func VcvtaqU32F32N(r *arm.Uint32, v0 *arm.Float32, n int32)

// Floating-point Convert to Unsigned integer, rounding to nearest with ties to Away (vector). This instruction converts each element in a vector from a floating-point value to an unsigned integer value using the Round to Nearest with Ties to Away rounding mode and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtaqU64F64N VcvtaqU64F64N
//go:noescape
func VcvtaqU64F64N(r *arm.Uint64, v0 *arm.Float64, n int32)

// Floating-point Convert to Signed integer, rounding to nearest with ties to Away (vector). This instruction converts each element in a vector from a floating-point value to a signed integer value using the Round to Nearest with Ties to Away rounding mode and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtasS32F32N VcvtasS32F32N
//go:noescape
func VcvtasS32F32N(r *arm.Int32, v0 *arm.Float32, n int32)

// Floating-point Convert to Unsigned integer, rounding to nearest with ties to Away (vector). This instruction converts each element in a vector from a floating-point value to an unsigned integer value using the Round to Nearest with Ties to Away rounding mode and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtasU32F32N VcvtasU32F32N
//go:noescape
func VcvtasU32F32N(r *arm.Uint32, v0 *arm.Float32, n int32)
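
/*
Round to Nearest with Ties to Away, used by the conversions above, differs from the more common ties-to-even mode only for inputs exactly halfway between two integers. The sketch below contrasts the two in plain Go for in-range, non-NaN inputs; cvtaInt32 and tiesToEvenInt32 are hypothetical helpers, and saturation and NaN handling are left out.

```go
package main

import (
	"fmt"
	"math"
)

// cvtaInt32 models float32 to int32 conversion with ties rounded away from
// zero. Assumes the input is in range and not NaN.
func cvtaInt32(x float32) int32 {
	return int32(math.Round(float64(x)))
}

// tiesToEvenInt32 is shown only for contrast with cvtaInt32.
func tiesToEvenInt32(x float32) int32 {
	return int32(math.RoundToEven(float64(x)))
}

func main() {
	for _, x := range []float32{0.5, 1.4, 2.5, -0.5, -2.5} {
		fmt.Printf("x=%v ties-away=%d ties-even=%d\n", x, cvtaInt32(x), tiesToEvenInt32(x))
	}
}
```
*/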

// Signed integer Convert to Floating-point (vector). This instruction converts each element in a vector from a signed integer value to a floating-point value using the rounding mode that is specified by the FPCR, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtdF64S64N VcvtdF64S64N
//go:noescape
func VcvtdF64S64N(r *arm.Float64, v0 *arm.Int64, n int32)

// Unsigned integer Convert to Floating-point (vector). This instruction converts each element in a vector from an unsigned integer value to a floating-point value using the rounding mode that is specified by the FPCR, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtdF64U64N VcvtdF64U64N
//go:noescape
func VcvtdF64U64N(r *arm.Float64, v0 *arm.Uint64, n int32)

// Floating-point Convert to Signed integer, rounding toward Zero (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtdS64F64N VcvtdS64F64N
//go:noescape
func VcvtdS64F64N(r *arm.Int64, v0 *arm.Float64, n int32)

// Floating-point Convert to Unsigned integer, rounding toward Zero (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtdU64F64N VcvtdU64F64N
//go:noescape
func VcvtdU64F64N(r *arm.Uint64, v0 *arm.Float64, n int32)

// Floating-point Convert to Signed integer, rounding toward Minus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round towards Minus Infinity rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtmS32F32N VcvtmS32F32N
//go:noescape
func VcvtmS32F32N(r *arm.Int32, v0 *arm.Float32, n int32)

// Floating-point Convert to Signed integer, rounding toward Minus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round towards Minus Infinity rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtmS64F64N VcvtmS64F64N
//go:noescape
func VcvtmS64F64N(r *arm.Int64, v0 *arm.Float64, n int32)

// Floating-point Convert to Unsigned integer, rounding toward Minus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round towards Minus Infinity rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtmU32F32N VcvtmU32F32N
//go:noescape
func VcvtmU32F32N(r *arm.Uint32, v0 *arm.Float32, n int32)

// Floating-point Convert to Unsigned integer, rounding toward Minus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round towards Minus Infinity rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtmU64F64N VcvtmU64F64N
//go:noescape
func VcvtmU64F64N(r *arm.Uint64, v0 *arm.Float64, n int32)

// Floating-point Convert to Signed integer, rounding toward Minus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round towards Minus Infinity rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtmdS64F64N VcvtmdS64F64N
//go:noescape
func VcvtmdS64F64N(r *arm.Int64, v0 *arm.Float64, n int32)

// Floating-point Convert to Unsigned integer, rounding toward Minus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round towards Minus Infinity rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtmdU64F64N VcvtmdU64F64N
//go:noescape
func VcvtmdU64F64N(r *arm.Uint64, v0 *arm.Float64, n int32)

// Floating-point Convert to Signed integer, rounding toward Minus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round towards Minus Infinity rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtmqS32F32N VcvtmqS32F32N
//go:noescape
func VcvtmqS32F32N(r *arm.Int32, v0 *arm.Float32, n int32)

// Floating-point Convert to Signed integer, rounding toward Minus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round towards Minus Infinity rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtmqS64F64N VcvtmqS64F64N
//go:noescape
func VcvtmqS64F64N(r *arm.Int64, v0 *arm.Float64, n int32)

// Floating-point Convert to Unsigned integer, rounding toward Minus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round towards Minus Infinity rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtmqU32F32N VcvtmqU32F32N
//go:noescape
func VcvtmqU32F32N(r *arm.Uint32, v0 *arm.Float32, n int32)

// Floating-point Convert to Unsigned integer, rounding toward Minus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round towards Minus Infinity rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtmqU64F64N VcvtmqU64F64N
//go:noescape
func VcvtmqU64F64N(r *arm.Uint64, v0 *arm.Float64, n int32)

// Floating-point Convert to Signed integer, rounding toward Minus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round towards Minus Infinity rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtmsS32F32N VcvtmsS32F32N
//go:noescape
func VcvtmsS32F32N(r *arm.Int32, v0 *arm.Float32, n int32)

// Floating-point Convert to Unsigned integer, rounding toward Minus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round towards Minus Infinity rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtmsU32F32N VcvtmsU32F32N
//go:noescape
func VcvtmsU32F32N(r *arm.Uint32, v0 *arm.Float32, n int32)

// Floating-point Convert to Signed integer, rounding to nearest with ties to even (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round to Nearest rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtnS32F32N VcvtnS32F32N
//go:noescape
func VcvtnS32F32N(r *arm.Int32, v0 *arm.Float32, n int32)
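
/*
"Ties to even" only matters for inputs that sit exactly halfway between two integers. As a rough illustration, the scalar reference below uses math.RoundToEven from the Go standard library. nearestEvenToInt32 is a hypothetical helper and not this package's implementation, and it assumes finite, in-range inputs.

```go
package main

import (
	"fmt"
	"math"
)

// nearestEvenToInt32 is a hypothetical scalar reference, not part of this package.
// Halfway cases go to the even neighbor; everything else goes to the nearest integer.
// dst must be at least as long as src.
func nearestEvenToInt32(dst []int32, src []float32) {
	for i, x := range src {
		dst[i] = int32(math.RoundToEven(float64(x)))
	}
}

func main() {
	src := []float32{0.5, 1.5, 2.5, -0.5, -1.5}
	dst := make([]int32, len(src))
	nearestEvenToInt32(dst, src)
	fmt.Println(dst) // [0 2 2 0 -2]
}
```

Note that 0.5 and 2.5 both round down while 1.5 rounds up, which differs from the "round half away from zero" rule of math.Round.
*/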

// Floating-point Convert to Signed integer, rounding to nearest with ties to even (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round to Nearest rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtnS64F64N VcvtnS64F64N
//go:noescape
func VcvtnS64F64N(r *arm.Int64, v0 *arm.Float64, n int32)

// Floating-point Convert to Unsigned integer, rounding to nearest with ties to even (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round to Nearest rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtnU32F32N VcvtnU32F32N
//go:noescape
func VcvtnU32F32N(r *arm.Uint32, v0 *arm.Float32, n int32)

// Floating-point Convert to Unsigned integer, rounding to nearest with ties to even (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round to Nearest rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtnU64F64N VcvtnU64F64N
//go:noescape
func VcvtnU64F64N(r *arm.Uint64, v0 *arm.Float64, n int32)

// Floating-point Convert to Signed integer, rounding to nearest with ties to even (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round to Nearest rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtndS64F64N VcvtndS64F64N
//go:noescape
func VcvtndS64F64N(r *arm.Int64, v0 *arm.Float64, n int32)

// Floating-point Convert to Unsigned integer, rounding to nearest with ties to even (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round to Nearest rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtndU64F64N VcvtndU64F64N
//go:noescape
func VcvtndU64F64N(r *arm.Uint64, v0 *arm.Float64, n int32)

// Floating-point Convert to Signed integer, rounding to nearest with ties to even (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round to Nearest rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtnqS32F32N VcvtnqS32F32N
//go:noescape
func VcvtnqS32F32N(r *arm.Int32, v0 *arm.Float32, n int32)

// Floating-point Convert to Signed integer, rounding to nearest with ties to even (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round to Nearest rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtnqS64F64N VcvtnqS64F64N
//go:noescape
func VcvtnqS64F64N(r *arm.Int64, v0 *arm.Float64, n int32)

// Floating-point Convert to Unsigned integer, rounding to nearest with ties to even (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round to Nearest rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtnqU32F32N VcvtnqU32F32N
//go:noescape
func VcvtnqU32F32N(r *arm.Uint32, v0 *arm.Float32, n int32)

// Floating-point Convert to Unsigned integer, rounding to nearest with ties to even (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round to Nearest rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtnqU64F64N VcvtnqU64F64N
//go:noescape
func VcvtnqU64F64N(r *arm.Uint64, v0 *arm.Float64, n int32)

// Floating-point Convert to Signed integer, rounding to nearest with ties to even (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round to Nearest rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtnsS32F32N VcvtnsS32F32N
//go:noescape
func VcvtnsS32F32N(r *arm.Int32, v0 *arm.Float32, n int32)

// Floating-point Convert to Unsigned integer, rounding to nearest with ties to even (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round to Nearest rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtnsU32F32N VcvtnsU32F32N
//go:noescape
func VcvtnsU32F32N(r *arm.Uint32, v0 *arm.Float32, n int32)

// Floating-point Convert to Signed integer, rounding toward Plus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round towards Plus Infinity rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtpS32F32N VcvtpS32F32N
//go:noescape
func VcvtpS32F32N(r *arm.Int32, v0 *arm.Float32, n int32)

// Floating-point Convert to Signed integer, rounding toward Plus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round towards Plus Infinity rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtpS64F64N VcvtpS64F64N
//go:noescape
func VcvtpS64F64N(r *arm.Int64, v0 *arm.Float64, n int32)

// Floating-point Convert to Unsigned integer, rounding toward Plus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round towards Plus Infinity rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtpU32F32N VcvtpU32F32N
//go:noescape
func VcvtpU32F32N(r *arm.Uint32, v0 *arm.Float32, n int32)
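
/*
With an unsigned destination, inputs outside the representable range need a defined result. The sketch below assumes the saturating behavior that the Arm architecture describes for float-to-integer conversion (negative values and NaN become 0, values that are too large become the maximum). ceilToUint32 is a hypothetical scalar reference, not this package's implementation.

```go
package main

import (
	"fmt"
	"math"
)

// ceilToUint32 is a hypothetical scalar reference, not part of this package.
// It rounds toward plus infinity and clamps to the uint32 range.
// The clamping rules are an assumption based on the saturating conversion
// described for the underlying instruction.
func ceilToUint32(dst []uint32, src []float32) {
	for i, x := range src {
		c := math.Ceil(float64(x))
		switch {
		case math.IsNaN(c) || c <= 0:
			dst[i] = 0
		case c >= math.MaxUint32:
			dst[i] = math.MaxUint32
		default:
			dst[i] = uint32(c)
		}
	}
}

func main() {
	src := []float32{-2.5, 0.25, 1.0, 5e9}
	dst := make([]uint32, len(src))
	ceilToUint32(dst, src)
	fmt.Println(dst) // [0 1 1 4294967295]
}
```

The explicit clamp matters in Go because converting an out-of-range float to an integer type gives an implementation-dependent value.
*/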

// Floating-point Convert to Unsigned integer, rounding toward Plus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round towards Plus Infinity rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtpU64F64N VcvtpU64F64N
//go:noescape
func VcvtpU64F64N(r *arm.Uint64, v0 *arm.Float64, n int32)

// Floating-point Convert to Signed integer, rounding toward Plus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round towards Plus Infinity rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtpdS64F64N VcvtpdS64F64N
//go:noescape
func VcvtpdS64F64N(r *arm.Int64, v0 *arm.Float64, n int32)

// Floating-point Convert to Unsigned integer, rounding toward Plus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round towards Plus Infinity rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtpdU64F64N VcvtpdU64F64N
//go:noescape
func VcvtpdU64F64N(r *arm.Uint64, v0 *arm.Float64, n int32)

// Floating-point Convert to Signed integer, rounding toward Plus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round towards Plus Infinity rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtpqS32F32N VcvtpqS32F32N
//go:noescape
func VcvtpqS32F32N(r *arm.Int32, v0 *arm.Float32, n int32)

// Floating-point Convert to Signed integer, rounding toward Plus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round towards Plus Infinity rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtpqS64F64N VcvtpqS64F64N
//go:noescape
func VcvtpqS64F64N(r *arm.Int64, v0 *arm.Float64, n int32)

// Floating-point Convert to Unsigned integer, rounding toward Plus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round towards Plus Infinity rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtpqU32F32N VcvtpqU32F32N
//go:noescape
func VcvtpqU32F32N(r *arm.Uint32, v0 *arm.Float32, n int32)

// Floating-point Convert to Unsigned integer, rounding toward Plus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round towards Plus Infinity rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtpqU64F64N VcvtpqU64F64N
//go:noescape
func VcvtpqU64F64N(r *arm.Uint64, v0 *arm.Float64, n int32)

// Floating-point Convert to Signed integer, rounding toward Plus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round towards Plus Infinity rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtpsS32F32N VcvtpsS32F32N
//go:noescape
func VcvtpsS32F32N(r *arm.Int32, v0 *arm.Float32, n int32)

// Floating-point Convert to Unsigned integer, rounding toward Plus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round towards Plus Infinity rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtpsU32F32N VcvtpsU32F32N
//go:noescape
func VcvtpsU32F32N(r *arm.Uint32, v0 *arm.Float32, n int32)

// Signed fixed-point Convert to Floating-point (vector). This instruction converts each element in a vector from fixed-point to floating-point using the rounding mode that is specified by the FPCR, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtqF32S32N VcvtqF32S32N
//go:noescape
func VcvtqF32S32N(r *arm.Float32, v0 *arm.Int32, n int32)
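
/*
Converting int32 to float32 is lossy above 2^24, because float32 carries only 24 bits of precision. The scalar reference below is a sketch in ordinary Go, not this package's implementation. int32ToFloat32 is a hypothetical helper, and it assumes the FPCR is in its default round-to-nearest mode, which is also what a plain Go conversion does in practice.

```go
package main

import "fmt"

// int32ToFloat32 is a hypothetical scalar reference, not part of this package.
// Values that do not fit in 24 bits of precision are rounded by the conversion.
// dst must be at least as long as src.
func int32ToFloat32(dst []float32, src []int32) {
	for i, x := range src {
		dst[i] = float32(x)
	}
}

func main() {
	src := []int32{1, 16777216, 16777217}
	dst := make([]float32, len(src))
	int32ToFloat32(dst, src)
	for i, f := range dst {
		// 16777217 (2^24 + 1) has no exact float32 representation.
		fmt.Printf("%d -> %.0f\n", src[i], f)
	}
}
```

A round trip through float32 therefore does not always return the original integer.
*/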

// Unsigned fixed-point Convert to Floating-point (vector). This instruction converts each element in a vector from fixed-point to floating-point using the rounding mode that is specified by the FPCR, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtqF32U32N VcvtqF32U32N
//go:noescape
func VcvtqF32U32N(r *arm.Float32, v0 *arm.Uint32, n int32)

// Signed fixed-point Convert to Floating-point (vector). This instruction converts each element in a vector from fixed-point to floating-point using the rounding mode that is specified by the FPCR, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtqF64S64N VcvtqF64S64N
//go:noescape
func VcvtqF64S64N(r *arm.Float64, v0 *arm.Int64, n int32)

// Unsigned fixed-point Convert to Floating-point (vector). This instruction converts each element in a vector from fixed-point to floating-point using the rounding mode that is specified by the FPCR, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtqF64U64N VcvtqF64U64N
//go:noescape
func VcvtqF64U64N(r *arm.Float64, v0 *arm.Uint64, n int32)

// Floating-point Convert to Signed fixed-point, rounding toward Zero (vector). This instruction converts a scalar or each element in a vector from floating-point to fixed-point signed integer using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtqS32F32N VcvtqS32F32N
//go:noescape
func VcvtqS32F32N(r *arm.Int32, v0 *arm.Float32, n int32)

// Floating-point Convert to Signed fixed-point, rounding toward Zero (vector). This instruction converts a scalar or each element in a vector from floating-point to fixed-point signed integer using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtqS64F64N VcvtqS64F64N
//go:noescape
func VcvtqS64F64N(r *arm.Int64, v0 *arm.Float64, n int32)

// Floating-point Convert to Unsigned fixed-point, rounding toward Zero (vector). This instruction converts a scalar or each element in a vector from floating-point to fixed-point unsigned integer using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtqU32F32N VcvtqU32F32N
//go:noescape
func VcvtqU32F32N(r *arm.Uint32, v0 *arm.Float32, n int32)

// Floating-point Convert to Unsigned fixed-point, rounding toward Zero (vector). This instruction converts a scalar or each element in a vector from floating-point to fixed-point unsigned integer using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtqU64F64N VcvtqU64F64N
//go:noescape
func VcvtqU64F64N(r *arm.Uint64, v0 *arm.Float64, n int32)

// Signed fixed-point Convert to Floating-point (vector). This instruction converts each element in a vector from fixed-point to floating-point using the rounding mode that is specified by the FPCR, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtsF32S32N VcvtsF32S32N
//go:noescape
func VcvtsF32S32N(r *arm.Float32, v0 *arm.Int32, n int32)

// Unsigned fixed-point Convert to Floating-point (vector). This instruction converts each element in a vector from fixed-point to floating-point using the rounding mode that is specified by the FPCR, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtsF32U32N VcvtsF32U32N
//go:noescape
func VcvtsF32U32N(r *arm.Float32, v0 *arm.Uint32, n int32)

// Floating-point Convert to Signed fixed-point, rounding toward Zero (vector). This instruction converts a scalar or each element in a vector from floating-point to fixed-point signed integer using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtsS32F32N VcvtsS32F32N
//go:noescape
func VcvtsS32F32N(r *arm.Int32, v0 *arm.Float32, n int32)

// Floating-point Convert to Unsigned fixed-point, rounding toward Zero (vector). This instruction converts a scalar or each element in a vector from floating-point to fixed-point unsigned integer using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtsU32F32N VcvtsU32F32N
//go:noescape
func VcvtsU32F32N(r *arm.Uint32, v0 *arm.Float32, n int32)

// Floating-point Divide (vector). This instruction divides the floating-point values in the elements in the first source SIMD&FP register, by the floating-point values in the corresponding elements in the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VdivF32N VdivF32N
//go:noescape
func VdivF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)
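
/*
Element-wise division pairs lane i of the first operand with lane i of the second. The following scalar reference is a sketch in ordinary Go, not this package's implementation. divFloat32 is a hypothetical helper that works on plain slices rather than the arm vector types.

```go
package main

import "fmt"

// divFloat32 is a hypothetical scalar reference, not part of this package.
// It writes a[i] / b[i] into dst[i]; all three slices must have the same length.
func divFloat32(dst, a, b []float32) {
	for i := range dst {
		dst[i] = a[i] / b[i]
	}
}

func main() {
	a := []float32{1, 9, 1}
	b := []float32{2, 3, 0}
	dst := make([]float32, len(a))
	divFloat32(dst, a, b)
	fmt.Println(dst) // [0.5 3 +Inf]
}
```

Floating-point division by zero follows IEEE 754 and produces an infinity rather than a panic, so callers that need to reject zero divisors have to check for them beforehand.
*/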

// Floating-point Divide (vector). This instruction divides the floating-point values in the elements in the first source SIMD&FP register, by the floating-point values in the corresponding elements in the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VdivF64N VdivF64N
//go:noescape
func VdivF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32)

// Floating-point Divide (vector). This instruction divides the floating-point values in the elements in the first source SIMD&FP register, by the floating-point values in the corresponding elements in the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VdivqF32N VdivqF32N
//go:noescape
func VdivqF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)

// Floating-point Divide (vector). This instruction divides the floating-point values in the elements in the first source SIMD&FP register, by the floating-point values in the corresponding elements in the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VdivqF64N VdivqF64N
//go:noescape
func VdivqF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32)

// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.
//
//go:linkname VdupNS8N VdupNS8N
//go:noescape
func VdupNS8N(r *arm.Int8, v0 *arm.Int8, n int32)
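
/*
The name follows the NEON vdup_n intrinsic family, which sets every lane of a vector to one scalar value. As a rough illustration of that broadcast, here is a scalar sketch in ordinary Go. dupInt8 is a hypothetical helper, and how the looped wrapper above lays out its inputs is not shown here.

```go
package main

import "fmt"

// dupInt8 is a hypothetical scalar reference, not part of this package.
// It returns an 8-lane vector with every lane set to x.
func dupInt8(x int8) [8]int8 {
	var v [8]int8
	for i := range v {
		v[i] = x
	}
	return v
}

func main() {
	fmt.Println(dupInt8(-3)) // [-3 -3 -3 -3 -3 -3 -3 -3]
}
```

A broadcast like this is typically used to turn a scalar constant into an operand for a later element-wise operation.
*/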

// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.
//
//go:linkname VdupNS16N VdupNS16N
//go:noescape
func VdupNS16N(r *arm.Int16, v0 *arm.Int16, n int32)

// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.
//
//go:linkname VdupNS32N VdupNS32N
//go:noescape
func VdupNS32N(r *arm.Int32, v0 *arm.Int32, n int32)

// Insert vector element from another vector element. This instruction copies the vector element of the source SIMD&FP register to the specified vector element of the destination SIMD&FP register.
//
//go:linkname VdupNS64N VdupNS64N
//go:noescape
func VdupNS64N(r *arm.Int64, v0 *arm.Int64, n int32)

// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.
//
//go:linkname VdupNU8N VdupNU8N
//go:noescape
func VdupNU8N(r *arm.Uint8, v0 *arm.Uint8, n int32)

// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.
//
//go:linkname VdupNU16N VdupNU16N
//go:noescape
func VdupNU16N(r *arm.Uint16, v0 *arm.Uint16, n int32)

// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.
//
//go:linkname VdupNU32N VdupNU32N
//go:noescape
func VdupNU32N(r *arm.Uint32, v0 *arm.Uint32, n int32)

// Insert vector element from another vector element. This instruction copies the vector element of the source SIMD&FP register to the specified vector element of the destination SIMD&FP register.
//
//go:linkname VdupNU64N VdupNU64N
//go:noescape
func VdupNU64N(r *arm.Uint64, v0 *arm.Uint64, n int32)

// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.
//
//go:linkname VdupNF32N VdupNF32N
//go:noescape
func VdupNF32N(r *arm.Float32, v0 *arm.Float32, n int32)

// Insert vector element from another vector element. This instruction copies the vector element of the source SIMD&FP register to the specified vector element of the destination SIMD&FP register.
//
//go:linkname VdupNF64N VdupNF64N
//go:noescape
func VdupNF64N(r *arm.Float64, v0 *arm.Float64, n int32)

// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.
//
//go:linkname VdupqNS8N VdupqNS8N
//go:noescape
func VdupqNS8N(r *arm.Int8, v0 *arm.Int8, n int32)

// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.
//
//go:linkname VdupqNS16N VdupqNS16N
//go:noescape
func VdupqNS16N(r *arm.Int16, v0 *arm.Int16, n int32)

// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.
//
//go:linkname VdupqNS32N VdupqNS32N
//go:noescape
func VdupqNS32N(r *arm.Int32, v0 *arm.Int32, n int32)

// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.
//
//go:linkname VdupqNS64N VdupqNS64N
//go:noescape
func VdupqNS64N(r *arm.Int64, v0 *arm.Int64, n int32)

// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.
//
//go:linkname VdupqNU8N VdupqNU8N
//go:noescape
func VdupqNU8N(r *arm.Uint8, v0 *arm.Uint8, n int32)

// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.
//
//go:linkname VdupqNU16N VdupqNU16N
//go:noescape
func VdupqNU16N(r *arm.Uint16, v0 *arm.Uint16, n int32)

// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.
//
//go:linkname VdupqNU32N VdupqNU32N
//go:noescape
func VdupqNU32N(r *arm.Uint32, v0 *arm.Uint32, n int32)

// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.
//
//go:linkname VdupqNU64N VdupqNU64N
//go:noescape
func VdupqNU64N(r *arm.Uint64, v0 *arm.Uint64, n int32)

// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.
//
//go:linkname VdupqNF32N VdupqNF32N
//go:noescape
func VdupqNF32N(r *arm.Float32, v0 *arm.Float32, n int32)

// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.
//
//go:linkname VdupqNF64N VdupqNF64N
//go:noescape
func VdupqNF64N(r *arm.Float64, v0 *arm.Float64, n int32)

// Bitwise Exclusive OR (vector). This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register.
//
//go:linkname VeorS8N VeorS8N
//go:noescape
func VeorS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)
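
/*
Exclusive OR works bit by bit, so the element type affects only how the result is interpreted, not which bits are set. The scalar reference below is a sketch in ordinary Go, not this package's implementation. eorInt8 is a hypothetical helper that works on plain slices.

```go
package main

import "fmt"

// eorInt8 is a hypothetical scalar reference, not part of this package.
// It writes a[i] XOR b[i] into dst[i]; all three slices must have the same length.
func eorInt8(dst, a, b []int8) {
	for i := range dst {
		dst[i] = a[i] ^ b[i]
	}
}

func main() {
	a := []int8{-1, 0x55, 0x0F}
	b := []int8{0x0F, 0x55, 0x70}
	dst := make([]int8, len(a))
	eorInt8(dst, a, b)
	fmt.Println(dst) // [-16 0 127]
}
```

Applying the same second operand twice restores the first, which is why XOR is common in masking and toggling code.
*/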

// Bitwise Exclusive OR (vector). This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register.
//
//go:linkname VeorS16N VeorS16N
//go:noescape
func VeorS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)

// Bitwise Exclusive OR (vector). This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register.
//
//go:linkname VeorS32N VeorS32N
//go:noescape
func VeorS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)

// Bitwise Exclusive OR (vector). This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register.
//
//go:linkname VeorS64N VeorS64N
//go:noescape
func VeorS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32)

// Bitwise Exclusive OR (vector). This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register.
//
//go:linkname VeorU8N VeorU8N
//go:noescape
func VeorU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)

// Bitwise Exclusive OR (vector). This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register.
//
//go:linkname VeorU16N VeorU16N
//go:noescape
func VeorU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)

// Bitwise Exclusive OR (vector). This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register.
//
//go:linkname VeorU32N VeorU32N
//go:noescape
func VeorU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)

// Bitwise Exclusive OR (vector). This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register.
//
//go:linkname VeorU64N VeorU64N
//go:noescape
func VeorU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32)

// Bitwise Exclusive OR (vector). This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register.
//
//go:linkname VeorqS8N VeorqS8N
//go:noescape
func VeorqS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)

// Bitwise Exclusive OR (vector). This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register.
//
//go:linkname VeorqS16N VeorqS16N
//go:noescape
func VeorqS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)

// Bitwise Exclusive OR (vector). This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register.
//
//go:linkname VeorqS32N VeorqS32N
//go:noescape
func VeorqS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)

// Bitwise Exclusive OR (vector). This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register.
//
//go:linkname VeorqS64N VeorqS64N
//go:noescape
func VeorqS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32)

// Bitwise Exclusive OR (vector). This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register.
//
//go:linkname VeorqU8N VeorqU8N
//go:noescape
func VeorqU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)

// Bitwise Exclusive OR (vector). This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register.
//
//go:linkname VeorqU16N VeorqU16N
//go:noescape
func VeorqU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)

// Bitwise Exclusive OR (vector). This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register.
//
//go:linkname VeorqU32N VeorqU32N
//go:noescape
func VeorqU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)

// Bitwise Exclusive OR (vector). This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register.
//
//go:linkname VeorqU64N VeorqU64N
//go:noescape
func VeorqU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32)

// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.
//
//go:linkname VgetHighS8N VgetHighS8N
//go:noescape
func VgetHighS8N(r *arm.Int8, v0 *arm.Int8, n int32)

// Duplicate vector element to vector or scalar (DUP). The vget_high_s16 intrinsic uses this instruction to copy the upper 64-bit half of the 128-bit source SIMD&FP register into the destination SIMD&FP register.
//
//go:linkname VgetHighS16N VgetHighS16N
//go:noescape
func VgetHighS16N(r *arm.Int16, v0 *arm.Int16, n int32)

// Duplicate vector element to vector or scalar (DUP). The vget_high_s32 intrinsic uses this instruction to copy the upper 64-bit half of the 128-bit source SIMD&FP register into the destination SIMD&FP register.
//
//go:linkname VgetHighS32N VgetHighS32N
//go:noescape
func VgetHighS32N(r *arm.Int32, v0 *arm.Int32, n int32)

// Duplicate vector element to vector or scalar (DUP). The vget_high_s64 intrinsic uses this instruction to copy the upper 64-bit half of the 128-bit source SIMD&FP register into the destination SIMD&FP register.
//
//go:linkname VgetHighS64N VgetHighS64N
//go:noescape
func VgetHighS64N(r *arm.Int64, v0 *arm.Int64, n int32)

// Duplicate vector element to vector or scalar (DUP). The vget_high_u8 intrinsic uses this instruction to copy the upper 64-bit half of the 128-bit source SIMD&FP register into the destination SIMD&FP register.
//
//go:linkname VgetHighU8N VgetHighU8N
//go:noescape
func VgetHighU8N(r *arm.Uint8, v0 *arm.Uint8, n int32)

// Duplicate vector element to vector or scalar (DUP). The vget_high_u16 intrinsic uses this instruction to copy the upper 64-bit half of the 128-bit source SIMD&FP register into the destination SIMD&FP register.
//
//go:linkname VgetHighU16N VgetHighU16N
//go:noescape
func VgetHighU16N(r *arm.Uint16, v0 *arm.Uint16, n int32)

// Duplicate vector element to vector or scalar (DUP). The vget_high_u32 intrinsic uses this instruction to copy the upper 64-bit half of the 128-bit source SIMD&FP register into the destination SIMD&FP register.
//
//go:linkname VgetHighU32N VgetHighU32N
//go:noescape
func VgetHighU32N(r *arm.Uint32, v0 *arm.Uint32, n int32)

// Duplicate vector element to vector or scalar (DUP). The vget_high_u64 intrinsic uses this instruction to copy the upper 64-bit half of the 128-bit source SIMD&FP register into the destination SIMD&FP register.
//
//go:linkname VgetHighU64N VgetHighU64N
//go:noescape
func VgetHighU64N(r *arm.Uint64, v0 *arm.Uint64, n int32)

// Duplicate vector element to vector or scalar (DUP). The vget_high_f32 intrinsic uses this instruction to copy the upper 64-bit half of the 128-bit source SIMD&FP register into the destination SIMD&FP register.
//
//go:linkname VgetHighF32N VgetHighF32N
//go:noescape
func VgetHighF32N(r *arm.Float32, v0 *arm.Float32, n int32)

// Duplicate vector element to vector or scalar (DUP). The vget_high_f64 intrinsic uses this instruction to copy the upper 64-bit half of the 128-bit source SIMD&FP register into the destination SIMD&FP register.
//
//go:linkname VgetHighF64N VgetHighF64N
//go:noescape
func VgetHighF64N(r *arm.Float64, v0 *arm.Float64, n int32)

// Duplicate vector element to vector or scalar (DUP). The vget_low_s8 intrinsic uses this instruction to copy the lower 64-bit half of the 128-bit source SIMD&FP register into the destination SIMD&FP register.
//
//go:linkname VgetLowS8N VgetLowS8N
//go:noescape
func VgetLowS8N(r *arm.Int8, v0 *arm.Int8, n int32)

// Duplicate vector element to vector or scalar (DUP). The vget_low_s16 intrinsic uses this instruction to copy the lower 64-bit half of the 128-bit source SIMD&FP register into the destination SIMD&FP register.
//
//go:linkname VgetLowS16N VgetLowS16N
//go:noescape
func VgetLowS16N(r *arm.Int16, v0 *arm.Int16, n int32)

// Duplicate vector element to vector or scalar (DUP). The vget_low_s32 intrinsic uses this instruction to copy the lower 64-bit half of the 128-bit source SIMD&FP register into the destination SIMD&FP register.
//
//go:linkname VgetLowS32N VgetLowS32N
//go:noescape
func VgetLowS32N(r *arm.Int32, v0 *arm.Int32, n int32)

// Duplicate vector element to vector or scalar (DUP). The vget_low_s64 intrinsic uses this instruction to copy the lower 64-bit half of the 128-bit source SIMD&FP register into the destination SIMD&FP register.
//
//go:linkname VgetLowS64N VgetLowS64N
//go:noescape
func VgetLowS64N(r *arm.Int64, v0 *arm.Int64, n int32)

// Duplicate vector element to vector or scalar (DUP). The vget_low_u8 intrinsic uses this instruction to copy the lower 64-bit half of the 128-bit source SIMD&FP register into the destination SIMD&FP register.
//
//go:linkname VgetLowU8N VgetLowU8N
//go:noescape
func VgetLowU8N(r *arm.Uint8, v0 *arm.Uint8, n int32)

// Duplicate vector element to vector or scalar (DUP). The vget_low_u16 intrinsic uses this instruction to copy the lower 64-bit half of the 128-bit source SIMD&FP register into the destination SIMD&FP register.
//
//go:linkname VgetLowU16N VgetLowU16N
//go:noescape
func VgetLowU16N(r *arm.Uint16, v0 *arm.Uint16, n int32)

// Duplicate vector element to vector or scalar (DUP). The vget_low_u32 intrinsic uses this instruction to copy the lower 64-bit half of the 128-bit source SIMD&FP register into the destination SIMD&FP register.
//
//go:linkname VgetLowU32N VgetLowU32N
//go:noescape
func VgetLowU32N(r *arm.Uint32, v0 *arm.Uint32, n int32)

// Duplicate vector element to vector or scalar (DUP). The vget_low_u64 intrinsic uses this instruction to copy the lower 64-bit half of the 128-bit source SIMD&FP register into the destination SIMD&FP register.
//
//go:linkname VgetLowU64N VgetLowU64N
//go:noescape
func VgetLowU64N(r *arm.Uint64, v0 *arm.Uint64, n int32)

// Duplicate vector element to vector or scalar (DUP). The vget_low_f32 intrinsic uses this instruction to copy the lower 64-bit half of the 128-bit source SIMD&FP register into the destination SIMD&FP register.
//
//go:linkname VgetLowF32N VgetLowF32N
//go:noescape
func VgetLowF32N(r *arm.Float32, v0 *arm.Float32, n int32)

// Duplicate vector element to vector or scalar (DUP). The vget_low_f64 intrinsic uses this instruction to copy the lower 64-bit half of the 128-bit source SIMD&FP register into the destination SIMD&FP register.
//
//go:linkname VgetLowF64N VgetLowF64N
//go:noescape
func VgetLowF64N(r *arm.Float64, v0 *arm.Float64, n int32)
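
/*
The half-extraction behaviour described for the VgetHigh and VgetLow functions above can be modelled in plain Go. This is only a sketch for reasoning and testing: it does not call this package, it models a 128-bit vector as an array of eight int16 lanes, and the names getLow and getHigh are hypothetical.

```go
package main

import "fmt"

// getLow returns the lower four lanes of an eight-lane vector.
// Hypothetical reference model, not part of the package.
func getLow(v [8]int16) [4]int16 {
	return [4]int16{v[0], v[1], v[2], v[3]}
}

// getHigh returns the upper four lanes of an eight-lane vector.
// Hypothetical reference model, not part of the package.
func getHigh(v [8]int16) [4]int16 {
	return [4]int16{v[4], v[5], v[6], v[7]}
}

func main() {
	v := [8]int16{0, 1, 2, 3, 4, 5, 6, 7}
	fmt.Println(getLow(v), getHigh(v))
}
```

How the N-suffixed loop functions walk over longer buffers is not shown in this file, so the model covers a single vector only.
*/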

// Signed Halving Add. This instruction adds corresponding signed integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VhaddS8N VhaddS8N
//go:noescape
func VhaddS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)

// Signed Halving Add. This instruction adds corresponding signed integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VhaddS16N VhaddS16N
//go:noescape
func VhaddS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)

// Signed Halving Add. This instruction adds corresponding signed integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VhaddS32N VhaddS32N
//go:noescape
func VhaddS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)

// Unsigned Halving Add. This instruction adds corresponding unsigned integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VhaddU8N VhaddU8N
//go:noescape
func VhaddU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)

// Unsigned Halving Add. This instruction adds corresponding unsigned integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VhaddU16N VhaddU16N
//go:noescape
func VhaddU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)

// Unsigned Halving Add. This instruction adds corresponding unsigned integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VhaddU32N VhaddU32N
//go:noescape
func VhaddU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)

// Signed Halving Add. This instruction adds corresponding signed integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VhaddqS8N VhaddqS8N
//go:noescape
func VhaddqS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)

// Signed Halving Add. This instruction adds corresponding signed integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VhaddqS16N VhaddqS16N
//go:noescape
func VhaddqS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)

// Signed Halving Add. This instruction adds corresponding signed integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VhaddqS32N VhaddqS32N
//go:noescape
func VhaddqS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)

// Unsigned Halving Add. This instruction adds corresponding unsigned integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VhaddqU8N VhaddqU8N
//go:noescape
func VhaddqU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)

// Unsigned Halving Add. This instruction adds corresponding unsigned integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VhaddqU16N VhaddqU16N
//go:noescape
func VhaddqU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)

// Unsigned Halving Add. This instruction adds corresponding unsigned integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VhaddqU32N VhaddqU32N
//go:noescape
func VhaddqU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)
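
/*
The halving add described above can be checked against a scalar model. The sketch below is plain Go and does not call this package; haddS8 and haddU8 are hypothetical names that model a single 8-bit lane. The sum is formed at 16-bit precision so it cannot overflow, then shifted right by one bit, which truncates rather than rounds.

```go
package main

import "fmt"

// haddS8 models one lane of a signed halving add.
// The arithmetic shift rounds toward negative infinity.
func haddS8(a, b int8) int8 {
	return int8((int16(a) + int16(b)) >> 1)
}

// haddU8 models one lane of an unsigned halving add.
func haddU8(a, b uint8) uint8 {
	return uint8((uint16(a) + uint16(b)) >> 1)
}

func main() {
	fmt.Println(haddS8(127, 127), haddS8(-128, -128), haddS8(-1, 0))
	fmt.Println(haddU8(255, 255), haddU8(255, 0))
}
```

Widening before the add is the point of the model: a naive a+b in int8 would wrap for inputs such as 127 and 127.
*/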

// Signed Halving Subtract. This instruction subtracts the elements in the vector in the second source SIMD&FP register from the corresponding elements in the vector in the first source SIMD&FP register, shifts each result right one bit, places each result into elements of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VhsubS8N VhsubS8N
//go:noescape
func VhsubS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)

// Signed Halving Subtract. This instruction subtracts the elements in the vector in the second source SIMD&FP register from the corresponding elements in the vector in the first source SIMD&FP register, shifts each result right one bit, places each result into elements of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VhsubS16N VhsubS16N
//go:noescape
func VhsubS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)

// Signed Halving Subtract. This instruction subtracts the elements in the vector in the second source SIMD&FP register from the corresponding elements in the vector in the first source SIMD&FP register, shifts each result right one bit, places each result into elements of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VhsubS32N VhsubS32N
//go:noescape
func VhsubS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)

// Unsigned Halving Subtract. This instruction subtracts the vector elements in the second source SIMD&FP register from the corresponding vector elements in the first source SIMD&FP register, shifts each result right one bit, places each result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VhsubU8N VhsubU8N
//go:noescape
func VhsubU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)

// Unsigned Halving Subtract. This instruction subtracts the vector elements in the second source SIMD&FP register from the corresponding vector elements in the first source SIMD&FP register, shifts each result right one bit, places each result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VhsubU16N VhsubU16N
//go:noescape
func VhsubU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)

// Unsigned Halving Subtract. This instruction subtracts the vector elements in the second source SIMD&FP register from the corresponding vector elements in the first source SIMD&FP register, shifts each result right one bit, places each result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VhsubU32N VhsubU32N
//go:noescape
func VhsubU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)

// Signed Halving Subtract. This instruction subtracts the elements in the vector in the second source SIMD&FP register from the corresponding elements in the vector in the first source SIMD&FP register, shifts each result right one bit, places each result into elements of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VhsubqS8N VhsubqS8N
//go:noescape
func VhsubqS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)

// Signed Halving Subtract. This instruction subtracts the elements in the vector in the second source SIMD&FP register from the corresponding elements in the vector in the first source SIMD&FP register, shifts each result right one bit, places each result into elements of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VhsubqS16N VhsubqS16N
//go:noescape
func VhsubqS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)

// Signed Halving Subtract. This instruction subtracts the elements in the vector in the second source SIMD&FP register from the corresponding elements in the vector in the first source SIMD&FP register, shifts each result right one bit, places each result into elements of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VhsubqS32N VhsubqS32N
//go:noescape
func VhsubqS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)

// Unsigned Halving Subtract. This instruction subtracts the vector elements in the second source SIMD&FP register from the corresponding vector elements in the first source SIMD&FP register, shifts each result right one bit, places each result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VhsubqU8N VhsubqU8N
//go:noescape
func VhsubqU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)

// Unsigned Halving Subtract. This instruction subtracts the vector elements in the second source SIMD&FP register from the corresponding vector elements in the first source SIMD&FP register, shifts each result right one bit, places each result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VhsubqU16N VhsubqU16N
//go:noescape
func VhsubqU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)

// Unsigned Halving Subtract. This instruction subtracts the vector elements in the second source SIMD&FP register from the corresponding vector elements in the first source SIMD&FP register, shifts each result right one bit, places each result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VhsubqU32N VhsubqU32N
//go:noescape
func VhsubqU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)
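
/*
A scalar model of the halving subtract may help when writing tests for the functions above. The sketch is plain Go and does not call this package; hsubS8 and hsubU8 are hypothetical names that model a single 8-bit lane. The difference is formed at 16-bit precision, shifted right by one bit, and then narrowed back to 8 bits.

```go
package main

import "fmt"

// hsubS8 models one lane of a signed halving subtract.
func hsubS8(a, b int8) int8 {
	return int8((int16(a) - int16(b)) >> 1)
}

// hsubU8 models one lane of an unsigned halving subtract.
// A negative intermediate difference wraps when narrowed to uint8.
func hsubU8(a, b uint8) uint8 {
	return uint8((int16(a) - int16(b)) >> 1)
}

func main() {
	fmt.Println(hsubS8(127, -128), hsubS8(-128, 127))
	fmt.Println(hsubU8(255, 0), hsubU8(0, 1))
}
```

Note that the unsigned variant does not saturate: when the second operand is larger, the narrowed result wraps around.
*/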

// Signed Maximum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the larger of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmaxS8N VmaxS8N
//go:noescape
func VmaxS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)

// Signed Maximum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the larger of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmaxS16N VmaxS16N
//go:noescape
func VmaxS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)

// Signed Maximum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the larger of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmaxS32N VmaxS32N
//go:noescape
func VmaxS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)

// Unsigned Maximum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the larger of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmaxU8N VmaxU8N
//go:noescape
func VmaxU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)

// Unsigned Maximum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the larger of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmaxU16N VmaxU16N
//go:noescape
func VmaxU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)

// Unsigned Maximum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the larger of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmaxU32N VmaxU32N
//go:noescape
func VmaxU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)

// Floating-point Maximum (vector). This instruction compares corresponding vector elements in the two source SIMD&FP registers, places the larger of each of the two floating-point values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmaxF32N VmaxF32N
//go:noescape
func VmaxF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)

// Floating-point Maximum (vector). This instruction compares corresponding vector elements in the two source SIMD&FP registers, places the larger of each of the two floating-point values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmaxF64N VmaxF64N
//go:noescape
func VmaxF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32)

// Floating-point Maximum Number (vector). This instruction compares corresponding vector elements in the two source SIMD&FP registers, writes the larger of the two floating-point values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmaxnmF32N VmaxnmF32N
//go:noescape
func VmaxnmF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)

// Floating-point Maximum Number (vector). This instruction compares corresponding vector elements in the two source SIMD&FP registers, writes the larger of the two floating-point values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmaxnmF64N VmaxnmF64N
//go:noescape
func VmaxnmF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32)

// Floating-point Maximum Number (vector). This instruction compares corresponding vector elements in the two source SIMD&FP registers, writes the larger of the two floating-point values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmaxnmqF32N VmaxnmqF32N
//go:noescape
func VmaxnmqF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)

// Floating-point Maximum Number (vector). This instruction compares corresponding vector elements in the two source SIMD&FP registers, writes the larger of the two floating-point values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmaxnmqF64N VmaxnmqF64N
//go:noescape
func VmaxnmqF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32)

// Floating-point Maximum Number Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the larger of each pair of values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values. The vmaxnmv_f32 intrinsic applies it to a two-element vector, where the pairwise maximum equals the maximum across the vector.
//
//go:linkname VmaxnmvF32N VmaxnmvF32N
//go:noescape
func VmaxnmvF32N(r *arm.Float32, v0 *arm.Float32, n int32)

// Floating-point Maximum Number across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the largest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are floating-point values.
//
//go:linkname VmaxnmvqF32N VmaxnmvqF32N
//go:noescape
func VmaxnmvqF32N(r *arm.Float32, v0 *arm.Float32, n int32)

// Floating-point Maximum Number Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the larger of each pair of values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values. The vmaxnmvq_f64 intrinsic applies it to a two-element vector, where the pairwise maximum equals the maximum across the vector.
//
//go:linkname VmaxnmvqF64N VmaxnmvqF64N
//go:noescape
func VmaxnmvqF64N(r *arm.Float64, v0 *arm.Float64, n int32)

// Signed Maximum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the larger of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmaxqS8N VmaxqS8N
//go:noescape
func VmaxqS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)

// Signed Maximum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the larger of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmaxqS16N VmaxqS16N
//go:noescape
func VmaxqS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)

// Signed Maximum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the larger of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmaxqS32N VmaxqS32N
//go:noescape
func VmaxqS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)

// Unsigned Maximum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the larger of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmaxqU8N VmaxqU8N
//go:noescape
func VmaxqU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)

// Unsigned Maximum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the larger of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmaxqU16N VmaxqU16N
//go:noescape
func VmaxqU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)

// Unsigned Maximum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the larger of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmaxqU32N VmaxqU32N
//go:noescape
func VmaxqU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)

// Floating-point Maximum (vector). This instruction compares corresponding vector elements in the two source SIMD&FP registers, places the larger of each of the two floating-point values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmaxqF32N VmaxqF32N
//go:noescape
func VmaxqF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)

// Floating-point Maximum (vector). This instruction compares corresponding vector elements in the two source SIMD&FP registers, places the larger of each of the two floating-point values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmaxqF64N VmaxqF64N
//go:noescape
func VmaxqF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32)

// Signed Maximum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the largest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are signed integer values.
//
//go:linkname VmaxvS8N VmaxvS8N
//go:noescape
func VmaxvS8N(r *arm.Int8, v0 *arm.Int8, n int32)

// Signed Maximum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the largest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are signed integer values.
//
//go:linkname VmaxvS16N VmaxvS16N
//go:noescape
func VmaxvS16N(r *arm.Int16, v0 *arm.Int16, n int32)

// Signed Maximum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the larger of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register. The vmaxv_s32 intrinsic applies it to a two-element vector, where the pairwise maximum equals the maximum across the vector.
//
//go:linkname VmaxvS32N VmaxvS32N
//go:noescape
func VmaxvS32N(r *arm.Int32, v0 *arm.Int32, n int32)

// Unsigned Maximum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the largest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are unsigned integer values.
//
//go:linkname VmaxvU8N VmaxvU8N
//go:noescape
func VmaxvU8N(r *arm.Uint8, v0 *arm.Uint8, n int32)

// Unsigned Maximum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the largest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are unsigned integer values.
//
//go:linkname VmaxvU16N VmaxvU16N
//go:noescape
func VmaxvU16N(r *arm.Uint16, v0 *arm.Uint16, n int32)

// Unsigned Maximum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the larger of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register. The vmaxv_u32 intrinsic applies it to a two-element vector, where the pairwise maximum equals the maximum across the vector.
//
//go:linkname VmaxvU32N VmaxvU32N
//go:noescape
func VmaxvU32N(r *arm.Uint32, v0 *arm.Uint32, n int32)

// Floating-point Maximum Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, writes the larger of each pair of values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values. The vmaxv_f32 intrinsic applies it to a two-element vector, where the pairwise maximum equals the maximum across the vector.
//
//go:linkname VmaxvF32N VmaxvF32N
//go:noescape
func VmaxvF32N(r *arm.Float32, v0 *arm.Float32, n int32)

// Signed Maximum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the largest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are signed integer values.
//
//go:linkname VmaxvqS8N VmaxvqS8N
//go:noescape
func VmaxvqS8N(r *arm.Int8, v0 *arm.Int8, n int32)

// Signed Maximum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the largest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are signed integer values.
//
//go:linkname VmaxvqS16N VmaxvqS16N
//go:noescape
func VmaxvqS16N(r *arm.Int16, v0 *arm.Int16, n int32)

// Signed Maximum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the largest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are signed integer values.
//
//go:linkname VmaxvqS32N VmaxvqS32N
//go:noescape
func VmaxvqS32N(r *arm.Int32, v0 *arm.Int32, n int32)

// Unsigned Maximum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the largest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are unsigned integer values.
//
//go:linkname VmaxvqU8N VmaxvqU8N
//go:noescape
func VmaxvqU8N(r *arm.Uint8, v0 *arm.Uint8, n int32)

// Unsigned Maximum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the largest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are unsigned integer values.
//
//go:linkname VmaxvqU16N VmaxvqU16N
//go:noescape
func VmaxvqU16N(r *arm.Uint16, v0 *arm.Uint16, n int32)

// Unsigned Maximum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the largest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are unsigned integer values.
//
//go:linkname VmaxvqU32N VmaxvqU32N
//go:noescape
func VmaxvqU32N(r *arm.Uint32, v0 *arm.Uint32, n int32)

// Floating-point Maximum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the largest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are floating-point values.
//
//go:linkname VmaxvqF32N VmaxvqF32N
//go:noescape
func VmaxvqF32N(r *arm.Float32, v0 *arm.Float32, n int32)

// Floating-point Maximum across Vector. For a two-element vector this reduction is performed with the pairwise FMAXP instruction, which compares the two elements of the source SIMD&FP register and writes the larger of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are floating-point values.
//
//go:linkname VmaxvqF64N VmaxvqF64N
//go:noescape
func VmaxvqF64N(r *arm.Float64, v0 *arm.Float64, n int32)

// Signed Minimum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the smaller of each of the two signed integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VminS8N VminS8N
//go:noescape
func VminS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)

// Signed Minimum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the smaller of each of the two signed integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VminS16N VminS16N
//go:noescape
func VminS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)

// Signed Minimum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the smaller of each of the two signed integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VminS32N VminS32N
//go:noescape
func VminS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)

// Unsigned Minimum (vector). This instruction compares corresponding vector elements in the two source SIMD&FP registers, places the smaller of each of the two unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VminU8N VminU8N
//go:noescape
func VminU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)

// Unsigned Minimum (vector). This instruction compares corresponding vector elements in the two source SIMD&FP registers, places the smaller of each of the two unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VminU16N VminU16N
//go:noescape
func VminU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)

// Unsigned Minimum (vector). This instruction compares corresponding vector elements in the two source SIMD&FP registers, places the smaller of each of the two unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VminU32N VminU32N
//go:noescape
func VminU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)

// Floating-point minimum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the smaller of each of the two floating-point values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VminF32N VminF32N
//go:noescape
func VminF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)

// Floating-point minimum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the smaller of each of the two floating-point values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VminF64N VminF64N
//go:noescape
func VminF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32)

// Floating-point Minimum Number (vector). This instruction compares corresponding vector elements in the two source SIMD&FP registers, writes the smaller of the two floating-point values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VminnmF32N VminnmF32N
//go:noescape
func VminnmF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)

// Floating-point Minimum Number (vector). This instruction compares corresponding vector elements in the two source SIMD&FP registers, writes the smaller of the two floating-point values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VminnmF64N VminnmF64N
//go:noescape
func VminnmF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32)

// Floating-point Minimum Number (vector). This instruction compares corresponding vector elements in the two source SIMD&FP registers, writes the smaller of the two floating-point values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VminnmqF32N VminnmqF32N
//go:noescape
func VminnmqF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)

// Floating-point Minimum Number (vector). This instruction compares corresponding vector elements in the two source SIMD&FP registers, writes the smaller of the two floating-point values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VminnmqF64N VminnmqF64N
//go:noescape
func VminnmqF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32)

// Floating-point Minimum Number across Vector. For a two-element vector this reduction is performed with the pairwise FMINNMP instruction, which compares the two elements of the source SIMD&FP register and writes the smaller of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are floating-point values.
//
//go:linkname VminnmvF32N VminnmvF32N
//go:noescape
func VminnmvF32N(r *arm.Float32, v0 *arm.Float32, n int32)

// Floating-point Minimum Number across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the smallest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are floating-point values.
//
//go:linkname VminnmvqF32N VminnmvqF32N
//go:noescape
func VminnmvqF32N(r *arm.Float32, v0 *arm.Float32, n int32)

// Floating-point Minimum Number across Vector. For a two-element vector this reduction is performed with the pairwise FMINNMP instruction, which compares the two elements of the source SIMD&FP register and writes the smaller of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are floating-point values.
//
//go:linkname VminnmvqF64N VminnmvqF64N
//go:noescape
func VminnmvqF64N(r *arm.Float64, v0 *arm.Float64, n int32)

// Signed Minimum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the smaller of each of the two signed integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VminqS8N VminqS8N
//go:noescape
func VminqS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)

// Signed Minimum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the smaller of each of the two signed integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VminqS16N VminqS16N
//go:noescape
func VminqS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)

// Signed Minimum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the smaller of each of the two signed integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VminqS32N VminqS32N
//go:noescape
func VminqS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)

// Unsigned Minimum (vector). This instruction compares corresponding vector elements in the two source SIMD&FP registers, places the smaller of each of the two unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VminqU8N VminqU8N
//go:noescape
func VminqU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)

// Unsigned Minimum (vector). This instruction compares corresponding vector elements in the two source SIMD&FP registers, places the smaller of each of the two unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VminqU16N VminqU16N
//go:noescape
func VminqU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)

// Unsigned Minimum (vector). This instruction compares corresponding vector elements in the two source SIMD&FP registers, places the smaller of each of the two unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VminqU32N VminqU32N
//go:noescape
func VminqU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)

// Floating-point minimum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the smaller of each of the two floating-point values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VminqF32N VminqF32N
//go:noescape
func VminqF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)

// Floating-point minimum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the smaller of each of the two floating-point values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VminqF64N VminqF64N
//go:noescape
func VminqF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32)

// Signed Minimum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the smallest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are signed integer values.
//
//go:linkname VminvS8N VminvS8N
//go:noescape
func VminvS8N(r *arm.Int8, v0 *arm.Int8, n int32)

// Signed Minimum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the smallest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are signed integer values.
//
//go:linkname VminvS16N VminvS16N
//go:noescape
func VminvS16N(r *arm.Int16, v0 *arm.Int16, n int32)

// Signed Minimum across Vector. For a two-element vector this reduction is performed with the pairwise SMINP instruction, which compares the two elements of the source SIMD&FP register and yields the smaller of the values as the result. All the values in this instruction are signed integer values.
//
//go:linkname VminvS32N VminvS32N
//go:noescape
func VminvS32N(r *arm.Int32, v0 *arm.Int32, n int32)

// Unsigned Minimum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the smallest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are unsigned integer values.
//
//go:linkname VminvU8N VminvU8N
//go:noescape
func VminvU8N(r *arm.Uint8, v0 *arm.Uint8, n int32)

// Unsigned Minimum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the smallest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are unsigned integer values.
//
//go:linkname VminvU16N VminvU16N
//go:noescape
func VminvU16N(r *arm.Uint16, v0 *arm.Uint16, n int32)

// Unsigned Minimum across Vector. For a two-element vector this reduction is performed with the pairwise UMINP instruction, which compares the two elements of the source SIMD&FP register and yields the smaller of the values as the result. All the values in this instruction are unsigned integer values.
//
//go:linkname VminvU32N VminvU32N
//go:noescape
func VminvU32N(r *arm.Uint32, v0 *arm.Uint32, n int32)

// Floating-point Minimum across Vector. For a two-element vector this reduction is performed with the pairwise FMINP instruction, which compares the two elements of the source SIMD&FP register and writes the smaller of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are floating-point values.
//
//go:linkname VminvF32N VminvF32N
//go:noescape
func VminvF32N(r *arm.Float32, v0 *arm.Float32, n int32)

// Signed Minimum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the smallest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are signed integer values.
//
//go:linkname VminvqS8N VminvqS8N
//go:noescape
func VminvqS8N(r *arm.Int8, v0 *arm.Int8, n int32)

// Signed Minimum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the smallest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are signed integer values.
//
//go:linkname VminvqS16N VminvqS16N
//go:noescape
func VminvqS16N(r *arm.Int16, v0 *arm.Int16, n int32)

// Signed Minimum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the smallest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are signed integer values.
//
//go:linkname VminvqS32N VminvqS32N
//go:noescape
func VminvqS32N(r *arm.Int32, v0 *arm.Int32, n int32)

// Unsigned Minimum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the smallest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are unsigned integer values.
//
//go:linkname VminvqU8N VminvqU8N
//go:noescape
func VminvqU8N(r *arm.Uint8, v0 *arm.Uint8, n int32)

// Unsigned Minimum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the smallest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are unsigned integer values.
//
//go:linkname VminvqU16N VminvqU16N
//go:noescape
func VminvqU16N(r *arm.Uint16, v0 *arm.Uint16, n int32)

// Unsigned Minimum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the smallest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are unsigned integer values.
//
//go:linkname VminvqU32N VminvqU32N
//go:noescape
func VminvqU32N(r *arm.Uint32, v0 *arm.Uint32, n int32)

// Floating-point Minimum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the smallest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are floating-point values.
//
//go:linkname VminvqF32N VminvqF32N
//go:noescape
func VminvqF32N(r *arm.Float32, v0 *arm.Float32, n int32)

// Floating-point Minimum across Vector. For a two-element vector this reduction is performed with the pairwise FMINP instruction, which compares the two elements of the source SIMD&FP register and writes the smaller of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are floating-point values.
//
//go:linkname VminvqF64N VminvqF64N
//go:noescape
func VminvqF64N(r *arm.Float64, v0 *arm.Float64, n int32)

// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.
//
//go:linkname VmovNS8N VmovNS8N
//go:noescape
func VmovNS8N(r *arm.Int8, v0 *arm.Int8, n int32)

// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.
//
//go:linkname VmovNS16N VmovNS16N
//go:noescape
func VmovNS16N(r *arm.Int16, v0 *arm.Int16, n int32)

// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.
//
//go:linkname VmovNS32N VmovNS32N
//go:noescape
func VmovNS32N(r *arm.Int32, v0 *arm.Int32, n int32)

// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.
//
//go:linkname VmovNS64N VmovNS64N
//go:noescape
func VmovNS64N(r *arm.Int64, v0 *arm.Int64, n int32)

// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.
//
//go:linkname VmovNU8N VmovNU8N
//go:noescape
func VmovNU8N(r *arm.Uint8, v0 *arm.Uint8, n int32)

// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.
//
//go:linkname VmovNU16N VmovNU16N
//go:noescape
func VmovNU16N(r *arm.Uint16, v0 *arm.Uint16, n int32)

// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.
//
//go:linkname VmovNU32N VmovNU32N
//go:noescape
func VmovNU32N(r *arm.Uint32, v0 *arm.Uint32, n int32)

// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.
//
//go:linkname VmovNU64N VmovNU64N
//go:noescape
func VmovNU64N(r *arm.Uint64, v0 *arm.Uint64, n int32)

// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.
//
//go:linkname VmovNF32N VmovNF32N
//go:noescape
func VmovNF32N(r *arm.Float32, v0 *arm.Float32, n int32)

// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.
//
//go:linkname VmovNF64N VmovNF64N
//go:noescape
func VmovNF64N(r *arm.Float64, v0 *arm.Float64, n int32)

// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.
//
//go:linkname VmovqNS8N VmovqNS8N
//go:noescape
func VmovqNS8N(r *arm.Int8, v0 *arm.Int8, n int32)

// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.
//
//go:linkname VmovqNS16N VmovqNS16N
//go:noescape
func VmovqNS16N(r *arm.Int16, v0 *arm.Int16, n int32)

// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.
//
//go:linkname VmovqNS32N VmovqNS32N
//go:noescape
func VmovqNS32N(r *arm.Int32, v0 *arm.Int32, n int32)

// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.
//
//go:linkname VmovqNS64N VmovqNS64N
//go:noescape
func VmovqNS64N(r *arm.Int64, v0 *arm.Int64, n int32)

// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.
//
//go:linkname VmovqNU8N VmovqNU8N
//go:noescape
func VmovqNU8N(r *arm.Uint8, v0 *arm.Uint8, n int32)

// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.
//
//go:linkname VmovqNU16N VmovqNU16N
//go:noescape
func VmovqNU16N(r *arm.Uint16, v0 *arm.Uint16, n int32)

// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.
//
//go:linkname VmovqNU32N VmovqNU32N
//go:noescape
func VmovqNU32N(r *arm.Uint32, v0 *arm.Uint32, n int32)

// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.
//
//go:linkname VmovqNU64N VmovqNU64N
//go:noescape
func VmovqNU64N(r *arm.Uint64, v0 *arm.Uint64, n int32)

// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.
//
//go:linkname VmovqNF32N VmovqNF32N
//go:noescape
func VmovqNF32N(r *arm.Float32, v0 *arm.Float32, n int32)

// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.
//
//go:linkname VmovqNF64N VmovqNF64N
//go:noescape
func VmovqNF64N(r *arm.Float64, v0 *arm.Float64, n int32)

// Multiply (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmulS8N VmulS8N
//go:noescape
func VmulS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)

// Multiply (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmulS16N VmulS16N
//go:noescape
func VmulS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)

// Multiply (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmulS32N VmulS32N
//go:noescape
func VmulS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)

// Multiply (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmulU8N VmulU8N
//go:noescape
func VmulU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)

// Multiply (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmulU16N VmulU16N
//go:noescape
func VmulU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)

// Multiply (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmulU32N VmulU32N
//go:noescape
func VmulU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)

// Floating-point Multiply (vector). This instruction multiplies corresponding floating-point values in the vectors in the two source SIMD&FP registers, places the result in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmulF32N VmulF32N
//go:noescape
func VmulF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)

// Floating-point Multiply (vector). This instruction multiplies corresponding floating-point values in the vectors in the two source SIMD&FP registers, places the result in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmulF64N VmulF64N
//go:noescape
func VmulF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32)

// Multiply (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmulqS8N VmulqS8N
//go:noescape
func VmulqS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)

// Multiply (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmulqS16N VmulqS16N
//go:noescape
func VmulqS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)

// Multiply (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmulqS32N VmulqS32N
//go:noescape
func VmulqS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)

// Multiply (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmulqU8N VmulqU8N
//go:noescape
func VmulqU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)

// Multiply (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmulqU16N VmulqU16N
//go:noescape
func VmulqU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)

// Multiply (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmulqU32N VmulqU32N
//go:noescape
func VmulqU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)

// Floating-point Multiply (vector). This instruction multiplies corresponding floating-point values in the vectors in the two source SIMD&FP registers, places the result in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmulqF32N VmulqF32N
//go:noescape
func VmulqF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)

// Floating-point Multiply (vector). This instruction multiplies corresponding floating-point values in the vectors in the two source SIMD&FP registers, places the result in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmulqF64N VmulqF64N
//go:noescape
func VmulqF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32)

// Floating-point Multiply extended. This instruction multiplies corresponding floating-point values in the vectors of the two source SIMD&FP registers, places the resulting floating-point values in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmulxF32N VmulxF32N
//go:noescape
func VmulxF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)

// Floating-point Multiply extended. This instruction multiplies corresponding floating-point values in the vectors of the two source SIMD&FP registers, places the resulting floating-point values in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmulxF64N VmulxF64N
//go:noescape
func VmulxF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32)

// Floating-point Multiply extended. This instruction multiplies corresponding floating-point values in the vectors of the two source SIMD&FP registers, places the resulting floating-point values in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmulxdF64N VmulxdF64N
//go:noescape
func VmulxdF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32)

// Floating-point Multiply extended. This instruction multiplies corresponding floating-point values in the vectors of the two source SIMD&FP registers, places the resulting floating-point values in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmulxqF32N VmulxqF32N
//go:noescape
func VmulxqF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)

// Floating-point Multiply extended. This instruction multiplies corresponding floating-point values in the vectors of the two source SIMD&FP registers, places the resulting floating-point values in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmulxqF64N VmulxqF64N
//go:noescape
func VmulxqF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32)

// Floating-point Multiply extended. This instruction multiplies corresponding floating-point values in the vectors of the two source SIMD&FP registers, places the resulting floating-point values in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmulxsF32N VmulxsF32N
//go:noescape
func VmulxsF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)

// Bitwise NOT (vector). This instruction reads each vector element from the source SIMD&FP register, places the inverse of each value into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmvnS8N VmvnS8N
//go:noescape
func VmvnS8N(r *arm.Int8, v0 *arm.Int8, n int32)

// Bitwise NOT (vector). This instruction reads each vector element from the source SIMD&FP register, places the inverse of each value into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmvnS16N VmvnS16N
//go:noescape
func VmvnS16N(r *arm.Int16, v0 *arm.Int16, n int32)

// Bitwise NOT (vector). This instruction reads each vector element from the source SIMD&FP register, places the inverse of each value into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmvnS32N VmvnS32N
//go:noescape
func VmvnS32N(r *arm.Int32, v0 *arm.Int32, n int32)

// Bitwise NOT (vector). This instruction reads each vector element from the source SIMD&FP register, places the inverse of each value into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmvnU8N VmvnU8N
//go:noescape
func VmvnU8N(r *arm.Uint8, v0 *arm.Uint8, n int32)

// Bitwise NOT (vector). This instruction reads each vector element from the source SIMD&FP register, places the inverse of each value into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmvnU16N VmvnU16N
//go:noescape
func VmvnU16N(r *arm.Uint16, v0 *arm.Uint16, n int32)

// Bitwise NOT (vector). This instruction reads each vector element from the source SIMD&FP register, places the inverse of each value into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmvnU32N VmvnU32N
//go:noescape
func VmvnU32N(r *arm.Uint32, v0 *arm.Uint32, n int32)

// Bitwise NOT (vector). This instruction reads each vector element from the source SIMD&FP register, places the inverse of each value into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmvnqS8N VmvnqS8N
//go:noescape
func VmvnqS8N(r *arm.Int8, v0 *arm.Int8, n int32)

// Bitwise NOT (vector). This instruction reads each vector element from the source SIMD&FP register, places the inverse of each value into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmvnqS16N VmvnqS16N
//go:noescape
func VmvnqS16N(r *arm.Int16, v0 *arm.Int16, n int32)

// Bitwise NOT (vector). This instruction reads each vector element from the source SIMD&FP register, places the inverse of each value into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmvnqS32N VmvnqS32N
//go:noescape
func VmvnqS32N(r *arm.Int32, v0 *arm.Int32, n int32)

// Bitwise NOT (vector). This instruction reads each vector element from the source SIMD&FP register, places the inverse of each value into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmvnqU8N VmvnqU8N
//go:noescape
func VmvnqU8N(r *arm.Uint8, v0 *arm.Uint8, n int32)

// Bitwise NOT (vector). This instruction reads each vector element from the source SIMD&FP register, places the inverse of each value into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmvnqU16N VmvnqU16N
//go:noescape
func VmvnqU16N(r *arm.Uint16, v0 *arm.Uint16, n int32)

// Bitwise NOT (vector). This instruction reads each vector element from the source SIMD&FP register, places the inverse of each value into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmvnqU32N VmvnqU32N
//go:noescape
func VmvnqU32N(r *arm.Uint32, v0 *arm.Uint32, n int32)

// Negate (vector). This instruction reads each vector element from the source SIMD&FP register, negates each value, puts the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VnegS8N VnegS8N
//go:noescape
func VnegS8N(r *arm.Int8, v0 *arm.Int8, n int32)

// Negate (vector). This instruction reads each vector element from the source SIMD&FP register, negates each value, puts the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VnegS16N VnegS16N
//go:noescape
func VnegS16N(r *arm.Int16, v0 *arm.Int16, n int32)

// Negate (vector). This instruction reads each vector element from the source SIMD&FP register, negates each value, puts the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VnegS32N VnegS32N
//go:noescape
func VnegS32N(r *arm.Int32, v0 *arm.Int32, n int32)

// Negate (vector). This instruction reads each vector element from the source SIMD&FP register, negates each value, puts the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VnegS64N VnegS64N
//go:noescape
func VnegS64N(r *arm.Int64, v0 *arm.Int64, n int32)

// Floating-point Negate (vector). This instruction negates the value of each vector element in the source SIMD&FP register, writes the result to a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VnegF32N VnegF32N
//go:noescape
func VnegF32N(r *arm.Float32, v0 *arm.Float32, n int32)

// Floating-point Negate (vector). This instruction negates the value of each vector element in the source SIMD&FP register, writes the result to a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VnegF64N VnegF64N
//go:noescape
func VnegF64N(r *arm.Float64, v0 *arm.Float64, n int32)

// Negate (vector). This instruction reads each vector element from the source SIMD&FP register, negates each value, puts the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VnegdS64N VnegdS64N
//go:noescape
func VnegdS64N(r *arm.Int64, v0 *arm.Int64, n int32)

// Negate (vector). This instruction reads each vector element from the source SIMD&FP register, negates each value, puts the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VnegqS8N VnegqS8N
//go:noescape
func VnegqS8N(r *arm.Int8, v0 *arm.Int8, n int32)

// Negate (vector). This instruction reads each vector element from the source SIMD&FP register, negates each value, puts the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VnegqS16N VnegqS16N
//go:noescape
func VnegqS16N(r *arm.Int16, v0 *arm.Int16, n int32)

// Negate (vector). This instruction reads each vector element from the source SIMD&FP register, negates each value, puts the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VnegqS32N VnegqS32N
//go:noescape
func VnegqS32N(r *arm.Int32, v0 *arm.Int32, n int32)

// Negate (vector). This instruction reads each vector element from the source SIMD&FP register, negates each value, puts the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VnegqS64N VnegqS64N
//go:noescape
func VnegqS64N(r *arm.Int64, v0 *arm.Int64, n int32)

// Floating-point Negate (vector). This instruction negates the value of each vector element in the source SIMD&FP register, writes the result to a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VnegqF32N VnegqF32N
//go:noescape
func VnegqF32N(r *arm.Float32, v0 *arm.Float32, n int32)

// Floating-point Negate (vector). This instruction negates the value of each vector element in the source SIMD&FP register, writes the result to a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VnegqF64N VnegqF64N
//go:noescape
func VnegqF64N(r *arm.Float64, v0 *arm.Float64, n int32)

// Bitwise inclusive OR NOT (vector). This instruction performs a bitwise OR NOT between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.
//
//go:linkname VornS8N VornS8N
//go:noescape
func VornS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)

// Bitwise inclusive OR NOT (vector). This instruction performs a bitwise OR NOT between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.
//
//go:linkname VornS16N VornS16N
//go:noescape
func VornS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)

// Bitwise inclusive OR NOT (vector). This instruction performs a bitwise OR NOT between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.
//
//go:linkname VornS32N VornS32N
//go:noescape
func VornS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)

// Bitwise inclusive OR NOT (vector). This instruction performs a bitwise OR NOT between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.
//
//go:linkname VornS64N VornS64N
//go:noescape
func VornS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32)

// Bitwise inclusive OR NOT (vector). This instruction performs a bitwise OR NOT between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.
//
//go:linkname VornU8N VornU8N
//go:noescape
func VornU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)

// Bitwise inclusive OR NOT (vector). This instruction performs a bitwise OR NOT between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.
//
//go:linkname VornU16N VornU16N
//go:noescape
func VornU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)

// Bitwise inclusive OR NOT (vector). This instruction performs a bitwise OR NOT between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.
//
//go:linkname VornU32N VornU32N
//go:noescape
func VornU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)

// Bitwise inclusive OR NOT (vector). This instruction performs a bitwise OR NOT between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.
//
//go:linkname VornU64N VornU64N
//go:noescape
func VornU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32)

// Bitwise inclusive OR NOT (vector). This instruction performs a bitwise OR NOT between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.
//
//go:linkname VornqS8N VornqS8N
//go:noescape
func VornqS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)

// Bitwise inclusive OR NOT (vector). This instruction performs a bitwise OR NOT between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.
//
//go:linkname VornqS16N VornqS16N
//go:noescape
func VornqS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)

// Bitwise inclusive OR NOT (vector). This instruction performs a bitwise OR NOT between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.
//
//go:linkname VornqS32N VornqS32N
//go:noescape
func VornqS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)

// Bitwise inclusive OR NOT (vector). This instruction performs a bitwise OR NOT between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.
//
//go:linkname VornqS64N VornqS64N
//go:noescape
func VornqS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32)

// Bitwise inclusive OR NOT (vector). This instruction performs a bitwise OR NOT between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.
//
//go:linkname VornqU8N VornqU8N
//go:noescape
func VornqU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)

// Bitwise inclusive OR NOT (vector). This instruction performs a bitwise OR NOT between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.
//
//go:linkname VornqU16N VornqU16N
//go:noescape
func VornqU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)

// Bitwise inclusive OR NOT (vector). This instruction performs a bitwise OR NOT between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.
//
//go:linkname VornqU32N VornqU32N
//go:noescape
func VornqU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)

// Bitwise inclusive OR NOT (vector). This instruction performs a bitwise OR NOT between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.
//
//go:linkname VornqU64N VornqU64N
//go:noescape
func VornqU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32)

// Bitwise inclusive OR (vector, register). This instruction performs a bitwise OR between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.
//
//go:linkname VorrS8N VorrS8N
//go:noescape
func VorrS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)

// Bitwise inclusive OR (vector, register). This instruction performs a bitwise OR between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.
//
//go:linkname VorrS16N VorrS16N
//go:noescape
func VorrS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)

// Bitwise inclusive OR (vector, register). This instruction performs a bitwise OR between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.
//
//go:linkname VorrS32N VorrS32N
//go:noescape
func VorrS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)

// Bitwise inclusive OR (vector, register). This instruction performs a bitwise OR between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.
//
//go:linkname VorrS64N VorrS64N
//go:noescape
func VorrS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32)

// Bitwise inclusive OR (vector, register). This instruction performs a bitwise OR between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.
//
//go:linkname VorrU8N VorrU8N
//go:noescape
func VorrU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)

// Bitwise inclusive OR (vector, register). This instruction performs a bitwise OR between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.
//
//go:linkname VorrU16N VorrU16N
//go:noescape
func VorrU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)

// Bitwise inclusive OR (vector, register). This instruction performs a bitwise OR between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.
//
//go:linkname VorrU32N VorrU32N
//go:noescape
func VorrU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)

// Bitwise inclusive OR (vector, register). This instruction performs a bitwise OR between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.
//
//go:linkname VorrU64N VorrU64N
//go:noescape
func VorrU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32)

// Bitwise inclusive OR (vector, register). This instruction performs a bitwise OR between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.
//
//go:linkname VorrqS8N VorrqS8N
//go:noescape
func VorrqS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)

// Bitwise inclusive OR (vector, register). This instruction performs a bitwise OR between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.
//
//go:linkname VorrqS16N VorrqS16N
//go:noescape
func VorrqS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)

// Bitwise inclusive OR (vector, register). This instruction performs a bitwise OR between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.
//
//go:linkname VorrqS32N VorrqS32N
//go:noescape
func VorrqS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)

// Bitwise inclusive OR (vector, register). This instruction performs a bitwise OR between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.
//
//go:linkname VorrqS64N VorrqS64N
//go:noescape
func VorrqS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32)

// Bitwise inclusive OR (vector, register). This instruction performs a bitwise OR between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.
//
//go:linkname VorrqU8N VorrqU8N
//go:noescape
func VorrqU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)

// Bitwise inclusive OR (vector, register). This instruction performs a bitwise OR between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.
//
//go:linkname VorrqU16N VorrqU16N
//go:noescape
func VorrqU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)

// Bitwise inclusive OR (vector, register). This instruction performs a bitwise OR between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.
//
//go:linkname VorrqU32N VorrqU32N
//go:noescape
func VorrqU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)

// Bitwise inclusive OR (vector, register). This instruction performs a bitwise OR between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.
//
//go:linkname VorrqU64N VorrqU64N
//go:noescape
func VorrqU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32)

// Add Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VpaddS8N VpaddS8N
//go:noescape
func VpaddS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)

// Add Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VpaddS16N VpaddS16N
//go:noescape
func VpaddS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)

// Add Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VpaddS32N VpaddS32N
//go:noescape
func VpaddS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)

// Add Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VpaddU8N VpaddU8N
//go:noescape
func VpaddU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)

// Add Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VpaddU16N VpaddU16N
//go:noescape
func VpaddU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)

// Add Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VpaddU32N VpaddU32N
//go:noescape
func VpaddU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)

// Floating-point Add Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.
//
//go:linkname VpaddF32N VpaddF32N
//go:noescape
func VpaddF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)

// Add Pair of elements (scalar). This instruction adds two vector elements in the source SIMD&FP register and writes the scalar result into the destination SIMD&FP register.
//
//go:linkname VpadddS64N VpadddS64N
//go:noescape
func VpadddS64N(r *arm.Int64, v0 *arm.Int64, n int32)

// Add Pair of elements (scalar). This instruction adds two vector elements in the source SIMD&FP register and writes the scalar result into the destination SIMD&FP register.
//
//go:linkname VpadddU64N VpadddU64N
//go:noescape
func VpadddU64N(r *arm.Uint64, v0 *arm.Uint64, n int32)

// Floating-point Add Pair of elements (scalar). This instruction adds two floating-point vector elements in the source SIMD&FP register and writes the scalar result into the destination SIMD&FP register.
//
//go:linkname VpadddF64N VpadddF64N
//go:noescape
func VpadddF64N(r *arm.Float64, v0 *arm.Float64, n int32)

// Add Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VpaddqS8N VpaddqS8N
//go:noescape
func VpaddqS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)

// Add Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VpaddqS16N VpaddqS16N
//go:noescape
func VpaddqS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)

// Add Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VpaddqS32N VpaddqS32N
//go:noescape
func VpaddqS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)

// Add Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VpaddqS64N VpaddqS64N
//go:noescape
func VpaddqS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32)

// Add Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VpaddqU8N VpaddqU8N
//go:noescape
func VpaddqU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)

// Add Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VpaddqU16N VpaddqU16N
//go:noescape
func VpaddqU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)

// Add Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VpaddqU32N VpaddqU32N
//go:noescape
func VpaddqU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)

// Add Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VpaddqU64N VpaddqU64N
//go:noescape
func VpaddqU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32)

// Floating-point Add Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.
//
//go:linkname VpaddqF32N VpaddqF32N
//go:noescape
func VpaddqF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)

// Floating-point Add Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.
//
//go:linkname VpaddqF64N VpaddqF64N
//go:noescape
func VpaddqF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32)

// Floating-point Add Pair of elements (scalar). This instruction adds two floating-point vector elements in the source SIMD&FP register and writes the scalar result into the destination SIMD&FP register.
//
//go:linkname VpaddsF32N VpaddsF32N
//go:noescape
func VpaddsF32N(r *arm.Float32, v0 *arm.Float32, n int32)

// Signed Maximum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the largest of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VpmaxS8N VpmaxS8N
//go:noescape
func VpmaxS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)

// Signed Maximum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the largest of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VpmaxS16N VpmaxS16N
//go:noescape
func VpmaxS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)

// Signed Maximum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the largest of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VpmaxS32N VpmaxS32N
//go:noescape
func VpmaxS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)

// Unsigned Maximum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the largest of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VpmaxU8N VpmaxU8N
//go:noescape
func VpmaxU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)

// Unsigned Maximum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the largest of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VpmaxU16N VpmaxU16N
//go:noescape
func VpmaxU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)

// Unsigned Maximum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the largest of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VpmaxU32N VpmaxU32N
//go:noescape
func VpmaxU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)

// Floating-point Maximum Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, writes the larger of each pair of values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.
//
//go:linkname VpmaxF32N VpmaxF32N
//go:noescape
func VpmaxF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)

// Floating-point Maximum Number Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the largest of each pair of values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.
//
//go:linkname VpmaxnmF32N VpmaxnmF32N
//go:noescape
func VpmaxnmF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)
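
/*
The "Maximum" and "Maximum Number" families differ mainly in NaN handling. The
sketch below contrasts the two for a single pair of values. The names fmax,
fmaxnm and nan32 are hypothetical helpers, and the model assumes quiet NaNs
and ignores signed-zero ordering.

```go
package main

import (
	"fmt"
	"math"
)

// nan32 returns a float32 quiet NaN.
func nan32() float32 { return float32(math.NaN()) }

// fmax models the NaN-propagating maximum: any NaN input yields NaN.
func fmax(a, b float32) float32 {
	if a != a || b != b {
		return nan32()
	}
	if a > b {
		return a
	}
	return b
}

// fmaxnm models "maximum number": when exactly one input is a quiet NaN,
// the numeric input is returned instead of NaN.
func fmaxnm(a, b float32) float32 {
	switch {
	case a != a:
		return b
	case b != b:
		return a
	}
	return fmax(a, b)
}

func main() {
	fmt.Println(fmax(nan32(), 2), fmaxnm(nan32(), 2)) // prints NaN 2
}
```

When neither input is NaN the two functions agree, so the choice only matters
for data that may contain missing or invalid values.
*/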

// Floating-point Maximum Number Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the largest of each pair of values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.
//
//go:linkname VpmaxnmqF32N VpmaxnmqF32N
//go:noescape
func VpmaxnmqF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)

// Floating-point Maximum Number Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the largest of each pair of values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.
//
//go:linkname VpmaxnmqF64N VpmaxnmqF64N
//go:noescape
func VpmaxnmqF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32)

// Floating-point Maximum Number of Pair of elements (scalar). This instruction compares the two vector elements in the source SIMD&FP register and writes the larger of the floating-point values as a scalar to the destination SIMD&FP register.
//
//go:linkname VpmaxnmqdF64N VpmaxnmqdF64N
//go:noescape
func VpmaxnmqdF64N(r *arm.Float64, v0 *arm.Float64, n int32)

// Floating-point Maximum Number of Pair of elements (scalar). This instruction compares the two vector elements in the source SIMD&FP register and writes the larger of the floating-point values as a scalar to the destination SIMD&FP register.
//
//go:linkname VpmaxnmsF32N VpmaxnmsF32N
//go:noescape
func VpmaxnmsF32N(r *arm.Float32, v0 *arm.Float32, n int32)

// Signed Maximum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the largest of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VpmaxqS8N VpmaxqS8N
//go:noescape
func VpmaxqS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)

// Signed Maximum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the largest of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VpmaxqS16N VpmaxqS16N
//go:noescape
func VpmaxqS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)

// Signed Maximum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the largest of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VpmaxqS32N VpmaxqS32N
//go:noescape
func VpmaxqS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)

// Unsigned Maximum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the largest of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VpmaxqU8N VpmaxqU8N
//go:noescape
func VpmaxqU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)

// Unsigned Maximum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the largest of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VpmaxqU16N VpmaxqU16N
//go:noescape
func VpmaxqU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)

// Unsigned Maximum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the largest of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VpmaxqU32N VpmaxqU32N
//go:noescape
func VpmaxqU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)

// Floating-point Maximum Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, writes the larger of each pair of values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.
//
//go:linkname VpmaxqF32N VpmaxqF32N
//go:noescape
func VpmaxqF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)

// Floating-point Maximum Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, writes the larger of each pair of values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.
//
//go:linkname VpmaxqF64N VpmaxqF64N
//go:noescape
func VpmaxqF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32)

// Floating-point Maximum of Pair of elements (scalar). This instruction compares the two vector elements in the source SIMD&FP register and writes the larger of the floating-point values as a scalar to the destination SIMD&FP register.
//
//go:linkname VpmaxqdF64N VpmaxqdF64N
//go:noescape
func VpmaxqdF64N(r *arm.Float64, v0 *arm.Float64, n int32)

// Floating-point Maximum of Pair of elements (scalar). This instruction compares the two vector elements in the source SIMD&FP register and writes the larger of the floating-point values as a scalar to the destination SIMD&FP register.
//
//go:linkname VpmaxsF32N VpmaxsF32N
//go:noescape
func VpmaxsF32N(r *arm.Float32, v0 *arm.Float32, n int32)

// Signed Minimum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the smallest of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VpminS8N VpminS8N
//go:noescape
func VpminS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)

// Signed Minimum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the smallest of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VpminS16N VpminS16N
//go:noescape
func VpminS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)

// Signed Minimum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the smallest of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VpminS32N VpminS32N
//go:noescape
func VpminS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)

// Unsigned Minimum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the smallest of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VpminU8N VpminU8N
//go:noescape
func VpminU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)

// Unsigned Minimum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the smallest of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VpminU16N VpminU16N
//go:noescape
func VpminU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)

// Unsigned Minimum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the smallest of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VpminU32N VpminU32N
//go:noescape
func VpminU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)

// Floating-point Minimum Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, writes the smaller of each pair of values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.
//
//go:linkname VpminF32N VpminF32N
//go:noescape
func VpminF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)

// Floating-point Minimum Number Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the smallest of each pair of floating-point values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.
//
//go:linkname VpminnmF32N VpminnmF32N
//go:noescape
func VpminnmF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)

// Floating-point Minimum Number Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the smallest of each pair of floating-point values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.
//
//go:linkname VpminnmqF32N VpminnmqF32N
//go:noescape
func VpminnmqF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)

// Floating-point Minimum Number Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the smallest of each pair of floating-point values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.
//
//go:linkname VpminnmqF64N VpminnmqF64N
//go:noescape
func VpminnmqF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32)

// Floating-point Minimum Number of Pair of elements (scalar). This instruction compares the two vector elements in the source SIMD&FP register and writes the smaller of the floating-point values as a scalar to the destination SIMD&FP register.
//
//go:linkname VpminnmqdF64N VpminnmqdF64N
//go:noescape
func VpminnmqdF64N(r *arm.Float64, v0 *arm.Float64, n int32)

// Floating-point Minimum Number of Pair of elements (scalar). This instruction compares the two vector elements in the source SIMD&FP register and writes the smaller of the floating-point values as a scalar to the destination SIMD&FP register.
//
//go:linkname VpminnmsF32N VpminnmsF32N
//go:noescape
func VpminnmsF32N(r *arm.Float32, v0 *arm.Float32, n int32)

// Signed Minimum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the smallest of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VpminqS8N VpminqS8N
//go:noescape
func VpminqS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)

// Signed Minimum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the smallest of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VpminqS16N VpminqS16N
//go:noescape
func VpminqS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)

// Signed Minimum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the smallest of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VpminqS32N VpminqS32N
//go:noescape
func VpminqS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)

// Unsigned Minimum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the smallest of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VpminqU8N VpminqU8N
//go:noescape
func VpminqU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)

// Unsigned Minimum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the smallest of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VpminqU16N VpminqU16N
//go:noescape
func VpminqU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)

// Unsigned Minimum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the smallest of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VpminqU32N VpminqU32N
//go:noescape
func VpminqU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)

// Floating-point Minimum Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, writes the smaller of each pair of values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.
//
//go:linkname VpminqF32N VpminqF32N
//go:noescape
func VpminqF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)

// Floating-point Minimum Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, writes the smaller of each pair of values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.
//
//go:linkname VpminqF64N VpminqF64N
//go:noescape
func VpminqF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32)

// Floating-point Minimum of Pair of elements (scalar). This instruction compares the two vector elements in the source SIMD&FP register and writes the smaller of the floating-point values as a scalar to the destination SIMD&FP register.
//
//go:linkname VpminqdF64N VpminqdF64N
//go:noescape
func VpminqdF64N(r *arm.Float64, v0 *arm.Float64, n int32)
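
/*
Unlike the two-source pairwise forms, the single-source variants reduce one
two-lane vector to a scalar. A minimal sketch of that reduction, assuming
NaN-propagating semantics and ignoring signed-zero ordering; minPairF64 is a
hypothetical helper, not part of this package.

```go
package main

import (
	"fmt"
	"math"
)

// minPairF64 is a hypothetical scalar model of reducing a two-lane float64
// vector to the smaller of its two lanes. Any NaN lane yields NaN.
func minPairF64(v [2]float64) float64 {
	a, b := v[0], v[1]
	if a != a || b != b {
		return math.NaN()
	}
	if a < b {
		return a
	}
	return b
}

func main() {
	fmt.Println(minPairF64([2]float64{3.5, -1.25})) // prints -1.25
}
```
*/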

// Floating-point Minimum of Pair of elements (scalar). This instruction compares the two vector elements in the source SIMD&FP register and writes the smaller of the floating-point values as a scalar to the destination SIMD&FP register.
//
//go:linkname VpminsF32N VpminsF32N
//go:noescape
func VpminsF32N(r *arm.Float32, v0 *arm.Float32, n int32)

// Signed saturating Absolute value. This instruction reads each vector element from the source SIMD&FP register, takes its absolute value, saturating to the largest positive value when the input is the most negative value, places the results into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values.
//
//go:linkname VqabsS8N VqabsS8N
//go:noescape
func VqabsS8N(r *arm.Int8, v0 *arm.Int8, n int32)
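
/*
The only input for which saturation matters here is the most negative value,
whose true absolute value does not fit in the element type. A minimal
per-element sketch for 8-bit lanes; qabsS8 is a hypothetical helper, not part
of this package.

```go
package main

import (
	"fmt"
	"math"
)

// qabsS8 models a saturating absolute value on one signed 8-bit element.
// Plain negation of -128 wraps back to -128 in Go, so it is clamped to 127.
func qabsS8(x int8) int8 {
	if x == math.MinInt8 {
		return math.MaxInt8
	}
	if x < 0 {
		return -x
	}
	return x
}

func main() {
	fmt.Println(qabsS8(-128), qabsS8(-5), qabsS8(7)) // prints 127 5 7
}
```
*/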

// Signed saturating Absolute value. This instruction reads each vector element from the source SIMD&FP register, takes its absolute value, saturating to the largest positive value when the input is the most negative value, places the results into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values.
//
//go:linkname VqabsS16N VqabsS16N
//go:noescape
func VqabsS16N(r *arm.Int16, v0 *arm.Int16, n int32)

// Signed saturating Absolute value. This instruction reads each vector element from the source SIMD&FP register, takes its absolute value, saturating to the largest positive value when the input is the most negative value, places the results into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values.
//
//go:linkname VqabsS32N VqabsS32N
//go:noescape
func VqabsS32N(r *arm.Int32, v0 *arm.Int32, n int32)

// Signed saturating Absolute value. This instruction reads each vector element from the source SIMD&FP register, takes its absolute value, saturating to the largest positive value when the input is the most negative value, places the results into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values.
//
//go:linkname VqabsS64N VqabsS64N
//go:noescape
func VqabsS64N(r *arm.Int64, v0 *arm.Int64, n int32)

// Signed saturating Absolute value. This instruction reads each vector element from the source SIMD&FP register, takes its absolute value, saturating to the largest positive value when the input is the most negative value, places the results into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values.
//
//go:linkname VqabsbS8N VqabsbS8N
//go:noescape
func VqabsbS8N(r *arm.Int8, v0 *arm.Int8, n int32)

// Signed saturating Absolute value. This instruction reads each vector element from the source SIMD&FP register, takes its absolute value, saturating to the largest positive value when the input is the most negative value, places the results into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values.
//
//go:linkname VqabsdS64N VqabsdS64N
//go:noescape
func VqabsdS64N(r *arm.Int64, v0 *arm.Int64, n int32)

// Signed saturating Absolute value. This instruction reads each vector element from the source SIMD&FP register, takes its absolute value, saturating to the largest positive value when the input is the most negative value, places the results into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values.
//
//go:linkname VqabshS16N VqabshS16N
//go:noescape
func VqabshS16N(r *arm.Int16, v0 *arm.Int16, n int32)

// Signed saturating Absolute value. This instruction reads each vector element from the source SIMD&FP register, takes its absolute value, saturating to the largest positive value when the input is the most negative value, places the results into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values.
//
//go:linkname VqabsqS8N VqabsqS8N
//go:noescape
func VqabsqS8N(r *arm.Int8, v0 *arm.Int8, n int32)

// Signed saturating Absolute value. This instruction reads each vector element from the source SIMD&FP register, takes its absolute value, saturating to the largest positive value when the input is the most negative value, places the results into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values.
//
//go:linkname VqabsqS16N VqabsqS16N
//go:noescape
func VqabsqS16N(r *arm.Int16, v0 *arm.Int16, n int32)

// Signed saturating Absolute value. This instruction reads each vector element from the source SIMD&FP register, takes its absolute value, saturating to the largest positive value when the input is the most negative value, places the results into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values.
//
//go:linkname VqabsqS32N VqabsqS32N
//go:noescape
func VqabsqS32N(r *arm.Int32, v0 *arm.Int32, n int32)

// Signed saturating Absolute value. This instruction reads each vector element from the source SIMD&FP register, takes its absolute value, saturating to the largest positive value when the input is the most negative value, places the results into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values.
//
//go:linkname VqabsqS64N VqabsqS64N
//go:noescape
func VqabsqS64N(r *arm.Int64, v0 *arm.Int64, n int32)

// Signed saturating Absolute value. This instruction reads each vector element from the source SIMD&FP register, takes its absolute value, saturating to the largest positive value when the input is the most negative value, places the results into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values.
//
//go:linkname VqabssS32N VqabssS32N
//go:noescape
func VqabssS32N(r *arm.Int32, v0 *arm.Int32, n int32)

// Signed saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqaddS8N VqaddS8N
//go:noescape
func VqaddS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)
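
/*
The description above does not spell out what "saturating" means for the sum.
A minimal per-element sketch for signed 8-bit lanes: the sum is computed in a
wider type and clamped to the int8 range instead of wrapping. qaddS8 is a
hypothetical helper, not part of this package.

```go
package main

import (
	"fmt"
	"math"
)

// qaddS8 models a signed saturating add on one 8-bit element.
func qaddS8(a, b int8) int8 {
	s := int16(a) + int16(b) // cannot overflow int16
	if s > math.MaxInt8 {
		return math.MaxInt8
	}
	if s < math.MinInt8 {
		return math.MinInt8
	}
	return int8(s)
}

func main() {
	fmt.Println(qaddS8(100, 100), qaddS8(-100, -100), qaddS8(1, 2)) // prints 127 -128 3
}
```

For comparison, a plain Go int8 addition of 100 and 100 wraps around to -56.
*/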

// Signed saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqaddS16N VqaddS16N
//go:noescape
func VqaddS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)

// Signed saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqaddS32N VqaddS32N
//go:noescape
func VqaddS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)

// Signed saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqaddS64N VqaddS64N
//go:noescape
func VqaddS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32)

// Unsigned saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqaddU8N VqaddU8N
//go:noescape
func VqaddU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)
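
/*
For unsigned lanes, saturation only happens at the top of the range. A minimal
per-element sketch for 8-bit lanes that detects wrap-around without widening;
qaddU8 is a hypothetical helper, not part of this package.

```go
package main

import (
	"fmt"
	"math"
)

// qaddU8 models an unsigned saturating add on one 8-bit element.
// If the wrapped sum is smaller than an operand, the true sum overflowed.
func qaddU8(a, b uint8) uint8 {
	s := a + b // wraps modulo 256
	if s < a {
		return math.MaxUint8
	}
	return s
}

func main() {
	fmt.Println(qaddU8(200, 100), qaddU8(1, 2), qaddU8(255, 0)) // prints 255 3 255
}
```
*/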

// Unsigned saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqaddU16N VqaddU16N
//go:noescape
func VqaddU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)

// Unsigned saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqaddU32N VqaddU32N
//go:noescape
func VqaddU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)

// Unsigned saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqaddU64N VqaddU64N
//go:noescape
func VqaddU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32)

// Signed saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqaddbS8N VqaddbS8N
//go:noescape
func VqaddbS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)

// Unsigned saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqaddbU8N VqaddbU8N
//go:noescape
func VqaddbU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)

// Signed saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqadddS64N VqadddS64N
//go:noescape
func VqadddS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32)

// Unsigned saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqadddU64N VqadddU64N
//go:noescape
func VqadddU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32)

// Signed saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqaddhS16N VqaddhS16N
//go:noescape
func VqaddhS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)

// Unsigned saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqaddhU16N VqaddhU16N
//go:noescape
func VqaddhU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)

// Signed saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqaddqS8N VqaddqS8N
//go:noescape
func VqaddqS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)

// Signed saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqaddqS16N VqaddqS16N
//go:noescape
func VqaddqS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)

// Signed saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqaddqS32N VqaddqS32N
//go:noescape
func VqaddqS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)

// Signed saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqaddqS64N VqaddqS64N
//go:noescape
func VqaddqS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32)

// Unsigned saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqaddqU8N VqaddqU8N
//go:noescape
func VqaddqU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)

// Unsigned saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqaddqU16N VqaddqU16N
//go:noescape
func VqaddqU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)

// Unsigned saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqaddqU32N VqaddqU32N
//go:noescape
func VqaddqU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)

// Unsigned saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqaddqU64N VqaddqU64N
//go:noescape
func VqaddqU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32)

// Signed saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqaddsS32N VqaddsS32N
//go:noescape
func VqaddsS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)

// Unsigned saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqaddsU32N VqaddsU32N
//go:noescape
func VqaddsU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)

// Signed saturating Doubling Multiply returning High half. This instruction multiplies the values of corresponding elements of the two source SIMD&FP registers, doubles the results, places the most significant half of the final results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqdmulhS16N VqdmulhS16N
//go:noescape
func VqdmulhS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)

// Signed saturating Doubling Multiply returning High half. This instruction multiplies the values of corresponding elements of the two source SIMD&FP registers, doubles the results, places the most significant half of the final results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqdmulhS32N VqdmulhS32N
//go:noescape
func VqdmulhS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)

// Signed saturating Doubling Multiply returning High half. This instruction multiplies the values of corresponding elements of the two source SIMD&FP registers, doubles the results, places the most significant half of the final results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqdmulhhS16N VqdmulhhS16N
//go:noescape
func VqdmulhhS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)

// Signed saturating Doubling Multiply returning High half. This instruction multiplies the values of corresponding elements of the two source SIMD&FP registers, doubles the results, places the most significant half of the final results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqdmulhqS16N VqdmulhqS16N
//go:noescape
func VqdmulhqS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)

// Signed saturating Doubling Multiply returning High half. This instruction multiplies the values of corresponding elements of the two source SIMD&FP registers, doubles the results, places the most significant half of the final results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqdmulhqS32N VqdmulhqS32N
//go:noescape
func VqdmulhqS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)

// Signed saturating Doubling Multiply returning High half. This instruction multiplies the values of corresponding elements of the two source SIMD&FP registers, doubles the results, places the most significant half of the final results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqdmulhsS32N VqdmulhsS32N
//go:noescape
func VqdmulhsS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)

// Signed saturating Negate. This instruction reads each vector element from the source SIMD&FP register, negates each value, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values.
//
//go:linkname VqnegS8N VqnegS8N
//go:noescape
func VqnegS8N(r *arm.Int8, v0 *arm.Int8, n int32)

// Signed saturating Negate. This instruction reads each vector element from the source SIMD&FP register, negates each value, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values.
//
//go:linkname VqnegS16N VqnegS16N
//go:noescape
func VqnegS16N(r *arm.Int16, v0 *arm.Int16, n int32)

// Signed saturating Negate. This instruction reads each vector element from the source SIMD&FP register, negates each value, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values.
//
//go:linkname VqnegS32N VqnegS32N
//go:noescape
func VqnegS32N(r *arm.Int32, v0 *arm.Int32, n int32)

// Signed saturating Negate. This instruction reads each vector element from the source SIMD&FP register, negates each value, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values.
//
//go:linkname VqnegS64N VqnegS64N
//go:noescape
func VqnegS64N(r *arm.Int64, v0 *arm.Int64, n int32)

// Signed saturating Negate. This instruction reads each vector element from the source SIMD&FP register, negates each value, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values.
//
//go:linkname VqnegbS8N VqnegbS8N
//go:noescape
func VqnegbS8N(r *arm.Int8, v0 *arm.Int8, n int32)

// Signed saturating Negate. This instruction reads each vector element from the source SIMD&FP register, negates each value, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values.
//
//go:linkname VqnegdS64N VqnegdS64N
//go:noescape
func VqnegdS64N(r *arm.Int64, v0 *arm.Int64, n int32)

// Signed saturating Negate. This instruction reads each vector element from the source SIMD&FP register, negates each value, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values.
//
//go:linkname VqneghS16N VqneghS16N
//go:noescape
func VqneghS16N(r *arm.Int16, v0 *arm.Int16, n int32)

// Signed saturating Negate. This instruction reads each vector element from the source SIMD&FP register, negates each value, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values.
//
//go:linkname VqnegqS8N VqnegqS8N
//go:noescape
func VqnegqS8N(r *arm.Int8, v0 *arm.Int8, n int32)

// Signed saturating Negate. This instruction reads each vector element from the source SIMD&FP register, negates each value, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values.
//
//go:linkname VqnegqS16N VqnegqS16N
//go:noescape
func VqnegqS16N(r *arm.Int16, v0 *arm.Int16, n int32)

// Signed saturating Negate. This instruction reads each vector element from the source SIMD&FP register, negates each value, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values.
//
//go:linkname VqnegqS32N VqnegqS32N
//go:noescape
func VqnegqS32N(r *arm.Int32, v0 *arm.Int32, n int32)

// Signed saturating Negate. This instruction reads each vector element from the source SIMD&FP register, negates each value, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values.
//
//go:linkname VqnegqS64N VqnegqS64N
//go:noescape
func VqnegqS64N(r *arm.Int64, v0 *arm.Int64, n int32)

// Signed saturating Negate. This instruction reads each vector element from the source SIMD&FP register, negates each value, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values.
//
//go:linkname VqnegsS32N VqnegsS32N
//go:noescape
func VqnegsS32N(r *arm.Int32, v0 *arm.Int32, n int32)

// Signed saturating Rounding Doubling Multiply returning High half. This instruction multiplies the values of corresponding elements of the two source SIMD&FP registers, doubles the results, places the most significant half of the final results into a vector, and writes the vector to the destination SIMD&FP register. The results are rounded rather than truncated.
//
//go:linkname VqrdmulhS16N VqrdmulhS16N
//go:noescape
func VqrdmulhS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)

// Signed saturating Rounding Doubling Multiply returning High half. This instruction multiplies the values of corresponding elements of the two source SIMD&FP registers, doubles the results, places the most significant half of the final results into a vector, and writes the vector to the destination SIMD&FP register. The results are rounded rather than truncated.
//
//go:linkname VqrdmulhS32N VqrdmulhS32N
//go:noescape
func VqrdmulhS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)

// Signed saturating Rounding Doubling Multiply returning High half. This instruction multiplies the values of corresponding elements of the two source SIMD&FP registers, doubles the results, places the most significant half of the final results into a vector, and writes the vector to the destination SIMD&FP register. The results are rounded rather than truncated.
//
//go:linkname VqrdmulhhS16N VqrdmulhhS16N
//go:noescape
func VqrdmulhhS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)

// Signed saturating Rounding Doubling Multiply returning High half. This instruction multiplies the values of corresponding elements of the two source SIMD&FP registers, doubles the results, places the most significant half of the final results into a vector, and writes the vector to the destination SIMD&FP register. The results are rounded rather than truncated.
//
//go:linkname VqrdmulhqS16N VqrdmulhqS16N
//go:noescape
func VqrdmulhqS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)

// Signed saturating Rounding Doubling Multiply returning High half. This instruction multiplies the values of corresponding elements of the two source SIMD&FP registers, doubles the results, places the most significant half of the final results into a vector, and writes the vector to the destination SIMD&FP register. The results are rounded rather than truncated.
//
//go:linkname VqrdmulhqS32N VqrdmulhqS32N
//go:noescape
func VqrdmulhqS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)

// Signed saturating Rounding Doubling Multiply returning High half. This instruction multiplies the values of corresponding elements of the two source SIMD&FP registers, doubles the results, places the most significant half of the final results into a vector, and writes the vector to the destination SIMD&FP register. The results are rounded rather than truncated.
//
//go:linkname VqrdmulhsS32N VqrdmulhsS32N
//go:noescape
func VqrdmulhsS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)

// Signed saturating Rounding Shift Left (register). This instruction takes each vector element in the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqrshlS8N VqrshlS8N
//go:noescape
func VqrshlS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)

// Signed saturating Rounding Shift Left (register). This instruction takes each vector element in the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqrshlS16N VqrshlS16N
//go:noescape
func VqrshlS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)

// Signed saturating Rounding Shift Left (register). This instruction takes each vector element in the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqrshlS32N VqrshlS32N
//go:noescape
func VqrshlS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)

// Signed saturating Rounding Shift Left (register). This instruction takes each vector element in the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqrshlS64N VqrshlS64N
//go:noescape
func VqrshlS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32)

// Signed saturating Rounding Shift Left (register). This instruction takes each vector element in the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqrshlbS8N VqrshlbS8N
//go:noescape
func VqrshlbS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)

// Signed saturating Rounding Shift Left (register). This instruction takes each vector element in the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqrshldS64N VqrshldS64N
//go:noescape
func VqrshldS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32)

// Signed saturating Rounding Shift Left (register). This instruction takes each vector element in the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqrshlhS16N VqrshlhS16N
//go:noescape
func VqrshlhS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)

// Signed saturating Rounding Shift Left (register). This instruction takes each vector element in the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqrshlqS8N VqrshlqS8N
//go:noescape
func VqrshlqS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)

// Signed saturating Rounding Shift Left (register). This instruction takes each vector element in the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqrshlqS16N VqrshlqS16N
//go:noescape
func VqrshlqS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)

// Signed saturating Rounding Shift Left (register). This instruction takes each vector element in the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqrshlqS32N VqrshlqS32N
//go:noescape
func VqrshlqS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)

// Signed saturating Rounding Shift Left (register). This instruction takes each vector element in the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqrshlqS64N VqrshlqS64N
//go:noescape
func VqrshlqS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32)

// Signed saturating Rounding Shift Left (register). This instruction takes each vector element in the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqrshlsS32N VqrshlsS32N
//go:noescape
func VqrshlsS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)

// Signed saturating Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts each element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqshlS8N VqshlS8N
//go:noescape
func VqshlS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)

// Signed saturating Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts each element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqshlS16N VqshlS16N
//go:noescape
func VqshlS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)

// Signed saturating Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts each element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqshlS32N VqshlS32N
//go:noescape
func VqshlS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)

// Signed saturating Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts each element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqshlS64N VqshlS64N
//go:noescape
func VqshlS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32)

// Signed saturating Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts each element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqshlbS8N VqshlbS8N
//go:noescape
func VqshlbS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)

// Signed saturating Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts each element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqshldS64N VqshldS64N
//go:noescape
func VqshldS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32)

// Signed saturating Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts each element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqshlhS16N VqshlhS16N
//go:noescape
func VqshlhS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)

// Signed saturating Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts each element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqshlqS8N VqshlqS8N
//go:noescape
func VqshlqS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)

// Signed saturating Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts each element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqshlqS16N VqshlqS16N
//go:noescape
func VqshlqS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)

// Signed saturating Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts each element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqshlqS32N VqshlqS32N
//go:noescape
func VqshlqS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)

// Signed saturating Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts each element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqshlqS64N VqshlqS64N
//go:noescape
func VqshlqS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32)

// Signed saturating Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts each element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqshlsS32N VqshlsS32N
//go:noescape
func VqshlsS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)

// Signed saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqsubS8N VqsubS8N
//go:noescape
func VqsubS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)

// Signed saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqsubS16N VqsubS16N
//go:noescape
func VqsubS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)

// Signed saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqsubS32N VqsubS32N
//go:noescape
func VqsubS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)

// Signed saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqsubS64N VqsubS64N
//go:noescape
func VqsubS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32)

// Unsigned saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqsubU8N VqsubU8N
//go:noescape
func VqsubU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)

// Unsigned saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqsubU16N VqsubU16N
//go:noescape
func VqsubU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)

// Unsigned saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqsubU32N VqsubU32N
//go:noescape
func VqsubU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)

// Unsigned saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqsubU64N VqsubU64N
//go:noescape
func VqsubU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32)

// Signed saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqsubbS8N VqsubbS8N
//go:noescape
func VqsubbS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)

// Unsigned saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqsubbU8N VqsubbU8N
//go:noescape
func VqsubbU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)

// Signed saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqsubdS64N VqsubdS64N
//go:noescape
func VqsubdS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32)

// Unsigned saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqsubdU64N VqsubdU64N
//go:noescape
func VqsubdU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32)

// Signed saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqsubhS16N VqsubhS16N
//go:noescape
func VqsubhS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)

// Unsigned saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqsubhU16N VqsubhU16N
//go:noescape
func VqsubhU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)

// Signed saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqsubqS8N VqsubqS8N
//go:noescape
func VqsubqS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)

// Signed saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqsubqS16N VqsubqS16N
//go:noescape
func VqsubqS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)

// Signed saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqsubqS32N VqsubqS32N
//go:noescape
func VqsubqS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)

// Signed saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqsubqS64N VqsubqS64N
//go:noescape
func VqsubqS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32)

// Unsigned saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqsubqU8N VqsubqU8N
//go:noescape
func VqsubqU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)

// Unsigned saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqsubqU16N VqsubqU16N
//go:noescape
func VqsubqU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)

// Unsigned saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqsubqU32N VqsubqU32N
//go:noescape
func VqsubqU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)

// Unsigned saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqsubqU64N VqsubqU64N
//go:noescape
func VqsubqU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32)

// Signed saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqsubsS32N VqsubsS32N
//go:noescape
func VqsubsS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)

// Unsigned saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqsubsU32N VqsubsU32N
//go:noescape
func VqsubsU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)
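
/*
The saturating behaviour described for the Vqsub family can be sketched per lane in plain Go.
This is a scalar reference only, not the code the bindings execute; satSubU8 and satSubS32
are hypothetical helper names introduced for illustration.

```go
package main

import (
	"fmt"
	"math"
)

// satSubU8 is a hypothetical scalar reference for one unsigned lane:
// a result below zero clamps to 0 instead of wrapping around.
func satSubU8(a, b uint8) uint8 {
	if b > a {
		return 0
	}
	return a - b
}

// satSubS32 is a hypothetical scalar reference for one signed lane:
// the difference is computed in 64 bits and clamped to the int32 range.
func satSubS32(a, b int32) int32 {
	d := int64(a) - int64(b)
	if d > math.MaxInt32 {
		return math.MaxInt32
	}
	if d < math.MinInt32 {
		return math.MinInt32
	}
	return int32(d)
}

func main() {
	fmt.Println(satSubU8(10, 20))             // 0
	fmt.Println(satSubU8(200, 50))            // 150
	fmt.Println(satSubS32(math.MinInt32, 1))  // -2147483648
	fmt.Println(satSubS32(math.MaxInt32, -1)) // 2147483647
}
```

A reference like this can serve as an oracle when testing the vectorised functions on hardware.
*/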

// Table vector Lookup. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the result for that lookup is 0. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vqtbl1QU8N Vqtbl1QU8N
//go:noescape
func Vqtbl1QU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)
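
/*
The single-register table lookup can be modelled in plain Go as below. This is a sketch that
assumes a 16-byte table and 16 index bytes, with the table taken as the first operand; tbl1
is a hypothetical name, not part of this package.

```go
package main

import "fmt"

// tbl1 is a hypothetical scalar model of a one-register table lookup:
// each index byte selects a table byte, and an out-of-range index yields 0.
func tbl1(table, idx [16]uint8) [16]uint8 {
	var out [16]uint8
	for i, ix := range idx {
		if int(ix) < len(table) {
			out[i] = table[ix]
		}
	}
	return out
}

func main() {
	var table, idx [16]uint8
	for i := range table {
		table[i] = uint8(100 + i)
	}
	idx[0], idx[1], idx[2], idx[3] = 3, 15, 16, 255
	out := tbl1(table, idx)
	fmt.Println(out[:4]) // [103 115 0 0]
}
```
*/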

// Rotate and Exclusive OR rotates each 64-bit element of the 128-bit vector in a source SIMD&FP register left by 1, performs a bitwise exclusive OR of the resulting 128-bit vector and the vector in another source SIMD&FP register, and writes the result to the destination SIMD&FP register.
//
//go:linkname Vrax1QU64N Vrax1QU64N
//go:noescape
func Vrax1QU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32)
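
/*
Per 64-bit lane, Rotate and Exclusive OR can be sketched in plain Go as follows. The sketch
assumes the second operand is the one that is rotated, as in the Arm RAX1 definition; rax1
is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"math/bits"
)

// rax1 is a hypothetical scalar model of one 64-bit lane:
// rotate b left by one bit, then XOR the result with a.
func rax1(a, b uint64) uint64 {
	return a ^ bits.RotateLeft64(b, 1)
}

func main() {
	fmt.Printf("%#x\n", rax1(0, 0x8000000000000000)) // 0x1
	fmt.Printf("%#x\n", rax1(0xff, 1))               // 0xfd
}
```
*/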

// Reverse Bit order (vector). This instruction reads each vector element from the source SIMD&FP register, reverses the bits of the element, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VrbitS8N VrbitS8N
//go:noescape
func VrbitS8N(r *arm.Int8, v0 *arm.Int8, n int32)

// Reverse Bit order (vector). This instruction reads each vector element from the source SIMD&FP register, reverses the bits of the element, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VrbitU8N VrbitU8N
//go:noescape
func VrbitU8N(r *arm.Uint8, v0 *arm.Uint8, n int32)

// Reverse Bit order (vector). This instruction reads each vector element from the source SIMD&FP register, reverses the bits of the element, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VrbitqS8N VrbitqS8N
//go:noescape
func VrbitqS8N(r *arm.Int8, v0 *arm.Int8, n int32)

// Reverse Bit order (vector). This instruction reads each vector element from the source SIMD&FP register, reverses the bits of the element, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VrbitqU8N VrbitqU8N
//go:noescape
func VrbitqU8N(r *arm.Uint8, v0 *arm.Uint8, n int32)
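
/*
Bit reversal within each 8-bit element corresponds to bits.Reverse8 from the standard library.
A minimal scalar sketch, with rbitU8 and rbitS8 as hypothetical helper names:

```go
package main

import (
	"fmt"
	"math/bits"
)

// rbitU8 reverses the bit order of one unsigned 8-bit lane.
func rbitU8(x uint8) uint8 {
	return bits.Reverse8(x)
}

// rbitS8 does the same for a signed lane by working on its bit pattern.
func rbitS8(x int8) int8 {
	return int8(bits.Reverse8(uint8(x)))
}

func main() {
	fmt.Printf("%#x\n", rbitU8(0x01)) // 0x80
	fmt.Printf("%#x\n", rbitU8(0x0f)) // 0xf0
	fmt.Println(rbitS8(1))            // -128
}
```
*/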

// Unsigned Reciprocal Estimate. This instruction reads each vector element from the source SIMD&FP register, calculates an approximate inverse for the unsigned integer value, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VrecpeU32N VrecpeU32N
//go:noescape
func VrecpeU32N(r *arm.Uint32, v0 *arm.Uint32, n int32)

// Floating-point Reciprocal Estimate. This instruction finds an approximate reciprocal estimate for each vector element in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VrecpeF32N VrecpeF32N
//go:noescape
func VrecpeF32N(r *arm.Float32, v0 *arm.Float32, n int32)

// Floating-point Reciprocal Estimate. This instruction finds an approximate reciprocal estimate for each vector element in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VrecpeF64N VrecpeF64N
//go:noescape
func VrecpeF64N(r *arm.Float64, v0 *arm.Float64, n int32)

// Floating-point Reciprocal Estimate. This instruction finds an approximate reciprocal estimate for each vector element in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VrecpedF64N VrecpedF64N
//go:noescape
func VrecpedF64N(r *arm.Float64, v0 *arm.Float64, n int32)

// Unsigned Reciprocal Estimate. This instruction reads each vector element from the source SIMD&FP register, calculates an approximate inverse for the unsigned integer value, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VrecpeqU32N VrecpeqU32N
//go:noescape
func VrecpeqU32N(r *arm.Uint32, v0 *arm.Uint32, n int32)

// Floating-point Reciprocal Estimate. This instruction finds an approximate reciprocal estimate for each vector element in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VrecpeqF32N VrecpeqF32N
//go:noescape
func VrecpeqF32N(r *arm.Float32, v0 *arm.Float32, n int32)

// Floating-point Reciprocal Estimate. This instruction finds an approximate reciprocal estimate for each vector element in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VrecpeqF64N VrecpeqF64N
//go:noescape
func VrecpeqF64N(r *arm.Float64, v0 *arm.Float64, n int32)

// Floating-point Reciprocal Estimate. This instruction finds an approximate reciprocal estimate for each vector element in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VrecpesF32N VrecpesF32N
//go:noescape
func VrecpesF32N(r *arm.Float32, v0 *arm.Float32, n int32)

// Floating-point Reciprocal Step. This instruction multiplies the corresponding floating-point values in the vectors of the two source SIMD&FP registers, subtracts each of the products from 2.0, places the resulting floating-point values in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VrecpsF32N VrecpsF32N
//go:noescape
func VrecpsF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)

// Floating-point Reciprocal Step. This instruction multiplies the corresponding floating-point values in the vectors of the two source SIMD&FP registers, subtracts each of the products from 2.0, places the resulting floating-point values in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VrecpsF64N VrecpsF64N
//go:noescape
func VrecpsF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32)

// Floating-point Reciprocal Step. This instruction multiplies the corresponding floating-point values in the vectors of the two source SIMD&FP registers, subtracts each of the products from 2.0, places the resulting floating-point values in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VrecpsdF64N VrecpsdF64N
//go:noescape
func VrecpsdF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32)

// Floating-point Reciprocal Step. This instruction multiplies the corresponding floating-point values in the vectors of the two source SIMD&FP registers, subtracts each of the products from 2.0, places the resulting floating-point values in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VrecpsqF32N VrecpsqF32N
//go:noescape
func VrecpsqF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)

// Floating-point Reciprocal Step. This instruction multiplies the corresponding floating-point values in the vectors of the two source SIMD&FP registers, subtracts each of the products from 2.0, places the resulting floating-point values in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VrecpsqF64N VrecpsqF64N
//go:noescape
func VrecpsqF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32)

// Floating-point Reciprocal Step. This instruction multiplies the corresponding floating-point values in the vectors of the two source SIMD&FP registers, subtracts each of the products from 2.0, places the resulting floating-point values in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VrecpssF32N VrecpssF32N
//go:noescape
func VrecpssF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)
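
/*
The reciprocal estimate and reciprocal step operations are typically combined into a
Newton-Raphson refinement: given an estimate x of 1/d, the step value (2 - d*x) is multiplied
back into x. The sketch below models that in scalar Go. It starts from a hand-picked rough
estimate rather than the hardware estimate, and recipStep and refine are hypothetical names.

```go
package main

import "fmt"

// recipStep models one lane of the reciprocal step: 2.0 minus the product.
func recipStep(d, x float32) float32 {
	return 2 - d*x
}

// refine applies iters Newton-Raphson iterations to an estimate x of 1/d.
func refine(d, x float32, iters int) float32 {
	for i := 0; i < iters; i++ {
		x *= recipStep(d, x)
	}
	return x
}

func main() {
	// 0.3 stands in for a coarse estimate of 1/3.
	fmt.Println(refine(3, 0.3, 3))
}
```

Each iteration roughly squares the relative error, so a few steps are enough to reach
single-precision accuracy from a coarse estimate.
*/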

// Floating-point Reciprocal exponent (scalar). This instruction finds an approximate reciprocal exponent for each vector element in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VrecpxdF64N VrecpxdF64N
//go:noescape
func VrecpxdF64N(r *arm.Float64, v0 *arm.Float64, n int32)

// Floating-point Reciprocal exponent (scalar). This instruction finds an approximate reciprocal exponent for each vector element in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VrecpxsF32N VrecpxsF32N
//go:noescape
func VrecpxsF32N(r *arm.Float32, v0 *arm.Float32, n int32)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretF32S32N VreinterpretF32S32N
//go:noescape
func VreinterpretF32S32N(r *arm.Float32, v0 *arm.Int32, n int32)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretF32U32N VreinterpretF32U32N
//go:noescape
func VreinterpretF32U32N(r *arm.Float32, v0 *arm.Uint32, n int32)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretF64S64N VreinterpretF64S64N
//go:noescape
func VreinterpretF64S64N(r *arm.Float64, v0 *arm.Int64, n int32)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretF64U64N VreinterpretF64U64N
//go:noescape
func VreinterpretF64U64N(r *arm.Float64, v0 *arm.Uint64, n int32)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretS16U16N VreinterpretS16U16N
//go:noescape
func VreinterpretS16U16N(r *arm.Int16, v0 *arm.Uint16, n int32)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretS32U32N VreinterpretS32U32N
//go:noescape
func VreinterpretS32U32N(r *arm.Int32, v0 *arm.Uint32, n int32)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretS32F32N VreinterpretS32F32N
//go:noescape
func VreinterpretS32F32N(r *arm.Int32, v0 *arm.Float32, n int32)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretS64U64N VreinterpretS64U64N
//go:noescape
func VreinterpretS64U64N(r *arm.Int64, v0 *arm.Uint64, n int32)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretS64F64N VreinterpretS64F64N
//go:noescape
func VreinterpretS64F64N(r *arm.Int64, v0 *arm.Float64, n int32)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretS8U8N VreinterpretS8U8N
//go:noescape
func VreinterpretS8U8N(r *arm.Int8, v0 *arm.Uint8, n int32)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretU16S16N VreinterpretU16S16N
//go:noescape
func VreinterpretU16S16N(r *arm.Uint16, v0 *arm.Int16, n int32)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretU32S32N VreinterpretU32S32N
//go:noescape
func VreinterpretU32S32N(r *arm.Uint32, v0 *arm.Int32, n int32)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretU32F32N VreinterpretU32F32N
//go:noescape
func VreinterpretU32F32N(r *arm.Uint32, v0 *arm.Float32, n int32)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretU64S64N VreinterpretU64S64N
//go:noescape
func VreinterpretU64S64N(r *arm.Uint64, v0 *arm.Int64, n int32)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretU64F64N VreinterpretU64F64N
//go:noescape
func VreinterpretU64F64N(r *arm.Uint64, v0 *arm.Float64, n int32)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretU8S8N VreinterpretU8S8N
//go:noescape
func VreinterpretU8S8N(r *arm.Uint8, v0 *arm.Int8, n int32)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqF32S32N VreinterpretqF32S32N
//go:noescape
func VreinterpretqF32S32N(r *arm.Float32, v0 *arm.Int32, n int32)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqF32U32N VreinterpretqF32U32N
//go:noescape
func VreinterpretqF32U32N(r *arm.Float32, v0 *arm.Uint32, n int32)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqF64S64N VreinterpretqF64S64N
//go:noescape
func VreinterpretqF64S64N(r *arm.Float64, v0 *arm.Int64, n int32)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqF64U64N VreinterpretqF64U64N
//go:noescape
func VreinterpretqF64U64N(r *arm.Float64, v0 *arm.Uint64, n int32)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqS16U16N VreinterpretqS16U16N
//go:noescape
func VreinterpretqS16U16N(r *arm.Int16, v0 *arm.Uint16, n int32)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqS32U32N VreinterpretqS32U32N
//go:noescape
func VreinterpretqS32U32N(r *arm.Int32, v0 *arm.Uint32, n int32)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqS32F32N VreinterpretqS32F32N
//go:noescape
func VreinterpretqS32F32N(r *arm.Int32, v0 *arm.Float32, n int32)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqS64U64N VreinterpretqS64U64N
//go:noescape
func VreinterpretqS64U64N(r *arm.Int64, v0 *arm.Uint64, n int32)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqS64F64N VreinterpretqS64F64N
//go:noescape
func VreinterpretqS64F64N(r *arm.Int64, v0 *arm.Float64, n int32)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqS8U8N VreinterpretqS8U8N
//go:noescape
func VreinterpretqS8U8N(r *arm.Int8, v0 *arm.Uint8, n int32)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqU16S16N VreinterpretqU16S16N
//go:noescape
func VreinterpretqU16S16N(r *arm.Uint16, v0 *arm.Int16, n int32)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqU32S32N VreinterpretqU32S32N
//go:noescape
func VreinterpretqU32S32N(r *arm.Uint32, v0 *arm.Int32, n int32)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqU32F32N VreinterpretqU32F32N
//go:noescape
func VreinterpretqU32F32N(r *arm.Uint32, v0 *arm.Float32, n int32)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqU64S64N VreinterpretqU64S64N
//go:noescape
func VreinterpretqU64S64N(r *arm.Uint64, v0 *arm.Int64, n int32)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqU64F64N VreinterpretqU64F64N
//go:noescape
func VreinterpretqU64F64N(r *arm.Uint64, v0 *arm.Float64, n int32)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqU8S8N VreinterpretqU8S8N
//go:noescape
func VreinterpretqU8S8N(r *arm.Uint8, v0 *arm.Int8, n int32)
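
/*
A reinterpret cast keeps the bits and changes only the element type. In scalar Go the same
idea is expressed with math.Float32bits and math.Float32frombits. The helper names below are
hypothetical and only mirror the naming pattern used above.

```go
package main

import (
	"fmt"
	"math"
)

// reinterpretF32U32 views the bits of a uint32 as a float32 without conversion.
func reinterpretF32U32(u uint32) float32 {
	return math.Float32frombits(u)
}

// reinterpretU32F32 views the bits of a float32 as a uint32 without conversion.
func reinterpretU32F32(f float32) uint32 {
	return math.Float32bits(f)
}

func main() {
	fmt.Println(reinterpretF32U32(0x3f800000)) // 1
	fmt.Printf("%#x\n", reinterpretU32F32(-2)) // 0xc0000000
}
```

This differs from a numeric conversion such as float32(u), which changes the bit pattern.
*/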

// Reverse elements in 16-bit halfwords (vector). This instruction reverses the order of 8-bit elements in each halfword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vrev16S8N Vrev16S8N
//go:noescape
func Vrev16S8N(r *arm.Int8, v0 *arm.Int8, n int32)

// Reverse elements in 16-bit halfwords (vector). This instruction reverses the order of 8-bit elements in each halfword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vrev16U8N Vrev16U8N
//go:noescape
func Vrev16U8N(r *arm.Uint8, v0 *arm.Uint8, n int32)

// Reverse elements in 16-bit halfwords (vector). This instruction reverses the order of 8-bit elements in each halfword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vrev16QS8N Vrev16QS8N
//go:noescape
func Vrev16QS8N(r *arm.Int8, v0 *arm.Int8, n int32)

// Reverse elements in 16-bit halfwords (vector). This instruction reverses the order of 8-bit elements in each halfword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vrev16QU8N Vrev16QU8N
//go:noescape
func Vrev16QU8N(r *arm.Uint8, v0 *arm.Uint8, n int32)

// Reverse elements in 32-bit words (vector). This instruction reverses the order of 8-bit or 16-bit elements in each word of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vrev32S8N Vrev32S8N
//go:noescape
func Vrev32S8N(r *arm.Int8, v0 *arm.Int8, n int32)

// Reverse elements in 32-bit words (vector). This instruction reverses the order of 8-bit or 16-bit elements in each word of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vrev32S16N Vrev32S16N
//go:noescape
func Vrev32S16N(r *arm.Int16, v0 *arm.Int16, n int32)

// Reverse elements in 32-bit words (vector). This instruction reverses the order of 8-bit or 16-bit elements in each word of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vrev32U8N Vrev32U8N
//go:noescape
func Vrev32U8N(r *arm.Uint8, v0 *arm.Uint8, n int32)

// Reverse elements in 32-bit words (vector). This instruction reverses the order of 8-bit or 16-bit elements in each word of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vrev32U16N Vrev32U16N
//go:noescape
func Vrev32U16N(r *arm.Uint16, v0 *arm.Uint16, n int32)

// Reverse elements in 32-bit words (vector). This instruction reverses the order of 8-bit or 16-bit elements in each word of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vrev32QS8N Vrev32QS8N
//go:noescape
func Vrev32QS8N(r *arm.Int8, v0 *arm.Int8, n int32)

// Reverse elements in 32-bit words (vector). This instruction reverses the order of 8-bit or 16-bit elements in each word of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vrev32QS16N Vrev32QS16N
//go:noescape
func Vrev32QS16N(r *arm.Int16, v0 *arm.Int16, n int32)

// Reverse elements in 32-bit words (vector). This instruction reverses the order of 8-bit or 16-bit elements in each word of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vrev32QU8N Vrev32QU8N
//go:noescape
func Vrev32QU8N(r *arm.Uint8, v0 *arm.Uint8, n int32)

// Reverse elements in 32-bit words (vector). This instruction reverses the order of 8-bit or 16-bit elements in each word of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vrev32QU16N Vrev32QU16N
//go:noescape
func Vrev32QU16N(r *arm.Uint16, v0 *arm.Uint16, n int32)

// Reverse elements in 64-bit doublewords (vector). This instruction reverses the order of 8-bit, 16-bit, or 32-bit elements in each doubleword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vrev64S8N Vrev64S8N
//go:noescape
func Vrev64S8N(r *arm.Int8, v0 *arm.Int8, n int32)

// Reverse elements in 64-bit doublewords (vector). This instruction reverses the order of 8-bit, 16-bit, or 32-bit elements in each doubleword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vrev64S16N Vrev64S16N
//go:noescape
func Vrev64S16N(r *arm.Int16, v0 *arm.Int16, n int32)

// Reverse elements in 64-bit doublewords (vector). This instruction reverses the order of 8-bit, 16-bit, or 32-bit elements in each doubleword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vrev64S32N Vrev64S32N
//go:noescape
func Vrev64S32N(r *arm.Int32, v0 *arm.Int32, n int32)

// Reverse elements in 64-bit doublewords (vector). This instruction reverses the order of 8-bit, 16-bit, or 32-bit elements in each doubleword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vrev64U8N Vrev64U8N
//go:noescape
func Vrev64U8N(r *arm.Uint8, v0 *arm.Uint8, n int32)

// Reverse elements in 64-bit doublewords (vector). This instruction reverses the order of 8-bit, 16-bit, or 32-bit elements in each doubleword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vrev64U16N Vrev64U16N
//go:noescape
func Vrev64U16N(r *arm.Uint16, v0 *arm.Uint16, n int32)

// Reverse elements in 64-bit doublewords (vector). This instruction reverses the order of 8-bit, 16-bit, or 32-bit elements in each doubleword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vrev64U32N Vrev64U32N
//go:noescape
func Vrev64U32N(r *arm.Uint32, v0 *arm.Uint32, n int32)

// Reverse elements in 64-bit doublewords (vector). This instruction reverses the order of 8-bit, 16-bit, or 32-bit elements in each doubleword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vrev64F32N Vrev64F32N
//go:noescape
func Vrev64F32N(r *arm.Float32, v0 *arm.Float32, n int32)

// Reverse elements in 64-bit doublewords (vector). This instruction reverses the order of 8-bit, 16-bit, or 32-bit elements in each doubleword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vrev64QS8N Vrev64QS8N
//go:noescape
func Vrev64QS8N(r *arm.Int8, v0 *arm.Int8, n int32)

// Reverse elements in 64-bit doublewords (vector). This instruction reverses the order of 8-bit, 16-bit, or 32-bit elements in each doubleword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vrev64QS16N Vrev64QS16N
//go:noescape
func Vrev64QS16N(r *arm.Int16, v0 *arm.Int16, n int32)

// Reverse elements in 64-bit doublewords (vector). This instruction reverses the order of 8-bit, 16-bit, or 32-bit elements in each doubleword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vrev64QS32N Vrev64QS32N
//go:noescape
func Vrev64QS32N(r *arm.Int32, v0 *arm.Int32, n int32)

// Reverse elements in 64-bit doublewords (vector). This instruction reverses the order of 8-bit, 16-bit, or 32-bit elements in each doubleword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vrev64QU8N Vrev64QU8N
//go:noescape
func Vrev64QU8N(r *arm.Uint8, v0 *arm.Uint8, n int32)

// Reverse elements in 64-bit doublewords (vector). This instruction reverses the order of 8-bit, 16-bit, or 32-bit elements in each doubleword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vrev64QU16N Vrev64QU16N
//go:noescape
func Vrev64QU16N(r *arm.Uint16, v0 *arm.Uint16, n int32)

// Reverse elements in 64-bit doublewords (vector). This instruction reverses the order of 8-bit, 16-bit, or 32-bit elements in each doubleword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vrev64QU32N Vrev64QU32N
//go:noescape
func Vrev64QU32N(r *arm.Uint32, v0 *arm.Uint32, n int32)

// Reverse elements in 64-bit doublewords (vector). This instruction reverses the order of 8-bit, 16-bit, or 32-bit elements in each doubleword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vrev64QF32N Vrev64QF32N
//go:noescape
func Vrev64QF32N(r *arm.Float32, v0 *arm.Float32, n int32)
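
/*
The Vrev16, Vrev32 and Vrev64 families all reverse element order inside fixed-size groups.
For 8-bit elements this can be sketched in plain Go with one helper parameterised by the
group size in bytes (2, 4 or 8). revGroups is a hypothetical name; bytes that do not fill a
complete group are left as zero in this sketch.

```go
package main

import "fmt"

// revGroups reverses the order of bytes inside each complete group of the given size.
func revGroups(src []uint8, group int) []uint8 {
	out := make([]uint8, len(src))
	for base := 0; base+group <= len(src); base += group {
		for i := 0; i < group; i++ {
			out[base+i] = src[base+group-1-i]
		}
	}
	return out
}

func main() {
	src := []uint8{1, 2, 3, 4, 5, 6, 7, 8}
	fmt.Println(revGroups(src, 2)) // [2 1 4 3 6 5 8 7]
	fmt.Println(revGroups(src, 4)) // [4 3 2 1 8 7 6 5]
	fmt.Println(revGroups(src, 8)) // [8 7 6 5 4 3 2 1]
}
```
*/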

// Signed Rounding Halving Add. This instruction adds corresponding signed integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register. The results are rounded rather than truncated.
//
//go:linkname VrhaddS8N VrhaddS8N
//go:noescape
func VrhaddS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)

// Signed Rounding Halving Add. This instruction adds corresponding signed integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register. The results are rounded rather than truncated.
//
//go:linkname VrhaddS16N VrhaddS16N
//go:noescape
func VrhaddS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)

// Signed Rounding Halving Add. This instruction adds corresponding signed integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register. The results are rounded rather than truncated.
//
//go:linkname VrhaddS32N VrhaddS32N
//go:noescape
func VrhaddS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)

// Unsigned Rounding Halving Add. This instruction adds corresponding unsigned integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register. The results are rounded rather than truncated.
//
//go:linkname VrhaddU8N VrhaddU8N
//go:noescape
func VrhaddU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)

// Unsigned Rounding Halving Add. This instruction adds corresponding unsigned integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register. The results are rounded rather than truncated.
//
//go:linkname VrhaddU16N VrhaddU16N
//go:noescape
func VrhaddU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)

// Unsigned Rounding Halving Add. This instruction adds corresponding unsigned integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register. The results are rounded rather than truncated.
//
//go:linkname VrhaddU32N VrhaddU32N
//go:noescape
func VrhaddU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)

// Signed Rounding Halving Add. This instruction adds corresponding signed integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register. The results are rounded rather than truncated.
//
//go:linkname VrhaddqS8N VrhaddqS8N
//go:noescape
func VrhaddqS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)

// Signed Rounding Halving Add. This instruction adds corresponding signed integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register. The results are rounded rather than truncated.
//
//go:linkname VrhaddqS16N VrhaddqS16N
//go:noescape
func VrhaddqS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)

// Signed Rounding Halving Add. This instruction adds corresponding signed integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register. The results are rounded rather than truncated.
//
//go:linkname VrhaddqS32N VrhaddqS32N
//go:noescape
func VrhaddqS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)

// Unsigned Rounding Halving Add. This instruction adds corresponding unsigned integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register. The results are rounded rather than truncated.
//
//go:linkname VrhaddqU8N VrhaddqU8N
//go:noescape
func VrhaddqU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)

// Unsigned Rounding Halving Add. This instruction adds corresponding unsigned integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register. The results are rounded rather than truncated.
//
//go:linkname VrhaddqU16N VrhaddqU16N
//go:noescape
func VrhaddqU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)

// Unsigned Rounding Halving Add. This instruction adds corresponding unsigned integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register. The results are rounded rather than truncated.
//
//go:linkname VrhaddqU32N VrhaddqU32N
//go:noescape
func VrhaddqU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)
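
/*
A rounding halving add computes (a + b + 1) >> 1 in a wider intermediate type, so the sum
cannot overflow and halves round up. The sketch below models one 8-bit lane in plain Go;
rhaddU8 and rhaddS8 are hypothetical helper names.

```go
package main

import "fmt"

// rhaddU8 models one unsigned lane: widen, add the rounding bit, halve.
func rhaddU8(a, b uint8) uint8 {
	return uint8((uint16(a) + uint16(b) + 1) >> 1)
}

// rhaddS8 models one signed lane using an arithmetic right shift.
func rhaddS8(a, b int8) int8 {
	return int8((int16(a) + int16(b) + 1) >> 1)
}

func main() {
	fmt.Println(rhaddU8(1, 2))     // 2 (a truncating halving add would give 1)
	fmt.Println(rhaddU8(255, 255)) // 255
	fmt.Println(rhaddS8(-1, -2))   // -1
	fmt.Println(rhaddS8(127, 127)) // 127
}
```
*/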

// Floating-point Round to Integral, toward Zero (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VrndF32N VrndF32N
//go:noescape
func VrndF32N(r *arm.Float32, v0 *arm.Float32, n int32)

// Floating-point Round to Integral, toward Zero (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VrndF64N VrndF64N
//go:noescape
func VrndF64N(r *arm.Float64, v0 *arm.Float64, n int32)
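
/*
Rounding to an integral value toward zero matches math.Trunc in the standard library. A
scalar sketch for one lane, with rndF32 and rndF64 as hypothetical helper names:

```go
package main

import (
	"fmt"
	"math"
)

// rndF64 rounds one float64 lane toward zero, keeping the floating-point type.
func rndF64(x float64) float64 {
	return math.Trunc(x)
}

// rndF32 does the same for a float32 lane. Widening to float64 and back is exact here.
func rndF32(x float32) float32 {
	return float32(math.Trunc(float64(x)))
}

func main() {
	fmt.Println(rndF64(2.7))  // 2
	fmt.Println(rndF64(-2.7)) // -2
	fmt.Println(rndF32(-0.5)) // -0
}
```

The result stays a floating-point value. No conversion to an integer type takes place.
*/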

// Floating-point Round to 32-bit Integer, using current rounding mode (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values that fit into a 32-bit integer size using the rounding mode that is determined by the FPCR, and writes the result to the SIMD&FP destination register.
//
//go:linkname Vrnd32XF32N Vrnd32XF32N
//go:noescape
func Vrnd32XF32N(r *arm.Float32, v0 *arm.Float32, n int32)

// Floating-point Round to 32-bit Integer, using current rounding mode (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values that fit into a 32-bit integer size using the rounding mode that is determined by the FPCR, and writes the result to the SIMD&FP destination register.
//
//go:linkname Vrnd32XF64N Vrnd32XF64N
//go:noescape
func Vrnd32XF64N(r *arm.Float64, v0 *arm.Float64, n int32)

// Floating-point Round to 32-bit Integer, using current rounding mode (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values that fit into a 32-bit integer size using the rounding mode that is determined by the FPCR, and writes the result to the SIMD&FP destination register.
//
//go:linkname Vrnd32XqF32N Vrnd32XqF32N
//go:noescape
func Vrnd32XqF32N(r *arm.Float32, v0 *arm.Float32, n int32)

// Floating-point Round to 32-bit Integer, using current rounding mode (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values that fit into a 32-bit integer size using the rounding mode that is determined by the FPCR, and writes the result to the SIMD&FP destination register.
//
//go:linkname Vrnd32XqF64N Vrnd32XqF64N
//go:noescape
func Vrnd32XqF64N(r *arm.Float64, v0 *arm.Float64, n int32)

// Floating-point Round to 32-bit Integer toward Zero (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values that fit into a 32-bit integer size using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname Vrnd32ZF32N Vrnd32ZF32N
//go:noescape
func Vrnd32ZF32N(r *arm.Float32, v0 *arm.Float32, n int32)

// Floating-point Round to 32-bit Integer toward Zero (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values that fit into a 32-bit integer size using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname Vrnd32ZF64N Vrnd32ZF64N
//go:noescape
func Vrnd32ZF64N(r *arm.Float64, v0 *arm.Float64, n int32)

// Floating-point Round to 32-bit Integer toward Zero (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values that fit into a 32-bit integer size using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname Vrnd32ZqF32N Vrnd32ZqF32N
//go:noescape
func Vrnd32ZqF32N(r *arm.Float32, v0 *arm.Float32, n int32)

// Floating-point Round to 32-bit Integer toward Zero (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values that fit into a 32-bit integer size using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname Vrnd32ZqF64N Vrnd32ZqF64N
//go:noescape
func Vrnd32ZqF64N(r *arm.Float64, v0 *arm.Float64, n int32)

// Floating-point Round to 64-bit Integer, using current rounding mode (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values that fit into a 64-bit integer size using the rounding mode that is determined by the FPCR, and writes the result to the SIMD&FP destination register.
//
//go:linkname Vrnd64XF32N Vrnd64XF32N
//go:noescape
func Vrnd64XF32N(r *arm.Float32, v0 *arm.Float32, n int32)

// Floating-point Round to 64-bit Integer, using current rounding mode (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values that fit into a 64-bit integer size using the rounding mode that is determined by the FPCR, and writes the result to the SIMD&FP destination register.
//
//go:linkname Vrnd64XF64N Vrnd64XF64N
//go:noescape
func Vrnd64XF64N(r *arm.Float64, v0 *arm.Float64, n int32)

// Floating-point Round to 64-bit Integer, using current rounding mode (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values that fit into a 64-bit integer size using the rounding mode that is determined by the FPCR, and writes the result to the SIMD&FP destination register.
//
//go:linkname Vrnd64XqF32N Vrnd64XqF32N
//go:noescape
func Vrnd64XqF32N(r *arm.Float32, v0 *arm.Float32, n int32)

// Floating-point Round to 64-bit Integer, using current rounding mode (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values that fit into a 64-bit integer size using the rounding mode that is determined by the FPCR, and writes the result to the SIMD&FP destination register.
//
//go:linkname Vrnd64XqF64N Vrnd64XqF64N
//go:noescape
func Vrnd64XqF64N(r *arm.Float64, v0 *arm.Float64, n int32)

// Floating-point Round to 64-bit Integer toward Zero (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values that fit into a 64-bit integer size using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname Vrnd64ZF32N Vrnd64ZF32N
//go:noescape
func Vrnd64ZF32N(r *arm.Float32, v0 *arm.Float32, n int32)

// Floating-point Round to 64-bit Integer toward Zero (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values that fit into a 64-bit integer size using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname Vrnd64ZF64N Vrnd64ZF64N
//go:noescape
func Vrnd64ZF64N(r *arm.Float64, v0 *arm.Float64, n int32)

// Floating-point Round to 64-bit Integer toward Zero (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values that fit into a 64-bit integer size using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname Vrnd64ZqF32N Vrnd64ZqF32N
//go:noescape
func Vrnd64ZqF32N(r *arm.Float32, v0 *arm.Float32, n int32)

// Floating-point Round to 64-bit Integer toward Zero (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values that fit into a 64-bit integer size using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname Vrnd64ZqF64N Vrnd64ZqF64N
//go:noescape
func Vrnd64ZqF64N(r *arm.Float64, v0 *arm.Float64, n int32)

// Floating-point Round to Integral, to nearest with ties to Away (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round to Nearest with Ties to Away rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VrndaF32N VrndaF32N
//go:noescape
func VrndaF32N(r *arm.Float32, v0 *arm.Float32, n int32)

// Floating-point Round to Integral, to nearest with ties to Away (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round to Nearest with Ties to Away rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VrndaF64N VrndaF64N
//go:noescape
func VrndaF64N(r *arm.Float64, v0 *arm.Float64, n int32)

// Floating-point Round to Integral, to nearest with ties to Away (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round to Nearest with Ties to Away rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VrndaqF32N VrndaqF32N
//go:noescape
func VrndaqF32N(r *arm.Float32, v0 *arm.Float32, n int32)

// Floating-point Round to Integral, to nearest with ties to Away (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round to Nearest with Ties to Away rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VrndaqF64N VrndaqF64N
//go:noescape
func VrndaqF64N(r *arm.Float64, v0 *arm.Float64, n int32)

// Floating-point Round to Integral, using current rounding mode (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the rounding mode that is determined by the FPCR, and writes the result to the SIMD&FP destination register.
//
//go:linkname VrndiF32N VrndiF32N
//go:noescape
func VrndiF32N(r *arm.Float32, v0 *arm.Float32, n int32)

// Floating-point Round to Integral, using current rounding mode (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the rounding mode that is determined by the FPCR, and writes the result to the SIMD&FP destination register.
//
//go:linkname VrndiF64N VrndiF64N
//go:noescape
func VrndiF64N(r *arm.Float64, v0 *arm.Float64, n int32)

// Floating-point Round to Integral, using current rounding mode (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the rounding mode that is determined by the FPCR, and writes the result to the SIMD&FP destination register.
//
//go:linkname VrndiqF32N VrndiqF32N
//go:noescape
func VrndiqF32N(r *arm.Float32, v0 *arm.Float32, n int32)

// Floating-point Round to Integral, using current rounding mode (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the rounding mode that is determined by the FPCR, and writes the result to the SIMD&FP destination register.
//
//go:linkname VrndiqF64N VrndiqF64N
//go:noescape
func VrndiqF64N(r *arm.Float64, v0 *arm.Float64, n int32)

// Floating-point Round to Integral, toward Minus infinity (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round towards Minus Infinity rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VrndmF32N VrndmF32N
//go:noescape
func VrndmF32N(r *arm.Float32, v0 *arm.Float32, n int32)

// Floating-point Round to Integral, toward Minus infinity (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round towards Minus Infinity rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VrndmF64N VrndmF64N
//go:noescape
func VrndmF64N(r *arm.Float64, v0 *arm.Float64, n int32)

// Floating-point Round to Integral, toward Minus infinity (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round towards Minus Infinity rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VrndmqF32N VrndmqF32N
//go:noescape
func VrndmqF32N(r *arm.Float32, v0 *arm.Float32, n int32)

// Floating-point Round to Integral, toward Minus infinity (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round towards Minus Infinity rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VrndmqF64N VrndmqF64N
//go:noescape
func VrndmqF64N(r *arm.Float64, v0 *arm.Float64, n int32)

// Floating-point Round to Integral, to nearest with ties to even (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round to Nearest rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VrndnF32N VrndnF32N
//go:noescape
func VrndnF32N(r *arm.Float32, v0 *arm.Float32, n int32)

// Floating-point Round to Integral, to nearest with ties to even (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round to Nearest rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VrndnF64N VrndnF64N
//go:noescape
func VrndnF64N(r *arm.Float64, v0 *arm.Float64, n int32)

// Floating-point Round to Integral, to nearest with ties to even (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round to Nearest rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VrndnqF32N VrndnqF32N
//go:noescape
func VrndnqF32N(r *arm.Float32, v0 *arm.Float32, n int32)

// Floating-point Round to Integral, to nearest with ties to even (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round to Nearest rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VrndnqF64N VrndnqF64N
//go:noescape
func VrndnqF64N(r *arm.Float64, v0 *arm.Float64, n int32)

// Floating-point Round to Integral, to nearest with ties to even (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round to Nearest rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VrndnsF32N VrndnsF32N
//go:noescape
func VrndnsF32N(r *arm.Float32, v0 *arm.Float32, n int32)

// Floating-point Round to Integral, toward Plus infinity (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round towards Plus Infinity rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VrndpF32N VrndpF32N
//go:noescape
func VrndpF32N(r *arm.Float32, v0 *arm.Float32, n int32)

// Floating-point Round to Integral, toward Plus infinity (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round towards Plus Infinity rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VrndpF64N VrndpF64N
//go:noescape
func VrndpF64N(r *arm.Float64, v0 *arm.Float64, n int32)

// Floating-point Round to Integral, toward Plus infinity (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round towards Plus Infinity rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VrndpqF32N VrndpqF32N
//go:noescape
func VrndpqF32N(r *arm.Float32, v0 *arm.Float32, n int32)

// Floating-point Round to Integral, toward Plus infinity (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round towards Plus Infinity rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VrndpqF64N VrndpqF64N
//go:noescape
func VrndpqF64N(r *arm.Float64, v0 *arm.Float64, n int32)

// Floating-point Round to Integral, toward Zero (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VrndqF32N VrndqF32N
//go:noescape
func VrndqF32N(r *arm.Float32, v0 *arm.Float32, n int32)

// Floating-point Round to Integral, toward Zero (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VrndqF64N VrndqF64N
//go:noescape
func VrndqF64N(r *arm.Float64, v0 *arm.Float64, n int32)

// Floating-point Round to Integral exact, using current rounding mode (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the rounding mode that is determined by the FPCR, and writes the result to the SIMD&FP destination register. When a result value is not numerically equal to the corresponding input value, an Inexact exception is raised.
//
//go:linkname VrndxF32N VrndxF32N
//go:noescape
func VrndxF32N(r *arm.Float32, v0 *arm.Float32, n int32)

// Floating-point Round to Integral exact, using current rounding mode (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the rounding mode that is determined by the FPCR, and writes the result to the SIMD&FP destination register. When a result value is not numerically equal to the corresponding input value, an Inexact exception is raised.
//
//go:linkname VrndxF64N VrndxF64N
//go:noescape
func VrndxF64N(r *arm.Float64, v0 *arm.Float64, n int32)

// Floating-point Round to Integral exact, using current rounding mode (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the rounding mode that is determined by the FPCR, and writes the result to the SIMD&FP destination register. When a result value is not numerically equal to the corresponding input value, an Inexact exception is raised.
//
//go:linkname VrndxqF32N VrndxqF32N
//go:noescape
func VrndxqF32N(r *arm.Float32, v0 *arm.Float32, n int32)

// Floating-point Round to Integral exact, using current rounding mode (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the rounding mode that is determined by the FPCR, and writes the result to the SIMD&FP destination register. When a result value is not numerically equal to the corresponding input value, an Inexact exception is raised.
//
//go:linkname VrndxqF64N VrndxqF64N
//go:noescape
func VrndxqF64N(r *arm.Float64, v0 *arm.Float64, n int32)

// Signed Rounding Shift Left (register). This instruction takes each signed integer value in the vector of the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. If the shift value is positive, the operation is a left shift; if it is negative, it is a rounding right shift.
//
//go:linkname VrshlS8N VrshlS8N
//go:noescape
func VrshlS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)

// Signed Rounding Shift Left (register). This instruction takes each signed integer value in the vector of the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. If the shift value is positive, the operation is a left shift; if it is negative, it is a rounding right shift.
//
//go:linkname VrshlS16N VrshlS16N
//go:noescape
func VrshlS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)

// Signed Rounding Shift Left (register). This instruction takes each signed integer value in the vector of the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. If the shift value is positive, the operation is a left shift; if it is negative, it is a rounding right shift.
//
//go:linkname VrshlS32N VrshlS32N
//go:noescape
func VrshlS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)

// Signed Rounding Shift Left (register). This instruction takes each signed integer value in the vector of the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. If the shift value is positive, the operation is a left shift; if it is negative, it is a rounding right shift.
//
//go:linkname VrshlS64N VrshlS64N
//go:noescape
func VrshlS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32)

// Signed Rounding Shift Left (register). This instruction takes each signed integer value in the vector of the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. If the shift value is positive, the operation is a left shift; if it is negative, it is a rounding right shift.
//
//go:linkname VrshldS64N VrshldS64N
//go:noescape
func VrshldS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32)

// Signed Rounding Shift Left (register). This instruction takes each signed integer value in the vector of the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. If the shift value is positive, the operation is a left shift; if it is negative, it is a rounding right shift.
//
//go:linkname VrshlqS8N VrshlqS8N
//go:noescape
func VrshlqS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)

// Signed Rounding Shift Left (register). This instruction takes each signed integer value in the vector of the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. If the shift value is positive, the operation is a left shift; if it is negative, it is a rounding right shift.
//
//go:linkname VrshlqS16N VrshlqS16N
//go:noescape
func VrshlqS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)

// Signed Rounding Shift Left (register). This instruction takes each signed integer value in the vector of the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. If the shift value is positive, the operation is a left shift; if it is negative, it is a rounding right shift.
//
//go:linkname VrshlqS32N VrshlqS32N
//go:noescape
func VrshlqS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)

// Signed Rounding Shift Left (register). This instruction takes each signed integer value in the vector of the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. If the shift value is positive, the operation is a left shift; if it is negative, it is a rounding right shift.
//
//go:linkname VrshlqS64N VrshlqS64N
//go:noescape
func VrshlqS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32)

// Unsigned Reciprocal Square Root Estimate. This instruction reads each vector element from the source SIMD&FP register, calculates an approximate inverse square root for each value, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are unsigned integer values.
//
//go:linkname VrsqrteU32N VrsqrteU32N
//go:noescape
func VrsqrteU32N(r *arm.Uint32, v0 *arm.Uint32, n int32)

// Floating-point Reciprocal Square Root Estimate. This instruction calculates an approximate reciprocal square root for each vector element in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VrsqrteF32N VrsqrteF32N
//go:noescape
func VrsqrteF32N(r *arm.Float32, v0 *arm.Float32, n int32)

// Floating-point Reciprocal Square Root Estimate. This instruction calculates an approximate reciprocal square root for each vector element in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VrsqrteF64N VrsqrteF64N
//go:noescape
func VrsqrteF64N(r *arm.Float64, v0 *arm.Float64, n int32)

// Floating-point Reciprocal Square Root Estimate. This instruction calculates an approximate reciprocal square root for each vector element in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VrsqrtedF64N VrsqrtedF64N
//go:noescape
func VrsqrtedF64N(r *arm.Float64, v0 *arm.Float64, n int32)

// Unsigned Reciprocal Square Root Estimate. This instruction reads each vector element from the source SIMD&FP register, calculates an approximate inverse square root for each value, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are unsigned integer values.
//
//go:linkname VrsqrteqU32N VrsqrteqU32N
//go:noescape
func VrsqrteqU32N(r *arm.Uint32, v0 *arm.Uint32, n int32)

// Floating-point Reciprocal Square Root Estimate. This instruction calculates an approximate reciprocal square root for each vector element in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VrsqrteqF32N VrsqrteqF32N
//go:noescape
func VrsqrteqF32N(r *arm.Float32, v0 *arm.Float32, n int32)

// Floating-point Reciprocal Square Root Estimate. This instruction calculates an approximate reciprocal square root for each vector element in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VrsqrteqF64N VrsqrteqF64N
//go:noescape
func VrsqrteqF64N(r *arm.Float64, v0 *arm.Float64, n int32)

// Floating-point Reciprocal Square Root Estimate. This instruction calculates an approximate reciprocal square root for each vector element in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VrsqrtesF32N VrsqrtesF32N
//go:noescape
func VrsqrtesF32N(r *arm.Float32, v0 *arm.Float32, n int32)

// Floating-point Reciprocal Square Root Step. This instruction multiplies corresponding floating-point values in the vectors of the two source SIMD&FP registers, subtracts each of the products from 3.0, divides these results by 2.0, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VrsqrtsF32N VrsqrtsF32N
//go:noescape
func VrsqrtsF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)

// Floating-point Reciprocal Square Root Step. This instruction multiplies corresponding floating-point values in the vectors of the two source SIMD&FP registers, subtracts each of the products from 3.0, divides these results by 2.0, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VrsqrtsF64N VrsqrtsF64N
//go:noescape
func VrsqrtsF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32)

// Floating-point Reciprocal Square Root Step. This instruction multiplies corresponding floating-point values in the vectors of the two source SIMD&FP registers, subtracts each of the products from 3.0, divides these results by 2.0, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VrsqrtsdF64N VrsqrtsdF64N
//go:noescape
func VrsqrtsdF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32)

// Floating-point Reciprocal Square Root Step. This instruction multiplies corresponding floating-point values in the vectors of the two source SIMD&FP registers, subtracts each of the products from 3.0, divides these results by 2.0, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VrsqrtsqF32N VrsqrtsqF32N
//go:noescape
func VrsqrtsqF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)

// Floating-point Reciprocal Square Root Step. This instruction multiplies corresponding floating-point values in the vectors of the two source SIMD&FP registers, subtracts each of the products from 3.0, divides these results by 2.0, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VrsqrtsqF64N VrsqrtsqF64N
//go:noescape
func VrsqrtsqF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32)

// Floating-point Reciprocal Square Root Step. This instruction multiplies corresponding floating-point values in the vectors of the two source SIMD&FP registers, subtracts each of the products from 3.0, divides these results by 2.0, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VrsqrtssF32N VrsqrtssF32N
//go:noescape
func VrsqrtssF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)

// SHA1 fixed rotate.
//
//go:linkname Vsha1HU32N Vsha1HU32N
//go:noescape
func Vsha1HU32N(r *arm.Uint32, v0 *arm.Uint32, n int32)

// SHA1 schedule update 1.
//
//go:linkname Vsha1Su1QU32N Vsha1Su1QU32N
//go:noescape
func Vsha1Su1QU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)

// SHA256 schedule update 0.
//
//go:linkname Vsha256Su0QU32N Vsha256Su0QU32N
//go:noescape
func Vsha256Su0QU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)

// SHA512 Schedule Update 0 takes the values from the two 128-bit source SIMD&FP registers and produces a 128-bit output value that combines the gamma0 functions of two iterations of the SHA512 schedule update that are performed after the first 16 iterations within a block. It returns this value to the destination SIMD&FP register.
//
//go:linkname Vsha512Su0QU64N Vsha512Su0QU64N
//go:noescape
func Vsha512Su0QU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32)

// Signed Shift Left (register). This instruction takes each signed integer value in the vector of the first source SIMD&FP register, shifts each value by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. If the shift value is positive, the operation is a left shift; if it is negative, it is a truncating right shift.
//
//go:linkname VshlS8N VshlS8N
//go:noescape
func VshlS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)

// Signed Shift Left (register). This instruction takes each signed integer value in the vector of the first source SIMD&FP register, shifts each value by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. If the shift value is positive, the operation is a left shift; if it is negative, it is a truncating right shift.
//
//go:linkname VshlS16N VshlS16N
//go:noescape
func VshlS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)

// Signed Shift Left (register). This instruction takes each signed integer value in the vector of the first source SIMD&FP register, shifts each value by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. If the shift value is positive, the operation is a left shift; if it is negative, it is a truncating right shift.
//
//go:linkname VshlS32N VshlS32N
//go:noescape
func VshlS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)

// Signed Shift Left (register). This instruction takes each signed integer value in the vector of the first source SIMD&FP register, shifts each value by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. If the shift value is positive, the operation is a left shift; if it is negative, it is a truncating right shift.
//
//go:linkname VshlS64N VshlS64N
//go:noescape
func VshlS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32)

// Signed Shift Left (register). This instruction takes each signed integer value in the vector of the first source SIMD&FP register, shifts each value by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. If the shift value is positive, the operation is a left shift; if it is negative, it is a truncating right shift.
//
//go:linkname VshldS64N VshldS64N
//go:noescape
func VshldS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32)

// Signed Shift Left (register). This instruction takes each signed integer value in the vector of the first source SIMD&FP register, shifts each value by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. If the shift value is positive, the operation is a left shift; if it is negative, it is a truncating right shift.
//
//go:linkname VshlqS8N VshlqS8N
//go:noescape
func VshlqS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)

// Signed Shift Left (register). This instruction takes each signed integer value in the vector of the first source SIMD&FP register, shifts each value by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. If the shift value is positive, the operation is a left shift; if it is negative, it is a truncating right shift.
//
//go:linkname VshlqS16N VshlqS16N
//go:noescape
func VshlqS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)

// Signed Shift Left (register). This instruction takes each signed integer value in the vector of the first source SIMD&FP register, shifts each value by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. If the shift value is positive, the operation is a left shift; if it is negative, it is a truncating right shift.
//
//go:linkname VshlqS32N VshlqS32N
//go:noescape
func VshlqS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)

// Signed Shift Left (register). This instruction takes each signed integer value in the vector of the first source SIMD&FP register, shifts each value by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VshlqS64N VshlqS64N
//go:noescape
func VshlqS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32)

// SM4 Key takes an input as a 128-bit vector from the first source SIMD&FP register and a 128-bit constant from the second SIMD&FP register. It derives four iterations of the output key, in accordance with the SM4 standard, returning the 128-bit result to the destination SIMD&FP register.
//
//go:linkname Vsm4EkeyqU32N Vsm4EkeyqU32N
//go:noescape
func Vsm4EkeyqU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)

// SM4 Encode takes input data as a 128-bit vector from the first source SIMD&FP register, and four iterations of the round key held as the elements of the 128-bit vector in the second source SIMD&FP register. It encrypts the data by four rounds, in accordance with the SM4 standard, returning the 128-bit result to the destination SIMD&FP register.
//
//go:linkname Vsm4EqU32N Vsm4EqU32N
//go:noescape
func Vsm4EqU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)

// Floating-point Square Root (vector). This instruction calculates the square root for each vector element in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VsqrtF32N VsqrtF32N
//go:noescape
func VsqrtF32N(r *arm.Float32, v0 *arm.Float32, n int32)

// Floating-point Square Root (vector). This instruction calculates the square root for each vector element in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VsqrtF64N VsqrtF64N
//go:noescape
func VsqrtF64N(r *arm.Float64, v0 *arm.Float64, n int32)

// Floating-point Square Root (vector). This instruction calculates the square root for each vector element in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VsqrtqF32N VsqrtqF32N
//go:noescape
func VsqrtqF32N(r *arm.Float32, v0 *arm.Float32, n int32)

// Floating-point Square Root (vector). This instruction calculates the square root for each vector element in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VsqrtqF64N VsqrtqF64N
//go:noescape
func VsqrtqF64N(r *arm.Float64, v0 *arm.Float64, n int32)

// Subtract (vector). This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VsubS8N VsubS8N
//go:noescape
func VsubS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)

// Subtract (vector). This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VsubS16N VsubS16N
//go:noescape
func VsubS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)

// Subtract (vector). This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VsubS32N VsubS32N
//go:noescape
func VsubS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)

// Subtract (vector). This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VsubS64N VsubS64N
//go:noescape
func VsubS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32)

// Subtract (vector). This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VsubU8N VsubU8N
//go:noescape
func VsubU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)

// Subtract (vector). This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VsubU16N VsubU16N
//go:noescape
func VsubU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)

// Subtract (vector). This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VsubU32N VsubU32N
//go:noescape
func VsubU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)

// Subtract (vector). This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VsubU64N VsubU64N
//go:noescape
func VsubU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32)

// Floating-point Subtract (vector). This instruction subtracts the elements in the vector in the second source SIMD&FP register from the corresponding elements in the vector in the first source SIMD&FP register, places each result into elements of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VsubF32N VsubF32N
//go:noescape
func VsubF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)

// Floating-point Subtract (vector). This instruction subtracts the elements in the vector in the second source SIMD&FP register from the corresponding elements in the vector in the first source SIMD&FP register, places each result into elements of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VsubF64N VsubF64N
//go:noescape
func VsubF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32)

// Subtract (vector). This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VsubdS64N VsubdS64N
//go:noescape
func VsubdS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32)

// Subtract (vector). This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VsubdU64N VsubdU64N
//go:noescape
func VsubdU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32)

// Subtract (vector). This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VsubqS8N VsubqS8N
//go:noescape
func VsubqS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)

// Subtract (vector). This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VsubqS16N VsubqS16N
//go:noescape
func VsubqS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)

// Subtract (vector). This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VsubqS32N VsubqS32N
//go:noescape
func VsubqS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)

// Subtract (vector). This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VsubqS64N VsubqS64N
//go:noescape
func VsubqS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32)

// Subtract (vector). This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VsubqU8N VsubqU8N
//go:noescape
func VsubqU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)

// Subtract (vector). This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VsubqU16N VsubqU16N
//go:noescape
func VsubqU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)

// Subtract (vector). This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VsubqU32N VsubqU32N
//go:noescape
func VsubqU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)

// Subtract (vector). This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VsubqU64N VsubqU64N
//go:noescape
func VsubqU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32)

// Floating-point Subtract (vector). This instruction subtracts the elements in the vector in the second source SIMD&FP register from the corresponding elements in the vector in the first source SIMD&FP register, places each result into elements of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VsubqF32N VsubqF32N
//go:noescape
func VsubqF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)

// Floating-point Subtract (vector). This instruction subtracts the elements in the vector in the second source SIMD&FP register from the corresponding elements in the vector in the first source SIMD&FP register, places each result into elements of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VsubqF64N VsubqF64N
//go:noescape
func VsubqF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32)

// Table vector Lookup. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the result for that lookup is 0. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vtbl1S8N Vtbl1S8N
//go:noescape
func Vtbl1S8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)

// Table vector Lookup. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the result for that lookup is 0. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vtbl1U8N Vtbl1U8N
//go:noescape
func Vtbl1U8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)

// Transpose vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn1S8N Vtrn1S8N
//go:noescape
func Vtrn1S8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)

// Transpose vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn1S16N Vtrn1S16N
//go:noescape
func Vtrn1S16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)

// Transpose vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn1S32N Vtrn1S32N
//go:noescape
func Vtrn1S32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)

// Transpose vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn1U8N Vtrn1U8N
//go:noescape
func Vtrn1U8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)

// Transpose vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn1U16N Vtrn1U16N
//go:noescape
func Vtrn1U16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)

// Transpose vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn1U32N Vtrn1U32N
//go:noescape
func Vtrn1U32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)

// Transpose vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn1F32N Vtrn1F32N
//go:noescape
func Vtrn1F32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)

// Transpose vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn1QS8N Vtrn1QS8N
//go:noescape
func Vtrn1QS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)

// Transpose vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn1QS16N Vtrn1QS16N
//go:noescape
func Vtrn1QS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)

// Transpose vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn1QS32N Vtrn1QS32N
//go:noescape
func Vtrn1QS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)

// Transpose vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn1QS64N Vtrn1QS64N
//go:noescape
func Vtrn1QS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32)

// Transpose vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn1QU8N Vtrn1QU8N
//go:noescape
func Vtrn1QU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)

// Transpose vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn1QU16N Vtrn1QU16N
//go:noescape
func Vtrn1QU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)

// Transpose vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn1QU32N Vtrn1QU32N
//go:noescape
func Vtrn1QU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)

// Transpose vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn1QU64N Vtrn1QU64N
//go:noescape
func Vtrn1QU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32)

// Transpose vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn1QF32N Vtrn1QF32N
//go:noescape
func Vtrn1QF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)

// Transpose vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn1QF64N Vtrn1QF64N
//go:noescape
func Vtrn1QF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32)

// Transpose vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn2S8N Vtrn2S8N
//go:noescape
func Vtrn2S8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)

// Transpose vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn2S16N Vtrn2S16N
//go:noescape
func Vtrn2S16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)

// Transpose vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn2S32N Vtrn2S32N
//go:noescape
func Vtrn2S32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)

// Transpose vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn2U8N Vtrn2U8N
//go:noescape
func Vtrn2U8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)

// Transpose vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn2U16N Vtrn2U16N
//go:noescape
func Vtrn2U16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)

// Transpose vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn2U32N Vtrn2U32N
//go:noescape
func Vtrn2U32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)

// Transpose vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn2F32N Vtrn2F32N
//go:noescape
func Vtrn2F32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)

// Transpose vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn2QS8N Vtrn2QS8N
//go:noescape
func Vtrn2QS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)

// Transpose vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn2QS16N Vtrn2QS16N
//go:noescape
func Vtrn2QS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)

// Transpose vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn2QS32N Vtrn2QS32N
//go:noescape
func Vtrn2QS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)

// Transpose vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn2QS64N Vtrn2QS64N
//go:noescape
func Vtrn2QS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32)

// Transpose vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn2QU8N Vtrn2QU8N
//go:noescape
func Vtrn2QU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)

// Transpose vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn2QU16N Vtrn2QU16N
//go:noescape
func Vtrn2QU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)

// Transpose vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn2QU32N Vtrn2QU32N
//go:noescape
func Vtrn2QU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)

// Transpose vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn2QU64N Vtrn2QU64N
//go:noescape
func Vtrn2QU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32)

// Transpose vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn2QF32N Vtrn2QF32N
//go:noescape
func Vtrn2QF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)

// Transpose vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn2QF64N Vtrn2QF64N
//go:noescape
func Vtrn2QF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32)

// Compare bitwise Test bits nonzero (vector). This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VtstS8N VtstS8N
//go:noescape
func VtstS8N(r *arm.Uint8, v0 *arm.Int8, v1 *arm.Int8, n int32)

// Compare bitwise Test bits nonzero (vector). This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VtstS16N VtstS16N
//go:noescape
func VtstS16N(r *arm.Uint16, v0 *arm.Int16, v1 *arm.Int16, n int32)

// Compare bitwise Test bits nonzero (vector). This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VtstS32N VtstS32N
//go:noescape
func VtstS32N(r *arm.Uint32, v0 *arm.Int32, v1 *arm.Int32, n int32)

// Compare bitwise Test bits nonzero (vector). This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VtstS64N VtstS64N
//go:noescape
func VtstS64N(r *arm.Uint64, v0 *arm.Int64, v1 *arm.Int64, n int32)

// Compare bitwise Test bits nonzero (vector). This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VtstU8N VtstU8N
//go:noescape
func VtstU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)

// Compare bitwise Test bits nonzero (vector). This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VtstU16N VtstU16N
//go:noescape
func VtstU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)

// Compare bitwise Test bits nonzero (vector). This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VtstU32N VtstU32N
//go:noescape
func VtstU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)

// Compare bitwise Test bits nonzero (vector). This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VtstU64N VtstU64N
//go:noescape
func VtstU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32)

// Compare bitwise Test bits nonzero (vector). This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VtstdS64N VtstdS64N
//go:noescape
func VtstdS64N(r *arm.Uint64, v0 *arm.Int64, v1 *arm.Int64, n int32)

// Compare bitwise Test bits nonzero (vector). This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VtstdU64N VtstdU64N
//go:noescape
func VtstdU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32)

// Compare bitwise Test bits nonzero (vector). This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VtstqS8N VtstqS8N
//go:noescape
func VtstqS8N(r *arm.Uint8, v0 *arm.Int8, v1 *arm.Int8, n int32)

// Compare bitwise Test bits nonzero (vector). This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VtstqS16N VtstqS16N
//go:noescape
func VtstqS16N(r *arm.Uint16, v0 *arm.Int16, v1 *arm.Int16, n int32)

// Compare bitwise Test bits nonzero (vector). This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VtstqS32N VtstqS32N
//go:noescape
func VtstqS32N(r *arm.Uint32, v0 *arm.Int32, v1 *arm.Int32, n int32)

// Compare bitwise Test bits nonzero (vector). This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VtstqS64N VtstqS64N
//go:noescape
func VtstqS64N(r *arm.Uint64, v0 *arm.Int64, v1 *arm.Int64, n int32)

// Compare bitwise Test bits nonzero (vector). This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VtstqU8N VtstqU8N
//go:noescape
func VtstqU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)

// Compare bitwise Test bits nonzero (vector). This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VtstqU16N VtstqU16N
//go:noescape
func VtstqU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)

// Compare bitwise Test bits nonzero (vector). This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VtstqU32N VtstqU32N
//go:noescape
func VtstqU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)

// Compare bitwise Test bits nonzero (vector). This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VtstqU64N VtstqU64N
//go:noescape
func VtstqU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32)

// Unzip vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp1S8N Vuzp1S8N
//go:noescape
func Vuzp1S8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)

// Unzip vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp1S16N Vuzp1S16N
//go:noescape
func Vuzp1S16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)

// Unzip vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp1S32N Vuzp1S32N
//go:noescape
func Vuzp1S32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)

// Unzip vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp1U8N Vuzp1U8N
//go:noescape
func Vuzp1U8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)

// Unzip vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp1U16N Vuzp1U16N
//go:noescape
func Vuzp1U16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)

// Unzip vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp1U32N Vuzp1U32N
//go:noescape
func Vuzp1U32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)

// Unzip vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp1F32N Vuzp1F32N
//go:noescape
func Vuzp1F32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)

// Unzip vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp1QS8N Vuzp1QS8N
//go:noescape
func Vuzp1QS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)

// Unzip vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp1QS16N Vuzp1QS16N
//go:noescape
func Vuzp1QS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)

// Unzip vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp1QS32N Vuzp1QS32N
//go:noescape
func Vuzp1QS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)

// Unzip vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp1QS64N Vuzp1QS64N
//go:noescape
func Vuzp1QS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32)

// Unzip vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp1QU8N Vuzp1QU8N
//go:noescape
func Vuzp1QU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)

// Unzip vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp1QU16N Vuzp1QU16N
//go:noescape
func Vuzp1QU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)

// Unzip vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp1QU32N Vuzp1QU32N
//go:noescape
func Vuzp1QU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)

// Unzip vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp1QU64N Vuzp1QU64N
//go:noescape
func Vuzp1QU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32)

// Unzip vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp1QF32N Vuzp1QF32N
//go:noescape
func Vuzp1QF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)

// Unzip vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp1QF64N Vuzp1QF64N
//go:noescape
func Vuzp1QF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32)

// Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp2S8N Vuzp2S8N
//go:noescape
func Vuzp2S8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)

// Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp2S16N Vuzp2S16N
//go:noescape
func Vuzp2S16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)

// Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp2S32N Vuzp2S32N
//go:noescape
func Vuzp2S32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)

// Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp2U8N Vuzp2U8N
//go:noescape
func Vuzp2U8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)

// Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp2U16N Vuzp2U16N
//go:noescape
func Vuzp2U16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)

// Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp2U32N Vuzp2U32N
//go:noescape
func Vuzp2U32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)

// Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp2F32N Vuzp2F32N
//go:noescape
func Vuzp2F32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)

// Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp2QS8N Vuzp2QS8N
//go:noescape
func Vuzp2QS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)

// Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp2QS16N Vuzp2QS16N
//go:noescape
func Vuzp2QS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)

// Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp2QS32N Vuzp2QS32N
//go:noescape
func Vuzp2QS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)

// Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp2QS64N Vuzp2QS64N
//go:noescape
func Vuzp2QS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32)

// Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp2QU8N Vuzp2QU8N
//go:noescape
func Vuzp2QU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)

// Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp2QU16N Vuzp2QU16N
//go:noescape
func Vuzp2QU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)

// Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp2QU32N Vuzp2QU32N
//go:noescape
func Vuzp2QU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)

// Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp2QU64N Vuzp2QU64N
//go:noescape
func Vuzp2QU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32)

// Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp2QF32N Vuzp2QF32N
//go:noescape
func Vuzp2QF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)

// Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp2QF64N Vuzp2QF64N
//go:noescape
func Vuzp2QF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32)

// Zip vectors (primary). This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.
//
//go:linkname Vzip1S8N Vzip1S8N
//go:noescape
func Vzip1S8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)

// Zip vectors (primary). This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.
//
//go:linkname Vzip1S16N Vzip1S16N
//go:noescape
func Vzip1S16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)

// Zip vectors (primary). This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.
//
//go:linkname Vzip1S32N Vzip1S32N
//go:noescape
func Vzip1S32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)

// Zip vectors (primary). This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.
//
//go:linkname Vzip1U8N Vzip1U8N
//go:noescape
func Vzip1U8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)

// Zip vectors (primary). This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.
//
//go:linkname Vzip1U16N Vzip1U16N
//go:noescape
func Vzip1U16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)

// Zip vectors (primary). This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.
//
//go:linkname Vzip1U32N Vzip1U32N
//go:noescape
func Vzip1U32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)

// Zip vectors (primary). This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.
//
//go:linkname Vzip1F32N Vzip1F32N
//go:noescape
func Vzip1F32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)

// Zip vectors (primary). This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.
//
//go:linkname Vzip1QS8N Vzip1QS8N
//go:noescape
func Vzip1QS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)

// Zip vectors (primary). This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.
//
//go:linkname Vzip1QS16N Vzip1QS16N
//go:noescape
func Vzip1QS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)

// Zip vectors (primary). This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.
//
//go:linkname Vzip1QS32N Vzip1QS32N
//go:noescape
func Vzip1QS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)

// Zip vectors (primary). This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.
//
//go:linkname Vzip1QS64N Vzip1QS64N
//go:noescape
func Vzip1QS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32)

// Zip vectors (primary). This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.
//
//go:linkname Vzip1QU8N Vzip1QU8N
//go:noescape
func Vzip1QU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)

// Zip vectors (primary). This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.
//
//go:linkname Vzip1QU16N Vzip1QU16N
//go:noescape
func Vzip1QU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)

// Zip vectors (primary). This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.
//
//go:linkname Vzip1QU32N Vzip1QU32N
//go:noescape
func Vzip1QU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)

// Zip vectors (primary). This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.
//
//go:linkname Vzip1QU64N Vzip1QU64N
//go:noescape
func Vzip1QU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32)

// Zip vectors (primary). This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.
//
//go:linkname Vzip1QF32N Vzip1QF32N
//go:noescape
func Vzip1QF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)

// Zip vectors (primary). This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.
//
//go:linkname Vzip1QF64N Vzip1QF64N
//go:noescape
func Vzip1QF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32)

// Zip vectors (secondary). This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.
//
//go:linkname Vzip2S8N Vzip2S8N
//go:noescape
func Vzip2S8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)

// Zip vectors (secondary). This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.
//
//go:linkname Vzip2S16N Vzip2S16N
//go:noescape
func Vzip2S16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)

// Zip vectors (secondary). This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.
//
//go:linkname Vzip2S32N Vzip2S32N
//go:noescape
func Vzip2S32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)

// Zip vectors (secondary). This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.
//
//go:linkname Vzip2U8N Vzip2U8N
//go:noescape
func Vzip2U8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)

// Zip vectors (secondary). This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.
//
//go:linkname Vzip2U16N Vzip2U16N
//go:noescape
func Vzip2U16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)

// Zip vectors (secondary). This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.
//
//go:linkname Vzip2U32N Vzip2U32N
//go:noescape
func Vzip2U32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)

// Zip vectors (secondary). This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.
//
//go:linkname Vzip2F32N Vzip2F32N
//go:noescape
func Vzip2F32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)

// Zip vectors (secondary). This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.
//
//go:linkname Vzip2QS8N Vzip2QS8N
//go:noescape
func Vzip2QS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)

// Zip vectors (secondary). This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.
//
//go:linkname Vzip2QS16N Vzip2QS16N
//go:noescape
func Vzip2QS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)

// Zip vectors (secondary). This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.
//
//go:linkname Vzip2QS32N Vzip2QS32N
//go:noescape
func Vzip2QS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)

// Zip vectors (secondary). This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.
//
//go:linkname Vzip2QS64N Vzip2QS64N
//go:noescape
func Vzip2QS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32)

// Zip vectors (secondary). This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.
//
//go:linkname Vzip2QU8N Vzip2QU8N
//go:noescape
func Vzip2QU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)

// Zip vectors (secondary). This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.
//
//go:linkname Vzip2QU16N Vzip2QU16N
//go:noescape
func Vzip2QU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)

// Zip vectors (secondary). This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.
//
//go:linkname Vzip2QU32N Vzip2QU32N
//go:noescape
func Vzip2QU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)

// Zip vectors (secondary). This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.
//
//go:linkname Vzip2QU64N Vzip2QU64N
//go:noescape
func Vzip2QU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32)

// Zip vectors (secondary). This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.
//
//go:linkname Vzip2QF32N Vzip2QF32N
//go:noescape
func Vzip2QF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)

// Zip vectors (secondary). This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.
//
//go:linkname Vzip2QF64N Vzip2QF64N
//go:noescape
func Vzip2QF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32)


================================================
FILE: arm/neon/loops_test.go
================================================
package neon

import (
	"math/rand"
	"reflect"
	"testing"
	"unsafe"

	"github.com/alivanz/go-simd/arm"
)

func TestVabsS32N(t *testing.T) {
	const N = 1024 * 16
	var (
		r   = make([]arm.Int32, N)
		v   = make([]arm.Int32, N)
		ref = make([]arm.Int32, N)
	)
	for i := 0; i < N; i++ {
		r[i] = arm.Int32(int32(rand.Int()))
		v[i] = arm.Int32(int32(rand.Int()))
		if v[i] < 0 {
			ref[i] = -v[i]
		} else {
			ref[i] = v[i]
		}
	}
	VabsS32N(&r[0], &v[0], N)
	if !reflect.DeepEqual(r, ref) {
		t.Fatal(r)
	}
}

func TestVmulqF32N(t *testing.T) {
	const N = 1024 * 16
	var (
		r   = make([]arm.Float32, N)
		v1  = make([]arm.Float32, N)
		v2  = make([]arm.Float32, N)
		ref = make([]arm.Float32, N)
	)
	for i := 0; i < N; i++ {
		v1[i] = arm.Float32(rand.Float32())
		v2[i] = arm.Float32(rand.Float32())
		ref[i] = v1[i] * v2[i]
	}
	VmulqF32N(&r[0], &v1[0], &v2[0], N)
	if !reflect.DeepEqual(r, ref) {
		t.Fatal(r)
	}
}

// BenchmarkVmulqF32N calls the C loop once per iteration; all N multiplications run in C.
func BenchmarkVmulqF32N(b *testing.B) {
	const N = 1024 * 1024
	var (
		r  = make([]arm.Float32, N)
		v1 = make([]arm.Float32, N)
		v2 = make([]arm.Float32, N)
	)
	b.SetBytes(N * 4)
	for i := int32(0); i < N; i++ {
		v1[i] = arm.Float32(rand.Float32())
		v2[i] = arm.Float32(rand.Float32())
	}
	// The timer is already running here, so reset it to exclude the setup above.
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		VmulqF32N(&r[0], &v1[0], &v2[0], N)
	}
}

// BenchmarkVmulqF32C calls the C intrinsic wrapper once per 4-lane vector, i.e. N/4 calls per iteration.
func BenchmarkVmulqF32C(b *testing.B) {
	const N = 1024 * 1024
	var (
		r  = make([]arm.Float32, N)
		v1 = make([]arm.Float32, N)
		v2 = make([]arm.Float32, N)
	)
	b.SetBytes(N * 4)
	for i := int32(0); i < N; i++ {
		v1[i] = arm.Float32(rand.Float32())
		v2[i] = arm.Float32(rand.Float32())
	}
	// The timer is already running here, so reset it to exclude the setup above.
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		for j := int32(0); j < N; j += 4 {
			VmulqF32(
				(*arm.Float32X4)(unsafe.Pointer(&r[j])),
				(*arm.Float32X4)(unsafe.Pointer(&v1[j])),
				(*arm.Float32X4)(unsafe.Pointer(&v2[j])),
			)
		}
	}
}

// BenchmarkVmulqF32Ref is the baseline: a plain Go scalar loop with no C calls.
func BenchmarkVmulqF32Ref(b *testing.B) {
	const N = 1024 * 1024
	var (
		r  = make([]arm.Float32, N)
		v1 = make([]arm.Float32, N)
		v2 = make([]arm.Float32, N)
	)
	b.SetBytes(N * 4)
	for i := int32(0); i < N; i++ {
		v1[i] = arm.Float32(rand.Float32())
		v2[i] = arm.Float32(rand.Float32())
	}
	// The timer is already running here, so reset it to exclude the setup above.
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		for j := int32(0); j < N; j++ {
			r[j] = v1[j] * v2[j]
		}
	}
}


================================================
FILE: arm/types.go
================================================
package arm

/*
#include <arm_neon.h>
*/
import "C"

// typedef float float32_t;
type Float32 = C.float32_t

// typedef __attribute__((neon_vector_type(2))) float32_t float32x2_t;
type Float32X2 = C.float32x2_t

// typedef struct float32x2x2_t { float32x2_t val[2];} float32x2x2_t;
type Float32X2X2 = C.float32x2x2_t

// typedef __attribute__((neon_vector_type(4))) float32_t float32x4_t;
type Float32X4 = C.float32x4_t

// typedef struct float32x4x2_t { float32x4_t val[2];} float32x4x2_t;
type Float32X4X2 = C.float32x4x2_t

// typedef double float64_t;
type Float64 = C.float64_t

// typedef __attribute__((neon_vector_type(1))) float64_t float64x1_t;
type Float64X1 = C.float64x1_t

// typedef __attribute__((neon_vector_type(2))) float64_t float64x2_t;
type Float64X2 = C.float64x2_t

// typedef short int16_t;
type Int16 = C.int16_t

// typedef __attribute__((neon_vector_type(4))) int16_t int16x4_t;
type Int16X4 = C.int16x4_t

// typedef struct int16x4x2_t { int16x4_t val[2];} int16x4x2_t;
type Int16X4X2 = C.int16x4x2_t

// typedef __attribute__((neon_vector_type(8))) int16_t int16x8_t;
type Int16X8 = C.int16x8_t

// typedef struct int16x8x2_t { int16x8_t val[2];} int16x8x2_t;
type Int16X8X2 = C.int16x8x2_t

// typedef int int32_t;
type Int32 = C.int32_t

// typedef __attribute__((neon_vector_type(2))) int32_t int32x2_t;
type Int32X2 = C.int32x2_t

// typedef struct int32x2x2_t { int32x2_t val[2];} int32x2x2_t;
type Int32X2X2 = C.int32x2x2_t

// typedef __attribute__((neon_vector_type(4))) int32_t int32x4_t;
type Int32X4 = C.int32x4_t

// typedef struct int32x4x2_t { int32x4_t val[2];} int32x4x2_t;
type Int32X4X2 = C.int32x4x2_t

// typedef long long int64_t;
type Int64 = C.int64_t

// typedef __attribute__((neon_vector_type(1))) int64_t int64x1_t;
type Int64X1 = C.int64x1_t

// typedef __attribute__((neon_vector_type(2))) int64_t int64x2_t;
type Int64X2 = C.int64x2_t

// typedef signed char int8_t;
type Int8 = C.int8_t

// typedef __attribute__((neon_vector_type(16))) int8_t int8x16_t;
type Int8X16 = C.int8x16_t

// typedef struct int8x16x2_t { int8x16_t val[2];} int8x16x2_t;
type Int8X16X2 = C.int8x16x2_t

// typedef struct int8x16x3_t { int8x16_t val[3];} int8x16x3_t;
type Int8X16X3 = C.int8x16x3_t

// typedef struct int8x16x4_t { int8x16_t val[4];} int8x16x4_t;
type Int8X16X4 = C.int8x16x4_t

// typedef __attribute__((neon_vector_type(8))) int8_t int8x8_t;
type Int8X8 = C.int8x8_t

// typedef struct int8x8x2_t { int8x8_t val[2];} int8x8x2_t;
type Int8X8X2 = C.int8x8x2_t

// typedef struct int8x8x3_t { int8x8_t val[3];} int8x8x3_t;
type Int8X8X3 = C.int8x8x3_t

// typedef struct int8x8x4_t { int8x8_t val[4];} int8x8x4_t;
type Int8X8X4 = C.int8x8x4_t

// typedef __uint128_t poly128_t;
type Poly128 = C.poly128_t

// typedef uint16_t poly16_t;
type Poly16 = C.poly16_t

// typedef __attribute__((neon_polyvector_type(4))) poly16_t poly16x4_t;
type Poly16X4 = C.poly16x4_t

// typedef struct poly16x4x2_t { poly16x4_t val[2];} poly16x4x2_t;
type Poly16X4X2 = C.poly16x4x2_t

// typedef __attribute__((neon_polyvector_type(8))) poly16_t poly16x8_t;
type Poly16X8 = C.poly16x8_t

// typedef struct poly16x8x2_t { poly16x8_t val[2];} poly16x8x2_t;
type Poly16X8X2 = C.poly16x8x2_t

// typedef uint64_t poly64_t;
type Poly64 = C.poly64_t

// typedef __attribute__((neon_polyvector_type(1))) poly64_t poly64x1_t;
type Poly64X1 = C.poly64x1_t

// typedef __attribute__((neon_polyvector_type(2))) poly64_t poly64x2_t;
type Poly64X2 = C.poly64x2_t

// typedef uint8_t poly8_t;
type Poly8 = C.poly8_t

// typedef __attribute__((neon_polyvector_type(16))) poly8_t poly8x16_t;
type Poly8X16 = C.poly8x16_t

// typedef struct poly8x16x2_t { poly8x16_t val[2];} poly8x16x2_t;
type Poly8X16X2 = C.poly8x16x2_t

// typedef struct poly8x16x3_t { poly8x16_t val[3];} poly8x16x3_t;
type Poly8X16X3 = C.poly8x16x3_t

// typedef struct poly8x16x4_t { poly8x16_t val[4];} poly8x16x4_t;
type Poly8X16X4 = C.poly8x16x4_t

// typedef __attribute__((neon_polyvector_type(8))) poly8_t poly8x8_t;
type Poly8X8 = C.poly8x8_t

// typedef struct poly8x8x2_t { poly8x8_t val[2];} poly8x8x2_t;
type Poly8X8X2 = C.poly8x8x2_t

// typedef struct poly8x8x3_t { poly8x8_t val[3];} poly8x8x3_t;
type Poly8X8X3 = C.poly8x8x3_t

// typedef struct poly8x8x4_t { poly8x8_t val[4];} poly8x8x4_t;
type Poly8X8X4 = C.poly8x8x4_t

// typedef unsigned short uint16_t;
type Uint16 = C.uint16_t

// typedef __attribute__((neon_vector_type(4))) uint16_t uint16x4_t;
type Uint16X4 = C.uint16x4_t

// typedef struct uint16x4x2_t { uint16x4_t val[2];} uint16x4x2_t;
type Uint16X4X2 = C.uint16x4x2_t

// typedef __attribute__((neon_vector_type(8))) uint16_t uint16x8_t;
type Uint16X8 = C.uint16x8_t

// typedef struct uint16x8x2_t { uint16x8_t val[2];} uint16x8x2_t;
type Uint16X8X2 = C.uint16x8x2_t

// typedef unsigned int uint32_t;
type Uint32 = C.uint32_t

// typedef __attribute__((neon_vector_type(2))) uint32_t uint32x2_t;
type Uint32X2 = C.uint32x2_t

// typedef struct uint32x2x2_t { uint32x2_t val[2];} uint32x2x2_t;
type Uint32X2X2 = C.uint32x2x2_t

// typedef __attribute__((neon_vector_type(4))) uint32_t uint32x4_t;
type Uint32X4 = C.uint32x4_t

// typedef struct uint32x4x2_t { uint32x4_t val[2];} uint32x4x2_t;
type Uint32X4X2 = C.uint32x4x2_t

// typedef unsigned long long uint64_t;
type Uint64 = C.uint64_t

// typedef __attribute__((neon_vector_type(1))) uint64_t uint64x1_t;
type Uint64X1 = C.uint64x1_t

// typedef __attribute__((neon_vector_type(2))) uint64_t uint64x2_t;
type Uint64X2 = C.uint64x2_t

// typedef unsigned char uint8_t;
type Uint8 = C.uint8_t

// typedef __attribute__((neon_vector_type(16))) uint8_t uint8x16_t;
type Uint8X16 = C.uint8x16_t

// typedef struct uint8x16x2_t { uint8x16_t val[2];} uint8x16x2_t;
type Uint8X16X2 = C.uint8x16x2_t

// typedef struct uint8x16x3_t { uint8x16_t val[3];} uint8x16x3_t;
type Uint8X16X3 = C.uint8x16x3_t

// typedef struct uint8x16x4_t { uint8x16_t val[4];} uint8x16x4_t;
type Uint8X16X4 = C.uint8x16x4_t

// typedef __attribute__((neon_vector_type(8))) uint8_t uint8x8_t;
type Uint8X8 = C.uint8x8_t

// typedef struct uint8x8x2_t { uint8x8_t val[2];} uint8x8x2_t;
type Uint8X8X2 = C.uint8x8x2_t

// typedef struct uint8x8x3_t { uint8x8_t val[3];} uint8x8x3_t;
type Uint8X8X3 = C.uint8x8x3_t

// typedef struct uint8x8x4_t { uint8x8_t val[4];} uint8x8x4_t;
type Uint8X8X4 = C.uint8x8x4_t
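
The C type names above encode the element kind, the element width in bits and the lane count, and width times lanes is 64 for D registers and 128 for Q registers. The sketch below shows one way to split such a name. `parseVec` is a hypothetical helper written for illustration and is not the generator's own `parseType`; it deliberately rejects the multi-vector struct types such as `int8x16x2_t`.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// vecName matches scalar and single-vector NEON type names, e.g. "int32_t"
// or "float32x4_t".
var vecName = regexp.MustCompile(`^(float|int|uint|poly)(\d+)(?:x(\d+))?_t$`)

// parseVec splits a NEON C type name into element kind, element width in
// bits and lane count. Scalars such as "int32_t" report lanes == 1.
func parseVec(name string) (kind string, bits, lanes int, ok bool) {
	m := vecName.FindStringSubmatch(name)
	if m == nil {
		return "", 0, 0, false
	}
	bits, _ = strconv.Atoi(m[2]) // m[2] is all digits, so Atoi cannot fail here
	lanes = 1
	if m[3] != "" {
		lanes, _ = strconv.Atoi(m[3])
	}
	return m[1], bits, lanes, true
}

func main() {
	for _, name := range []string{"int8x8_t", "float32x4_t", "uint64_t", "int8x16x2_t"} {
		kind, bits, lanes, ok := parseVec(name)
		if !ok {
			fmt.Println(name, "-> not a scalar or single-vector type")
			continue
		}
		fmt.Println(name, "->", kind, bits, lanes, "total bits:", bits*lanes)
	}
}
```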


================================================
FILE: example/neon/main.go
================================================
package main

import (
	"log"

	"github.com/alivanz/go-simd/arm"
	"github.com/alivanz/go-simd/arm/neon"
)

func main() {
	var a, b arm.Int8X8
	var add, mul arm.Int16X8
	for i := 0; i < 8; i++ {
		a[i] = arm.Int8(i)
		b[i] = arm.Int8(i * i)
	}
	log.Printf("a = %+v", a)
	log.Printf("b = %+v", b)
	neon.VaddlS8(&add, &a, &b)
	neon.VmullS8(&mul, &a, &b)
	log.Printf("add = %+v", add)
	log.Printf("mul = %+v", mul)
}


================================================
FILE: example/sse2/main.go
================================================
package main

import (
	"log"

	"github.com/alivanz/go-simd/x86"
)

func main() {
	a := x86.MmSetrEpi8(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15)
	b := x86.MmSetrEpi8(15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0)
	add := x86.MmAddEpi8(a, b)
	log.Print(a)
	log.Print(b)
	log.Print(add)
}


================================================
FILE: generator/arm/arm.go
================================================
package main

import (
	"encoding/json"
	"os"

	"github.com/alivanz/go-simd/generator/utils"
)

type ArmIntrinsics []ArmIntrinsic

type ArmIntrinsic struct {
	Name        string `json:"name"`
	Description string `json:"description"`
}

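// GetIntrinsics fetches Arm's intrinsics.json into the working directory
// via utils.Download and decodes it.
func GetIntrinsics() (ArmIntrinsics, error) {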
	if err := utils.Download(
		"intrinsics.json",
		"https://developer.arm.com/architectures/instruction-sets/intrinsics/data/intrinsics.json",
	); err != nil {
		return nil, err
	}
	f, err := os.Open("intrinsics.json")
	if err != nil {
		return nil, err
	}
	defer f.Close()
	var intrins ArmIntrinsics
	if err := json.NewDecoder(f).Decode(&intrins); err != nil {
		return nil, err
	}
	return intrins, nil
}

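// Find returns a pointer to the intrinsic named s, or nil if there is none.
// The pointer refers to the slice element itself rather than to a copy
// held in the loop variable.
func (intrins ArmIntrinsics) Find(s string) *ArmIntrinsic {
	for i := range intrins {
		if intrins[i].Name == s {
			return &intrins[i]
		}
	}
	return nil
}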
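
`Find` scans the whole slice on every call, and the generator calls it once per function. If that ever becomes slow, one option is to build a name index once. The sketch below is an assumption-labelled alternative, not the repository's code: `intrinsic` and `indexByName` are hypothetical names, and the JSON sample is made up to match the two fields decoded above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// intrinsic mirrors the two fields the generator reads from intrinsics.json.
type intrinsic struct {
	Name        string `json:"name"`
	Description string `json:"description"`
}

// indexByName builds a name -> entry map so each lookup is O(1).
// The pointers refer to elements of list, so list must outlive the map.
func indexByName(list []intrinsic) map[string]*intrinsic {
	idx := make(map[string]*intrinsic, len(list))
	for i := range list {
		idx[list[i].Name] = &list[i]
	}
	return idx
}

func main() {
	// Made-up sample data in the same shape as the real file.
	const sample = `[{"name":"vabs_s32","description":"Absolute value"},
	                 {"name":"vmulq_f32","description":"Floating-point multiply"}]`
	var list []intrinsic
	if err := json.NewDecoder(strings.NewReader(sample)).Decode(&list); err != nil {
		panic(err)
	}
	idx := indexByName(list)
	if in, ok := idx["vmulq_f32"]; ok {
		fmt.Println(in.Description)
	}
	fmt.Println(idx["missing"] == nil) // a missing name yields a nil pointer
}
```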


================================================
FILE: generator/arm/main.go
================================================
package main

import (
	"bytes"
	"fmt"
	"io"
	"log"
	"os"
	"os/exec"
	"sort"
	"strconv"
	"strings"

	"github.com/alivanz/go-simd/generator/scanner"
	"github.com/alivanz/go-simd/generator/types"
	"github.com/alivanz/go-simd/generator/utils"
	"github.com/alivanz/go-simd/generator/writer"
	"github.com/iancoleman/strcase"
)

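// Source runs the C preprocessor (clang -E) over the include lines for
// arm_neon.h and returns the expanded output.
func Source() ([]byte, error) {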
	cmd := exec.Command("clang", "-E", "-")
	cmd.Stdin = bytes.NewBufferString(strings.Join(writer.Includes([]string{
		"arm_neon.h",
	}), "\n"))
	cmd.Stderr = os.Stderr
	return cmd.Output()
}

func main() {
	src, err := Source()
	if err != nil {
		log.Fatal(err)
	}
	// write raw
	if err := writer.WriteToFile("raw.h", func(w io.Writer) error {
		_, err := w.Write(src)
		return err
	}); err != nil {
		log.Fatal(err)
	}
	// scan
	result, err := scanner.Scan(src)
	if err != nil {
		log.Fatal(err)
	}
	// filter functions
	result.Functions = utils.Filter(result.Functions, func(fn types.Function) bool {
		if strings.HasPrefix(fn.Name, "vbf") {
			return false
		}
		if strings.Contains(fn.Name, "bf16") {
			return false
		}
		return true
	})
	// filter types
	mtype := make(map[string]bool)
	for _, fn := range result.Functions {
		if fn.Return != nil {
			mtype[fn.Return.Name] = true
		}
		for _, arg := range fn.Args {
			mtype[arg.Name] = true
		}
	}
	result.Types = utils.Filter(result.Types, func(t types.Type) bool {
		return mtype[t.Name]
	})
	// sort functions
	sort.Slice(result.Functions, func(i, j int) bool {
		g0, i0, _ := sortGroup(result.Functions[i].Name)
		g1, i1, _ := sortGroup(result.Functions[j].Name)
		if g0 != g1 {
			return g0 < g1
		}
		return i0 < i1
	})
	// sort types
	sort.Slice(result.Types, func(i, j int) bool {
		return result.Types[i].Name < result.Types[j].Name
	})
	// write types
	if err := writer.WriteToFile("types.go", func(w io.Writer) error {
		if err := writer.Package(w, "arm"); err != nil {
			return err
		}
		if err := writer.ImportC(w, func(w io.Writer) error {
			_, err := io.WriteString(w, strings.Join(writer.Includes([]string{
				"arm_neon.h",
			}), "\n"))
			return err
		}); err != nil {
			return err
		}
		if err := writer.Types(w, result.Types); err != nil {
			return err
		}
		return nil
	}); err != nil {
		log.Fatal(err)
	}
	// patch intrinsics info
	intrins, err := GetIntrinsics()
	if err != nil {
		log.Fatal(err)
	}
	for i, fn := range result.Functions {
		if info := intrins.Find(fn.Name); info != nil {
			result.Functions[i].Comment = info.Description
		}
	}
	// write C
	if err := writer.WriteToFile("neon/functions.c", func(w io.Writer) error {
		if _, err := io.WriteString(w, "#include <arm_neon.h>\n\n"); err != nil {
			return err
		}
		for _, fn := range result.Functions {
			if fn.Blacklisted() {
				continue
			}
			if err := writer.RewriteC(w, fn); err != nil {
				return err
			}
		}
		return nil
	}); err != nil {
		log.Fatal(err)
	}
	// write functions
	if err := writer.WriteToFile("neon/functions.go", func(w io.Writer) error {
		if err := writer.Package(w, "neon"); err != nil {
			return err
		}
		if err := writer.Import(w, []string{
			"github.com/alivanz/go-simd/arm",
		}); err != nil {
			return err
		}
		if err := writer.ImportC(w, func(w io.Writer) error {
			if _, err := io.WriteString(w, "#include <arm_neon.h>"); err != nil {
				return err
			}
			return nil
		}); err != nil {
			return err
		}
		for _, fn := range result.Functions {
			if fn.Blacklisted() {
				continue
			}
			writer.DeclareFuncBypass(w, fn, "arm")
		}
		return nil
	}); err != nil {
		log.Fatal(err)
	}
	// C loops
	var (
		loops = make(map[string]bool)
	)
	if err := writer.WriteToFile("neon/loops.c", func(w io.Writer) error {
		if _, err := io.WriteString(w, "#include <arm_neon.h>\n\n"); err != nil {
			return err
		}
		if _, err := io.WriteString(w, "#define save(dst, src) *dst = src\n"); err != nil {
			return err
		}
		if _, err := io.WriteString(w, "#define load(src) (*src)\n"); err != nil {
			return err
		}
		if _, err := io.WriteString(w, `#define LOOP1(name, rtype, itype, f, set, load, rstep, istep) \
    void name(rtype *r, itype *v, int32_t n)                  \
    {                                                         \
        while (n >= rstep)                                    \
        {                                                     \
            set(r, f(load(v)));                               \
            r += rstep;                                       \
            n -= rstep;                                       \
            v += istep;                                       \
        }                                                     \
    }

`); err != nil {
			return err
		}
		for _, fn := range result.Functions {
			if fn.Blacklisted() {
				continue
			}
			// a void function has no result to store, and fn.Return is nil
			if len(fn.Args) != 1 || fn.Return == nil {
				continue
			}
			og, o0, o1 := parseType(fn.Return.Name)
			if og == "" {
				continue
			}
			ig, i0, i1 := parseType(fn.Args[0].Name)
			if ig == "" {
				continue
			}
			if o0 != i0 {
				continue
			}
			var rq, iq string
			if o0*o1 == 128 {
				rq = "q"
			}
			if i0*i1 == 128 {
				iq = "q"
			}
			if o1 == -1 {
				o1 = 1
			}
			if i1 == -1 {
				i1 = 1
			}
			group, _, suffix := sortGroup(fn.Name)
			// og/o0 already describe the return type, so they are reused here
			if _, err := fmt.Fprintf(w,
				"LOOP1(%s, %s, %s, %s, %s, %s, %d, %d)\n",
				strcase.ToCamel(group+suffix+"N"),
				fmt.Sprintf("%s%d_t", og, o0),
				fmt.Sprintf("%s%d_t", ig, i0),
				fn.Name,
				setter(fn.Return.Name, "save", fmt.Sprintf("vst1%s_%s%d", rq, typeShort[og], o0)),
				setter(fn.Args[0].Name, "load", fmt.Sprintf("vld1%s_%s%d", iq, typeShort[ig], i0)),
				o1,
				i1,
			); err != nil {
				return err
			}
			loops[fn.Name] = true
		}
		if _, err := io.WriteString(w, "\n"); err != nil {
			return err
		}
		if _, err := io.WriteString(w, `#define LOOP2(name, rtype, itype, f, set, load, rstep, istep) \
    void name(rtype *r, itype *v1, itype *v2, int32_t n)      \
    {                                                         \
        while (n >= rstep)                                    \
        {                                                     \
            set(r, f(load(v1), load(v2)));                    \
            r += rstep;                                       \
            n -= rstep;                                       \
            v1 += istep;                                      \
            v2 += istep;                                      \
        }                                                     \
    }

`); err != nil {
			return err
		}
		for _, fn := range result.Functions {
			if fn.Blacklisted() {
				continue
			}
			// a void function has no result to store, and fn.Return is nil
			if len(fn.Args) != 2 || fn.Return == nil {
				continue
			}
			if fn.Args[0].Name != fn.Args[1].Name {
				continue
			}
			og, o0, o1 := parseType(fn.Return.Name)
			if og == "" {
				continue
			}
			ig, i0, i1 := parseType(fn.Args[0].Name)
			if ig == "" {
				continue
			}
			if o0 != i0 {
				continue
			}
			var rq, iq string
			if o0*o1 == 128 {
				rq = "q"
			}
			if i0*i1 == 128 {
				iq = "q"
			}
			if o1 == -1 {
				o1 = 1
			}
			if i1 == -1 {
				i1 = 1
			}
			group, _, suffix := sortGroup(fn.Name)
			// og/o0 already describe the return type, so they are reused here
			if _, err := fmt.Fprintf(w,
				"LOOP2(%s, %s, %s, %s, %s, %s, %d, %d)\n",
				strcase.ToCamel(group+suffix+"N"),
				fmt.Sprintf("%s%d_t", og, o0),
				fmt.Sprintf("%s%d_t", ig, i0),
				fn.Name,
				setter(fn.Return.Name, "save", fmt.Sprintf("vst1%s_%s%d", rq, typeShort[og], o0)),
				setter(fn.Args[0].Name, "load", fmt.Sprintf("vld1%s_%s%d", iq, typeShort[ig], i0)),
				o1,
				i1,
			); err != nil {
				return err
			}
			loops[fn.Name] = true
		}
		return nil
	}); err != nil {
		log.Fatal(err)
	}
	// loop functions
	if err := writer.WriteToFile("neon/loops.go", func(w io.Writer) error {
		if err := writer.Package(w, "neon"); err != nil {
			return err
		}
		if err := writer.Import(w, []string{
			"github.com/alivanz/go-simd/arm",
		}); err != nil {
			return err
		}
		if err := writer.ImportC(w, func(w io.Writer) error {
			if _, err := io.WriteString(w, "#include <arm_neon.h>"); err != nil {
				return err
			}
			return nil
		}); err != nil {
			return err
		}
		for _, fn := range result.Functions {
			if !loops[fn.Name] {
				continue
			}
			// add suffix
			fn.Name += "N"
			// write
			fmt.Fprintf(w, "\n")
			if len(fn.Comment) > 0 {
				fmt.Fprintf(w, "// %s\n", fn.Comment)
			} else {
				fmt.Fprintf(w, "// %s\n", fn.Name)
			}
			fmt.Fprintf(w, "//\n")
			fmt.Fprintf(w, "//go:linkname %s %s\n", strcase.ToCamel(fn.Name), strcase.ToCamel(fn.Name))
			fmt.Fprintf(w, "//go:noescape\n")
			fmt.Fprintf(w, "func %s(", strcase.ToCamel(fn.Name))
			if fn.Return != nil {
				var parts = strings.SplitN(strings.TrimSuffix(fn.Return.Name, "_t"), "x", 2)
				fmt.Fprintf(w, "r *arm.%s, ", strcase.ToCamel(parts[0]))
			}
			fmt.Fprintf(w, "%s, n int32)\n", strings.Join(utils.Transform(fn.Args, func(i int, t types.Type) string {
				var parts = strings.SplitN(strings.TrimSuffix(t.Name, "_t"), "x", 2)
				return fmt.Sprintf("v%d *arm.%s", i, strcase.ToCamel(parts[0]))
			}), ", "))
		}
		return nil
	}); err != nil {
		log.Fatal(err)
	}
}

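// setter picks the accessor used inside a LOOP macro: direct for scalar types
// (plain pointer dereference) and def (a vld1/vst1 intrinsic) for vector types.
func setter(t string, direct string, def string) string {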
	_, _, r1 := parseType(t)
	if r1 == -1 {
		return direct
	}
	return def
}

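// parseType splits a NEON type name such as "uint8x16_t" into its element
// group ("uint"), element width in bits (8) and lane count (16). Scalar types
// such as "float32_t" report a lane count of -1. Names that do not match
// either shape yield an empty group.
func parseType(t string) (string, int, int) {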
	var (
		group string
	)
	t = strings.TrimSuffix(t, "_t")
	if strings.HasPrefix(t, "uint") {
		group = "uint"
		t = t[4:]
	} else if strings.HasPrefix(t, "int") {
		group = "int"
		t = t[3:]
	} else if strings.HasPrefix(t, "float") {
		group = "float"
		t = t[5:]
	}
	parts := strings.Split(t, "x")
	switch len(parts) {
	case 1:
		w, err := strconv.ParseUint(parts[0], 10, 32)
		if err != nil {
			return "", 0, 0
		}
		return group, int(w), -1
	case 2:
		w, err := strconv.ParseUint(parts[0], 10, 32)
		if err != nil {
			return "", 0, 0
		}
		h, err := strconv.ParseUint(parts[1], 10, 32)
		if err != nil {
			return "", 0, 0
		}
		return group, int(w), int(h)
	}
	return "", 0, 0
}

var (
	typeShort = map[string]string{
		"uint":    "u",
		"uint8":   "u8",
		"uint16":  "u16",
		"uint32":  "u32",
		"uint64":  "u64",
		"int":     "s",
		"int8":    "s8",
		"int16":   "s16",
		"int32":   "s32",
		"int64":   "s64",
		"float":   "f",
		"float32": "f32",
		"float64": "f64",
	}
)


================================================
FILE: generator/arm/sort.go
================================================
package main

import "strings"

var (
	suffixOrder = []string{
		"_s8",
		"_s16",
		"_s32",
		"_s64",
		"_u8",
		"_u16",
		"_u32",
		"_u64",
		"_f32",
		"_f64",
	}
)

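// sortGroup splits an intrinsic name into its base name and type suffix and
// reports the suffix's position in suffixOrder. Names without a known suffix
// are returned unchanged with index -1 and an empty suffix.
func sortGroup(name string) (string, int, string) {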
	var (
		group  = name
		index  = -1
		suffix = ""
	)
	for i, s := range suffixOrder {
		if strings.HasSuffix(name, s) {
			group = strings.TrimSuffix(name, s)
			index = i
			suffix = s
		}
	}
	return group, index, suffix
}


================================================
FILE: generator/scanner/scan.go
================================================
package scanner

import (
	"bytes"
	"regexp"

	"github.com/alivanz/go-simd/generator/types"
	"github.com/alivanz/go-simd/generator/utils"
)

var (
	name             = `(\w+?)`
	args             = `([\w\s,_]*?)`
	attr             = `(?:__attribute__\(\(` + `([\w\s,\(\)"]+?)` + `\)\))`
	regTypedefSimple = regexp.MustCompile(`typedef\s+` + attr + `?[\w\s]+? ` + name + `\s*` + attr + `?;`)
	regTypedefStruct = regexp.MustCompile(`typedef struct \w+? {.+?}\s*?` + name + `;`)
	regFunction      = regexp.MustCompile(name + `\s+` + attr + `?\s*` + name + `\s*\(` + args + `\)` + `\s*` + `{.*?}`)
	regArg           = regexp.MustCompile(`\s*(([\w\s]+)\s(?:\w+))`)
	regWhitespace    = regexp.MustCompile(`\s+`)
	regComma         = regexp.MustCompile(`\s*,\s*`)
	regLongLong      = regexp.MustCompile(`long\s+long`)
	regULongLong     = regexp.MustCompile(`unsigned\s+long\s+long`)
	regUlong         = regexp.MustCompile(`unsigned\s+long`)
	regUint          = regexp.MustCompile(`unsigned\s+int`)
	regUshort        = regexp.MustCompile(`unsigned\s+short`)
	regUchar         = regexp.MustCompile(`unsigned\s+char`)
)

type ScanResult struct {
	Types     []types.Type
	Functions []types.Function
}

func Scan(raw []byte) (*ScanResult, error) {
	var buf bytes.Buffer
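	// drop preprocessor line markers; keep the line break so that tokens on
	// adjacent lines are not glued together
	for _, line := range bytes.Split(raw, []byte("\n")) {
		if !bytes.HasPrefix(line, []byte("#")) {
			buf.Write(line)
			buf.WriteByte('\n')
		}
	}
	// collapse runs of whitespace into a single space
	raw = regWhitespace.ReplaceAll(buf.Bytes(), []byte(" "))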
	// replace known multi-word types with single tokens; the longest patterns
	// go first so "unsigned long long" is matched as a whole
	raw = regULongLong.ReplaceAll(raw, []byte("ulonglong"))
	raw = regLongLong.ReplaceAll(raw, []byte("longlong"))
	raw = regUlong.ReplaceAll(raw, []byte("ulong"))
	raw = regUint.ReplaceAll(raw, []byte("uint"))
	raw = regUshort.ReplaceAll(raw, []byte("ushort"))
	raw = regUchar.ReplaceAll(raw, []byte("uchar"))
	s := string(raw)
	var result ScanResult
	// types
	result.Types = utils.Merge(
		utils.Transform(
			regTypedefSimple.FindAllStringSubmatch(s, -1),
			func(i int, e []string) types.Type {
				return types.Type{
					Name:       e[2],
					Full:       e[0],
					Attributes: commaSplit(e[1], e[3]),
				}
			},
		),
		utils.Transform(
			regTypedefStruct.FindAllStringSubmatch(s, -1),
			func(i int, e []string) types.Type {
				return types.Type{
					Name: e[1],
					Full: e[0],
				}
			},
		),
	)
	// functions
	result.Functions = utils.Transform(
		regFunction.FindAllStringSubmatch(s, -1),
		func(i int, match []string) types.Function {
			var args []types.Type
			for _, arg := range regArg.FindAllStringSubmatch(match[4], -1) {
				if arg[2] == "void" {
					continue
				}
				args = append(args, types.Type{
					Name: arg[2],
					Full: arg[1],
				})
			}
			var ret *types.Type
			if match[1] != "void" {
				ret = &types.Type{
					Name: match[1],
					Full: match[1],
				}
			}
			return types.Function{
				Name:      match[3],
				Attribute: match[2],
				Return:    ret,
				Args:      args,
			}
		},
	)
	return &result, nil
}


================================================
FILE: generator/scanner/scan_test.go
================================================
package scanner

import (
	"reflect"
	"regexp"
	"testing"

	"github.com/alivanz/go-simd/generator/types"
)

func TestAttribute(t *testing.T) {
	reg := regexp.MustCompile(attr + ";")
	result := reg.FindAllString(`
		__attribute__((__vector_size__(32), __aligned__(32)));
		__attribute__((neon_vector_type(8)));
	`, -1)
	ref := []string{
		"__attribute__((__vector_size__(32), __aligned__(32)));",
		"__attribute__((neon_vector_type(8)));",
	}
	t.Log(result)
	t.Log(ref)
	if !reflect.DeepEqual(result, ref) {
		t.Fail()
	}
}

func TestScan(t *testing.T) {
	result, err := Scan([]byte(`
		typedef char int8_t;
		typedef __attribute__((neon_vector_type(8))) int8_t int8x8_t;
		typedef double __m256d __attribute__((__vector_size__(32), __aligned__(32)));
		typedef struct int32x4x3_t {
			int32x4_t val[3];
		} int32x4x3_t;
		int func(int a, int b, int c) { return a+b+c; }
		static __inline__ __m128 __attribute__((__always_inline__, __nodebug__, __target__("mmx, sse"), __min_vector_width__(128))) _mm_move_ss(__m128 __a, __m128 __b) { __a[0] = __b[0]; return __a; }
		static __inline__ long long __attribute__((__always_inline__, __nodebug__, __target__("mmx"), __min_vector_width__(64))) _mm_cvtm64_si64(__m64 __m) { return 1; }
		void lolo(int a, long long b) { }
		void vovo(void) { }
	`))
	if err != nil {
		t.Fatal(err)
	}
	ref := &ScanResult{
		Types: []types.Type{
			{
				Name: "int8_t",
				Full: "typedef char int8_t;",
			},
			{
				Name:       "int8x8_t",
				Full:       "typedef __attribute__((neon_vector_type(8))) int8_t int8x8_t;",
				Attributes: []string{"neon_vector_type(8)"},
			},
			{
				Name:       "__m256d",
				Full:       "typedef double __m256d __attribute__((__vector_size__(32), __aligned__(32)));",
				Attributes: []string{"__vector_size__(32)", "__aligned__(32)"},
			},
			{
				Name: "int32x4x3_t",
				Full: "typedef struct int32x4x3_t { int32x4_t val[3]; } int32x4x3_t;",
			},
		},
		Functions: []types.Function{
			{
				Name: "func",
				Return: &types.Type{
					Name: "int",
					Full: "int",
				},
				Args: []types.Type{
					{
						Name: "int",
						Full: "int a",
					}, {
						Name: "int",
						Full: "int b",
					}, {
						Name: "int",
						Full: "int c",
					},
				},
			},
			{
				Name:      "_mm_move_ss",
				Attribute: `__always_inline__, __nodebug__, __target__("mmx, sse"), __min_vector_width__(128)`,
				Return: &types.Type{
					Name: "__m128",
					Full: "__m128",
				},
				Args: []types.Type{
					{
						Name: "__m128",
						Full: "__m128 __a",
					},
					{
						Name: "__m128",
						Full: "__m128 __b",
					},
				},
			},
			{
				Name:      "_mm_cvtm64_si64",
				Attribute: `__always_inline__, __nodebug__, __target__("mmx"), __min_vector_width__(64)`,
				Return: &types.Type{
					Name: "longlong",
					Full: "longlong",
				},
				Args: []types.Type{
					{
						Name: "__m64",
						Full: "__m64 __m",
					},
				},
			},
			{
				Name: "lolo",
				Args: []types.Type{
					{
						Name: "int",
						Full: "int a",
					}, {
						Name: "longlong",
						Full: "longlong b",
					},
				},
			}, {
				Name: "vovo",
			},
		},
	}
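	// compare lengths first so a short result fails cleanly instead of panicking
	if len(result.Functions) != len(ref.Functions) {
		t.Fatalf("got %d functions, want %d", len(result.Functions), len(ref.Functions))
	}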
	if !reflect.DeepEqual(result.Types, ref.Types) {
		t.Logf("%+v", result.Types)
		t.Logf("%+v", ref.Types)
		t.Fatal()
	}
	if !reflect.DeepEqual(result, ref) {
		t.Logf("%+v", result.Functions)
		t.Logf("%+v", ref.Functions)
		t.Fatal()
	}
}


================================================
FILE: generator/scanner/util.go
================================================
package scanner

func commaSplit(ss ...string) []string {
	switch len(ss) {
	case 0:
		return nil
	case 1:
		s := regWhitespace.ReplaceAllString(ss[0], " ")
		if len(s) == 0 {
			return nil
		}
		return regComma.Split(s, -1)
	default:
		return append(commaSplit(ss[0]), commaSplit(ss[1:]...)...)
	}
}


================================================
FILE: generator/types/function.go
================================================
package types

import (
	"regexp"
	"strings"
)

type Function struct {
	Name      string
	Args      []Type
	Return    *Type
	Attribute string
	Comment   string
}

type Arg struct {
	Name string
	Type string
}

var (
	regTarget = regexp.MustCompile(`__target__\("([a-z0-9\s,]+)"\)`)
)

func (f *Function) Target() string {
	match := regTarget.FindStringSubmatch(f.Attribute)
	if match == nil {
		return ""
	}
	return match[1]
}

func (fn *Function) Blacklisted() bool {
	for _, blacklist := range []string{
		"f16",
		"vcmla",
		"__extension__",
	} {
		if strings.Contains(fn.Name, blacklist) {
			return true
		}
	}
	return false
}


================================================
FILE: generator/types/type.go
================================================
package types

import (
	"strings"

	"github.com/iancoleman/strcase"
)

type Type struct {
	Name       string
	Full       string
	Attributes []string
}

func (t *Type) C() string {
	switch t.Name {
	case "longlong":
		return "long long"
	case "ulonglong":
		return "unsigned long long"
	case "ulong":
		return "unsigned long"
	case "uint":
		return "unsigned int"
	case "ushort":
		return "unsigned short"
	case "uchar":
		return "unsigned char"
	default:
		return t.Name
	}
}

func (t *Type) CGO() string {
	if !strings.Contains(t.Name, " ") {
		return t.Name
	}
	s := strings.Replace(t.Name, "unsigned", "u", -1)
	s = strings.Replace(s, " ", "", -1)
	return s
}

func (t *Type) Go(pkg string) string {
	s := strings.TrimSuffix(string(t.Name), "_t")
	s = strcase.ToCamel(s)
	if len(pkg) > 0 {
		return pkg + "." + s
	}
	return s
}

func (t *Type) Blacklisted() bool {
	for _, blacklist := range []string{
		"__darwin",
		"__int",
		"__uint",
		"__mm_storeh",
		"_tile",
		"_aligned",
		// float16
		"float16",
		"f16",
		"v8bf",
		"v8hf",
		"m128h",
		"m128bh",
		// windows?
		"crt",
		"_pi_",
		"mbstate_t",
	} {
		if strings.Contains(t.Name, blacklist) {
			return true
		}
	}
	return false
}


================================================
FILE: generator/utils/download.go
================================================
package utils

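import (
	"fmt"
	"io"
	"net/http"
	"os"
)

// Download fetches url into dst. It does nothing when dst already exists.
func Download(dst, url string) error {
	if _, err := os.Stat(dst); err == nil {
		// already downloaded
		return nil
	} else if !os.IsNotExist(err) {
		return err
	}
	resp, err := http.Get(url)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	// do not cache an error page as if it were the requested file
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("download %s: unexpected status %s", url, resp.Status)
	}
	f, err := os.OpenFile(dst, os.O_CREATE|os.O_WRONLY|os.O_TRUNC, os.ModePerm)
	if err != nil {
		return err
	}
	if _, err := io.Copy(f, resp.Body); err != nil {
		// remove the partial file so the next run retries the download
		f.Close()
		os.Remove(dst)
		return err
	}
	return f.Close()
}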


================================================
FILE: generator/utils/filter.go
================================================
package utils

func Filter[T any](arr []T, fn func(e T) bool) []T {
	out := make([]T, 0, len(arr))
	for _, e := range arr {
		if !fn(e) {
			continue
		}
		out = append(out, e)
	}
	return out
}


================================================
FILE: generator/utils/slice.go
================================================
package utils

func Transform[A, B any](arr []A, fn func(i int, e A) B) []B {
	if arr == nil {
		return nil
	}
	out := make([]B, len(arr))
	for i, e := range arr {
		out[i] = fn(i, e)
	}
	return out
}

func Merge[T any](lists ...[]T) []T {
	var out []T
	for _, l := range lists {
		out = append(out, l...)
	}
	return out
}


================================================
FILE: generator/writer/cgo.go
================================================
package writer

import (
	"fmt"
	"strings"
)

func Cflags(flags []string) string {
	return fmt.Sprintf("#cgo CFLAGS: %s", strings.Join(flags, " "))
}

func Includes(headers []string) []string {
	out := make([]string, len(headers))
	for i, h := range headers {
		out[i] = fmt.Sprintf("#include <%s>", h)
	}
	return out
}


================================================
FILE: generator/writer/function.go
================================================
package writer

import (
	"fmt"
	"io"
	"strings"

	"github.com/alivanz/go-simd/generator/types"
	"github.com/alivanz/go-simd/generator/utils"
	"github.com/iancoleman/strcase"
)

func DeclareFunc(w io.Writer, f types.Function, typePkg string) error {
	fmt.Fprintf(w, "\n")
	if len(f.Comment) > 0 {
		fmt.Fprintf(w, "// %s\n", f.Comment)
	} else {
		fmt.Fprintf(w, "// %s\n", f.Name)
	}
	// if len(f.Attribute) > 0 {
	// 	fmt.Fprintf(w, "// %s\n", f.Attribute)
	// }
	fmt.Fprintf(w, "func %s(", strcase.ToCamel(f.Name))
	fmt.Fprintf(w, "%s", strings.Join(utils.Transform(f.Args, func(i int, t types.Type) string {
		return fmt.Sprintf("v%d %s", i, t.Go(typePkg))
	}), ", "))
	if f.Return == nil {
		fmt.Fprintf(w, ") {\n")
	} else {
		fmt.Fprintf(w, ") %s {\n", f.Return.Go(typePkg))
	}
	if f.Return == nil {
		fmt.Fprintf(w, "\tC.%s(%s)\n", f.Name, strings.Join(utils.Transform(f.Args, func(i int, t types.Type) string {
			return fmt.Sprintf("v%d", i)
		}), ", "))
	} else {
		fmt.Fprintf(w, "\treturn C.%s(%s)\n", f.Name, strings.Join(utils.Transform(f.Args, func(i int, t types.Type) string {
			return fmt.Sprintf("v%d", i)
		}), ", "))
	}
	fmt.Fprintf(w, "}\n")
	return nil
}

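// DeclareFuncBypass writes a body-less Go declaration for f that is bound via
// go:linkname to the C wrapper emitted by RewriteC. The result (if any) and
// every argument are passed by pointer.
func DeclareFuncBypass(w io.Writer, f types.Function, typePkg string) error {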
	fmt.Fprintf(w, "\n")
	if len(f.Comment) > 0 {
		fmt.Fprintf(w, "// %s\n", f.Comment)
	} else {
		fmt.Fprintf(w, "// %s\n", f.Name)
	}
	fmt.Fprintf(w, "//\n")
	fmt.Fprintf(w, "//go:linkname %s %s\n", strcase.ToCamel(f.Name), strcase.ToCamel(f.Name))
	fmt.Fprintf(w, "//go:noescape\n")
	fmt.Fprintf(w, "func %s(", strcase.ToCamel(f.Name))
	if f.Return != nil {
		fmt.Fprintf(w, "r *%s, ", f.Return.Go(typePkg))
	}
	fmt.Fprintf(w, "%s)\n", strings.Join(utils.Transform(f.Args, func(i int, t types.Type) string {
		return fmt.Sprintf("v%d *%s", i, t.Go(typePkg))
	}), ", "))
	return nil
}

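// RewriteC writes a C wrapper, named after the CamelCase form of f.Name, that
// takes the result and arguments by pointer and forwards to the intrinsic.
func RewriteC(w io.Writer, f types.Function) error {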
	var cargs []string
	if f.Return != nil {
		cargs = append(cargs, fmt.Sprintf("%s* r", f.Return.C()))
	}
	for i, t := range f.Args {
		cargs = append(cargs, fmt.Sprintf("%s* v%d", t.C(), i))
	}
	fmt.Fprintf(w, "void %s(%s) { ",
		strcase.ToCamel(f.Name),
		strings.Join(cargs, ", "),
	)
	if f.Return != nil {
		fmt.Fprintf(w, "*r = ")
	}
	fmt.Fprintf(w, "%s(%s); }\n", f.Name, strings.Join(utils.Transform(f.Args, func(i int, t types.Type) string {
		return fmt.Sprintf("*v%d", i)
	}), ", "))
	return nil
}


================================================
FILE: generator/writer/package.go
================================================
package writer

import (
	"fmt"
	"io"
	"strings"

	"github.com/alivanz/go-simd/generator/types"
)

func Package(w io.Writer, pkg string) error {
	_, err := fmt.Fprintf(w, "package %s\n", pkg)
	return err
}

func Import(w io.Writer, pkgs []string) error {
	if len(pkgs) == 0 {
		return nil
	}
	_, err := fmt.Fprintf(w, "\nimport (\n\t\"%s\"\n)\n", strings.Join(pkgs, "\"\n\t\""))
	return err
}

func ImportC(w io.Writer, fn func(w io.Writer) error) error {
	if _, err := fmt.Fprintf(w, "\n/*\n"); err != nil {
		return err
	}
	if err := fn(w); err != nil {
		return err
	}
	if _, err := fmt.Fprintf(w, "\n*/\nimport \"C\"\n"); err != nil {
		return err
	}
	return nil
}

func Types(w io.Writer, types []types.Type) error {
	for _, t := range types {
		if t.Blacklisted() {
			continue
		}
		if err := DeclareType(w, t); err != nil {
			return err
		}
	}
	return nil
}

func Funcs(w io.Writer, funcs []types.Function, typePkg string) error {
	for _, fn := range funcs {
		if fn.Blacklisted() {
			continue
		}
		if err := DeclareFunc(w, fn, typePkg); err != nil {
			return err
		}
	}
	return nil
}


================================================
FILE: generator/writer/package_test.go
================================================
package writer

import (
	"bytes"
	"io"
	"strings"
	"testing"
)

func TestPackage(t *testing.T) {
	var buf bytes.Buffer
	Package(&buf, "abc")
	if buf.String() != "package abc\n" {
		t.Fatal(buf.String())
	}
}

func TestImport(t *testing.T) {
	var buf bytes.Buffer
	Import(&buf, []string{
		"pkg1",
		"pkg2",
		"pkg3",
	})
	if buf.String() != `
import (
	"pkg1"
	"pkg2"
	"pkg3"
)
` {
		t.Fatal(buf.String())
	}
}

func TestImportC(t *testing.T) {
	var buf bytes.Buffer
	ImportC(&buf, func(w io.Writer) error {
		io.WriteString(w, strings.Join([]string{
			`#include <abc.h>`,
			`#include <def.h>`,
		}, "\n"))
		return nil
	})
	ref := `
/*
#include <abc.h>
#include <def.h>
*/
import "C"
`
	if buf.String() != ref {
		t.Fatal(buf.String())
	}
}


================================================
FILE: generator/writer/type.go
================================================
package writer

import (
	"fmt"
	"io"

	"github.com/alivanz/go-simd/generator/types"
)

func DeclareType(w io.Writer, t types.Type) error {
	var err error
	if len(t.Full) > 0 {
		_, err = fmt.Fprintf(w, "\n// %s\ntype %s = C.%s\n", t.Full, t.Go(""), t.CGO())
	} else {
		_, err = fmt.Fprintf(w, "\ntype %s = C.%s\n", t.Go(""), t.CGO())
	}
	return err
}


================================================
FILE: generator/writer/writer.go
================================================
package writer

import (
	"io"
	"os"
	"path/filepath"
)

func WriteToFile(dst string, fn func(w io.Writer) error) error {
	if len(dst) == 0 {
		return nil
	}
	dst, err := filepath.Abs(dst)
	if err != nil {
		return err
	}
	if err := os.MkdirAll(filepath.Dir(dst), os.ModePerm); err != nil {
		return err
	}
	f, err := os.OpenFile(dst, os.O_CREATE|os.O_WRONLY|os.O_TRUNC, os.ModePerm)
	if err != nil {
		return err
	}
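	if err := fn(f); err != nil {
		f.Close()
		return err
	}
	// report errors from closing the file instead of dropping them
	return f.Close()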
}


================================================
FILE: generator/x86/info.go
================================================
package main

import (
	"bytes"
	"os"
	"regexp"

	"github.com/alivanz/go-simd/generator/utils"
)

// Intrinsic holds the fields extracted from one <intrinsic> element of the
// Intel intrinsics guide data file.
type Intrinsic struct {
	Name        string
	CpuID       string
	Description string
	Operation   string
}

var (
	regIntrinsic   = regexp.MustCompile(`<intrinsic .+?</intrinsic>`)
	regName        = regexp.MustCompile(`name="(.+?)"`)
	regDescription = regexp.MustCompile(`<description>(.+?)</description>`)
	regCpuID       = regexp.MustCompile(`<CPUID>(.+?)</CPUID>`)
)

// GetIntrinsic downloads data.xml (if it is not already present) and returns
// the intrinsics it describes.
func GetIntrinsic() ([]*Intrinsic, error) {
	if err := utils.Download(
		"data.xml",
		"https://www.intel.com/content/dam/develop/public/us/en/include/intrinsics-guide/data-3-6-6.xml",
	); err != nil {
		return nil, err
	}
	raw, err := os.ReadFile("data.xml")
	if err != nil {
		return nil, err
	}
	raw = bytes.ReplaceAll(raw, []byte("\n"), []byte(""))
	intrins := regIntrinsic.FindAll(raw, -1)
	out := make([]*Intrinsic, len(intrins))
	for i, part := range intrins {
		var intrin Intrinsic
		if match := regName.FindSubmatch(part); match != nil {
			intrin.Name = string(match[1])
		}
		if match := regDescription.FindSubmatch(part); match != nil {
			intrin.Description = string(match[1])
		}
		if match := regCpuID.FindSubmatch(part); match != nil {
			intrin.CpuID = string(match[1])
		}
		out[i] = &intrin
	}
	return out, nil
}


================================================
FILE: generator/x86/main.go
================================================
package main

import (
	"bytes"
	"fmt"
	"io"
	"log"
	"os"
	"os/exec"
	"regexp"
	"strings"

	"github.com/alivanz/go-simd/generator/scanner"
	"github.com/alivanz/go-simd/generator/types"
	"github.com/alivanz/go-simd/generator/utils"
	"github.com/alivanz/go-simd/generator/writer"
)

var (
	regComma = regexp.MustCompile(`\s*,\s*`)
)

func main() {
	// generate
	cmd := exec.Command("clang", "-march=native", "-E", "-")
	cmd.Stdin = bytes.NewBufferString(strings.Join([]string{
		"#include <immintrin.h>",
	}, "\n"))
	cmd.Stderr = os.Stderr
	src, err := cmd.Output()
	if err != nil {
		log.Fatal(err)
	}
	// raw
	if err := writer.WriteToFile("raw.h", func(w io.Writer) error {
		_, err := w.Write(src)
		return err
	}); err != nil {
		log.Fatal(err)
	}
	// scan
	result, err := scanner.Scan(src)
	if err != nil {
		log.Fatal(err)
	}
	// filter functions
	mfunc := make(map[string]bool)
	result.Functions = utils.Filter(result.Functions, func(fn types.Function) bool {
		if mfunc[fn.Name] {
			return false
		}
		if len(fn.Target()) == 0 {
			return false
		}
		mfunc[fn.Name] = true
		return true
	})
	// filter types
	mtype := make(map[string]bool)
	for _, fn := range result.Functions {
		if fn.Return != nil {
			mtype[fn.Return.Name] = true
			// append type
			result.Types = append(result.Types, *fn.Return)
		}
		for _, arg := range fn.Args {
			mtype[arg.Name] = true
			result.Types = append(result.Types, arg)
		}
	}
	result.Types = utils.Filter(result.Types, func(t types.Type) bool {
		if !mtype[t.Name] {
			return false
		}
		// remove dup
		delete(mtype, t.Name)
		return true
	})
	// types
	if err := writer.WriteToFile("types.go", func(w io.Writer) error {
		if err := writer.Package(w, "x86"); err != nil {
			return err
		}
		if err := writer.ImportC(w, func(w io.Writer) error {
			// return the write error itself rather than the captured outer err
			_, err := io.WriteString(w, "#include <immintrin.h>")
			return err
		}); err != nil {
			return err
		}
		if err := writer.Types(w, result.Types); err != nil {
			return err
		}
		return nil
	}); err != nil {
		log.Fatal(err)
	}
	// patch funcs
	intrins, err := GetIntrinsic()
	if err != nil {
		log.Fatal(err)
	}
	mintrin := make(map[string]*Intrinsic, len(intrins))
	for _, intrin := range intrins {
		mintrin[intrin.Name] = intrin
	}
	log.Printf("loaded %d intrinsic descriptions", len(mintrin))
	for i, fn := range result.Functions {
		if intrin, found := mintrin[fn.Name]; found {
			result.Functions[i].Comment = intrin.Description
		}
	}
	// group funcs by target
	mf := make(map[string][]types.Function)
	for _, fn := range result.Functions {
		target := fn.Target()
		mf[target] = append(mf[target], fn)
	}
	// funcs
	for target, funcs := range mf {
		target = regComma.ReplaceAllString(target, "_")
		cname := fmt.Sprintf("%s/functions.c", target)
		fname := fmt.Sprintf("%s/functions.go", target)
		// write C
		if err := writer.WriteToFile(cname, func(w io.Writer) error {
			if _, err := io.WriteString(w, "#include <immintrin.h>\n\n"); err != nil {
				return err
			}
			for _, fn := range funcs {
				if fn.Blacklisted() {
					continue
				}
				if err := writer.RewriteC(w, fn); err != nil {
					return err
				}
			}
			return nil
		}); err != nil {
			log.Fatal(err)
		}
		// write Go
		if err := writer.WriteToFile(fname, func(w io.Writer) error {
			if err := writer.Package(w, target); err != nil {
				return err
			}
			if err := writer.Import(w, []string{
				"github.com/alivanz/go-simd/x86",
			}); err != nil {
				return err
			}
			if err := writer.ImportC(w, func(w io.Writer) error {
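				// one -m<feature> flag per target feature, e.g. "mmx_sse" -> "-mmmx -msse"
				feats := strings.Split(target, "_")
				if _, err := fmt.Fprintf(w, "#cgo CFLAGS: %s\n", strings.Join(utils.Transform(feats, func(i int, feat string) string {
					return "-m" + feat
				}), " ")); err != nil {
					return err
				}
				// return the write error itself rather than the captured outer err
				_, err := io.WriteString(w, "#include <immintrin.h>")
				return err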
			}); err != nil {
				return err
			}
			for _, fn := range funcs {
				if fn.Blacklisted() {
					continue
				}
				if err := writer.DeclareFuncBypass(w, fn, "x86"); err != nil {
					return err
				}
			}
			return nil
		}); err != nil {
			log.Fatal(err)
		}
	}
}


================================================
FILE: go.mod
================================================
module github.com/alivanz/go-simd

go 1.20

require github.com/iancoleman/strcase v0.2.0


================================================
FILE: go.sum
================================================
github.com/iancoleman/strcase v0.2.0 h1:05I4QRnGpI0m37iZQRuskXh+w77mr6Z41lwQzuHLwW0=
github.com/iancoleman/strcase v0.2.0/go.mod h1:iwCmte+B7n89clKwxIoIXy/HfoL7AsD47ZCWhYzw7ho=


================================================
FILE: x86/aes/functions.c
================================================
#include <immintrin.h>

void MmAesencSi128(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_aesenc_si128(*v0, *v1); }
void MmAesenclastSi128(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_aesenclast_si128(*v0, *v1); }
void MmAesdecSi128(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_aesdec_si128(*v0, *v1); }
void MmAesdeclastSi128(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_aesdeclast_si128(*v0, *v1); }
void MmAesimcSi128(__m128i* r, __m128i* v0) { *r = _mm_aesimc_si128(*v0); }


================================================
FILE: x86/aes/functions.go
================================================
package aes

import (
	"github.com/alivanz/go-simd/x86"
)

/*
#cgo CFLAGS: -maes
#include <immintrin.h>
*/
import "C"

// Perform one round of an AES encryption flow on data (state) in "a" using the round key in "RoundKey", and store the result in "dst".
//
//go:linkname MmAesencSi128 MmAesencSi128
//go:noescape
func MmAesencSi128(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)

// Perform the last round of an AES encryption flow on data (state) in "a" using the round key in "RoundKey", and store the result in "dst".
//
//go:linkname MmAesenclastSi128 MmAesenclastSi128
//go:noescape
func MmAesenclastSi128(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)

// Perform one round of an AES decryption flow on data (state) in "a" using the round key in "RoundKey", and store the result in "dst".
//
//go:linkname MmAesdecSi128 MmAesdecSi128
//go:noescape
func MmAesdecSi128(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)

// Perform the last round of an AES decryption flow on data (state) in "a" using the round key in "RoundKey", and store the result in "dst".
//
//go:linkname MmAesdeclastSi128 MmAesdeclastSi128
//go:noescape
func MmAesdeclastSi128(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)

// Perform the InvMixColumns transformation on "a" and store the result in "dst".
//
//go:linkname MmAesimcSi128 MmAesimcSi128
//go:noescape
func MmAesimcSi128(r *x86.M128I, v0 *x86.M128I)


================================================
FILE: x86/avx/functions.c
================================================
#include <immintrin.h>

void Mm256AddPd(__m256d* r, __m256d* v0, __m256d* v1) { *r = _mm256_add_pd(*v0, *v1); }
void Mm256AddPs(__m256* r, __m256* v0, __m256* v1) { *r = _mm256_add_ps(*v0, *v1); }
void Mm256SubPd(__m256d* r, __m256d* v0, __m256d* v1) { *r = _mm256_sub_pd(*v0, *v1); }
void Mm256SubPs(__m256* r, __m256* v0, __m256* v1) { *r = _mm256_sub_ps(*v0, *v1); }
void Mm256AddsubPd(__m256d* r, __m256d* v0, __m256d* v1) { *r = _mm256_addsub_pd(*v0, *v1); }
void Mm256AddsubPs(__m256* r, __m256* v0, __m256* v1) { *r = _mm256_addsub_ps(*v0, *v1); }
void Mm256DivPd(__m256d* r, __m256d* v0, __m256d* v1) { *r = _mm256_div_pd(*v0, *v1); }
void Mm256DivPs(__m256* r, __m256* v0, __m256* v1) { *r = _mm256_div_ps(*v0, *v1); }
void Mm256MaxPd(__m256d* r, __m256d* v0, __m256d* v1) { *r = _mm256_max_pd(*v0, *v1); }
void Mm256MaxPs(__m256* r, __m256* v0, __m256* v1) { *r = _mm256_max_ps(*v0, *v1); }
void Mm256MinPd(__m256d* r, __m256d* v0, __m256d* v1) { *r = _mm256_min_pd(*v0, *v1); }
void Mm256MinPs(__m256* r, __m256* v0, __m256* v1) { *r = _mm256_min_ps(*v0, *v1); }
void Mm256MulPd(__m256d* r, __m256d* v0, __m256d* v1) { *r = _mm256_mul_pd(*v0, *v1); }
void Mm256MulPs(__m256* r, __m256* v0, __m256* v1) { *r = _mm256_mul_ps(*v0, *v1); }
void Mm256SqrtPd(__m256d* r, __m256d* v0) { *r = _mm256_sqrt_pd(*v0); }
void Mm256SqrtPs(__m256* r, __m256* v0) { *r = _mm256_sqrt_ps(*v0); }
void Mm256RsqrtPs(__m256* r, __m256* v0) { *r = _mm256_rsqrt_ps(*v0); }
void Mm256RcpPs(__m256* r, __m256* v0) { *r = _mm256_rcp_ps(*v0); }
void Mm256AndPd(__m256d* r, __m256d* v0, __m256d* v1) { *r = _mm256_and_pd(*v0, *v1); }
void Mm256AndPs(__m256* r, __m256* v0, __m256* v1) { *r = _mm256_and_ps(*v0, *v1); }
void Mm256AndnotPd(__m256d* r, __m256d* v0, __m256d* v1) { *r = _mm256_andnot_pd(*v0, *v1); }
void Mm256AndnotPs(__m256* r, __m256* v0, __m256* v1) { *r = _mm256_andnot_ps(*v0, *v1); }
void Mm256OrPd(__m256d* r, __m256d* v0, __m256d* v1) { *r = _mm256_or_pd(*v0, *v1); }
void Mm256OrPs(__m256* r, __m256* v0, __m256* v1) { *r = _mm256_or_ps(*v0, *v1); }
void Mm256XorPd(__m256d* r, __m256d* v0, __m256d* v1) { *r = _mm256_xor_pd(*v0, *v1); }
void Mm256XorPs(__m256* r, __m256* v0, __m256* v1) { *r = _mm256_xor_ps(*v0, *v1); }
void Mm256HaddPd(__m256d* r, __m256d* v0, __m256d* v1) { *r = _mm256_hadd_pd(*v0, *v1); }
void Mm256HaddPs(__m256* r, __m256* v0, __m256* v1) { *r = _mm256_hadd_ps(*v0, *v1); }
void Mm256HsubPd(__m256d* r, __m256d* v0, __m256d* v1) { *r = _mm256_hsub_pd(*v0, *v1); }
void Mm256HsubPs(__m256* r, __m256* v0, __m256* v1) { *r = _mm256_hsub_ps(*v0, *v1); }
void MmPermutevarPd(__m128d* r, __m128d* v0, __m128i* v1) { *r = _mm_permutevar_pd(*v0, *v1); }
void Mm256PermutevarPd(__m256d* r, __m256d* v0, __m256i* v1) { *r = _mm256_permutevar_pd(*v0, *v1); }
void MmPermutevarPs(__m128* r, __m128* v0, __m128i* v1) { *r = _mm_permutevar_ps(*v0, *v1); }
void Mm256PermutevarPs(__m256* r, __m256* v0, __m256i* v1) { *r = _mm256_permutevar_ps(*v0, *v1); }
void Mm256BlendvPd(__m256d* r, __m256d* v0, __m256d* v1, __m256d* v2) { *r = _mm256_blendv_pd(*v0, *v1, *v2); }
void Mm256BlendvPs(__m256* r, __m256* v0, __m256* v1, __m256* v2) { *r = _mm256_blendv_ps(*v0, *v1, *v2); }
void Mm256Cvtepi32Pd(__m256d* r, __m128i* v0) { *r = _mm256_cvtepi32_pd(*v0); }
void Mm256Cvtepi32Ps(__m256* r, __m256i* v0) { *r = _mm256_cvtepi32_ps(*v0); }
void Mm256CvtpdPs(__m128* r, __m256d* v0) { *r = _mm256_cvtpd_ps(*v0); }
void Mm256CvtpsEpi32(__m256i* r, __m256* v0) { *r = _mm256_cvtps_epi32(*v0); }
void Mm256CvtpsPd(__m256d* r, __m128* v0) { *r = _mm256_cvtps_pd(*v0); }
void Mm256CvttpdEpi32(__m128i* r, __m256d* v0) { *r = _mm256_cvttpd_epi32(*v0); }
void Mm256CvtpdEpi32(__m128i* r, __m256d* v0) { *r = _mm256_cvtpd_epi32(*v0); }
void Mm256CvttpsEpi32(__m256i* r, __m256* v0) { *r = _mm256_cvttps_epi32(*v0); }
void Mm256CvtsdF64(double* r, __m256d* v0) { *r = _mm256_cvtsd_f64(*v0); }
void Mm256Cvtsi256Si32(int* r, __m256i* v0) { *r = _mm256_cvtsi256_si32(*v0); }
void Mm256CvtssF32(float* r, __m256* v0) { *r = _mm256_cvtss_f32(*v0); }
void Mm256MovehdupPs(__m256* r, __m256* v0) { *r = _mm256_movehdup_ps(*v0); }
void Mm256MoveldupPs(__m256* r, __m256* v0) { *r = _mm256_moveldup_ps(*v0); }
void Mm256MovedupPd(__m256d* r, __m256d* v0) { *r = _mm256_movedup_pd(*v0); }
void Mm256UnpackhiPd(__m256d* r, __m256d* v0, __m256d* v1) { *r = _mm256_unpackhi_pd(*v0, *v1); }
void Mm256UnpackloPd(__m256d* r, __m256d* v0, __m256d* v1) { *r = _mm256_unpacklo_pd(*v0, *v1); }
void Mm256UnpackhiPs(__m256* r, __m256* v0, __m256* v1) { *r = _mm256_unpackhi_ps(*v0, *v1); }
void Mm256UnpackloPs(__m256* r, __m256* v0, __m256* v1) { *r = _mm256_unpacklo_ps(*v0, *v1); }
void MmTestzPd(int* r, __m128d* v0, __m128d* v1) { *r = _mm_testz_pd(*v0, *v1); }
void MmTestcPd(int* r, __m128d* v0, __m128d* v1) { *r = _mm_testc_pd(*v0, *v1); }
void MmTestnzcPd(int* r, __m128d* v0, __m128d* v1) { *r = _mm_testnzc_pd(*v0, *v1); }
void MmTestzPs(int* r, __m128* v0, __m128* v1) { *r = _mm_testz_ps(*v0, *v1); }
void MmTestcPs(int* r, __m128* v0, __m128* v1) { *r = _mm_testc_ps(*v0, *v1); }
void MmTestnzcPs(int* r, __m128* v0, __m128* v1) { *r = _mm_testnzc_ps(*v0, *v1); }
void Mm256TestzPd(int* r, __m256d* v0, __m256d* v1) { *r = _mm256_testz_pd(*v0, *v1); }
void Mm256TestcPd(int* r, __m256d* v0, __m256d* v1) { *r = _mm256_testc_pd(*v0, *v1); }
void Mm256TestnzcPd(int* r, __m256d* v0, __m256d* v1) { *r = _mm256_testnzc_pd(*v0, *v1); }
void Mm256TestzPs(int* r, __m256* v0, __m256* v1) { *r = _mm256_testz_ps(*v0, *v1); }
void Mm256TestcPs(int* r, __m256* v0, __m256* v1) { *r = _mm256_testc_ps(*v0, *v1); }
void Mm256TestnzcPs(int* r, __m256* v0, __m256* v1) { *r = _mm256_testnzc_ps(*v0, *v1); }
void Mm256TestzSi256(int* r, __m256i* v0, __m256i* v1) { *r = _mm256_testz_si256(*v0, *v1); }
void Mm256TestcSi256(int* r, __m256i* v0, __m256i* v1) { *r = _mm256_testc_si256(*v0, *v1); }
void Mm256TestnzcSi256(int* r, __m256i* v0, __m256i* v1) { *r = _mm256_testnzc_si256(*v0, *v1); }
void Mm256MovemaskPd(int* r, __m256d* v0) { *r = _mm256_movemask_pd(*v0); }
void Mm256MovemaskPs(int* r, __m256* v0) { *r = _mm256_movemask_ps(*v0); }
void Mm256Zeroall() { _mm256_zeroall(); }
void Mm256Zeroupper() { _mm256_zeroupper(); }
void Mm256UndefinedPd(__m256d* r) { *r = _mm256_undefined_pd(); }
void Mm256UndefinedPs(__m256* r) { *r = _mm256_undefined_ps(); }
void Mm256UndefinedSi256(__m256i* r) { *r = _mm256_undefined_si256(); }
void Mm256SetPd(__m256d* r, double* v0, double* v1, double* v2, double* v3) { *r = _mm256_set_pd(*v0, *v1, *v2, *v3); }
void Mm256SetPs(__m256* r, float* v0, float* v1, float* v2, float* v3, float* v4, float* v5, float* v6, float* v7) { *r = _mm256_set_ps(*v0, *v1, *v2, *v3, *v4, *v5, *v6, *v7); }
void Mm256SetEpi32(__m256i* r, int* v0, int* v1, int* v2, int* v3, int* v4, int* v5, int* v6, int* v7) { *r = _mm256_set_epi32(*v0, *v1, *v2, *v3, *v4, *v5, *v6, *v7); }
void Mm256SetEpi16(__m256i* r, short* v0, short* v1, short* v2, short* v3, short* v4, short* v5, short* v6, short* v7, short* v8, short* v9, short* v10, short* v11, short* v12, short* v13, short* v14, short* v15) { *r = _mm256_set_epi16(*v0, *v1, *v2, *v3, *v4, *v5, *v6, *v7, *v8, *v9, *v10, *v11, *v12, *v13, *v14, *v15); }
void Mm256SetEpi8(__m256i* r, char* v0, char* v1, char* v2, char* v3, char* v4, char* v5, char* v6, char* v7, char* v8, char* v9, char* v10, char* v11, char* v12, char* v13, char* v14, char* v15, char* v16, char* v17, char* v18, char* v19, char* v20, char* v21, char* v22, char* v23, char* v24, char* v25, char* v26, char* v27, char* v28, char* v29, char* v30, char* v31) { *r = _mm256_set_epi8(*v0, *v1, *v2, *v3, *v4, *v5, *v6, *v7, *v8, *v9, *v10, *v11, *v12, *v13, *v14, *v15, *v16, *v17, *v18, *v19, *v20, *v21, *v22, *v23, *v24, *v25, *v26, *v27, *v28, *v29, *v30, *v31); }
void Mm256SetEpi64X(__m256i* r, long long* v0, long long* v1, long long* v2, long long* v3) { *r = _mm256_set_epi64x(*v0, *v1, *v2, *v3); }
void Mm256SetrPd(__m256d* r, double* v0, double* v1, double* v2, double* v3) { *r = _mm256_setr_pd(*v0, *v1, *v2, *v3); }
void Mm256SetrPs(__m256* r, float* v0, float* v1, float* v2, float* v3, float* v4, float* v5, float* v6, float* v7) { *r = _mm256_setr_ps(*v0, *v1, *v2, *v3, *v4, *v5, *v6, *v7); }
void Mm256SetrEpi32(__m256i* r, int* v0, int* v1, int* v2, int* v3, int* v4, int* v5, int* v6, int* v7) { *r = _mm256_setr_epi32(*v0, *v1, *v2, *v3, *v4, *v5, *v6, *v7); }
void Mm256SetrEpi16(__m256i* r, short* v0, short* v1, short* v2, short* v3, short* v4, short* v5, short* v6, short* v7, short* v8, short* v9, short* v10, short* v11, short* v12, short* v13, short* v14, short* v15) { *r = _mm256_setr_epi16(*v0, *v1, *v2, *v3, *v4, *v5, *v6, *v7, *v8, *v9, *v10, *v11, *v12, *v13, *v14, *v15); }
void Mm256SetrEpi8(__m256i* r, char* v0, char* v1, char* v2, char* v3, char* v4, char* v5, char* v6, char* v7, char* v8, char* v9, char* v10, char* v11, char* v12, char* v13, char* v14, char* v15, char* v16, char* v17, char* v18, char* v19, char* v20, char* v21, char* v22, char* v23, char* v24, char* v25, char* v26, char* v27, char* v28, char* v29, char* v30, char* v31) { *r = _mm256_setr_epi8(*v0, *v1, *v2, *v3, *v4, *v5, *v6, *v7, *v8, *v9, *v10, *v11, *v12, *v13, *v14, *v15, *v16, *v17, *v18, *v19, *v20, *v21, *v22, *v23, *v24, *v25, *v26, *v27, *v28, *v29, *v30, *v31); }
void Mm256SetrEpi64X(__m256i* r, long long* v0, long long* v1, long long* v2, long long* v3) { *r = _mm256_setr_epi64x(*v0, *v1, *v2, *v3); }
void Mm256Set1Pd(__m256d* r, double* v0) { *r = _mm256_set1_pd(*v0); }
void Mm256Set1Ps(__m256* r, float* v0) { *r = _mm256_set1_ps(*v0); }
void Mm256Set1Epi32(__m256i* r, int* v0) { *r = _mm256_set1_epi32(*v0); }
void Mm256Set1Epi16(__m256i* r, short* v0) { *r = _mm256_set1_epi16(*v0); }
void Mm256Set1Epi8(__m256i* r, char* v0) { *r = _mm256_set1_epi8(*v0); }
void Mm256Set1Epi64X(__m256i* r, long long* v0) { *r = _mm256_set1_epi64x(*v0); }
void Mm256SetzeroPd(__m256d* r) { *r = _mm256_setzero_pd(); }
void Mm256SetzeroPs(__m256* r) { *r = _mm256_setzero_ps(); }
void Mm256SetzeroSi256(__m256i* r) { *r = _mm256_setzero_si256(); }
void Mm256CastpdPs(__m256* r, __m256d* v0) { *r = _mm256_castpd_ps(*v0); }
void Mm256CastpdSi256(__m256i* r, __m256d* v0) { *r = _mm256_castpd_si256(*v0); }
void Mm256CastpsPd(__m256d* r, __m256* v0) { *r = _mm256_castps_pd(*v0); }
void Mm256CastpsSi256(__m256i* r, __m256* v0) { *r = _mm256_castps_si256(*v0); }
void Mm256Castsi256Ps(__m256* r, __m256i* v0) { *r = _mm256_castsi256_ps(*v0); }
void Mm256Castsi256Pd(__m256d* r, __m256i* v0) { *r = _mm256_castsi256_pd(*v0); }
void Mm256Castpd256Pd128(__m128d* r, __m256d* v0) { *r = _mm256_castpd256_pd128(*v0); }
void Mm256Castps256Ps128(__m128* r, __m256* v0) { *r = _mm256_castps256_ps128(*v0); }
void Mm256Castsi256Si128(__m128i* r, __m256i* v0) { *r = _mm256_castsi256_si128(*v0); }
void Mm256Castpd128Pd256(__m256d* r, __m128d* v0) { *r = _mm256_castpd128_pd256(*v0); }
void Mm256Castps128Ps256(__m256* r, __m128* v0) { *r = _mm256_castps128_ps256(*v0); }
void Mm256Castsi128Si256(__m256i* r, __m128i* v0) { *r = _mm256_castsi128_si256(*v0); }
void Mm256Zextpd128Pd256(__m256d* r, __m128d* v0) { *r = _mm256_zextpd128_pd256(*v0); }
void Mm256Zextps128Ps256(__m256* r, __m128* v0) { *r = _mm256_zextps128_ps256(*v0); }
void Mm256Zextsi128Si256(__m256i* r, __m128i* v0) { *r = _mm256_zextsi128_si256(*v0); }
void Mm256SetM128(__m256* r, __m128* v0, __m128* v1) { *r = _mm256_set_m128(*v0, *v1); }
void Mm256SetM128D(__m256d* r, __m128d* v0, __m128d* v1) { *r = _mm256_set_m128d(*v0, *v1); }
void Mm256SetM128I(__m256i* r, __m128i* v0, __m128i* v1) { *r = _mm256_set_m128i(*v0, *v1); }
void Mm256SetrM128(__m256* r, __m128* v0, __m128* v1) { *r = _mm256_setr_m128(*v0, *v1); }
void Mm256SetrM128D(__m256d* r, __m128d* v0, __m128d* v1) { *r = _mm256_setr_m128d(*v0, *v1); }
void Mm256SetrM128I(__m256i* r, __m128i* v0, __m128i* v1) { *r = _mm256_setr_m128i(*v0, *v1); }


================================================
FILE: x86/avx/functions.go
================================================
package avx

import (
	"github.com/alivanz/go-simd/x86"
)

/*
#cgo CFLAGS: -mavx
#include <immintrin.h>
*/
import "C"

// Add packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst".
//
//go:linkname Mm256AddPd Mm256AddPd
//go:noescape
func Mm256AddPd(r *x86.M256D, v0 *x86.M256D, v1 *x86.M256D)

// Add packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst".
//
//go:linkname Mm256AddPs Mm256AddPs
//go:noescape
func Mm256AddPs(r *x86.M256, v0 *x86.M256, v1 *x86.M256)

// Subtract packed double-precision (64-bit) floating-point elements in "b" from packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst".
//
//go:linkname Mm256SubPd Mm256SubPd
//go:noescape
func Mm256SubPd(r *x86.M256D, v0 *x86.M256D, v1 *x86.M256D)

// Subtract packed single-precision (32-bit) floating-point elements in "b" from packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst".
//
//go:linkname Mm256SubPs Mm256SubPs
//go:noescape
func Mm256SubPs(r *x86.M256, v0 *x86.M256, v1 *x86.M256)

// Alternately subtract and add packed double-precision (64-bit) floating-point elements in "b" from/to packed elements in "a" (even-indexed elements are subtracted, odd-indexed elements are added), and store the results in "dst".
//
//go:linkname Mm256AddsubPd Mm256AddsubPd
//go:noescape
func Mm256AddsubPd(r *x86.M256D, v0 *x86.M256D, v1 *x86.M256D)

// Alternately subtract and add packed single-precision (32-bit) floating-point elements in "b" from/to packed elements in "a" (even-indexed elements are subtracted, odd-indexed elements are added), and store the results in "dst".
//
//go:linkname Mm256AddsubPs Mm256AddsubPs
//go:noescape
func Mm256AddsubPs(r *x86.M256, v0 *x86.M256, v1 *x86.M256)
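
// Illustration only, not part of the generated bindings: a minimal scalar
// sketch of the lane pattern that _mm256_addsub_pd applies, assuming the
// usual definition (even-indexed lanes get a-b, odd-indexed lanes get a+b).
// The name addsubPd and the array-based signature are hypothetical.

```go
package main

import "fmt"

// addsubPd is a hypothetical scalar model of _mm256_addsub_pd:
// even-indexed lanes hold a-b, odd-indexed lanes hold a+b.
func addsubPd(a, b [4]float64) [4]float64 {
	var dst [4]float64
	for j := range dst {
		if j%2 == 0 {
			dst[j] = a[j] - b[j]
		} else {
			dst[j] = a[j] + b[j]
		}
	}
	return dst
}

func main() {
	a := [4]float64{1, 2, 3, 4}
	b := [4]float64{10, 20, 30, 40}
	fmt.Println(addsubPd(a, b)) // [-9 22 -27 44]
}
```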

// Divide packed double-precision (64-bit) floating-point elements in "a" by packed elements in "b", and store the results in "dst".
//
//go:linkname Mm256DivPd Mm256DivPd
//go:noescape
func Mm256DivPd(r *x86.M256D, v0 *x86.M256D, v1 *x86.M256D)

// Divide packed single-precision (32-bit) floating-point elements in "a" by packed elements in "b", and store the results in "dst".
//
//go:linkname Mm256DivPs Mm256DivPs
//go:noescape
func Mm256DivPs(r *x86.M256, v0 *x86.M256, v1 *x86.M256)

// Compare packed double-precision (64-bit) floating-point elements in "a" and "b", and store packed maximum values in "dst".
//
//go:linkname Mm256MaxPd Mm256MaxPd
//go:noescape
func Mm256MaxPd(r *x86.M256D, v0 *x86.M256D, v1 *x86.M256D)

// Compare packed single-precision (32-bit) floating-point elements in "a" and "b", and store packed maximum values in "dst".
//
//go:linkname Mm256MaxPs Mm256MaxPs
//go:noescape
func Mm256MaxPs(r *x86.M256, v0 *x86.M256, v1 *x86.M256)

// Compare packed double-precision (64-bit) floating-point elements in "a" and "b", and store packed minimum values in "dst".
//
//go:linkname Mm256MinPd Mm256MinPd
//go:noescape
func Mm256MinPd(r *x86.M256D, v0 *x86.M256D, v1 *x86.M256D)

// Compare packed single-precision (32-bit) floating-point elements in "a" and "b", and store packed minimum values in "dst".
//
//go:linkname Mm256MinPs Mm256MinPs
//go:noescape
func Mm256MinPs(r *x86.M256, v0 *x86.M256, v1 *x86.M256)

// Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst".
//
//go:linkname Mm256MulPd Mm256MulPd
//go:noescape
func Mm256MulPd(r *x86.M256D, v0 *x86.M256D, v1 *x86.M256D)

// Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst".
//
//go:linkname Mm256MulPs Mm256MulPs
//go:noescape
func Mm256MulPs(r *x86.M256, v0 *x86.M256, v1 *x86.M256)

// Compute the square root of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst".
//
//go:linkname Mm256SqrtPd Mm256SqrtPd
//go:noescape
func Mm256SqrtPd(r *x86.M256D, v0 *x86.M256D)

// Compute the square root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst".
//
//go:linkname Mm256SqrtPs Mm256SqrtPs
//go:noescape
func Mm256SqrtPs(r *x86.M256, v0 *x86.M256)

// Compute the approximate reciprocal square root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". The maximum relative error for this approximation is less than 1.5*2^-12.
//
//go:linkname Mm256RsqrtPs Mm256RsqrtPs
//go:noescape
func Mm256RsqrtPs(r *x86.M256, v0 *x86.M256)

// Compute the approximate reciprocal of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". The maximum relative error for this approximation is less than 1.5*2^-12.
//
//go:linkname Mm256RcpPs Mm256RcpPs
//go:noescape
func Mm256RcpPs(r *x86.M256, v0 *x86.M256)

// Compute the bitwise AND of packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst".
//
//go:linkname Mm256AndPd Mm256AndPd
//go:noescape
func Mm256AndPd(r *x86.M256D, v0 *x86.M256D, v1 *x86.M256D)

// Compute the bitwise AND of packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst".
//
//go:linkname Mm256AndPs Mm256AndPs
//go:noescape
func Mm256AndPs(r *x86.M256, v0 *x86.M256, v1 *x86.M256)

// Compute the bitwise NOT of packed double-precision (64-bit) floating-point elements in "a" and then AND with "b", and store the results in "dst".
//
//go:linkname Mm256AndnotPd Mm256AndnotPd
//go:noescape
func Mm256AndnotPd(r *x86.M256D, v0 *x86.M256D, v1 *x86.M256D)

// Compute the bitwise NOT of packed single-precision (32-bit) floating-point elements in "a" and then AND with "b", and store the results in "dst".
//
//go:linkname Mm256AndnotPs Mm256AndnotPs
//go:noescape
func Mm256AndnotPs(r *x86.M256, v0 *x86.M256, v1 *x86.M256)
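
// Illustration only, not part of the generated bindings: the ANDNOT
// operations work on raw bit patterns, computing (NOT a) AND b. A common use
// is clearing the sign bit to get an absolute value. The sketch below models
// a single 64-bit lane in plain Go; andnotPd and signMask are hypothetical
// names.

```go
package main

import (
	"fmt"
	"math"
)

// signMask has only the sign bit set. It is built from bits because the Go
// literal -0.0 evaluates to positive zero.
var signMask = math.Float64frombits(1 << 63)

// andnotPd is a hypothetical scalar model of one lane of _mm256_andnot_pd:
// it computes (NOT a) AND b on the IEEE 754 bit patterns.
func andnotPd(a, b float64) float64 {
	return math.Float64frombits(^math.Float64bits(a) & math.Float64bits(b))
}

func main() {
	// Clearing the sign bit of -3.5 yields its absolute value.
	fmt.Println(andnotPd(signMask, -3.5)) // 3.5
}
```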

// Compute the bitwise OR of packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst".
//
//go:linkname Mm256OrPd Mm256OrPd
//go:noescape
func Mm256OrPd(r *x86.M256D, v0 *x86.M256D, v1 *x86.M256D)

// Compute the bitwise OR of packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst".
//
//go:linkname Mm256OrPs Mm256OrPs
//go:noescape
func Mm256OrPs(r *x86.M256, v0 *x86.M256, v1 *x86.M256)

// Compute the bitwise XOR of packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst".
//
//go:linkname Mm256XorPd Mm256XorPd
//go:noescape
func Mm256XorPd(r *x86.M256D, v0 *x86.M256D, v1 *x86.M256D)

// Compute the bitwise XOR of packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst".
//
//go:linkname Mm256XorPs Mm256XorPs
//go:noescape
func Mm256XorPs(r *x86.M256, v0 *x86.M256, v1 *x86.M256)

// Horizontally add adjacent pairs of double-precision (64-bit) floating-point elements in "a" and "b", and pack the results in "dst".
//
//go:linkname Mm256HaddPd Mm256HaddPd
//go:noescape
func Mm256HaddPd(r *x86.M256D, v0 *x86.M256D, v1 *x86.M256D)

// Horizontally add adjacent pairs of single-precision (32-bit) floating-point elements in "a" and "b", and pack the results in "dst".
//
//go:linkname Mm256HaddPs Mm256HaddPs
//go:noescape
func Mm256HaddPs(r *x86.M256, v0 *x86.M256, v1 *x86.M256)

// Horizontally subtract adjacent pairs of double-precision (64-bit) floating-point elements in "a" and "b", and pack the results in "dst".
//
//go:linkname Mm256HsubPd Mm256HsubPd
//go:noescape
func Mm256HsubPd(r *x86.M256D, v0 *x86.M256D, v1 *x86.M256D)

// Horizontally subtract adjacent pairs of single-precision (32-bit) floating-point elements in "a" and "b", and pack the results in "dst".
//
//go:linkname Mm256HsubPs Mm256HsubPs
//go:noescape
func Mm256HsubPs(r *x86.M256, v0 *x86.M256, v1 *x86.M256)
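
// Illustration only, not part of the generated bindings: the 256-bit
// horizontal add/subtract operations work within each 128-bit lane and
// interleave results from "a" and "b". A minimal scalar sketch of the
// double-precision variants follows; haddPd and hsubPd are hypothetical names.

```go
package main

import "fmt"

// haddPd is a hypothetical scalar model of _mm256_hadd_pd.
// Pairs are summed per 128-bit lane; a-results and b-results alternate.
func haddPd(a, b [4]float64) [4]float64 {
	return [4]float64{a[0] + a[1], b[0] + b[1], a[2] + a[3], b[2] + b[3]}
}

// hsubPd is a hypothetical scalar model of _mm256_hsub_pd.
// Within each pair the higher element is subtracted from the lower one.
func hsubPd(a, b [4]float64) [4]float64 {
	return [4]float64{a[0] - a[1], b[0] - b[1], a[2] - a[3], b[2] - b[3]}
}

func main() {
	a := [4]float64{1, 2, 3, 4}
	b := [4]float64{10, 20, 30, 40}
	fmt.Println(haddPd(a, b)) // [3 30 7 70]
	fmt.Println(hsubPd(a, b)) // [-1 -10 -1 -10]
}
```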

// Shuffle double-precision (64-bit) floating-point elements in "a" using the control in "b", and store the results in "dst".
//
//go:linkname MmPermutevarPd MmPermutevarPd
//go:noescape
func MmPermutevarPd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128I)

// Shuffle double-precision (64-bit) floating-point elements in "a" within 128-bit lanes using the control in "b", and store the results in "dst".
//
//go:linkname Mm256PermutevarPd Mm256PermutevarPd
//go:noescape
func Mm256PermutevarPd(r *x86.M256D, v0 *x86.M256D, v1 *x86.M256I)

// Shuffle single-precision (32-bit) floating-point elements in "a" using the control in "b", and store the results in "dst".
//
//go:linkname MmPermutevarPs MmPermutevarPs
//go:noescape
func MmPermutevarPs(r *x86.M128, v0 *x86.M128, v1 *x86.M128I)

// Shuffle single-precision (32-bit) floating-point elements in "a" within 128-bit lanes using the control in "b", and store the results in "dst".
//
//go:linkname Mm256PermutevarPs Mm256PermutevarPs
//go:noescape
func Mm256PermutevarPs(r *x86.M256, v0 *x86.M256, v1 *x86.M256I)
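
// Illustration only, not part of the generated bindings: for the
// double-precision permutevar operations, the selector is bit 1 (not bit 0)
// of each 64-bit control element, and selection never crosses a 128-bit lane.
// The sketch below models that in plain Go; permutevarPd256 is a hypothetical
// name and the control vector is represented as four uint64 values.

```go
package main

import "fmt"

// permutevarPd256 is a hypothetical scalar model of _mm256_permutevar_pd.
// For output element j, bit 1 of ctrl[j] picks the low (0) or high (1)
// double of the 128-bit lane that j belongs to.
func permutevarPd256(a [4]float64, ctrl [4]uint64) [4]float64 {
	var dst [4]float64
	for j := range dst {
		lane := j &^ 1 // first element of this 128-bit lane: 0 or 2
		sel := int((ctrl[j] >> 1) & 1)
		dst[j] = a[lane+sel]
	}
	return dst
}

func main() {
	a := [4]float64{1, 2, 3, 4}
	fmt.Println(permutevarPd256(a, [4]uint64{2, 0, 0, 2})) // [2 1 3 4]
	// Setting only bit 0 selects nothing: every element picks the low double.
	fmt.Println(permutevarPd256(a, [4]uint64{1, 1, 1, 1})) // [1 1 3 3]
}
```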

// Blend packed double-precision (64-bit) floating-point elements from "a" and "b" using "mask", and store the results in "dst".
//
//go:linkname Mm256BlendvPd Mm256BlendvPd
//go:noescape
func Mm256BlendvPd(r *x86.M256D, v0 *x86.M256D, v1 *x86.M256D, v2 *x86.M256D)

// Blend packed single-precision (32-bit) floating-point elements from "a" and "b" using "mask", and store the results in "dst".
//
//go:linkname Mm256BlendvPs Mm256BlendvPs
//go:noescape
func Mm256BlendvPs(r *x86.M256, v0 *x86.M256, v1 *x86.M256, v2 *x86.M256)
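
// Illustration only, not part of the generated bindings: the variable blend
// picks each element from "b" when the sign bit of the matching "mask"
// element is set, and from "a" otherwise. A minimal scalar sketch follows;
// blendvPd is a hypothetical name.

```go
package main

import (
	"fmt"
	"math"
)

// blendvPd is a hypothetical scalar model of _mm256_blendv_pd.
// Only the sign bit of each mask element matters.
func blendvPd(a, b, mask [4]float64) [4]float64 {
	dst := a
	for j := range dst {
		if math.Signbit(mask[j]) {
			dst[j] = b[j]
		}
	}
	return dst
}

func main() {
	a := [4]float64{1, 2, 3, 4}
	b := [4]float64{10, 20, 30, 40}
	mask := [4]float64{-1, 0, -1, 1}
	fmt.Println(blendvPd(a, b, mask)) // [10 2 30 4]
}
```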

// Convert packed signed 32-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst".
//
//go:linkname Mm256Cvtepi32Pd Mm256Cvtepi32Pd
//go:noescape
func Mm256Cvtepi32Pd(r *x86.M256D, v0 *x86.M128I)

// Convert packed signed 32-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst".
//
//go:linkname Mm256Cvtepi32Ps Mm256Cvtepi32Ps
//go:noescape
func Mm256Cvtepi32Ps(r *x86.M256, v0 *x86.M256I)

// Convert packed double-precision (64-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst".
//
//go:linkname Mm256CvtpdPs Mm256CvtpdPs
//go:noescape
func Mm256CvtpdPs(r *x86.M128, v0 *x86.M256D)

// Convert packed single-precision (32-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst".
//
//go:linkname Mm256CvtpsEpi32 Mm256CvtpsEpi32
//go:noescape
func Mm256CvtpsEpi32(r *x86.M256I, v0 *x86.M256)

// Convert packed single-precision (32-bit) floating-point elements in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst".
//
//go:linkname Mm256CvtpsPd Mm256CvtpsPd
//go:noescape
func Mm256CvtpsPd(r *x86.M256D, v0 *x86.M128)

// Convert packed double-precision (64-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst".
//
//go:linkname Mm256CvttpdEpi32 Mm256CvttpdEpi32
//go:noescape
func Mm256CvttpdEpi32(r *x86.M128I, v0 *x86.M256D)

// Convert packed double-precision (64-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst".
//
//go:linkname Mm256CvtpdEpi32 Mm256CvtpdEpi32
//go:noescape
func Mm256CvtpdEpi32(r *x86.M128I, v0 *x86.M256D)

// Convert packed single-precision (32-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst".
//
//go:linkname Mm256CvttpsEpi32 Mm256CvttpsEpi32
//go:noescape
func Mm256CvttpsEpi32(r *x86.M256I, v0 *x86.M256)
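
// Illustration only, not part of the generated bindings: the "cvt" and "cvtt"
// conversions differ in rounding. The sketch below assumes the default MXCSR
// rounding mode (round to nearest, ties to even) for the non-truncating form
// and does not model NaN or out-of-range inputs. cvtpdEpi32 and cvttpdEpi32
// are hypothetical names.

```go
package main

import (
	"fmt"
	"math"
)

// cvtpdEpi32 models _mm256_cvtpd_epi32 under round-to-nearest-even.
func cvtpdEpi32(a [4]float64) [4]int32 {
	var dst [4]int32
	for j, v := range a {
		dst[j] = int32(math.RoundToEven(v))
	}
	return dst
}

// cvttpdEpi32 models _mm256_cvttpd_epi32, which truncates toward zero.
func cvttpdEpi32(a [4]float64) [4]int32 {
	var dst [4]int32
	for j, v := range a {
		dst[j] = int32(math.Trunc(v))
	}
	return dst
}

func main() {
	a := [4]float64{2.5, 3.5, -1.7, 0.4}
	fmt.Println(cvtpdEpi32(a))  // [2 4 -2 0]
	fmt.Println(cvttpdEpi32(a)) // [2 3 -1 0]
}
```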

// Copy the lower double-precision (64-bit) floating-point element of "a" to "dst".
//
//go:linkname Mm256CvtsdF64 Mm256CvtsdF64
//go:noescape
func Mm256CvtsdF64(r *x86.Double, v0 *x86.M256D)

// Copy the lower 32-bit integer in "a" to "dst".
//
//go:linkname Mm256Cvtsi256Si32 Mm256Cvtsi256Si32
//go:noescape
func Mm256Cvtsi256Si32(r *x86.Int, v0 *x86.M256I)

// Copy the lower single-precision (32-bit) floating-point element of "a" to "dst".
//
//go:linkname Mm256CvtssF32 Mm256CvtssF32
//go:noescape
func Mm256CvtssF32(r *x86.Float, v0 *x86.M256)

// Duplicate odd-indexed single-precision (32-bit) floating-point elements from "a", and store the results in "dst".
//
//go:linkname Mm256MovehdupPs Mm256MovehdupPs
//go:noescape
func Mm256MovehdupPs(r *x86.M256, v0 *x86.M256)

// Duplicate even-indexed single-precision (32-bit) floating-point elements from "a", and store the results in "dst".
//
//go:linkname Mm256MoveldupPs Mm256MoveldupPs
//go:noescape
func Mm256MoveldupPs(r *x86.M256, v0 *x86.M256)

// Duplicate even-indexed double-precision (64-bit) floating-point elements from "a", and store the results in "dst".
//
//go:linkname Mm256MovedupPd Mm256MovedupPd
//go:noescape
func Mm256MovedupPd(r *x86.M256D, v0 *x86.M256D)
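
// Illustration only, not part of the generated bindings: the duplicate
// operations copy each odd-indexed (movehdup) or even-indexed (moveldup,
// movedup) element into both slots of its pair. A minimal scalar sketch
// follows; the function names are hypothetical.

```go
package main

import "fmt"

// movehdupPs models _mm256_movehdup_ps: each pair takes its odd element.
func movehdupPs(a [8]float32) [8]float32 {
	var dst [8]float32
	for j := range dst {
		dst[j] = a[j|1]
	}
	return dst
}

// moveldupPs models _mm256_moveldup_ps: each pair takes its even element.
func moveldupPs(a [8]float32) [8]float32 {
	var dst [8]float32
	for j := range dst {
		dst[j] = a[j&^1]
	}
	return dst
}

// movedupPd models _mm256_movedup_pd: each pair takes its even element.
func movedupPd(a [4]float64) [4]float64 {
	var dst [4]float64
	for j := range dst {
		dst[j] = a[j&^1]
	}
	return dst
}

func main() {
	ps := [8]float32{0, 1, 2, 3, 4, 5, 6, 7}
	fmt.Println(movehdupPs(ps))                     // [1 1 3 3 5 5 7 7]
	fmt.Println(moveldupPs(ps))                     // [0 0 2 2 4 4 6 6]
	fmt.Println(movedupPd([4]float64{1, 2, 3, 4})) // [1 1 3 3]
}
```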

// Unpack and interleave double-precision (64-bit) floating-point elements from the high half of each 128-bit lane in "a" and "b", and store the results in "dst".
//
//go:linkname Mm256UnpackhiPd Mm256UnpackhiPd
//go:noescape
func Mm256UnpackhiPd(r *x86.M256D, v0 *x86.M256D, v1 *x86.M256D)

// Unpack and interleave double-precision (64-bit) floating-point elements from the low half of each 128-bit lane in "a" and "b", and store the results in "dst".
//
//go:linkname Mm256UnpackloPd Mm256UnpackloPd
//go:noescape
func Mm256UnpackloPd(r *x86.M256D, v0 *x86.M256D, v1 *x86.M256D)

// Unpack and interleave single-precision (32-bit) floating-point elements from the high half of each 128-bit lane in "a" and "b", and store the results in "dst".
//
//go:linkname Mm256UnpackhiPs Mm256UnpackhiPs
//go:noescape
func Mm256UnpackhiPs(r *x86.M256, v0 *x86.M256, v1 *x86.M256)

// Unpack and interleave single-precision (32-bit) floating-point elements from the low half of each 128-bit lane in "a" and "b", and store the results in "dst".
//
//go:linkname Mm256UnpackloPs Mm256UnpackloPs
//go:noescape
func Mm256UnpackloPs(r *x86.M256, v0 *x86.M256, v1 *x86.M256)
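
// Illustration only, not part of the generated bindings: the 256-bit unpack
// operations interleave "a" and "b" separately within each 128-bit lane,
// which is easy to overlook. A minimal scalar sketch follows; the function
// names are hypothetical.

```go
package main

import "fmt"

// unpackloPd models _mm256_unpacklo_pd: low double of each lane, a then b.
func unpackloPd(a, b [4]float64) [4]float64 {
	return [4]float64{a[0], b[0], a[2], b[2]}
}

// unpackhiPd models _mm256_unpackhi_pd: high double of each lane, a then b.
func unpackhiPd(a, b [4]float64) [4]float64 {
	return [4]float64{a[1], b[1], a[3], b[3]}
}

// unpackloPs models _mm256_unpacklo_ps: low two floats of each lane.
func unpackloPs(a, b [8]float32) [8]float32 {
	return [8]float32{a[0], b[0], a[1], b[1], a[4], b[4], a[5], b[5]}
}

// unpackhiPs models _mm256_unpackhi_ps: high two floats of each lane.
func unpackhiPs(a, b [8]float32) [8]float32 {
	return [8]float32{a[2], b[2], a[3], b[3], a[6], b[6], a[7], b[7]}
}

func main() {
	a := [4]float64{1, 2, 3, 4}
	b := [4]float64{10, 20, 30, 40}
	fmt.Println(unpackloPd(a, b)) // [1 10 3 30]
	fmt.Println(unpackhiPd(a, b)) // [2 20 4 40]
}
```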

// Compute the bitwise AND of 128 bits (representing double-precision (64-bit) floating-point elements) in "a" and "b", producing an intermediate 128-bit value, and set "ZF" to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set "ZF" to 0. Compute the bitwise NOT of "a" and then AND with "b", producing an intermediate value, and set "CF" to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set "CF" to 0. Return the "ZF" value.
//
//go:linkname MmTestzPd MmTestzPd
//go:noescape
func MmTestzPd(r *x86.Int, v0 *x86.M128D, v1 *x86.M128D)

// Compute the bitwise AND of 128 bits (representing double-precision (64-bit) floating-point elements) in "a" and "b", producing an intermediate 128-bit value, and set "ZF" to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set "ZF" to 0. Compute the bitwise NOT of "a" and then AND with "b", producing an intermediate value, and set "CF" to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set "CF" to 0. Return the "CF" value.
//
//go:linkname MmTestcPd MmTestcPd
//go:noescape
func MmTestcPd(r *x86.Int, v0 *x86.M128D, v1 *x86.M128D)

// Compute the bitwise AND of 128 bits (representing double-precision (64-bit) floating-point elements) in "a" and "b", producing an intermediate 128-bit value, and set "ZF" to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set "ZF" to 0. Compute the bitwise NOT of "a" and then AND with "b", producing an intermediate value, and set "CF" to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set "CF" to 0. Return 1 if both the "ZF" and "CF" values are zero, otherwise return 0.
//
//go:linkname MmTestnzcPd MmTestnzcPd
//go:noescape
func MmTestnzcPd(r *x86.Int, v0 *x86.M128D, v1 *x86.M128D)

// Compute the bitwise AND of 128 bits (representing single-precision (32-bit) floating-point elements) in "a" and "b", producing an intermediate 128-bit value, and set "ZF" to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set "ZF" to 0. Compute the bitwise NOT of "a" and then AND with "b", producing an intermediate value, and set "CF" to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set "CF" to 0. Return the "ZF" value.
//
//go:linkname MmTestzPs MmTestzPs
//go:noescape
func MmTestzPs(r *x86.Int, v0 *x86.M128, v1 *x86.M128)

// Compute the bitwise AND of 128 bits (representing single-precision (32-bit) floating-point elements) in "a" and "b", producing an intermediate 128-bit value, and set "ZF" to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set "ZF" to 0. Compute the bitwise NOT of "a" and then AND with "b", producing an intermediate value, and set "CF" to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set "CF" to 0. Return the "CF" value.
//
//go:linkname MmTestcPs MmTestcPs
//go:noescape
func MmTestcPs(r *x86.Int, v0 *x86.M128, v1 *x86.M128)

// Compute the bitwise AND of 128 bits (representing single-precision (32-bit) floating-point elements) in "a" and "b", producing an intermediate 128-bit value, and set "ZF" to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set "ZF" to 0. Compute the bitwise NOT of "a" and then AND with "b", producing an intermediate value, and set "CF" to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set "CF" to 0. Return 1 if both the "ZF" and "CF" values are zero, otherwise return 0.
//
//go:linkname MmTestnzcPs MmTestnzcPs
//go:noescape
func MmTestnzcPs(r *x86.Int, v0 *x86.M128, v1 *x86.M128)

// Compute the bitwise AND of 256 bits (representing double-precision (64-bit) floating-point elements) in "a" and "b", producing an intermediate 256-bit value, and set "ZF" to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set "ZF" to 0. Compute the bitwise NOT of "a" and then AND with "b", producing an intermediate value, and set "CF" to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set "CF" to 0. Return the "ZF" value.
//
//go:linkname Mm256TestzPd Mm256TestzPd
//go:noescape
func Mm256TestzPd(r *x86.Int, v0 *x86.M256D, v1 *x86.M256D)

// Compute the bitwise AND of 256 bits (representing double-precision (64-bit) floating-point elements) in "a" and "b", producing an intermediate 256-bit value, and set "ZF" to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set "ZF" to 0. Compute the bitwise NOT of "a" and then AND with "b", producing an intermediate value, and set "CF" to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set "CF" to 0. Return the "CF" value.
//
//go:linkname Mm256TestcPd Mm256TestcPd
//go:noescape
func Mm256TestcPd(r *x86.Int, v0 *x86.M256D, v1 *x86.M256D)

// Compute the bitwise AND of 256 bits (representing double-precision (64-bit) floating-point elements) in "a" and "b", producing an intermediate 256-bit value, and set "ZF" to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set "ZF" to 0. Compute the bitwise NOT of "a" and then AND with "b", producing an intermediate value, and set "CF" to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set "CF" to 0. Return 1 if both the "ZF" and "CF" values are zero, otherwise return 0.
//
//go:linkname Mm256TestnzcPd Mm256TestnzcPd
//go:noescape
func Mm256TestnzcPd(r *x86.Int, v0 *x86.M256D, v1 *x86.M256D)

// Compute the bitwise AND of 256 bits (representing single-precision (32-bit) floating-point elements) in "a" and "b", producing an intermediate 256-bit value, and set "ZF" to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set "ZF" to 0. Compute the bitwise NOT of "a" and then AND with "b", producing an intermediate value, and set "CF" to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set "CF" to 0. Return the "ZF" value.
//
//go:linkname Mm256TestzPs Mm256TestzPs
//go:noescape
func Mm256TestzPs(r *x86.Int, v0 *x86.M256, v1 *x86.M256)

// Compute the bitwise AND of 256 bits (representing single-precision (32-bit) floating-point elements) in "a" and "b", producing an intermediate 256-bit value, and set "ZF" to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set "ZF" to 0. Compute the bitwise NOT of "a" and then AND with "b", producing an intermediate value, and set "CF" to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set "CF" to 0. Return the "CF" value.
//
//go:linkname Mm256TestcPs Mm256TestcPs
//go:noescape
func Mm256TestcPs(r *x86.Int, v0 *x86.M256, v1 *x86.M256)

// Compute the bitwise AND of 256 bits (representing single-precision (32-bit) floating-point elements) in "a" and "b", producing an intermediate 256-bit value, and set "ZF" to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set "ZF" to 0. Compute the bitwise NOT of "a" and then AND with "b", producing an intermediate value, and set "CF" to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set "CF" to 0. Return 1 if both the "ZF" and "CF" values are zero, otherwise return 0.
//
//go:linkname Mm256TestnzcPs Mm256TestnzcPs
//go:noescape
func Mm256TestnzcPs(r *x86.Int, v0 *x86.M256, v1 *x86.M256)

// Compute the bitwise AND of 256 bits (representing integer data) in "a" and "b", and set "ZF" to 1 if the result is zero, otherwise set "ZF" to 0. Compute the bitwise NOT of "a" and then AND with "b", and set "CF" to 1 if the result is zero, otherwise set "CF" to 0. Return the "ZF" value.
//
//go:linkname Mm256TestzSi256 Mm256TestzSi256
//go:noescape
func Mm256TestzSi256(r *x86.Int, v0 *x86.M256I, v1 *x86.M256I)

// Compute the bitwise AND of 256 bits (representing integer data) in "a" and "b", and set "ZF" to 1 if the result is zero, otherwise set "ZF" to 0. Compute the bitwise NOT of "a" and then AND with "b", and set "CF" to 1 if the result is zero, otherwise set "CF" to 0. Return the "CF" value.
//
//go:linkname Mm256TestcSi256 Mm256TestcSi256
//go:noescape
func Mm256TestcSi256(r *x86.Int, v0 *x86.M256I, v1 *x86.M256I)

// Compute the bitwise AND of 256 bits (representing integer data) in "a" and "b", and set "ZF" to 1 if the result is zero, otherwise set "ZF" to 0. Compute the bitwise NOT of "a" and then AND with "b", and set "CF" to 1 if the result is zero, otherwise set "CF" to 0. Return 1 if both the "ZF" and "CF" values are zero, otherwise return 0.
//
//go:linkname Mm256TestnzcSi256 Mm256TestnzcSi256
//go:noescape
func Mm256TestnzcSi256(r *x86.Int, v0 *x86.M256I, v1 *x86.M256I)
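
/*
The three Si256 test wrappers above differ only in which flag they report.
The following is a minimal scalar sketch of the documented ZF/CF rules, not
the library's implementation. The names vec256, flags, testz, testc and
testnzc are hypothetical and exist only for this illustration.

```go
package main

import "fmt"

// vec256 is a hypothetical stand-in for a 256-bit integer vector,
// stored as four 64-bit words. It is not the x86.M256I type.
type vec256 [4]uint64

// flags returns the two flags described in the comments above:
// ZF = 1 when (a AND b) is all zero, CF = 1 when ((NOT a) AND b) is all zero.
func flags(a, b vec256) (zf, cf int) {
	zf, cf = 1, 1
	for i := range a {
		if a[i]&b[i] != 0 {
			zf = 0
		}
		if ^a[i]&b[i] != 0 {
			cf = 0
		}
	}
	return zf, cf
}

// testz reports ZF.
func testz(a, b vec256) int {
	zf, _ := flags(a, b)
	return zf
}

// testc reports CF.
func testc(a, b vec256) int {
	_, cf := flags(a, b)
	return cf
}

// testnzc reports 1 only when both ZF and CF are zero.
func testnzc(a, b vec256) int {
	zf, cf := flags(a, b)
	if zf == 0 && cf == 0 {
		return 1
	}
	return 0
}

func main() {
	a := vec256{0x0F, 0, 0, 0}
	b := vec256{0xF0, 0, 0, 0}
	// a and b share no set bits, so ZF is 1; b has bits outside a, so CF is 0.
	fmt.Println(testz(a, b), testc(a, b), testnzc(a, b)) // 1 0 0
}
```

In the real wrappers the same value is written through the r pointer
instead of being returned.
*/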

// Set each bit of mask "dst" based on the most significant bit of the corresponding packed double-precision (64-bit) floating-point element in "a".
//
//go:linkname Mm256MovemaskPd Mm256MovemaskPd
//go:noescape
func Mm256MovemaskPd(r *x86.Int, v0 *x86.M256D)

// Set each bit of mask "dst" based on the most significant bit of the corresponding packed single-precision (32-bit) floating-point element in "a".
//
//go:linkname Mm256MovemaskPs Mm256MovemaskPs
//go:noescape
func Mm256MovemaskPs(r *x86.Int, v0 *x86.M256)
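
/*
As a rough illustration of the movemask description above, the sketch below
models the single-precision case in plain Go: bit i of the result is the sign
bit of element i. The function name movemaskPs and the [8]float32 stand-in
for a vector are assumptions made for this example only.

```go
package main

import (
	"fmt"
	"math"
)

// movemaskPs is a scalar model: bit i of the result is the sign bit of v[i].
func movemaskPs(v [8]float32) int {
	mask := 0
	for i, x := range v {
		if math.Float32bits(x)>>31 != 0 {
			mask |= 1 << i
		}
	}
	return mask
}

func main() {
	// Elements 0, 2 and 5 are negative, so bits 0, 2 and 5 are set.
	v := [8]float32{-1, 2, -3, 4, 5, -6, 7, 8}
	fmt.Printf("%#x\n", movemaskPs(v)) // 0x25
}
```

Because only the sign bit is inspected, a negative zero also sets its bit.
*/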

// Zero the contents of all XMM or YMM registers.
//
//go:linkname Mm256Zeroall Mm256Zeroall
//go:noescape
func Mm256Zeroall()

// Zero the upper 128 bits of all YMM registers; the lower 128-bits of the registers are unmodified.
//
//go:linkname Mm256Zeroupper Mm256Zeroupper
//go:noescape
func Mm256Zeroupper()

// Return vector of type __m256d with undefined elements.
//
//go:linkname Mm256UndefinedPd Mm256UndefinedPd
//go:noescape
func Mm256UndefinedPd(r *x86.M256D)

// Return vector of type __m256 with undefined elements.
//
//go:linkname Mm256UndefinedPs Mm256UndefinedPs
//go:noescape
func Mm256UndefinedPs(r *x86.M256)

// Return vector of type __m256i with undefined elements.
//
//go:linkname Mm256UndefinedSi256 Mm256UndefinedSi256
//go:noescape
func Mm256UndefinedSi256(r *x86.M256I)

// Set packed double-precision (64-bit) floating-point elements in "dst" with the supplied values.
//
//go:linkname Mm256SetPd Mm256SetPd
//go:noescape
func Mm256SetPd(r *x86.M256D, v0 *x86.Double, v1 *x86.Double, v2 *x86.Double, v3 *x86.Double)

// Set packed single-precision (32-bit) floating-point elements in "dst" with the supplied values.
//
//go:linkname Mm256SetPs Mm256SetPs
//go:noescape
func Mm256SetPs(r *x86.M256, v0 *x86.Float, v1 *x86.Float, v2 *x86.Float, v3 *x86.Float, v4 *x86.Float, v5 *x86.Float, v6 *x86.Float, v7 *x86.Float)

// Set packed 32-bit integers in "dst" with the supplied values.
//
//go:linkname Mm256SetEpi32 Mm256SetEpi32
//go:noescape
func Mm256SetEpi32(r *x86.M256I, v0 *x86.Int, v1 *x86.Int, v2 *x86.Int, v3 *x86.Int, v4 *x86.Int, v5 *x86.Int, v6 *x86.Int, v7 *x86.Int)

// Set packed 16-bit integers in "dst" with the supplied values.
//
//go:linkname Mm256SetEpi16 Mm256SetEpi16
//go:noescape
func Mm256SetEpi16(r *x86.M256I, v0 *x86.Short, v1 *x86.Short, v2 *x86.Short, v3 *x86.Short, v4 *x86.Short, v5 *x86.Short, v6 *x86.Short, v7 *x86.Short, v8 *x86.Short, v9 *x86.Short, v10 *x86.Short, v11 *x86.Short, v12 *x86.Short, v13 *x86.Short, v14 *x86.Short, v15 *x86.Short)

// Set packed 8-bit integers in "dst" with the supplied values.
//
//go:linkname Mm256SetEpi8 Mm256SetEpi8
//go:noescape
func Mm256SetEpi8(r *x86.M256I, v0 *x86.Char, v1 *x86.Char, v2 *x86.Char, v3 *x86.Char, v4 *x86.Char, v5 *x86.Char, v6 *x86.Char, v7 *x86.Char, v8 *x86.Char, v9 *x86.Char, v10 *x86.Char, v11 *x86.Char, v12 *x86.Char, v13 *x86.Char, v14 *x86.Char, v15 *x86.Char, v16 *x86.Char, v17 *x86.Char, v18 *x86.Char, v19 *x86.Char, v20 *x86.Char, v21 *x86.Char, v22 *x86.Char, v23 *x86.Char, v24 *x86.Char, v25 *x86.Char, v26 *x86.Char, v27 *x86.Char, v28 *x86.Char, v29 *x86.Char, v30 *x86.Char, v31 *x86.Char)

// Set packed 64-bit integers in "dst" with the supplied values.
//
//go:linkname Mm256SetEpi64X Mm256SetEpi64X
//go:noescape
func Mm256SetEpi64X(r *x86.M256I, v0 *x86.Longlong, v1 *x86.Longlong, v2 *x86.Longlong, v3 *x86.Longlong)

// Set packed double-precision (64-bit) floating-point elements in "dst" with the supplied values in reverse order.
//
//go:linkname Mm256SetrPd Mm256SetrPd
//go:noescape
func Mm256SetrPd(r *x86.M256D, v0 *x86.Double, v1 *x86.Double, v2 *x86.Double, v3 *x86.Double)

// Set packed single-precision (32-bit) floating-point elements in "dst" with the supplied values in reverse order.
//
//go:linkname Mm256SetrPs Mm256SetrPs
//go:noescape
func Mm256SetrPs(r *x86.M256, v0 *x86.Float, v1 *x86.Float, v2 *x86.Float, v3 *x86.Float, v4 *x86.Float, v5 *x86.Float, v6 *x86.Float, v7 *x86.Float)

// Set packed 32-bit integers in "dst" with the supplied values in reverse order.
//
//go:linkname Mm256SetrEpi32 Mm256SetrEpi32
//go:noescape
func Mm256SetrEpi32(r *x86.M256I, v0 *x86.Int, v1 *x86.Int, v2 *x86.Int, v3 *x86.Int, v4 *x86.Int, v5 *x86.Int, v6 *x86.Int, v7 *x86.Int)

// Set packed 16-bit integers in "dst" with the supplied values in reverse order.
//
//go:linkname Mm256SetrEpi16 Mm256SetrEpi16
//go:noescape
func Mm256SetrEpi16(r *x86.M256I, v0 *x86.Short, v1 *x86.Short, v2 *x86.Short, v3 *x86.Short, v4 *x86.Short, v5 *x86.Short, v6 *x86.Short, v7 *x86.Short, v8 *x86.Short, v9 *x86.Short, v10 *x86.Short, v11 *x86.Short, v12 *x86.Short, v13 *x86.Short, v14 *x86.Short, v15 *x86.Short)

// Set packed 8-bit integers in "dst" with the supplied values in reverse order.
//
//go:linkname Mm256SetrEpi8 Mm256SetrEpi8
//go:noescape
func Mm256SetrEpi8(r *x86.M256I, v0 *x86.Char, v1 *x86.Char, v2 *x86.Char, v3 *x86.Char, v4 *x86.Char, v5 *x86.Char, v6 *x86.Char, v7 *x86.Char, v8 *x86.Char, v9 *x86.Char, v10 *x86.Char, v11 *x86.Char, v12 *x86.Char, v13 *x86.Char, v14 *x86.Char, v15 *x86.Char, v16 *x86.Char, v17 *x86.Char, v18 *x86.Char, v19 *x86.Char, v20 *x86.Char, v21 *x86.Char, v22 *x86.Char, v23 *x86.Char, v24 *x86.Char, v25 *x86.Char, v26 *x86.Char, v27 *x86.Char, v28 *x86.Char, v29 *x86.Char, v30 *x86.Char, v31 *x86.Char)

// Set packed 64-bit integers in "dst" with the supplied values in reverse order.
//
//go:linkname Mm256SetrEpi64X Mm256SetrEpi64X
//go:noescape
func Mm256SetrEpi64X(r *x86.M256I, v0 *x86.Longlong, v1 *x86.Longlong, v2 *x86.Longlong, v3 *x86.Longlong)
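
/*
The difference between the Set and Setr families is only the argument order.
The sketch below models the double-precision pair in plain Go, assuming the
wrapper passes v0..v3 to the intrinsic positionally (as the generated C shims
in this repository appear to do). setPd and setrPd are hypothetical names, and
the returned array lists elements from lowest to highest.

```go
package main

import "fmt"

// setPd models _mm256_set_pd: the LAST argument lands in element 0.
func setPd(e3, e2, e1, e0 float64) [4]float64 {
	return [4]float64{e0, e1, e2, e3}
}

// setrPd models _mm256_setr_pd: the FIRST argument lands in element 0.
func setrPd(e0, e1, e2, e3 float64) [4]float64 {
	return [4]float64{e0, e1, e2, e3}
}

func main() {
	fmt.Println(setPd(4, 3, 2, 1))  // [1 2 3 4]
	fmt.Println(setrPd(1, 2, 3, 4)) // [1 2 3 4]
}
```

Setr is usually the more convenient choice when the arguments are written in
the same order as they will appear in memory.
*/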

// Broadcast double-precision (64-bit) floating-point value "a" to all elements of "dst".
//
//go:linkname Mm256Set1Pd Mm256Set1Pd
//go:noescape
func Mm256Set1Pd(r *x86.M256D, v0 *x86.Double)

// Broadcast single-precision (32-bit) floating-point value "a" to all elements of "dst".
//
//go:linkname Mm256Set1Ps Mm256Set1Ps
//go:noescape
func Mm256Set1Ps(r *x86.M256, v0 *x86.Float)

// Broadcast 32-bit integer "a" to all elements of "dst". This intrinsic may generate the "vpbroadcastd" instruction.
//
//go:linkname Mm256Set1Epi32 Mm256Set1Epi32
//go:noescape
func Mm256Set1Epi32(r *x86.M256I, v0 *x86.Int)

// Broadcast 16-bit integer "a" to all elements of "dst". This intrinsic may generate the "vpbroadcastw" instruction.
//
//go:linkname Mm256Set1Epi16 Mm256Set1Epi16
//go:noescape
func Mm256Set1Epi16(r *x86.M256I, v0 *x86.Short)

// Broadcast 8-bit integer "a" to all elements of "dst". This intrinsic may generate the "vpbroadcastb" instruction.
//
//go:linkname Mm256Set1Epi8 Mm256Set1Epi8
//go:noescape
func Mm256Set1Epi8(r *x86.M256I, v0 *x86.Char)

// Broadcast 64-bit integer "a" to all elements of "dst". This intrinsic may generate the "vpbroadcastq" instruction.
//
//go:linkname Mm256Set1Epi64X Mm256Set1Epi64X
//go:noescape
func Mm256Set1Epi64X(r *x86.M256I, v0 *x86.Longlong)

// Return vector of type __m256d with all elements set to zero.
//
//go:linkname Mm256SetzeroPd Mm256SetzeroPd
//go:noescape
func Mm256SetzeroPd(r *x86.M256D)

// Return vector of type __m256 with all elements set to zero.
//
//go:linkname Mm256SetzeroPs Mm256SetzeroPs
//go:noescape
func Mm256SetzeroPs(r *x86.M256)

// Return vector of type __m256i with all elements set to zero.
//
//go:linkname Mm256SetzeroSi256 Mm256SetzeroSi256
//go:noescape
func Mm256SetzeroSi256(r *x86.M256I)

// Cast vector of type __m256d to type __m256. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
//
//go:linkname Mm256CastpdPs Mm256CastpdPs
//go:noescape
func Mm256CastpdPs(r *x86.M256, v0 *x86.M256D)

// Cast vector of type __m256d to type __m256i. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
//
//go:linkname Mm256CastpdSi256 Mm256CastpdSi256
//go:noescape
func Mm256CastpdSi256(r *x86.M256I, v0 *x86.M256D)

// Cast vector of type __m256 to type __m256d. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
//
//go:linkname Mm256CastpsPd Mm256CastpsPd
//go:noescape
func Mm256CastpsPd(r *x86.M256D, v0 *x86.M256)

// Cast vector of type __m256 to type __m256i. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
//
//go:linkname Mm256CastpsSi256 Mm256CastpsSi256
//go:noescape
func Mm256CastpsSi256(r *x86.M256I, v0 *x86.M256)

// Cast vector of type __m256i to type __m256. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
//
//go:linkname Mm256Castsi256Ps Mm256Castsi256Ps
//go:noescape
func Mm256Castsi256Ps(r *x86.M256, v0 *x86.M256I)

// Cast vector of type __m256i to type __m256d. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
//
//go:linkname Mm256Castsi256Pd Mm256Castsi256Pd
//go:noescape
func Mm256Castsi256Pd(r *x86.M256D, v0 *x86.M256I)

// Cast vector of type __m256d to type __m128d. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
//
//go:linkname Mm256Castpd256Pd128 Mm256Castpd256Pd128
//go:noescape
func Mm256Castpd256Pd128(r *x86.M128D, v0 *x86.M256D)

// Cast vector of type __m256 to type __m128. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
//
//go:linkname Mm256Castps256Ps128 Mm256Castps256Ps128
//go:noescape
func Mm256Castps256Ps128(r *x86.M128, v0 *x86.M256)

// Cast vector of type __m256i to type __m128i. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
//
//go:linkname Mm256Castsi256Si128 Mm256Castsi256Si128
//go:noescape
func Mm256Castsi256Si128(r *x86.M128I, v0 *x86.M256I)

// Cast vector of type __m128d to type __m256d; the upper 128 bits of the result are undefined. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
//
//go:linkname Mm256Castpd128Pd256 Mm256Castpd128Pd256
//go:noescape
func Mm256Castpd128Pd256(r *x86.M256D, v0 *x86.M128D)

// Cast vector of type __m128 to type __m256; the upper 128 bits of the result are undefined. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
//
//go:linkname Mm256Castps128Ps256 Mm256Castps128Ps256
//go:noescape
func Mm256Castps128Ps256(r *x86.M256, v0 *x86.M128)

// Cast vector of type __m128i to type __m256i; the upper 128 bits of the result are undefined. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
//
//go:linkname Mm256Castsi128Si256 Mm256Castsi128Si256
//go:noescape
func Mm256Castsi128Si256(r *x86.M256I, v0 *x86.M128I)

// Cast vector of type __m128d to type __m256d; the upper 128 bits of the result are zeroed. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
//
//go:linkname Mm256Zextpd128Pd256 Mm256Zextpd128Pd256
//go:noescape
func Mm256Zextpd128Pd256(r *x86.M256D, v0 *x86.M128D)

// Cast vector of type __m128 to type __m256; the upper 128 bits of the result are zeroed. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
//
//go:linkname Mm256Zextps128Ps256 Mm256Zextps128Ps256
//go:noescape
func Mm256Zextps128Ps256(r *x86.M256, v0 *x86.M128)

// Cast vector of type __m128i to type __m256i; the upper 128 bits of the result are zeroed. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
//
//go:linkname Mm256Zextsi128Si256 Mm256Zextsi128Si256
//go:noescape
func Mm256Zextsi128Si256(r *x86.M256I, v0 *x86.M128I)
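
/*
The casts above reinterpret bits; they do not convert values. The sketch
below illustrates that in plain Go for the pd-to-si256 cast, together with
the zero-extension behaviour of the Zext variants. castPdSi256 and
zextPd128Pd256 are hypothetical helper names, and arrays stand in for vector
types. The zero-latency remark describes the underlying intrinsic; a wrapper
that loads and stores through pointers, as the generated C shims in this
repository appear to do, may still cost a call plus memory traffic.

```go
package main

import (
	"fmt"
	"math"
)

// castPdSi256 models a bit-for-bit reinterpretation of four doubles
// as four 64-bit integers. No numeric conversion takes place.
func castPdSi256(v [4]float64) [4]uint64 {
	var out [4]uint64
	for i, x := range v {
		out[i] = math.Float64bits(x)
	}
	return out
}

// zextPd128Pd256 models widening a 128-bit vector to 256 bits with the
// upper 128 bits zeroed. The plain cast variants leave them undefined.
func zextPd128Pd256(v [2]float64) [4]float64 {
	return [4]float64{v[0], v[1], 0, 0}
}

func main() {
	bits := castPdSi256([4]float64{1, 0, -2, 0.5})
	fmt.Printf("%#x\n", bits[0]) // 0x3ff0000000000000, not 1
	fmt.Println(zextPd128Pd256([2]float64{7, 8}))
}
```
*/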

// Set packed __m256 vector "dst" with the supplied values.
//
//go:linkname Mm256SetM128 Mm256SetM128
//go:noescape
func Mm256SetM128(r *x86.M256, v0 *x86.M128, v1 *x86.M128)

// Set packed __m256d vector "dst" with the supplied values.
//
//go:linkname Mm256SetM128D Mm256SetM128D
//go:noescape
func Mm256SetM128D(r *x86.M256D, v0 *x86.M128D, v1 *x86.M128D)

// Set packed __m256i vector "dst" with the supplied values.
//
//go:linkname Mm256SetM128I Mm256SetM128I
//go:noescape
func Mm256SetM128I(r *x86.M256I, v0 *x86.M128I, v1 *x86.M128I)

// Set packed __m256 vector "dst" with the supplied values in reverse order.
//
//go:linkname Mm256SetrM128 Mm256SetrM128
//go:noescape
func Mm256SetrM128(r *x86.M256, v0 *x86.M128, v1 *x86.M128)

// Set packed __m256d vector "dst" with the supplied values in reverse order.
//
//go:linkname Mm256SetrM128D Mm256SetrM128D
//go:noescape
func Mm256SetrM128D(r *x86.M256D, v0 *x86.M128D, v1 *x86.M128D)

// Set packed __m256i vector "dst" with the supplied values in reverse order.
//
//go:linkname Mm256SetrM128I Mm256SetrM128I
//go:noescape
func Mm256SetrM128I(r *x86.M256I, v0 *x86.M128I, v1 *x86.M128I)


================================================
FILE: x86/avx2/functions.c
================================================
#include <immintrin.h>

void Mm256AbsEpi8(__m256i* r, __m256i* v0) { *r = _mm256_abs_epi8(*v0); }
void Mm256AbsEpi16(__m256i* r, __m256i* v0) { *r = _mm256_abs_epi16(*v0); }
void Mm256AbsEpi32(__m256i* r, __m256i* v0) { *r = _mm256_abs_epi32(*v0); }
void Mm256PacksEpi16(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_packs_epi16(*v0, *v1); }
void Mm256PacksEpi32(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_packs_epi32(*v0, *v1); }
void Mm256PackusEpi16(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_packus_epi16(*v0, *v1); }
void Mm256PackusEpi32(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_packus_epi32(*v0, *v1); }
void Mm256AddEpi8(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_add_epi8(*v0, *v1); }
void Mm256AddEpi16(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_add_epi16(*v0, *v1); }
void Mm256AddEpi32(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_add_epi32(*v0, *v1); }
void Mm256AddEpi64(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_add_epi64(*v0, *v1); }
void Mm256AddsEpi8(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_adds_epi8(*v0, *v1); }
void Mm256AddsEpi16(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_adds_epi16(*v0, *v1); }
void Mm256AddsEpu8(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_adds_epu8(*v0, *v1); }
void Mm256AddsEpu16(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_adds_epu16(*v0, *v1); }
void Mm256AndSi256(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_and_si256(*v0, *v1); }
void Mm256AndnotSi256(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_andnot_si256(*v0, *v1); }
void Mm256AvgEpu8(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_avg_epu8(*v0, *v1); }
void Mm256AvgEpu16(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_avg_epu16(*v0, *v1); }
void Mm256BlendvEpi8(__m256i* r, __m256i* v0, __m256i* v1, __m256i* v2) { *r = _mm256_blendv_epi8(*v0, *v1, *v2); }
void Mm256CmpeqEpi8(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_cmpeq_epi8(*v0, *v1); }
void Mm256CmpeqEpi16(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_cmpeq_epi16(*v0, *v1); }
void Mm256CmpeqEpi32(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_cmpeq_epi32(*v0, *v1); }
void Mm256CmpeqEpi64(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_cmpeq_epi64(*v0, *v1); }
void Mm256CmpgtEpi8(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_cmpgt_epi8(*v0, *v1); }
void Mm256CmpgtEpi16(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_cmpgt_epi16(*v0, *v1); }
void Mm256CmpgtEpi32(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_cmpgt_epi32(*v0, *v1); }
void Mm256CmpgtEpi64(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_cmpgt_epi64(*v0, *v1); }
void Mm256HaddEpi16(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_hadd_epi16(*v0, *v1); }
void Mm256HaddEpi32(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_hadd_epi32(*v0, *v1); }
void Mm256HaddsEpi16(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_hadds_epi16(*v0, *v1); }
void Mm256HsubEpi16(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_hsub_epi16(*v0, *v1); }
void Mm256HsubEpi32(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_hsub_epi32(*v0, *v1); }
void Mm256HsubsEpi16(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_hsubs_epi16(*v0, *v1); }
void Mm256MaddubsEpi16(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_maddubs_epi16(*v0, *v1); }
void Mm256MaddEpi16(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_madd_epi16(*v0, *v1); }
void Mm256MaxEpi8(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_max_epi8(*v0, *v1); }
void Mm256MaxEpi16(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_max_epi16(*v0, *v1); }
void Mm256MaxEpi32(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_max_epi32(*v0, *v1); }
void Mm256MaxEpu8(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_max_epu8(*v0, *v1); }
void Mm256MaxEpu16(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_max_epu16(*v0, *v1); }
void Mm256MaxEpu32(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_max_epu32(*v0, *v1); }
void Mm256MinEpi8(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_min_epi8(*v0, *v1); }
void Mm256MinEpi16(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_min_epi16(*v0, *v1); }
void Mm256MinEpi32(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_min_epi32(*v0, *v1); }
void Mm256MinEpu8(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_min_epu8(*v0, *v1); }
void Mm256MinEpu16(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_min_epu16(*v0, *v1); }
void Mm256MinEpu32(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_min_epu32(*v0, *v1); }
void Mm256MovemaskEpi8(int* r, __m256i* v0) { *r = _mm256_movemask_epi8(*v0); }
void Mm256Cvtepi8Epi16(__m256i* r, __m128i* v0) { *r = _mm256_cvtepi8_epi16(*v0); }
void Mm256Cvtepi8Epi32(__m256i* r, __m128i* v0) { *r = _mm256_cvtepi8_epi32(*v0); }
void Mm256Cvtepi8Epi64(__m256i* r, __m128i* v0) { *r = _mm256_cvtepi8_epi64(*v0); }
void Mm256Cvtepi16Epi32(__m256i* r, __m128i* v0) { *r = _mm256_cvtepi16_epi32(*v0); }
void Mm256Cvtepi16Epi64(__m256i* r, __m128i* v0) { *r = _mm256_cvtepi16_epi64(*v0); }
void Mm256Cvtepi32Epi64(__m256i* r, __m128i* v0) { *r = _mm256_cvtepi32_epi64(*v0); }
void Mm256Cvtepu8Epi16(__m256i* r, __m128i* v0) { *r = _mm256_cvtepu8_epi16(*v0); }
void Mm256Cvtepu8Epi32(__m256i* r, __m128i* v0) { *r = _mm256_cvtepu8_epi32(*v0); }
void Mm256Cvtepu8Epi64(__m256i* r, __m128i* v0) { *r = _mm256_cvtepu8_epi64(*v0); }
void Mm256Cvtepu16Epi32(__m256i* r, __m128i* v0) { *r = _mm256_cvtepu16_epi32(*v0); }
void Mm256Cvtepu16Epi64(__m256i* r, __m128i* v0) { *r = _mm256_cvtepu16_epi64(*v0); }
void Mm256Cvtepu32Epi64(__m256i* r, __m128i* v0) { *r = _mm256_cvtepu32_epi64(*v0); }
void Mm256MulEpi32(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_mul_epi32(*v0, *v1); }
void Mm256MulhrsEpi16(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_mulhrs_epi16(*v0, *v1); }
void Mm256MulhiEpu16(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_mulhi_epu16(*v0, *v1); }
void Mm256MulhiEpi16(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_mulhi_epi16(*v0, *v1); }
void Mm256MulloEpi16(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_mullo_epi16(*v0, *v1); }
void Mm256MulloEpi32(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_mullo_epi32(*v0, *v1); }
void Mm256MulEpu32(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_mul_epu32(*v0, *v1); }
void Mm256OrSi256(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_or_si256(*v0, *v1); }
void Mm256SadEpu8(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_sad_epu8(*v0, *v1); }
void Mm256ShuffleEpi8(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_shuffle_epi8(*v0, *v1); }
void Mm256SignEpi8(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_sign_epi8(*v0, *v1); }
void Mm256SignEpi16(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_sign_epi16(*v0, *v1); }
void Mm256SignEpi32(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_sign_epi32(*v0, *v1); }
void Mm256SlliEpi16(__m256i* r, __m256i* v0, int* v1) { *r = _mm256_slli_epi16(*v0, *v1); }
void Mm256SllEpi16(__m256i* r, __m256i* v0, __m128i* v1) { *r = _mm256_sll_epi16(*v0, *v1); }
void Mm256SlliEpi32(__m256i* r, __m256i* v0, int* v1) { *r = _mm256_slli_epi32(*v0, *v1); }
void Mm256SllEpi32(__m256i* r, __m256i* v0, __m128i* v1) { *r = _mm256_sll_epi32(*v0, *v1); }
void Mm256SlliEpi64(__m256i* r, __m256i* v0, int* v1) { *r = _mm256_slli_epi64(*v0, *v1); }
void Mm256SllEpi64(__m256i* r, __m256i* v0, __m128i* v1) { *r = _mm256_sll_epi64(*v0, *v1); }
void Mm256SraiEpi16(__m256i* r, __m256i* v0, int* v1) { *r = _mm256_srai_epi16(*v0, *v1); }
void Mm256SraEpi16(__m256i* r, __m256i* v0, __m128i* v1) { *r = _mm256_sra_epi16(*v0, *v1); }
void Mm256SraiEpi32(__m256i* r, __m256i* v0, int* v1) { *r = _mm256_srai_epi32(*v0, *v1); }
void Mm256SraEpi32(__m256i* r, __m256i* v0, __m128i* v1) { *r = _mm256_sra_epi32(*v0, *v1); }
void Mm256SrliEpi16(__m256i* r, __m256i* v0, int* v1) { *r = _mm256_srli_epi16(*v0, *v1); }
void Mm256SrlEpi16(__m256i* r, __m256i* v0, __m128i* v1) { *r = _mm256_srl_epi16(*v0, *v1); }
void Mm256SrliEpi32(__m256i* r, __m256i* v0, int* v1) { *r = _mm256_srli_epi32(*v0, *v1); }
void Mm256SrlEpi32(__m256i* r, __m256i* v0, __m128i* v1) { *r = _mm256_srl_epi32(*v0, *v1); }
void Mm256SrliEpi64(__m256i* r, __m256i* v0, int* v1) { *r = _mm256_srli_epi64(*v0, *v1); }
void Mm256SrlEpi64(__m256i* r, __m256i* v0, __m128i* v1) { *r = _mm256_srl_epi64(*v0, *v1); }
void Mm256SubEpi8(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_sub_epi8(*v0, *v1); }
void Mm256SubEpi16(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_sub_epi16(*v0, *v1); }
void Mm256SubEpi32(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_sub_epi32(*v0, *v1); }
void Mm256SubEpi64(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_sub_epi64(*v0, *v1); }
void Mm256SubsEpi8(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_subs_epi8(*v0, *v1); }
void Mm256SubsEpi16(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_subs_epi16(*v0, *v1); }
void Mm256SubsEpu8(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_subs_epu8(*v0, *v1); }
void Mm256SubsEpu16(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_subs_epu16(*v0, *v1); }
void Mm256UnpackhiEpi8(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_unpackhi_epi8(*v0, *v1); }
void Mm256UnpackhiEpi16(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_unpackhi_epi16(*v0, *v1); }
void Mm256UnpackhiEpi32(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_unpackhi_epi32(*v0, *v1); }
void Mm256UnpackhiEpi64(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_unpackhi_epi64(*v0, *v1); }
void Mm256UnpackloEpi8(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_unpacklo_epi8(*v0, *v1); }
void Mm256UnpackloEpi16(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_unpacklo_epi16(*v0, *v1); }
void Mm256UnpackloEpi32(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_unpacklo_epi32(*v0, *v1); }
void Mm256UnpackloEpi64(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_unpacklo_epi64(*v0, *v1); }
void Mm256XorSi256(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_xor_si256(*v0, *v1); }
void MmBroadcastssPs(__m128* r, __m128* v0) { *r = _mm_broadcastss_ps(*v0); }
void MmBroadcastsdPd(__m128d* r, __m128d* v0) { *r = _mm_broadcastsd_pd(*v0); }
void Mm256BroadcastssPs(__m256* r, __m128* v0) { *r = _mm256_broadcastss_ps(*v0); }
void Mm256BroadcastsdPd(__m256d* r, __m128d* v0) { *r = _mm256_broadcastsd_pd(*v0); }
void Mm256Broadcastsi128Si256(__m256i* r, __m128i* v0) { *r = _mm256_broadcastsi128_si256(*v0); }
void Mm256BroadcastbEpi8(__m256i* r, __m128i* v0) { *r = _mm256_broadcastb_epi8(*v0); }
void Mm256BroadcastwEpi16(__m256i* r, __m128i* v0) { *r = _mm256_broadcastw_epi16(*v0); }
void Mm256BroadcastdEpi32(__m256i* r, __m128i* v0) { *r = _mm256_broadcastd_epi32(*v0); }
void Mm256BroadcastqEpi64(__m256i* r, __m128i* v0) { *r = _mm256_broadcastq_epi64(*v0); }
void MmBroadcastbEpi8(__m128i* r, __m128i* v0) { *r = _mm_broadcastb_epi8(*v0); }
void MmBroadcastwEpi16(__m128i* r, __m128i* v0) { *r = _mm_broadcastw_epi16(*v0); }
void MmBroadcastdEpi32(__m128i* r, __m128i* v0) { *r = _mm_broadcastd_epi32(*v0); }
void MmBroadcastqEpi64(__m128i* r, __m128i* v0) { *r = _mm_broadcastq_epi64(*v0); }
void Mm256Permutevar8X32Epi32(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_permutevar8x32_epi32(*v0, *v1); }
void Mm256Permutevar8X32Ps(__m256* r, __m256* v0, __m256i* v1) { *r = _mm256_permutevar8x32_ps(*v0, *v1); }
void Mm256SllvEpi32(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_sllv_epi32(*v0, *v1); }
void MmSllvEpi32(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_sllv_epi32(*v0, *v1); }
void Mm256SllvEpi64(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_sllv_epi64(*v0, *v1); }
void MmSllvEpi64(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_sllv_epi64(*v0, *v1); }
void Mm256SravEpi32(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_srav_epi32(*v0, *v1); }
void MmSravEpi32(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_srav_epi32(*v0, *v1); }
void Mm256SrlvEpi32(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_srlv_epi32(*v0, *v1); }
void MmSrlvEpi32(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_srlv_epi32(*v0, *v1); }
void Mm256SrlvEpi64(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_srlv_epi64(*v0, *v1); }
void MmSrlvEpi64(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_srlv_epi64(*v0, *v1); }


================================================
FILE: x86/avx2/functions.go
================================================
package avx2

import (
	"github.com/alivanz/go-simd/x86"
)

/*
#cgo CFLAGS: -mavx2
#include <immintrin.h>
*/
import "C"

// Compute the absolute value of packed signed 8-bit integers in "a", and store the unsigned results in "dst".
//
//go:linkname Mm256AbsEpi8 Mm256AbsEpi8
//go:noescape
func Mm256AbsEpi8(r *x86.M256I, v0 *x86.M256I)

// Compute the absolute value of packed signed 16-bit integers in "a", and store the unsigned results in "dst".
//
//go:linkname Mm256AbsEpi16 Mm256AbsEpi16
//go:noescape
func Mm256AbsEpi16(r *x86.M256I, v0 *x86.M256I)

// Compute the absolute value of packed signed 32-bit integers in "a", and store the unsigned results in "dst".
//
//go:linkname Mm256AbsEpi32 Mm256AbsEpi32
//go:noescape
func Mm256AbsEpi32(r *x86.M256I, v0 *x86.M256I)

// Convert packed signed 16-bit integers from "a" and "b" to packed 8-bit integers using signed saturation, and store the results in "dst".
//
//go:linkname Mm256PacksEpi16 Mm256PacksEpi16
//go:noescape
func Mm256PacksEpi16(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)

// Convert packed signed 32-bit integers from "a" and "b" to packed 16-bit integers using signed saturation, and store the results in "dst".
//
//go:linkname Mm256PacksEpi32 Mm256PacksEpi32
//go:noescape
func Mm256PacksEpi32(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)

// Convert packed signed 16-bit integers from "a" and "b" to packed 8-bit integers using unsigned saturation, and store the results in "dst".
//
//go:linkname Mm256PackusEpi16 Mm256PackusEpi16
//go:noescape
func Mm256PackusEpi16(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)

// Convert packed signed 32-bit integers from "a" and "b" to packed 16-bit integers using unsigned saturation, and store the results in "dst".
//
//go:linkname Mm256PackusEpi32 Mm256PackusEpi32
//go:noescape
func Mm256PackusEpi32(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)
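
/*
The pack descriptions above do not spell out the element layout. For the
256-bit forms the packing happens independently in each 128-bit lane, so the
results from "a" and "b" interleave lane by lane. The sketch below models
_mm256_packs_epi16 in plain Go; satInt8 and packsEpi16 are hypothetical names
used only for illustration.

```go
package main

import "fmt"

// satInt8 clamps a 16-bit value to the int8 range (signed saturation).
func satInt8(x int16) int8 {
	if x > 127 {
		return 127
	}
	if x < -128 {
		return -128
	}
	return int8(x)
}

// packsEpi16 models the lane layout: within each 128-bit lane, eight
// saturated values from a are followed by eight saturated values from b.
func packsEpi16(a, b [16]int16) [32]int8 {
	var dst [32]int8
	for lane := 0; lane < 2; lane++ {
		for i := 0; i < 8; i++ {
			dst[lane*16+i] = satInt8(a[lane*8+i])
			dst[lane*16+8+i] = satInt8(b[lane*8+i])
		}
	}
	return dst
}

func main() {
	var a, b [16]int16
	for i := range a {
		a[i] = 300  // saturates to 127
		b[i] = -300 // saturates to -128
	}
	dst := packsEpi16(a, b)
	fmt.Println(dst[0], dst[8], dst[16], dst[24]) // 127 -128 127 -128
}
```

The unsigned variants follow the same layout but clamp to the unsigned range
of the narrower type instead.
*/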

// Add packed 8-bit integers in "a" and "b", and store the results in "dst".
//
//go:linkname Mm256AddEpi8 Mm256AddEpi8
//go:noescape
func Mm256AddEpi8(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)

// Add packed 16-bit integers in "a" and "b", and store the results in "dst".
//
//go:linkname Mm256AddEpi16 Mm256AddEpi16
//go:noescape
func Mm256AddEpi16(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)

// Add packed 32-bit integers in "a" and "b", and store the results in "dst".
//
//go:linkname Mm256AddEpi32 Mm256AddEpi32
//go:noescape
func Mm256AddEpi32(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)

// Add packed 64-bit integers in "a" and "b", and store the results in "dst".
//
//go:linkname Mm256AddEpi64 Mm256AddEpi64
//go:noescape
func Mm256AddEpi64(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)

// Add packed 8-bit integers in "a" and "b" using saturation, and store the results in "dst".
//
//go:linkname Mm256AddsEpi8 Mm256AddsEpi8
//go:noescape
func Mm256AddsEpi8(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)

// Add packed 16-bit integers in "a" and "b" using saturation, and store the results in "dst".
//
//go:linkname Mm256AddsEpi16 Mm256AddsEpi16
//go:noescape
func Mm256AddsEpi16(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)

// Add packed unsigned 8-bit integers in "a" and "b" using saturation, and store the results in "dst".
//
//go:linkname Mm256AddsEpu8 Mm256AddsEpu8
//go:noescape
func Mm256AddsEpu8(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)

// Add packed unsigned 16-bit integers in "a" and "b" using saturation, and store the results in "dst".
//
//go:linkname Mm256AddsEpu16 Mm256AddsEpu16
//go:noescape
func Mm256AddsEpu16(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)
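
/*
To make "using saturation" concrete, the sketch below compares wrapping and
saturating addition for a single 8-bit element in plain Go. addsInt8 and
addsUint8 are hypothetical scalar models of one lane of the Adds wrappers
above, not library functions.

```go
package main

import "fmt"

// addsInt8 adds two signed bytes and clamps the sum to [-128, 127].
func addsInt8(a, b int8) int8 {
	s := int16(a) + int16(b)
	if s > 127 {
		return 127
	}
	if s < -128 {
		return -128
	}
	return int8(s)
}

// addsUint8 adds two unsigned bytes and clamps the sum to [0, 255].
func addsUint8(a, b uint8) uint8 {
	s := uint16(a) + uint16(b)
	if s > 255 {
		return 255
	}
	return uint8(s)
}

func main() {
	x, y := int8(100), int8(100)
	fmt.Println(x+y, addsInt8(x, y)) // -56 127

	u, v := uint8(200), uint8(100)
	fmt.Println(u+v, addsUint8(u, v)) // 44 255
}
```

The plain Add wrappers correspond to the wrapping results in the first
column; the Adds wrappers correspond to the clamped results in the second.
*/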

// Compute the bitwise AND of 256 bits (representing integer data) in "a" and "b", and store the result in "dst".
//
//go:linkname Mm256AndSi256 Mm256AndSi256
//go:noescape
func Mm256AndSi256(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)

// Compute the bitwise NOT of 256 bits (representing integer data) in "a" and then AND with "b", and store the result in "dst".
//
//go:linkname Mm256AndnotSi256 Mm256AndnotSi256
//go:noescape
func Mm256AndnotSi256(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)

// Average packed unsigned 8-bit integers in "a" and "b", and store the results in "dst".
//
//go:linkname Mm256AvgEpu8 Mm256AvgEpu8
//go:noescape
func Mm256AvgEpu8(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)

// Average packed unsigned 16-bit integers in "a" and "b", and store the results in "dst".
//
//go:linkname Mm256AvgEpu16 Mm256AvgEpu16
//go:noescape
func Mm256AvgEpu16(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)

// Blend packed 8-bit integers from "a" and "b" using "mask", and store the results in "dst".
//
//go:linkname Mm256BlendvEpi8 Mm256BlendvEpi8
//go:noescape
func Mm256BlendvEpi8(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I, v2 *x86.M256I)

// Compare packed 8-bit integers in "a" and "b" for equality, and store the results in "dst".
//
//go:linkname Mm256CmpeqEpi8 Mm256CmpeqEpi8
//go:noescape
func Mm256CmpeqEpi8(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)

// Compare packed 16-bit integers in "a" and "b" for equality, and store the results in "dst".
//
//go:linkname Mm256CmpeqEpi16 Mm256CmpeqEpi16
//go:noescape
func Mm256CmpeqEpi16(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)

// Compare packed 32-bit integers in "a" and "b" for equality, and store the results in "dst".
//
//go:linkname Mm256CmpeqEpi32 Mm256CmpeqEpi32
//go:noescape
func Mm256CmpeqEpi32(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)

// Compare packed 64-bit integers in "a" and "b" for equality, and store the results in "dst".
//
//go:linkname Mm256CmpeqEpi64 Mm256CmpeqEpi64
//go:noescape
func Mm256CmpeqEpi64(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)

// Compare packed signed 8-bit integers in "a" and "b" for greater-than, and store the results in "dst".
//
//go:linkname Mm256CmpgtEpi8 Mm256CmpgtEpi8
//go:noescape
func Mm256CmpgtEpi8(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)

// Compare packed signed 16-bit integers in "a" and "b" for greater-than, and store the results in "dst".
//
//go:linkname Mm256CmpgtEpi16 Mm256CmpgtEpi16
//go:noescape
func Mm256CmpgtEpi16(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)

// Compare packed signed 32-bit integers in "a" and "b" for greater-than, and store the results in "dst".
//
//go:linkname Mm256CmpgtEpi32 Mm256CmpgtEpi32
//go:noescape
func Mm256CmpgtEpi32(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)

// Compare packed signed 64-bit integers in "a" and "b" for greater-than, and store the results in "dst".
//
//go:linkname Mm256CmpgtEpi64 Mm256CmpgtEpi64
//go:noescape
func Mm256CmpgtEpi64(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)

// Horizontally add adjacent pairs of 16-bit integers in "a" and "b", and pack the signed 16-bit results in "dst".
//
//go:linkname Mm256HaddEpi16 Mm256HaddEpi16
//go:noescape
func Mm256HaddEpi16(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)

// Horizontally add adjacent pairs of 32-bit integers in "a" and "b", and pack the signed 32-bit results in "dst".
//
//go:linkname Mm256HaddEpi32 Mm256HaddEpi32
//go:noescape
func Mm256HaddEpi32(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)

// Horizontally add adjacent pairs of signed 16-bit integers in "a" and "b" using saturation, and pack the signed 16-bit results in "dst".
//
//go:linkname Mm256HaddsEpi16 Mm256HaddsEpi16
//go:noescape
func Mm256HaddsEpi16(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)

// Horizontally subtract adjacent pairs of 16-bit integers in "a" and "b", and pack the signed 16-bit results in "dst".
//
//go:linkname Mm256HsubEpi16 Mm256HsubEpi16
//go:noescape
func Mm256HsubEpi16(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)

// Horizontally subtract adjacent pairs of 32-bit integers in "a" and "b", and pack the signed 32-bit results in "dst".
//
//go:linkname Mm256HsubEpi32 Mm256HsubEpi32
//go:noescape
func Mm256HsubEpi32(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)

// Horizontally subtract adjacent pairs of signed 16-bit integers in "a" and "b" using saturation, and pack the signed 16-bit results in "dst".
//
//go:linkname Mm256HsubsEpi16 Mm256HsubsEpi16
//go:noescape
func Mm256HsubsEpi16(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)

// Vertically multiply each unsigned 8-bit integer from "a" with the corresponding signed 8-bit integer from "b", producing intermediate signed 16-bit integers. Horizontally add adjacent pairs of intermediate signed 16-bit integers, and pack the saturated results in "dst".
//
//go:linkname Mm256MaddubsEpi16 Mm256MaddubsEpi16
//go:noescape
func Mm256MaddubsEpi16(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)

// Multiply packed signed 16-bit integers in "a" and "b", producing intermediate signed 32-bit integers. Horizontally add adjacent pairs of intermediate 32-bit integers, and pack the results in "dst".
//
//go:linkname Mm256MaddEpi16 Mm256MaddEpi16
//go:noescape
func Mm256MaddEpi16(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)

// Compare packed signed 8-bit integers in "a" and "b", and store packed maximum values in "dst".
//
//go:linkname Mm256MaxEpi8 Mm256MaxEpi8
//go:noescape
func Mm256MaxEpi8(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)

// Compare packed signed 16-bit integers in "a" and "b", and store packed maximum values in "dst".
//
//go:linkname Mm256MaxEpi16 Mm256MaxEpi16
//go:noescape
func Mm256MaxEpi16(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)

// Compare packed signed 32-bit integers in "a" and "b", and store packed maximum values in "dst".
//
//go:linkname Mm256MaxEpi32 Mm256MaxEpi32
//go:noescape
func Mm256MaxEpi32(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)

// Compare packed unsigned 8-bit integers in "a" and "b", and store packed maximum values in "dst".
//
//go:linkname Mm256MaxEpu8 Mm256MaxEpu8
//go:noescape
func Mm256MaxEpu8(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)

// Compare packed unsigned 16-bit integers in "a" and "b", and store packed maximum values in "dst".
//
//go:linkname Mm256MaxEpu16 Mm256MaxEpu16
//go:noescape
func Mm256MaxEpu16(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)

// Compare packed unsigned 32-bit integers in "a" and "b", and store packed maximum values in "dst".
//
//go:linkname Mm256MaxEpu32 Mm256MaxEpu32
//go:noescape
func Mm256MaxEpu32(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)

// Compare packed signed 8-bit integers in "a" and "b", and store packed minimum values in "dst".
//
//go:linkname Mm256MinEpi8 Mm256MinEpi8
//go:noescape
func Mm256MinEpi8(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)

// Compare packed signed 16-bit integers in "a" and "b", and store packed minimum values in "dst".
//
//go:linkname Mm256MinEpi16 Mm256MinEpi16
//go:noescape
func Mm256MinEpi16(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)

// Compare packed signed 32-bit integers in "a" and "b", and store packed minimum values in "dst".
//
//go:linkname Mm256MinEpi32 Mm256MinEpi32
//go:noescape
func Mm256MinEpi32(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)

// Compare packed unsigned 8-bit integers in "a" and "b", and store packed minimum values in "dst".
//
//go:linkname Mm256MinEpu8 Mm256MinEpu8
//go:noescape
func Mm256MinEpu8(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)

// Compare packed unsigned 16-bit integers in "a" and "b", and store packed minimum values in "dst".
//
//go:linkname Mm256MinEpu16 Mm256MinEpu16
//go:noescape
func Mm256MinEpu16(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)

// Compare packed unsigned 32-bit integers in "a" and "b", and store packed minimum values in "dst".
//
//go:linkname Mm256MinEpu32 Mm256MinEpu32
//go:noescape
func Mm256MinEpu32(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)

// Create mask from the most significant bit of each 8-bit element in "a", and store the result in "dst".
//
//go:linkname Mm256MovemaskEpi8 Mm256MovemaskEpi8
//go:noescape
func Mm256MovemaskEpi8(r *x86.Int, v0 *x86.M256I)

// Sign extend packed 8-bit integers in "a" to packed 16-bit integers, and store the results in "dst".
//
//go:linkname Mm256Cvtepi8Epi16 Mm256Cvtepi8Epi16
//go:noescape
func Mm256Cvtepi8Epi16(r *x86.M256I, v0 *x86.M128I)

// Sign extend packed 8-bit integers in "a" to packed 32-bit integers, and store the results in "dst".
//
//go:linkname Mm256Cvtepi8Epi32 Mm256Cvtepi8Epi32
//go:noescape
func Mm256Cvtepi8Epi32(r *x86.M256I, v0 *x86.M128I)

// Sign extend packed 8-bit integers in the low 8 bytes of "a" to packed 64-bit integers, and store the results in "dst".
//
//go:linkname Mm256Cvtepi8Epi64 Mm256Cvtepi8Epi64
//go:noescape
func Mm256Cvtepi8Epi64(r *x86.M256I, v0 *x86.M128I)

// Sign extend packed 16-bit integers in "a" to packed 32-bit integers, and store the results in "dst".
//
//go:linkname Mm256Cvtepi16Epi32 Mm256Cvtepi16Epi32
//go:noescape
func Mm256Cvtepi16Epi32(r *x86.M256I, v0 *x86.M128I)

// Sign extend packed 16-bit integers in "a" to packed 64-bit integers, and store the results in "dst".
//
//go:linkname Mm256Cvtepi16Epi64 Mm256Cvtepi16Epi64
//go:noescape
func Mm256Cvtepi16Epi64(r *x86.M256I, v0 *x86.M128I)

// Sign extend packed 32-bit integers in "a" to packed 64-bit integers, and store the results in "dst".
//
//go:linkname Mm256Cvtepi32Epi64 Mm256Cvtepi32Epi64
//go:noescape
func Mm256Cvtepi32Epi64(r *x86.M256I, v0 *x86.M128I)

// Zero extend packed unsigned 8-bit integers in "a" to packed 16-bit integers, and store the results in "dst".
//
//go:linkname Mm256Cvtepu8Epi16 Mm256Cvtepu8Epi16
//go:noescape
func Mm256Cvtepu8Epi16(r *x86.M256I, v0 *x86.M128I)

// Zero extend packed unsigned 8-bit integers in "a" to packed 32-bit integers, and store the results in "dst".
//
//go:linkname Mm256Cvtepu8Epi32 Mm256Cvtepu8Epi32
//go:noescape
func Mm256Cvtepu8Epi32(r *x86.M256I, v0 *x86.M128I)

// Zero extend packed unsigned 8-bit integers in the low 8 bytes of "a" to packed 64-bit integers, and store the results in "dst".
//
//go:linkname Mm256Cvtepu8Epi64 Mm256Cvtepu8Epi64
//go:noescape
func Mm256Cvtepu8Epi64(r *x86.M256I, v0 *x86.M128I)

// Zero extend packed unsigned 16-bit integers in "a" to packed 32-bit integers, and store the results in "dst".
//
//go:linkname Mm256Cvtepu16Epi32 Mm256Cvtepu16Epi32
//go:noescape
func Mm256Cvtepu16Epi32(r *x86.M256I, v0 *x86.M128I)

// Zero extend packed unsigned 16-bit integers in "a" to packed 64-bit integers, and store the results in "dst".
//
//go:linkname Mm256Cvtepu16Epi64 Mm256Cvtepu16Epi64
//go:noescape
func Mm256Cvtepu16Epi64(r *x86.M256I, v0 *x86.M128I)

// Zero extend packed unsigned 32-bit integers in "a" to packed 64-bit integers, and store the results in "dst".
//
//go:linkname Mm256Cvtepu32Epi64 Mm256Cvtepu32Epi64
//go:noescape
func Mm256Cvtepu32Epi64(r *x86.M256I, v0 *x86.M128I)

// Multiply the low signed 32-bit integers from each packed 64-bit element in "a" and "b", and store the signed 64-bit results in "dst".
//
//go:linkname Mm256MulEpi32 Mm256MulEpi32
//go:noescape
func Mm256MulEpi32(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)

// Multiply packed signed 16-bit integers in "a" and "b", producing intermediate signed 32-bit integers. Truncate each intermediate integer to the 18 most significant bits, round by adding 1, and store bits [16:1] to "dst".
//
//go:linkname Mm256MulhrsEpi16 Mm256MulhrsEpi16
//go:noescape
func Mm256MulhrsEpi16(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)

// Multiply the packed unsigned 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in "dst".
//
//go:linkname Mm256MulhiEpu16 Mm256MulhiEpu16
//go:noescape
func Mm256MulhiEpu16(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)

// Multiply the packed signed 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in "dst".
//
//go:linkname Mm256MulhiEpi16 Mm256MulhiEpi16
//go:noescape
func Mm256MulhiEpi16(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)

// Multiply the packed signed 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the low 16 bits of the intermediate integers in "dst".
//
//go:linkname Mm256MulloEpi16 Mm256MulloEpi16
//go:noescape
func Mm256MulloEpi16(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)

// Multiply the packed signed 32-bit integers in "a" and "b", producing intermediate 64-bit integers, and store the low 32 bits of the intermediate integers in "dst".
//
//go:linkname Mm256MulloEpi32 Mm256MulloEpi32
//go:noescape
func Mm256MulloEpi32(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)

// Multiply the low unsigned 32-bit integers from each packed 64-bit element in "a" and "b", and store the unsigned 64-bit results in "dst".
//
//go:linkname Mm256MulEpu32 Mm256MulEpu32
//go:noescape
func Mm256MulEpu32(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)

// Compute the bitwise OR of 256 bits (representing integer data) in "a" and "b", and store the result in "dst".
//
//go:linkname Mm256OrSi256 Mm256OrSi256
//go:noescape
func Mm256OrSi256(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)

// Compute the absolute differences of packed unsigned 8-bit integers in "a" and "b", then horizontally sum each consecutive 8 differences to produce four unsigned 16-bit integers, and pack these unsigned 16-bit integers in the low 16 bits of 64-bit elements in "dst".
//
//go:linkname Mm256SadEpu8 Mm256SadEpu8
//go:noescape
func Mm256SadEpu8(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)

// Shuffle 8-bit integers in "a" within 128-bit lanes according to shuffle control mask in the corresponding 8-bit element of "b", and store the results in "dst".
//
//go:linkname Mm256ShuffleEpi8 Mm256ShuffleEpi8
//go:noescape
func Mm256ShuffleEpi8(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)

// Negate packed signed 8-bit integers in "a" when the corresponding signed 8-bit integer in "b" is negative, and store the results in "dst". Elements in "dst" are zeroed out when the corresponding element in "b" is zero.
//
//go:linkname Mm256SignEpi8 Mm256SignEpi8
//go:noescape
func Mm256SignEpi8(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)

// Negate packed signed 16-bit integers in "a" when the corresponding signed 16-bit integer in "b" is negative, and store the results in "dst". Elements in "dst" are zeroed out when the corresponding element in "b" is zero.
//
//go:linkname Mm256SignEpi16 Mm256SignEpi16
//go:noescape
func Mm256SignEpi16(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)

// Negate packed signed 32-bit integers in "a" when the corresponding signed 32-bit integer in "b" is negative, and store the results in "dst". Elements in "dst" are zeroed out when the corresponding element in "b" is zero.
//
//go:linkname Mm256SignEpi32 Mm256SignEpi32
//go:noescape
func Mm256SignEpi32(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)

// Shift packed 16-bit integers in "a" left by "imm8" while shifting in zeros, and store the results in "dst".
//
//go:linkname Mm256SlliEpi16 Mm256SlliEpi16
//go:noescape
func Mm256SlliEpi16(r *x86.M256I, v0 *x86.M256I, v1 *x86.Int)

// Shift packed 16-bit integers in "a" left by "count" while shifting in zeros, and store the results in "dst".
//
//go:linkname Mm256SllEpi16 Mm256SllEpi16
//go:noescape
func Mm256SllEpi16(r *x86.M256I, v0 *x86.M256I, v1 *x86.M128I)

// Shift packed 32-bit integers in "a" left by "imm8" while shifting in zeros, and store the results in "dst".
//
//go:linkname Mm256SlliEpi32 Mm256SlliEpi32
//go:noescape
func Mm256SlliEpi32(r *x86.M256I, v0 *x86.M256I, v1 *x86.Int)

// Shift packed 32-bit integers in "a" left by "count" while shifting in zeros, and store the results in "dst".
//
//go:linkname Mm256SllEpi32 Mm256SllEpi32
//go:noescape
func Mm256SllEpi32(r *x86.M256I, v0 *x86.M256I, v1 *x86.M128I)

// Shift packed 64-bit integers in "a" left by "imm8" while shifting in zeros, and store the results in "dst".
//
//go:linkname Mm256SlliEpi64 Mm256SlliEpi64
//go:noescape
func Mm256SlliEpi64(r *x86.M256I, v0 *x86.M256I, v1 *x86.Int)

// Shift packed 64-bit integers in "a" left by "count" while shifting in zeros, and store the results in "dst".
//
//go:linkname Mm256SllEpi64 Mm256SllEpi64
//go:noescape
func Mm256SllEpi64(r *x86.M256I, v0 *x86.M256I, v1 *x86.M128I)

// Shift packed 16-bit integers in "a" right by "imm8" while shifting in sign bits, and store the results in "dst".
//
//go:linkname Mm256SraiEpi16 Mm256SraiEpi16
//go:noescape
func Mm256SraiEpi16(r *x86.M256I, v0 *x86.M256I, v1 *x86.Int)

// Shift packed 16-bit integers in "a" right by "count" while shifting in sign bits, and store the results in "dst".
//
//go:linkname Mm256SraEpi16 Mm256SraEpi16
//go:noescape
func Mm256SraEpi16(r *x86.M256I, v0 *x86.M256I, v1 *x86.M128I)

// Shift packed 32-bit integers in "a" right by "imm8" while shifting in sign bits, and store the results in "dst".
//
//go:linkname Mm256SraiEpi32 Mm256SraiEpi32
//go:noescape
func Mm256SraiEpi32(r *x86.M256I, v0 *x86.M256I, v1 *x86.Int)

// Shift packed 32-bit integers in "a" right by "count" while shifting in sign bits, and store the results in "dst".
//
//go:linkname Mm256SraEpi32 Mm256SraEpi32
//go:noescape
func Mm256SraEpi32(r *x86.M256I, v0 *x86.M256I, v1 *x86.M128I)

// Shift packed 16-bit integers in "a" right by "imm8" while shifting in zeros, and store the results in "dst".
//
//go:linkname Mm256SrliEpi16 Mm256SrliEpi16
//go:noescape
func Mm256SrliEpi16(r *x86.M256I, v0 *x86.M256I, v1 *x86.Int)

// Shift packed 16-bit integers in "a" right by "count" while shifting in zeros, and store the results in "dst".
//
//go:linkname Mm256SrlEpi16 Mm256SrlEpi16
//go:noescape
func Mm256SrlEpi16(r *x86.M256I, v0 *x86.M256I, v1 *x86.M128I)

// Shift packed 32-bit integers in "a" right by "imm8" while shifting in zeros, and store the results in "dst".
//
//go:linkname Mm256SrliEpi32 Mm256SrliEpi32
//go:noescape
func Mm256SrliEpi32(r *x86.M256I, v0 *x86.M256I, v1 *x86.Int)

// Shift packed 32-bit integers in "a" right by "count" while shifting in zeros, and store the results in "dst".
//
//go:linkname Mm256SrlEpi32 Mm256SrlEpi32
//go:noescape
func Mm256SrlEpi32(r *x86.M256I, v0 *x86.M256I, v1 *x86.M128I)

// Shift packed 64-bit integers in "a" right by "imm8" while shifting in zeros, and store the results in "dst".
//
//go:linkname Mm256SrliEpi64 Mm256SrliEpi64
//go:noescape
func Mm256SrliEpi64(r *x86.M256I, v0 *x86.M256I, v1 *x86.Int)

// Shift packed 64-bit integers in "a" right by "count" while shifting in zeros, and store the results in "dst".
//
//go:linkname Mm256SrlEpi64 Mm256SrlEpi64
//go:noescape
func Mm256SrlEpi64(r *x86.M256I, v0 *x86.M256I, v1 *x86.M128I)

// Subtract packed 8-bit integers in "b" from packed 8-bit integers in "a", and store the results in "dst".
//
//go:linkname Mm256SubEpi8 Mm256SubEpi8
//go:noescape
func Mm256SubEpi8(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)

// Subtract packed 16-bit integers in "b" from packed 16-bit integers in "a", and store the results in "dst".
//
//go:linkname Mm256SubEpi16 Mm256SubEpi16
//go:noescape
func Mm256SubEpi16(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)

// Subtract packed 32-bit integers in "b" from packed 32-bit integers in "a", and store the results in "dst".
//
//go:linkname Mm256SubEpi32 Mm256SubEpi32
//go:noescape
func Mm256SubEpi32(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)

// Subtract packed 64-bit integers in "b" from packed 64-bit integers in "a", and store the results in "dst".
//
//go:linkname Mm256SubEpi64 Mm256SubEpi64
//go:noescape
func Mm256SubEpi64(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)

// Subtract packed signed 8-bit integers in "b" from packed 8-bit integers in "a" using saturation, and store the results in "dst".
//
//go:linkname Mm256SubsEpi8 Mm256SubsEpi8
//go:noescape
func Mm256SubsEpi8(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)

// Subtract packed signed 16-bit integers in "b" from packed 16-bit integers in "a" using saturation, and store the results in "dst".
//
//go:linkname Mm256SubsEpi16 Mm256SubsEpi16
//go:noescape
func Mm256SubsEpi16(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)

// Subtract packed unsigned 8-bit integers in "b" from packed unsigned 8-bit integers in "a" using saturation, and store the results in "dst".
//
//go:linkname Mm256SubsEpu8 Mm256SubsEpu8
//go:noescape
func Mm256SubsEpu8(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)

// Subtract packed unsigned 16-bit integers in "b" from packed unsigned 16-bit integers in "a" using saturation, and store the results in "dst".
//
//go:linkname Mm256SubsEpu16 Mm256SubsEpu16
//go:noescape
func Mm256SubsEpu16(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)

// Unpack and interleave 8-bit integers from the high half of each 128-bit lane in "a" and "b", and store the results in "dst".
//
//go:linkname Mm256UnpackhiEpi8 Mm256UnpackhiEpi8
//go:noescape
func Mm256UnpackhiEpi8(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)

// Unpack and interleave 16-bit integers from the high half of each 128-bit lane in "a" and "b", and store the results in "dst".
//
//go:linkname Mm256UnpackhiEpi16 Mm256UnpackhiEpi16
//go:noescape
func Mm256UnpackhiEpi16(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)

// Unpack and interleave 32-bit integers from the high half of each 128-bit lane in "a" and "b", and store the results in "dst".
//
//go:linkname Mm256UnpackhiEpi32 Mm256UnpackhiEpi32
//go:noescape
func Mm256UnpackhiEpi32(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)

// Unpack and interleave 64-bit integers from the high half of each 128-bit lane in "a" and "b", and store the results in "dst".
//
//go:linkname Mm256UnpackhiEpi64 Mm256UnpackhiEpi64
//go:noescape
func Mm256UnpackhiEpi64(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)

// Unpack and interleave 8-bit integers from the low half of each 128-bit lane in "a" and "b", and store the results in "dst".
//
//go:linkname Mm256UnpackloEpi8 Mm256UnpackloEpi8
//go:noescape
func Mm256UnpackloEpi8(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)

// Unpack and interleave 16-bit integers from the low half of each 128-bit lane in "a" and "b", and store the results in "dst".
//
//go:linkname Mm256UnpackloEpi16 Mm256UnpackloEpi16
//go:noescape
func Mm256UnpackloEpi16(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)

// Unpack and interleave 32-bit integers from the low half of each 128-bit lane in "a" and "b", and store the results in "dst".
//
//go:linkname Mm256UnpackloEpi32 Mm256UnpackloEpi32
//go:noescape
func Mm256UnpackloEpi32(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)

// Unpack and interleave 64-bit integers from the low half of each 128-bit lane in "a" and "b", and store the results in "dst".
//
//go:linkname Mm256UnpackloEpi64 Mm256UnpackloEpi64
//go:noescape
func Mm256UnpackloEpi64(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)

// Compute the bitwise XOR of 256 bits (representing integer data) in "a" and "b", and store the result in "dst".
//
//go:linkname Mm256XorSi256 Mm256XorSi256
//go:noescape
func Mm256XorSi256(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)

// Broadcast the low single-precision (32-bit) floating-point element from "a" to all elements of "dst".
//
//go:linkname MmBroadcastssPs MmBroadcastssPs
//go:noescape
func MmBroadcastssPs(r *x86.M128, v0 *x86.M128)

// Broadcast the low double-precision (64-bit) floating-point element from "a" to all elements of "dst".
//
//go:linkname MmBroadcastsdPd MmBroadcastsdPd
//go:noescape
func MmBroadcastsdPd(r *x86.M128D, v0 *x86.M128D)

// Broadcast the low single-precision (32-bit) floating-point element from "a" to all elements of "dst".
//
//go:linkname Mm256BroadcastssPs Mm256BroadcastssPs
//go:noescape
func Mm256BroadcastssPs(r *x86.M256, v0 *x86.M128)

// Broadcast the low double-precision (64-bit) floating-point element from "a" to all elements of "dst".
//
//go:linkname Mm256BroadcastsdPd Mm256BroadcastsdPd
//go:noescape
func Mm256BroadcastsdPd(r *x86.M256D, v0 *x86.M128D)

// Broadcast 128 bits of integer data from "a" to all 128-bit lanes in "dst".
//
//go:linkname Mm256Broadcastsi128Si256 Mm256Broadcastsi128Si256
//go:noescape
func Mm256Broadcastsi128Si256(r *x86.M256I, v0 *x86.M128I)

// Broadcast the low packed 8-bit integer from "a" to all elements of "dst".
//
//go:linkname Mm256BroadcastbEpi8 Mm256BroadcastbEpi8
//go:noescape
func Mm256BroadcastbEpi8(r *x86.M256I, v0 *x86.M128I)

// Broadcast the low packed 16-bit integer from "a" to all elements of "dst".
//
//go:linkname Mm256BroadcastwEpi16 Mm256BroadcastwEpi16
//go:noescape
func Mm256BroadcastwEpi16(r *x86.M256I, v0 *x86.M128I)

// Broadcast the low packed 32-bit integer from "a" to all elements of "dst".
//
//go:linkname Mm256BroadcastdEpi32 Mm256BroadcastdEpi32
//go:noescape
func Mm256BroadcastdEpi32(r *x86.M256I, v0 *x86.M128I)

// Broadcast the low packed 64-bit integer from "a" to all elements of "dst".
//
//go:linkname Mm256BroadcastqEpi64 Mm256BroadcastqEpi64
//go:noescape
func Mm256BroadcastqEpi64(r *x86.M256I, v0 *x86.M128I)

// Broadcast the low packed 8-bit integer from "a" to all elements of "dst".
//
//go:linkname MmBroadcastbEpi8 MmBroadcastbEpi8
//go:noescape
func MmBroadcastbEpi8(r *x86.M128I, v0 *x86.M128I)

// Broadcast the low packed 16-bit integer from "a" to all elements of "dst".
//
//go:linkname MmBroadcastwEpi16 MmBroadcastwEpi16
//go:noescape
func MmBroadcastwEpi16(r *x86.M128I, v0 *x86.M128I)

// Broadcast the low packed 32-bit integer from "a" to all elements of "dst".
//
//go:linkname MmBroadcastdEpi32 MmBroadcastdEpi32
//go:noescape
func MmBroadcastdEpi32(r *x86.M128I, v0 *x86.M128I)

// Broadcast the low packed 64-bit integer from "a" to all elements of "dst".
//
//go:linkname MmBroadcastqEpi64 MmBroadcastqEpi64
//go:noescape
func MmBroadcastqEpi64(r *x86.M128I, v0 *x86.M128I)

// Shuffle 32-bit integers in "a" across lanes using the corresponding index in "idx", and store the results in "dst".
//
//go:linkname Mm256Permutevar8X32Epi32 Mm256Permutevar8X32Epi32
//go:noescape
func Mm256Permutevar8X32Epi32(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)

// Shuffle single-precision (32-bit) floating-point elements in "a" across lanes using the corresponding index in "idx", and store the results in "dst".
//
//go:linkname Mm256Permutevar8X32Ps Mm256Permutevar8X32Ps
//go:noescape
func Mm256Permutevar8X32Ps(r *x86.M256, v0 *x86.M256, v1 *x86.M256I)

// Shift packed 32-bit integers in "a" left by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst".
//
//go:linkname Mm256SllvEpi32 Mm256SllvEpi32
//go:noescape
func Mm256SllvEpi32(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)

// Shift packed 32-bit integers in "a" left by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst".
//
//go:linkname MmSllvEpi32 MmSllvEpi32
//go:noescape
func MmSllvEpi32(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)

// Shift packed 64-bit integers in "a" left by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst".
//
//go:linkname Mm256SllvEpi64 Mm256SllvEpi64
//go:noescape
func Mm256SllvEpi64(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)

// Shift packed 64-bit integers in "a" left by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst".
//
//go:linkname MmSllvEpi64 MmSllvEpi64
//go:noescape
func MmSllvEpi64(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)

// Shift packed 32-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in sign bits, and store the results in "dst".
//
//go:linkname Mm256SravEpi32 Mm256SravEpi32
//go:noescape
func Mm256SravEpi32(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)

// Shift packed 32-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in sign bits, and store the results in "dst".
//
//go:linkname MmSravEpi32 MmSravEpi32
//go:noescape
func MmSravEpi32(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)

// Shift packed 32-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst".
//
//go:linkname Mm256SrlvEpi32 Mm256SrlvEpi32
//go:noescape
func Mm256SrlvEpi32(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)

// Shift packed 32-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst".
//
//go:linkname MmSrlvEpi32 MmSrlvEpi32
//go:noescape
func MmSrlvEpi32(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)

// Shift packed 64-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst".
//
//go:linkname Mm256SrlvEpi64 Mm256SrlvEpi64
//go:noescape
func Mm256SrlvEpi64(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)

// Shift packed 64-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst".
//
//go:linkname MmSrlvEpi64 MmSrlvEpi64
//go:noescape
func MmSrlvEpi64(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)


================================================
FILE: x86/bmi/functions.c
================================================
#include <immintrin.h>

void AndnU32(unsigned int* r, unsigned int* v0, unsigned int* v1) { *r = __andn_u32(*v0, *v1); }
void BextrU32(unsigned int* r, unsigned int* v0, unsigned int* v1) { *r = __bextr_u32(*v0, *v1); }
void BextrU32(unsigned int* r, unsigned int* v0, unsigned int* v1, unsigned int* v2) { *r = _bextr_u32(*v0, *v1, *v2); }
void Bextr2U32(unsigned int* r, unsigned int* v0, unsigned int* v1) { *r = _bextr2_u32(*v0, *v1); }
void BlsiU32(unsigned int* r, unsigned int* v0) { *r = __blsi_u32(*v0); }
void BlsmskU32(unsigned int* r, unsigned int* v0) { *r = __blsmsk_u32(*v0); }
void BlsrU32(unsigned int* r, unsigned int* v0) { *r = __blsr_u32(*v0); }
void AndnU64(unsigned long long* r, unsigned long long* v0, unsigned long long* v1) { *r = __andn_u64(*v0, *v1); }
void BextrU64(unsigned long long* r, unsigned long long* v0, unsigned long long* v1) { *r = __bextr_u64(*v0, *v1); }
void BextrU64(unsigned long long* r, unsigned long long* v0, unsigned int* v1, unsigned int* v2) { *r = _bextr_u64(*v0, *v1, *v2); }
void Bextr2U64(unsigned long long* r, unsigned long long* v0, unsigned long long* v1) { *r = _bextr2_u64(*v0, *v1); }
void BlsiU64(unsigned long long* r, unsigned long long* v0) { *r = __blsi_u64(*v0); }
void BlsmskU64(unsigned long long* r, unsigned long long* v0) { *r = __blsmsk_u64(*v0); }
void BlsrU64(unsigned long long* r, unsigned long long* v0) { *r = __blsr_u64(*v0); }


================================================
FILE: x86/bmi/functions.go
================================================
package bmi

import (
	"github.com/alivanz/go-simd/x86"
)

/*
#cgo CFLAGS: -mbmi
#include <immintrin.h>
*/
import "C"

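// Compute the bitwise NOT of unsigned 32-bit integer "a" and then AND with "b", and store the result in "dst". Wraps __andn_u32.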
//
//go:linkname AndnU32 AndnU32
//go:noescape
func AndnU32(r *x86.Uint, v0 *x86.Uint, v1 *x86.Uint)

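// Extract contiguous bits from unsigned 32-bit integer "a", and store the result in "dst". Extract the number of bits specified by bits 15:8 of "control", starting at the bit specified by bits 7:0 of "control". Wraps __bextr_u32.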
//
//go:linkname BextrU32 BextrU32
//go:noescape
func BextrU32(r *x86.Uint, v0 *x86.Uint, v1 *x86.Uint)

// Extract contiguous bits from unsigned 32-bit integer "a", and store the result in "dst". Extract the number of bits specified by "len", starting at the bit specified by "start".
//
//go:linkname BextrU32 BextrU32
//go:noescape
func BextrU32(r *x86.Uint, v0 *x86.Uint, v1 *x86.Uint, v2 *x86.Uint)

// Extract contiguous bits from unsigned 32-bit integer "a", and store the result in "dst". Extract the number of bits specified by bits 15:8 of "control", starting at the bit specified by bits 7:0 of "control".
//
//go:linkname Bextr2U32 Bextr2U32
//go:noescape
func Bextr2U32(r *x86.Uint, v0 *x86.Uint, v1 *x86.Uint)

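// Extract the lowest set bit from unsigned 32-bit integer "a" and set the corresponding bit in "dst". All other bits in "dst" are zeroed, and all bits are zeroed if no bits are set in "a". Wraps __blsi_u32.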
//
//go:linkname BlsiU32 BlsiU32
//go:noescape
func BlsiU32(r *x86.Uint, v0 *x86.Uint)

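// Set all the lower bits of "dst" up to and including the lowest set bit in unsigned 32-bit integer "a". Wraps __blsmsk_u32.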
//
//go:linkname BlsmskU32 BlsmskU32
//go:noescape
func BlsmskU32(r *x86.Uint, v0 *x86.Uint)

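// Copy all bits from unsigned 32-bit integer "a" to "dst", and reset (set to 0) the bit in "dst" that corresponds to the lowest set bit in "a". Wraps __blsr_u32.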
//
//go:linkname BlsrU32 BlsrU32
//go:noescape
func BlsrU32(r *x86.Uint, v0 *x86.Uint)

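// Compute the bitwise NOT of unsigned 64-bit integer "a" and then AND with "b", and store the result in "dst". Wraps __andn_u64.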
//
//go:linkname AndnU64 AndnU64
//go:noescape
func AndnU64(r *x86.Ulonglong, v0 *x86.Ulonglong, v1 *x86.Ulonglong)

// Extract contiguous bits from unsigned 64-bit integer "a", and store the result in "dst". Extract the number of bits specified by "len", starting at the bit specified by "start".
//
// The equivalent two-operand form ("__bextr_u64"), which packs "start" and "len" into one control word, is available as Bextr2U64.
//
//go:linkname BextrU64 BextrU64
//go:noescape
func BextrU64(r *x86.Ulonglong, v0 *x86.Ulonglong, v1 *x86.Uint, v2 *x86.Uint)

// Extract contiguous bits from unsigned 64-bit integer "a", and store the result in "dst". Extract the number of bits specified by bits 15:8 of "control", starting at the bit specified by bits 7:0 of "control".
//
//go:linkname Bextr2U64 Bextr2U64
//go:noescape
func Bextr2U64(r *x86.Ulonglong, v0 *x86.Ulonglong, v1 *x86.Ulonglong)

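// Extract the lowest set bit from unsigned 64-bit integer "a" and set the corresponding bit in "dst"; all other bits in "dst" are zeroed, and "dst" is zero if "a" is zero (C intrinsic "__blsi_u64").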
//
//go:linkname BlsiU64 BlsiU64
//go:noescape
func BlsiU64(r *x86.Ulonglong, v0 *x86.Ulonglong)

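// Set all the lower bits of "dst" up to and including the lowest set bit in unsigned 64-bit integer "a"; if "a" is zero, all bits of "dst" are set (C intrinsic "__blsmsk_u64").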
//
//go:linkname BlsmskU64 BlsmskU64
//go:noescape
func BlsmskU64(r *x86.Ulonglong, v0 *x86.Ulonglong)

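// Copy all bits from unsigned 64-bit integer "a" to "dst", and reset (set to 0) the bit in "dst" that corresponds to the lowest set bit in "a" (C intrinsic "__blsr_u64").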
//
//go:linkname BlsrU64 BlsrU64
//go:noescape
func BlsrU64(r *x86.Ulonglong, v0 *x86.Ulonglong)
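
/*
Usage sketch, not part of the generated bindings. The pure-Go helpers below
(andn, bextr, bextr2, blsi, blsmsk and blsr are hypothetical names, not package
API) model what the wrapped BMI instructions compute for 32-bit operands,
assuming the semantics given in the doc comments above. They may be handy as a
reference when checking results from the cgo-backed functions.

```go
package main

import "fmt"

// andn models ANDN: the bitwise NOT of a, ANDed with b.
func andn(a, b uint32) uint32 { return ^a & b }

// bextr models the three-operand BEXTR form: length bits of a, starting at
// bit start. Only the low 8 bits of start and length are used.
func bextr(a, start, length uint32) uint32 {
	start &= 0xff
	length &= 0xff
	if start >= 32 {
		return 0
	}
	v := a >> start
	if length >= 32 {
		return v
	}
	return v & ((uint32(1) << length) - 1)
}

// bextr2 models the two-operand form: start is bits 7:0 of control and
// length is bits 15:8.
func bextr2(a, control uint32) uint32 {
	return bextr(a, control&0xff, (control>>8)&0xff)
}

// blsi isolates the lowest set bit of a.
func blsi(a uint32) uint32 { return a & -a }

// blsmsk sets every bit up to and including the lowest set bit of a.
func blsmsk(a uint32) uint32 { return a ^ (a - 1) }

// blsr clears the lowest set bit of a.
func blsr(a uint32) uint32 { return a & (a - 1) }

func main() {
	const a = 0b10100
	fmt.Printf("blsi=%#b blsmsk=%#b blsr=%#b\n", blsi(a), blsmsk(a), blsr(a))
	fmt.Printf("bextr=%#x bextr2=%#x\n", bextr(0xABCD1234, 8, 8), bextr2(0xABCD1234, 0x0808))
	fmt.Printf("andn=%#b\n", andn(0b1100, 0b1010))
}
```

The 64-bit variants follow the same formulas with uint64 operands and a 64-bit
width limit in bextr.
*/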


================================================
FILE: x86/bmi2/functions.c
================================================
#include <immintrin.h>

void BzhiU32(unsigned int* r, unsigned int* v0, unsigned int* v1) { *r = _bzhi_u32(*v0, *v1); }
void PdepU32(unsigned int* r, unsigned int* v0, unsigned int* v1) { *r = _pdep_u32(*v0, *v1); }
void PextU32(unsigned int* r, unsigned int* v0, unsigned int* v1) { *r = _pext_u32(*v0, *v1); }
void BzhiU64(unsigned long long* r, unsigned long long* v0, unsigned long long* v1) { *r = _bzhi_u64(*v0, *v1); }
void PdepU64(unsigned long long* r, unsigned long long* v0, unsigned long long* v1) { *r = _pdep_u64(*v0, *v1); }
void PextU64(unsigned long long* r, unsigned long long* v0, unsigned long long* v1) { *r = _pext_u64(*v0, *v1); }


================================================
FILE: x86/bmi2/functions.go
================================================
package bmi2

import (
	"github.com/alivanz/go-simd/x86"
)

/*
#cgo CFLAGS: -mbmi2
#include <immintrin.h>
*/
import "C"

// Copy all bits from unsigned 32-bit integer "a" to "dst", and reset (set to 0) the high bits in "dst" starting at "index".
//
//go:linkname BzhiU32 BzhiU32
//go:noescape
func BzhiU32(r *x86.Uint, v0 *x86.Uint, v1 *x86.Uint)

// Deposit contiguous low bits from unsigned 32-bit integer "a" to "dst" at the corresponding bit locations specified by "mask"; all other bits in "dst" are set to zero.
//
//go:linkname PdepU32 PdepU32
//go:noescape
func PdepU32(r *x86.Uint, v0 *x86.Uint, v1 *x86.Uint)

// Extract bits from unsigned 32-bit integer "a" at the corresponding bit locations specified by "mask" to contiguous low bits in "dst"; the remaining upper bits in "dst" are set to zero.
//
//go:linkname PextU32 PextU32
//go:noescape
func PextU32(r *x86.Uint, v0 *x86.Uint, v1 *x86.Uint)

// Copy all bits from unsigned 64-bit integer "a" to "dst", and reset (set to 0) the high bits in "dst" starting at "index".
//
//go:linkname BzhiU64 BzhiU64
//go:noescape
func BzhiU64(r *x86.Ulonglong, v0 *x86.Ulonglong, v1 *x86.Ulonglong)

// Deposit contiguous low bits from unsigned 64-bit integer "a" to "dst" at the corresponding bit locations specified by "mask"; all other bits in "dst" are set to zero.
//
//go:linkname PdepU64 PdepU64
//go:noescape
func PdepU64(r *x86.Ulonglong, v0 *x86.Ulonglong, v1 *x86.Ulonglong)

// Extract bits from unsigned 64-bit integer "a" at the corresponding bit locations specified by "mask" to contiguous low bits in "dst"; the remaining upper bits in "dst" are set to zero.
//
//go:linkname PextU64 PextU64
//go:noescape
func PextU64(r *x86.Ulonglong, v0 *x86.Ulonglong, v1 *x86.Ulonglong)
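
/*
Usage sketch, not part of the generated bindings. The doc comments above
describe BZHI, PDEP and PEXT in words; the pure-Go models below (bzhi, pdep and
pext are hypothetical helper names) spell out the same bit movement for 32-bit
operands, assuming those descriptions. They are reference models only and are
far slower than the hardware instructions.

```go
package main

import "fmt"

// bzhi copies a and clears every bit at position index and above.
// Only the low 8 bits of index are used; an index of 32 or more leaves a as is.
func bzhi(a, index uint32) uint32 {
	n := index & 0xff
	if n >= 32 {
		return a
	}
	return a & ((uint32(1) << n) - 1)
}

// pdep deposits the contiguous low bits of a at the positions of the set
// bits in mask, lowest first. All other result bits are zero.
func pdep(a, mask uint32) uint32 {
	var dst uint32
	for bit := uint32(1); mask != 0; bit <<= 1 {
		low := mask & -mask // lowest remaining mask position
		if a&bit != 0 {
			dst |= low
		}
		mask &^= low
	}
	return dst
}

// pext gathers the bits of a selected by mask into the contiguous low bits
// of the result, lowest first. The remaining upper bits are zero.
func pext(a, mask uint32) uint32 {
	var dst uint32
	for bit := uint32(1); mask != 0; bit <<= 1 {
		low := mask & -mask
		if a&low != 0 {
			dst |= bit
		}
		mask &^= low
	}
	return dst
}

func main() {
	const mask = 0b11010
	deposited := pdep(0b101, mask)
	fmt.Printf("pdep=%#b pext=%#b bzhi=%#x\n", deposited, pext(deposited, mask), bzhi(0xFFFFFFFF, 8))
}
```

For any mask, pext(pdep(a, mask), mask) returns the low bits of a that fit in
the mask, which makes the pair easy to cross-check.
*/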


================================================
FILE: x86/crc32/functions.c
================================================
#include <immintrin.h>

void MmCrc32U8(unsigned int* r, unsigned int* v0, unsigned char* v1) { *r = _mm_crc32_u8(*v0, *v1); }
void MmCrc32U16(unsigned int* r, unsigned int* v0, unsigned short* v1) { *r = _mm_crc32_u16(*v0, *v1); }
void MmCrc32U32(unsigned int* r, unsigned int* v0, unsigned int* v1) { *r = _mm_crc32_u32(*v0, *v1); }
void MmCrc32U64(unsigned long long* r, unsigned long long* v0, unsigned long long* v1) { *r = _mm_crc32_u64(*v0, *v1); }


================================================
FILE: x86/crc32/functions.go
================================================
package crc32

import (
	"github.com/alivanz/go-simd/x86"
)

/*
#cgo CFLAGS: -mcrc32
#include <immintrin.h>
*/
import "C"

// Starting with the initial value in "crc", accumulates a CRC32 value for unsigned 8-bit integer "v", and stores the result in "dst".
//
//go:linkname MmCrc32U8 MmCrc32U8
//go:noescape
func MmCrc32U8(r *x86.Uint, v0 *x86.Uint, v1 *x86.Uchar)

// Starting with the initial value in "crc", accumulates a CRC32 value for unsigned 16-bit integer "v", and stores the result in "dst".
//
//go:linkname MmCrc32U16 MmCrc32U16
//go:noescape
func MmCrc32U16(r *x86.Uint, v0 *x86.Uint, v1 *x86.Ushort)

// Starting with the initial value in "crc", accumulates a CRC32 value for unsigned 32-bit integer "v", and stores the result in "dst".
//
//go:linkname MmCrc32U32 MmCrc32U32
//go:noescape
func MmCrc32U32(r *x86.Uint, v0 *x86.Uint, v1 *x86.Uint)

// Starting with the initial value in "crc", accumulates a CRC32 value for unsigned 64-bit integer "v", and stores the result in "dst".
//
//go:linkname MmCrc32U64 MmCrc32U64
//go:noescape
func MmCrc32U64(r *x86.Ulonglong, v0 *x86.Ulonglong, v1 *x86.Ulonglong)


================================================
FILE: x86/f16c/functions.c
================================================
#include <immintrin.h>

void CvtshSs(float* r, unsigned short* v0) { *r = _cvtsh_ss(*v0); }
void MmCvtphPs(__m128* r, __m128i* v0) { *r = _mm_cvtph_ps(*v0); }
void Mm256CvtphPs(__m256* r, __m128i* v0) { *r = _mm256_cvtph_ps(*v0); }


================================================
FILE: x86/f16c/functions.go
================================================
package f16c

import (
	"github.com/alivanz/go-simd/x86"
)

/*
#cgo CFLAGS: -mf16c
#include <immintrin.h>
*/
import "C"

// Convert the half-precision (16-bit) floating-point value "a" to a single-precision (32-bit) floating-point value, and store the result in "dst".
//
//go:linkname CvtshSs CvtshSs
//go:noescape
func CvtshSs(r *x86.Float, v0 *x86.Ushort)

// Convert packed half-precision (16-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst".
//
//go:linkname MmCvtphPs MmCvtphPs
//go:noescape
func MmCvtphPs(r *x86.M128, v0 *x86.M128I)

// Convert packed half-precision (16-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst".
//
//go:linkname Mm256CvtphPs Mm256CvtphPs
//go:noescape
func Mm256CvtphPs(r *x86.M256, v0 *x86.M128I)


================================================
FILE: x86/fma/functions.c
================================================
#include <immintrin.h>

void MmFmaddPs(__m128* r, __m128* v0, __m128* v1, __m128* v2) { *r = _mm_fmadd_ps(*v0, *v1, *v2); }
void MmFmaddPd(__m128d* r, __m128d* v0, __m128d* v1, __m128d* v2) { *r = _mm_fmadd_pd(*v0, *v1, *v2); }
void MmFmaddSs(__m128* r, __m128* v0, __m128* v1, __m128* v2) { *r = _mm_fmadd_ss(*v0, *v1, *v2); }
void MmFmaddSd(__m128d* r, __m128d* v0, __m128d* v1, __m128d* v2) { *r = _mm_fmadd_sd(*v0, *v1, *v2); }
void MmFmsubPs(__m128* r, __m128* v0, __m128* v1, __m128* v2) { *r = _mm_fmsub_ps(*v0, *v1, *v2); }
void MmFmsubPd(__m128d* r, __m128d* v0, __m128d* v1, __m128d* v2) { *r = _mm_fmsub_pd(*v0, *v1, *v2); }
void MmFmsubSs(__m128* r, __m128* v0, __m128* v1, __m128* v2) { *r = _mm_fmsub_ss(*v0, *v1, *v2); }
void MmFmsubSd(__m128d* r, __m128d* v0, __m128d* v1, __m128d* v2) { *r = _mm_fmsub_sd(*v0, *v1, *v2); }
void MmFnmaddPs(__m128* r, __m128* v0, __m128* v1, __m128* v2) { *r = _mm_fnmadd_ps(*v0, *v1, *v2); }
void MmFnmaddPd(__m128d* r, __m128d* v0, __m128d* v1, __m128d* v2) { *r = _mm_fnmadd_pd(*v0, *v1, *v2); }
void MmFnmaddSs(__m128* r, __m128* v0, __m128* v1, __m128* v2) { *r = _mm_fnmadd_ss(*v0, *v1, *v2); }
void MmFnmaddSd(__m128d* r, __m128d* v0, __m128d* v1, __m128d* v2) { *r = _mm_fnmadd_sd(*v0, *v1, *v2); }
void MmFnmsubPs(__m128* r, __m128* v0, __m128* v1, __m128* v2) { *r = _mm_fnmsub_ps(*v0, *v1, *v2); }
void MmFnmsubPd(__m128d* r, __m128d* v0, __m128d* v1, __m128d* v2) { *r = _mm_fnmsub_pd(*v0, *v1, *v2); }
void MmFnmsubSs(__m128* r, __m128* v0, __m128* v1, __m128* v2) { *r = _mm_fnmsub_ss(*v0, *v1, *v2); }
void MmFnmsubSd(__m128d* r, __m128d* v0, __m128d* v1, __m128d* v2) { *r = _mm_fnmsub_sd(*v0, *v1, *v2); }
void MmFmaddsubPs(__m128* r, __m128* v0, __m128* v1, __m128* v2) { *r = _mm_fmaddsub_ps(*v0, *v1, *v2); }
void MmFmaddsubPd(__m128d* r, __m128d* v0, __m128d* v1, __m128d* v2) { *r = _mm_fmaddsub_pd(*v0, *v1, *v2); }
void MmFmsubaddPs(__m128* r, __m128* v0, __m128* v1, __m128* v2) { *r = _mm_fmsubadd_ps(*v0, *v1, *v2); }
void MmFmsubaddPd(__m128d* r, __m128d* v0, __m128d* v1, __m128d* v2) { *r = _mm_fmsubadd_pd(*v0, *v1, *v2); }
void Mm256FmaddPs(__m256* r, __m256* v0, __m256* v1, __m256* v2) { *r = _mm256_fmadd_ps(*v0, *v1, *v2); }
void Mm256FmaddPd(__m256d* r, __m256d* v0, __m256d* v1, __m256d* v2) { *r = _mm256_fmadd_pd(*v0, *v1, *v2); }
void Mm256FmsubPs(__m256* r, __m256* v0, __m256* v1, __m256* v2) { *r = _mm256_fmsub_ps(*v0, *v1, *v2); }
void Mm256FmsubPd(__m256d* r, __m256d* v0, __m256d* v1, __m256d* v2) { *r = _mm256_fmsub_pd(*v0, *v1, *v2); }
void Mm256FnmaddPs(__m256* r, __m256* v0, __m256* v1, __m256* v2) { *r = _mm256_fnmadd_ps(*v0, *v1, *v2); }
void Mm256FnmaddPd(__m256d* r, __m256d* v0, __m256d* v1, __m256d* v2) { *r = _mm256_fnmadd_pd(*v0, *v1, *v2); }
void Mm256FnmsubPs(__m256* r, __m256* v0, __m256* v1, __m256* v2) { *r = _mm256_fnmsub_ps(*v0, *v1, *v2); }
void Mm256FnmsubPd(__m256d* r, __m256d* v0, __m256d* v1, __m256d* v2) { *r = _mm256_fnmsub_pd(*v0, *v1, *v2); }
void Mm256FmaddsubPs(__m256* r, __m256* v0, __m256* v1, __m256* v2) { *r = _mm256_fmaddsub_ps(*v0, *v1, *v2); }
void Mm256FmaddsubPd(__m256d* r, __m256d* v0, __m256d* v1, __m256d* v2) { *r = _mm256_fmaddsub_pd(*v0, *v1, *v2); }
void Mm256FmsubaddPs(__m256* r, __m256* v0, __m256* v1, __m256* v2) { *r = _mm256_fmsubadd_ps(*v0, *v1, *v2); }
void Mm256FmsubaddPd(__m256d* r, __m256d* v0, __m256d* v1, __m256d* v2) { *r = _mm256_fmsubadd_pd(*v0, *v1, *v2); }


================================================
FILE: x86/fma/functions.go
================================================
package fma

import (
	"github.com/alivanz/go-simd/x86"
)

/*
#cgo CFLAGS: -mfma
#include <immintrin.h>
*/
import "C"

// Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst".
//
//go:linkname MmFmaddPs MmFmaddPs
//go:noescape
func MmFmaddPs(r *x86.M128, v0 *x86.M128, v1 *x86.M128, v2 *x86.M128)

// Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst".
//
//go:linkname MmFmaddPd MmFmaddPd
//go:noescape
func MmFmaddPd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D, v2 *x86.M128D)

// Multiply the lower single-precision (32-bit) floating-point elements in "a" and "b", and add the intermediate result to the lower element in "c". Store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".
//
//go:linkname MmFmaddSs MmFmaddSs
//go:noescape
func MmFmaddSs(r *x86.M128, v0 *x86.M128, v1 *x86.M128, v2 *x86.M128)

// Multiply the lower double-precision (64-bit) floating-point elements in "a" and "b", and add the intermediate result to the lower element in "c". Store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".
//
//go:linkname MmFmaddSd MmFmaddSd
//go:noescape
func MmFmaddSd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D, v2 *x86.M128D)

// Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst".
//
//go:linkname MmFmsubPs MmFmsubPs
//go:noescape
func MmFmsubPs(r *x86.M128, v0 *x86.M128, v1 *x86.M128, v2 *x86.M128)

// Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst".
//
//go:linkname MmFmsubPd MmFmsubPd
//go:noescape
func MmFmsubPd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D, v2 *x86.M128D)

// Multiply the lower single-precision (32-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the intermediate result. Store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".
//
//go:linkname MmFmsubSs MmFmsubSs
//go:noescape
func MmFmsubSs(r *x86.M128, v0 *x86.M128, v1 *x86.M128, v2 *x86.M128)

// Multiply the lower double-precision (64-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the intermediate result. Store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".
//
//go:linkname MmFmsubSd MmFmsubSd
//go:noescape
func MmFmsubSd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D, v2 *x86.M128D)

// Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst".
//
//go:linkname MmFnmaddPs MmFnmaddPs
//go:noescape
func MmFnmaddPs(r *x86.M128, v0 *x86.M128, v1 *x86.M128, v2 *x86.M128)

// Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst".
//
//go:linkname MmFnmaddPd MmFnmaddPd
//go:noescape
func MmFnmaddPd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D, v2 *x86.M128D)

// Multiply the lower single-precision (32-bit) floating-point elements in "a" and "b", and add the negated intermediate result to the lower element in "c". Store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".
//
//go:linkname MmFnmaddSs MmFnmaddSs
//go:noescape
func MmFnmaddSs(r *x86.M128, v0 *x86.M128, v1 *x86.M128, v2 *x86.M128)

// Multiply the lower double-precision (64-bit) floating-point elements in "a" and "b", and add the negated intermediate result to the lower element in "c". Store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".
//
//go:linkname MmFnmaddSd MmFnmaddSd
//go:noescape
func MmFnmaddSd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D, v2 *x86.M128D)

// Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst".
//
//go:linkname MmFnmsubPs MmFnmsubPs
//go:noescape
func MmFnmsubPs(r *x86.M128, v0 *x86.M128, v1 *x86.M128, v2 *x86.M128)

// Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst".
//
//go:linkname MmFnmsubPd MmFnmsubPd
//go:noescape
func MmFnmsubPd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D, v2 *x86.M128D)

// Multiply the lower single-precision (32-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the negated intermediate result. Store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".
//
//go:linkname MmFnmsubSs MmFnmsubSs
//go:noescape
func MmFnmsubSs(r *x86.M128, v0 *x86.M128, v1 *x86.M128, v2 *x86.M128)

// Multiply the lower double-precision (64-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the negated intermediate result. Store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".
//
//go:linkname MmFnmsubSd MmFnmsubSd
//go:noescape
func MmFnmsubSd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D, v2 *x86.M128D)

// Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", alternately add and subtract packed elements in "c" to/from the intermediate result, and store the results in "dst".
//
//go:linkname MmFmaddsubPs MmFmaddsubPs
//go:noescape
func MmFmaddsubPs(r *x86.M128, v0 *x86.M128, v1 *x86.M128, v2 *x86.M128)

// Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", alternately add and subtract packed elements in "c" to/from the intermediate result, and store the results in "dst".
//
//go:linkname MmFmaddsubPd MmFmaddsubPd
//go:noescape
func MmFmaddsubPd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D, v2 *x86.M128D)

// Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", alternately subtract and add packed elements in "c" from/to the intermediate result, and store the results in "dst".
//
//go:linkname MmFmsubaddPs MmFmsubaddPs
//go:noescape
func MmFmsubaddPs(r *x86.M128, v0 *x86.M128, v1 *x86.M128, v2 *x86.M128)

// Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", alternately subtract and add packed elements in "c" from/to the intermediate result, and store the results in "dst".
//
//go:linkname MmFmsubaddPd MmFmsubaddPd
//go:noescape
func MmFmsubaddPd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D, v2 *x86.M128D)

// Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst".
//
//go:linkname Mm256FmaddPs Mm256FmaddPs
//go:noescape
func Mm256FmaddPs(r *x86.M256, v0 *x86.M256, v1 *x86.M256, v2 *x86.M256)

// Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst".
//
//go:linkname Mm256FmaddPd Mm256FmaddPd
//go:noescape
func Mm256FmaddPd(r *x86.M256D, v0 *x86.M256D, v1 *x86.M256D, v2 *x86.M256D)

// Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst".
//
//go:linkname Mm256FmsubPs Mm256FmsubPs
//go:noescape
func Mm256FmsubPs(r *x86.M256, v0 *x86.M256, v1 *x86.M256, v2 *x86.M256)

// Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst".
//
//go:linkname Mm256FmsubPd Mm256FmsubPd
//go:noescape
func Mm256FmsubPd(r *x86.M256D, v0 *x86.M256D, v1 *x86.M256D, v2 *x86.M256D)

// Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst".
//
//go:linkname Mm256FnmaddPs Mm256FnmaddPs
//go:noescape
func Mm256FnmaddPs(r *x86.M256, v0 *x86.M256, v1 *x86.M256, v2 *x86.M256)

// Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst".
//
//go:linkname Mm256FnmaddPd Mm256FnmaddPd
//go:noescape
func Mm256FnmaddPd(r *x86.M256D, v0 *x86.M256D, v1 *x86.M256D, v2 *x86.M256D)

// Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst".
//
//go:linkname Mm256FnmsubPs Mm256FnmsubPs
//go:noescape
func Mm256FnmsubPs(r *x86.M256, v0 *x86.M256, v1 *x86.M256, v2 *x86.M256)

// Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst".
//
//go:linkname Mm256FnmsubPd Mm256FnmsubPd
//go:noescape
func Mm256FnmsubPd(r *x86.M256D, v0 *x86.M256D, v1 *x86.M256D, v2 *x86.M256D)

// Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", alternately add and subtract packed elements in "c" to/from the intermediate result, and store the results in "dst".
//
//go:linkname Mm256FmaddsubPs Mm256FmaddsubPs
//go:noescape
func Mm256FmaddsubPs(r *x86.M256, v0 *x86.M256, v1 *x86.M256, v2 *x86.M256)

// Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", alternately add and subtract packed elements in "c" to/from the intermediate result, and store the results in "dst".
//
//go:linkname Mm256FmaddsubPd Mm256FmaddsubPd
//go:noescape
func Mm256FmaddsubPd(r *x86.M256D, v0 *x86.M256D, v1 *x86.M256D, v2 *x86.M256D)

// Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", alternately subtract and add packed elements in "c" from/to the intermediate result, and store the results in "dst".
//
//go:linkname Mm256FmsubaddPs Mm256FmsubaddPs
//go:noescape
func Mm256FmsubaddPs(r *x86.M256, v0 *x86.M256, v1 *x86.M256, v2 *x86.M256)

// Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", alternately subtract and add packed elements in "c" from/to the intermediate result, and store the results in "dst".
//
//go:linkname Mm256FmsubaddPd Mm256FmsubaddPd
//go:noescape
func Mm256FmsubaddPd(r *x86.M256D, v0 *x86.M256D, v1 *x86.M256D, v2 *x86.M256D)


================================================
FILE: x86/fsgsbase/functions.c
================================================
#include <immintrin.h>

void ReadfsbaseU32(unsigned int* r) { *r = _readfsbase_u32(); }
void ReadfsbaseU64(unsigned long long* r) { *r = _readfsbase_u64(); }
void ReadgsbaseU32(unsigned int* r) { *r = _readgsbase_u32(); }
void ReadgsbaseU64(unsigned long long* r) { *r = _readgsbase_u64(); }
void WritefsbaseU32(unsigned int* v0) { _writefsbase_u32(*v0); }
void WritefsbaseU64(unsigned long long* v0) { _writefsbase_u64(*v0); }
void WritegsbaseU32(unsigned int* v0) { _writegsbase_u32(*v0); }
void WritegsbaseU64(unsigned long long* v0) { _writegsbase_u64(*v0); }


================================================
FILE: x86/fsgsbase/functions.go
================================================
package fsgsbase

import (
	"github.com/alivanz/go-simd/x86"
)

/*
#cgo CFLAGS: -mfsgsbase
#include <immintrin.h>
*/
import "C"

// Read the FS segment base register and store the 32-bit result in "dst".
//
//go:linkname ReadfsbaseU32 ReadfsbaseU32
//go:noescape
func ReadfsbaseU32(r *x86.Uint)

// Read the FS segment base register and store the 64-bit result in "dst".
//
//go:linkname ReadfsbaseU64 ReadfsbaseU64
//go:noescape
func ReadfsbaseU64(r *x86.Ulonglong)

// Read the GS segment base register and store the 32-bit result in "dst".
//
//go:linkname ReadgsbaseU32 ReadgsbaseU32
//go:noescape
func ReadgsbaseU32(r *x86.Uint)

// Read the GS segment base register and store the 64-bit result in "dst".
//
//go:linkname ReadgsbaseU64 ReadgsbaseU64
//go:noescape
func ReadgsbaseU64(r *x86.Ulonglong)

// Write the unsigned 32-bit integer "a" to the FS segment base register.
//
//go:linkname WritefsbaseU32 WritefsbaseU32
//go:noescape
func WritefsbaseU32(v0 *x86.Uint)

// Write the unsigned 64-bit integer "a" to the FS segment base register.
//
//go:linkname WritefsbaseU64 WritefsbaseU64
//go:noescape
func WritefsbaseU64(v0 *x86.Ulonglong)

// Write the unsigned 32-bit integer "a" to the GS segment base register.
//
//go:linkname WritegsbaseU32 WritegsbaseU32
//go:noescape
func WritegsbaseU32(v0 *x86.Uint)

// Write the unsigned 64-bit integer "a" to the GS segment base register.
//
//go:linkname WritegsbaseU64 WritegsbaseU64
//go:noescape
func WritegsbaseU64(v0 *x86.Ulonglong)


================================================
FILE: x86/generate.go
================================================
package x86

//go:generate go run ../generator/x86


================================================
FILE: x86/lzcnt/functions.c
================================================
#include <immintrin.h>

void Lzcnt32(unsigned int* r, unsigned int* v0) { *r = __lzcnt32(*v0); }
void LzcntU32(unsigned int* r, unsigned int* v0) { *r = _lzcnt_u32(*v0); }
void LzcntU64(unsigned long long* r, unsigned long long* v0) { *r = _lzcnt_u64(*v0); }


================================================
FILE: x86/lzcnt/functions.go
================================================
package lzcnt

import (
	"github.com/alivanz/go-simd/x86"
)

/*
#cgo CFLAGS: -mlzcnt
#include <immintrin.h>
*/
import "C"

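// Count the number of leading zero bits in unsigned 32-bit integer "a", and return that count in "dst" (C intrinsic "__lzcnt32").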
//
//go:linkname Lzcnt32 Lzcnt32
//go:noescape
func Lzcnt32(r *x86.Uint, v0 *x86.Uint)

// Count the number of leading zero bits in unsigned 32-bit integer "a", and return that count in "dst".
//
//go:linkname LzcntU32 LzcntU32
//go:noescape
func LzcntU32(r *x86.Uint, v0 *x86.Uint)

// Count the number of leading zero bits in unsigned 64-bit integer "a", and return that count in "dst".
//
//go:linkname LzcntU64 LzcntU64
//go:noescape
func LzcntU64(r *x86.Ulonglong, v0 *x86.Ulonglong)
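
/*
Usage sketch, not part of the generated bindings. The leading-zero counts
described above match math/bits from the standard library, including the zero
input case, where the count equals the operand width. lzcnt32 and lzcnt64 are
hypothetical helper names for a pure-Go reference.

```go
package main

import (
	"fmt"
	"math/bits"
)

// lzcnt32 returns the number of leading zero bits in a; 32 when a is zero.
func lzcnt32(a uint32) uint32 { return uint32(bits.LeadingZeros32(a)) }

// lzcnt64 returns the number of leading zero bits in a; 64 when a is zero.
func lzcnt64(a uint64) uint64 { return uint64(bits.LeadingZeros64(a)) }

func main() {
	fmt.Println(lzcnt32(0), lzcnt32(1), lzcnt32(0x80000000))
	fmt.Println(lzcnt64(0), lzcnt64(0xFF))
}
```

A reference like this can back a table-driven test of the cgo wrappers on
machines that support LZCNT.
*/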


================================================
FILE: x86/mmx/functions.c
================================================
#include <immintrin.h>

void MmEmpty(void) { _mm_empty(); }
void MmCvtsi32Si64(__m64* r, int* v0) { *r = _mm_cvtsi32_si64(*v0); }
void MmCvtsi64Si32(int* r, __m64* v0) { *r = _mm_cvtsi64_si32(*v0); }
void MmCvtsi64M64(__m64* r, long long* v0) { *r = _mm_cvtsi64_m64(*v0); }
void MmCvtm64Si64(long long* r, __m64* v0) { *r = _mm_cvtm64_si64(*v0); }
void MmPacksPi16(__m64* r, __m64* v0, __m64* v1) { *r = _mm_packs_pi16(*v0, *v1); }
void MmPacksPi32(__m64* r, __m64* v0, __m64* v1) { *r = _mm_packs_pi32(*v0, *v1); }
void MmPacksPu16(__m64* r, __m64* v0, __m64* v1) { *r = _mm_packs_pu16(*v0, *v1); }
void MmUnpackhiPi8(__m64* r, __m64* v0, __m64* v1) { *r = _mm_unpackhi_pi8(*v0, *v1); }
void MmUnpackhiPi16(__m64* r, __m64* v0, __m64* v1) { *r = _mm_unpackhi_pi16(*v0, *v1); }
void MmUnpackhiPi32(__m64* r, __m64* v0, __m64* v1) { *r = _mm_unpackhi_pi32(*v0, *v1); }
void MmUnpackloPi8(__m64* r, __m64* v0, __m64* v1) { *r = _mm_unpacklo_pi8(*v0, *v1); }
void MmUnpackloPi16(__m64* r, __m64* v0, __m64* v1) { *r = _mm_unpacklo_pi16(*v0, *v1); }
void MmUnpackloPi32(__m64* r, __m64* v0, __m64* v1) { *r = _mm_unpacklo_pi32(*v0, *v1); }
void MmAddPi8(__m64* r, __m64* v0, __m64* v1) { *r = _mm_add_pi8(*v0, *v1); }
void MmAddPi16(__m64* r, __m64* v0, __m64* v1) { *r = _mm_add_pi16(*v0, *v1); }
void MmAddPi32(__m64* r, __m64* v0, __m64* v1) { *r = _mm_add_pi32(*v0, *v1); }
void MmAddsPi8(__m64* r, __m64* v0, __m64* v1) { *r = _mm_adds_pi8(*v0, *v1); }
void MmAddsPi16(__m64* r, __m64* v0, __m64* v1) { *r = _mm_adds_pi16(*v0, *v1); }
void MmAddsPu8(__m64* r, __m64* v0, __m64* v1) { *r = _mm_adds_pu8(*v0, *v1); }
void MmAddsPu16(__m64* r, __m64* v0, __m64* v1) { *r = _mm_adds_pu16(*v0, *v1); }
void MmSubPi8(__m64* r, __m64* v0, __m64* v1) { *r = _mm_sub_pi8(*v0, *v1); }
void MmSubPi16(__m64* r, __m64* v0, __m64* v1) { *r = _mm_sub_pi16(*v0, *v1); }
void MmSubPi32(__m64* r, __m64* v0, __m64* v1) { *r = _mm_sub_pi32(*v0, *v1); }
void MmSubsPi8(__m64* r, __m64* v0, __m64* v1) { *r = _mm_subs_pi8(*v0, *v1); }
void MmSubsPi16(__m64* r, __m64* v0, __m64* v1) { *r = _mm_subs_pi16(*v0, *v1); }
void MmSubsPu8(__m64* r, __m64* v0, __m64* v1) { *r = _mm_subs_pu8(*v0, *v1); }
void MmSubsPu16(__m64* r, __m64* v0, __m64* v1) { *r = _mm_subs_pu16(*v0, *v1); }
void MmMaddPi16(__m64* r, __m64* v0, __m64* v1) { *r = _mm_madd_pi16(*v0, *v1); }
void MmMulhiPi16(__m64* r, __m64* v0, __m64* v1) { *r = _mm_mulhi_pi16(*v0, *v1); }
void MmMulloPi16(__m64* r, __m64* v0, __m64* v1) { *r = _mm_mullo_pi16(*v0, *v1); }
void MmSllPi16(__m64* r, __m64* v0, __m64* v1) { *r = _mm_sll_pi16(*v0, *v1); }
void MmSlliPi16(__m64* r, __m64* v0, int* v1) { *r = _mm_slli_pi16(*v0, *v1); }
void MmSllPi32(__m64* r, __m64* v0, __m64* v1) { *r = _mm_sll_pi32(*v0, *v1); }
void MmSlliPi32(__m64* r, __m64* v0, int* v1) { *r = _mm_slli_pi32(*v0, *v1); }
void MmSllSi64(__m64* r, __m64* v0, __m64* v1) { *r = _mm_sll_si64(*v0, *v1); }
void MmSlliSi64(__m64* r, __m64* v0, int* v1) { *r = _mm_slli_si64(*v0, *v1); }
void MmSraPi16(__m64* r, __m64* v0, __m64* v1) { *r = _mm_sra_pi16(*v0, *v1); }
void MmSraiPi16(__m64* r, __m64* v0, int* v1) { *r = _mm_srai_pi16(*v0, *v1); }
void MmSraPi32(__m64* r, __m64* v0, __m64* v1) { *r = _mm_sra_pi32(*v0, *v1); }
void MmSraiPi32(__m64* r, __m64* v0, int* v1) { *r = _mm_srai_pi32(*v0, *v1); }
void MmSrlPi16(__m64* r, __m64* v0, __m64* v1) { *r = _mm_srl_pi16(*v0, *v1); }
void MmSrliPi16(__m64* r, __m64* v0, int* v1) { *r = _mm_srli_pi16(*v0, *v1); }
void MmSrlPi32(__m64* r, __m64* v0, __m64* v1) { *r = _mm_srl_pi32(*v0, *v1); }
void MmSrliPi32(__m64* r, __m64* v0, int* v1) { *r = _mm_srli_pi32(*v0, *v1); }
void MmSrlSi64(__m64* r, __m64* v0, __m64* v1) { *r = _mm_srl_si64(*v0, *v1); }
void MmSrliSi64(__m64* r, __m64* v0, int* v1) { *r = _mm_srli_si64(*v0, *v1); }
void MmAndSi64(__m64* r, __m64* v0, __m64* v1) { *r = _mm_and_si64(*v0, *v1); }
void MmAndnotSi64(__m64* r, __m64* v0, __m64* v1) { *r = _mm_andnot_si64(*v0, *v1); }
void MmOrSi64(__m64* r, __m64* v0, __m64* v1) { *r = _mm_or_si64(*v0, *v1); }
void MmXorSi64(__m64* r, __m64* v0, __m64* v1) { *r = _mm_xor_si64(*v0, *v1); }
void MmCmpeqPi8(__m64* r, __m64* v0, __m64* v1) { *r = _mm_cmpeq_pi8(*v0, *v1); }
void MmCmpeqPi16(__m64* r, __m64* v0, __m64* v1) { *r = _mm_cmpeq_pi16(*v0, *v1); }
void MmCmpeqPi32(__m64* r, __m64* v0, __m64* v1) { *r = _mm_cmpeq_pi32(*v0, *v1); }
void MmCmpgtPi8(__m64* r, __m64* v0, __m64* v1) { *r = _mm_cmpgt_pi8(*v0, *v1); }
void MmCmpgtPi16(__m64* r, __m64* v0, __m64* v1) { *r = _mm_cmpgt_pi16(*v0, *v1); }
void MmCmpgtPi32(__m64* r, __m64* v0, __m64* v1) { *r = _mm_cmpgt_pi32(*v0, *v1); }
void MmSetzeroSi64(__m64* r) { *r = _mm_setzero_si64(); }
void MmSetPi32(__m64* r, int* v0, int* v1) { *r = _mm_set_pi32(*v0, *v1); }
void MmSetPi16(__m64* r, short* v0, short* v1, short* v2, short* v3) { *r = _mm_set_pi16(*v0, *v1, *v2, *v3); }
void MmSetPi8(__m64* r, char* v0, char* v1, char* v2, char* v3, char* v4, char* v5, char* v6, char* v7) { *r = _mm_set_pi8(*v0, *v1, *v2, *v3, *v4, *v5, *v6, *v7); }
void MmSet1Pi32(__m64* r, int* v0) { *r = _mm_set1_pi32(*v0); }
void MmSet1Pi16(__m64* r, short* v0) { *r = _mm_set1_pi16(*v0); }
void MmSet1Pi8(__m64* r, char* v0) { *r = _mm_set1_pi8(*v0); }
void MmSetrPi32(__m64* r, int* v0, int* v1) { *r = _mm_setr_pi32(*v0, *v1); }
void MmSetrPi16(__m64* r, short* v0, short* v1, short* v2, short* v3) { *r = _mm_setr_pi16(*v0, *v1, *v2, *v3); }
void MmSetrPi8(__m64* r, char* v0, char* v1, char* v2, char* v3, char* v4, char* v5, char* v6, char* v7) { *r = _mm_setr_pi8(*v0, *v1, *v2, *v3, *v4, *v5, *v6, *v7); }


================================================
FILE: x86/mmx/functions.go
================================================
package mmx

import (
	"github.com/alivanz/go-simd/x86"
)

/*
#cgo CFLAGS: -mmmx
#include <immintrin.h>
*/
import "C"

// Empty the MMX state, which marks the x87 FPU registers as available for use by x87 instructions. This instruction must be used at the end of all MMX technology procedures.
//
//go:linkname MmEmpty MmEmpty
//go:noescape
func MmEmpty()

// Copy 32-bit integer "a" to the lower element of "dst", and zero the upper element of "dst".
//
//go:linkname MmCvtsi32Si64 MmCvtsi32Si64
//go:noescape
func MmCvtsi32Si64(r *x86.M64, v0 *x86.Int)

// Copy the lower 32-bit integer in "a" to "dst".
//
//go:linkname MmCvtsi64Si32 MmCvtsi64Si32
//go:noescape
func MmCvtsi64Si32(r *x86.Int, v0 *x86.M64)

// Copy 64-bit integer "a" to "dst".
//
//go:linkname MmCvtsi64M64 MmCvtsi64M64
//go:noescape
func MmCvtsi64M64(r *x86.M64, v0 *x86.Longlong)

// Copy 64-bit integer "a" to "dst".
//
//go:linkname MmCvtm64Si64 MmCvtm64Si64
//go:noescape
func MmCvtm64Si64(r *x86.Longlong, v0 *x86.M64)

// Convert packed signed 16-bit integers from "a" and "b" to packed 8-bit integers using signed saturation, and store the results in "dst".
//
//go:linkname MmPacksPi16 MmPacksPi16
//go:noescape
func MmPacksPi16(r *x86.M64, v0 *x86.M64, v1 *x86.M64)

// Convert packed signed 32-bit integers from "a" and "b" to packed 16-bit integers using signed saturation, and store the results in "dst".
//
//go:linkname MmPacksPi32 MmPacksPi32
//go:noescape
func MmPacksPi32(r *x86.M64, v0 *x86.M64, v1 *x86.M64)

// Convert packed signed 16-bit integers from "a" and "b" to packed 8-bit integers using unsigned saturation, and store the results in "dst".
//
//go:linkname MmPacksPu16 MmPacksPu16
//go:noescape
func MmPacksPu16(r *x86.M64, v0 *x86.M64, v1 *x86.M64)

// Unpack and interleave 8-bit integers from the high half of "a" and "b", and store the results in "dst".
//
//go:linkname MmUnpackhiPi8 MmUnpackhiPi8
//go:noescape
func MmUnpackhiPi8(r *x86.M64, v0 *x86.M64, v1 *x86.M64)

// Unpack and interleave 16-bit integers from the high half of "a" and "b", and store the results in "dst".
//
//go:linkname MmUnpackhiPi16 MmUnpackhiPi16
//go:noescape
func MmUnpackhiPi16(r *x86.M64, v0 *x86.M64, v1 *x86.M64)

// Unpack and interleave 32-bit integers from the high half of "a" and "b", and store the results in "dst".
//
//go:linkname MmUnpackhiPi32 MmUnpackhiPi32
//go:noescape
func MmUnpackhiPi32(r *x86.M64, v0 *x86.M64, v1 *x86.M64)

// Unpack and interleave 8-bit integers from the low half of "a" and "b", and store the results in "dst".
//
//go:linkname MmUnpackloPi8 MmUnpackloPi8
//go:noescape
func MmUnpackloPi8(r *x86.M64, v0 *x86.M64, v1 *x86.M64)

// Unpack and interleave 16-bit integers from the low half of "a" and "b", and store the results in "dst".
//
//go:linkname MmUnpackloPi16 MmUnpackloPi16
//go:noescape
func MmUnpackloPi16(r *x86.M64, v0 *x86.M64, v1 *x86.M64)

// Unpack and interleave 32-bit integers from the low half of "a" and "b", and store the results in "dst".
//
//go:linkname MmUnpackloPi32 MmUnpackloPi32
//go:noescape
func MmUnpackloPi32(r *x86.M64, v0 *x86.M64, v1 *x86.M64)

// Add packed 8-bit integers in "a" and "b", and store the results in "dst".
//
//go:linkname MmAddPi8 MmAddPi8
//go:noescape
func MmAddPi8(r *x86.M64, v0 *x86.M64, v1 *x86.M64)

// Add packed 16-bit integers in "a" and "b", and store the results in "dst".
//
//go:linkname MmAddPi16 MmAddPi16
//go:noescape
func MmAddPi16(r *x86.M64, v0 *x86.M64, v1 *x86.M64)

// Add packed 32-bit integers in "a" and "b", and store the results in "dst".
//
//go:linkname MmAddPi32 MmAddPi32
//go:noescape
func MmAddPi32(r *x86.M64, v0 *x86.M64, v1 *x86.M64)

// Add packed signed 8-bit integers in "a" and "b" using saturation, and store the results in "dst".
//
//go:linkname MmAddsPi8 MmAddsPi8
//go:noescape
func MmAddsPi8(r *x86.M64, v0 *x86.M64, v1 *x86.M64)

// Add packed signed 16-bit integers in "a" and "b" using saturation, and store the results in "dst".
//
//go:linkname MmAddsPi16 MmAddsPi16
//go:noescape
func MmAddsPi16(r *x86.M64, v0 *x86.M64, v1 *x86.M64)

// Add packed unsigned 8-bit integers in "a" and "b" using saturation, and store the results in "dst".
//
//go:linkname MmAddsPu8 MmAddsPu8
//go:noescape
func MmAddsPu8(r *x86.M64, v0 *x86.M64, v1 *x86.M64)

// Add packed unsigned 16-bit integers in "a" and "b" using saturation, and store the results in "dst".
//
//go:linkname MmAddsPu16 MmAddsPu16
//go:noescape
func MmAddsPu16(r *x86.M64, v0 *x86.M64, v1 *x86.M64)

// Subtract packed 8-bit integers in "b" from packed 8-bit integers in "a", and store the results in "dst".
//
//go:linkname MmSubPi8 MmSubPi8
//go:noescape
func MmSubPi8(r *x86.M64, v0 *x86.M64, v1 *x86.M64)

// Subtract packed 16-bit integers in "b" from packed 16-bit integers in "a", and store the results in "dst".
//
//go:linkname MmSubPi16 MmSubPi16
//go:noescape
func MmSubPi16(r *x86.M64, v0 *x86.M64, v1 *x86.M64)

// Subtract packed 32-bit integers in "b" from packed 32-bit integers in "a", and store the results in "dst".
//
//go:linkname MmSubPi32 MmSubPi32
//go:noescape
func MmSubPi32(r *x86.M64, v0 *x86.M64, v1 *x86.M64)

// Subtract packed signed 8-bit integers in "b" from packed 8-bit integers in "a" using saturation, and store the results in "dst".
//
//go:linkname MmSubsPi8 MmSubsPi8
//go:noescape
func MmSubsPi8(r *x86.M64, v0 *x86.M64, v1 *x86.M64)

// Subtract packed signed 16-bit integers in "b" from packed 16-bit integers in "a" using saturation, and store the results in "dst".
//
//go:linkname MmSubsPi16 MmSubsPi16
//go:noescape
func MmSubsPi16(r *x86.M64, v0 *x86.M64, v1 *x86.M64)

// Subtract packed unsigned 8-bit integers in "b" from packed unsigned 8-bit integers in "a" using saturation, and store the results in "dst".
//
//go:linkname MmSubsPu8 MmSubsPu8
//go:noescape
func MmSubsPu8(r *x86.M64, v0 *x86.M64, v1 *x86.M64)

// Subtract packed unsigned 16-bit integers in "b" from packed unsigned 16-bit integers in "a" using saturation, and store the results in "dst".
//
//go:linkname MmSubsPu16 MmSubsPu16
//go:noescape
func MmSubsPu16(r *x86.M64, v0 *x86.M64, v1 *x86.M64)

// Multiply packed signed 16-bit integers in "a" and "b", producing intermediate signed 32-bit integers. Horizontally add adjacent pairs of intermediate 32-bit integers, and pack the results in "dst".
//
//go:linkname MmMaddPi16 MmMaddPi16
//go:noescape
func MmMaddPi16(r *x86.M64, v0 *x86.M64, v1 *x86.M64)

// Multiply the packed signed 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in "dst".
//
//go:linkname MmMulhiPi16 MmMulhiPi16
//go:noescape
func MmMulhiPi16(r *x86.M64, v0 *x86.M64, v1 *x86.M64)

// Multiply the packed 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the low 16 bits of the intermediate integers in "dst".
//
//go:linkname MmMulloPi16 MmMulloPi16
//go:noescape
func MmMulloPi16(r *x86.M64, v0 *x86.M64, v1 *x86.M64)

// Shift packed 16-bit integers in "a" left by "count" while shifting in zeros, and store the results in "dst".
//
//go:linkname MmSllPi16 MmSllPi16
//go:noescape
func MmSllPi16(r *x86.M64, v0 *x86.M64, v1 *x86.M64)

// Shift packed 16-bit integers in "a" left by "imm8" while shifting in zeros, and store the results in "dst".
//
//go:linkname MmSlliPi16 MmSlliPi16
//go:noescape
func MmSlliPi16(r *x86.M64, v0 *x86.M64, v1 *x86.Int)

// Shift packed 32-bit integers in "a" left by "count" while shifting in zeros, and store the results in "dst".
//
//go:linkname MmSllPi32 MmSllPi32
//go:noescape
func MmSllPi32(r *x86.M64, v0 *x86.M64, v1 *x86.M64)

// Shift packed 32-bit integers in "a" left by "imm8" while shifting in zeros, and store the results in "dst".
//
//go:linkname MmSlliPi32 MmSlliPi32
//go:noescape
func MmSlliPi32(r *x86.M64, v0 *x86.M64, v1 *x86.Int)

// Shift 64-bit integer "a" left by "count" while shifting in zeros, and store the result in "dst".
//
//go:linkname MmSllSi64 MmSllSi64
//go:noescape
func MmSllSi64(r *x86.M64, v0 *x86.M64, v1 *x86.M64)

// Shift 64-bit integer "a" left by "imm8" while shifting in zeros, and store the result in "dst".
//
//go:linkname MmSlliSi64 MmSlliSi64
//go:noescape
func MmSlliSi64(r *x86.M64, v0 *x86.M64, v1 *x86.Int)

// Shift packed 16-bit integers in "a" right by "count" while shifting in sign bits, and store the results in "dst".
//
//go:linkname MmSraPi16 MmSraPi16
//go:noescape
func MmSraPi16(r *x86.M64, v0 *x86.M64, v1 *x86.M64)

// Shift packed 16-bit integers in "a" right by "imm8" while shifting in sign bits, and store the results in "dst".
//
//go:linkname MmSraiPi16 MmSraiPi16
//go:noescape
func MmSraiPi16(r *x86.M64, v0 *x86.M64, v1 *x86.Int)

// Shift packed 32-bit integers in "a" right by "count" while shifting in sign bits, and store the results in "dst".
//
//go:linkname MmSraPi32 MmSraPi32
//go:noescape
func MmSraPi32(r *x86.M64, v0 *x86.M64, v1 *x86.M64)

// Shift packed 32-bit integers in "a" right by "imm8" while shifting in sign bits, and store the results in "dst".
//
//go:linkname MmSraiPi32 MmSraiPi32
//go:noescape
func MmSraiPi32(r *x86.M64, v0 *x86.M64, v1 *x86.Int)

// Shift packed 16-bit integers in "a" right by "count" while shifting in zeros, and store the results in "dst".
//
//go:linkname MmSrlPi16 MmSrlPi16
//go:noescape
func MmSrlPi16(r *x86.M64, v0 *x86.M64, v1 *x86.M64)

// Shift packed 16-bit integers in "a" right by "imm8" while shifting in zeros, and store the results in "dst".
//
//go:linkname MmSrliPi16 MmSrliPi16
//go:noescape
func MmSrliPi16(r *x86.M64, v0 *x86.M64, v1 *x86.Int)

// Shift packed 32-bit integers in "a" right by "count" while shifting in zeros, and store the results in "dst".
//
//go:linkname MmSrlPi32 MmSrlPi32
//go:noescape
func MmSrlPi32(r *x86.M64, v0 *x86.M64, v1 *x86.M64)

// Shift packed 32-bit integers in "a" right by "imm8" while shifting in zeros, and store the results in "dst".
//
//go:linkname MmSrliPi32 MmSrliPi32
//go:noescape
func MmSrliPi32(r *x86.M64, v0 *x86.M64, v1 *x86.Int)

// Shift 64-bit integer "a" right by "count" while shifting in zeros, and store the result in "dst".
//
//go:linkname MmSrlSi64 MmSrlSi64
//go:noescape
func MmSrlSi64(r *x86.M64, v0 *x86.M64, v1 *x86.M64)

// Shift 64-bit integer "a" right by "imm8" while shifting in zeros, and store the result in "dst".
//
//go:linkname MmSrliSi64 MmSrliSi64
//go:noescape
func MmSrliSi64(r *x86.M64, v0 *x86.M64, v1 *x86.Int)

// Compute the bitwise AND of 64 bits (representing integer data) in "a" and "b", and store the result in "dst".
//
//go:linkname MmAndSi64 MmAndSi64
//go:noescape
func MmAndSi64(r *x86.M64, v0 *x86.M64, v1 *x86.M64)

// Compute the bitwise NOT of 64 bits (representing integer data) in "a" and then AND with "b", and store the result in "dst".
//
//go:linkname MmAndnotSi64 MmAndnotSi64
//go:noescape
func MmAndnotSi64(r *x86.M64, v0 *x86.M64, v1 *x86.M64)

// Compute the bitwise OR of 64 bits (representing integer data) in "a" and "b", and store the result in "dst".
//
//go:linkname MmOrSi64 MmOrSi64
//go:noescape
func MmOrSi64(r *x86.M64, v0 *x86.M64, v1 *x86.M64)

// Compute the bitwise XOR of 64 bits (representing integer data) in "a" and "b", and store the result in "dst".
//
//go:linkname MmXorSi64 MmXorSi64
//go:noescape
func MmXorSi64(r *x86.M64, v0 *x86.M64, v1 *x86.M64)

// Compare packed 8-bit integers in "a" and "b" for equality, and store the results in "dst".
//
//go:linkname MmCmpeqPi8 MmCmpeqPi8
//go:noescape
func MmCmpeqPi8(r *x86.M64, v0 *x86.M64, v1 *x86.M64)

// Compare packed 16-bit integers in "a" and "b" for equality, and store the results in "dst".
//
//go:linkname MmCmpeqPi16 MmCmpeqPi16
//go:noescape
func MmCmpeqPi16(r *x86.M64, v0 *x86.M64, v1 *x86.M64)

// Compare packed 32-bit integers in "a" and "b" for equality, and store the results in "dst".
//
//go:linkname MmCmpeqPi32 MmCmpeqPi32
//go:noescape
func MmCmpeqPi32(r *x86.M64, v0 *x86.M64, v1 *x86.M64)

// Compare packed signed 8-bit integers in "a" and "b" for greater-than, and store the results in "dst".
//
//go:linkname MmCmpgtPi8 MmCmpgtPi8
//go:noescape
func MmCmpgtPi8(r *x86.M64, v0 *x86.M64, v1 *x86.M64)

// Compare packed signed 16-bit integers in "a" and "b" for greater-than, and store the results in "dst".
//
//go:linkname MmCmpgtPi16 MmCmpgtPi16
//go:noescape
func MmCmpgtPi16(r *x86.M64, v0 *x86.M64, v1 *x86.M64)

// Compare packed signed 32-bit integers in "a" and "b" for greater-than, and store the results in "dst".
//
//go:linkname MmCmpgtPi32 MmCmpgtPi32
//go:noescape
func MmCmpgtPi32(r *x86.M64, v0 *x86.M64, v1 *x86.M64)

// Return vector of type __m64 with all elements set to zero.
//
//go:linkname MmSetzeroSi64 MmSetzeroSi64
//go:noescape
func MmSetzeroSi64(r *x86.M64)

// Set packed 32-bit integers in "dst" with the supplied values.
//
//go:linkname MmSetPi32 MmSetPi32
//go:noescape
func MmSetPi32(r *x86.M64, v0 *x86.Int, v1 *x86.Int)

// Set packed 16-bit integers in "dst" with the supplied values.
//
//go:linkname MmSetPi16 MmSetPi16
//go:noescape
func MmSetPi16(r *x86.M64, v0 *x86.Short, v1 *x86.Short, v2 *x86.Short, v3 *x86.Short)

// Set packed 8-bit integers in "dst" with the supplied values.
//
//go:linkname MmSetPi8 MmSetPi8
//go:noescape
func MmSetPi8(r *x86.M64, v0 *x86.Char, v1 *x86.Char, v2 *x86.Char, v3 *x86.Char, v4 *x86.Char, v5 *x86.Char, v6 *x86.Char, v7 *x86.Char)

// Broadcast 32-bit integer "a" to all elements of "dst".
//
//go:linkname MmSet1Pi32 MmSet1Pi32
//go:noescape
func MmSet1Pi32(r *x86.M64, v0 *x86.Int)

// Broadcast 16-bit integer "a" to all elements of "dst".
//
//go:linkname MmSet1Pi16 MmSet1Pi16
//go:noescape
func MmSet1Pi16(r *x86.M64, v0 *x86.Short)

// Broadcast 8-bit integer "a" to all elements of "dst".
//
//go:linkname MmSet1Pi8 MmSet1Pi8
//go:noescape
func MmSet1Pi8(r *x86.M64, v0 *x86.Char)

// Set packed 32-bit integers in "dst" with the supplied values in reverse order.
//
//go:linkname MmSetrPi32 MmSetrPi32
//go:noescape
func MmSetrPi32(r *x86.M64, v0 *x86.Int, v1 *x86.Int)

// Set packed 16-bit integers in "dst" with the supplied values in reverse order.
//
//go:linkname MmSetrPi16 MmSetrPi16
//go:noescape
func MmSetrPi16(r *x86.M64, v0 *x86.Short, v1 *x86.Short, v2 *x86.Short, v3 *x86.Short)

// Set packed 8-bit integers in "dst" with the supplied values in reverse order.
//
//go:linkname MmSetrPi8 MmSetrPi8
//go:noescape
func MmSetrPi8(r *x86.M64, v0 *x86.Char, v1 *x86.Char, v2 *x86.Char, v3 *x86.Char, v4 *x86.Char, v5 *x86.Char, v6 *x86.Char, v7 *x86.Char)
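
// Illustrative sketch, not part of the generated bindings: a pure-Go model of
// the lane semantics documented above for MmAddPi8 (wrapping addition) and
// MmAddsPi8 (signed saturation). The helper names addWrapInt8 and addSatInt8
// are hypothetical, and the eight lanes are modelled as [8]int8 rather than
// x86.M64, whose layout is not shown in this file.

```go
package main

import "fmt"

// addWrapInt8 models wrapping addition: each lane is added modulo 2^8.
// Go's int8 addition on variables already wraps on overflow.
func addWrapInt8(a, b [8]int8) [8]int8 {
	var dst [8]int8
	for i := range dst {
		dst[i] = a[i] + b[i]
	}
	return dst
}

// addSatInt8 models signed saturation: each lane is clamped to [-128, 127]
// instead of wrapping.
func addSatInt8(a, b [8]int8) [8]int8 {
	var dst [8]int8
	for i := range dst {
		sum := int16(a[i]) + int16(b[i]) // widen so the sum cannot overflow
		if sum > 127 {
			sum = 127
		}
		if sum < -128 {
			sum = -128
		}
		dst[i] = int8(sum)
	}
	return dst
}

func main() {
	a := [8]int8{100, -100, 1, 0, 127, -128, 50, -50}
	b := [8]int8{100, -100, 2, 0, 1, -1, 50, -50}
	fmt.Println("wrapping:  ", addWrapInt8(a, b))
	fmt.Println("saturating:", addSatInt8(a, b))
}
```

// A scalar model like this could serve as a reference when checking the cgo
// wrappers in a test, since it needs no MMX support to run.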


================================================
FILE: x86/mmx_sse/functions.c
================================================
#include <immintrin.h>

void MmCvtpsPi32(__m64* r, __m128* v0) { *r = _mm_cvtps_pi32(*v0); }
void MmCvtPs2Pi(__m64* r, __m128* v0) { *r = _mm_cvt_ps2pi(*v0); }
void MmCvttpsPi32(__m64* r, __m128* v0) { *r = _mm_cvttps_pi32(*v0); }
void MmCvttPs2Pi(__m64* r, __m128* v0) { *r = _mm_cvtt_ps2pi(*v0); }
void MmCvtpi32Ps(__m128* r, __m128* v0, __m64* v1) { *r = _mm_cvtpi32_ps(*v0, *v1); }
void MmCvtPi2Ps(__m128* r, __m128* v0, __m64* v1) { *r = _mm_cvt_pi2ps(*v0, *v1); }
void MmMaxPi16(__m64* r, __m64* v0, __m64* v1) { *r = _mm_max_pi16(*v0, *v1); }
void MmMaxPu8(__m64* r, __m64* v0, __m64* v1) { *r = _mm_max_pu8(*v0, *v1); }
void MmMinPi16(__m64* r, __m64* v0, __m64* v1) { *r = _mm_min_pi16(*v0, *v1); }
void MmMinPu8(__m64* r, __m64* v0, __m64* v1) { *r = _mm_min_pu8(*v0, *v1); }
void MmMovemaskPi8(int* r, __m64* v0) { *r = _mm_movemask_pi8(*v0); }
void MmMulhiPu16(__m64* r, __m64* v0, __m64* v1) { *r = _mm_mulhi_pu16(*v0, *v1); }
void MmAvgPu8(__m64* r, __m64* v0, __m64* v1) { *r = _mm_avg_pu8(*v0, *v1); }
void MmAvgPu16(__m64* r, __m64* v0, __m64* v1) { *r = _mm_avg_pu16(*v0, *v1); }
void MmSadPu8(__m64* r, __m64* v0, __m64* v1) { *r = _mm_sad_pu8(*v0, *v1); }
void MmCvtpi16Ps(__m128* r, __m64* v0) { *r = _mm_cvtpi16_ps(*v0); }
void MmCvtpu16Ps(__m128* r, __m64* v0) { *r = _mm_cvtpu16_ps(*v0); }
void MmCvtpi8Ps(__m128* r, __m64* v0) { *r = _mm_cvtpi8_ps(*v0); }
void MmCvtpu8Ps(__m128* r, __m64* v0) { *r = _mm_cvtpu8_ps(*v0); }
void MmCvtpi32X2Ps(__m128* r, __m64* v0, __m64* v1) { *r = _mm_cvtpi32x2_ps(*v0, *v1); }
void MmCvtpsPi16(__m64* r, __m128* v0) { *r = _mm_cvtps_pi16(*v0); }
void MmCvtpsPi8(__m64* r, __m128* v0) { *r = _mm_cvtps_pi8(*v0); }


================================================
FILE: x86/mmx_sse/functions.go
================================================
package mmx_sse

import (
	"github.com/alivanz/go-simd/x86"
)

/*
#cgo CFLAGS: -mmmx -msse
#include <immintrin.h>
*/
import "C"

// Convert packed single-precision (32-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst".
//
//go:linkname MmCvtpsPi32 MmCvtpsPi32
//go:noescape
func MmCvtpsPi32(r *x86.M64, v0 *x86.M128)

// Convert packed single-precision (32-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst".
//
//go:linkname MmCvtPs2Pi MmCvtPs2Pi
//go:noescape
func MmCvtPs2Pi(r *x86.M64, v0 *x86.M128)

// Convert packed single-precision (32-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst".
//
//go:linkname MmCvttpsPi32 MmCvttpsPi32
//go:noescape
func MmCvttpsPi32(r *x86.M64, v0 *x86.M128)

// Convert packed single-precision (32-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst".
//
//go:linkname MmCvttPs2Pi MmCvttPs2Pi
//go:noescape
func MmCvttPs2Pi(r *x86.M64, v0 *x86.M128)

// Convert packed 32-bit integers in "b" to packed single-precision (32-bit) floating-point elements, store the results in the lower 2 elements of "dst", and copy the upper 2 packed elements from "a" to the upper elements of "dst".
//
//go:linkname MmCvtpi32Ps MmCvtpi32Ps
//go:noescape
func MmCvtpi32Ps(r *x86.M128, v0 *x86.M128, v1 *x86.M64)

// Convert packed signed 32-bit integers in "b" to packed single-precision (32-bit) floating-point elements, store the results in the lower 2 elements of "dst", and copy the upper 2 packed elements from "a" to the upper elements of "dst".
//
//go:linkname MmCvtPi2Ps MmCvtPi2Ps
//go:noescape
func MmCvtPi2Ps(r *x86.M128, v0 *x86.M128, v1 *x86.M64)

// Compare packed signed 16-bit integers in "a" and "b", and store packed maximum values in "dst".
//
//go:linkname MmMaxPi16 MmMaxPi16
//go:noescape
func MmMaxPi16(r *x86.M64, v0 *x86.M64, v1 *x86.M64)

// Compare packed unsigned 8-bit integers in "a" and "b", and store packed maximum values in "dst".
//
//go:linkname MmMaxPu8 MmMaxPu8
//go:noescape
func MmMaxPu8(r *x86.M64, v0 *x86.M64, v1 *x86.M64)

// Compare packed signed 16-bit integers in "a" and "b", and store packed minimum values in "dst".
//
//go:linkname MmMinPi16 MmMinPi16
//go:noescape
func MmMinPi16(r *x86.M64, v0 *x86.M64, v1 *x86.M64)

// Compare packed unsigned 8-bit integers in "a" and "b", and store packed minimum values in "dst".
//
//go:linkname MmMinPu8 MmMinPu8
//go:noescape
func MmMinPu8(r *x86.M64, v0 *x86.M64, v1 *x86.M64)

// Create mask from the most significant bit of each 8-bit element in "a", and store the result in "dst".
//
//go:linkname MmMovemaskPi8 MmMovemaskPi8
//go:noescape
func MmMovemaskPi8(r *x86.Int, v0 *x86.M64)

// Multiply the packed unsigned 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in "dst".
//
//go:linkname MmMulhiPu16 MmMulhiPu16
//go:noescape
func MmMulhiPu16(r *x86.M64, v0 *x86.M64, v1 *x86.M64)

// Average packed unsigned 8-bit integers in "a" and "b", and store the results in "dst".
//
//go:linkname MmAvgPu8 MmAvgPu8
//go:noescape
func MmAvgPu8(r *x86.M64, v0 *x86.M64, v1 *x86.M64)

// Average packed unsigned 16-bit integers in "a" and "b", and store the results in "dst".
//
//go:linkname MmAvgPu16 MmAvgPu16
//go:noescape
func MmAvgPu16(r *x86.M64, v0 *x86.M64, v1 *x86.M64)

// Compute the absolute differences of packed unsigned 8-bit integers in "a" and "b", then horizontally sum the 8 differences to produce one unsigned 16-bit integer, and store it in the low 16 bits of "dst".
//
//go:linkname MmSadPu8 MmSadPu8
//go:noescape
func MmSadPu8(r *x86.M64, v0 *x86.M64, v1 *x86.M64)

// Convert packed 16-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst".
//
//go:linkname MmCvtpi16Ps MmCvtpi16Ps
//go:noescape
func MmCvtpi16Ps(r *x86.M128, v0 *x86.M64)

// Convert packed unsigned 16-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst".
//
//go:linkname MmCvtpu16Ps MmCvtpu16Ps
//go:noescape
func MmCvtpu16Ps(r *x86.M128, v0 *x86.M64)

// Convert the lower packed 8-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst".
//
//go:linkname MmCvtpi8Ps MmCvtpi8Ps
//go:noescape
func MmCvtpi8Ps(r *x86.M128, v0 *x86.M64)

// Convert the lower packed unsigned 8-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst".
//
//go:linkname MmCvtpu8Ps MmCvtpu8Ps
//go:noescape
func MmCvtpu8Ps(r *x86.M128, v0 *x86.M64)

// Convert packed signed 32-bit integers in "a" to packed single-precision (32-bit) floating-point elements, store the results in the lower 2 elements of "dst", then convert the packed signed 32-bit integers in "b" to single-precision (32-bit) floating-point elements, and store the results in the upper 2 elements of "dst".
//
//go:linkname MmCvtpi32X2Ps MmCvtpi32X2Ps
//go:noescape
func MmCvtpi32X2Ps(r *x86.M128, v0 *x86.M64, v1 *x86.M64)

// Convert packed single-precision (32-bit) floating-point elements in "a" to packed 16-bit integers, and store the results in "dst". Note: this intrinsic will generate 0x7FFF, rather than 0x8000, for input values between 0x7FFF and 0x7FFFFFFF.
//
//go:linkname MmCvtpsPi16 MmCvtpsPi16
//go:noescape
func MmCvtpsPi16(r *x86.M64, v0 *x86.M128)

// Convert packed single-precision (32-bit) floating-point elements in "a" to packed 8-bit integers, and store the results in lower 4 elements of "dst". Note: this intrinsic will generate 0x7F, rather than 0x80, for input values between 0x7F and 0x7FFFFFFF.
//
//go:linkname MmCvtpsPi8 MmCvtpsPi8
//go:noescape
func MmCvtpsPi8(r *x86.M64, v0 *x86.M128)
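
// Illustrative sketch, not part of the generated bindings: a pure-Go model of
// two operations documented above, the byte mask built by MmMovemaskPi8 and
// the per-lane average computed by MmAvgPu8. The helper names movemask8 and
// avgU8 are hypothetical, and lanes are modelled as plain uint8 values rather
// than x86.M64. The round-half-up formula (a + b + 1) >> 1 follows the
// behaviour of the underlying PAVGB instruction.

```go
package main

import "fmt"

// movemask8 gathers the most significant bit of each 8-bit lane into the
// low 8 bits of the result; lane 0 maps to bit 0.
func movemask8(a [8]uint8) int32 {
	var mask int32
	for i, v := range a {
		mask |= int32(v>>7) << uint(i)
	}
	return mask
}

// avgU8 averages two unsigned bytes, rounding halves up.
// The operands are widened to 16 bits so the sum cannot overflow.
func avgU8(a, b uint8) uint8 {
	return uint8((uint16(a) + uint16(b) + 1) >> 1)
}

func main() {
	lanes := [8]uint8{0x80, 0x00, 0xFF, 0x7F, 0x00, 0x00, 0x00, 0x81}
	fmt.Printf("mask = %#b\n", movemask8(lanes))
	fmt.Println("avg(1, 2) =", avgU8(1, 2))
	fmt.Println("avg(255, 255) =", avgU8(255, 255))
}
```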


================================================
FILE: x86/mmx_sse2/functions.c
================================================
#include <immintrin.h>

void MmCvtpdPi32(__m64* r, __m128d* v0) { *r = _mm_cvtpd_pi32(*v0); }
void MmCvttpdPi32(__m64* r, __m128d* v0) { *r = _mm_cvttpd_pi32(*v0); }
void MmCvtpi32Pd(__m128d* r, __m64* v0) { *r = _mm_cvtpi32_pd(*v0); }
void MmAddSi64(__m64* r, __m64* v0, __m64* v1) { *r = _mm_add_si64(*v0, *v1); }
void MmMulSu32(__m64* r, __m64* v0, __m64* v1) { *r = _mm_mul_su32(*v0, *v1); }
void MmSubSi64(__m64* r, __m64* v0, __m64* v1) { *r = _mm_sub_si64(*v0, *v1); }


================================================
FILE: x86/mmx_sse2/functions.go
================================================
package mmx_sse2

import (
	"github.com/alivanz/go-simd/x86"
)

/*
#cgo CFLAGS: -mmmx -msse2
#include <immintrin.h>
*/
import "C"

// Convert packed double-precision (64-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst".
//
//go:linkname MmCvtpdPi32 MmCvtpdPi32
//go:noescape
func MmCvtpdPi32(r *x86.M64, v0 *x86.M128D)

// Convert packed double-precision (64-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst".
//
//go:linkname MmCvttpdPi32 MmCvttpdPi32
//go:noescape
func MmCvttpdPi32(r *x86.M64, v0 *x86.M128D)

// Convert packed signed 32-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst".
//
//go:linkname MmCvtpi32Pd MmCvtpi32Pd
//go:noescape
func MmCvtpi32Pd(r *x86.M128D, v0 *x86.M64)

// Add 64-bit integers "a" and "b", and store the result in "dst".
//
//go:linkname MmAddSi64 MmAddSi64
//go:noescape
func MmAddSi64(r *x86.M64, v0 *x86.M64, v1 *x86.M64)

// Multiply the low unsigned 32-bit integers from "a" and "b", and store the unsigned 64-bit result in "dst".
//
//go:linkname MmMulSu32 MmMulSu32
//go:noescape
func MmMulSu32(r *x86.M64, v0 *x86.M64, v1 *x86.M64)

// Subtract 64-bit integer "b" from 64-bit integer "a", and store the result in "dst".
//
//go:linkname MmSubSi64 MmSubSi64
//go:noescape
func MmSubSi64(r *x86.M64, v0 *x86.M64, v1 *x86.M64)
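
// Illustrative sketch, not part of the generated bindings: a pure-Go model of
// the widening multiply documented above for MmMulSu32. The helper name
// mulLowU32 is hypothetical, and the 64-bit operands are modelled as uint64
// rather than x86.M64.

```go
package main

import "fmt"

// mulLowU32 keeps only the low 32 bits of each operand, treats them as
// unsigned integers, and returns the full 64-bit product.
func mulLowU32(a, b uint64) uint64 {
	return uint64(uint32(a)) * uint64(uint32(b))
}

func main() {
	// The upper 32 bits of each operand are ignored by the multiply.
	fmt.Printf("%#x\n", mulLowU32(0x1FFFFFFFF, 0x200000002))
	// The largest possible product still fits in 64 bits.
	fmt.Printf("%#x\n", mulLowU32(0xFFFFFFFF, 0xFFFFFFFF))
}
```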


================================================
FILE: x86/mmx_ssse3/functions.c
================================================
#include <immintrin.h>

void MmAbsPi8(__m64* r, __m64* v0) { *r = _mm_abs_pi8(*v0); }
void MmAbsPi16(__m64* r, __m64* v0) { *r = _mm_abs_pi16(*v0); }
void MmAbsPi32(__m64* r, __m64* v0) { *r = _mm_abs_pi32(*v0); }
void MmHaddPi16(__m64* r, __m64* v0, __m64* v1) { *r = _mm_hadd_pi16(*v0, *v1); }
void MmHaddPi32(__m64* r, __m64* v0, __m64* v1) { *r = _mm_hadd_pi32(*v0, *v1); }
void MmHaddsPi16(__m64* r, __m64* v0, __m64* v1) { *r = _mm_hadds_pi16(*v0, *v1); }
void MmHsubPi16(__m64* r, __m64* v0, __m64* v1) { *r = _mm_hsub_pi16(*v0, *v1); }
void MmHsubPi32(__m64* r, __m64* v0, __m64* v1) { *r = _mm_hsub_pi32(*v0, *v1); }
void MmHsubsPi16(__m64* r, __m64* v0, __m64* v1) { *r = _mm_hsubs_pi16(*v0, *v1); }
void MmMaddubsPi16(__m64* r, __m64* v0, __m64* v1) { *r = _mm_maddubs_pi16(*v0, *v1); }
void MmMulhrsPi16(__m64* r, __m64* v0, __m64* v1) { *r = _mm_mulhrs_pi16(*v0, *v1); }
void MmShufflePi8(__m64* r, __m64* v0, __m64* v1) { *r = _mm_shuffle_pi8(*v0, *v1); }
void MmSignPi8(__m64* r, __m64* v0, __m64* v1) { *r = _mm_sign_pi8(*v0, *v1); }
void MmSignPi16(__m64* r, __m64* v0, __m64* v1) { *r = _mm_sign_pi16(*v0, *v1); }
void MmSignPi32(__m64* r, __m64* v0, __m64* v1) { *r = _mm_sign_pi32(*v0, *v1); }


================================================
FILE: x86/mmx_ssse3/functions.go
================================================
package mmx_ssse3

import (
	"github.com/alivanz/go-simd/x86"
)

/*
#cgo CFLAGS: -mmmx -mssse3
#include <immintrin.h>
*/
import "C"

// Compute the absolute value of packed signed 8-bit integers in "a", and store the unsigned results in "dst".
//
//go:linkname MmAbsPi8 MmAbsPi8
//go:noescape
func MmAbsPi8(r *x86.M64, v0 *x86.M64)

// Compute the absolute value of packed signed 16-bit integers in "a", and store the unsigned results in "dst".
//
//go:linkname MmAbsPi16 MmAbsPi16
//go:noescape
func MmAbsPi16(r *x86.M64, v0 *x86.M64)

// Compute the absolute value of packed signed 32-bit integers in "a", and store the unsigned results in "dst".
//
//go:linkname MmAbsPi32 MmAbsPi32
//go:noescape
func MmAbsPi32(r *x86.M64, v0 *x86.M64)

// Horizontally add adjacent pairs of 16-bit integers in "a" and "b", and pack the signed 16-bit results in "dst".
//
//go:linkname MmHaddPi16 MmHaddPi16
//go:noescape
func MmHaddPi16(r *x86.M64, v0 *x86.M64, v1 *x86.M64)

// Horizontally add adjacent pairs of 32-bit integers in "a" and "b", and pack the signed 32-bit results in "dst".
//
//go:linkname MmHaddPi32 MmHaddPi32
//go:noescape
func MmHaddPi32(r *x86.M64, v0 *x86.M64, v1 *x86.M64)

// Horizontally add adjacent pairs of signed 16-bit integers in "a" and "b" using saturation, and pack the signed 16-bit results in "dst".
//
//go:linkname MmHaddsPi16 MmHaddsPi16
//go:noescape
func MmHaddsPi16(r *x86.M64, v0 *x86.M64, v1 *x86.M64)

// Horizontally subtract adjacent pairs of 16-bit integers in "a" and "b", and pack the signed 16-bit results in "dst".
//
//go:linkname MmHsubPi16 MmHsubPi16
//go:noescape
func MmHsubPi16(r *x86.M64, v0 *x86.M64, v1 *x86.M64)

// Horizontally subtract adjacent pairs of 32-bit integers in "a" and "b", and pack the signed 32-bit results in "dst".
//
//go:linkname MmHsubPi32 MmHsubPi32
//go:noescape
func MmHsubPi32(r *x86.M64, v0 *x86.M64, v1 *x86.M64)

// Horizontally subtract adjacent pairs of signed 16-bit integers in "a" and "b" using saturation, and pack the signed 16-bit results in "dst".
//
//go:linkname MmHsubsPi16 MmHsubsPi16
//go:noescape
func MmHsubsPi16(r *x86.M64, v0 *x86.M64, v1 *x86.M64)

// Vertically multiply each unsigned 8-bit integer from "a" with the corresponding signed 8-bit integer from "b", producing intermediate signed 16-bit integers. Horizontally add adjacent pairs of intermediate signed 16-bit integers, and pack the saturated results in "dst".
//
//go:linkname MmMaddubsPi16 MmMaddubsPi16
//go:noescape
func MmMaddubsPi16(r *x86.M64, v0 *x86.M64, v1 *x86.M64)

// Multiply packed signed 16-bit integers in "a" and "b", producing intermediate signed 32-bit integers. Truncate each intermediate integer to the 18 most significant bits, round by adding 1, and store bits [16:1] to "dst".
//
//go:linkname MmMulhrsPi16 MmMulhrsPi16
//go:noescape
func MmMulhrsPi16(r *x86.M64, v0 *x86.M64, v1 *x86.M64)

// Shuffle packed 8-bit integers in "a" according to shuffle control mask in the corresponding 8-bit element of "b", and store the results in "dst".
//
//go:linkname MmShufflePi8 MmShufflePi8
//go:noescape
func MmShufflePi8(r *x86.M64, v0 *x86.M64, v1 *x86.M64)

// Negate packed 8-bit integers in "a" when the corresponding signed 8-bit integer in "b" is negative, and store the results in "dst". Elements in "dst" are zeroed out when the corresponding element in "b" is zero.
//
//go:linkname MmSignPi8 MmSignPi8
//go:noescape
func MmSignPi8(r *x86.M64, v0 *x86.M64, v1 *x86.M64)

// Negate packed 16-bit integers in "a" when the corresponding signed 16-bit integer in "b" is negative, and store the results in "dst". Elements in "dst" are zeroed out when the corresponding element in "b" is zero.
//
//go:linkname MmSignPi16 MmSignPi16
//go:noescape
func MmSignPi16(r *x86.M64, v0 *x86.M64, v1 *x86.M64)

// Negate packed 32-bit integers in "a" when the corresponding signed 32-bit integer in "b" is negative, and store the results in "dst". Elements in "dst" are zeroed out when the corresponding element in "b" is zero.
//
//go:linkname MmSignPi32 MmSignPi32
//go:noescape
func MmSignPi32(r *x86.M64, v0 *x86.M64, v1 *x86.M64)
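
// Illustrative sketch, not part of the generated bindings: a pure-Go model of
// one lane of the sign and absolute-value operations documented above
// (MmSignPi8 and MmAbsPi8). The helper names signInt8 and absInt8 are
// hypothetical; the real functions operate on all eight lanes of an x86.M64.

```go
package main

import "fmt"

// signInt8 negates a when b is negative, yields zero when b is zero, and
// passes a through unchanged when b is positive.
func signInt8(a, b int8) int8 {
	switch {
	case b < 0:
		return -a // wraps for a == -128, as int8 negation does in Go
	case b == 0:
		return 0
	default:
		return a
	}
}

// absInt8 returns the absolute value as an unsigned byte, so that the
// magnitude of -128 (which is 128) remains representable.
func absInt8(a int8) uint8 {
	if a < 0 {
		return uint8(-int16(a))
	}
	return uint8(a)
}

func main() {
	fmt.Println(signInt8(5, -1), signInt8(5, 0), signInt8(5, 3))
	fmt.Println(absInt8(-7), absInt8(-128))
}
```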


================================================
FILE: x86/popcnt/functions.c
================================================
#include <immintrin.h>

void MmPopcntU32(int* r, unsigned int* v0) { *r = _mm_popcnt_u32(*v0); }
void MmPopcntU64(long long* r, unsigned long long* v0) { *r = _mm_popcnt_u64(*v0); }


================================================
FILE: x86/popcnt/functions.go
================================================
package popcnt

import (
	"github.com/alivanz/go-simd/x86"
)

/*
#cgo CFLAGS: -mpopcnt
#include <immintrin.h>
*/
import "C"

// Count the number of bits set to 1 in unsigned 32-bit integer "a", and return that count in "dst".
//
//go:linkname MmPopcntU32 MmPopcntU32
//go:noescape
func MmPopcntU32(r *x86.Int, v0 *x86.Uint)

// Count the number of bits set to 1 in unsigned 64-bit integer "a", and return that count in "dst".
//
//go:linkname MmPopcntU64 MmPopcntU64
//go:noescape
func MmPopcntU64(r *x86.Longlong, v0 *x86.Ulonglong)
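
// Illustrative sketch, not part of the generated bindings: the bit counts
// documented above for MmPopcntU32 and MmPopcntU64 can be cross-checked
// against the standard library's math/bits package. The helper names
// popcount32 and popcount64 are hypothetical; plain Go integer types stand in
// for x86.Int, x86.Uint, x86.Longlong and x86.Ulonglong.

```go
package main

import (
	"fmt"
	"math/bits"
)

// popcount32 counts the bits set to 1 in a 32-bit unsigned integer.
func popcount32(a uint32) int32 {
	return int32(bits.OnesCount32(a))
}

// popcount64 counts the bits set to 1 in a 64-bit unsigned integer.
func popcount64(a uint64) int64 {
	return int64(bits.OnesCount64(a))
}

func main() {
	fmt.Println(popcount32(0xF0F0))
	fmt.Println(popcount64(^uint64(0)))
}
```

// On amd64 the Go compiler may itself lower bits.OnesCount32 and
// bits.OnesCount64 to a hardware instruction, so this model needs no cgo.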


================================================
FILE: x86/sse/functions.c
================================================
#include <immintrin.h>

void MmAddSs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_add_ss(*v0, *v1); }
void MmAddPs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_add_ps(*v0, *v1); }
void MmSubSs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_sub_ss(*v0, *v1); }
void MmSubPs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_sub_ps(*v0, *v1); }
void MmMulSs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_mul_ss(*v0, *v1); }
void MmMulPs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_mul_ps(*v0, *v1); }
void MmDivSs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_div_ss(*v0, *v1); }
void MmDivPs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_div_ps(*v0, *v1); }
void MmSqrtSs(__m128* r, __m128* v0) { *r = _mm_sqrt_ss(*v0); }
void MmSqrtPs(__m128* r, __m128* v0) { *r = _mm_sqrt_ps(*v0); }
void MmRcpSs(__m128* r, __m128* v0) { *r = _mm_rcp_ss(*v0); }
void MmRcpPs(__m128* r, __m128* v0) { *r = _mm_rcp_ps(*v0); }
void MmRsqrtSs(__m128* r, __m128* v0) { *r = _mm_rsqrt_ss(*v0); }
void MmRsqrtPs(__m128* r, __m128* v0) { *r = _mm_rsqrt_ps(*v0); }
void MmMinSs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_min_ss(*v0, *v1); }
void MmMinPs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_min_ps(*v0, *v1); }
void MmMaxSs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_max_ss(*v0, *v1); }
void MmMaxPs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_max_ps(*v0, *v1); }
void MmAndPs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_and_ps(*v0, *v1); }
void MmAndnotPs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_andnot_ps(*v0, *v1); }
void MmOrPs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_or_ps(*v0, *v1); }
void MmXorPs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_xor_ps(*v0, *v1); }
void MmCmpeqSs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_cmpeq_ss(*v0, *v1); }
void MmCmpeqPs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_cmpeq_ps(*v0, *v1); }
void MmCmpltSs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_cmplt_ss(*v0, *v1); }
void MmCmpltPs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_cmplt_ps(*v0, *v1); }
void MmCmpleSs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_cmple_ss(*v0, *v1); }
void MmCmplePs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_cmple_ps(*v0, *v1); }
void MmCmpgtSs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_cmpgt_ss(*v0, *v1); }
void MmCmpgtPs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_cmpgt_ps(*v0, *v1); }
void MmCmpgeSs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_cmpge_ss(*v0, *v1); }
void MmCmpgePs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_cmpge_ps(*v0, *v1); }
void MmCmpneqSs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_cmpneq_ss(*v0, *v1); }
void MmCmpneqPs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_cmpneq_ps(*v0, *v1); }
void MmCmpnltSs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_cmpnlt_ss(*v0, *v1); }
void MmCmpnltPs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_cmpnlt_ps(*v0, *v1); }
void MmCmpnleSs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_cmpnle_ss(*v0, *v1); }
void MmCmpnlePs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_cmpnle_ps(*v0, *v1); }
void MmCmpngtSs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_cmpngt_ss(*v0, *v1); }
void MmCmpngtPs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_cmpngt_ps(*v0, *v1); }
void MmCmpngeSs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_cmpnge_ss(*v0, *v1); }
void MmCmpngePs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_cmpnge_ps(*v0, *v1); }
void MmCmpordSs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_cmpord_ss(*v0, *v1); }
void MmCmpordPs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_cmpord_ps(*v0, *v1); }
void MmCmpunordSs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_cmpunord_ss(*v0, *v1); }
void MmCmpunordPs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_cmpunord_ps(*v0, *v1); }
void MmComieqSs(int* r, __m128* v0, __m128* v1) { *r = _mm_comieq_ss(*v0, *v1); }
void MmComiltSs(int* r, __m128* v0, __m128* v1) { *r = _mm_comilt_ss(*v0, *v1); }
void MmComileSs(int* r, __m128* v0, __m128* v1) { *r = _mm_comile_ss(*v0, *v1); }
void MmComigtSs(int* r, __m128* v0, __m128* v1) { *r = _mm_comigt_ss(*v0, *v1); }
void MmComigeSs(int* r, __m128* v0, __m128* v1) { *r = _mm_comige_ss(*v0, *v1); }
void MmComineqSs(int* r, __m128* v0, __m128* v1) { *r = _mm_comineq_ss(*v0, *v1); }
void MmUcomieqSs(int* r, __m128* v0, __m128* v1) { *r = _mm_ucomieq_ss(*v0, *v1); }
void MmUcomiltSs(int* r, __m128* v0, __m128* v1) { *r = _mm_ucomilt_ss(*v0, *v1); }
void MmUcomileSs(int* r, __m128* v0, __m128* v1) { *r = _mm_ucomile_ss(*v0, *v1); }
void MmUcomigtSs(int* r, __m128* v0, __m128* v1) { *r = _mm_ucomigt_ss(*v0, *v1); }
void MmUcomigeSs(int* r, __m128* v0, __m128* v1) { *r = _mm_ucomige_ss(*v0, *v1); }
void MmUcomineqSs(int* r, __m128* v0, __m128* v1) { *r = _mm_ucomineq_ss(*v0, *v1); }
void MmCvtssSi32(int* r, __m128* v0) { *r = _mm_cvtss_si32(*v0); }
void MmCvtSs2Si(int* r, __m128* v0) { *r = _mm_cvt_ss2si(*v0); }
void MmCvtssSi64(long long* r, __m128* v0) { *r = _mm_cvtss_si64(*v0); }
void MmCvttssSi32(int* r, __m128* v0) { *r = _mm_cvttss_si32(*v0); }
void MmCvttSs2Si(int* r, __m128* v0) { *r = _mm_cvtt_ss2si(*v0); }
void MmCvttssSi64(long long* r, __m128* v0) { *r = _mm_cvttss_si64(*v0); }
void MmCvtsi32Ss(__m128* r, __m128* v0, int* v1) { *r = _mm_cvtsi32_ss(*v0, *v1); }
void MmCvtSi2Ss(__m128* r, __m128* v0, int* v1) { *r = _mm_cvt_si2ss(*v0, *v1); }
void MmCvtsi64Ss(__m128* r, __m128* v0, long long* v1) { *r = _mm_cvtsi64_ss(*v0, *v1); }
void MmCvtssF32(float* r, __m128* v0) { *r = _mm_cvtss_f32(*v0); }
void MmUndefinedPs(__m128* r) { *r = _mm_undefined_ps(); }
void MmSetSs(__m128* r, float* v0) { *r = _mm_set_ss(*v0); }
void MmSet1Ps(__m128* r, float* v0) { *r = _mm_set1_ps(*v0); }
void MmSetPs1(__m128* r, float* v0) { *r = _mm_set_ps1(*v0); }
void MmSetPs(__m128* r, float* v0, float* v1, float* v2, float* v3) { *r = _mm_set_ps(*v0, *v1, *v2, *v3); }
void MmSetrPs(__m128* r, float* v0, float* v1, float* v2, float* v3) { *r = _mm_setr_ps(*v0, *v1, *v2, *v3); }
void MmSetzeroPs(__m128* r) { *r = _mm_setzero_ps(); }
void MmUnpackhiPs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_unpackhi_ps(*v0, *v1); }
void MmUnpackloPs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_unpacklo_ps(*v0, *v1); }
void MmMoveSs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_move_ss(*v0, *v1); }
void MmMovehlPs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_movehl_ps(*v0, *v1); }
void MmMovelhPs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_movelh_ps(*v0, *v1); }
void MmMovemaskPs(int* r, __m128* v0) { *r = _mm_movemask_ps(*v0); }


================================================
FILE: x86/sse/functions.go
================================================
package sse

import (
	"github.com/alivanz/go-simd/x86"
)

/*
#cgo CFLAGS: -msse
#include <immintrin.h>
*/
import "C"

// Add the lower single-precision (32-bit) floating-point element in "a" and "b", store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".
//
//go:linkname MmAddSs MmAddSs
//go:noescape
func MmAddSs(r *x86.M128, v0 *x86.M128, v1 *x86.M128)

// Add packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst".
//
//go:linkname MmAddPs MmAddPs
//go:noescape
func MmAddPs(r *x86.M128, v0 *x86.M128, v1 *x86.M128)

// Subtract the lower single-precision (32-bit) floating-point element in "b" from the lower single-precision (32-bit) floating-point element in "a", store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".
//
//go:linkname MmSubSs MmSubSs
//go:noescape
func MmSubSs(r *x86.M128, v0 *x86.M128, v1 *x86.M128)

// Subtract packed single-precision (32-bit) floating-point elements in "b" from packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst".
//
//go:linkname MmSubPs MmSubPs
//go:noescape
func MmSubPs(r *x86.M128, v0 *x86.M128, v1 *x86.M128)

// Multiply the lower single-precision (32-bit) floating-point element in "a" and "b", store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".
//
//go:linkname MmMulSs MmMulSs
//go:noescape
func MmMulSs(r *x86.M128, v0 *x86.M128, v1 *x86.M128)

// Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst".
//
//go:linkname MmMulPs MmMulPs
//go:noescape
func MmMulPs(r *x86.M128, v0 *x86.M128, v1 *x86.M128)

// Divide the lower single-precision (32-bit) floating-point element in "a" by the lower single-precision (32-bit) floating-point element in "b", store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".
//
//go:linkname MmDivSs MmDivSs
//go:noescape
func MmDivSs(r *x86.M128, v0 *x86.M128, v1 *x86.M128)

// Divide packed single-precision (32-bit) floating-point elements in "a" by packed elements in "b", and store the results in "dst".
//
//go:linkname MmDivPs MmDivPs
//go:noescape
func MmDivPs(r *x86.M128, v0 *x86.M128, v1 *x86.M128)

// Compute the square root of the lower single-precision (32-bit) floating-point element in "a", store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".
//
//go:linkname MmSqrtSs MmSqrtSs
//go:noescape
func MmSqrtSs(r *x86.M128, v0 *x86.M128)

// Compute the square root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst".
//
//go:linkname MmSqrtPs MmSqrtPs
//go:noescape
func MmSqrtPs(r *x86.M128, v0 *x86.M128)

// Compute the approximate reciprocal of the lower single-precision (32-bit) floating-point element in "a", store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". The maximum relative error for this approximation is less than 1.5*2^-12.
//
//go:linkname MmRcpSs MmRcpSs
//go:noescape
func MmRcpSs(r *x86.M128, v0 *x86.M128)

// Compute the approximate reciprocal of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". The maximum relative error for this approximation is less than 1.5*2^-12.
//
//go:linkname MmRcpPs MmRcpPs
//go:noescape
func MmRcpPs(r *x86.M128, v0 *x86.M128)

// Compute the approximate reciprocal square root of the lower single-precision (32-bit) floating-point element in "a", store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". The maximum relative error for this approximation is less than 1.5*2^-12.
//
//go:linkname MmRsqrtSs MmRsqrtSs
//go:noescape
func MmRsqrtSs(r *x86.M128, v0 *x86.M128)

// Compute the approximate reciprocal square root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". The maximum relative error for this approximation is less than 1.5*2^-12.
//
//go:linkname MmRsqrtPs MmRsqrtPs
//go:noescape
func MmRsqrtPs(r *x86.M128, v0 *x86.M128)

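// Compare the lower single-precision (32-bit) floating-point elements in "a" and "b", store the minimum value in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". "dst" does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.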
//
//go:linkname MmMinSs MmMinSs
//go:noescape
func MmMinSs(r *x86.M128, v0 *x86.M128, v1 *x86.M128)

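// Compare packed single-precision (32-bit) floating-point elements in "a" and "b", and store packed minimum values in "dst". "dst" does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.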
//
//go:linkname MmMinPs MmMinPs
//go:noescape
func MmMinPs(r *x86.M128, v0 *x86.M128, v1 *x86.M128)

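// Compare the lower single-precision (32-bit) floating-point elements in "a" and "b", store the maximum value in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". "dst" does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.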
//
//go:linkname MmMaxSs MmMaxSs
//go:noescape
func MmMaxSs(r *x86.M128, v0 *x86.M128, v1 *x86.M128)

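// Compare packed single-precision (32-bit) floating-point elements in "a" and "b", and store packed maximum values in "dst". "dst" does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.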
//
//go:linkname MmMaxPs MmMaxPs
//go:noescape
func MmMaxPs(r *x86.M128, v0 *x86.M128, v1 *x86.M128)

// Compute the bitwise AND of packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst".
//
//go:linkname MmAndPs MmAndPs
//go:noescape
func MmAndPs(r *x86.M128, v0 *x86.M128, v1 *x86.M128)

// Compute the bitwise NOT of packed single-precision (32-bit) floating-point elements in "a" and then AND with "b", and store the results in "dst".
//
//go:linkname MmAndnotPs MmAndnotPs
//go:noescape
func MmAndnotPs(r *x86.M128, v0 *x86.M128, v1 *x86.M128)

// Compute the bitwise OR of packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst".
//
//go:linkname MmOrPs MmOrPs
//go:noescape
func MmOrPs(r *x86.M128, v0 *x86.M128, v1 *x86.M128)

// Compute the bitwise XOR of packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst".
//
//go:linkname MmXorPs MmXorPs
//go:noescape
func MmXorPs(r *x86.M128, v0 *x86.M128, v1 *x86.M128)

// Compare the lower single-precision (32-bit) floating-point elements in "a" and "b" for equality, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".
//
//go:linkname MmCmpeqSs MmCmpeqSs
//go:noescape
func MmCmpeqSs(r *x86.M128, v0 *x86.M128, v1 *x86.M128)

// Compare packed single-precision (32-bit) floating-point elements in "a" and "b" for equality, and store the results in "dst".
//
//go:linkname MmCmpeqPs MmCmpeqPs
//go:noescape
func MmCmpeqPs(r *x86.M128, v0 *x86.M128, v1 *x86.M128)

// Compare the lower single-precision (32-bit) floating-point elements in "a" and "b" for less-than, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".
//
//go:linkname MmCmpltSs MmCmpltSs
//go:noescape
func MmCmpltSs(r *x86.M128, v0 *x86.M128, v1 *x86.M128)

// Compare packed single-precision (32-bit) floating-point elements in "a" and "b" for less-than, and store the results in "dst".
//
//go:linkname MmCmpltPs MmCmpltPs
//go:noescape
func MmCmpltPs(r *x86.M128, v0 *x86.M128, v1 *x86.M128)

// Compare the lower single-precision (32-bit) floating-point elements in "a" and "b" for less-than-or-equal, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".
//
//go:linkname MmCmpleSs MmCmpleSs
//go:noescape
func MmCmpleSs(r *x86.M128, v0 *x86.M128, v1 *x86.M128)

// Compare packed single-precision (32-bit) floating-point elements in "a" and "b" for less-than-or-equal, and store the results in "dst".
//
//go:linkname MmCmplePs MmCmplePs
//go:noescape
func MmCmplePs(r *x86.M128, v0 *x86.M128, v1 *x86.M128)

// Compare the lower single-precision (32-bit) floating-point elements in "a" and "b" for greater-than, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".
//
//go:linkname MmCmpgtSs MmCmpgtSs
//go:noescape
func MmCmpgtSs(r *x86.M128, v0 *x86.M128, v1 *x86.M128)

// Compare packed single-precision (32-bit) floating-point elements in "a" and "b" for greater-than, and store the results in "dst".
//
//go:linkname MmCmpgtPs MmCmpgtPs
//go:noescape
func MmCmpgtPs(r *x86.M128, v0 *x86.M128, v1 *x86.M128)

// Compare the lower single-precision (32-bit) floating-point elements in "a" and "b" for greater-than-or-equal, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".
//
//go:linkname MmCmpgeSs MmCmpgeSs
//go:noescape
func MmCmpgeSs(r *x86.M128, v0 *x86.M128, v1 *x86.M128)

// Compare packed single-precision (32-bit) floating-point elements in "a" and "b" for greater-than-or-equal, and store the results in "dst".
//
//go:linkname MmCmpgePs MmCmpgePs
//go:noescape
func MmCmpgePs(r *x86.M128, v0 *x86.M128, v1 *x86.M128)

// Compare the lower single-precision (32-bit) floating-point elements in "a" and "b" for not-equal, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".
//
//go:linkname MmCmpneqSs MmCmpneqSs
//go:noescape
func MmCmpneqSs(r *x86.M128, v0 *x86.M128, v1 *x86.M128)

// Compare packed single-precision (32-bit) floating-point elements in "a" and "b" for not-equal, and store the results in "dst".
//
//go:linkname MmCmpneqPs MmCmpneqPs
//go:noescape
func MmCmpneqPs(r *x86.M128, v0 *x86.M128, v1 *x86.M128)

// Compare the lower single-precision (32-bit) floating-point elements in "a" and "b" for not-less-than, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".
//
//go:linkname MmCmpnltSs MmCmpnltSs
//go:noescape
func MmCmpnltSs(r *x86.M128, v0 *x86.M128, v1 *x86.M128)

// Compare packed single-precision (32-bit) floating-point elements in "a" and "b" for not-less-than, and store the results in "dst".
//
//go:linkname MmCmpnltPs MmCmpnltPs
//go:noescape
func MmCmpnltPs(r *x86.M128, v0 *x86.M128, v1 *x86.M128)

// Compare the lower single-precision (32-bit) floating-point elements in "a" and "b" for not-less-than-or-equal, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".
//
//go:linkname MmCmpnleSs MmCmpnleSs
//go:noescape
func MmCmpnleSs(r *x86.M128, v0 *x86.M128, v1 *x86.M128)

// Compare packed single-precision (32-bit) floating-point elements in "a" and "b" for not-less-than-or-equal, and store the results in "dst".
//
//go:linkname MmCmpnlePs MmCmpnlePs
//go:noescape
func MmCmpnlePs(r *x86.M128, v0 *x86.M128, v1 *x86.M128)

// Compare the lower single-precision (32-bit) floating-point elements in "a" and "b" for not-greater-than, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".
//
//go:linkname MmCmpngtSs MmCmpngtSs
//go:noescape
func MmCmpngtSs(r *x86.M128, v0 *x86.M128, v1 *x86.M128)

// Compare packed single-precision (32-bit) floating-point elements in "a" and "b" for not-greater-than, and store the results in "dst".
//
//go:linkname MmCmpngtPs MmCmpngtPs
//go:noescape
func MmCmpngtPs(r *x86.M128, v0 *x86.M128, v1 *x86.M128)

// Compare the lower single-precision (32-bit) floating-point elements in "a" and "b" for not-greater-than-or-equal, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".
//
//go:linkname MmCmpngeSs MmCmpngeSs
//go:noescape
func MmCmpngeSs(r *x86.M128, v0 *x86.M128, v1 *x86.M128)

// Compare packed single-precision (32-bit) floating-point elements in "a" and "b" for not-greater-than-or-equal, and store the results in "dst".
//
//go:linkname MmCmpngePs MmCmpngePs
//go:noescape
func MmCmpngePs(r *x86.M128, v0 *x86.M128, v1 *x86.M128)

// Compare the lower single-precision (32-bit) floating-point elements in "a" and "b" to see if neither is NaN, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".
//
//go:linkname MmCmpordSs MmCmpordSs
//go:noescape
func MmCmpordSs(r *x86.M128, v0 *x86.M128, v1 *x86.M128)

// Compare packed single-precision (32-bit) floating-point elements in "a" and "b" to see if neither is NaN, and store the results in "dst".
//
//go:linkname MmCmpordPs MmCmpordPs
//go:noescape
func MmCmpordPs(r *x86.M128, v0 *x86.M128, v1 *x86.M128)

// Compare the lower single-precision (32-bit) floating-point elements in "a" and "b" to see if either is NaN, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".
//
//go:linkname MmCmpunordSs MmCmpunordSs
//go:noescape
func MmCmpunordSs(r *x86.M128, v0 *x86.M128, v1 *x86.M128)

// Compare packed single-precision (32-bit) floating-point elements in "a" and "b" to see if either is NaN, and store the results in "dst".
//
//go:linkname MmCmpunordPs MmCmpunordPs
//go:noescape
func MmCmpunordPs(r *x86.M128, v0 *x86.M128, v1 *x86.M128)

// Compare the lower single-precision (32-bit) floating-point element in "a" and "b" for equality, and return the boolean result (0 or 1).
//
//go:linkname MmComieqSs MmComieqSs
//go:noescape
func MmComieqSs(r *x86.Int, v0 *x86.M128, v1 *x86.M128)

// Compare the lower single-precision (32-bit) floating-point element in "a" and "b" for less-than, and return the boolean result (0 or 1).
//
//go:linkname MmComiltSs MmComiltSs
//go:noescape
func MmComiltSs(r *x86.Int, v0 *x86.M128, v1 *x86.M128)

// Compare the lower single-precision (32-bit) floating-point element in "a" and "b" for less-than-or-equal, and return the boolean result (0 or 1).
//
//go:linkname MmComileSs MmComileSs
//go:noescape
func MmComileSs(r *x86.Int, v0 *x86.M128, v1 *x86.M128)

// Compare the lower single-precision (32-bit) floating-point element in "a" and "b" for greater-than, and return the boolean result (0 or 1).
//
//go:linkname MmComigtSs MmComigtSs
//go:noescape
func MmComigtSs(r *x86.Int, v0 *x86.M128, v1 *x86.M128)

// Compare the lower single-precision (32-bit) floating-point element in "a" and "b" for greater-than-or-equal, and return the boolean result (0 or 1).
//
//go:linkname MmComigeSs MmComigeSs
//go:noescape
func MmComigeSs(r *x86.Int, v0 *x86.M128, v1 *x86.M128)

// Compare the lower single-precision (32-bit) floating-point element in "a" and "b" for not-equal, and return the boolean result (0 or 1).
//
//go:linkname MmComineqSs MmComineqSs
//go:noescape
func MmComineqSs(r *x86.Int, v0 *x86.M128, v1 *x86.M128)

// Compare the lower single-precision (32-bit) floating-point element in "a" and "b" for equality, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs.
//
//go:linkname MmUcomieqSs MmUcomieqSs
//go:noescape
func MmUcomieqSs(r *x86.Int, v0 *x86.M128, v1 *x86.M128)

// Compare the lower single-precision (32-bit) floating-point element in "a" and "b" for less-than, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs.
//
//go:linkname MmUcomiltSs MmUcomiltSs
//go:noescape
func MmUcomiltSs(r *x86.Int, v0 *x86.M128, v1 *x86.M128)

// Compare the lower single-precision (32-bit) floating-point element in "a" and "b" for less-than-or-equal, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs.
//
//go:linkname MmUcomileSs MmUcomileSs
//go:noescape
func MmUcomileSs(r *x86.Int, v0 *x86.M128, v1 *x86.M128)

// Compare the lower single-precision (32-bit) floating-point element in "a" and "b" for greater-than, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs.
//
//go:linkname MmUcomigtSs MmUcomigtSs
//go:noescape
func MmUcomigtSs(r *x86.Int, v0 *x86.M128, v1 *x86.M128)

// Compare the lower single-precision (32-bit) floating-point element in "a" and "b" for greater-than-or-equal, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs.
//
//go:linkname MmUcomigeSs MmUcomigeSs
//go:noescape
func MmUcomigeSs(r *x86.Int, v0 *x86.M128, v1 *x86.M128)

// Compare the lower single-precision (32-bit) floating-point element in "a" and "b" for not-equal, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs.
//
//go:linkname MmUcomineqSs MmUcomineqSs
//go:noescape
func MmUcomineqSs(r *x86.Int, v0 *x86.M128, v1 *x86.M128)

// Convert the lower single-precision (32-bit) floating-point element in "a" to a 32-bit integer, and store the result in "dst".
//
//go:linkname MmCvtssSi32 MmCvtssSi32
//go:noescape
func MmCvtssSi32(r *x86.Int, v0 *x86.M128)

// Convert the lower single-precision (32-bit) floating-point element in "a" to a 32-bit integer, and store the result in "dst".
//
//go:linkname MmCvtSs2Si MmCvtSs2Si
//go:noescape
func MmCvtSs2Si(r *x86.Int, v0 *x86.M128)

// Convert the lower single-precision (32-bit) floating-point element in "a" to a 64-bit integer, and store the result in "dst".
//
//go:linkname MmCvtssSi64 MmCvtssSi64
//go:noescape
func MmCvtssSi64(r *x86.Longlong, v0 *x86.M128)

// Convert the lower single-precision (32-bit) floating-point element in "a" to a 32-bit integer with truncation, and store the result in "dst".
//
//go:linkname MmCvttssSi32 MmCvttssSi32
//go:noescape
func MmCvttssSi32(r *x86.Int, v0 *x86.M128)

// Convert the lower single-precision (32-bit) floating-point element in "a" to a 32-bit integer with truncation, and store the result in "dst".
//
//go:linkname MmCvttSs2Si MmCvttSs2Si
//go:noescape
func MmCvttSs2Si(r *x86.Int, v0 *x86.M128)

// Convert the lower single-precision (32-bit) floating-point element in "a" to a 64-bit integer with truncation, and store the result in "dst".
//
//go:linkname MmCvttssSi64 MmCvttssSi64
//go:noescape
func MmCvttssSi64(r *x86.Longlong, v0 *x86.M128)

// Convert the signed 32-bit integer "b" to a single-precision (32-bit) floating-point element, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".
//
//go:linkname MmCvtsi32Ss MmCvtsi32Ss
//go:noescape
func MmCvtsi32Ss(r *x86.M128, v0 *x86.M128, v1 *x86.Int)

// Convert the signed 32-bit integer "b" to a single-precision (32-bit) floating-point element, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".
//
//go:linkname MmCvtSi2Ss MmCvtSi2Ss
//go:noescape
func MmCvtSi2Ss(r *x86.M128, v0 *x86.M128, v1 *x86.Int)

// Convert the signed 64-bit integer "b" to a single-precision (32-bit) floating-point element, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".
//
//go:linkname MmCvtsi64Ss MmCvtsi64Ss
//go:noescape
func MmCvtsi64Ss(r *x86.M128, v0 *x86.M128, v1 *x86.Longlong)

// Copy the lower single-precision (32-bit) floating-point element of "a" to "dst".
//
//go:linkname MmCvtssF32 MmCvtssF32
//go:noescape
func MmCvtssF32(r *x86.Float, v0 *x86.M128)

// Return vector of type __m128 with undefined elements.
//
//go:linkname MmUndefinedPs MmUndefinedPs
//go:noescape
func MmUndefinedPs(r *x86.M128)

// Copy single-precision (32-bit) floating-point element "a" to the lower element of "dst", and zero the upper 3 elements.
//
//go:linkname MmSetSs MmSetSs
//go:noescape
func MmSetSs(r *x86.M128, v0 *x86.Float)

// Broadcast single-precision (32-bit) floating-point value "a" to all elements of "dst".
//
//go:linkname MmSet1Ps MmSet1Ps
//go:noescape
func MmSet1Ps(r *x86.M128, v0 *x86.Float)

// Broadcast single-precision (32-bit) floating-point value "a" to all elements of "dst".
//
//go:linkname MmSetPs1 MmSetPs1
//go:noescape
func MmSetPs1(r *x86.M128, v0 *x86.Float)

// Set packed single-precision (32-bit) floating-point elements in "dst" with the supplied values.
//
//go:linkname MmSetPs MmSetPs
//go:noescape
func MmSetPs(r *x86.M128, v0 *x86.Float, v1 *x86.Float, v2 *x86.Float, v3 *x86.Float)

// Set packed single-precision (32-bit) floating-point elements in "dst" with the supplied values in reverse order.
//
//go:linkname MmSetrPs MmSetrPs
//go:noescape
func MmSetrPs(r *x86.M128, v0 *x86.Float, v1 *x86.Float, v2 *x86.Float, v3 *x86.Float)

// Return vector of type __m128 with all elements set to zero.
//
//go:linkname MmSetzeroPs MmSetzeroPs
//go:noescape
func MmSetzeroPs(r *x86.M128)

// Unpack and interleave single-precision (32-bit) floating-point elements from the high half of "a" and "b", and store the results in "dst".
//
//go:linkname MmUnpackhiPs MmUnpackhiPs
//go:noescape
func MmUnpackhiPs(r *x86.M128, v0 *x86.M128, v1 *x86.M128)

// Unpack and interleave single-precision (32-bit) floating-point elements from the low half of "a" and "b", and store the results in "dst".
//
//go:linkname MmUnpackloPs MmUnpackloPs
//go:noescape
func MmUnpackloPs(r *x86.M128, v0 *x86.M128, v1 *x86.M128)

// Move the lower single-precision (32-bit) floating-point element from "b" to the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".
//
//go:linkname MmMoveSs MmMoveSs
//go:noescape
func MmMoveSs(r *x86.M128, v0 *x86.M128, v1 *x86.M128)

// Move the upper 2 single-precision (32-bit) floating-point elements from "b" to the lower 2 elements of "dst", and copy the upper 2 elements from "a" to the upper 2 elements of "dst".
//
//go:linkname MmMovehlPs MmMovehlPs
//go:noescape
func MmMovehlPs(r *x86.M128, v0 *x86.M128, v1 *x86.M128)

// Move the lower 2 single-precision (32-bit) floating-point elements from "b" to the upper 2 elements of "dst", and copy the lower 2 elements from "a" to the lower 2 elements of "dst".
//
//go:linkname MmMovelhPs MmMovelhPs
//go:noescape
func MmMovelhPs(r *x86.M128, v0 *x86.M128, v1 *x86.M128)

// Set each bit of mask "dst" based on the most significant bit of the corresponding packed single-precision (32-bit) floating-point element in "a".
//
//go:linkname MmMovemaskPs MmMovemaskPs
//go:noescape
func MmMovemaskPs(r *x86.Int, v0 *x86.M128)


================================================
FILE: x86/sse2/functions.c
================================================
#include <immintrin.h>

void MmAddSd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_add_sd(*v0, *v1); }
void MmAddPd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_add_pd(*v0, *v1); }
void MmSubSd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_sub_sd(*v0, *v1); }
void MmSubPd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_sub_pd(*v0, *v1); }
void MmMulSd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_mul_sd(*v0, *v1); }
void MmMulPd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_mul_pd(*v0, *v1); }
void MmDivSd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_div_sd(*v0, *v1); }
void MmDivPd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_div_pd(*v0, *v1); }
void MmSqrtSd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_sqrt_sd(*v0, *v1); }
void MmSqrtPd(__m128d* r, __m128d* v0) { *r = _mm_sqrt_pd(*v0); }
void MmMinSd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_min_sd(*v0, *v1); }
void MmMinPd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_min_pd(*v0, *v1); }
void MmMaxSd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_max_sd(*v0, *v1); }
void MmMaxPd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_max_pd(*v0, *v1); }
void MmAndPd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_and_pd(*v0, *v1); }
void MmAndnotPd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_andnot_pd(*v0, *v1); }
void MmOrPd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_or_pd(*v0, *v1); }
void MmXorPd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_xor_pd(*v0, *v1); }
void MmCmpeqPd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_cmpeq_pd(*v0, *v1); }
void MmCmpltPd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_cmplt_pd(*v0, *v1); }
void MmCmplePd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_cmple_pd(*v0, *v1); }
void MmCmpgtPd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_cmpgt_pd(*v0, *v1); }
void MmCmpgePd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_cmpge_pd(*v0, *v1); }
void MmCmpordPd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_cmpord_pd(*v0, *v1); }
void MmCmpunordPd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_cmpunord_pd(*v0, *v1); }
void MmCmpneqPd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_cmpneq_pd(*v0, *v1); }
void MmCmpnltPd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_cmpnlt_pd(*v0, *v1); }
void MmCmpnlePd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_cmpnle_pd(*v0, *v1); }
void MmCmpngtPd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_cmpngt_pd(*v0, *v1); }
void MmCmpngePd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_cmpnge_pd(*v0, *v1); }
void MmCmpeqSd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_cmpeq_sd(*v0, *v1); }
void MmCmpltSd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_cmplt_sd(*v0, *v1); }
void MmCmpleSd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_cmple_sd(*v0, *v1); }
void MmCmpgtSd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_cmpgt_sd(*v0, *v1); }
void MmCmpgeSd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_cmpge_sd(*v0, *v1); }
void MmCmpordSd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_cmpord_sd(*v0, *v1); }
void MmCmpunordSd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_cmpunord_sd(*v0, *v1); }
void MmCmpneqSd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_cmpneq_sd(*v0, *v1); }
void MmCmpnltSd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_cmpnlt_sd(*v0, *v1); }
void MmCmpnleSd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_cmpnle_sd(*v0, *v1); }
void MmCmpngtSd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_cmpngt_sd(*v0, *v1); }
void MmCmpngeSd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_cmpnge_sd(*v0, *v1); }
void MmComieqSd(int* r, __m128d* v0, __m128d* v1) { *r = _mm_comieq_sd(*v0, *v1); }
void MmComiltSd(int* r, __m128d* v0, __m128d* v1) { *r = _mm_comilt_sd(*v0, *v1); }
void MmComileSd(int* r, __m128d* v0, __m128d* v1) { *r = _mm_comile_sd(*v0, *v1); }
void MmComigtSd(int* r, __m128d* v0, __m128d* v1) { *r = _mm_comigt_sd(*v0, *v1); }
void MmComigeSd(int* r, __m128d* v0, __m128d* v1) { *r = _mm_comige_sd(*v0, *v1); }
void MmComineqSd(int* r, __m128d* v0, __m128d* v1) { *r = _mm_comineq_sd(*v0, *v1); }
void MmUcomieqSd(int* r, __m128d* v0, __m128d* v1) { *r = _mm_ucomieq_sd(*v0, *v1); }
void MmUcomiltSd(int* r, __m128d* v0, __m128d* v1) { *r = _mm_ucomilt_sd(*v0, *v1); }
void MmUcomileSd(int* r, __m128d* v0, __m128d* v1) { *r = _mm_ucomile_sd(*v0, *v1); }
void MmUcomigtSd(int* r, __m128d* v0, __m128d* v1) { *r = _mm_ucomigt_sd(*v0, *v1); }
void MmUcomigeSd(int* r, __m128d* v0, __m128d* v1) { *r = _mm_ucomige_sd(*v0, *v1); }
void MmUcomineqSd(int* r, __m128d* v0, __m128d* v1) { *r = _mm_ucomineq_sd(*v0, *v1); }
void MmCvtpdPs(__m128* r, __m128d* v0) { *r = _mm_cvtpd_ps(*v0); }
void MmCvtpsPd(__m128d* r, __m128* v0) { *r = _mm_cvtps_pd(*v0); }
void MmCvtepi32Pd(__m128d* r, __m128i* v0) { *r = _mm_cvtepi32_pd(*v0); }
void MmCvtpdEpi32(__m128i* r, __m128d* v0) { *r = _mm_cvtpd_epi32(*v0); }
void MmCvtsdSi32(int* r, __m128d* v0) { *r = _mm_cvtsd_si32(*v0); }
void MmCvtsdSs(__m128* r, __m128* v0, __m128d* v1) { *r = _mm_cvtsd_ss(*v0, *v1); }
void MmCvtsi32Sd(__m128d* r, __m128d* v0, int* v1) { *r = _mm_cvtsi32_sd(*v0, *v1); }
void MmCvtssSd(__m128d* r, __m128d* v0, __m128* v1) { *r = _mm_cvtss_sd(*v0, *v1); }
void MmCvttpdEpi32(__m128i* r, __m128d* v0) { *r = _mm_cvttpd_epi32(*v0); }
void MmCvttsdSi32(int* r, __m128d* v0) { *r = _mm_cvttsd_si32(*v0); }
void MmCvtsdF64(double* r, __m128d* v0) { *r = _mm_cvtsd_f64(*v0); }
void MmUndefinedPd(__m128d* r) { *r = _mm_undefined_pd(); }
void MmSetSd(__m128d* r, double* v0) { *r = _mm_set_sd(*v0); }
void MmSet1Pd(__m128d* r, double* v0) { *r = _mm_set1_pd(*v0); }
void MmSetPd1(__m128d* r, double* v0) { *r = _mm_set_pd1(*v0); }
void MmSetPd(__m128d* r, double* v0, double* v1) { *r = _mm_set_pd(*v0, *v1); }
void MmSetrPd(__m128d* r, double* v0, double* v1) { *r = _mm_setr_pd(*v0, *v1); }
void MmSetzeroPd(__m128d* r) { *r = _mm_setzero_pd(); }
void MmMoveSd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_move_sd(*v0, *v1); }
void MmAddEpi8(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_add_epi8(*v0, *v1); }
void MmAddEpi16(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_add_epi16(*v0, *v1); }
void MmAddEpi32(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_add_epi32(*v0, *v1); }
void MmAddEpi64(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_add_epi64(*v0, *v1); }
void MmAddsEpi8(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_adds_epi8(*v0, *v1); }
void MmAddsEpi16(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_adds_epi16(*v0, *v1); }
void MmAddsEpu8(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_adds_epu8(*v0, *v1); }
void MmAddsEpu16(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_adds_epu16(*v0, *v1); }
void MmAvgEpu8(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_avg_epu8(*v0, *v1); }
void MmAvgEpu16(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_avg_epu16(*v0, *v1); }
void MmMaddEpi16(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_madd_epi16(*v0, *v1); }
void MmMaxEpi16(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_max_epi16(*v0, *v1); }
void MmMaxEpu8(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_max_epu8(*v0, *v1); }
void MmMinEpi16(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_min_epi16(*v0, *v1); }
void MmMinEpu8(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_min_epu8(*v0, *v1); }
void MmMulhiEpi16(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_mulhi_epi16(*v0, *v1); }
void MmMulhiEpu16(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_mulhi_epu16(*v0, *v1); }
void MmMulloEpi16(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_mullo_epi16(*v0, *v1); }
void MmMulEpu32(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_mul_epu32(*v0, *v1); }
void MmSadEpu8(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_sad_epu8(*v0, *v1); }
void MmSubEpi8(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_sub_epi8(*v0, *v1); }
void MmSubEpi16(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_sub_epi16(*v0, *v1); }
void MmSubEpi32(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_sub_epi32(*v0, *v1); }
void MmSubEpi64(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_sub_epi64(*v0, *v1); }
void MmSubsEpi8(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_subs_epi8(*v0, *v1); }
void MmSubsEpi16(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_subs_epi16(*v0, *v1); }
void MmSubsEpu8(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_subs_epu8(*v0, *v1); }
void MmSubsEpu16(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_subs_epu16(*v0, *v1); }
void MmAndSi128(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_and_si128(*v0, *v1); }
void MmAndnotSi128(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_andnot_si128(*v0, *v1); }
void MmOrSi128(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_or_si128(*v0, *v1); }
void MmXorSi128(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_xor_si128(*v0, *v1); }
void MmSlliEpi16(__m128i* r, __m128i* v0, int* v1) { *r = _mm_slli_epi16(*v0, *v1); }
void MmSllEpi16(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_sll_epi16(*v0, *v1); }
void MmSlliEpi32(__m128i* r, __m128i* v0, int* v1) { *r = _mm_slli_epi32(*v0, *v1); }
void MmSllEpi32(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_sll_epi32(*v0, *v1); }
void MmSlliEpi64(__m128i* r, __m128i* v0, int* v1) { *r = _mm_slli_epi64(*v0, *v1); }
void MmSllEpi64(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_sll_epi64(*v0, *v1); }
void MmSraiEpi16(__m128i* r, __m128i* v0, int* v1) { *r = _mm_srai_epi16(*v0, *v1); }
void MmSraEpi16(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_sra_epi16(*v0, *v1); }
void MmSraiEpi32(__m128i* r, __m128i* v0, int* v1) { *r = _mm_srai_epi32(*v0, *v1); }
void MmSraEpi32(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_sra_epi32(*v0, *v1); }
void MmSrliEpi16(__m128i* r, __m128i* v0, int* v1) { *r = _mm_srli_epi16(*v0, *v1); }
void MmSrlEpi16(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_srl_epi16(*v0, *v1); }
void MmSrliEpi32(__m128i* r, __m128i* v0, int* v1) { *r = _mm_srli_epi32(*v0, *v1); }
void MmSrlEpi32(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_srl_epi32(*v0, *v1); }
void MmSrliEpi64(__m128i* r, __m128i* v0, int* v1) { *r = _mm_srli_epi64(*v0, *v1); }
void MmSrlEpi64(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_srl_epi64(*v0, *v1); }
void MmCmpeqEpi8(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_cmpeq_epi8(*v0, *v1); }
void MmCmpeqEpi16(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_cmpeq_epi16(*v0, *v1); }
void MmCmpeqEpi32(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_cmpeq_epi32(*v0, *v1); }
void MmCmpgtEpi8(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_cmpgt_epi8(*v0, *v1); }
void MmCmpgtEpi16(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_cmpgt_epi16(*v0, *v1); }
void MmCmpgtEpi32(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_cmpgt_epi32(*v0, *v1); }
void MmCmpltEpi8(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_cmplt_epi8(*v0, *v1); }
void MmCmpltEpi16(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_cmplt_epi16(*v0, *v1); }
void MmCmpltEpi32(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_cmplt_epi32(*v0, *v1); }
void MmCvtsi64Sd(__m128d* r, __m128d* v0, long long* v1) { *r = _mm_cvtsi64_sd(*v0, *v1); }
void MmCvtsdSi64(long long* r, __m128d* v0) { *r = _mm_cvtsd_si64(*v0); }
void MmCvttsdSi64(long long* r, __m128d* v0) { *r = _mm_cvttsd_si64(*v0); }
void MmCvtepi32Ps(__m128* r, __m128i* v0) { *r = _mm_cvtepi32_ps(*v0); }
void MmCvtpsEpi32(__m128i* r, __m128* v0) { *r = _mm_cvtps_epi32(*v0); }
void MmCvttpsEpi32(__m128i* r, __m128* v0) { *r = _mm_cvttps_epi32(*v0); }
void MmCvtsi32Si128(__m128i* r, int* v0) { *r = _mm_cvtsi32_si128(*v0); }
void MmCvtsi64Si128(__m128i* r, long long* v0) { *r = _mm_cvtsi64_si128(*v0); }
void MmCvtsi128Si32(int* r, __m128i* v0) { *r = _mm_cvtsi128_si32(*v0); }
void MmCvtsi128Si64(long long* r, __m128i* v0) { *r = _mm_cvtsi128_si64(*v0); }
void MmUndefinedSi128(__m128i* r) { *r = _mm_undefined_si128(); }
void MmSetEpi64X(__m128i* r, long long* v0, long long* v1) { *r = _mm_set_epi64x(*v0, *v1); }
void MmSetEpi64(__m128i* r, __m64* v0, __m64* v1) { *r = _mm_set_epi64(*v0, *v1); }
void MmSetEpi32(__m128i* r, int* v0, int* v1, int* v2, int* v3) { *r = _mm_set_epi32(*v0, *v1, *v2, *v3); }
void MmSetEpi16(__m128i* r, short* v0, short* v1, short* v2, short* v3, short* v4, short* v5, short* v6, short* v7) { *r = _mm_set_epi16(*v0, *v1, *v2, *v3, *v4, *v5, *v6, *v7); }
void MmSetEpi8(__m128i* r, char* v0, char* v1, char* v2, char* v3, char* v4, char* v5, char* v6, char* v7, char* v8, char* v9, char* v10, char* v11, char* v12, char* v13, char* v14, char* v15) { *r = _mm_set_epi8(*v0, *v1, *v2, *v3, *v4, *v5, *v6, *v7, *v8, *v9, *v10, *v11, *v12, *v13, *v14, *v15); }
void MmSet1Epi64X(__m128i* r, long long* v0) { *r = _mm_set1_epi64x(*v0); }
void MmSet1Epi64(__m128i* r, __m64* v0) { *r = _mm_set1_epi64(*v0); }
void MmSet1Epi32(__m128i* r, int* v0) { *r = _mm_set1_epi32(*v0); }
void MmSet1Epi16(__m128i* r, short* v0) { *r = _mm_set1_epi16(*v0); }
void MmSet1Epi8(__m128i* r, char* v0) { *r = _mm_set1_epi8(*v0); }
void MmSetrEpi64(__m128i* r, __m64* v0, __m64* v1) { *r = _mm_setr_epi64(*v0, *v1); }
void MmSetrEpi32(__m128i* r, int* v0, int* v1, int* v2, int* v3) { *r = _mm_setr_epi32(*v0, *v1, *v2, *v3); }
void MmSetrEpi16(__m128i* r, short* v0, short* v1, short* v2, short* v3, short* v4, short* v5, short* v6, short* v7) { *r = _mm_setr_epi16(*v0, *v1, *v2, *v3, *v4, *v5, *v6, *v7); }
void MmSetrEpi8(__m128i* r, char* v0, char* v1, char* v2, char* v3, char* v4, char* v5, char* v6, char* v7, char* v8, char* v9, char* v10, char* v11, char* v12, char* v13, char* v14, char* v15) { *r = _mm_setr_epi8(*v0, *v1, *v2, *v3, *v4, *v5, *v6, *v7, *v8, *v9, *v10, *v11, *v12, *v13, *v14, *v15); }
void MmSetzeroSi128(__m128i* r) { *r = _mm_setzero_si128(); }
void MmPacksEpi16(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_packs_epi16(*v0, *v1); }
void MmPacksEpi32(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_packs_epi32(*v0, *v1); }
void MmPackusEpi16(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_packus_epi16(*v0, *v1); }
void MmMovemaskEpi8(int* r, __m128i* v0) { *r = _mm_movemask_epi8(*v0); }
void MmUnpackhiEpi8(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_unpackhi_epi8(*v0, *v1); }
void MmUnpackhiEpi16(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_unpackhi_epi16(*v0, *v1); }
void MmUnpackhiEpi32(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_unpackhi_epi32(*v0, *v1); }
void MmUnpackhiEpi64(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_unpackhi_epi64(*v0, *v1); }
void MmUnpackloEpi8(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_unpacklo_epi8(*v0, *v1); }
void MmUnpackloEpi16(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_unpacklo_epi16(*v0, *v1); }
void MmUnpackloEpi32(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_unpacklo_epi32(*v0, *v1); }
void MmUnpackloEpi64(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_unpacklo_epi64(*v0, *v1); }
void MmMovepi64Pi64(__m64* r, __m128i* v0) { *r = _mm_movepi64_pi64(*v0); }
void MmMovpi64Epi64(__m128i* r, __m64* v0) { *r = _mm_movpi64_epi64(*v0); }
void MmMoveEpi64(__m128i* r, __m128i* v0) { *r = _mm_move_epi64(*v0); }
void MmUnpackhiPd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_unpackhi_pd(*v0, *v1); }
void MmUnpackloPd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_unpacklo_pd(*v0, *v1); }
void MmMovemaskPd(int* r, __m128d* v0) { *r = _mm_movemask_pd(*v0); }
void MmCastpdPs(__m128* r, __m128d* v0) { *r = _mm_castpd_ps(*v0); }
void MmCastpdSi128(__m128i* r, __m128d* v0) { *r = _mm_castpd_si128(*v0); }
void MmCastpsPd(__m128d* r, __m128* v0) { *r = _mm_castps_pd(*v0); }
void MmCastpsSi128(__m128i* r, __m128* v0) { *r = _mm_castps_si128(*v0); }
void MmCastsi128Ps(__m128* r, __m128i* v0) { *r = _mm_castsi128_ps(*v0); }
void MmCastsi128Pd(__m128d* r, __m128i* v0) { *r = _mm_castsi128_pd(*v0); }


================================================
FILE: x86/sse2/functions.go
================================================
package sse2

import (
	"github.com/alivanz/go-simd/x86"
)

/*
#cgo CFLAGS: -msse2
#include <immintrin.h>
*/
import "C"

// Add the lower double-precision (64-bit) floating-point element in "a" and "b", store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".
//
//go:linkname MmAddSd MmAddSd
//go:noescape
func MmAddSd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D)

// Add packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst".
//
//go:linkname MmAddPd MmAddPd
//go:noescape
func MmAddPd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D)

// Subtract the lower double-precision (64-bit) floating-point element in "b" from the lower double-precision (64-bit) floating-point element in "a", store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".
//
//go:linkname MmSubSd MmSubSd
//go:noescape
func MmSubSd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D)

// Subtract packed double-precision (64-bit) floating-point elements in "b" from packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst".
//
//go:linkname MmSubPd MmSubPd
//go:noescape
func MmSubPd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D)

// Multiply the lower double-precision (64-bit) floating-point element in "a" and "b", store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".
//
//go:linkname MmMulSd MmMulSd
//go:noescape
func MmMulSd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D)

// Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst".
//
//go:linkname MmMulPd MmMulPd
//go:noescape
func MmMulPd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D)

// Divide the lower double-precision (64-bit) floating-point element in "a" by the lower double-precision (64-bit) floating-point element in "b", store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".
//
//go:linkname MmDivSd MmDivSd
//go:noescape
func MmDivSd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D)

// Divide packed double-precision (64-bit) floating-point elements in "a" by packed elements in "b", and store the results in "dst".
//
//go:linkname MmDivPd MmDivPd
//go:noescape
func MmDivPd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D)

// Compute the square root of the lower double-precision (64-bit) floating-point element in "b", store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".
//
//go:linkname MmSqrtSd MmSqrtSd
//go:noescape
func MmSqrtSd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D)

// Compute the square root of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst".
//
//go:linkname MmSqrtPd MmSqrtPd
//go:noescape
func MmSqrtPd(r *x86.M128D, v0 *x86.M128D)

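// Compare the lower double-precision (64-bit) floating-point elements in "a" and "b", store the minimum value in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". "dst" does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.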
//
//go:linkname MmMinSd MmMinSd
//go:noescape
func MmMinSd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D)

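// Compare packed double-precision (64-bit) floating-point elements in "a" and "b", and store packed minimum values in "dst". "dst" does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.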
//
//go:linkname MmMinPd MmMinPd
//go:noescape
func MmMinPd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D)

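// Compare the lower double-precision (64-bit) floating-point elements in "a" and "b", store the maximum value in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". "dst" does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.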
//
//go:linkname MmMaxSd MmMaxSd
//go:noescape
func MmMaxSd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D)

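// Compare packed double-precision (64-bit) floating-point elements in "a" and "b", and store packed maximum values in "dst". "dst" does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.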
//
//go:linkname MmMaxPd MmMaxPd
//go:noescape
func MmMaxPd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D)

// Compute the bitwise AND of packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst".
//
//go:linkname MmAndPd MmAndPd
//go:noescape
func MmAndPd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D)

// Compute the bitwise NOT of packed double-precision (64-bit) floating-point elements in "a" and then AND with "b", and store the results in "dst".
//
//go:linkname MmAndnotPd MmAndnotPd
//go:noescape
func MmAndnotPd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D)

// Compute the bitwise OR of packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst".
//
//go:linkname MmOrPd MmOrPd
//go:noescape
func MmOrPd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D)

// Compute the bitwise XOR of packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst".
//
//go:linkname MmXorPd MmXorPd
//go:noescape
func MmXorPd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D)

// Compare packed double-precision (64-bit) floating-point elements in "a" and "b" for equality, and store the results in "dst".
//
//go:linkname MmCmpeqPd MmCmpeqPd
//go:noescape
func MmCmpeqPd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D)

// Compare packed double-precision (64-bit) floating-point elements in "a" and "b" for less-than, and store the results in "dst".
//
//go:linkname MmCmpltPd MmCmpltPd
//go:noescape
func MmCmpltPd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D)

// Compare packed double-precision (64-bit) floating-point elements in "a" and "b" for less-than-or-equal, and store the results in "dst".
//
//go:linkname MmCmplePd MmCmplePd
//go:noescape
func MmCmplePd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D)

// Compare packed double-precision (64-bit) floating-point elements in "a" and "b" for greater-than, and store the results in "dst".
//
//go:linkname MmCmpgtPd MmCmpgtPd
//go:noescape
func MmCmpgtPd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D)

// Compare packed double-precision (64-bit) floating-point elements in "a" and "b" for greater-than-or-equal, and store the results in "dst".
//
//go:linkname MmCmpgePd MmCmpgePd
//go:noescape
func MmCmpgePd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D)

// Compare packed double-precision (64-bit) floating-point elements in "a" and "b" to see if neither is NaN, and store the results in "dst".
//
//go:linkname MmCmpordPd MmCmpordPd
//go:noescape
func MmCmpordPd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D)

// Compare packed double-precision (64-bit) floating-point elements in "a" and "b" to see if either is NaN, and store the results in "dst".
//
//go:linkname MmCmpunordPd MmCmpunordPd
//go:noescape
func MmCmpunordPd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D)

// Compare packed double-precision (64-bit) floating-point elements in "a" and "b" for not-equal, and store the results in "dst".
//
//go:linkname MmCmpneqPd MmCmpneqPd
//go:noescape
func MmCmpneqPd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D)

// Compare packed double-precision (64-bit) floating-point elements in "a" and "b" for not-less-than, and store the results in "dst".
//
//go:linkname MmCmpnltPd MmCmpnltPd
//go:noescape
func MmCmpnltPd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D)

// Compare packed double-precision (64-bit) floating-point elements in "a" and "b" for not-less-than-or-equal, and store the results in "dst".
//
//go:linkname MmCmpnlePd MmCmpnlePd
//go:noescape
func MmCmpnlePd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D)

// Compare packed double-precision (64-bit) floating-point elements in "a" and "b" for not-greater-than, and store the results in "dst".
//
//go:linkname MmCmpngtPd MmCmpngtPd
//go:noescape
func MmCmpngtPd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D)

// Compare packed double-precision (64-bit) floating-point elements in "a" and "b" for not-greater-than-or-equal, and store the results in "dst".
//
//go:linkname MmCmpngePd MmCmpngePd
//go:noescape
func MmCmpngePd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D)

// Compare the lower double-precision (64-bit) floating-point elements in "a" and "b" for equality, store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".
//
//go:linkname MmCmpeqSd MmCmpeqSd
//go:noescape
func MmCmpeqSd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D)

// Compare the lower double-precision (64-bit) floating-point elements in "a" and "b" for less-than, store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".
//
//go:linkname MmCmpltSd MmCmpltSd
//go:noescape
func MmCmpltSd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D)

// Compare the lower double-precision (64-bit) floating-point elements in "a" and "b" for less-than-or-equal, store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".
//
//go:linkname MmCmpleSd MmCmpleSd
//go:noescape
func MmCmpleSd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D)

// Compare the lower double-precision (64-bit) floating-point elements in "a" and "b" for greater-than, store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".
//
//go:linkname MmCmpgtSd MmCmpgtSd
//go:noescape
func MmCmpgtSd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D)

// Compare the lower double-precision (64-bit) floating-point elements in "a" and "b" for greater-than-or-equal, store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".
//
//go:linkname MmCmpgeSd MmCmpgeSd
//go:noescape
func MmCmpgeSd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D)

// Compare the lower double-precision (64-bit) floating-point elements in "a" and "b" to see if neither is NaN, store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".
//
//go:linkname MmCmpordSd MmCmpordSd
//go:noescape
func MmCmpordSd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D)

// Compare the lower double-precision (64-bit) floating-point elements in "a" and "b" to see if either is NaN, store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".
//
//go:linkname MmCmpunordSd MmCmpunordSd
//go:noescape
func MmCmpunordSd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D)

// Compare the lower double-precision (64-bit) floating-point elements in "a" and "b" for not-equal, store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".
//
//go:linkname MmCmpneqSd MmCmpneqSd
//go:noescape
func MmCmpneqSd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D)

// Compare the lower double-precision (64-bit) floating-point elements in "a" and "b" for not-less-than, store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".
//
//go:linkname MmCmpnltSd MmCmpnltSd
//go:noescape
func MmCmpnltSd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D)

// Compare the lower double-precision (64-bit) floating-point elements in "a" and "b" for not-less-than-or-equal, store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".
//
//go:linkname MmCmpnleSd MmCmpnleSd
//go:noescape
func MmCmpnleSd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D)

// Compare the lower double-precision (64-bit) floating-point elements in "a" and "b" for not-greater-than, store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".
//
//go:linkname MmCmpngtSd MmCmpngtSd
//go:noescape
func MmCmpngtSd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D)

// Compare the lower double-precision (64-bit) floating-point elements in "a" and "b" for not-greater-than-or-equal, store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".
//
//go:linkname MmCmpngeSd MmCmpngeSd
//go:noescape
func MmCmpngeSd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D)

// Compare the lower double-precision (64-bit) floating-point element in "a" and "b" for equality, and return the boolean result (0 or 1).
//
//go:linkname MmComieqSd MmComieqSd
//go:noescape
func MmComieqSd(r *x86.Int, v0 *x86.M128D, v1 *x86.M128D)

// Compare the lower double-precision (64-bit) floating-point element in "a" and "b" for less-than, and return the boolean result (0 or 1).
//
//go:linkname MmComiltSd MmComiltSd
//go:noescape
func MmComiltSd(r *x86.Int, v0 *x86.M128D, v1 *x86.M128D)

// Compare the lower double-precision (64-bit) floating-point element in "a" and "b" for less-than-or-equal, and return the boolean result (0 or 1).
//
//go:linkname MmComileSd MmComileSd
//go:noescape
func MmComileSd(r *x86.Int, v0 *x86.M128D, v1 *x86.M128D)

// Compare the lower double-precision (64-bit) floating-point element in "a" and "b" for greater-than, and return the boolean result (0 or 1).
//
//go:linkname MmComigtSd MmComigtSd
//go:noescape
func MmComigtSd(r *x86.Int, v0 *x86.M128D, v1 *x86.M128D)

// Compare the lower double-precision (64-bit) floating-point element in "a" and "b" for greater-than-or-equal, and return the boolean result (0 or 1).
//
//go:linkname MmComigeSd MmComigeSd
//go:noescape
func MmComigeSd(r *x86.Int, v0 *x86.M128D, v1 *x86.M128D)

// Compare the lower double-precision (64-bit) floating-point element in "a" and "b" for not-equal, and return the boolean result (0 or 1).
//
//go:linkname MmComineqSd MmComineqSd
//go:noescape
func MmComineqSd(r *x86.Int, v0 *x86.M128D, v1 *x86.M128D)

// Compare the lower double-precision (64-bit) floating-point element in "a" and "b" for equality, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs.
//
//go:linkname MmUcomieqSd MmUcomieqSd
//go:noescape
func MmUcomieqSd(r *x86.Int, v0 *x86.M128D, v1 *x86.M128D)

// Compare the lower double-precision (64-bit) floating-point element in "a" and "b" for less-than, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs.
//
//go:linkname MmUcomiltSd MmUcomiltSd
//go:noescape
func MmUcomiltSd(r *x86.Int, v0 *x86.M128D, v1 *x86.M128D)

// Compare the lower double-precision (64-bit) floating-point element in "a" and "b" for less-than-or-equal, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs.
//
//go:linkname MmUcomileSd MmUcomileSd
//go:noescape
func MmUcomileSd(r *x86.Int, v0 *x86.M128D, v1 *x86.M128D)

// Compare the lower double-precision (64-bit) floating-point element in "a" and "b" for greater-than, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs.
//
//go:linkname MmUcomigtSd MmUcomigtSd
//go:noescape
func MmUcomigtSd(r *x86.Int, v0 *x86.M128D, v1 *x86.M128D)

// Compare the lower double-precision (64-bit) floating-point element in "a" and "b" for greater-than-or-equal, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs.
//
//go:linkname MmUcomigeSd MmUcomigeSd
//go:noescape
func MmUcomigeSd(r *x86.Int, v0 *x86.M128D, v1 *x86.M128D)

// Compare the lower double-precision (64-bit) floating-point element in "a" and "b" for not-equal, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs.
//
//go:linkname MmUcomineqSd MmUcomineqSd
//go:noescape
func MmUcomineqSd(r *x86.Int, v0 *x86.M128D, v1 *x86.M128D)

// Convert packed double-precision (64-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst".
//
//go:linkname MmCvtpdPs MmCvtpdPs
//go:noescape
func MmCvtpdPs(r *x86.M128, v0 *x86.M128D)

// Convert packed single-precision (32-bit) floating-point elements in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst".
//
//go:linkname MmCvtpsPd MmCvtpsPd
//go:noescape
func MmCvtpsPd(r *x86.M128D, v0 *x86.M128)

// Convert packed signed 32-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst".
//
//go:linkname MmCvtepi32Pd MmCvtepi32Pd
//go:noescape
func MmCvtepi32Pd(r *x86.M128D, v0 *x86.M128I)

// Convert packed double-precision (64-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst".
//
//go:linkname MmCvtpdEpi32 MmCvtpdEpi32
//go:noescape
func MmCvtpdEpi32(r *x86.M128I, v0 *x86.M128D)

// Convert the lower double-precision (64-bit) floating-point element in "a" to a 32-bit integer, and store the result in "dst".
//
//go:linkname MmCvtsdSi32 MmCvtsdSi32
//go:noescape
func MmCvtsdSi32(r *x86.Int, v0 *x86.M128D)

// Convert the lower double-precision (64-bit) floating-point element in "b" to a single-precision (32-bit) floating-point element, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".
//
//go:linkname MmCvtsdSs MmCvtsdSs
//go:noescape
func MmCvtsdSs(r *x86.M128, v0 *x86.M128, v1 *x86.M128D)

// Convert the signed 32-bit integer "b" to a double-precision (64-bit) floating-point element, store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".
//
//go:linkname MmCvtsi32Sd MmCvtsi32Sd
//go:noescape
func MmCvtsi32Sd(r *x86.M128D, v0 *x86.M128D, v1 *x86.Int)

// Convert the lower single-precision (32-bit) floating-point element in "b" to a double-precision (64-bit) floating-point element, store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".
//
//go:linkname MmCvtssSd MmCvtssSd
//go:noescape
func MmCvtssSd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128)

// Convert packed double-precision (64-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst".
//
//go:linkname MmCvttpdEpi32 MmCvttpdEpi32
//go:noescape
func MmCvttpdEpi32(r *x86.M128I, v0 *x86.M128D)

// Convert the lower double-precision (64-bit) floating-point element in "a" to a 32-bit integer with truncation, and store the result in "dst".
//
//go:linkname MmCvttsdSi32 MmCvttsdSi32
//go:noescape
func MmCvttsdSi32(r *x86.Int, v0 *x86.M128D)
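
/*
Illustrative sketch, not part of the generated bindings: a scalar model of the
difference between the rounding conversion (MmCvtsdSi32) and the truncating
one (MmCvttsdSi32). It assumes the default MXCSR rounding mode (round to
nearest, ties to even) and inputs that fit in a 32-bit integer; out-of-range
and NaN inputs are not modelled. The helper names are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// cvtsdSi32 models _mm_cvtsd_si32 under the assumed default rounding mode
// (round to nearest, ties to even). Only valid for in-range inputs.
func cvtsdSi32(x float64) int32 { return int32(math.RoundToEven(x)) }

// cvttsdSi32 models _mm_cvttsd_si32: the fractional part is always discarded.
func cvttsdSi32(x float64) int32 { return int32(math.Trunc(x)) }

func main() {
	for _, x := range []float64{2.5, 3.5, -2.5, 2.7, -2.7} {
		fmt.Println(x, cvtsdSi32(x), cvttsdSi32(x))
	}
}
```

The two conversions differ for values such as 2.7 (3 versus 2) and 3.5
(4 versus 3), and agree on 2.5 because the tie rounds to the even neighbour.
*/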

// Copy the lower double-precision (64-bit) floating-point element of "a" to "dst".
//
//go:linkname MmCvtsdF64 MmCvtsdF64
//go:noescape
func MmCvtsdF64(r *x86.Double, v0 *x86.M128D)

// Return vector of type __m128d with undefined elements.
//
//go:linkname MmUndefinedPd MmUndefinedPd
//go:noescape
func MmUndefinedPd(r *x86.M128D)

// Copy double-precision (64-bit) floating-point element "a" to the lower element of "dst", and zero the upper element.
//
//go:linkname MmSetSd MmSetSd
//go:noescape
func MmSetSd(r *x86.M128D, v0 *x86.Double)

// Broadcast double-precision (64-bit) floating-point value "a" to all elements of "dst".
//
//go:linkname MmSet1Pd MmSet1Pd
//go:noescape
func MmSet1Pd(r *x86.M128D, v0 *x86.Double)

// Broadcast double-precision (64-bit) floating-point value "a" to all elements of "dst".
//
//go:linkname MmSetPd1 MmSetPd1
//go:noescape
func MmSetPd1(r *x86.M128D, v0 *x86.Double)

// Set packed double-precision (64-bit) floating-point elements in "dst" with the supplied values.
//
//go:linkname MmSetPd MmSetPd
//go:noescape
func MmSetPd(r *x86.M128D, v0 *x86.Double, v1 *x86.Double)

// Set packed double-precision (64-bit) floating-point elements in "dst" with the supplied values in reverse order.
//
//go:linkname MmSetrPd MmSetrPd
//go:noescape
func MmSetrPd(r *x86.M128D, v0 *x86.Double, v1 *x86.Double)
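
/*
Illustrative sketch, not part of the generated bindings: a scalar model of the
argument order used by MmSetPd and MmSetrPd. The result is shown as a
two-element array whose index 0 is the low 64 bits of the vector. The helper
names setPd and setrPd are hypothetical.

```go
package main

import "fmt"

// setPd models _mm_set_pd: the first argument lands in the upper lane,
// the second in the lower lane.
func setPd(first, second float64) [2]float64 {
	return [2]float64{second, first}
}

// setrPd models _mm_setr_pd: the first argument lands in the lower lane,
// so the arguments appear in memory order.
func setrPd(first, second float64) [2]float64 {
	return [2]float64{first, second}
}

func main() {
	fmt.Println(setPd(1, 2))
	fmt.Println(setrPd(1, 2))
}
```

With the wrappers above, v0 plays the role of the first argument and v1 the
second, since the C side forwards them in that order.
*/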

// Return vector of type __m128d with all elements set to zero.
//
//go:linkname MmSetzeroPd MmSetzeroPd
//go:noescape
func MmSetzeroPd(r *x86.M128D)

// Move the lower double-precision (64-bit) floating-point element from "b" to the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".
//
//go:linkname MmMoveSd MmMoveSd
//go:noescape
func MmMoveSd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D)

// Add packed 8-bit integers in "a" and "b", and store the results in "dst".
//
//go:linkname MmAddEpi8 MmAddEpi8
//go:noescape
func MmAddEpi8(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)

// Add packed 16-bit integers in "a" and "b", and store the results in "dst".
//
//go:linkname MmAddEpi16 MmAddEpi16
//go:noescape
func MmAddEpi16(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)

// Add packed 32-bit integers in "a" and "b", and store the results in "dst".
//
//go:linkname MmAddEpi32 MmAddEpi32
//go:noescape
func MmAddEpi32(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)

// Add packed 64-bit integers in "a" and "b", and store the results in "dst".
//
//go:linkname MmAddEpi64 MmAddEpi64
//go:noescape
func MmAddEpi64(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)

// Add packed signed 8-bit integers in "a" and "b" using saturation, and store the results in "dst".
//
//go:linkname MmAddsEpi8 MmAddsEpi8
//go:noescape
func MmAddsEpi8(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)

// Add packed signed 16-bit integers in "a" and "b" using saturation, and store the results in "dst".
//
//go:linkname MmAddsEpi16 MmAddsEpi16
//go:noescape
func MmAddsEpi16(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)

// Add packed unsigned 8-bit integers in "a" and "b" using saturation, and store the results in "dst".
//
//go:linkname MmAddsEpu8 MmAddsEpu8
//go:noescape
func MmAddsEpu8(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)

// Add packed unsigned 16-bit integers in "a" and "b" using saturation, and store the results in "dst".
//
//go:linkname MmAddsEpu16 MmAddsEpu16
//go:noescape
func MmAddsEpu16(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)
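
/*
Illustration only, not generated code: a minimal scalar sketch of what "using
saturation" means for a single lane of the saturating adds above. The helper
names addsInt8 and addsUint8 are hypothetical and are not part of this package;
the bindings themselves delegate to the C intrinsics.

```go
package main

import "fmt"

// addsInt8 adds two signed 8-bit values in a wider type and clamps the
// result to [-128, 127] instead of letting it wrap around.
func addsInt8(a, b int8) int8 {
	sum := int16(a) + int16(b)
	if sum > 127 {
		return 127
	}
	if sum < -128 {
		return -128
	}
	return int8(sum)
}

// addsUint8 is the unsigned counterpart; it clamps to [0, 255].
func addsUint8(a, b uint8) uint8 {
	sum := uint16(a) + uint16(b)
	if sum > 255 {
		return 255
	}
	return uint8(sum)
}

func main() {
	fmt.Println(addsInt8(100, 100))   // 127
	fmt.Println(addsInt8(-100, -100)) // -128
	fmt.Println(addsUint8(200, 100))  // 255
}
```

The vector instructions apply the same rule to every lane independently.
*/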

// Average packed unsigned 8-bit integers in "a" and "b", and store the results in "dst".
//
//go:linkname MmAvgEpu8 MmAvgEpu8
//go:noescape
func MmAvgEpu8(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)

// Average packed unsigned 16-bit integers in "a" and "b", and store the results in "dst".
//
//go:linkname MmAvgEpu16 MmAvgEpu16
//go:noescape
func MmAvgEpu16(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)
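
/*
Illustration only, not generated code: the unsigned averages above round up
when the sum is odd. A minimal scalar sketch for one 8-bit lane follows; the
helper name avgUint8 is hypothetical and not part of this package.

```go
package main

import "fmt"

// avgUint8 models one lane: the sum is formed in a wider type so it cannot
// overflow, 1 is added for rounding, and the result is shifted right by one.
func avgUint8(a, b uint8) uint8 {
	return uint8((uint16(a) + uint16(b) + 1) >> 1)
}

func main() {
	fmt.Println(avgUint8(1, 2))     // 2: halves round up
	fmt.Println(avgUint8(255, 255)) // 255: the intermediate sum does not overflow
}
```
*/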

// Multiply packed signed 16-bit integers in "a" and "b", producing intermediate signed 32-bit integers. Horizontally add adjacent pairs of intermediate 32-bit integers, and pack the results in "dst".
//
//go:linkname MmMaddEpi16 MmMaddEpi16
//go:noescape
func MmMaddEpi16(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)
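
/*
Illustration only, not generated code: a scalar sketch of the multiply-add
described above, treating each 128-bit operand as eight signed 16-bit lanes and
the result as four signed 32-bit lanes. The helper name maddInt16 and the array
representation are assumptions made for this example.

```go
package main

import "fmt"

// maddInt16 multiplies corresponding 16-bit lanes into 32-bit products and
// sums each adjacent pair of products into one 32-bit result lane.
func maddInt16(a, b [8]int16) [4]int32 {
	var dst [4]int32
	for i := range dst {
		lo := int32(a[2*i]) * int32(b[2*i])
		hi := int32(a[2*i+1]) * int32(b[2*i+1])
		dst[i] = lo + hi
	}
	return dst
}

func main() {
	a := [8]int16{1, 2, 3, 4, 5, 6, 7, 8}
	b := [8]int16{1, 1, 1, 1, 1, 1, 1, 1}
	fmt.Println(maddInt16(a, b)) // [3 7 11 15]
}
```

This pattern is a common building block for integer dot products.
*/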

// Compare packed signed 16-bit integers in "a" and "b", and store packed maximum values in "dst".
//
//go:linkname MmMaxEpi16 MmMaxEpi16
//go:noescape
func MmMaxEpi16(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)

// Compare packed unsigned 8-bit integers in "a" and "b", and store packed maximum values in "dst".
//
//go:linkname MmMaxEpu8 MmMaxEpu8
//go:noescape
func MmMaxEpu8(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)

// Compare packed signed 16-bit integers in "a" and "b", and store packed minimum values in "dst".
//
//go:linkname MmMinEpi16 MmMinEpi16
//go:noescape
func MmMinEpi16(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)

// Compare packed unsigned 8-bit integers in "a" and "b", and store packed minimum values in "dst".
//
//go:linkname MmMinEpu8 MmMinEpu8
//go:noescape
func MmMinEpu8(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)

// Multiply the packed signed 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in "dst".
//
//go:linkname MmMulhiEpi16 MmMulhiEpi16
//go:noescape
func MmMulhiEpi16(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)

// Multiply the packed unsigned 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in "dst".
//
//go:linkname MmMulhiEpu16 MmMulhiEpu16
//go:noescape
func MmMulhiEpu16(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)

// Multiply the packed 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the low 16 bits of the intermediate integers in "dst".
//
//go:linkname MmMulloEpi16 MmMulloEpi16
//go:noescape
func MmMulloEpi16(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)

// Multiply the low unsigned 32-bit integers from each packed 64-bit element in "a" and "b", and store the unsigned 64-bit results in "dst".
//
//go:linkname MmMulEpu32 MmMulEpu32
//go:noescape
func MmMulEpu32(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)

// Compute the absolute differences of packed unsigned 8-bit integers in "a" and "b", then horizontally sum each consecutive 8 differences to produce two unsigned 16-bit integers, and pack these unsigned 16-bit integers in the low 16 bits of 64-bit elements in "dst".
//
//go:linkname MmSadEpu8 MmSadEpu8
//go:noescape
func MmSadEpu8(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)
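
/*
Illustration only, not generated code: a scalar sketch of the sum of absolute
differences described above. The 16 input bytes are split into two groups of
eight, and each group produces one sum stored in a 64-bit result lane. The
helper name sadUint8 and the array representation are assumptions made for
this example.

```go
package main

import "fmt"

// sadUint8 accumulates |a[i] - b[i]| for bytes 0..7 into dst[0] and for
// bytes 8..15 into dst[1]. Each sum is at most 8*255, so it fits in the low
// 16 bits of its 64-bit lane and the upper bits stay zero.
func sadUint8(a, b [16]uint8) [2]uint64 {
	var dst [2]uint64
	for i := 0; i < 16; i++ {
		d := int(a[i]) - int(b[i])
		if d < 0 {
			d = -d
		}
		dst[i/8] += uint64(d)
	}
	return dst
}

func main() {
	var a, b [16]uint8
	for i := range a {
		a[i] = 10
		b[i] = 7
	}
	fmt.Println(sadUint8(a, b)) // [24 24]
}
```
*/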

// Subtract packed 8-bit integers in "b" from packed 8-bit integers in "a", and store the results in "dst".
//
//go:linkname MmSubEpi8 MmSubEpi8
//go:noescape
func MmSubEpi8(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)

// Subtract packed 16-bit integers in "b" from packed 16-bit integers in "a", and store the results in "dst".
//
//go:linkname MmSubEpi16 MmSubEpi16
//go:noescape
func MmSubEpi16(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)

// Subtract packed 32-bit integers in "b" from packed 32-bit integers in "a", and store the results in "dst".
//
//go:linkname MmSubEpi32 MmSubEpi32
//go:noescape
func MmSubEpi32(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)

// Subtract packed 64-bit integers in "b" from packed 64-bit integers in "a", and store the results in "dst".
//
//go:linkname MmSubEpi64 MmSubEpi64
//go:noescape
func MmSubEpi64(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)

// Subtract packed signed 8-bit integers in "b" from packed signed 8-bit integers in "a" using saturation, and store the results in "dst".
//
//go:linkname MmSubsEpi8 MmSubsEpi8
//go:noescape
func MmSubsEpi8(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)

// Subtract packed signed 16-bit integers in "b" from packed signed 16-bit integers in "a" using saturation, and store the results in "dst".
//
//go:linkname MmSubsEpi16 MmSubsEpi16
//go:noescape
func MmSubsEpi16(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)

// Subtract packed unsigned 8-bit integers in "b" from packed unsigned 8-bit integers in "a" using saturation, and store the results in "dst".
//
//go:linkname MmSubsEpu8 MmSubsEpu8
//go:noescape
func MmSubsEpu8(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)

// Subtract packed unsigned 16-bit integers in "b" from packed unsigned 16-bit integers in "a" using saturation, and store the results in "dst".
//
//go:linkname MmSubsEpu16 MmSubsEpu16
//go:noescape
func MmSubsEpu16(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)

// Compute the bitwise AND of 128 bits (representing integer data) in "a" and "b", and store the result in "dst".
//
//go:linkname MmAndSi128 MmAndSi128
//go:noescape
func MmAndSi128(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)

// Compute the bitwise NOT of 128 bits (representing integer data) in "a" and then AND with "b", and store the result in "dst".
//
//go:linkname MmAndnotSi128 MmAndnotSi128
//go:noescape
func MmAndnotSi128(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)

// Compute the bitwise OR of 128 bits (representing integer data) in "a" and "b", and store the result in "dst".
//
//go:linkname MmOrSi128 MmOrSi128
//go:noescape
func MmOrSi128(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)

// Compute the bitwise XOR of 128 bits (representing integer data) in "a" and "b", and store the result in "dst".
//
//go:linkname MmXorSi128 MmXorSi128
//go:noescape
func MmXorSi128(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)

// Shift packed 16-bit integers in "a" left by "imm8" while shifting in zeros, and store the results in "dst".
//
//go:linkname MmSlliEpi16 MmSlliEpi16
//go:noescape
func MmSlliEpi16(r *x86.M128I, v0 *x86.M128I, v1 *x86.Int)

// Shift packed 16-bit integers in "a" left by "count" while shifting in zeros, and store the results in "dst".
//
//go:linkname MmSllEpi16 MmSllEpi16
//go:noescape
func MmSllEpi16(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)

// Shift packed 32-bit integers in "a" left by "imm8" while shifting in zeros, and store the results in "dst".
//
//go:linkname MmSlliEpi32 MmSlliEpi32
//go:noescape
func MmSlliEpi32(r *x86.M128I, v0 *x86.M128I, v1 *x86.Int)

// Shift packed 32-bit integers in "a" left by "count" while shifting in zeros, and store the results in "dst".
//
//go:linkname MmSllEpi32 MmSllEpi32
//go:noescape
func MmSllEpi32(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)

// Shift packed 64-bit integers in "a" left by "imm8" while shifting in zeros, and store the results in "dst".
//
//go:linkname MmSlliEpi64 MmSlliEpi64
//go:noescape
func MmSlliEpi64(r *x86.M128I, v0 *x86.M128I, v1 *x86.Int)

// Shift packed 64-bit integers in "a" left by "count" while shifting in zeros, and store the results in "dst".
//
//go:linkname MmSllEpi64 MmSllEpi64
//go:noescape
func MmSllEpi64(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)

// Shift packed 16-bit integers in "a" right by "imm8" while shifting in sign bits, and store the results in "dst".
//
//go:linkname MmSraiEpi16 MmSraiEpi16
//go:noescape
func MmSraiEpi16(r *x86.M128I, v0 *x86.M128I, v1 *x86.Int)

// Shift packed 16-bit integers in "a" right by "count" while shifting in sign bits, and store the results in "dst".
//
//go:linkname MmSraEpi16 MmSraEpi16
//go:noescape
func MmSraEpi16(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)

// Shift packed 32-bit integers in "a" right by "imm8" while shifting in sign bits, and store the results in "dst".
//
//go:linkname MmSraiEpi32 MmSraiEpi32
//go:noescape
func MmSraiEpi32(r *x86.M128I, v0 *x86.M128I, v1 *x86.Int)

// Shift packed 32-bit integers in "a" right by "count" while shifting in sign bits, and store the results in "dst".
//
//go:linkname MmSraEpi32 MmSraEpi32
//go:noescape
func MmSraEpi32(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)

// Shift packed 16-bit integers in "a" right by "imm8" while shifting in zeros, and store the results in "dst".
//
//go:linkname MmSrliEpi16 MmSrliEpi16
//go:noescape
func MmSrliEpi16(r *x86.M128I, v0 *x86.M128I, v1 *x86.Int)

// Shift packed 16-bit integers in "a" right by "count" while shifting in zeros, and store the results in "dst".
//
//go:linkname MmSrlEpi16 MmSrlEpi16
//go:noescape
func MmSrlEpi16(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)

// Shift packed 32-bit integers in "a" right by "imm8" while shifting in zeros, and store the results in "dst".
//
//go:linkname MmSrliEpi32 MmSrliEpi32
//go:noescape
func MmSrliEpi32(r *x86.M128I, v0 *x86.M128I, v1 *x86.Int)

// Shift packed 32-bit integers in "a" right by "count" while shifting in zeros, and store the results in "dst".
//
//go:linkname MmSrlEpi32 MmSrlEpi32
//go:noescape
func MmSrlEpi32(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)

// Shift packed 64-bit integers in "a" right by "imm8" while shifting in zeros, and store the results in "dst".
//
//go:linkname MmSrliEpi64 MmSrliEpi64
//go:noescape
func MmSrliEpi64(r *x86.M128I, v0 *x86.M128I, v1 *x86.Int)

// Shift packed 64-bit integers in "a" right by "count" while shifting in zeros, and store the results in "dst".
//
//go:linkname MmSrlEpi64 MmSrlEpi64
//go:noescape
func MmSrlEpi64(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)

// Compare packed 8-bit integers in "a" and "b" for equality, and store the results in "dst".
//
//go:linkname MmCmpeqEpi8 MmCmpeqEpi8
//go:noescape
func MmCmpeqEpi8(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)

// Compare packed 16-bit integers in "a" and "b" for equality, and store the results in "dst".
//
//go:linkname MmCmpeqEpi16 MmCmpeqEpi16
//go:noescape
func MmCmpeqEpi16(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)

// Compare packed 32-bit integers in "a" and "b" for equality, and store the results in "dst".
//
//go:linkname MmCmpeqEpi32 MmCmpeqEpi32
//go:noescape
func MmCmpeqEpi32(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)

// Compare packed signed 8-bit integers in "a" and "b" for greater-than, and store the results in "dst".
//
//go:linkname MmCmpgtEpi8 MmCmpgtEpi8
//go:noescape
func MmCmpgtEpi8(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)

// Compare packed signed 16-bit integers in "a" and "b" for greater-than, and store the results in "dst".
//
//go:linkname MmCmpgtEpi16 MmCmpgtEpi16
//go:noescape
func MmCmpgtEpi16(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)

// Compare packed signed 32-bit integers in "a" and "b" for greater-than, and store the results in "dst".
//
//go:linkname MmCmpgtEpi32 MmCmpgtEpi32
//go:noescape
func MmCmpgtEpi32(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)

// Compare packed signed 8-bit integers in "a" and "b" for less-than, and store the results in "dst". Note: This intrinsic emits the pcmpgtb instruction with the order of the operands switched.
//
//go:linkname MmCmpltEpi8 MmCmpltEpi8
//go:noescape
func MmCmpltEpi8(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)

// Compare packed signed 16-bit integers in "a" and "b" for less-than, and store the results in "dst". Note: This intrinsic emits the pcmpgtw instruction with the order of the operands switched.
//
//go:linkname MmCmpltEpi16 MmCmpltEpi16
//go:noescape
func MmCmpltEpi16(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)

// Compare packed signed 32-bit integers in "a" and "b" for less-than, and store the results in "dst". Note: This intrinsic emits the pcmpgtd instruction with the order of the operands switched.
//
//go:linkname MmCmpltEpi32 MmCmpltEpi32
//go:noescape
func MmCmpltEpi32(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)

// Convert the signed 64-bit integer "b" to a double-precision (64-bit) floating-point element, store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".
//
//go:linkname MmCvtsi64Sd MmCvtsi64Sd
//go:noescape
func MmCvtsi64Sd(r *x86.M128D, v0 *x86.M128D, v1 *x86.Longlong)

// Convert the lower double-precision (64-bit) floating-point element in "a" to a 64-bit integer, and store the result in "dst".
//
//go:linkname MmCvtsdSi64 MmCvtsdSi64
//go:noescape
func MmCvtsdSi64(r *x86.Longlong, v0 *x86.M128D)

// Convert the lower double-precision (64-bit) floating-point element in "a" to a 64-bit integer with truncation, and store the result in "dst".
//
//go:linkname MmCvttsdSi64 MmCvttsdSi64
//go:noescape
func MmCvttsdSi64(r *x86.Longlong, v0 *x86.M128D)

// Convert packed signed 32-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst".
//
//go:linkname MmCvtepi32Ps MmCvtepi32Ps
//go:noescape
func MmCvtepi32Ps(r *x86.M128, v0 *x86.M128I)

// Convert packed single-precision (32-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst".
//
//go:linkname MmCvtpsEpi32 MmCvtpsEpi32
//go:noescape
func MmCvtpsEpi32(r *x86.M128I, v0 *x86.M128)

// Convert packed single-precision (32-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst".
//
//go:linkname MmCvttpsEpi32 MmCvttpsEpi32
//go:noescape
func MmCvttpsEpi32(r *x86.M128I, v0 *x86.M128)

// Copy 32-bit integer "a" to the lowest 32-bit element of "dst", and zero the upper elements of "dst".
//
//go:linkname MmCvtsi32Si128 MmCvtsi32Si128
//go:noescape
func MmCvtsi32Si128(r *x86.M128I, v0 *x86.Int)

// Copy 64-bit integer "a" to the lower element of "dst", and zero the upper element.
//
//go:linkname MmCvtsi64Si128 MmCvtsi64Si128
//go:noescape
func MmCvtsi64Si128(r *x86.M128I, v0 *x86.Longlong)

// Copy the lower 32-bit integer in "a" to "dst".
//
//go:linkname MmCvtsi128Si32 MmCvtsi128Si32
//go:noescape
func MmCvtsi128Si32(r *x86.Int, v0 *x86.M128I)

// Copy the lower 64-bit integer in "a" to "dst".
//
//go:linkname MmCvtsi128Si64 MmCvtsi128Si64
//go:noescape
func MmCvtsi128Si64(r *x86.Longlong, v0 *x86.M128I)

// Return vector of type __m128i with undefined elements.
//
//go:linkname MmUndefinedSi128 MmUndefinedSi128
//go:noescape
func MmUndefinedSi128(r *x86.M128I)

// Set packed 64-bit integers in "dst" with the supplied values.
//
//go:linkname MmSetEpi64X MmSetEpi64X
//go:noescape
func MmSetEpi64X(r *x86.M128I, v0 *x86.Longlong, v1 *x86.Longlong)

// Set packed 64-bit integers in "dst" with the supplied values.
//
//go:linkname MmSetEpi64 MmSetEpi64
//go:noescape
func MmSetEpi64(r *x86.M128I, v0 *x86.M64, v1 *x86.M64)

// Set packed 32-bit integers in "dst" with the supplied values.
//
//go:linkname MmSetEpi32 MmSetEpi32
//go:noescape
func MmSetEpi32(r *x86.M128I, v0 *x86.Int, v1 *x86.Int, v2 *x86.Int, v3 *x86.Int)

// Set packed 16-bit integers in "dst" with the supplied values.
//
//go:linkname MmSetEpi16 MmSetEpi16
//go:noescape
func MmSetEpi16(r *x86.M128I, v0 *x86.Short, v1 *x86.Short, v2 *x86.Short, v3 *x86.Short, v4 *x86.Short, v5 *x86.Short, v6 *x86.Short, v7 *x86.Short)

// Set packed 8-bit integers in "dst" with the supplied values.
//
//go:linkname MmSetEpi8 MmSetEpi8
//go:noescape
func MmSetEpi8(r *x86.M128I, v0 *x86.Char, v1 *x86.Char, v2 *x86.Char, v3 *x86.Char, v4 *x86.Char, v5 *x86.Char, v6 *x86.Char, v7 *x86.Char, v8 *x86.Char, v9 *x86.Char, v10 *x86.Char, v11 *x86.Char, v12 *x86.Char, v13 *x86.Char, v14 *x86.Char, v15 *x86.Char)

// Broadcast 64-bit integer "a" to all elements of "dst". This intrinsic may generate "vpbroadcastq".
//
//go:linkname MmSet1Epi64X MmSet1Epi64X
//go:noescape
func MmSet1Epi64X(r *x86.M128I, v0 *x86.Longlong)

// Broadcast 64-bit integer "a" to all elements of "dst".
//
//go:linkname MmSet1Epi64 MmSet1Epi64
//go:noescape
func MmSet1Epi64(r *x86.M128I, v0 *x86.M64)

// Broadcast 32-bit integer "a" to all elements of "dst". This intrinsic may generate "vpbroadcastd".
//
//go:linkname MmSet1Epi32 MmSet1Epi32
//go:noescape
func MmSet1Epi32(r *x86.M128I, v0 *x86.Int)

// Broadcast 16-bit integer "a" to all elements of "dst". This intrinsic may generate "vpbroadcastw".
//
//go:linkname MmSet1Epi16 MmSet1Epi16
//go:noescape
func MmSet1Epi16(r *x86.M128I, v0 *x86.Short)

// Broadcast 8-bit integer "a" to all elements of "dst". This intrinsic may generate "vpbroadcastb".
//
//go:linkname MmSet1Epi8 MmSet1Epi8
//go:noescape
func MmSet1Epi8(r *x86.M128I, v0 *x86.Char)

// Set packed 64-bit integers in "dst" with the supplied values in reverse order.
//
//go:linkname MmSetrEpi64 MmSetrEpi64
//go:noescape
func MmSetrEpi64(r *x86.M128I, v0 *x86.M64, v1 *x86.M64)

// Set packed 32-bit integers in "dst" with the supplied values in reverse order.
//
//go:linkname MmSetrEpi32 MmSetrEpi32
//go:noescape
func MmSetrEpi32(r *x86.M128I, v0 *x86.Int, v1 *x86.Int, v2 *x86.Int, v3 *x86.Int)

// Set packed 16-bit integers in "dst" with the supplied values in reverse order.
//
//go:linkname MmSetrEpi16 MmSetrEpi16
//go:noescape
func MmSetrEpi16(r *x86.M128I, v0 *x86.Short, v1 *x86.Short, v2 *x86.Short, v3 *x86.Short, v4 *x86.Short, v5 *x86.Short, v6 *x86.Short, v7 *x86.Short)

// Set packed 8-bit integers in "dst" with the supplied values in reverse order.
//
//go:linkname MmSetrEpi8 MmSetrEpi8
//go:noescape
func MmSetrEpi8(r *x86.M128I, v0 *x86.Char, v1 *x86.Char, v2 *x86.Char, v3 *x86.Char, v4 *x86.Char, v5 *x86.Char, v6 *x86.Char, v7 *x86.Char, v8 *x86.Char, v9 *x86.Char, v10 *x86.Char, v11 *x86.Char, v12 *x86.Char, v13 *x86.Char, v14 *x86.Char, v15 *x86.Char)

// Return vector of type __m128i with all elements set to zero.
//
//go:linkname MmSetzeroSi128 MmSetzeroSi128
//go:noescape
func MmSetzeroSi128(r *x86.M128I)

// Convert packed signed 16-bit integers from "a" and "b" to packed 8-bit integers using signed saturation, and store the results in "dst".
//
//go:linkname MmPacksEpi16 MmPacksEpi16
//go:noescape
func MmPacksEpi16(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)

// Convert packed signed 32-bit integers from "a" and "b" to packed 16-bit integers using signed saturation, and store the results in "dst".
//
//go:linkname MmPacksEpi32 MmPacksEpi32
//go:noescape
func MmPacksEpi32(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)

// Convert packed signed 16-bit integers from "a" and "b" to packed 8-bit integers using unsigned saturation, and store the results in "dst".
//
//go:linkname MmPackusEpi16 MmPackusEpi16
//go:noescape
func MmPackusEpi16(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)

// Create mask from the most significant bit of each 8-bit element in "a", and store the result in "dst".
//
//go:linkname MmMovemaskEpi8 MmMovemaskEpi8
//go:noescape
func MmMovemaskEpi8(r *x86.Int, v0 *x86.M128I)
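
/*
Illustration only, not generated code: a scalar sketch of the mask creation
described above. Bit i of the result is the most significant bit of byte i, so
a 16-byte vector yields a 16-bit mask. The helper name movemaskUint8 and the
array representation are assumptions made for this example.

```go
package main

import "fmt"

// movemaskUint8 collects the top bit of every byte into one integer, with
// byte 0 mapped to bit 0 and byte 15 mapped to bit 15.
func movemaskUint8(a [16]uint8) int32 {
	var mask int32
	for i, v := range a {
		mask |= int32(v>>7) << uint(i)
	}
	return mask
}

func main() {
	var v [16]uint8
	v[0] = 0x80
	v[15] = 0xFF
	fmt.Printf("%#x\n", movemaskUint8(v)) // 0x8001
}
```

Combined with a byte-wise comparison, whose result lanes are all ones or all
zeros, the mask tells which positions matched.
*/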

// Unpack and interleave 8-bit integers from the high half of "a" and "b", and store the results in "dst".
//
//go:linkname MmUnpackhiEpi8 MmUnpackhiEpi8
//go:noescape
func MmUnpackhiEpi8(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)

// Unpack and interleave 16-bit integers from the high half of "a" and "b", and store the results in "dst".
//
//go:linkname MmUnpackhiEpi16 MmUnpackhiEpi16
//go:noescape
func MmUnpackhiEpi16(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)

// Unpack and interleave 32-bit integers from the high half of "a" and "b", and store the results in "dst".
//
//go:linkname MmUnpackhiEpi32 MmUnpackhiEpi32
//go:noescape
func MmUnpackhiEpi32(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)

// Unpack and interleave 64-bit integers from the high half of "a" and "b", and store the results in "dst".
//
//go:linkname MmUnpackhiEpi64 MmUnpackhiEpi64
//go:noescape
func MmUnpackhiEpi64(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)

// Unpack and interleave 8-bit integers from the low half of "a" and "b", and store the results in "dst".
//
//go:linkname MmUnpackloEpi8 MmUnpackloEpi8
//go:noescape
func MmUnpackloEpi8(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)

// Unpack and interleave 16-bit integers from the low half of "a" and "b", and store the results in "dst".
//
//go:linkname MmUnpackloEpi16 MmUnpackloEpi16
//go:noescape
func MmUnpackloEpi16(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)

// Unpack and interleave 32-bit integers from the low half of "a" and "b", and store the results in "dst".
//
//go:linkname MmUnpackloEpi32 MmUnpackloEpi32
//go:noescape
func MmUnpackloEpi32(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)

// Unpack and interleave 64-bit integers from the low half of "a" and "b", and store the results in "dst".
//
//go:linkname MmUnpackloEpi64 MmUnpackloEpi64
//go:noescape
func MmUnpackloEpi64(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)

// Copy the lower 64-bit integer in "a" to "dst".
//
//go:linkname MmMovepi64Pi64 MmMovepi64Pi64
//go:noescape
func MmMovepi64Pi64(r *x86.M64, v0 *x86.M128I)

// Copy the 64-bit integer "a" to the lower element of "dst", and zero the upper element.
//
//go:linkname MmMovpi64Epi64 MmMovpi64Epi64
//go:noescape
func MmMovpi64Epi64(r *x86.M128I, v0 *x86.M64)

// Copy the lower 64-bit integer in "a" to the lower element of "dst", and zero the upper element.
//
//go:linkname MmMoveEpi64 MmMoveEpi64
//go:noescape
func MmMoveEpi64(r *x86.M128I, v0 *x86.M128I)

// Unpack and interleave double-precision (64-bit) floating-point elements from the high half of "a" and "b", and store the results in "dst".
//
//go:linkname MmUnpackhiPd MmUnpackhiPd
//go:noescape
func MmUnpackhiPd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D)

// Unpack and interleave double-precision (64-bit) floating-point elements from the low half of "a" and "b", and store the results in "dst".
//
//go:linkname MmUnpackloPd MmUnpackloPd
//go:noescape
func MmUnpackloPd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D)

// Set each bit of mask "dst" based on the most significant bit of the corresponding packed double-precision (64-bit) floating-point element in "a".
//
//go:linkname MmMovemaskPd MmMovemaskPd
//go:noescape
func MmMovemaskPd(r *x86.Int, v0 *x86.M128D)

// Cast vector of type __m128d to type __m128. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
//
//go:linkname MmCastpdPs MmCastpdPs
//go:noescape
func MmCastpdPs(r *x86.M128, v0 *x86.M128D)

// Cast vector of type __m128d to type __m128i. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
//
//go:linkname MmCastpdSi128 MmCastpdSi128
//go:noescape
func MmCastpdSi128(r *x86.M128I, v0 *x86.M128D)

// Cast vector of type __m128 to type __m128d. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
//
//go:linkname MmCastpsPd MmCastpsPd
//go:noescape
func MmCastpsPd(r *x86.M128D, v0 *x86.M128)

// Cast vector of type __m128 to type __m128i. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
//
//go:linkname MmCastpsSi128 MmCastpsSi128
//go:noescape
func MmCastpsSi128(r *x86.M128I, v0 *x86.M128)

// Cast vector of type __m128i to type __m128. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
//
//go:linkname MmCastsi128Ps MmCastsi128Ps
//go:noescape
func MmCastsi128Ps(r *x86.M128, v0 *x86.M128I)

// Cast vector of type __m128i to type __m128d. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
//
//go:linkname MmCastsi128Pd MmCastsi128Pd
//go:noescape
func MmCastsi128Pd(r *x86.M128D, v0 *x86.M128I)


================================================
FILE: x86/sse3/functions.c
================================================
#include <immintrin.h>

void MmAddsubPs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_addsub_ps(*v0, *v1); }
void MmHaddPs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_hadd_ps(*v0, *v1); }
void MmHsubPs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_hsub_ps(*v0, *v1); }
void MmMovehdupPs(__m128* r, __m128* v0) { *r = _mm_movehdup_ps(*v0); }
void MmMoveldupPs(__m128* r, __m128* v0) { *r = _mm_moveldup_ps(*v0); }
void MmAddsubPd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_addsub_pd(*v0, *v1); }
void MmHaddPd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_hadd_pd(*v0, *v1); }
void MmHsubPd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_hsub_pd(*v0, *v1); }
void MmMovedupPd(__m128d* r, __m128d* v0) { *r = _mm_movedup_pd(*v0); }
void MmMwait(unsigned* v0, unsigned* v1) { _mm_mwait(*v0, *v1); }


================================================
FILE: x86/sse3/functions.go
================================================
package sse3

import (
	"github.com/alivanz/go-simd/x86"
)

/*
#cgo CFLAGS: -msse3
#include <immintrin.h>
*/
import "C"

// Alternately subtract and add packed single-precision (32-bit) floating-point elements in "b" from/to packed elements in "a" (even-indexed elements are subtracted, odd-indexed elements are added), and store the results in "dst".
//
//go:linkname MmAddsubPs MmAddsubPs
//go:noescape
func MmAddsubPs(r *x86.M128, v0 *x86.M128, v1 *x86.M128)
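
/*
Illustration only, not generated code: a scalar sketch of the lane pattern used
by the add/subtract operation above, following Intel's description of the
ADDSUBPS instruction. The helper name addsubFloat32 and the array
representation are assumptions made for this example.

```go
package main

import "fmt"

// addsubFloat32 subtracts b from a in even-indexed lanes and adds b to a in
// odd-indexed lanes.
func addsubFloat32(a, b [4]float32) [4]float32 {
	var dst [4]float32
	for i := range dst {
		if i%2 == 0 {
			dst[i] = a[i] - b[i]
		} else {
			dst[i] = a[i] + b[i]
		}
	}
	return dst
}

func main() {
	a := [4]float32{1, 2, 3, 4}
	b := [4]float32{10, 20, 30, 40}
	fmt.Println(addsubFloat32(a, b)) // [-9 22 -27 44]
}
```

The pattern is typically used for complex multiplication, where real parts are
subtracted and imaginary parts are added.
*/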

// Horizontally add adjacent pairs of single-precision (32-bit) floating-point elements in "a" and "b", and pack the results in "dst".
//
//go:linkname MmHaddPs MmHaddPs
//go:noescape
func MmHaddPs(r *x86.M128, v0 *x86.M128, v1 *x86.M128)

// Horizontally subtract adjacent pairs of single-precision (32-bit) floating-point elements in "a" and "b", and pack the results in "dst".
//
//go:linkname MmHsubPs MmHsubPs
//go:noescape
func MmHsubPs(r *x86.M128, v0 *x86.M128, v1 *x86.M128)
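
/*
Illustration only, not generated code: a scalar sketch of the horizontal add
and horizontal subtract on four single-precision lanes. The low half of the
result comes from adjacent pairs of "a" and the high half from adjacent pairs
of "b". The helper names haddFloat32 and hsubFloat32 and the array
representation are assumptions made for this example.

```go
package main

import "fmt"

// haddFloat32 sums each adjacent pair within a, then within b.
func haddFloat32(a, b [4]float32) [4]float32 {
	return [4]float32{a[0] + a[1], a[2] + a[3], b[0] + b[1], b[2] + b[3]}
}

// hsubFloat32 subtracts the odd-indexed element of each pair from the
// even-indexed one, first within a, then within b.
func hsubFloat32(a, b [4]float32) [4]float32 {
	return [4]float32{a[0] - a[1], a[2] - a[3], b[0] - b[1], b[2] - b[3]}
}

func main() {
	a := [4]float32{1, 2, 3, 4}
	b := [4]float32{10, 20, 30, 40}
	fmt.Println(haddFloat32(a, b)) // [3 7 30 70]
	fmt.Println(hsubFloat32(a, b)) // [-1 -1 -10 -10]
}
```
*/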

// Duplicate odd-indexed single-precision (32-bit) floating-point elements from "a", and store the results in "dst".
//
//go:linkname MmMovehdupPs MmMovehdupPs
//go:noescape
func MmMovehdupPs(r *x86.M128, v0 *x86.M128)

// Duplicate even-indexed single-precision (32-bit) floating-point elements from "a", and store the results in "dst".
//
//go:linkname MmMoveldupPs MmMoveldupPs
//go:noescape
func MmMoveldupPs(r *x86.M128, v0 *x86.M128)

// Alternately subtract and add packed double-precision (64-bit) floating-point elements in "b" from/to packed elements in "a" (the lower element is subtracted, the upper element is added), and store the results in "dst".
//
//go:linkname MmAddsubPd MmAddsubPd
//go:noescape
func MmAddsubPd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D)

// Horizontally add adjacent pairs of double-precision (64-bit) floating-point elements in "a" and "b", and pack the results in "dst".
//
//go:linkname MmHaddPd MmHaddPd
//go:noescape
func MmHaddPd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D)

// Horizontally subtract adjacent pairs of double-precision (64-bit) floating-point elements in "a" and "b", and pack the results in "dst".
//
//go:linkname MmHsubPd MmHsubPd
//go:noescape
func MmHsubPd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D)

// Duplicate the low double-precision (64-bit) floating-point element from "a", and store the results in "dst".
//
//go:linkname MmMovedupPd MmMovedupPd
//go:noescape
func MmMovedupPd(r *x86.M128D, v0 *x86.M128D)

// Hint to the processor that it can enter an implementation-dependent-optimized state while waiting for an event or store operation to the address range specified by MONITOR.
//
//go:linkname MmMwait MmMwait
//go:noescape
func MmMwait(v0 *x86.Unsigned, v1 *x86.Unsigned)


================================================
FILE: x86/ssse3/functions.c
================================================
#include <immintrin.h>

void MmAbsEpi8(__m128i* r, __m128i* v0) { *r = _mm_abs_epi8(*v0); }
void MmAbsEpi16(__m128i* r, __m128i* v0) { *r = _mm_abs_epi16(*v0); }
void MmAbsEpi32(__m128i* r, __m128i* v0) { *r = _mm_abs_epi32(*v0); }
void MmHaddEpi16(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_hadd_epi16(*v0, *v1); }
void MmHaddEpi32(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_hadd_epi32(*v0, *v1); }
void MmHaddsEpi16(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_hadds_epi16(*v0, *v1); }
void MmHsubEpi16(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_hsub_epi16(*v0, *v1); }
void MmHsubEpi32(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_hsub_epi32(*v0, *v1); }
void MmHsubsEpi16(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_hsubs_epi16(*v0, *v1); }
void MmMaddubsEpi16(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_maddubs_epi16(*v0, *v1); }
void MmMulhrsEpi16(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_mulhrs_epi16(*v0, *v1); }
void MmShuffleEpi8(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_shuffle_epi8(*v0, *v1); }
void MmSignEpi8(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_sign_epi8(*v0, *v1); }
void MmSignEpi16(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_sign_epi16(*v0, *v1); }
void MmSignEpi32(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_sign_epi32(*v0, *v1); }


================================================
FILE: x86/ssse3/functions.go
================================================
package ssse3

import (
	"github.com/alivanz/go-simd/x86"
)

/*
#cgo CFLAGS: -mssse3
#include <immintrin.h>
*/
import "C"

// Compute the absolute value of packed signed 8-bit integers in "a", and store the unsigned results in "dst".
//
//go:linkname MmAbsEpi8 MmAbsEpi8
//go:noescape
func MmAbsEpi8(r *x86.M128I, v0 *x86.M128I)

// Compute the absolute value of packed signed 16-bit integers in "a", and store the unsigned results in "dst".
//
//go:linkname MmAbsEpi16 MmAbsEpi16
//go:noescape
func MmAbsEpi16(r *x86.M128I, v0 *x86.M128I)

// Compute the absolute value of packed signed 32-bit integers in "a", and store the unsigned results in "dst".
//
//go:linkname MmAbsEpi32 MmAbsEpi32
//go:noescape
func MmAbsEpi32(r *x86.M128I, v0 *x86.M128I)

// Horizontally add adjacent pairs of 16-bit integers in "a" and "b", and pack the signed 16-bit results in "dst".
//
//go:linkname MmHaddEpi16 MmHaddEpi16
//go:noescape
func MmHaddEpi16(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)

// Horizontally add adjacent pairs of 32-bit integers in "a" and "b", and pack the signed 32-bit results in "dst".
//
//go:linkname MmHaddEpi32 MmHaddEpi32
//go:noescape
func MmHaddEpi32(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)

// Horizontally add adjacent pairs of signed 16-bit integers in "a" and "b" using saturation, and pack the signed 16-bit results in "dst".
//
//go:linkname MmHaddsEpi16 MmHaddsEpi16
//go:noescape
func MmHaddsEpi16(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)

// Horizontally subtract adjacent pairs of 16-bit integers in "a" and "b", and pack the signed 16-bit results in "dst".
//
//go:linkname MmHsubEpi16 MmHsubEpi16
//go:noescape
func MmHsubEpi16(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)

// Horizontally subtract adjacent pairs of 32-bit integers in "a" and "b", and pack the signed 32-bit results in "dst".
//
//go:linkname MmHsubEpi32 MmHsubEpi32
//go:noescape
func MmHsubEpi32(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)

// Horizontally subtract adjacent pairs of signed 16-bit integers in "a" and "b" using saturation, and pack the signed 16-bit results in "dst".
//
//go:linkname MmHsubsEpi16 MmHsubsEpi16
//go:noescape
func MmHsubsEpi16(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)

// Vertically multiply each unsigned 8-bit integer from "a" with the corresponding signed 8-bit integer from "b", producing intermediate signed 16-bit integers. Horizontally add adjacent pairs of intermediate signed 16-bit integers, and pack the saturated results in "dst".
//
//go:linkname MmMaddubsEpi16 MmMaddubsEpi16
//go:noescape
func MmMaddubsEpi16(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)

// Multiply packed signed 16-bit integers in "a" and "b", producing intermediate signed 32-bit integers. Truncate each intermediate integer to the 18 most significant bits, round by adding 1, and store bits [16:1] to "dst".
//
//go:linkname MmMulhrsEpi16 MmMulhrsEpi16
//go:noescape
func MmMulhrsEpi16(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)

// Shuffle packed 8-bit integers in "a" according to shuffle control mask in the corresponding 8-bit element of "b", and store the results in "dst".
//
//go:linkname MmShuffleEpi8 MmShuffleEpi8
//go:noescape
func MmShuffleEpi8(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)

// Negate packed 8-bit integers in "a" when the corresponding signed 8-bit integer in "b" is negative, and store the results in "dst". Elements in "dst" are zeroed out when the corresponding element in "b" is zero.
//
//go:linkname MmSignEpi8 MmSignEpi8
//go:noescape
func MmSignEpi8(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)

// Negate packed 16-bit integers in "a" when the corresponding signed 16-bit integer in "b" is negative, and store the results in "dst". Elements in "dst" are zeroed out when the corresponding element in "b" is zero.
//
//go:linkname MmSignEpi16 MmSignEpi16
//go:noescape
func MmSignEpi16(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)

// Negate packed 32-bit integers in "a" when the corresponding signed 32-bit integer in "b" is negative, and store the results in "dst". Elements in "dst" are zeroed out when the corresponding element in "b" is zero.
//
//go:linkname MmSignEpi32 MmSignEpi32
//go:noescape
func MmSignEpi32(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)
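
// The doc comments above describe lane-wise arithmetic that is easy to misread
// (saturation, rounding, operand order). The standalone program below is a
// minimal scalar sketch of those descriptions, assuming plain Go arrays in place
// of x86.M128I. The names saturate16, haddRef, maddubsRef, mulhrsRef, signRef
// and shuffleRef are hypothetical helpers for illustration only; they are not
// part of this package and are not how the package computes its results.

```go
package main

import "fmt"

// saturate16 clamps a 32-bit value to the int16 range.
func saturate16(x int32) int16 {
	if x > 32767 {
		return 32767
	}
	if x < -32768 {
		return -32768
	}
	return int16(x)
}

// haddRef models the MmHaddEpi16 description: adjacent pairs of "a" fill the
// low four lanes, adjacent pairs of "b" fill the high four (wrapping add).
func haddRef(a, b [8]int16) [8]int16 {
	var dst [8]int16
	for i := 0; i < 4; i++ {
		dst[i] = a[2*i] + a[2*i+1]
		dst[i+4] = b[2*i] + b[2*i+1]
	}
	return dst
}

// maddubsRef models MmMaddubsEpi16: unsigned bytes of "a" times signed bytes
// of "b", with adjacent products summed using signed saturation.
func maddubsRef(a [16]uint8, b [16]int8) [8]int16 {
	var dst [8]int16
	for i := range dst {
		lo := int32(a[2*i]) * int32(b[2*i])
		hi := int32(a[2*i+1]) * int32(b[2*i+1])
		dst[i] = saturate16(lo + hi)
	}
	return dst
}

// mulhrsRef models MmMulhrsEpi16: keep the top 18 bits of the 32-bit product,
// add 1 to round, then keep bits [16:1] of that value.
func mulhrsRef(a, b [8]int16) [8]int16 {
	var dst [8]int16
	for i := range dst {
		p := int32(a[i]) * int32(b[i])
		dst[i] = int16(((p >> 14) + 1) >> 1)
	}
	return dst
}

// signRef models MmSignEpi16: negate, zero, or pass through each lane of "a"
// depending on the sign of the matching lane of "b".
func signRef(a, b [8]int16) [8]int16 {
	var dst [8]int16
	for i := range dst {
		switch {
		case b[i] < 0:
			dst[i] = -a[i]
		case b[i] == 0:
			dst[i] = 0
		default:
			dst[i] = a[i]
		}
	}
	return dst
}

// shuffleRef models MmShuffleEpi8: a set high bit in the control byte zeroes
// the lane, otherwise the low four bits select a byte of "a".
func shuffleRef(a, b [16]uint8) [16]uint8 {
	var dst [16]uint8
	for i := range dst {
		if b[i]&0x80 == 0 {
			dst[i] = a[b[i]&0x0F]
		}
	}
	return dst
}

func main() {
	a := [8]int16{1, 2, 3, 4, 5, 6, 7, 8}
	b := [8]int16{-1, 0, 1, -1, 0, 1, -1, 0}
	fmt.Println("hadd:  ", haddRef(a, a))
	fmt.Println("sign:  ", signRef(a, b))
	fmt.Println("mulhrs:", mulhrsRef([8]int16{16384}, [8]int16{16384}))
}
```

// A scalar model like this can serve as an oracle in tests that compare the
// intrinsic wrappers against expected lane values.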


================================================
FILE: x86/types.go
================================================
package x86

/*
#include <immintrin.h>
*/
import "C"

// typedef long long __m64 __attribute__((__vector_size__(8), __aligned__(8)));
type M64 = C.__m64

// typedef float __m128 __attribute__((__vector_size__(16), __aligned__(16)));
type M128 = C.__m128

// typedef double __m128d __attribute__((__vector_size__(16), __aligned__(16)));
type M128D = C.__m128d

// typedef long long __m128i __attribute__((__vector_size__(16), __aligned__(16)));
type M128I = C.__m128i

// typedef double __m256d __attribute__((__vector_size__(32), __aligned__(32)));
type M256D = C.__m256d

// typedef long long __m256i __attribute__((__vector_size__(32), __aligned__(32)));
type M256I = C.__m256i

// uint
type Uint = C.uint

// uchar
type Uchar = C.uchar

// ushort
type Ushort = C.ushort

// ulonglong
type Ulonglong = C.ulonglong

// int
type Int = C.int

// longlong
type Longlong = C.longlong

// short
type Short = C.short

// char
type Char = C.char

// float
type Float = C.float

// double
type Double = C.double

// unsigned
type Unsigned = C.unsigned

// __m256
type M256 = C.__m256
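
// The 128-bit vector types above are 16 bytes wide, and each intrinsic decides
// how those bytes are split into lanes (sixteen 8-bit, eight 16-bit, four
// 32-bit, and so on). The standalone sketch below illustrates that idea with a
// plain byte array, assuming x86's little-endian byte order. The names vec128,
// lanes16 and lanes32 are hypothetical and are not part of this package.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// vec128 is a hypothetical stand-in for the raw bytes of a 128-bit vector.
type vec128 [16]byte

// lanes16 reads the vector as eight little-endian signed 16-bit lanes.
func lanes16(v vec128) [8]int16 {
	var out [8]int16
	for i := range out {
		out[i] = int16(binary.LittleEndian.Uint16(v[2*i:]))
	}
	return out
}

// lanes32 reads the same bytes as four little-endian signed 32-bit lanes.
func lanes32(v vec128) [4]int32 {
	var out [4]int32
	for i := range out {
		out[i] = int32(binary.LittleEndian.Uint32(v[4*i:]))
	}
	return out
}

func main() {
	var v vec128
	for i := range v {
		v[i] = byte(i)
	}
	fmt.Println("as 8 x int16:", lanes16(v))
	fmt.Println("as 4 x int32:", lanes32(v))
}
```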