ADD (to array, array and multiple vectors): Multi-vector accumulate to ZA array vectors.
ADD (to array, multiple and single vector): Multi-vector add by vector to ZA array vectors.
ADD (to array, multiple vectors): Multi-vector add to ZA array vectors.
ADD (to vector, multiple vectors): Multi-vector add by vector to multi-vector.
ADDHA: Add horizontally vector elements to ZA tile.
ADDSPL: Add multiple of Streaming SVE predicate register size to scalar register.
ADDSVL: Add multiple of Streaming SVE vector register size to scalar register.
ADDVA: Add vertically vector elements to ZA tile.
BF1CVT, BF2CVT: Multi-vector 8-bit floating-point convert to BFloat16.
BF1CVTL, BF2CVTL: Multi-vector 8-bit floating-point convert to deinterleaved BFloat16.
BFADD: Multi-vector BFloat16 accumulate to ZA array vectors.
BFCLAMP: Multi-vector BFloat16 clamp to minimum/maximum number.
BFCVT (BFloat16 to 8-bit floating-point): Multi-vector BFloat16 convert to 8-bit floating-point.
BFCVT (single-precision to BFloat16): Multi-vector single-precision convert to BFloat16.
BFCVTN: Multi-vector single-precision convert to interleaved BFloat16.
BFDOT (multiple and indexed vector): Multi-vector BFloat16 dot product by indexed element to single-precision.
BFDOT (multiple and single vector): Multi-vector BFloat16 dot product by vector to single-precision.
BFDOT (multiple vectors): Multi-vector BFloat16 dot product to single-precision.
BFMAX (multiple and single vector): Multi-vector BFloat16 maximum by vector.
BFMAX (multiple vectors): Multi-vector BFloat16 maximum.
BFMAXNM (multiple and single vector): Multi-vector BFloat16 maximum number by vector.
BFMAXNM (multiple vectors): Multi-vector BFloat16 maximum number.
BFMIN (multiple and single vector): Multi-vector BFloat16 minimum by vector.
BFMIN (multiple vectors): Multi-vector BFloat16 minimum.
BFMINNM (multiple and single vector): Multi-vector BFloat16 minimum number by vector.
BFMINNM (multiple vectors): Multi-vector BFloat16 minimum number.
BFMLA (multiple and indexed vector): Multi-vector BFloat16 fused multiply-add by indexed element.
BFMLA (multiple and single vector): Multi-vector BFloat16 fused multiply-add by vector.
BFMLA (multiple vectors): Multi-vector BFloat16 fused multiply-add.
BFMLAL (multiple and indexed vector): Multi-vector BFloat16 multiply-add by indexed element to single-precision.
BFMLAL (multiple and single vector): Multi-vector BFloat16 multiply-add by vector to single-precision.
BFMLAL (multiple vectors): Multi-vector BFloat16 multiply-add to single-precision.
BFMLS (multiple and indexed vector): Multi-vector BFloat16 fused multiply-subtract by indexed element.
BFMLS (multiple and single vector): Multi-vector BFloat16 fused multiply-subtract by vector.
BFMLS (multiple vectors): Multi-vector BFloat16 fused multiply-subtract.
BFMLSL (multiple and indexed vector): Multi-vector BFloat16 multiply-subtract by indexed element from single-precision.
BFMLSL (multiple and single vector): Multi-vector BFloat16 multiply-subtract by vector from single-precision.
BFMLSL (multiple vectors): Multi-vector BFloat16 multiply-subtract from single-precision.
BFMOP4A (non-widening): BFloat16 quarter-tile outer product, accumulating.
BFMOP4A (widening): BFloat16 quarter-tile sum of outer products to single-precision, accumulating.
BFMOP4S (non-widening): BFloat16 quarter-tile outer product, subtracting.
BFMOP4S (widening): BFloat16 quarter-tile sum of outer products to single-precision, subtracting.
BFMOPA (non-widening): BFloat16 outer product, accumulating.
BFMOPA (widening): BFloat16 sum of outer products to single-precision, accumulating.
BFMOPS (non-widening): BFloat16 outer product, subtracting.
BFMOPS (widening): BFloat16 sum of outer products to single-precision, subtracting.
BFMUL (multiple and single vector): Multi-vector BFloat16 multiply by vector.
BFMUL (multiple vectors): Multi-vector BFloat16 multiply.
BFSCALE (multiple and single vector): Multi-vector BFloat16 adjust exponent by vector.
BFSCALE (multiple vectors): Multi-vector BFloat16 adjust exponent.
BFSUB: Multi-vector BFloat16 subtract from ZA array vectors.
BFTMOPA (non-widening): BFloat16 sparse outer product, accumulating.
BFTMOPA (widening): BFloat16 sparse sum of outer products to single-precision, accumulating.
BFVDOT: Multi-vector BFloat16 vertical dot product by indexed element to single-precision.
BMOPA: Bitwise exclusive NOR population count outer product, accumulating.
BMOPS: Bitwise exclusive NOR population count outer product, subtracting.
F1CVT, F2CVT: Multi-vector 8-bit floating-point convert to half-precision.
F1CVTL, F2CVTL: Multi-vector 8-bit floating-point convert to deinterleaved half-precision.
FADD: Multi-vector floating-point accumulate to ZA array vectors.
FAMAX: Multi-vector floating-point absolute maximum.
FAMIN: Multi-vector floating-point absolute minimum.
FCLAMP: Multi-vector floating-point clamp to minimum/maximum number.
FCVT (narrowing, FP16 to FP8): Multi-vector half-precision convert to 8-bit floating-point.
FCVT (narrowing, FP32 to FP16): Multi-vector single-precision convert to half-precision.
FCVT (narrowing, FP32 to FP8): Multi-vector single-precision convert to 8-bit floating-point.
FCVT (widening): Multi-vector half-precision convert to single-precision.
FCVTL: Multi-vector half-precision convert to deinterleaved single-precision.
FCVTN (FP32 to FP16): Multi-vector single-precision convert to interleaved half-precision.
FCVTN (FP32 to FP8): Multi-vector single-precision convert to interleaved 8-bit floating-point.
FCVTZS: Multi-vector single-precision convert to signed 32-bit integer, rounding toward zero.
FCVTZU: Multi-vector single-precision convert to unsigned 32-bit integer, rounding toward zero.
FDOT (2-way, multiple and indexed vector, FP16 to FP32): Multi-vector half-precision dot product by indexed element to single-precision.
FDOT (2-way, multiple and indexed vector, FP8 to FP16): Multi-vector 8-bit floating-point dot product by indexed element to half-precision.
FDOT (2-way, multiple and single vector, FP16 to FP32): Multi-vector half-precision dot product by vector to single-precision.
FDOT (2-way, multiple and single vector, FP8 to FP16): Multi-vector 8-bit floating-point dot product by vector to half-precision.
FDOT (2-way, multiple vectors, FP16 to FP32): Multi-vector half-precision dot product to single-precision.
FDOT (2-way, multiple vectors, FP8 to FP16): Multi-vector 8-bit floating-point dot product to half-precision.
FDOT (4-way, multiple and indexed vector): Multi-vector 8-bit floating-point dot product by indexed element to single-precision.
FDOT (4-way, multiple and single vector): Multi-vector 8-bit floating-point dot product by vector to single-precision.
FDOT (4-way, multiple vectors): Multi-vector 8-bit floating-point dot product to single-precision.
FMAX (multiple and single vector): Multi-vector floating-point maximum by vector.
FMAX (multiple vectors): Multi-vector floating-point maximum.
FMAXNM (multiple and single vector): Multi-vector floating-point maximum number by vector.
FMAXNM (multiple vectors): Multi-vector floating-point maximum number.
FMIN (multiple and single vector): Multi-vector floating-point minimum by vector.
FMIN (multiple vectors): Multi-vector floating-point minimum.
FMINNM (multiple and single vector): Multi-vector floating-point minimum number by vector.
FMINNM (multiple vectors): Multi-vector floating-point minimum number.
FMLA (multiple and indexed vector): Multi-vector floating-point fused multiply-add by indexed element.
FMLA (multiple and single vector): Multi-vector floating-point fused multiply-add by vector.
FMLA (multiple vectors): Multi-vector floating-point fused multiply-add.
FMLAL (multiple and indexed vector, FP16 to FP32): Multi-vector half-precision multiply-add by indexed element to single-precision.
FMLAL (multiple and indexed vector, FP8 to FP16): Multi-vector 8-bit floating-point multiply-add by indexed element to half-precision.
FMLAL (multiple and single vector, FP16 to FP32): Multi-vector half-precision multiply-add by vector to single-precision.
FMLAL (multiple and single vector, FP8 to FP16): Multi-vector 8-bit floating-point multiply-add by vector to half-precision.
FMLAL (multiple vectors, FP16 to FP32): Multi-vector half-precision multiply-add to single-precision.
FMLAL (multiple vectors, FP8 to FP16): Multi-vector 8-bit floating-point multiply-add to half-precision.
FMLALL (multiple and indexed vector): Multi-vector 8-bit floating-point multiply-add by indexed element to single-precision.
FMLALL (multiple and single vector): Multi-vector 8-bit floating-point multiply-add by vector to single-precision.
FMLALL (multiple vectors): Multi-vector 8-bit floating-point multiply-add to single-precision.
FMLS (multiple and indexed vector): Multi-vector floating-point fused multiply-subtract by indexed element.
FMLS (multiple and single vector): Multi-vector floating-point fused multiply-subtract by vector.
FMLS (multiple vectors): Multi-vector floating-point fused multiply-subtract.
FMLSL (multiple and indexed vector): Multi-vector half-precision multiply-subtract by indexed element from single-precision.
FMLSL (multiple and single vector): Multi-vector half-precision multiply-subtract by vector from single-precision.
FMLSL (multiple vectors): Multi-vector half-precision multiply-subtract from single-precision.
FMOP4A (non-widening): Floating-point quarter-tile outer product, accumulating.
FMOP4A (widening, 2-way, FP16 to FP32): Half-precision quarter-tile sum of outer products to single-precision, accumulating.
FMOP4A (widening, 2-way, FP8 to FP16): 8-bit floating-point quarter-tile sum of outer products to half-precision, accumulating.
FMOP4A (widening, 4-way): 8-bit floating-point quarter-tile sum of outer products to single-precision, accumulating.
FMOP4S (non-widening): Floating-point quarter-tile outer product, subtracting.
FMOP4S (widening): Half-precision quarter-tile sum of outer products to single-precision, subtracting.
FMOPA (non-widening): Floating-point outer product, accumulating.
FMOPA (widening, 2-way, FP16 to FP32): Half-precision sum of outer products to single-precision, accumulating.
FMOPA (widening, 2-way, FP8 to FP16): 8-bit floating-point sum of outer products to half-precision, accumulating.
FMOPA (widening, 4-way): 8-bit floating-point sum of outer products to single-precision, accumulating.
FMOPS (non-widening): Floating-point outer product, subtracting.
FMOPS (widening): Half-precision sum of outer products to single-precision, subtracting.
FMUL (multiple and single vector): Multi-vector floating-point multiply by vector.
FMUL (multiple vectors): Multi-vector floating-point multiply.
FRINTA: Multi-vector single-precision round to integral value, to nearest with ties away from zero.
FRINTM: Multi-vector single-precision round to integral value, toward minus Infinity.
FRINTN: Multi-vector single-precision round to integral value, to nearest with ties to even.
FRINTP: Multi-vector single-precision round to integral value, toward plus Infinity.
FSCALE (multiple and single vector): Multi-vector floating-point adjust exponent by vector.
FSCALE (multiple vectors): Multi-vector floating-point adjust exponent.
FSUB: Multi-vector floating-point subtract from ZA array vectors.
FTMOPA (non-widening): Floating-point sparse outer product, accumulating.
FTMOPA (widening, 2-way, FP16 to FP32): Half-precision sparse sum of outer products to single-precision, accumulating.
FTMOPA (widening, 2-way, FP8 to FP16): 8-bit floating-point sparse sum of outer products to half-precision, accumulating.
FTMOPA (widening, 4-way): 8-bit floating-point sparse sum of outer products to single-precision, accumulating.
FVDOT (FP16 to FP32): Multi-vector half-precision vertical dot product by indexed element to single-precision.
FVDOT (FP8 to FP16): Multi-vector 8-bit floating-point vertical dot product by indexed element to half-precision.
FVDOTB: Multi-vector 8-bit floating-point vertical dot product by indexed element to single-precision (bottom).
FVDOTT: Multi-vector 8-bit floating-point vertical dot product by indexed element to single-precision (top).
LD1B (scalar plus immediate, strided registers): Contiguous load of bytes to multiple strided vectors (immediate index).
LD1B (scalar plus scalar, strided registers): Contiguous load of bytes to multiple strided vectors (scalar index).
LD1B (scalar plus scalar, tile slice): Contiguous load of bytes to 8-bit element ZA tile slice.
LD1D (scalar plus immediate, strided registers): Contiguous load of doublewords to multiple strided vectors (immediate index).
LD1D (scalar plus scalar, strided registers): Contiguous load of doublewords to multiple strided vectors (scalar index).
LD1D (scalar plus scalar, tile slice): Contiguous load of doublewords to 64-bit element ZA tile slice.
LD1H (scalar plus immediate, strided registers): Contiguous load of halfwords to multiple strided vectors (immediate index).
LD1H (scalar plus scalar, strided registers): Contiguous load of halfwords to multiple strided vectors (scalar index).
LD1H (scalar plus scalar, tile slice): Contiguous load of halfwords to 16-bit element ZA tile slice.
LD1Q: Contiguous load of quadwords to 128-bit element ZA tile slice.
LD1W (scalar plus immediate, strided registers): Contiguous load of words to multiple strided vectors (immediate index).
LD1W (scalar plus scalar, strided registers): Contiguous load of words to multiple strided vectors (scalar index).
LD1W (scalar plus scalar, tile slice): Contiguous load of words to 32-bit element ZA tile slice.
LDNT1B (scalar plus immediate, strided registers): Contiguous load non-temporal of bytes to multiple strided vectors (immediate index).
LDNT1B (scalar plus scalar, strided registers): Contiguous load non-temporal of bytes to multiple strided vectors (scalar index).
LDNT1D (scalar plus immediate, strided registers): Contiguous load non-temporal of doublewords to multiple strided vectors (immediate index).
LDNT1D (scalar plus scalar, strided registers): Contiguous load non-temporal of doublewords to multiple strided vectors (scalar index).
LDNT1H (scalar plus immediate, strided registers): Contiguous load non-temporal of halfwords to multiple strided vectors (immediate index).
LDNT1H (scalar plus scalar, strided registers): Contiguous load non-temporal of halfwords to multiple strided vectors (scalar index).
LDNT1W (scalar plus immediate, strided registers): Contiguous load non-temporal of words to multiple strided vectors (immediate index).
LDNT1W (scalar plus scalar, strided registers): Contiguous load non-temporal of words to multiple strided vectors (scalar index).
LDR (array vector): Load ZA array vector.
LDR (table): Load ZT0 register.
LUTI2 (four registers): Lookup table read with 2-bit indexes (four registers).
LUTI2 (single): Lookup table read with 2-bit indexes (single).
LUTI2 (two registers): Lookup table read with 2-bit indexes (two registers).
LUTI4 (four registers, 16-bit and 32-bit): Lookup table read with 4-bit indexes (four registers).
LUTI4 (four registers, 8-bit): Lookup table read with 4-bit indexes and 8-bit elements (four registers).
LUTI4 (single): Lookup table read with 4-bit indexes (single).
LUTI4 (two registers): Lookup table read with 4-bit indexes (two registers).
LUTI6 (table, four registers, 8-bit): Lookup table read with 6-bit indexes (8-bit).
LUTI6 (table, single, 8-bit): Lookup table read with 6-bit indexes (single).
LUTI6 (vector, 16-bit): Lookup table read with 6-bit indexes (16-bit).
MOV (array to vector, four registers): Move four ZA single-vector groups to Z four-vector operand: an alias of MOVA (array to vector, four registers).
MOV (array to vector, two registers): Move two ZA single-vector groups to Z two-vector operand: an alias of MOVA (array to vector, two registers).
MOV (tile to vector, four registers): Move ZA four-slice operand to Z four-vector operand: an alias of MOVA (tile to vector, four registers).
MOV (tile to vector, single): Move ZA tile slice to Z vector: an alias of MOVA (tile to vector, single).
MOV (tile to vector, two registers): Move ZA two-slice operand to Z two-vector operand: an alias of MOVA (tile to vector, two registers).
MOV (vector to array, four registers): Move Z four-vector operand to four ZA single-vector groups: an alias of MOVA (vector to array, four registers).
MOV (vector to array, two registers): Move Z two-vector operand to two ZA single-vector groups: an alias of MOVA (vector to array, two registers).
MOV (vector to tile, four registers): Move Z four-vector operand to ZA four-slice operand: an alias of MOVA (vector to tile, four registers).
MOV (vector to tile, single): Move Z vector to ZA tile slice: an alias of MOVA (vector to tile, single).
MOV (vector to tile, two registers): Move Z two-vector operand to ZA two-slice operand: an alias of MOVA (vector to tile, two registers).
MOVA (array to vector, four registers): Move four ZA single-vector groups to Z four-vector operand.
MOVA (array to vector, two registers): Move two ZA single-vector groups to Z two-vector operand.
MOVA (tile to vector, four registers): Move ZA four-slice operand to Z four-vector operand.
MOVA (tile to vector, single): Move ZA tile slice to Z vector.
MOVA (tile to vector, two registers): Move ZA two-slice operand to Z two-vector operand.
MOVA (vector to array, four registers): Move Z four-vector operand to four ZA single-vector groups.
MOVA (vector to array, two registers): Move Z two-vector operand to two ZA single-vector groups.
MOVA (vector to tile, four registers): Move Z four-vector operand to ZA four-slice operand.
MOVA (vector to tile, single): Move Z vector to ZA tile slice.
MOVA (vector to tile, two registers): Move Z two-vector operand to ZA two-slice operand.
MOVAZ (array to vector, four registers): Move and zero four ZA single-vector groups to Z four-vector operand.
MOVAZ (array to vector, two registers): Move and zero two ZA single-vector groups to Z two-vector operand.
MOVAZ (tile to vector, four registers): Move and zero ZA four-slice operand to Z four-vector operand.
MOVAZ (tile to vector, single): Move and zero ZA tile slice to Z vector.
MOVAZ (tile to vector, two registers): Move and zero ZA two-slice operand to Z two-vector operand.
MOVT (scalar to table): Move 8 bytes from general-purpose register to ZT0.
MOVT (table to scalar): Move 8 bytes from ZT0 to general-purpose register.
MOVT (vector to table): Move vector register to ZT0.
RDSVL: Read multiple of Streaming SVE vector register size to scalar register.
SCLAMP: Multi-vector signed clamp to minimum/maximum.
SCVTF: Multi-vector signed 32-bit integer convert to single-precision.
SDOT (2-way, multiple and indexed vector): Multi-vector signed 16-bit integer dot product by indexed element to 32-bit integer.
SDOT (2-way, multiple and single vector): Multi-vector signed 16-bit integer dot product by vector to 32-bit integer.
SDOT (2-way, multiple vectors): Multi-vector signed 16-bit integer dot product to 32-bit integer.
SDOT (4-way, multiple and indexed vector): Multi-vector signed integer dot product by indexed element.
SDOT (4-way, multiple and single vector): Multi-vector signed integer dot product by vector.
SDOT (4-way, multiple vectors): Multi-vector signed integer dot product.
SEL: Multi-vector conditional select.
SMAX (multiple and single vector): Multi-vector signed maximum by vector.
SMAX (multiple vectors): Multi-vector signed maximum.
SMIN (multiple and single vector): Multi-vector signed minimum by vector.
SMIN (multiple vectors): Multi-vector signed minimum.
SMLAL (multiple and indexed vector): Multi-vector signed 16-bit integer multiply-add by indexed element to 32-bit integer.
SMLAL (multiple and single vector): Multi-vector signed 16-bit integer multiply-add by vector to 32-bit integer.
SMLAL (multiple vectors): Multi-vector signed 16-bit integer multiply-add to 32-bit integer.
SMLALL (multiple and indexed vector): Multi-vector signed integer multiply-add long long by indexed element.
SMLALL (multiple and single vector): Multi-vector signed integer multiply-add long long by vector.
SMLALL (multiple vectors): Multi-vector signed integer multiply-add long long.
SMLSL (multiple and indexed vector): Multi-vector signed 16-bit integer multiply-subtract by indexed element from 32-bit integer.
SMLSL (multiple and single vector): Multi-vector signed 16-bit integer multiply-subtract by vector from 32-bit integer.
SMLSL (multiple vectors): Multi-vector signed 16-bit integer multiply-subtract from 32-bit integer.
SMLSLL (multiple and indexed vector): Multi-vector signed integer multiply-subtract long long by indexed element.
SMLSLL (multiple and single vector): Multi-vector signed integer multiply-subtract long long by vector.
SMLSLL (multiple vectors): Multi-vector signed integer multiply-subtract long long.
SMOP4A (2-way): Signed 16-bit integer quarter-tile sum of outer products to 32-bit integer, accumulating.
SMOP4A (4-way): Signed integer quarter-tile sum of outer products, accumulating.
SMOP4S (2-way): Signed 16-bit integer quarter-tile sum of outer products to 32-bit integer, subtracting.
SMOP4S (4-way): Signed integer quarter-tile sum of outer products, subtracting.
SMOPA (2-way): Signed 16-bit integer sum of outer products to 32-bit integer, accumulating.
SMOPA (4-way): Signed integer sum of outer products, accumulating.
SMOPS (2-way): Signed 16-bit integer sum of outer products to 32-bit integer, subtracting.
SMOPS (4-way): Signed integer sum of outer products, subtracting.
SQCVT (four registers): Multi-vector signed saturating extract narrow.
SQCVT (two registers): Multi-vector signed 32-bit integer saturating extract narrow to 16-bit integer.
SQCVTN: Multi-vector signed saturating extract narrow to interleaved integer.
SQCVTU (four registers): Multi-vector signed saturating extract narrow to unsigned integer.
SQCVTU (two registers): Multi-vector signed 32-bit integer saturating extract narrow to unsigned 16-bit integer.
SQCVTUN: Multi-vector signed saturating extract narrow to interleaved unsigned integer.
SQDMULH (multiple and single vector): Multi-vector signed saturating doubling multiply high by vector.
SQDMULH (multiple vectors): Multi-vector signed saturating doubling multiply high.
SQRSHR (four registers): Multi-vector signed saturating rounding shift right narrow by immediate.
SQRSHR (two registers): Multi-vector signed 32-bit integer saturating rounding shift right narrow by immediate to 16-bit integer.
SQRSHRN: Multi-vector signed saturating rounding shift right narrow by immediate to interleaved integer.
SQRSHRU (four registers): Multi-vector signed saturating rounding shift right narrow by immediate to unsigned integer.
SQRSHRU (two registers): Multi-vector signed 32-bit integer saturating rounding shift right narrow by immediate to unsigned 16-bit integer.
SQRSHRUN: Multi-vector signed saturating rounding shift right narrow by immediate to interleaved unsigned integer.
SRSHL (multiple and single vector): Multi-vector signed rounding shift left by vector.
SRSHL (multiple vectors): Multi-vector signed rounding shift left.
ST1B (scalar plus immediate, strided registers): Contiguous store of bytes from multiple strided vectors (immediate index).
ST1B (scalar plus scalar, strided registers): Contiguous store of bytes from multiple strided vectors (scalar index).
ST1B (scalar plus scalar, tile slice): Contiguous store of bytes from 8-bit element ZA tile slice.
ST1D (scalar plus immediate, strided registers): Contiguous store of doublewords from multiple strided vectors (immediate index).
ST1D (scalar plus scalar, strided registers): Contiguous store of doublewords from multiple strided vectors (scalar index).
ST1D (scalar plus scalar, tile slice): Contiguous store of doublewords from 64-bit element ZA tile slice.
ST1H (scalar plus immediate, strided registers): Contiguous store of halfwords from multiple strided vectors (immediate index).
ST1H (scalar plus scalar, strided registers): Contiguous store of halfwords from multiple strided vectors (scalar index).
ST1H (scalar plus scalar, tile slice): Contiguous store of halfwords from 16-bit element ZA tile slice.
ST1Q: Contiguous store of quadwords from 128-bit element ZA tile slice.
ST1W (scalar plus immediate, strided registers): Contiguous store of words from multiple strided vectors (immediate index).
ST1W (scalar plus scalar, strided registers): Contiguous store of words from multiple strided vectors (scalar index).
ST1W (scalar plus scalar, tile slice): Contiguous store of words from 32-bit element ZA tile slice.
STMOPA (2-way): Signed 16-bit integer sparse sum of outer products to 32-bit integer, accumulating.
STMOPA (4-way): Signed 8-bit integer sparse sum of outer products to 32-bit integer, accumulating.
STNT1B (scalar plus immediate, strided registers): Contiguous store non-temporal of bytes from multiple strided vectors (immediate index).
STNT1B (scalar plus scalar, strided registers): Contiguous store non-temporal of bytes from multiple strided vectors (scalar index).
STNT1D (scalar plus immediate, strided registers): Contiguous store non-temporal of doublewords from multiple strided vectors (immediate index).
STNT1D (scalar plus scalar, strided registers): Contiguous store non-temporal of doublewords from multiple strided vectors (scalar index).
STNT1H (scalar plus immediate, strided registers): Contiguous store non-temporal of halfwords from multiple strided vectors (immediate index).
STNT1H (scalar plus scalar, strided registers): Contiguous store non-temporal of halfwords from multiple strided vectors (scalar index).
STNT1W (scalar plus immediate, strided registers): Contiguous store non-temporal of words from multiple strided vectors (immediate index).
STNT1W (scalar plus scalar, strided registers): Contiguous store non-temporal of words from multiple strided vectors (scalar index).
STR (array vector): Store ZA array vector.
STR (table): Store ZT0 register.
SUB (to array, array and multiple vectors): Multi-vector subtract from ZA array vectors.
SUB (to array, multiple and single vector): Multi-vector subtract by vector to ZA array vectors.
SUB (to array, multiple vectors): Multi-vector subtract to ZA array vectors.
SUDOT (4-way, multiple and indexed vector): Multi-vector signed by unsigned 8-bit integer dot product by indexed element to 32-bit integer.
SUDOT (4-way, multiple and single vector): Multi-vector signed by unsigned 8-bit integer dot product by vector to 32-bit integer.
SUMLALL (multiple and indexed vector): Multi-vector signed by unsigned 8-bit integer multiply-add by indexed element to 32-bit integer.
SUMLALL (multiple and single vector): Multi-vector signed by unsigned 8-bit integer multiply-add by vector to 32-bit integer.
SUMOP4A: Signed by unsigned integer quarter-tile sum of outer products, accumulating.
SUMOP4S: Signed by unsigned integer quarter-tile sum of outer products, subtracting.
SUMOPA (4-way): Signed by unsigned integer sum of outer products, accumulating.
SUMOPS: Signed by unsigned integer sum of outer products, subtracting.
SUNPK: Unpack and sign-extend multi-vector elements.
SUTMOPA: Signed by unsigned 8-bit integer sparse sum of outer products to 32-bit integer, accumulating.
SUVDOT: Multi-vector signed by unsigned 8-bit integer vertical dot product by indexed element to 32-bit integer.
SVDOT (2-way): Multi-vector signed 16-bit integer vertical dot product by indexed element to 32-bit integer.
SVDOT (4-way): Multi-vector signed integer vertical dot product by indexed element.
UCLAMP: Multi-vector unsigned clamp to minimum/maximum.
UCVTF: Multi-vector unsigned 32-bit integer convert to single-precision.
UDOT (2-way, multiple and indexed vector): Multi-vector unsigned 16-bit integer dot product by indexed element to 32-bit integer.
UDOT (2-way, multiple and single vector): Multi-vector unsigned 16-bit integer dot product by vector to 32-bit integer.
UDOT (2-way, multiple vectors): Multi-vector unsigned 16-bit integer dot product to 32-bit integer.
UDOT (4-way, multiple and indexed vector): Multi-vector unsigned integer dot product by indexed element.
UDOT (4-way, multiple and single vector): Multi-vector unsigned integer dot product by vector.
UDOT (4-way, multiple vectors): Multi-vector unsigned integer dot product.
UMAX (multiple and single vector): Multi-vector unsigned maximum by vector.
UMAX (multiple vectors): Multi-vector unsigned maximum.
UMIN (multiple and single vector): Multi-vector unsigned minimum by vector.
UMIN (multiple vectors): Multi-vector unsigned minimum.
UMLAL (multiple and indexed vector): Multi-vector unsigned 16-bit integer multiply-add by indexed element to 32-bit integer.
UMLAL (multiple and single vector): Multi-vector unsigned 16-bit integer multiply-add by vector to 32-bit integer.
UMLAL (multiple vectors): Multi-vector unsigned 16-bit integer multiply-add to 32-bit integer.
UMLALL (multiple and indexed vector): Multi-vector unsigned integer multiply-add long long by indexed element.
UMLALL (multiple and single vector): Multi-vector unsigned integer multiply-add long long by vector.
UMLALL (multiple vectors): Multi-vector unsigned integer multiply-add long long.
UMLSL (multiple and indexed vector): Multi-vector unsigned 16-bit integer multiply-subtract by indexed element from 32-bit integer.
UMLSL (multiple and single vector): Multi-vector unsigned 16-bit integer multiply-subtract by vector from 32-bit integer.
UMLSL (multiple vectors): Multi-vector unsigned 16-bit integer multiply-subtract from 32-bit integer.
UMLSLL (multiple and indexed vector): Multi-vector unsigned integer multiply-subtract long long by indexed element.
UMLSLL (multiple and single vector): Multi-vector unsigned integer multiply-subtract long long by vector.
UMLSLL (multiple vectors): Multi-vector unsigned integer multiply-subtract long long.
UMOP4A (2-way): Unsigned 16-bit integer quarter-tile sum of outer products to 32-bit integer, accumulating.
UMOP4A (4-way): Unsigned integer quarter-tile sum of outer products, accumulating.
UMOP4S (2-way): Unsigned 16-bit integer quarter-tile sum of outer products to 32-bit integer, subtracting.
UMOP4S (4-way): Unsigned integer quarter-tile sum of outer products, subtracting.
UMOPA (2-way): Unsigned 16-bit integer sum of outer products to 32-bit integer, accumulating.
UMOPA (4-way): Unsigned integer sum of outer products, accumulating.
UMOPS (2-way): Unsigned 16-bit integer sum of outer products to 32-bit integer, subtracting.
UMOPS (4-way): Unsigned integer sum of outer products, subtracting.
UQCVT (four registers): Multi-vector unsigned saturating extract narrow.
UQCVT (two registers): Multi-vector unsigned 32-bit integer saturating extract narrow to 16-bit integer.
UQCVTN: Multi-vector unsigned saturating extract narrow to interleaved integer.
UQRSHR (four registers): Multi-vector unsigned saturating rounding shift right narrow by immediate.
UQRSHR (two registers): Multi-vector unsigned 32-bit integer saturating rounding shift right narrow by immediate to 16-bit integer.
UQRSHRN: Multi-vector unsigned saturating rounding shift right narrow by immediate to interleaved integer.
URSHL (multiple and single vector): Multi-vector unsigned rounding shift left by vector.
URSHL (multiple vectors): Multi-vector unsigned rounding shift left.
USDOT (4-way, multiple and indexed vector): Multi-vector unsigned by signed 8-bit integer dot product by indexed element to 32-bit integer.
USDOT (4-way, multiple and single vector): Multi-vector unsigned by signed 8-bit integer dot product by vector to 32-bit integer.
USDOT (4-way, multiple vectors): Multi-vector unsigned by signed 8-bit integer dot product to 32-bit integer.
USMLALL (multiple and indexed vector): Multi-vector unsigned by signed 8-bit integer multiply-add by indexed element to 32-bit integer.
USMLALL (multiple and single vector): Multi-vector unsigned by signed 8-bit integer multiply-add by vector to 32-bit integer.
USMLALL (multiple vectors): Multi-vector unsigned by signed 8-bit integer multiply-add to 32-bit integer.
USMOP4A: Unsigned by signed integer quarter-tile sum of outer products, accumulating.
USMOP4S: Unsigned by signed integer quarter-tile sum of outer products, subtracting.
USMOPA (4-way): Unsigned by signed integer sum of outer products, accumulating.
USMOPS: Unsigned by signed integer sum of outer products, subtracting.
USTMOPA: Unsigned by signed 8-bit integer sparse sum of outer products to 32-bit integer, accumulating.
USVDOT: Multi-vector unsigned by signed 8-bit integer vertical dot product by indexed element to 32-bit integer.
UTMOPA (2-way): Unsigned 16-bit integer sparse sum of outer products to 32-bit integer, accumulating.
UTMOPA (4-way): Unsigned 8-bit integer sparse sum of outer products to 32-bit integer, accumulating.
UUNPK: Unpack and zero-extend multi-vector elements.
UVDOT (2-way): Multi-vector unsigned 16-bit integer vertical dot product by indexed element to 32-bit integer.
UVDOT (4-way): Multi-vector unsigned integer vertical dot product by indexed element.
UZP (four registers): Concatenate elements from four vectors.
UZP (two registers): Concatenate elements from two vectors.
ZERO (double-vector): Zero ZA double-vector groups.
ZERO (quad-vector): Zero ZA quad-vector groups.
ZERO (single-vector): Zero ZA single-vector groups.
ZERO (table): Zero ZT0.
ZERO (tiles): Zero a list of 64-bit element ZA tiles.
ZIP (four registers): Interleave elements from four vectors.
ZIP (two registers): Interleave elements from two vectors.
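
Many entries above pair "outer product" with "sum of outer products", and "accumulating" with "subtracting" (for example FMOPA/FMOPS, SMOPA/SMOPS). The sketch below is a simplified model of that distinction only, not the architectural pseudocode. It assumes plain Python lists in place of Z vectors and a ZA tile, and it ignores predication, element types, rounding, saturation, and the quarter-tile and sparse variants. The function names `outer_product_update` and `sum_of_outer_products_update` are hypothetical and chosen for illustration.

```python
def outer_product_update(tile, zn, zm, subtract=False):
    """Model of a non-widening outer product, accumulating or subtracting.

    tile[i][j] is updated with zn[i] * zm[j]; the product is added when
    subtract is False and subtracted when subtract is True.
    Returns a new tile and leaves the input untouched.
    """
    sign = -1 if subtract else 1
    return [
        [tile[i][j] + sign * zn[i] * zm[j] for j in range(len(zm))]
        for i in range(len(zn))
    ]


def sum_of_outer_products_update(tile, zn, zm, subtract=False):
    """Model of a widening N-way sum of outer products.

    Each entry of zn and zm is a group of N narrow values that share one
    wide element (N = 2 or 4 in the list above). tile[i][j] is updated
    with the sum over k of zn[i][k] * zm[j][k].
    """
    sign = -1 if subtract else 1
    return [
        [
            tile[i][j] + sign * sum(a * b for a, b in zip(zn[i], zm[j]))
            for j in range(len(zm))
        ]
        for i in range(len(zn))
    ]


if __name__ == "__main__":
    zero_tile = [[0, 0], [0, 0]]

    # Outer product, accumulating: each tile element gains zn[i] * zm[j].
    print(outer_product_update(zero_tile, [1, 2], [3, 4]))

    # 2-way sum of outer products: each tile element gains a 2-term dot product.
    print(sum_of_outer_products_update(zero_tile, [(1, 2), (3, 4)], [(5, 6), (7, 8)]))
```

In this model the "subtracting" forms differ from the "accumulating" forms only in the sign applied to the product before it is combined with the existing tile contents.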