FDOT (half-precision to single-precision, by element)

Half-precision dot product to single-precision (vector, by element)

This instruction computes the fused sum-of-products of two pairs of half-precision values, without intermediate rounding: one pair held in each 32-bit element of the first source vector, and one pair held in an indexed 32-bit element of the second source vector. It then destructively adds the single-precision sum-of-products to the corresponding single-precision element of the destination vector.

Advanced SIMD class

(FEAT_F16F32DOT)

Bits	Field	Value
31		0
30	Q	
29	U	0
28:24		01111
23:22	size	01
21	L	
20	M	
19:16	Rm	
15:12	opcode	1001
11	H	
10		0
9:5	Rn	
4:0	Rd	

Encoding

FDOT <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.2H[<index>]

Decode

if !IsFeatureImplemented(FEAT_F16F32DOT) then EndOfDecode(Decode_UNDEF);
constant integer n = UInt(Rn);
constant integer m = UInt(M:Rm);
constant integer d = UInt(Rd);
constant integer i = UInt(H:L);
constant integer datasize = 64 << UInt(Q);
constant integer elements = datasize DIV 32;
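
The decode step can be illustrated with a small Python sketch that pulls the same fields out of a 32-bit instruction word. This is not part of the architecture pseudocode: the function name `decode_fdot_by_element` and the constants `FDOT_MASK` and `FDOT_VALUE` are hypothetical, the fixed-bit pattern is read off the encoding diagram above, and the FEAT_F16F32DOT feature check is not modeled.

```python
# Fixed bits of the encoding as read from the diagram (assumption):
# bit 31 = 0, bits 29:22 = 00111101, bits 15:12 = 1001, bit 10 = 0.
FDOT_MASK = 0xBFC0F400
FDOT_VALUE = 0x0F409000


def decode_fdot_by_element(word):
    """Extract the decode-time values n, m, d, i, datasize, elements."""
    if (word & FDOT_MASK) != FDOT_VALUE:
        raise ValueError("not an FDOT (by element) encoding")
    q = (word >> 30) & 0x1       # Q: selects 64-bit or 128-bit datasize
    l_bit = (word >> 21) & 0x1   # L: low bit of the index
    m_bit = (word >> 20) & 0x1   # M: high bit of the Vm register number
    rm = (word >> 16) & 0xF      # Rm: low four bits of the Vm register number
    h_bit = (word >> 11) & 0x1   # H: high bit of the index
    rn = (word >> 5) & 0x1F
    rd = word & 0x1F
    datasize = 64 << q
    return {
        "n": rn,
        "m": (m_bit << 4) | rm,      # UInt(M:Rm)
        "d": rd,
        "i": (h_bit << 1) | l_bit,   # UInt(H:L)
        "datasize": datasize,
        "elements": datasize // 32,
    }
```

Under these assumptions, a word with Q=1, L=1, M=0, Rm=3, H=0, Rn=1, Rd=2 decodes to n=1, m=3, d=2, i=1, with a datasize of 128 and 4 elements.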

Assembler Symbols

<Vd>	Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<Ta>

Is an arrangement specifier, encoded in Q:

Q	<Ta>
0	2S
1	4S

<Vn>	Is the name of the first SIMD&FP source register, encoded in the "Rn" field.

<Tb>

Is an arrangement specifier, encoded in Q:

Q	<Tb>
0	4H
1	8H

<Vm>	Is the name of the second SIMD&FP source register, encoded in the "M:Rm" fields.

<index>

Is the immediate index of a pair of 16-bit elements in the range 0 to 3, encoded in the "H:L" fields.

Operation

AArch64.CheckFPAdvSIMDEnabled();
constant bits(datasize) operand1 = V[n, datasize];
constant bits(128) operand2 = V[m, 128];
constant bits(datasize) operand3 = V[d, datasize];
bits(datasize) result;

for e = 0 to elements-1
    constant bits(16) elt1_a = Elem[operand1, 2 * e + 0, 16];
    constant bits(16) elt1_b = Elem[operand1, 2 * e + 1, 16];
    constant bits(16) elt2_a = Elem[operand2, 2 * i + 0, 16];
    constant bits(16) elt2_b = Elem[operand2, 2 * i + 1, 16];
    constant bits(32) sum = Elem[operand3, e, 32];
    Elem[result, e, 32] = FPDotAdd(sum, elt1_a, elt1_b, elt2_a, elt2_b, FPCR);

V[d, datasize] = result;
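
The lane selection in the loop above can be sketched in Python as a rough reference model. Everything here is illustrative: `fdot_by_element` and `to_f32` are hypothetical names, and the arithmetic is only one reading of the description (exact sum of the two products, one rounding to single precision, then a rounded single-precision add). The authoritative behavior is that of `FPDotAdd`, including NaN, infinity, and FPCR-controlled rounding, none of which is modeled. Inputs are assumed to be finite Python floats that are exactly representable in the relevant format, and rounding passes through a Python double, so rare last-bit differences are possible.

```python
import struct
from fractions import Fraction


def to_f32(x):
    """Round a Python float to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def fdot_by_element(acc, vn, vm, index):
    """Model FDOT <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.2H[<index>] on finite values.

    acc:   2 or 4 single-precision lanes of Vd (the accumulator)
    vn:    4 or 8 half-precision lanes of Vn
    vm:    8 half-precision lanes of the full 128-bit Vm
    index: 0..3, selects the half-precision pair vm[2*index], vm[2*index+1]
    """
    if len(acc) not in (2, 4) or len(vn) != 2 * len(acc) or len(vm) != 8:
        raise ValueError("lane counts do not match a 64-bit or 128-bit form")
    if not 0 <= index <= 3:
        raise ValueError("index must be in the range 0 to 3")
    # The same indexed pair from Vm is reused for every destination lane.
    b0 = Fraction(vm[2 * index])
    b1 = Fraction(vm[2 * index + 1])
    result = []
    for e in range(len(acc)):
        # Exact sum of products: no rounding between the two multiplies
        # and the addition that combines them.
        dot = Fraction(vn[2 * e]) * b0 + Fraction(vn[2 * e + 1]) * b1
        dot32 = to_f32(float(dot))  # assumed: one rounding to single precision
        result.append(to_f32(acc[e] + dot32))  # assumed: rounded accumulate
    return result
```

For example, with `acc = [1.0, 2.0]`, `vn = [1.0, 2.0, 3.0, 4.0]`, and `vm` starting `[0.5, 0.25, 10.0, 20.0, ...]`, index 1 selects the pair (10.0, 20.0), so lane 0 becomes 1 + (1*10 + 2*20) = 51.0 and lane 1 becomes 2 + (3*10 + 4*20) = 112.0.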