LDNT1W (scalar plus immediate, consecutive registers)

Contiguous load non-temporal of words to multiple consecutive vectors (immediate index)

This instruction performs a contiguous non-temporal load of words to elements of two or four consecutive vector registers. The memory address is the sum of a 64-bit scalar base and an immediate index multiplied by the vector's in-memory size, irrespective of predication.
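The address arithmetic can be sketched in Python as follows. This is a model under stated assumptions, not the architectural definition. The helper name `ldnt1w_start_address` is hypothetical. Addresses are assumed to be plain integers that wrap modulo 2**64, and tagging and translation are not modelled. The function takes the signed `imm4` field value. Because the assembler immediate is `imm4` times the number of registers, the result equals the base plus that immediate times VL/8 bytes.

```python
# Hypothetical helper: start address of an LDNT1W (consecutive registers) access.
# Assumptions: addresses are plain integers that wrap modulo 2**64; pointer
# tagging, top-byte-ignore and address translation are not modelled.

MASK64 = (1 << 64) - 1


def ldnt1w_start_address(base: int, imm4: int, nreg: int, vl_bits: int) -> int:
    """Mirror AddressAdd(base, offset * nreg * elements * mbytes, ...).

    imm4    -- signed value of the 4-bit immediate field (-8..7)
    nreg    -- 2 or 4 consecutive destination registers
    vl_bits -- current vector length in bits
    """
    if nreg not in (2, 4):
        raise ValueError("nreg must be 2 or 4")
    if not -8 <= imm4 <= 7:
        raise ValueError("imm4 must fit in a signed 4-bit field")
    esize = 32                    # word elements
    elements = vl_bits // esize   # elements per vector
    mbytes = esize // 8           # bytes per element
    # elements * mbytes is one vector's in-memory size (VL / 8 bytes),
    # independent of which elements the predicate marks as active.
    return (base + imm4 * nreg * elements * mbytes) & MASK64


# A 256-bit vector is 32 bytes; an immediate of -2 (imm4 = -1, nreg = 2)
# moves the start address back by 64 bytes.
print(hex(ldnt1w_start_address(0x1000, -1, 2, 256)))  # 0xfc0
```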

Inactive elements will not cause a read from Device memory or signal a fault, and are set to zero in the destination vectors.
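The zeroing rule can be illustrated with a minimal Python model. It assumes memory is a dict keyed by word address, and it treats a missing address as an access that would fault. The names `MemoryFault` and `load_words_zeroing` are hypothetical. The model performs no access at all for inactive elements. That is one behaviour consistent with the rule above, not a statement about how hardware implements it.

```python
# Hypothetical model of zeroing predication for a single vector of words.
# Assumption: memory is a dict {address: word}; an address missing from the
# dict stands in for an access that would signal a fault.


class MemoryFault(Exception):
    """Raised when an active element touches an unmapped address."""


def load_words_zeroing(memory: dict, addr: int, active: list) -> list:
    """Load len(active) consecutive 32-bit words starting at addr."""
    result = []
    for e, is_active in enumerate(active):
        element_addr = addr + 4 * e   # words are 4 bytes apart
        if is_active:
            if element_addr not in memory:
                raise MemoryFault(hex(element_addr))
            result.append(memory[element_addr])
        else:
            # Inactive: no access is made, so no fault; the element is zeroed.
            result.append(0)
    return result
```

For example, with only `0x100` and `0x104` mapped, activating the first two of four elements returns `[11, 22, 0, 0]` even though `0x108` and `0x10C` are unmapped.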

A non-temporal load is a hint to the system that this data is unlikely to be referenced again soon.

Two registers class

(FEAT_SME2 || FEAT_SVE2p1)

Decode

if !IsFeatureImplemented(FEAT_SME2) && !IsFeatureImplemented(FEAT_SVE2p1) then EndOfDecode(Decode_UNDEF);
constant integer n = UInt(Rn);
constant integer g = UInt('1':PNg);
constant integer nreg = 2;
constant integer t = UInt(Zt:'0');
constant integer esize = 32;
constant integer offset = SInt(imm4);
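The two-register Decode step can be sketched in Python. The field positions are taken from the encoding diagram for this class: imm4 in bits 19-16, PNg in bits 12-10, Rn in bits 9-5 and Zt in bits 4-1. The helper names `sint` and `decode_ldnt1w_x2` are hypothetical. The sketch does not check the fixed opcode bits or the feature guard.

```python
# Hypothetical sketch of the two-register Decode pseudocode.


def sint(value: int, width: int) -> int:
    """Interpret an unsigned bit-field as two's complement (like ASL SInt)."""
    sign = 1 << (width - 1)
    return (value ^ sign) - sign


def decode_ldnt1w_x2(word: int) -> dict:
    imm4 = (word >> 16) & 0xF   # bits 19-16
    png = (word >> 10) & 0x7    # bits 12-10
    rn = (word >> 5) & 0x1F     # bits 9-5
    zt = (word >> 1) & 0xF      # bits 4-1
    return {
        "n": rn,                  # UInt(Rn); 31 selects SP
        "g": 8 | png,             # UInt('1':PNg) -> PN8..PN15
        "nreg": 2,
        "t": zt << 1,             # UInt(Zt:'0') -> Z0, Z2, ..., Z30
        "esize": 32,
        "offset": sint(imm4, 4),  # SInt(imm4) -> -8..7
    }
```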

Four registers class

(FEAT_SME2 || FEAT_SVE2p1)

Decode

if !IsFeatureImplemented(FEAT_SME2) && !IsFeatureImplemented(FEAT_SVE2p1) then EndOfDecode(Decode_UNDEF);
constant integer n = UInt(Rn);
constant integer g = UInt('1':PNg);
constant integer nreg = 4;
constant integer t = UInt(Zt:'00');
constant integer esize = 32;
constant integer offset = SInt(imm4);
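In the four-register class, Zt is only three bits wide (bits 4-2). `UInt(Zt:'00')` therefore always names a multiple of 4. The Python sketch below shows this and lists the four consecutive destination registers. The name `decode_ldnt1w_x4` is hypothetical. As in the two-register case, opcode bits and feature checks are not modelled.

```python
# Hypothetical sketch of the four-register Decode pseudocode.


def decode_ldnt1w_x4(word: int) -> dict:
    imm4 = (word >> 16) & 0xF   # bits 19-16
    png = (word >> 10) & 0x7    # bits 12-10
    rn = (word >> 5) & 0x1F     # bits 9-5
    zt = (word >> 2) & 0x7      # bits 4-2 (only three bits in this class)
    t = zt << 2                 # UInt(Zt:'00') -> Z0, Z4, ..., Z28
    offset = imm4 - 16 if imm4 & 0x8 else imm4   # SInt(imm4)
    return {
        "n": rn,                # 31 selects SP
        "g": 8 | png,           # UInt('1':PNg) -> PN8..PN15
        "nreg": 4,
        "t": t,
        "registers": [t + r for r in range(4)],  # Z[transfer+r] for r = 0..3
        "esize": 32,
        "offset": offset,
    }
```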

Assembler Symbols

<Zt1>

For the "Two registers" variant: is the name of the first scalable vector register to be transferred, encoded as "Zt" times 2.
For the "Four registers" variant: is the name of the first scalable vector register to be transferred, encoded as "Zt" times 4.

<Zt2>

Is the name of the second scalable vector register to be transferred, encoded as "Zt" times 2 plus 1.

<PNg>

Is the name of the governing scalable predicate register PN8-PN15, with predicate-as-counter encoding, encoded in the "PNg" field.

<Xn|SP>

Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.

<imm>

For the "Two registers" variant: is the optional signed immediate vector offset, a multiple of 2 in the range -16 to 14, defaulting to 0, encoded in the "imm4" field.
For the "Four registers" variant: is the optional signed immediate vector offset, a multiple of 4 in the range -32 to 28, defaulting to 0, encoded in the "imm4" field.

<Zt4>

Is the name of the fourth scalable vector register to be transferred, encoded as "Zt" times 4 plus 3.
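The assembler symbols map onto encoding fields as shown in the Python sketch below. It validates each operand against the constraints listed above and packs the fields. The fixed bits come from the encoding diagrams for the two classes. The function name and its argument order are hypothetical and do not represent any real assembler's API. Operands are given as register numbers: `png` is 8-15 and `rn` is 31 for SP.

```python
# Hypothetical encoder for LDNT1W (scalar plus immediate, consecutive registers).


def encode_ldnt1w(zt1: int, nreg: int, png: int, rn: int, imm: int = 0) -> int:
    if nreg == 2:
        word = 0xA0404001        # fixed bits; bit 15 = 0
        zt_shift = 1             # Zt occupies bits 4-1
    elif nreg == 4:
        word = 0xA040C001        # fixed bits; bit 15 = 1, bit 1 = 0
        zt_shift = 2             # Zt occupies bits 4-2
    else:
        raise ValueError("nreg must be 2 or 4")
    if not 0 <= zt1 <= 31 or zt1 % nreg:
        raise ValueError("Zt1 must be a register number that is a multiple of nreg")
    if not 8 <= png <= 15:
        raise ValueError("PNg must be one of PN8-PN15")
    if not 0 <= rn <= 31:
        raise ValueError("Rn must be 0-31 (31 selects SP)")
    if imm % nreg or not -8 * nreg <= imm <= 7 * nreg:
        raise ValueError("imm must be a multiple of nreg in the signed 4-bit range")
    word |= ((imm // nreg) & 0xF) << 16     # imm4, two's complement
    word |= (png - 8) << 10                 # PNg field holds the low 3 bits
    word |= rn << 5
    word |= (zt1 // nreg) << zt_shift       # Zt = Zt1 / nreg
    return word
```

For example, `encode_ldnt1w(10, 2, 11, 2, -2)` returns `0xA04F4C4B`. That selects Z10-Z11, PN11, X2 and an offset of -2 vectors.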

Operation

if IsFeatureImplemented(FEAT_SVE2p1) then
    CheckSVEEnabled();
else
    CheckStreamingSVEEnabled();
constant integer VL = CurrentVL;
constant integer PL = VL DIV 8;
constant integer elements = VL DIV esize;
constant integer mbytes = esize DIV 8;
bits(64) base;
bits(64) addr;
constant bits(PL) pred = P[g, PL];
constant bits(PL * nreg) mask = CounterToPredicate(pred<15:0>, PL * nreg);
array [0..3] of bits(VL) values;
constant boolean contiguous = TRUE;
constant boolean nontemporal = TRUE;
constant integer transfer = t;
constant boolean tagchecked = n != 31;
constant AccessDescriptor accdesc = CreateAccDescSVE(MemOp_LOAD, nontemporal, contiguous, tagchecked);
if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
base = if n == 31 then SP[64] else X[n, 64];
addr = AddressAdd(base, offset * nreg * elements * mbytes, accdesc);
for r = 0 to nreg-1
    for e = 0 to elements-1
        if ActivePredicateElement(mask, r * elements + e, esize) then
            Elem[values[r], e, esize] = Mem[addr, mbytes, accdesc];
        else
            Elem[values[r], e, esize] = Zeros(esize);
        addr = AddressIncrement(addr, mbytes, accdesc);
for r = 0 to nreg-1
    Z[transfer+r, VL] = values[r];
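The element loop of the Operation can be modelled in Python under several simplifying assumptions. Memory is a little-endian `bytes` object indexed directly by address. The predicate-as-counter is reduced to a plain count: the first `active_count` elements across all `nreg` vectors are active, and any inversion or element-size encoding that `CounterToPredicate` handles is not modelled. SP alignment checks, tag checks, faults and the non-temporal hint are omitted. The function name `ldnt1w_operation` is hypothetical.

```python
# Hypothetical model of the Operation loop for LDNT1W (consecutive registers).
# Simplification: the counter predicate is a plain count of leading active
# elements, numbered r * elements + e across the nreg vectors.


def ldnt1w_operation(memory: bytes, base: int, offset: int, nreg: int,
                     vl_bits: int, active_count: int) -> list:
    esize = 32
    elements = vl_bits // esize
    mbytes = esize // 8
    addr = base + offset * nreg * elements * mbytes   # AddressAdd
    values = []
    for r in range(nreg):
        vec = []
        for e in range(elements):
            if r * elements + e < active_count:       # ActivePredicateElement
                word = memory[addr:addr + mbytes]
                vec.append(int.from_bytes(word, "little"))
            else:
                vec.append(0)                         # Zeros(esize)
            addr += mbytes                            # AddressIncrement
        values.append(vec)
    return values                                     # Z[transfer + r] = values[r]
```

With a 128-bit vector length and `active_count = 6`, the second register receives two loaded words followed by two zeros. Addresses still advance over the inactive slots, as in the pseudocode.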

Operational information

This instruction is a data-independent-time instruction as described in About PSTATE.DIT.

Two registers class encoding:

bits 31-20: 101000000100
bits 19-16: imm4
bit 15: 0
bits 14-13: msz = 10
bits 12-10: PNg
bits 9-5: Rn
bits 4-1: Zt
bit 0: N = 1

Four registers class encoding:

bits 31-20: 101000000100
bits 19-16: imm4
bit 15: 1
bits 14-13: msz = 10
bits 12-10: PNg
bits 9-5: Rn
bits 4-2: Zt
bit 1: 0
bit 0: N = 1
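The fixed bits in the two encoding diagrams can be turned into mask/value pairs for recognising each class. The fixed bits are 31-20, 15-13 and 0, plus bit 1 for the four-register class. The Python sketch below does this. The table and function names are hypothetical. A word that matches neither pattern may still be a valid instruction of some other kind. The sketch only reports that it is not one of these two classes.

```python
# Hypothetical classifier built from the fixed bits of the encoding diagrams.

PATTERNS = [
    # (class name, mask of fixed bits, required value of those bits)
    ("LDNT1W two registers", 0xFFF0E001, 0xA0404001),
    ("LDNT1W four registers", 0xFFF0E003, 0xA040C001),
]


def classify(word: int):
    """Return the matching class name, or None if neither pattern matches."""
    for name, mask, value in PATTERNS:
        if (word & mask) == value:
            return name
    return None
```

The two masks cannot both match the same word, because bit 15 is 0 in one required value and 1 in the other.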
