SSE4 instruction set
Mathieu Monnier
mathieu.monnier at polytechnique.org
Mon Sep 18 09:53:38 PDT 2006
Hi,
We finally managed to test all the SSE4 instructions, and they all work. The
following page is a rather good description of what they do :
http://dango.kousaku.in/hiki/SSE4.html
However, it contains some mistakes and omissions :
- phaddsw, phsubsw : the result is saturated to [-32768:32767]
- pshufb : the bytes are shuffled according to the second argument's
bytes; if a byte of the second argument is negative (high bit set), the
corresponding result byte is zeroed
- palignr : it works the other way around (the example describes
palignr mm0, mm1, 3, and not palignr mm0, mm1, 5 as stated)
- psign{b,w,d} : zeroes each result element whose corresponding element in
the sign register is zero
- pmulhrsw : does (((x*y)>>14) + 1)>>1 (IIRC, to be confirmed)
- pabs{b,w,d} : I don't remember what it does on -128, -32768 and
-2^31 (see the test code tomorrow)
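
To make some of the points above concrete, here is a rough scalar sketch in
C of what a single lane does for phaddsw/phsubsw, pshufb, psignw and
pmulhrsw. It only illustrates the behaviour described in the list and is not
the test program. The helper names (sat16, phaddsw_lane, pshufb_byte, ...)
are made up for illustration, the 3-bit index in pshufb_byte assumes the
64-bit MMX form, and the pmulhrsw formula is the unconfirmed one quoted above.

```c
#include <stdint.h>

/* Clamp a 32-bit intermediate to the signed 16-bit range. */
static int16_t sat16(int32_t v)
{
    if (v > 32767)  return 32767;
    if (v < -32768) return -32768;
    return (int16_t)v;
}

/* phaddsw / phsubsw, one output lane: combine two adjacent words,
   then saturate to [-32768:32767]. */
static int16_t phaddsw_lane(int16_t a, int16_t b)
{
    return sat16((int32_t)a + (int32_t)b);
}

static int16_t phsubsw_lane(int16_t a, int16_t b)
{
    return sat16((int32_t)a - (int32_t)b);
}

/* pshufb, one output byte (assumption: 64-bit MMX operand, so the
   index is the low 3 bits of the control byte). A control byte with
   its high bit set, i.e. negative, gives a zero result byte. */
static uint8_t pshufb_byte(const uint8_t src[8], uint8_t ctl)
{
    return (ctl & 0x80) ? 0 : src[ctl & 7];
}

/* psignw, one lane: negate, zero or keep d depending on the sign of s.
   Negating -32768 relies on the usual wrap-around conversion. */
static int16_t psignw_lane(int16_t d, int16_t s)
{
    if (s < 0)  return (int16_t)-d;
    if (s == 0) return 0;
    return d;
}

/* pmulhrsw, one lane: (((x*y)>>14) + 1)>>1, as stated above and still
   to be confirmed. Assumes >> on a negative int is an arithmetic
   shift, which is what common compilers do. */
static int16_t pmulhrsw_lane(int16_t x, int16_t y)
{
    int32_t p = (int32_t)x * (int32_t)y;
    return (int16_t)(((p >> 14) + 1) >> 1);
}
```

For instance, phaddsw_lane(30000, 10000) clamps to 32767 instead of wrapping,
and pmulhrsw_lane(16384, 16384) gives 8192, i.e. 0.5 * 0.5 = 0.25 when the
words are read as 1.15 fixed-point values.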
I'll provide the test program with both the C code and the ASM one tomorrow.
Regards,
Mathieu