SSE4 instruction set

Mathieu Monnier mathieu.monnier at polytechnique.org
Mon Sep 18 09:53:38 PDT 2006


Hi,

We finally managed to test all the SSE4 instructions, and they all work. The
following page gives a rather good description of what they do:
http://dango.kousaku.in/hiki/SSE4.html

However, it contains some mistakes and omissions:
  - phaddsw, phsubsw: each result word is saturated to the signed 16-bit
range [-32768, 32767]
  - pshufb: the bytes of the first operand are shuffled according to the
bytes of the second operand; if a byte of the second operand is negative
(bit 7 set), the corresponding result byte is zeroed
  - palignr: it works the other way around (the example describes
palignr mm0, mm1, 3, not palignr mm0, mm1, 5 as stated)
  - psign{b,w,d}: a result element is zeroed wherever the corresponding
element of the sign operand is zero
  - pmulhrsw: computes (((x*y)>>14) + 1)>>1 (IIRC, to be confirmed)
  - pabs{b,w,d}: I don't remember what it does on -128, -32768 and
-2^31 (see the test code tomorrow)
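For reference, the saturating horizontal add/subtract and the pmulhrsw
rounding described in the list can be modelled in plain C roughly as
follows. This is only a scalar sketch of the semantics as I read them, not
the test program; the sat16 and *_model names are made up, and it models the
128-bit form (8 words per register). The pmulhrsw line just follows the
formula from the list.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp a 32-bit intermediate to the signed 16-bit range [-32768, 32767]. */
static int16_t sat16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* PHADDSW xmm1, xmm2 (8 words each): result words 0-3 are the saturated
 * sums of adjacent pairs of dst, words 4-7 those of src. */
static void phaddsw_model(int16_t dst[8], const int16_t src[8])
{
    int16_t r[8];
    for (int i = 0; i < 4; i++) {
        r[i]     = sat16((int32_t)dst[2 * i] + dst[2 * i + 1]);
        r[i + 4] = sat16((int32_t)src[2 * i] + src[2 * i + 1]);
    }
    for (int i = 0; i < 8; i++)
        dst[i] = r[i];
}

/* PHSUBSW: same layout, each pair gives (even word - odd word), saturated. */
static void phsubsw_model(int16_t dst[8], const int16_t src[8])
{
    int16_t r[8];
    for (int i = 0; i < 4; i++) {
        r[i]     = sat16((int32_t)dst[2 * i] - dst[2 * i + 1]);
        r[i + 4] = sat16((int32_t)src[2 * i] - src[2 * i + 1]);
    }
    for (int i = 0; i < 8; i++)
        dst[i] = r[i];
}

/* PMULHRSW on one word pair: (((x*y) >> 14) + 1) >> 1, keeping 16 bits.
 * Assumes >> on a negative int32_t is arithmetic and that converting 0x8000
 * to int16_t wraps to -32768 (implementation-defined in C, but that is what
 * GCC, Clang and MSVC do). */
static int16_t pmulhrsw_model(int16_t x, int16_t y)
{
    int32_t t = (((int32_t)x * y) >> 14) + 1;
    return (int16_t)(uint16_t)(t >> 1);
}
```

Note that with this formula -32768 * -32768 comes out as 0x8000 rather than
saturating to 32767, which is worth checking against the real instruction.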
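The pshufb and palignr behaviour in the list can be sketched the same way.
Again this is a hypothetical scalar model, not the test program: the
function names are invented, and n is the register width in bytes (8 for the
MMX forms, 16 for XMM). For pshufb it assumes the low 3 bits (MMX) or low 4
bits (XMM) of each control byte select the source byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* PSHUFB dst, ctl: each result byte is taken from the ORIGINAL dst, indexed
 * by the low bits of the matching control byte; a control byte with bit 7
 * set (negative as a signed byte) zeroes that result byte.
 * n must be 8 or 16, so n - 1 is the index mask. */
static void pshufb_model(uint8_t *dst, const uint8_t *ctl, size_t n)
{
    uint8_t orig[16];
    memcpy(orig, dst, n);
    for (size_t i = 0; i < n; i++)
        dst[i] = (ctl[i] & 0x80) ? 0 : orig[ctl[i] & (n - 1)];
}

/* PALIGNR dst, src, imm: build the 2n-byte value dst:src (dst in the high
 * half, src in the low half), shift it right by imm bytes and keep the low
 * n bytes. Bytes shifted in from above are zero. n must be 8 or 16. */
static void palignr_model(uint8_t *dst, const uint8_t *src, size_t n,
                          unsigned imm)
{
    uint8_t cat[32];
    memcpy(cat, src, n);
    memcpy(cat + n, dst, n);
    for (size_t i = 0; i < n; i++)
        dst[i] = (i + imm < 2 * n) ? cat[i + imm] : 0;
}
```

With n = 8 and imm = 3, the result is bytes 3..10 of the 16-byte
concatenation mm0:mm1.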
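For psign and pabs, here is a per-element C sketch. The names are made up,
and for the -128 / -32768 / -2^31 cases it assumes what Intel's
documentation says as far as I know: pabs returns the absolute value as an
unsigned element, so 0x80 stays 0x80 (i.e. 128) with no saturation to 127.
The test program should confirm or refute that.

```c
#include <assert.h>
#include <stdint.h>

/* PSIGNB on one byte: negate x if s < 0, zero it if s == 0, keep it if
 * s > 0. Unsigned types are used so the 8-bit wraparound is well defined;
 * negating -128 (0x80) gives 0x80 again. */
static uint8_t psignb_model(uint8_t x, uint8_t s)
{
    if (s & 0x80) return (uint8_t)(0u - x); /* s negative: two's-complement negate */
    if (s == 0)   return 0;                 /* s zero: result zeroed */
    return x;                               /* s positive: unchanged */
}

/* PABS{B,W,D}: absolute value read back as an UNSIGNED element, so the most
 * negative input maps to itself (0x80, 0x8000, 0x80000000). */
static uint8_t pabsb_model(uint8_t x)
{
    return (x & 0x80) ? (uint8_t)(0u - x) : x;
}

static uint16_t pabsw_model(uint16_t x)
{
    return (x & 0x8000) ? (uint16_t)(0u - x) : x;
}

static uint32_t pabsd_model(uint32_t x)
{
    return (x & 0x80000000u) ? (uint32_t)(0u - x) : x;
}
```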

I'll post the test program, with both the C and the ASM versions, tomorrow.

Regards,

Mathieu



More information about the yasm-devel mailing list