I tried to measure the speed difference between plain loops (dec/jnz), loops built with the loop instruction, and built-in rep loops. I wrote three programs to compare their behavior:
Program 1
_start: xor %ecx,%ecx
not %ecx # ecx = 0xffffffff, set once before the loop
0: dec %ecx
jnz 0b # repeat until ecx reaches zero
mov $1,%eax
xor %ebx,%ebx
int $0x80 # syscall 1: exit
Program 2
_start: xor %ecx,%ecx
not %ecx
loop .
mov $1,%eax
xor %ebx,%ebx
int $0x80
Program 3
_start: xor %ecx,%ecx
not %ecx
rep nop # Intended: do nothing but decrement ecx until it is zero
mov $1,%eax
xor %ebx,%ebx
int $0x80
It turned out that the third program doesn't work as expected, and some research tells me that rep nop, aka pause, does something completely unrelated.
What do the rep, repz and repnz prefixes do when the instruction following them is not a string instruction?
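A minimal sketch of why the two names coincide, assuming GNU as and objdump from binutils are available. Both lines below should assemble to the same two bytes: f3 (the rep prefix) followed by 90 (nop), which is the documented encoding of pause. The file name and the build commands in the comments are illustrative assumptions only.

```asm
# Hypothetical test file, e.g. repnop.s
# Assumed build and inspection commands:
#   as --32 repnop.s -o repnop.o && objdump -d repnop.o
        .text
        rep nop         # bytes: f3 90
        pause           # bytes: f3 90 (same encoding)
```

Since the encodings are identical, ecx is not touched and program 3 falls straight through to the exit syscall instead of counting down.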