Revisiting if/else in Assembly
In my previous post
("Condition Codes 1"),
I explained that some instructions can set some global
condition codes, and that these codes can be used to
conditionally execute code. I gave some examples of usage. One
such example was an assembly implementation of C's
if/else construct:
    cmp     r0, #20
    bhi     do_something_else
do_something:
    @ This code runs if (r0 <= 20).
    b       continue    @ Prevent do_something_else from executing.
do_something_else:
    @ This code runs if (r0 > 20).
continue:
    @ Other code.
The example is valid, and will work on any ARM core. However, is this an efficient solution if you only need to execute one or two instructions in each case? Consider the following C code:
if (a >= 10) {
    a = 10;
} else {
    a = a + 1;
}
It should be clear that the code increments a
unless it has hit or exceeded a limit of 10, in which case it
is set to 10. Mapping this onto our if/else
example, this might be implemented in assembly as follows:
    cmp     r0, #10
    blo     r0_is_small
r0_is_big:
    mov     r0, #10
    b       continue
r0_is_small:
    add     r0, r0, #1
continue:
    @ Other code.
The above code executes one of two instructions, either the
mov or the add. However, it uses
two branch instructions to achieve this. Without branch
prediction, these branches can take several cycles to execute.
Even with branch prediction, the pattern may not be easily
predicted. Finally, even with perfect branch prediction, each
branch instruction takes four bytes of instruction memory, so
code size may become a problem.
An Improved Example
One of the features of the ARM instruction set is that almost
every instruction encoding includes a 4-bit field that
represents a condition code. If the condition attached to an
instruction passes, the instruction executes. Otherwise, it
has no effect, as if you had used a nop
instruction. Using this knowledge, we can implement the
previous example in three instructions, with no branches, as
follows:
cmp r0, #10
movhs r0, #10
addlo r0, r0, #1
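The reason this works is that hs and lo are opposite conditions, and neither mov nor add (without an s suffix) updates the flags, so exactly one of the two instructions takes effect. The following is a minimal C sketch of that behaviour, not a real emulator. The function names are hypothetical, and it assumes that a is unsigned, because hs and lo are the unsigned comparisons.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Reference behaviour: the C if/else from the text, assuming `a` is
 * unsigned (the assembly uses the unsigned conditions hs and lo). */
static uint32_t clamp_increment_branching(uint32_t a)
{
    if (a >= 10) {
        a = 10;
    } else {
        a = a + 1;
    }
    return a;
}

/* Hypothetical model of the three-instruction conditional version. */
static uint32_t clamp_increment_predicated(uint32_t r0)
{
    /* cmp r0, #10: computes r0 - 10 and sets the carry flag when the
     * subtraction does not borrow, that is, when r0 >= 10 (unsigned). */
    bool carry = (r0 >= 10u);

    /* movhs r0, #10: takes effect only if carry is set. It does not
     * modify the flags. */
    if (carry) {
        r0 = 10;
    }

    /* addlo r0, r0, #1: takes effect only if carry is clear. It tests
     * the flags left by cmp, not the value just written to r0. */
    if (!carry) {
        r0 = r0 + 1;
    }
    return r0;
}

/* Checks that both models agree for values around the limit. */
static void check_models_agree(void)
{
    for (uint32_t a = 0; a < 32; a++) {
        assert(clamp_increment_predicated(a) == clamp_increment_branching(a));
    }
}
```

Calling check_models_agree() from a test harness exercises both versions on either side of the limit. Note that the model tests the saved flag twice rather than re-reading r0, which mirrors how both conditional instructions depend on the single cmp.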
Unconditionally-Executed Instructions
In the ARM instruction set, the condition code is encoded using
a 4-bit field in the instruction. The encoding includes 3 bits
to identify an operation, and a fourth bit to invert the
condition. The eq condition, for example, is the
exact opposite of the ne condition. It may
interest authors of JIT compilers to know that the least
significant bit of the condition code can be inverted to obtain
the opposite condition code. For example, eq
(equal) is encoded as '0000' and
ne (not equal) as '0001'.
This works for every condition code with the exception of the
al (always) condition, encoded as
'1110'. Its inverse, '1111', would
mean "never", and it would be wasteful to dedicate one
sixteenth of the instruction set to instructions that can never
execute. Instead, this portion of the encoding space is used
for the few instructions which cannot be executed
conditionally.
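For JIT authors, the inversion trick can be sketched in C as below. This is an illustration under stated assumptions rather than code from any particular JIT: the helper names are hypothetical, and it relies on the condition field occupying the top four bits (31 to 28) of a 32-bit ARM-state instruction word.

```c
#include <assert.h>
#include <stdint.h>

/* Values of the 4-bit condition field. Each even/odd pair holds a
 * condition and its opposite; al has no usable opposite. */
enum arm_cond {
    COND_EQ = 0x0, COND_NE = 0x1,
    COND_HS = 0x2, COND_LO = 0x3,
    COND_MI = 0x4, COND_PL = 0x5,
    COND_VS = 0x6, COND_VC = 0x7,
    COND_HI = 0x8, COND_LS = 0x9,
    COND_GE = 0xA, COND_LT = 0xB,
    COND_GT = 0xC, COND_LE = 0xD,
    COND_AL = 0xE
};

/* Assumption: the condition field sits in bits 31..28 of the word. */
#define COND_SHIFT 28

/* Extracts the condition field from an encoded instruction. */
static uint32_t insn_cond(uint32_t insn)
{
    return insn >> COND_SHIFT;
}

/* Returns the same instruction with its condition field replaced. */
static uint32_t insn_with_cond(uint32_t insn, enum arm_cond cond)
{
    return (insn & 0x0FFFFFFFu) | ((uint32_t)cond << COND_SHIFT);
}

/* Flips the least significant bit of the condition field to obtain the
 * opposite condition. Not valid for al ('1110') or for '1111'. */
static uint32_t insn_invert_cond(uint32_t insn)
{
    assert(insn_cond(insn) < COND_AL);
    return insn ^ (UINT32_C(1) << COND_SHIFT);
}
```

As a usage example, mov r0, #10 encodes as 0xE3A0000A with the al condition. insn_with_cond(0xE3A0000A, COND_HS) gives 0x23A0000A (movhs r0, #10), and passing that to insn_invert_cond gives 0x33A0000A (movlo r0, #10).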
Here are a few examples of instructions which will always execute unconditionally in the ARM instruction set:
- blx <label> cannot be conditionally executed, but blx <register> (and all other branch instructions) can.
- Most NEON instructions. For example, SIMD (NEON) variants of vadd cannot be conditionally executed, though the scalar (VFP) variants can.
- Hint instructions, such as pld (preload data).
- Barriers, such as dmb (data memory barrier), dsb (data synchronization barrier), isb (instruction synchronization barrier).
As always, the ARMv7-AR Architecture Reference Manual contains the most complete and accurate information, as does the Instruction Set Quick Reference Card.
Conditional Execution and High-Performance Processors
When few processors had branch prediction and code size was
very constrained, conditional execution was an excellent way
to save code space whilst also improving performance in many
programs. This is still true for today's real-time processors
and micro-controllers. However, ARM's application-class
processors include branch predictors which often make the
branch-based if/else construction more attractive than
conditional instructions. A predicted branch may be very
cheap, or even free in some cases. In addition, conditional
execution can, in some cases, prevent out-of-order execution
as it adds dependencies to the instruction stream.
It can be difficult to know whether to use conditional execution or traditional conditional branches for a particular application. However, as a general rule of thumb, it's probably best to use conditional instructions for sequences of three instructions or fewer, and branches for longer sequences. The best-performing solution varies between processors as they have different pipeline and branch predictor designs, and it also varies depending on the specific instruction sequence you are using. Also note that the fastest solution is not necessarily the smallest.
Thumb
In the original 16-bit Thumb instruction set, only branches
could be conditional. In Thumb-2, the it (if-then)
instruction was added to provide functionality and behaviour
similar to conditional instructions in ARM. Thumb-2's
it instruction can also conditionally execute some
instructions which are normally unconditionally executed in ARM
state. I won't say more about Thumb here, though it might be
covered by a future blog post.













