Saturated arithmetic
Unsigned types in C and C++ have pretty straightforward and predictable behavior upon an overflow or underflow – namely, they’re wrapping around (whereas signed types don’t like {over,under}flows – the behavior is undefined). However, sometimes wraparound doesn’t fit our needs, because we want the value to be set to the maximum in case of overflow and to the minimum in case of underflow. That’s exactly what saturated arithmetic is.
Of course, it’s very easy to implement it using a lot of if statements and other things involving branches – however, as you may already noticed, I don’t like branches ;> So, we’re going to use branchless equivalents.
We’ll be interested in keeping our values in range [0..255], i.e. the range of unsigned char on x86 and ARM (these are the only architectures of my interest ;>). Also, we’ll assume that values are already clamped to that range. It should also be simple to make the code work on unsigned shorts and the range [0..65535], by replacing every 8 for 16 and 16 for 32 (including these in type names) ;>
Okay, so let’s start by doing something simple, like incrementation. There’s a nice piece of code for doing that in a comment by Michael. I improved that code a bit, getting:
inline uint8 IncSaturated8(uint8 x) { uint32 carry = ((uint32(x) + 1) >> 8); uint32 temp = uint32(x) + (carry ^ 1); return uint8(temp); }
carry is equal to 1 if and only if x is equal to 255, because then x + 1 is equal to 256, and 256 >> 8 is of course 1. We XOR carry with 1, to get 1 IFF there was no carry and 0 otherwise. Finally, 255 + 0 = 255 and any other value + 1 = any other value + 1, giving us nice, branchless saturated increment.
Decrementation is similar:
inline uint8 DecSaturated8(uint8 x) { uint32 carry = ((uint32(x) - 1) >> 31); uint32 temp = uint32(x) - (carry ^ 1); return uint8(temp); }
Here, carry equals to 1 only when x equals to 0, because x - 1 underflows and gives us nice, large result with highest bit set. The rest is the same as in incrementation.
Next operation is addition:
inline uint8 AddSaturated8(uint8 x, uint8 y) { uint32 sum = uint32(x) + uint32(y); uint32 carry = sum >> 8; return (-carry | sum); }
In this one, carry is 1 IFF the sum didn’t fit in 8 bytes (as in incrementation). We’re then OR’ing the sum with negated carry (i.e. 0 if no carry and -1 if carry), receiving 0xFF..FF | x, which resolves to 0xFF..FF, in case of overflow and 0 | x, which is the same as x, when no overflow occurs. After getting truncated to 8 bits the result is either 255 or x + y, exactly what we wanted it to be.
Subtraction is similar to addition:
inline uint8 SubSaturated8(uint8 x, uint8 y) { uint32 difference = uint32(x) - uint32(y); uint32 carry = (difference >> 31) ^ 1; return (-carry & difference); }
This time, carry is actually !carry, since it’s 0 when underflow occurred and 1 otherwise. After negation, it’s 0 or -1, respectively. So we have two cases – 0xFF..FF & difference (which equals to difference) when there was no underflow and 0 & difference (which is 0) when operation underflowed.
Only two operations are left – multiplication and division. We’ll cope with the latter first, because it’s simplier:
inline uint8 DivSaturated8(uint8 x, uint8 y) { return x / y; }
Yes, that’s right – it’s just plain old division. Unsigned division can overflow only if y is 0, which is undefined behavior anyway.
And, finally, the multiplication:
inline uint8 MulSaturated8(uint8 x, uint8 y) { uint16 product = uint16(x) * uint16(y); uint8 hi = product >> 8; uint8 lo = product & 0xFF; return sex(hi | -hi) | lo; }
We’re calculating the product of the operands, getting 16 bit result. If it’s higher part is not 0, that means operation overflowed. We’re checking for this 0 using our old & trustworthy sex function, taking advantage of the fact that x | -x is always a negative number, unless x = 0. So we end up with 0 | lo or 0xFF..FF | lo. They’re the same as lo and 255, respectively, after being simplified and truncated.
Maybe there exists even better way to implement saturated arithmetic while retaining branchlessness (does this word exist, actually? :>). I don’t know; if you know, please tell me.

In Z8/eZ8 assembly, an 8 bit saturated increment is two instructions:
ADD dst, #1 ; dst <- dst + 1, update C
SBC dst, #0 ; dst <- dst – 0 – C
An 8 bit saturated decrement:
SUB dst, #1 ; dst <- dst – 1, C is set if there was a borrow
ADC dst, #0 ; dst <- dst + 0 + C
Longer precision works similarly:
ADD dst+n-0, #1
ADC dst+n-1, #0 ; repeat as needed
SBC dst+n-0, #0
SBC dst+n-1, #0 ; repeat as needed
Etc.
This will work on any architecture where SBC and SUB works that way.
Kuba Ober powiedział Grudzień 6, 2011 @ 16:00