How to avoid overflow in expr. A * B - C * D

Question

I need to compute an expression which looks like: A*B - C*D, where their types are: signed long long int A, B, C, D; Each number can be really big (not overflowing its type). While A*B could cause overflow, at same time expression A*B - C*D can be really small. How can I compute it correctly?

For example: MAX * MAX - (MAX - 1) * (MAX + 1) == 1, where MAX = LLONG_MAX - n and n - some natural number.

@Cthulhu, great question. He could try to make an equivalent function using smaller number by dividing all of them by 10 or something, then multiplying the result. — Chris, Nov 05 '12 at 17:23
Actually accuracy is important, as I use it in further calcuations; if result is greater then LLONG_MAX then I need to exit the function. — NGix, Nov 05 '12 at 17:38
@lr0nm, just check for overflow after the operation and exit if the overflow flag is set then. My C is rusty so I do not remember how to do this. — Chris, Nov 05 '12 at 17:41
Vars A,B,C,D are signed. This implies `A - C` could overflow. It that an issue to consider or do you know that this is not going to happen with your data? — William Morris, Nov 05 '12 at 17:47
@Chris: There's no _portable_ way to check for signed overflow. — Mooing Duck, Nov 05 '12 at 17:59
@Mooning Duck, so my guess would be you are referring to using inline assembly to grab the carry flag? — Chris, Nov 05 '12 at 18:12
@MooingDuck but you can check beforehand if the operation will overflow http://stackoverflow.com/a/3224630/158285 — bradgonesurfing, Nov 05 '12 at 18:24
@Chris: No, I'm saying that there's no portable way to check if a signed overflow has happened. (Brad is correct that you can portably detect that it _will_ happen). Using inline assembly is one of many non-portable ways to check. — Mooing Duck, Nov 05 '12 at 18:31
@Ir0nm how do you wish to handle overflow? Does that matter to you? Or are you guaranteeing that no overflow will occur with your given inputs? — John, Nov 05 '12 at 20:00
@John I need that if result overflows then post warning or exit function, if not then just output result as in example. Given inputs are recursively increasing, mainly it can depend on result of this expression — NGix, Nov 05 '12 at 21:15
if you're using gcc, Clang or any other compilers that have0 [`__int128`](https://gcc.gnu.org/onlinedocs/gcc/_005f_005fint128.html), simply use `(__int128)A*B - (__int128)C*D` — phuclv, Aug 06 '15 at 12:33

Anirudh Ramanathan · Accepted Answer · 2012-11-06T09:10:11.407

120

This seems too trivial I guess. But A*B is the one that could overflow.

You could do the following, without losing precision

A*B - C*D = A(D+E) - (A+F)D
          = AD + AE - AD - DF
          = AE - DF
             ^smaller quantities E & F

E = B - D (hence, far smaller than B)
F = C - A (hence, far smaller than C)

This decomposition can be done further.
As @Gian pointed out, care might need to be taken during the subtraction operation if the type is unsigned long long.

For example, with the case you have in the question, it takes just one iteration,

 MAX * MAX - (MAX - 1) * (MAX + 1)
  A     B       C           D

E = B - D = -1
F = C - A = -1

AE - DF = {MAX * -1} - {(MAX + 1) * -1} = -MAX + MAX + 1 = 1

edited Nov 06 '12 at 09:10

answered Nov 05 '12 at 17:30

Anirudh Ramanathan

46,179
22
132
191

What about `C*D`? Seems like an overflow could happen there too. – Caleb Nov 05 '12 at 17:33
You mean `D*E`? That is the difference, and is smaller than `C*D`. – Anirudh Ramanathan Nov 05 '12 at 17:34
4

@Caleb, just apply the same algorithm to `C*D` – Chris Nov 05 '12 at 17:35
2

I think you should explain what E represents. – Caleb Nov 05 '12 at 17:37
7

Both long long and double are 64 bits. Since double has to allocate some bits for the exponent, it has a _smaller_ range of possible values without loss of precision. – Jim Garrison Nov 05 '12 at 19:03
+1 for the first suggestion, it's what I thought of upon seeing the question. You (the question asker) may want to add a comment in the code if you use it, though. – Izkata Nov 05 '12 at 19:05
@JimGarrison I see that now. Don't know what I was thinking! – Anirudh Ramanathan Nov 05 '12 at 19:10
+1 Looks like some kind of Euclidean algorithm... maybe can be done with cycle – NGix Nov 05 '12 at 19:42
3

@Cthulhu - seems to me like this would only work if all the number are very large...for example, you would still have overflow with {A,B,C,D} = {MAX,MAX,MAX,2}. The OP says "Each number can be really big", but it isn't clear from the problem statement that each number _must_ be really big. – Kevin K Nov 05 '12 at 21:28
1

This solution looks like it relies on signed values? Didn't the OP specify 'unsigned'? It seems like you could get a demonstrably wrong answer if you ended up treated unsigned values as signed ones. – Gian Nov 05 '12 at 21:32
@Gian It is just the concept illustrated here. The OP could put in checks to ensure the value after subtraction is positive. – Anirudh Ramanathan Nov 05 '12 at 21:44
1

Fair enough, although I think it's worth pointing out for completeness' sake :) – Gian Nov 05 '12 at 21:46
1

@kevinK In case one of them is smaller, this decomposition will have to be carried out multiple times I guess. – Anirudh Ramanathan Nov 05 '12 at 21:56
4

What if any of `A,B,C,D` are negative though? Won't `E` or `F` be even bigger then? – Supr Nov 06 '12 at 14:36
@JimGarrison: `Both long long and double are 64 bits.` Sometimes. Maybe. – Lightness Races in Orbit Nov 07 '12 at 16:24

Ofir · Answer 2 · 2012-11-05T19:38:25.030

The simplest and most general solution is to use a representation that can't overflow, either by using a long integer library (e.g. http://gmplib.org/) or representing using a struct or array and implementing a kind of long multiplication (i.e. separating each number to two 32bit halves and performing the multiplication as below:

(R1 + R2 * 2^32 + R3 * 2^64 + R4 * 2^96) = R = A*B = (A1 + A2 * 2^32) * (B1 + B2 * 2^32) 
R1 = (A1*B1) % 2^32
R2 = ((A1*B1) / 2^32 + (A1*B2) % 2^32 + (A2*B1) % 2^32) % 2^32
R3 = (((A1*B1) / 2^32 + (A1*B2) % 2^32 + (A2*B1) % 2^32) / 2^32 + (A1*B2) / 2^32 + (A2*B1) / 2^32 + (A2*B2) % 2^32) %2^32
R4 = ((((A1*B1) / 2^32 + (A1*B2) % 2^32 + (A2*B1) % 2^32) / 2^32 + (A1*B2) / 2^32 + (A2*B1) / 2^32 + (A2*B2) % 2^32) / 2^32) + (A2*B2) / 2^32

Assuming the end result fits in 64 bits you actually don't really need most bits of R3 and none of R4

The calculation above is really not as complicated as it looks, it really is simple long multiplication in base 2^32, and the code in C should look simpler. Also, it will be a good idea to create generic functions to do this work in your program. — Ofir, Nov 05 '12 at 17:44

score 46 · Answer 3 · edited Nov 05 '12 at 18:43

46

Note that this is not standard since it relies on wrap-around signed-overflow. (GCC has compiler flags which enable this.)

But if you just do all the calculations in long long, the result of applying the formula directly:
(A * B - C * D) will be accurate as long as the correct result fits into a long long.

Here's a work-around that only relies on implementation-defined behavior of casting unsigned integer to signed integer. But this can be expected to work on almost every system today.

(long long)((unsigned long long)A * B - (unsigned long long)C * D)

This casts the inputs to unsigned long long where the overflow behavior is guaranteed to be wrap-around by the standard. Casting back to a signed integer at the end is the implementation-defined part, but will work on nearly all environments today.

If you need more pedantic solution, I think you have to use "long arithmetic"

edited Nov 05 '12 at 18:43

Mysticial

464,885
45
335
332

answered Nov 05 '12 at 17:57

RiaD

46,822
11
79
123

+1 You're the only one to notice this. The only tricky part is setting the compiler to do wrap-around overflow and checking if the correct result actually does fit into a `long long`. – Mysticial Nov 05 '12 at 18:27
2

Even the naive version without any tricks at all will do the right thing on *most* implementations; it's not guaranteed to by the standard, but you would have to find a 1's-complement machine or some other quite weird device to make it fail. – hobbs Nov 06 '12 at 02:23
1

I think this is an important answer. I agree it may not be correct programming to assume implementation specific behavior, but every engineer should understand modulo arithmetic and how to get the right compiler flags to ensure consistent behavior if performance is essential. DSP engineers rely on this behavior for fixed point filter implementations, for which the accepted answer will have unacceptable performance. – Peter M May 16 '13 at 16:39

paquetp · Answer 4 · 2012-11-05T18:25:23.673

18

This should work ( I think ):

signed long long int a = 0x7ffffffffffffffd;
signed long long int b = 0x7ffffffffffffffd;
signed long long int c = 0x7ffffffffffffffc;
signed long long int d = 0x7ffffffffffffffe;
signed long long int bd = b / d;
signed long long int bdmod = b % d;
signed long long int ca = c / a;
signed long long int camod = c % a;
signed long long int x = (bd - ca) * a * d - (camod * d - bdmod * a);

Here's my derivation:

x = a * b - c * d
x / (a * d) = (a * b - c * d) / (a * d)
x / (a * d) = b / d - c / a

now, the integer/mod stuff:
x / (a * d) = (b / d + ( b % d ) / d) - (c / a + ( c % a ) / a )
x / (a * d) = (b / d - c / a) - ( ( c % a ) / a - ( b % d ) / d)
x = (b / d - c / a) * a * d - ( ( c % a ) * d - ( b % d ) * a)

edited Nov 05 '12 at 18:25

answered Nov 05 '12 at 17:46

paquetp

1,639
11
19

Elegant but possibly (bd - ca) * a * d could still overflow even if the whole equation is in range – bradgonesurfing Nov 05 '12 at 17:58
1

Thanks @bradgonesurfing - could you provide such an input? I've updated my answer, executed it and it works (bd and ca are 0)... – paquetp Nov 05 '12 at 18:11
1

Hmmm. Now I think about it maybe not. Degenerate case with d = 1 and a = 1 and b = maxint and c = maxint it still works. Cool :) – bradgonesurfing Nov 05 '12 at 18:16
1

@paquetp: a=1, b=0x7fffffffffffffff, c=-0x7fffffffffffffff, d=1 (note c is negative). Clever though, I'm certain your code handles all positive numbers correctly. – Mooing Duck Nov 05 '12 at 18:16
3

@MooingDuck but the final answer for your set is overflowed as well so it is not a valid setup. It only works if each side is of the same sign so the resulting subtraction is within range. – bradgonesurfing Nov 05 '12 at 18:20
I don't have the patience to verify that this is correct, but it seems good, and it deserves an upvote for the sheer chutzpah! The only flaw is that it doesn't attempt to detect overflow in any way. – user4815162342 Nov 05 '12 at 19:52
1

There is something strange with StackOverflow when this answer which is the simplest and the best has got such a low score compared to the top scored answer. – bradgonesurfing Nov 06 '12 at 15:22
1

This answer uses 64 bits division, which isn't very efficient. In comparison, Ofir uses only additions and multiplications. I expect that to be an order of magnitude more efficient on a wide range of hardware, from a lowly ARM9 to a top-end x86 server CPU. – MSalters Nov 11 '12 at 01:30

score 11 · Answer 5 · edited Nov 06 '12 at 06:50

11

E = max(A,B,C,D)
A1 = A -E;
B1 = B -E;
C1 = C -E;
D1 = D -E;

then

A*B - C*D = (A1+E)*(B1+E)-(C1+E)(D1+E) = (A1+B1-C1-D1)*E + A1*B1 -C1*D1

edited Nov 06 '12 at 06:50

Anirudh Ramanathan

46,179
22
132
191

answered Nov 06 '12 at 06:15

landry

561
1
7
19

score 9 · Answer 6 · answered Nov 05 '12 at 17:24

9

You could consider computing a greatest common factor for all your values, and then dividing them by that factor before doing your arithmetic operations, then multiplying again. This assumes that such a factor exists, however (for example, if A, B, C and D happen to be relatively prime, they won't have a common factor).

Similarly, you could consider working on log-scales, but this is going to be a little scary, subject to numerical precision.

answered Nov 05 '12 at 17:24

Gian

13,735
44
51

1

Logarithming seems good if `long double` is available. In that case, an acceptable level of precision can be achieved (and the result cen be rounded). – Nov 05 '12 at 18:06

score 9 · Answer 7 · answered Nov 05 '12 at 20:43

9

If the result fits in a long long int then the expression A*B-C*D is okay as it performs the arithmetic mod 2^64, and will give the correct result. The problem is to know if the result fits in a long long int. To detect this, you can use the following trick using doubles:

if( abs( (double)A*B - (double)C*D ) > MAX_LLONG ) 
    Overflow
else 
    return A*B-C*D;

The problem with this approach is that you are limited by the precision of the mantissa of the doubles (54bits?) so you need to limit the products A*B and C*D to 63+54 bits (or probably a little less).

answered Nov 05 '12 at 20:43

Esteban Crespi

315
1
8

This is the most practical example. Clear and gives the right answer (or throws an Exception when the inputs are bad). – Mark Lakata Nov 05 '12 at 23:32
1

Nice and elegant! You did not fall for the trap the others fell for. Just one more thing: I would bet there are some examples where the double calculation is below MAX_LLONG just due to rounding errors. My mathematical instinct tells me you should calculate the difference of the double and the long result instead, and compare that to MAX_LLONG/2 or something. This difference are the rounding errors of the double calculation and plus the overflow and should normally be relatively low, but in the case I mentioned it will be large. But right now I am too lazy to find out for sure. :-) – Dr. Hans-Peter Störr Nov 12 '12 at 08:18

score 7 · Answer 8 · edited Nov 07 '12 at 04:52

7

You could write each number in an array, each element being a digit and do the calculations as polynomials. Take the resulting polynomial, which is an array, and compute the result by multiplying each element of the array with 10 to the power of the position in the array (the first position being the largest and the last being zero).

The number 123 can be expressed as:

123 = 100 * 1 + 10 * 2 + 3

for which you just create an array [1 2 3].

You do this for all numbers A, B, C and D, and then you multiply them as polynomials. Once you have the resulting polynomial, you just reconstruct the number from it.

edited Nov 07 '12 at 04:52

Pierre Arnaud

10,212
11
77
108

answered Nov 05 '12 at 17:26

Mihai

2,740
31
45

2

don't know what that is but I'll have to find. put :) . this is a solution of the top of my head while I'm shopping with my girlfriend :) – Mihai Nov 05 '12 at 17:40
you're implementing bignums in a base10 array. GMP is a quality bignum library that uses base 4294967296. MUCH faster. No downvote though, because the answer is correct, and useful. – Mooing Duck Nov 05 '12 at 17:43
thanks :) . it`s usefully to know that this one way of doing it but there are better ways so don't do it like this. at least not in this situation :) – Mihai Nov 05 '12 at 17:48
anyway ... using this solution you could computer much bigger number than any primitive type could bold (like 100 digit number s) and keep the result as an array. this deserves an up votez :p – Mihai Nov 05 '12 at 18:09
I'm not certain it gets an upvote, since this method (though effective and relatively easy to understand) is memory hungry and slow. – Mooing Duck Nov 05 '12 at 18:23
let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/19108/discussion-between-mihai-and-mooing-duck) – Mihai Nov 05 '12 at 18:48

score 6 · Answer 9 · answered Nov 05 '12 at 23:42

While a signed long long int will not hold A*B, two of them will. So A*B could be decomposed to tree terms of different exponent, any of them fitting one signed long long int.

A1=A>>32;
A0=A & 0xffffffff;
B1=B>>32;
B0=B & 0xffffffff;

AB_0=A0*B0;
AB_1=A0*B1+A1*B0;
AB_2=A1*B1;

Same for C*D.

Folowing the straight way, the subraction could be done to every pair of AB_i and CD_i likewise, using an additional carry bit (accurately a 1-bit integer) for each. So if we say E=A*B-C*D you get something like:

E_00=AB_0-CD_0 
E_01=(AB_0 > CD_0) == (AB_0 - CD_0 < 0) ? 0 : 1  // carry bit if overflow
E_10=AB_1-CD_1 
...

We continue by transferring the upper-half of E_10 to E_20 (shift by 32 and add, then erase upper half of E_10).

Now you can get rid of the carry bit E_11 by adding it with the right sign (obtained from the non-carry part) to E_20. If this triggers an overflow, the result wouldn't fit either.

E_10 now has enough 'space' to take the upper half from E_00 (shift, add, erase) and the carry bit E_01.

E_10 may be larger now again, so we repeat the transfer to E_20.

At this point, E_20 must become zero, otherwise the result won't fit. The upper half of E_10 is empty as result of the transfer too.

The final step is to transfer the lower half of E_20 into E_10 again.

If the expectation that E=A*B+C*D would fit the signed long long int holds, we now have

E_20=0
E_10=0
E_00=E

This is actually the simplified formula that one would get if using Ofir's multiplication formula and removing every useless temporary result. — dronus, Nov 05 '12 at 23:55

score 3 · Answer 10 · answered Nov 05 '12 at 18:41

If you know the final result is representable in your integer type, you can perform this calculation quickly using the code below. Because the C standard specifies that unsigned arithmetic is modulo arithmetic and does not overflow, you can use an unsigned type to perform the calculation.

The following code assumes there is an unsigned type of the same width and that the signed type uses all bit patterns to represent values (no trap representations, the minimum of the signed type is the negative of half the modulus of the unsigned type). If this does not hold in a C implementation, simple adjustments can be made to the ConvertToSigned routine for that.

The following uses signed char and unsigned char to demonstrate the code. For your implementation, change the definition of Signed to typedef signed long long int Signed; and the definition of Unsigned to typedef unsigned long long int Unsigned;.

#include <limits.h>
#include <stdio.h>
#include <stdlib.h>


//  Define the signed and unsigned types we wish to use.
typedef signed char   Signed;
typedef unsigned char Unsigned;

//  uHalfModulus is half the modulus of the unsigned type.
static const Unsigned uHalfModulus = UCHAR_MAX/2+1;

//  sHalfModulus is the negation of half the modulus of the unsigned type.
static const Signed   sHalfModulus = -1 - (Signed) (UCHAR_MAX/2);


/*  Map the unsigned value to the signed value that is the same modulo the
    modulus of the unsigned type.  If the input x maps to a positive value, we
    simply return x.  If it maps to a negative value, we return x minus the
    modulus of the unsigned type.

    In most C implementations, this routine could simply be "return x;".
    However, this version uses several steps to convert x to a negative value
    so that overflow is avoided.
*/
static Signed ConvertToSigned(Unsigned x)
{
    /*  If x is representable in the signed type, return it.  (In some
        implementations, 
    */
    if (x < uHalfModulus)
        return x;

    /*  Otherwise, return x minus the modulus of the unsigned type, taking
        care not to overflow the signed type.
    */
    return (Signed) (x - uHalfModulus) - sHalfModulus;
}


/*  Calculate A*B - C*D given that the result is representable as a Signed
    value.
*/
static signed char Calculate(Signed A, Signed B, Signed C, Signed D)
{
    /*  Map signed values to unsigned values.  Positive values are unaltered.
        Negative values have the modulus of the unsigned type added.  Because
        we do modulo arithmetic below, adding the modulus does not change the
        final result.
    */
    Unsigned a = A;
    Unsigned b = B;
    Unsigned c = C;
    Unsigned d = D;

    //  Calculate with modulo arithmetic.
    Unsigned t = a*b - c*d;

    //  Map the unsigned value to the corresponding signed value.
    return ConvertToSigned(t);
}


int main()
{
    //  Test every combination of inputs for signed char.
    for (int A = SCHAR_MIN; A <= SCHAR_MAX; ++A)
    for (int B = SCHAR_MIN; B <= SCHAR_MAX; ++B)
    for (int C = SCHAR_MIN; C <= SCHAR_MAX; ++C)
    for (int D = SCHAR_MIN; D <= SCHAR_MAX; ++D)
    {
        //  Use int to calculate the expected result.
        int t0 = A*B - C*D;

        //  If the result is not representable in signed char, skip this case.
        if (t0 < SCHAR_MIN || SCHAR_MAX < t0)
            continue;

        //  Calculate the result with the sample code.
        int t1 = Calculate(A, B, C, D);

        //  Test the result for errors.
        if (t0 != t1)
        {
            printf("%d*%d - %d*%d = %d, but %d was returned.\n",
                A, B, C, D, t0, t1);
            exit(EXIT_FAILURE);
        }
    }
    return 0;
}

score 2 · Answer 11 · answered Nov 05 '12 at 17:51

2

You could try breaking the equation into smaller components which don't overflow.

AB - CD
= [ A(B - N) - C( D - M )] + [AN - CM]

= ( AK - CJ ) + ( AN - CM)

    where K = B - N
          J = D - M

If the components still overflow you could break them into smaller components recursively and then recombine.

answered Nov 05 '12 at 17:51

bradgonesurfing

30,949
17
114
217

This may or may not be correct, but is definitely confusing. You define `K` and `J`, why not `N` and `M`. Also, I think you're breaking the equation into _bigger_ pieces. Since your step 3 is the same as the OP's question, except more complicated `(AK-CJ)` -> `(AB-CD)` – Mooing Duck Nov 05 '12 at 18:24
N is not simplified from anything. It's just a number subtracted from A to make it smaller. Actually it is a similar but inferior solution to paquetp. Here I'm using subtraction instead of integer division to make it smaller. – bradgonesurfing Nov 05 '12 at 18:27

OldCurmudgeon · Answer 12 · 2012-11-21T01:16:57.763

I may not have covered all of the edge cases, nor have I rigorously tested this but this implements a technique I remember using in the 80s when trying to do 32-bit integer maths on a 16-bit cpu. Essentially you split the 32 bits into two 16-bit units and work with them separately.

public class DoubleMaths {
  private static class SplitLong {
    // High half (or integral part).
    private final long h;
    // Low half.
    private final long l;
    // Split.
    private static final int SPLIT = (Long.SIZE / 2);

    // Make from an existing pair.
    private SplitLong(long h, long l) {
      // Let l overflow into h.
      this.h = h + (l >> SPLIT);
      this.l = l % (1l << SPLIT);
    }

    public SplitLong(long v) {
      h = v >> SPLIT;
      l = v % (1l << SPLIT);
    }

    public long longValue() {
      return (h << SPLIT) + l;
    }

    public SplitLong add ( SplitLong b ) {
      // TODO: Check for overflow.
      return new SplitLong ( longValue() + b.longValue() );
    }

    public SplitLong sub ( SplitLong b ) {
      // TODO: Check for overflow.
      return new SplitLong ( longValue() - b.longValue() );
    }

    public SplitLong mul ( SplitLong b ) {
      /*
       * e.g. 10 * 15 = 150
       * 
       * Divide 10 and 15 by 5
       * 
       * 2 * 3 = 5
       * 
       * Must therefore multiply up by 5 * 5 = 25
       * 
       * 5 * 25 = 150
       */
      long lbl = l * b.l;
      long hbh = h * b.h;
      long lbh = l * b.h;
      long hbl = h * b.l;
      return new SplitLong ( lbh + hbl, lbl + hbh );
    }

    @Override
    public String toString () {
      return Long.toHexString(h)+"|"+Long.toHexString(l);
    }
  }

  // I'll use long and int but this can apply just as easily to long-long and long.
  // The aim is to calculate A*B - C*D without overflow.
  static final long A = Long.MAX_VALUE;
  static final long B = Long.MAX_VALUE - 1;
  static final long C = Long.MAX_VALUE;
  static final long D = Long.MAX_VALUE - 2;

  public static void main(String[] args) throws InterruptedException {
    // First do it with BigIntegers to get what the result should be.
    BigInteger a = BigInteger.valueOf(A);
    BigInteger b = BigInteger.valueOf(B);
    BigInteger c = BigInteger.valueOf(C);
    BigInteger d = BigInteger.valueOf(D);
    BigInteger answer = a.multiply(b).subtract(c.multiply(d));
    System.out.println("A*B - C*D = "+answer+" = "+answer.toString(16));

    // Make one and test its integrity.
    SplitLong sla = new SplitLong(A);
    System.out.println("A="+Long.toHexString(A)+" ("+sla.toString()+") = "+Long.toHexString(sla.longValue()));

    // Start small.
    SplitLong sl10 = new SplitLong(10);
    SplitLong sl15 = new SplitLong(15);
    SplitLong sl150 = sl10.mul(sl15);
    System.out.println("10="+sl10.longValue()+"("+sl10.toString()+") * 15="+sl15.longValue()+"("+sl15.toString()+") = "+sl150.longValue() + " ("+sl150.toString()+")");

    // The real thing.
    SplitLong slb = new SplitLong(B);
    SplitLong slc = new SplitLong(C);
    SplitLong sld = new SplitLong(D);
    System.out.println("B="+Long.toHexString(B)+" ("+slb.toString()+") = "+Long.toHexString(slb.longValue()));
    System.out.println("C="+Long.toHexString(C)+" ("+slc.toString()+") = "+Long.toHexString(slc.longValue()));
    System.out.println("D="+Long.toHexString(D)+" ("+sld.toString()+") = "+Long.toHexString(sld.longValue()));
    SplitLong sanswer = sla.mul(slb).sub(slc.mul(sld));
    System.out.println("A*B - C*D = "+sanswer+" = "+sanswer.longValue());

  }

}

Prints:

A*B - C*D = 9223372036854775807 = 7fffffffffffffff
A=7fffffffffffffff (7fffffff|ffffffff) = 7fffffffffffffff
10=10(0|a) * 15=15(0|f) = 150 (0|96)
B=7ffffffffffffffe (7fffffff|fffffffe) = 7ffffffffffffffe
C=7fffffffffffffff (7fffffff|ffffffff) = 7fffffffffffffff
D=7ffffffffffffffd (7fffffff|fffffffd) = 7ffffffffffffffd
A*B - C*D = 7fffffff|ffffffff = 9223372036854775807

which looks to me like it's working.

I bet I've missed some of the subtleties such as watching for sign overflow etc. but I think the essence is there.

I think this is an implementation of what @Ofir suggested. – OldCurmudgeon Nov 07 '12 at 16:15 — OldCurmudgeon, Nov 07 '12 at 16:15

score 2 · Answer 13 · answered Jun 28 '13 at 19:14

2

For the sake of completeness, since no one mentioned it, some compilers (e.g. GCC) actually provide you with a 128 bit integer nowadays.

Thus an easy solution could be:

(long long)((__int128)A * B - (__int128)C * D)

answered Jun 28 '13 at 19:14

i Code 4 Food

2,144
1
15
21

SomeWittyUsername · Answer 14 · 2012-11-05T17:35:11.203

1

AB-CD = (AB-CD) * AC / AC = (B/C-D/A)*A*C. Neither B/C nor D/A can overflow, so calculate (B/C-D/A) first. Since the final result won't overflow according to your definition, you can safely perform the remaining multiplications and calculate (B/C-D/A)*A*C which is the required result.

Note, if your input can be extremely small as well, the B/C or D/A can overflow. If it's possible, more complex manipulations might be required according to input inspection.

edited Nov 05 '12 at 17:35

answered Nov 05 '12 at 17:28

SomeWittyUsername

18,025
3
42
85

2

That won't work as integer division loses information (the fraction of the result) – Ofir Nov 05 '12 at 17:33
@Ofir that's correct, however you can't eat the cake and leave it untouched. You have to pay either by precision or by using additional resources (like you suggested in your answer). My answer is of mathematical nature while yours' is computer-oriented. Each can be correct depending on the circumstances. – SomeWittyUsername Nov 05 '12 at 17:59
2

You are correct - I should have phrased it as - will not give an exact result rather than won't work, as the math is correct. However, note in the cases that likely interest the question submitter (e.g. in the example in the question), the error will probably be surprisingly large - much larger than can be acceptable for any practical application. In any case - it was an insightful answer and I shouldn't have used that language. – Ofir Nov 05 '12 at 18:09
@Ofir I don't think your language was inappropriate. The OP clearly requested a "correct" calculation, not one that would lose precision for the sake of being performed under extreme resource constraints. – user4815162342 Nov 05 '12 at 19:50

Amir Saniyan · Answer 15 · 2012-11-07T12:30:39.147

1

Choose K = a big number (eg. K = A - sqrt(A))

A*B - C*D = (A-K)*(B-K) - (C-K)*(D-K) + K*(A-C+B-D); // Avoid overflow.

Why?

(A-K)*(B-K) = A*B - K*(A+B) + K^2
(C-K)*(D-K) = C*D - K*(C+D) + K^2

=>
(A-K)*(B-K) - (C-K)*(D-K) = A*B - K*(A+B) + K^2 - {C*D - K*(C+D) + K^2}
(A-K)*(B-K) - (C-K)*(D-K) = A*B - C*D - K*(A+B) + K*(C+D) + K^2 - K^2
(A-K)*(B-K) - (C-K)*(D-K) = A*B - C*D - K*(A+B-C-D)

=>
A*B - C*D = (A-K)*(B-K) - (C-K)*(D-K) + K*(A+B-C-D)

=>
A*B - C*D = (A-K)*(B-K) - (C-K)*(D-K) + K*(A-C+B-D)

Note that Because A, B, C and D are big numbers, thus A-C and B-D are small numbers.

edited Nov 07 '12 at 12:30

answered Nov 07 '12 at 10:52

Amir Saniyan

13,014
20
92
137

How do you choose **K** in practical? Besides, **K*(A-C+B-D)** might still overflow. – ylc Nov 07 '12 at 12:01
@ylc: Choose K = sqrt(A), Not that `A-C+B-D` is a small number. Because A, B, C and D are big numbers, thus A-C is small number. – Amir Saniyan Nov 07 '12 at 12:07
If you choose **K = sqrt(A)**, then **(A-K)*(B-K)** might overflow again. – ylc Nov 07 '12 at 12:15
@ylc: OK! I change it to `A - sqrt(A)` :) – Amir Saniyan Nov 07 '12 at 12:31
Then **K*(A-C+B-D)** may overflow. – ylc Nov 07 '12 at 12:34

How to avoid overflow in expr. A * B - C * D

15 Answers15

Linked