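While a signed long long int will not hold A*B, two of them will. So A*B can be decomposed into three terms of different exponent (weights 2^0, 2^32 and 2^64), each of which fits in about 64 bits. Strictly, A0*B0 can need the full unsigned 64-bit range, and the sum AB_1 can need one bit more.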
A1=A>>32;
A0=A & 0xffffffff;
B1=B>>32;
B0=B & 0xffffffff;
AB_0=A0*B0;
AB_1=A0*B1+A1*B0;
AB_2=A1*B1;
Same for C*D.
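A minimal sketch of the decomposition in C, assuming 64-bit two's-complement integers and an arithmetic right shift for negative values. The names partials and partial_products are made up for illustration. The two cross products that make up AB_1 are kept separate here because their sum can need 65 bits, and AB_0 is stored unsigned for the same kind of reason.

```c
#include <stdint.h>

/* Hypothetical container:
   A*B = ab0 + (ab1a + ab1b) * 2^32 + ab2 * 2^64. */
typedef struct {
    uint64_t ab0;   /* A0*B0, can exceed INT64_MAX, hence unsigned */
    int64_t  ab1a;  /* A0*B1, magnitude below 2^63 */
    int64_t  ab1b;  /* A1*B0, magnitude below 2^63 */
    int64_t  ab2;   /* A1*B1, magnitude at most 2^62 */
} partials;

partials partial_products(int64_t a, int64_t b)
{
    /* Assumption: >> on a negative int64_t is an arithmetic shift
       (implementation-defined in C, but usual on mainstream compilers). */
    int64_t a1 = a >> 32, a0 = a & 0xffffffffLL;   /* a0 in 0 .. 2^32-1 */
    int64_t b1 = b >> 32, b0 = b & 0xffffffffLL;   /* b0 in 0 .. 2^32-1 */

    partials p;
    p.ab0  = (uint64_t)a0 * (uint64_t)b0;
    p.ab1a = a0 * b1;
    p.ab1b = a1 * b0;
    p.ab2  = a1 * b1;
    return p;
}
```

For example, A = 5*2^32 + 7 and B = -3*2^32 + 11 split into (A1, A0) = (5, 7) and (B1, B0) = (-3, 11).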
Following the straightforward way, the subtraction can be done on each pair AB_i and CD_i in turn, using an additional carry bit (strictly, a 1-bit integer) for each. So if we say E=A*B-C*D, we get something like:
E_00=AB_0-CD_0
E_01=(AB_0 < CD_0) != (AB_0 - CD_0 < 0) ? 1 : 0 // carry bit: 1 if the subtraction overflowed
E_10=AB_1-CD_1
...
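Note that in real C a signed subtraction that overflows is undefined behaviour, so the overflow test cannot be relied on literally with signed operands. A minimal sketch of the same carry-bit idea on unsigned 64-bit values, where wrap-around is well defined (the name sub_borrow is made up for illustration):

```c
#include <stdint.h>

/* Hypothetical helper: stores (a - b) mod 2^64 in *diff and returns the
   borrow bit, which is 1 exactly when the true difference is negative. */
int sub_borrow(uint64_t a, uint64_t b, uint64_t *diff)
{
    *diff = a - b;   /* unsigned subtraction wraps, never undefined */
    return a < b;    /* with a borrow, the true result is *diff - 2^64 */
}
```

The pair (borrow, diff) together represents the 65-bit signed difference.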
We continue by transferring the upper half of E_10 to E_20 (shift right by 32 and add, then erase the upper half of E_10).
Now you can get rid of the carry bit E_11 by adding it with the right sign (obtained from the non-carry part) to E_20. If this triggers an overflow, the result wouldn't fit either.
E_10 now has enough 'space' to take the upper half from E_00 (shift, add, erase) and the carry bit E_01.
E_10 may be larger now again, so we repeat the transfer to E_20.
At this point, E_20 must become zero, otherwise the result won't fit. The upper half of E_10 is empty as a result of the transfer too.
The final step is to transfer the lower half of E_10 into E_00 (shift left by 32 and add, then erase E_10).
If the expectation that E=A*B-C*D fits in a signed long long int holds, we now have
E_20=0
E_10=0
E_00=E
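The whole procedure can be sketched in C roughly as follows. This is a variant, not a literal transcription of the steps above: instead of three terms plus carry bits it uses four signed 32-bit "columns" (weights 2^0, 2^32, 2^64, 2^96), so that no intermediate value can overflow and no separate carry bits are needed. It assumes 64-bit two's-complement integers and an arithmetic right shift for negative values. The names add_product and mul_sub are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define MASK32 0xffffffffLL

/* Hypothetical helper: adds sign * (x * y) into col[0..3], where
   value = col[0] + col[1]*2^32 + col[2]*2^64 + col[3]*2^96. */
void add_product(int64_t col[4], int64_t x, int64_t y, int sign)
{
    /* Assumption: >> on a negative int64_t is an arithmetic shift. */
    int64_t x1 = x >> 32, x0 = x & MASK32;
    int64_t y1 = y >> 32, y0 = y & MASK32;

    uint64_t p0 = (uint64_t)x0 * (uint64_t)y0;  /* may exceed INT64_MAX */
    int64_t p1a = x0 * y1;                      /* magnitude below 2^63 */
    int64_t p1b = x1 * y0;
    int64_t p2  = x1 * y1;                      /* magnitude at most 2^62 */

    /* Split every partial product into 32-bit halves, so each column
       only ever holds a handful of 32-bit quantities. */
    col[0] += sign * (int64_t)(p0 & MASK32);
    col[1] += sign * ((int64_t)(p0 >> 32) + (p1a & MASK32) + (p1b & MASK32));
    col[2] += sign * ((p1a >> 32) + (p1b >> 32) + (p2 & MASK32));
    col[3] += sign * (p2 >> 32);
}

/* Computes a*b - c*d. Returns false if the result does not fit in int64_t. */
bool mul_sub(int64_t a, int64_t b, int64_t c, int64_t d, int64_t *out)
{
    int64_t col[4] = {0, 0, 0, 0};
    add_product(col, a, b, +1);
    add_product(col, c, d, -1);

    /* Shift, add, erase: move the upper half of each column one level up. */
    for (int i = 0; i < 3; i++) {
        col[i + 1] += col[i] >> 32;
        col[i] &= MASK32;
    }

    uint64_t low = ((uint64_t)col[1] << 32) | (uint64_t)col[0];

    /* Non-negative result: the upper columns must be all zero. */
    if (col[3] == 0 && col[2] == 0 && low <= (uint64_t)INT64_MAX) {
        *out = (int64_t)low;
        return true;
    }
    /* Negative result: the upper columns must be pure sign extension. */
    if (col[3] == -1 && col[2] == MASK32 && low > (uint64_t)INT64_MAX) {
        *out = -(int64_t)(~low) - 1;   /* low - 2^64 without overflow */
        return true;
    }
    return false;
}
```

For example, with A = B = C = 3037000500 and D = 3037000499 both products exceed the signed 64-bit range, but mul_sub still returns the exact difference 3037000500. Where compiler extensions are acceptable, a 128-bit integer type would make all of this unnecessary.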