IEEE floating point operations are deterministic, but see "How can floating point calculations be made deterministic?" for one way that an overall floating point computation can be non-deterministic:
... parallel computations are non-deterministic in terms of the order in which floating-point computations are performed, which can result in non-bit-exact results across runs.
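The underlying reason is that floating-point addition is not associative, so any change in how a sum is parenthesized (which is exactly what splitting a reduction across threads does) can change the low-order bits. A minimal illustration in plain Python, no parallelism required:

```python
# Floating-point addition is not associative: the same three values,
# summed in different orders, differ in the last bits.
a, b, c = 0.1, 0.2, 0.3

left_to_right = (a + b) + c    # 0.6000000000000001
right_to_left = a + (b + c)    # 0.6

print(left_to_right == right_to_left)        # False
print(abs(left_to_right - right_to_left))    # ~1.1e-16
```

Each individual IEEE operation is still deterministic; it is the varying order of operations that makes the overall result vary.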
Two-part question:
- How else can an overall floating point computation be non-deterministic, yielding results that are not exactly equal?
- Consider a single-threaded Python program that calls NumPy, CVXOPT, and SciPy subroutines such as `scipy.optimize.fsolve()`, which in turn call native libraries like MINPACK and GLPK and optimized linear algebra subroutines like BLAS, ATLAS, and MKL. "If your numpy/scipy is compiled using one of these, then dot() will be computed in parallel (if this is faster) without you doing anything." Do these native libraries ever parallelize in a way that introduces non-deterministic results? (One way to rule this out is sketched after this list.)
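If the worry is a multi-threaded BLAS/LAPACK backend (OpenBLAS, ATLAS, MKL), one blunt way to take parallel reduction order out of the picture is to pin the backend to a single thread before NumPy loads it. A minimal sketch, assuming the backend honors the usual thread-count environment variables (`OMP_NUM_THREADS`, `MKL_NUM_THREADS`, `OPENBLAS_NUM_THREADS`); this trades speed for a fixed order of operations rather than proving the parallel path is deterministic:

```python
import os

# These must be set before numpy is imported so the BLAS backend
# reads them at load time. Each one only matters if that backend is in use.
os.environ["OMP_NUM_THREADS"] = "1"        # OpenMP-threaded backends
os.environ["MKL_NUM_THREADS"] = "1"        # Intel MKL
os.environ["OPENBLAS_NUM_THREADS"] = "1"   # OpenBLAS

import numpy as np

a = np.random.RandomState(0).rand(1000, 1000)
b = np.random.RandomState(1).rand(1000, 1000)

# With a single BLAS thread, dot() reduces in a fixed order, so repeated
# runs on the same software and hardware should agree bit for bit.
print(repr(np.dot(a, b).sum()))
```

(MKL also has a "conditional numerical reproducibility" control, `MKL_CBWR`, aimed at run-to-run consistency even when threaded, but that is backend-specific and worth verifying against Intel's documentation.)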
Assumptions:
- The same software, with the same inputs, on the same hardware. The output of multiple runs should be equal.
- If that works, it's highly desirable to be able to test that the output is unchanged after a code refactoring (see the sketch after this list). (Yes, some refactorings change the order of operations and will make some of the output unequal.)
- All random numbers in the program are pseudo-random numbers used in a consistent way from the same seeds across all runs.
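To make the "same seeds, equal outputs" assumptions directly testable, one option is a tiny harness that seeds every PRNG explicitly, runs the computation twice, and requires bit-exact equality. A sketch under the assumption that NumPy is the only PRNG involved; `compute()` is a hypothetical stand-in for the real program:

```python
import numpy as np

def compute(seed):
    # Hypothetical stand-in for the real computation; replace with your own.
    rng = np.random.RandomState(seed)   # explicit, seeded PRNG state
    x = rng.rand(1000)
    return np.sort(x).cumsum()

def bit_exact(a, b):
    # True only if dtype, shape, and every byte of the data match.
    return a.dtype == b.dtype and a.shape == b.shape and a.tobytes() == b.tobytes()

run1 = compute(seed=12345)
run2 = compute(seed=12345)
print(bit_exact(run1, run2))   # expect True on the same software and hardware
```

The same check works for the refactoring test: run the old and new code on identical inputs and require bit-exact equality, falling back to `numpy.testing.assert_allclose()` only where the refactoring legitimately changed the order of operations.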
No uninitialized values. Python is generally safe in that way, but `numpy.empty()` returns a new array without initializing its entries, and it's not clear that it's much faster in practice. So beware! @PaulPanzer's test shows that `numpy.empty()` does return an uninitialized array and that it can easily and quickly recycle a recently freed array:

```python
import numpy as np
np.arange(100); np.empty(100, int); np.empty(100, int)
np.arange(100, 200.0); np.empty(100, float); np.empty(100, float)
```

It's tricky to get useful timing measurements for these routines! In a `timeit` loop, `numpy.empty()` can just keep reallocating the same one or two memory blocks, so the time is independent of the array size. To prevent recycling:

```python
from timeit import timeit
timeit('l.append(numpy.empty(100000))', 'import numpy; l = []')
timeit('l.append(numpy.zeros(100000))', 'import numpy; l = []')
```

but reducing that array size to `numpy.zeros(10000)` takes 15x as long; reducing it to `numpy.zeros(1000)` takes 1.3x as long (on my MBP). Puzzling.
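If reproducibility is the goal, a defensive habit is to avoid `numpy.empty()` where the speed difference doesn't matter, or to fill buffers with a NaN sentinel while debugging so that a read of a never-written entry shows up loudly instead of silently reusing recycled memory. The NaN-sentinel idea is my addition, not part of the test above:

```python
import numpy as np

n = 100

# Deterministic allocation: every entry starts from a defined value.
out = np.zeros(n)

# Debugging variant: NaN sentinels make any never-written entry propagate
# visibly through later arithmetic instead of contributing recycled garbage.
buf = np.full(n, np.nan)
buf[:n // 2] = 1.0                  # pretend only half the buffer gets written
print(np.isnan(buf).any())          # True -> some entries were never written
```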
See also: hash values are salted in Python 3 (so set iteration order can differ from run to run), even though each dict now preserves insertion order. Iterating over such a container while accumulating floats could vary the order of operations from run to run. [I'm wrangling with this problem in Python 2.7.15.]
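Whether that matters depends on whether any float accumulation iterates in hash order. A small sketch of the failure mode and two mitigations (pinning `PYTHONHASHSEED` in the environment, or sorting keys before accumulating); the dictionary here is a made-up example:

```python
import os

# Hypothetical per-key contributions to some running total.
contributions = {'alpha': 0.1, 'beta': 0.2, 'gamma': 0.3}

# Summing over a set iterates in hash order. With hash randomization enabled
# (the Python 3 default), that order, and therefore the rounding, can change
# between runs unless PYTHONHASHSEED is pinned in the environment.
total_set_order = sum(contributions[k] for k in set(contributions))

# Sorting the keys pins the order of operations regardless of hashing.
total_sorted = sum(contributions[k] for k in sorted(contributions))

print(repr(total_set_order), repr(total_sorted))
print("PYTHONHASHSEED =", os.environ.get("PYTHONHASHSEED"))
```

On Python 2.7, hash randomization is off unless enabled with `-R` or `PYTHONHASHSEED`, so this particular source of variation should not appear there by default.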