1*cda5da8dSAndroid Build Coastguard Worker""" 2*cda5da8dSAndroid Build Coastguard WorkerBasic statistics module. 3*cda5da8dSAndroid Build Coastguard Worker 4*cda5da8dSAndroid Build Coastguard WorkerThis module provides functions for calculating statistics of data, including 5*cda5da8dSAndroid Build Coastguard Workeraverages, variance, and standard deviation. 6*cda5da8dSAndroid Build Coastguard Worker 7*cda5da8dSAndroid Build Coastguard WorkerCalculating averages 8*cda5da8dSAndroid Build Coastguard Worker-------------------- 9*cda5da8dSAndroid Build Coastguard Worker 10*cda5da8dSAndroid Build Coastguard Worker================== ================================================== 11*cda5da8dSAndroid Build Coastguard WorkerFunction Description 12*cda5da8dSAndroid Build Coastguard Worker================== ================================================== 13*cda5da8dSAndroid Build Coastguard Workermean Arithmetic mean (average) of data. 14*cda5da8dSAndroid Build Coastguard Workerfmean Fast, floating point arithmetic mean. 15*cda5da8dSAndroid Build Coastguard Workergeometric_mean Geometric mean of data. 16*cda5da8dSAndroid Build Coastguard Workerharmonic_mean Harmonic mean of data. 17*cda5da8dSAndroid Build Coastguard Workermedian Median (middle value) of data. 18*cda5da8dSAndroid Build Coastguard Workermedian_low Low median of data. 19*cda5da8dSAndroid Build Coastguard Workermedian_high High median of data. 20*cda5da8dSAndroid Build Coastguard Workermedian_grouped Median, or 50th percentile, of grouped data. 21*cda5da8dSAndroid Build Coastguard Workermode Mode (most common value) of data. 22*cda5da8dSAndroid Build Coastguard Workermultimode List of modes (most common values of data). 23*cda5da8dSAndroid Build Coastguard Workerquantiles Divide data into intervals with equal probability. 24*cda5da8dSAndroid Build Coastguard Worker================== ================================================== 25*cda5da8dSAndroid Build Coastguard Worker 26*cda5da8dSAndroid Build Coastguard WorkerCalculate the arithmetic mean ("the average") of data: 27*cda5da8dSAndroid Build Coastguard Worker 28*cda5da8dSAndroid Build Coastguard Worker>>> mean([-1.0, 2.5, 3.25, 5.75]) 29*cda5da8dSAndroid Build Coastguard Worker2.625 30*cda5da8dSAndroid Build Coastguard Worker 31*cda5da8dSAndroid Build Coastguard Worker 32*cda5da8dSAndroid Build Coastguard WorkerCalculate the standard median of discrete data: 33*cda5da8dSAndroid Build Coastguard Worker 34*cda5da8dSAndroid Build Coastguard Worker>>> median([2, 3, 4, 5]) 35*cda5da8dSAndroid Build Coastguard Worker3.5 36*cda5da8dSAndroid Build Coastguard Worker 37*cda5da8dSAndroid Build Coastguard Worker 38*cda5da8dSAndroid Build Coastguard WorkerCalculate the median, or 50th percentile, of data grouped into class intervals 39*cda5da8dSAndroid Build Coastguard Workercentred on the data values provided. E.g. if your data points are rounded to 40*cda5da8dSAndroid Build Coastguard Workerthe nearest whole number: 41*cda5da8dSAndroid Build Coastguard Worker 42*cda5da8dSAndroid Build Coastguard Worker>>> median_grouped([2, 2, 3, 3, 3, 4]) #doctest: +ELLIPSIS 43*cda5da8dSAndroid Build Coastguard Worker2.8333333333... 44*cda5da8dSAndroid Build Coastguard Worker 45*cda5da8dSAndroid Build Coastguard WorkerThis should be interpreted in this way: you have two data points in the class 46*cda5da8dSAndroid Build Coastguard Workerinterval 1.5-2.5, three data points in the class interval 2.5-3.5, and one in 47*cda5da8dSAndroid Build Coastguard Workerthe class interval 3.5-4.5. The median of these data points is 2.8333... 48*cda5da8dSAndroid Build Coastguard Worker 49*cda5da8dSAndroid Build Coastguard Worker 50*cda5da8dSAndroid Build Coastguard WorkerCalculating variability or spread 51*cda5da8dSAndroid Build Coastguard Worker--------------------------------- 52*cda5da8dSAndroid Build Coastguard Worker 53*cda5da8dSAndroid Build Coastguard Worker================== ============================================= 54*cda5da8dSAndroid Build Coastguard WorkerFunction Description 55*cda5da8dSAndroid Build Coastguard Worker================== ============================================= 56*cda5da8dSAndroid Build Coastguard Workerpvariance Population variance of data. 57*cda5da8dSAndroid Build Coastguard Workervariance Sample variance of data. 58*cda5da8dSAndroid Build Coastguard Workerpstdev Population standard deviation of data. 59*cda5da8dSAndroid Build Coastguard Workerstdev Sample standard deviation of data. 60*cda5da8dSAndroid Build Coastguard Worker================== ============================================= 61*cda5da8dSAndroid Build Coastguard Worker 62*cda5da8dSAndroid Build Coastguard WorkerCalculate the standard deviation of sample data: 63*cda5da8dSAndroid Build Coastguard Worker 64*cda5da8dSAndroid Build Coastguard Worker>>> stdev([2.5, 3.25, 5.5, 11.25, 11.75]) #doctest: +ELLIPSIS 65*cda5da8dSAndroid Build Coastguard Worker4.38961843444... 66*cda5da8dSAndroid Build Coastguard Worker 67*cda5da8dSAndroid Build Coastguard WorkerIf you have previously calculated the mean, you can pass it as the optional 68*cda5da8dSAndroid Build Coastguard Workersecond argument to the four "spread" functions to avoid recalculating it: 69*cda5da8dSAndroid Build Coastguard Worker 70*cda5da8dSAndroid Build Coastguard Worker>>> data = [1, 2, 2, 4, 4, 4, 5, 6] 71*cda5da8dSAndroid Build Coastguard Worker>>> mu = mean(data) 72*cda5da8dSAndroid Build Coastguard Worker>>> pvariance(data, mu) 73*cda5da8dSAndroid Build Coastguard Worker2.5 74*cda5da8dSAndroid Build Coastguard Worker 75*cda5da8dSAndroid Build Coastguard Worker 76*cda5da8dSAndroid Build Coastguard WorkerStatistics for relations between two inputs 77*cda5da8dSAndroid Build Coastguard Worker------------------------------------------- 78*cda5da8dSAndroid Build Coastguard Worker 79*cda5da8dSAndroid Build Coastguard Worker================== ==================================================== 80*cda5da8dSAndroid Build Coastguard WorkerFunction Description 81*cda5da8dSAndroid Build Coastguard Worker================== ==================================================== 82*cda5da8dSAndroid Build Coastguard Workercovariance Sample covariance for two variables. 83*cda5da8dSAndroid Build Coastguard Workercorrelation Pearson's correlation coefficient for two variables. 84*cda5da8dSAndroid Build Coastguard Workerlinear_regression Intercept and slope for simple linear regression. 85*cda5da8dSAndroid Build Coastguard Worker================== ==================================================== 86*cda5da8dSAndroid Build Coastguard Worker 87*cda5da8dSAndroid Build Coastguard WorkerCalculate covariance, Pearson's correlation, and simple linear regression 88*cda5da8dSAndroid Build Coastguard Workerfor two inputs: 89*cda5da8dSAndroid Build Coastguard Worker 90*cda5da8dSAndroid Build Coastguard Worker>>> x = [1, 2, 3, 4, 5, 6, 7, 8, 9] 91*cda5da8dSAndroid Build Coastguard Worker>>> y = [1, 2, 3, 1, 2, 3, 1, 2, 3] 92*cda5da8dSAndroid Build Coastguard Worker>>> covariance(x, y) 93*cda5da8dSAndroid Build Coastguard Worker0.75 94*cda5da8dSAndroid Build Coastguard Worker>>> correlation(x, y) #doctest: +ELLIPSIS 95*cda5da8dSAndroid Build Coastguard Worker0.31622776601... 96*cda5da8dSAndroid Build Coastguard Worker>>> linear_regression(x, y) #doctest: 97*cda5da8dSAndroid Build Coastguard WorkerLinearRegression(slope=0.1, intercept=1.5) 98*cda5da8dSAndroid Build Coastguard Worker 99*cda5da8dSAndroid Build Coastguard Worker 100*cda5da8dSAndroid Build Coastguard WorkerExceptions 101*cda5da8dSAndroid Build Coastguard Worker---------- 102*cda5da8dSAndroid Build Coastguard Worker 103*cda5da8dSAndroid Build Coastguard WorkerA single exception is defined: StatisticsError is a subclass of ValueError. 104*cda5da8dSAndroid Build Coastguard Worker 105*cda5da8dSAndroid Build Coastguard Worker""" 106*cda5da8dSAndroid Build Coastguard Worker 107*cda5da8dSAndroid Build Coastguard Worker__all__ = [ 108*cda5da8dSAndroid Build Coastguard Worker 'NormalDist', 109*cda5da8dSAndroid Build Coastguard Worker 'StatisticsError', 110*cda5da8dSAndroid Build Coastguard Worker 'correlation', 111*cda5da8dSAndroid Build Coastguard Worker 'covariance', 112*cda5da8dSAndroid Build Coastguard Worker 'fmean', 113*cda5da8dSAndroid Build Coastguard Worker 'geometric_mean', 114*cda5da8dSAndroid Build Coastguard Worker 'harmonic_mean', 115*cda5da8dSAndroid Build Coastguard Worker 'linear_regression', 116*cda5da8dSAndroid Build Coastguard Worker 'mean', 117*cda5da8dSAndroid Build Coastguard Worker 'median', 118*cda5da8dSAndroid Build Coastguard Worker 'median_grouped', 119*cda5da8dSAndroid Build Coastguard Worker 'median_high', 120*cda5da8dSAndroid Build Coastguard Worker 'median_low', 121*cda5da8dSAndroid Build Coastguard Worker 'mode', 122*cda5da8dSAndroid Build Coastguard Worker 'multimode', 123*cda5da8dSAndroid Build Coastguard Worker 'pstdev', 124*cda5da8dSAndroid Build Coastguard Worker 'pvariance', 125*cda5da8dSAndroid Build Coastguard Worker 'quantiles', 126*cda5da8dSAndroid Build Coastguard Worker 'stdev', 127*cda5da8dSAndroid Build Coastguard Worker 'variance', 128*cda5da8dSAndroid Build Coastguard Worker] 129*cda5da8dSAndroid Build Coastguard Worker 130*cda5da8dSAndroid Build Coastguard Workerimport math 131*cda5da8dSAndroid Build Coastguard Workerimport numbers 132*cda5da8dSAndroid Build Coastguard Workerimport random 133*cda5da8dSAndroid Build Coastguard Workerimport sys 134*cda5da8dSAndroid Build Coastguard Worker 135*cda5da8dSAndroid Build Coastguard Workerfrom fractions import Fraction 136*cda5da8dSAndroid Build Coastguard Workerfrom decimal import Decimal 137*cda5da8dSAndroid Build Coastguard Workerfrom itertools import groupby, repeat 138*cda5da8dSAndroid Build Coastguard Workerfrom bisect import bisect_left, bisect_right 139*cda5da8dSAndroid Build Coastguard Workerfrom math import hypot, sqrt, fabs, exp, erf, tau, log, fsum 140*cda5da8dSAndroid Build Coastguard Workerfrom functools import reduce 141*cda5da8dSAndroid Build Coastguard Workerfrom operator import mul 142*cda5da8dSAndroid Build Coastguard Workerfrom collections import Counter, namedtuple, defaultdict 143*cda5da8dSAndroid Build Coastguard Worker 144*cda5da8dSAndroid Build Coastguard Worker_SQRT2 = sqrt(2.0) 145*cda5da8dSAndroid Build Coastguard Worker 146*cda5da8dSAndroid Build Coastguard Worker# === Exceptions === 147*cda5da8dSAndroid Build Coastguard Worker 148*cda5da8dSAndroid Build Coastguard Workerclass StatisticsError(ValueError): 149*cda5da8dSAndroid Build Coastguard Worker pass 150*cda5da8dSAndroid Build Coastguard Worker 151*cda5da8dSAndroid Build Coastguard Worker 152*cda5da8dSAndroid Build Coastguard Worker# === Private utilities === 153*cda5da8dSAndroid Build Coastguard Worker 154*cda5da8dSAndroid Build Coastguard Workerdef _sum(data): 155*cda5da8dSAndroid Build Coastguard Worker """_sum(data) -> (type, sum, count) 156*cda5da8dSAndroid Build Coastguard Worker 157*cda5da8dSAndroid Build Coastguard Worker Return a high-precision sum of the given numeric data as a fraction, 158*cda5da8dSAndroid Build Coastguard Worker together with the type to be converted to and the count of items. 159*cda5da8dSAndroid Build Coastguard Worker 160*cda5da8dSAndroid Build Coastguard Worker Examples 161*cda5da8dSAndroid Build Coastguard Worker -------- 162*cda5da8dSAndroid Build Coastguard Worker 163*cda5da8dSAndroid Build Coastguard Worker >>> _sum([3, 2.25, 4.5, -0.5, 0.25]) 164*cda5da8dSAndroid Build Coastguard Worker (<class 'float'>, Fraction(19, 2), 5) 165*cda5da8dSAndroid Build Coastguard Worker 166*cda5da8dSAndroid Build Coastguard Worker Some sources of round-off error will be avoided: 167*cda5da8dSAndroid Build Coastguard Worker 168*cda5da8dSAndroid Build Coastguard Worker # Built-in sum returns zero. 169*cda5da8dSAndroid Build Coastguard Worker >>> _sum([1e50, 1, -1e50] * 1000) 170*cda5da8dSAndroid Build Coastguard Worker (<class 'float'>, Fraction(1000, 1), 3000) 171*cda5da8dSAndroid Build Coastguard Worker 172*cda5da8dSAndroid Build Coastguard Worker Fractions and Decimals are also supported: 173*cda5da8dSAndroid Build Coastguard Worker 174*cda5da8dSAndroid Build Coastguard Worker >>> from fractions import Fraction as F 175*cda5da8dSAndroid Build Coastguard Worker >>> _sum([F(2, 3), F(7, 5), F(1, 4), F(5, 6)]) 176*cda5da8dSAndroid Build Coastguard Worker (<class 'fractions.Fraction'>, Fraction(63, 20), 4) 177*cda5da8dSAndroid Build Coastguard Worker 178*cda5da8dSAndroid Build Coastguard Worker >>> from decimal import Decimal as D 179*cda5da8dSAndroid Build Coastguard Worker >>> data = [D("0.1375"), D("0.2108"), D("0.3061"), D("0.0419")] 180*cda5da8dSAndroid Build Coastguard Worker >>> _sum(data) 181*cda5da8dSAndroid Build Coastguard Worker (<class 'decimal.Decimal'>, Fraction(6963, 10000), 4) 182*cda5da8dSAndroid Build Coastguard Worker 183*cda5da8dSAndroid Build Coastguard Worker Mixed types are currently treated as an error, except that int is 184*cda5da8dSAndroid Build Coastguard Worker allowed. 185*cda5da8dSAndroid Build Coastguard Worker """ 186*cda5da8dSAndroid Build Coastguard Worker count = 0 187*cda5da8dSAndroid Build Coastguard Worker types = set() 188*cda5da8dSAndroid Build Coastguard Worker types_add = types.add 189*cda5da8dSAndroid Build Coastguard Worker partials = {} 190*cda5da8dSAndroid Build Coastguard Worker partials_get = partials.get 191*cda5da8dSAndroid Build Coastguard Worker for typ, values in groupby(data, type): 192*cda5da8dSAndroid Build Coastguard Worker types_add(typ) 193*cda5da8dSAndroid Build Coastguard Worker for n, d in map(_exact_ratio, values): 194*cda5da8dSAndroid Build Coastguard Worker count += 1 195*cda5da8dSAndroid Build Coastguard Worker partials[d] = partials_get(d, 0) + n 196*cda5da8dSAndroid Build Coastguard Worker if None in partials: 197*cda5da8dSAndroid Build Coastguard Worker # The sum will be a NAN or INF. We can ignore all the finite 198*cda5da8dSAndroid Build Coastguard Worker # partials, and just look at this special one. 199*cda5da8dSAndroid Build Coastguard Worker total = partials[None] 200*cda5da8dSAndroid Build Coastguard Worker assert not _isfinite(total) 201*cda5da8dSAndroid Build Coastguard Worker else: 202*cda5da8dSAndroid Build Coastguard Worker # Sum all the partial sums using builtin sum. 203*cda5da8dSAndroid Build Coastguard Worker total = sum(Fraction(n, d) for d, n in partials.items()) 204*cda5da8dSAndroid Build Coastguard Worker T = reduce(_coerce, types, int) # or raise TypeError 205*cda5da8dSAndroid Build Coastguard Worker return (T, total, count) 206*cda5da8dSAndroid Build Coastguard Worker 207*cda5da8dSAndroid Build Coastguard Worker 208*cda5da8dSAndroid Build Coastguard Workerdef _ss(data, c=None): 209*cda5da8dSAndroid Build Coastguard Worker """Return the exact mean and sum of square deviations of sequence data. 210*cda5da8dSAndroid Build Coastguard Worker 211*cda5da8dSAndroid Build Coastguard Worker Calculations are done in a single pass, allowing the input to be an iterator. 212*cda5da8dSAndroid Build Coastguard Worker 213*cda5da8dSAndroid Build Coastguard Worker If given *c* is used the mean; otherwise, it is calculated from the data. 214*cda5da8dSAndroid Build Coastguard Worker Use the *c* argument with care, as it can lead to garbage results. 215*cda5da8dSAndroid Build Coastguard Worker 216*cda5da8dSAndroid Build Coastguard Worker """ 217*cda5da8dSAndroid Build Coastguard Worker if c is not None: 218*cda5da8dSAndroid Build Coastguard Worker T, ssd, count = _sum((d := x - c) * d for x in data) 219*cda5da8dSAndroid Build Coastguard Worker return (T, ssd, c, count) 220*cda5da8dSAndroid Build Coastguard Worker count = 0 221*cda5da8dSAndroid Build Coastguard Worker types = set() 222*cda5da8dSAndroid Build Coastguard Worker types_add = types.add 223*cda5da8dSAndroid Build Coastguard Worker sx_partials = defaultdict(int) 224*cda5da8dSAndroid Build Coastguard Worker sxx_partials = defaultdict(int) 225*cda5da8dSAndroid Build Coastguard Worker for typ, values in groupby(data, type): 226*cda5da8dSAndroid Build Coastguard Worker types_add(typ) 227*cda5da8dSAndroid Build Coastguard Worker for n, d in map(_exact_ratio, values): 228*cda5da8dSAndroid Build Coastguard Worker count += 1 229*cda5da8dSAndroid Build Coastguard Worker sx_partials[d] += n 230*cda5da8dSAndroid Build Coastguard Worker sxx_partials[d] += n * n 231*cda5da8dSAndroid Build Coastguard Worker if not count: 232*cda5da8dSAndroid Build Coastguard Worker ssd = c = Fraction(0) 233*cda5da8dSAndroid Build Coastguard Worker elif None in sx_partials: 234*cda5da8dSAndroid Build Coastguard Worker # The sum will be a NAN or INF. We can ignore all the finite 235*cda5da8dSAndroid Build Coastguard Worker # partials, and just look at this special one. 236*cda5da8dSAndroid Build Coastguard Worker ssd = c = sx_partials[None] 237*cda5da8dSAndroid Build Coastguard Worker assert not _isfinite(ssd) 238*cda5da8dSAndroid Build Coastguard Worker else: 239*cda5da8dSAndroid Build Coastguard Worker sx = sum(Fraction(n, d) for d, n in sx_partials.items()) 240*cda5da8dSAndroid Build Coastguard Worker sxx = sum(Fraction(n, d*d) for d, n in sxx_partials.items()) 241*cda5da8dSAndroid Build Coastguard Worker # This formula has poor numeric properties for floats, 242*cda5da8dSAndroid Build Coastguard Worker # but with fractions it is exact. 243*cda5da8dSAndroid Build Coastguard Worker ssd = (count * sxx - sx * sx) / count 244*cda5da8dSAndroid Build Coastguard Worker c = sx / count 245*cda5da8dSAndroid Build Coastguard Worker T = reduce(_coerce, types, int) # or raise TypeError 246*cda5da8dSAndroid Build Coastguard Worker return (T, ssd, c, count) 247*cda5da8dSAndroid Build Coastguard Worker 248*cda5da8dSAndroid Build Coastguard Worker 249*cda5da8dSAndroid Build Coastguard Workerdef _isfinite(x): 250*cda5da8dSAndroid Build Coastguard Worker try: 251*cda5da8dSAndroid Build Coastguard Worker return x.is_finite() # Likely a Decimal. 252*cda5da8dSAndroid Build Coastguard Worker except AttributeError: 253*cda5da8dSAndroid Build Coastguard Worker return math.isfinite(x) # Coerces to float first. 254*cda5da8dSAndroid Build Coastguard Worker 255*cda5da8dSAndroid Build Coastguard Worker 256*cda5da8dSAndroid Build Coastguard Workerdef _coerce(T, S): 257*cda5da8dSAndroid Build Coastguard Worker """Coerce types T and S to a common type, or raise TypeError. 258*cda5da8dSAndroid Build Coastguard Worker 259*cda5da8dSAndroid Build Coastguard Worker Coercion rules are currently an implementation detail. See the CoerceTest 260*cda5da8dSAndroid Build Coastguard Worker test class in test_statistics for details. 261*cda5da8dSAndroid Build Coastguard Worker """ 262*cda5da8dSAndroid Build Coastguard Worker # See http://bugs.python.org/issue24068. 263*cda5da8dSAndroid Build Coastguard Worker assert T is not bool, "initial type T is bool" 264*cda5da8dSAndroid Build Coastguard Worker # If the types are the same, no need to coerce anything. Put this 265*cda5da8dSAndroid Build Coastguard Worker # first, so that the usual case (no coercion needed) happens as soon 266*cda5da8dSAndroid Build Coastguard Worker # as possible. 267*cda5da8dSAndroid Build Coastguard Worker if T is S: return T 268*cda5da8dSAndroid Build Coastguard Worker # Mixed int & other coerce to the other type. 269*cda5da8dSAndroid Build Coastguard Worker if S is int or S is bool: return T 270*cda5da8dSAndroid Build Coastguard Worker if T is int: return S 271*cda5da8dSAndroid Build Coastguard Worker # If one is a (strict) subclass of the other, coerce to the subclass. 272*cda5da8dSAndroid Build Coastguard Worker if issubclass(S, T): return S 273*cda5da8dSAndroid Build Coastguard Worker if issubclass(T, S): return T 274*cda5da8dSAndroid Build Coastguard Worker # Ints coerce to the other type. 275*cda5da8dSAndroid Build Coastguard Worker if issubclass(T, int): return S 276*cda5da8dSAndroid Build Coastguard Worker if issubclass(S, int): return T 277*cda5da8dSAndroid Build Coastguard Worker # Mixed fraction & float coerces to float (or float subclass). 278*cda5da8dSAndroid Build Coastguard Worker if issubclass(T, Fraction) and issubclass(S, float): 279*cda5da8dSAndroid Build Coastguard Worker return S 280*cda5da8dSAndroid Build Coastguard Worker if issubclass(T, float) and issubclass(S, Fraction): 281*cda5da8dSAndroid Build Coastguard Worker return T 282*cda5da8dSAndroid Build Coastguard Worker # Any other combination is disallowed. 283*cda5da8dSAndroid Build Coastguard Worker msg = "don't know how to coerce %s and %s" 284*cda5da8dSAndroid Build Coastguard Worker raise TypeError(msg % (T.__name__, S.__name__)) 285*cda5da8dSAndroid Build Coastguard Worker 286*cda5da8dSAndroid Build Coastguard Worker 287*cda5da8dSAndroid Build Coastguard Workerdef _exact_ratio(x): 288*cda5da8dSAndroid Build Coastguard Worker """Return Real number x to exact (numerator, denominator) pair. 289*cda5da8dSAndroid Build Coastguard Worker 290*cda5da8dSAndroid Build Coastguard Worker >>> _exact_ratio(0.25) 291*cda5da8dSAndroid Build Coastguard Worker (1, 4) 292*cda5da8dSAndroid Build Coastguard Worker 293*cda5da8dSAndroid Build Coastguard Worker x is expected to be an int, Fraction, Decimal or float. 294*cda5da8dSAndroid Build Coastguard Worker """ 295*cda5da8dSAndroid Build Coastguard Worker 296*cda5da8dSAndroid Build Coastguard Worker # XXX We should revisit whether using fractions to accumulate exact 297*cda5da8dSAndroid Build Coastguard Worker # ratios is the right way to go. 298*cda5da8dSAndroid Build Coastguard Worker 299*cda5da8dSAndroid Build Coastguard Worker # The integer ratios for binary floats can have numerators or 300*cda5da8dSAndroid Build Coastguard Worker # denominators with over 300 decimal digits. The problem is more 301*cda5da8dSAndroid Build Coastguard Worker # acute with decimal floats where the default decimal context 302*cda5da8dSAndroid Build Coastguard Worker # supports a huge range of exponents from Emin=-999999 to 303*cda5da8dSAndroid Build Coastguard Worker # Emax=999999. When expanded with as_integer_ratio(), numbers like 304*cda5da8dSAndroid Build Coastguard Worker # Decimal('3.14E+5000') and Decimal('3.14E-5000') have large 305*cda5da8dSAndroid Build Coastguard Worker # numerators or denominators that will slow computation. 306*cda5da8dSAndroid Build Coastguard Worker 307*cda5da8dSAndroid Build Coastguard Worker # When the integer ratios are accumulated as fractions, the size 308*cda5da8dSAndroid Build Coastguard Worker # grows to cover the full range from the smallest magnitude to the 309*cda5da8dSAndroid Build Coastguard Worker # largest. For example, Fraction(3.14E+300) + Fraction(3.14E-300), 310*cda5da8dSAndroid Build Coastguard Worker # has a 616 digit numerator. Likewise, 311*cda5da8dSAndroid Build Coastguard Worker # Fraction(Decimal('3.14E+5000')) + Fraction(Decimal('3.14E-5000')) 312*cda5da8dSAndroid Build Coastguard Worker # has 10,003 digit numerator. 313*cda5da8dSAndroid Build Coastguard Worker 314*cda5da8dSAndroid Build Coastguard Worker # This doesn't seem to have been problem in practice, but it is a 315*cda5da8dSAndroid Build Coastguard Worker # potential pitfall. 316*cda5da8dSAndroid Build Coastguard Worker 317*cda5da8dSAndroid Build Coastguard Worker try: 318*cda5da8dSAndroid Build Coastguard Worker return x.as_integer_ratio() 319*cda5da8dSAndroid Build Coastguard Worker except AttributeError: 320*cda5da8dSAndroid Build Coastguard Worker pass 321*cda5da8dSAndroid Build Coastguard Worker except (OverflowError, ValueError): 322*cda5da8dSAndroid Build Coastguard Worker # float NAN or INF. 323*cda5da8dSAndroid Build Coastguard Worker assert not _isfinite(x) 324*cda5da8dSAndroid Build Coastguard Worker return (x, None) 325*cda5da8dSAndroid Build Coastguard Worker try: 326*cda5da8dSAndroid Build Coastguard Worker # x may be an Integral ABC. 327*cda5da8dSAndroid Build Coastguard Worker return (x.numerator, x.denominator) 328*cda5da8dSAndroid Build Coastguard Worker except AttributeError: 329*cda5da8dSAndroid Build Coastguard Worker msg = f"can't convert type '{type(x).__name__}' to numerator/denominator" 330*cda5da8dSAndroid Build Coastguard Worker raise TypeError(msg) 331*cda5da8dSAndroid Build Coastguard Worker 332*cda5da8dSAndroid Build Coastguard Worker 333*cda5da8dSAndroid Build Coastguard Workerdef _convert(value, T): 334*cda5da8dSAndroid Build Coastguard Worker """Convert value to given numeric type T.""" 335*cda5da8dSAndroid Build Coastguard Worker if type(value) is T: 336*cda5da8dSAndroid Build Coastguard Worker # This covers the cases where T is Fraction, or where value is 337*cda5da8dSAndroid Build Coastguard Worker # a NAN or INF (Decimal or float). 338*cda5da8dSAndroid Build Coastguard Worker return value 339*cda5da8dSAndroid Build Coastguard Worker if issubclass(T, int) and value.denominator != 1: 340*cda5da8dSAndroid Build Coastguard Worker T = float 341*cda5da8dSAndroid Build Coastguard Worker try: 342*cda5da8dSAndroid Build Coastguard Worker # FIXME: what do we do if this overflows? 343*cda5da8dSAndroid Build Coastguard Worker return T(value) 344*cda5da8dSAndroid Build Coastguard Worker except TypeError: 345*cda5da8dSAndroid Build Coastguard Worker if issubclass(T, Decimal): 346*cda5da8dSAndroid Build Coastguard Worker return T(value.numerator) / T(value.denominator) 347*cda5da8dSAndroid Build Coastguard Worker else: 348*cda5da8dSAndroid Build Coastguard Worker raise 349*cda5da8dSAndroid Build Coastguard Worker 350*cda5da8dSAndroid Build Coastguard Worker 351*cda5da8dSAndroid Build Coastguard Workerdef _fail_neg(values, errmsg='negative value'): 352*cda5da8dSAndroid Build Coastguard Worker """Iterate over values, failing if any are less than zero.""" 353*cda5da8dSAndroid Build Coastguard Worker for x in values: 354*cda5da8dSAndroid Build Coastguard Worker if x < 0: 355*cda5da8dSAndroid Build Coastguard Worker raise StatisticsError(errmsg) 356*cda5da8dSAndroid Build Coastguard Worker yield x 357*cda5da8dSAndroid Build Coastguard Worker 358*cda5da8dSAndroid Build Coastguard Worker 359*cda5da8dSAndroid Build Coastguard Workerdef _integer_sqrt_of_frac_rto(n: int, m: int) -> int: 360*cda5da8dSAndroid Build Coastguard Worker """Square root of n/m, rounded to the nearest integer using round-to-odd.""" 361*cda5da8dSAndroid Build Coastguard Worker # Reference: https://www.lri.fr/~melquion/doc/05-imacs17_1-expose.pdf 362*cda5da8dSAndroid Build Coastguard Worker a = math.isqrt(n // m) 363*cda5da8dSAndroid Build Coastguard Worker return a | (a*a*m != n) 364*cda5da8dSAndroid Build Coastguard Worker 365*cda5da8dSAndroid Build Coastguard Worker 366*cda5da8dSAndroid Build Coastguard Worker# For 53 bit precision floats, the bit width used in 367*cda5da8dSAndroid Build Coastguard Worker# _float_sqrt_of_frac() is 109. 368*cda5da8dSAndroid Build Coastguard Worker_sqrt_bit_width: int = 2 * sys.float_info.mant_dig + 3 369*cda5da8dSAndroid Build Coastguard Worker 370*cda5da8dSAndroid Build Coastguard Worker 371*cda5da8dSAndroid Build Coastguard Workerdef _float_sqrt_of_frac(n: int, m: int) -> float: 372*cda5da8dSAndroid Build Coastguard Worker """Square root of n/m as a float, correctly rounded.""" 373*cda5da8dSAndroid Build Coastguard Worker # See principle and proof sketch at: https://bugs.python.org/msg407078 374*cda5da8dSAndroid Build Coastguard Worker q = (n.bit_length() - m.bit_length() - _sqrt_bit_width) // 2 375*cda5da8dSAndroid Build Coastguard Worker if q >= 0: 376*cda5da8dSAndroid Build Coastguard Worker numerator = _integer_sqrt_of_frac_rto(n, m << 2 * q) << q 377*cda5da8dSAndroid Build Coastguard Worker denominator = 1 378*cda5da8dSAndroid Build Coastguard Worker else: 379*cda5da8dSAndroid Build Coastguard Worker numerator = _integer_sqrt_of_frac_rto(n << -2 * q, m) 380*cda5da8dSAndroid Build Coastguard Worker denominator = 1 << -q 381*cda5da8dSAndroid Build Coastguard Worker return numerator / denominator # Convert to float 382*cda5da8dSAndroid Build Coastguard Worker 383*cda5da8dSAndroid Build Coastguard Worker 384*cda5da8dSAndroid Build Coastguard Workerdef _decimal_sqrt_of_frac(n: int, m: int) -> Decimal: 385*cda5da8dSAndroid Build Coastguard Worker """Square root of n/m as a Decimal, correctly rounded.""" 386*cda5da8dSAndroid Build Coastguard Worker # Premise: For decimal, computing (n/m).sqrt() can be off 387*cda5da8dSAndroid Build Coastguard Worker # by 1 ulp from the correctly rounded result. 388*cda5da8dSAndroid Build Coastguard Worker # Method: Check the result, moving up or down a step if needed. 389*cda5da8dSAndroid Build Coastguard Worker if n <= 0: 390*cda5da8dSAndroid Build Coastguard Worker if not n: 391*cda5da8dSAndroid Build Coastguard Worker return Decimal('0.0') 392*cda5da8dSAndroid Build Coastguard Worker n, m = -n, -m 393*cda5da8dSAndroid Build Coastguard Worker 394*cda5da8dSAndroid Build Coastguard Worker root = (Decimal(n) / Decimal(m)).sqrt() 395*cda5da8dSAndroid Build Coastguard Worker nr, dr = root.as_integer_ratio() 396*cda5da8dSAndroid Build Coastguard Worker 397*cda5da8dSAndroid Build Coastguard Worker plus = root.next_plus() 398*cda5da8dSAndroid Build Coastguard Worker np, dp = plus.as_integer_ratio() 399*cda5da8dSAndroid Build Coastguard Worker # test: n / m > ((root + plus) / 2) ** 2 400*cda5da8dSAndroid Build Coastguard Worker if 4 * n * (dr*dp)**2 > m * (dr*np + dp*nr)**2: 401*cda5da8dSAndroid Build Coastguard Worker return plus 402*cda5da8dSAndroid Build Coastguard Worker 403*cda5da8dSAndroid Build Coastguard Worker minus = root.next_minus() 404*cda5da8dSAndroid Build Coastguard Worker nm, dm = minus.as_integer_ratio() 405*cda5da8dSAndroid Build Coastguard Worker # test: n / m < ((root + minus) / 2) ** 2 406*cda5da8dSAndroid Build Coastguard Worker if 4 * n * (dr*dm)**2 < m * (dr*nm + dm*nr)**2: 407*cda5da8dSAndroid Build Coastguard Worker return minus 408*cda5da8dSAndroid Build Coastguard Worker 409*cda5da8dSAndroid Build Coastguard Worker return root 410*cda5da8dSAndroid Build Coastguard Worker 411*cda5da8dSAndroid Build Coastguard Worker 412*cda5da8dSAndroid Build Coastguard Worker# === Measures of central tendency (averages) === 413*cda5da8dSAndroid Build Coastguard Worker 414*cda5da8dSAndroid Build Coastguard Workerdef mean(data): 415*cda5da8dSAndroid Build Coastguard Worker """Return the sample arithmetic mean of data. 416*cda5da8dSAndroid Build Coastguard Worker 417*cda5da8dSAndroid Build Coastguard Worker >>> mean([1, 2, 3, 4, 4]) 418*cda5da8dSAndroid Build Coastguard Worker 2.8 419*cda5da8dSAndroid Build Coastguard Worker 420*cda5da8dSAndroid Build Coastguard Worker >>> from fractions import Fraction as F 421*cda5da8dSAndroid Build Coastguard Worker >>> mean([F(3, 7), F(1, 21), F(5, 3), F(1, 3)]) 422*cda5da8dSAndroid Build Coastguard Worker Fraction(13, 21) 423*cda5da8dSAndroid Build Coastguard Worker 424*cda5da8dSAndroid Build Coastguard Worker >>> from decimal import Decimal as D 425*cda5da8dSAndroid Build Coastguard Worker >>> mean([D("0.5"), D("0.75"), D("0.625"), D("0.375")]) 426*cda5da8dSAndroid Build Coastguard Worker Decimal('0.5625') 427*cda5da8dSAndroid Build Coastguard Worker 428*cda5da8dSAndroid Build Coastguard Worker If ``data`` is empty, StatisticsError will be raised. 429*cda5da8dSAndroid Build Coastguard Worker """ 430*cda5da8dSAndroid Build Coastguard Worker T, total, n = _sum(data) 431*cda5da8dSAndroid Build Coastguard Worker if n < 1: 432*cda5da8dSAndroid Build Coastguard Worker raise StatisticsError('mean requires at least one data point') 433*cda5da8dSAndroid Build Coastguard Worker return _convert(total / n, T) 434*cda5da8dSAndroid Build Coastguard Worker 435*cda5da8dSAndroid Build Coastguard Worker 436*cda5da8dSAndroid Build Coastguard Workerdef fmean(data, weights=None): 437*cda5da8dSAndroid Build Coastguard Worker """Convert data to floats and compute the arithmetic mean. 438*cda5da8dSAndroid Build Coastguard Worker 439*cda5da8dSAndroid Build Coastguard Worker This runs faster than the mean() function and it always returns a float. 440*cda5da8dSAndroid Build Coastguard Worker If the input dataset is empty, it raises a StatisticsError. 441*cda5da8dSAndroid Build Coastguard Worker 442*cda5da8dSAndroid Build Coastguard Worker >>> fmean([3.5, 4.0, 5.25]) 443*cda5da8dSAndroid Build Coastguard Worker 4.25 444*cda5da8dSAndroid Build Coastguard Worker """ 445*cda5da8dSAndroid Build Coastguard Worker try: 446*cda5da8dSAndroid Build Coastguard Worker n = len(data) 447*cda5da8dSAndroid Build Coastguard Worker except TypeError: 448*cda5da8dSAndroid Build Coastguard Worker # Handle iterators that do not define __len__(). 449*cda5da8dSAndroid Build Coastguard Worker n = 0 450*cda5da8dSAndroid Build Coastguard Worker def count(iterable): 451*cda5da8dSAndroid Build Coastguard Worker nonlocal n 452*cda5da8dSAndroid Build Coastguard Worker for n, x in enumerate(iterable, start=1): 453*cda5da8dSAndroid Build Coastguard Worker yield x 454*cda5da8dSAndroid Build Coastguard Worker data = count(data) 455*cda5da8dSAndroid Build Coastguard Worker if weights is None: 456*cda5da8dSAndroid Build Coastguard Worker total = fsum(data) 457*cda5da8dSAndroid Build Coastguard Worker if not n: 458*cda5da8dSAndroid Build Coastguard Worker raise StatisticsError('fmean requires at least one data point') 459*cda5da8dSAndroid Build Coastguard Worker return total / n 460*cda5da8dSAndroid Build Coastguard Worker try: 461*cda5da8dSAndroid Build Coastguard Worker num_weights = len(weights) 462*cda5da8dSAndroid Build Coastguard Worker except TypeError: 463*cda5da8dSAndroid Build Coastguard Worker weights = list(weights) 464*cda5da8dSAndroid Build Coastguard Worker num_weights = len(weights) 465*cda5da8dSAndroid Build Coastguard Worker num = fsum(map(mul, data, weights)) 466*cda5da8dSAndroid Build Coastguard Worker if n != num_weights: 467*cda5da8dSAndroid Build Coastguard Worker raise StatisticsError('data and weights must be the same length') 468*cda5da8dSAndroid Build Coastguard Worker den = fsum(weights) 469*cda5da8dSAndroid Build Coastguard Worker if not den: 470*cda5da8dSAndroid Build Coastguard Worker raise StatisticsError('sum of weights must be non-zero') 471*cda5da8dSAndroid Build Coastguard Worker return num / den 472*cda5da8dSAndroid Build Coastguard Worker 473*cda5da8dSAndroid Build Coastguard Worker 474*cda5da8dSAndroid Build Coastguard Workerdef geometric_mean(data): 475*cda5da8dSAndroid Build Coastguard Worker """Convert data to floats and compute the geometric mean. 476*cda5da8dSAndroid Build Coastguard Worker 477*cda5da8dSAndroid Build Coastguard Worker Raises a StatisticsError if the input dataset is empty, 478*cda5da8dSAndroid Build Coastguard Worker if it contains a zero, or if it contains a negative value. 479*cda5da8dSAndroid Build Coastguard Worker 480*cda5da8dSAndroid Build Coastguard Worker No special efforts are made to achieve exact results. 481*cda5da8dSAndroid Build Coastguard Worker (However, this may change in the future.) 482*cda5da8dSAndroid Build Coastguard Worker 483*cda5da8dSAndroid Build Coastguard Worker >>> round(geometric_mean([54, 24, 36]), 9) 484*cda5da8dSAndroid Build Coastguard Worker 36.0 485*cda5da8dSAndroid Build Coastguard Worker """ 486*cda5da8dSAndroid Build Coastguard Worker try: 487*cda5da8dSAndroid Build Coastguard Worker return exp(fmean(map(log, data))) 488*cda5da8dSAndroid Build Coastguard Worker except ValueError: 489*cda5da8dSAndroid Build Coastguard Worker raise StatisticsError('geometric mean requires a non-empty dataset ' 490*cda5da8dSAndroid Build Coastguard Worker 'containing positive numbers') from None 491*cda5da8dSAndroid Build Coastguard Worker 492*cda5da8dSAndroid Build Coastguard Worker 493*cda5da8dSAndroid Build Coastguard Workerdef harmonic_mean(data, weights=None): 494*cda5da8dSAndroid Build Coastguard Worker """Return the harmonic mean of data. 495*cda5da8dSAndroid Build Coastguard Worker 496*cda5da8dSAndroid Build Coastguard Worker The harmonic mean is the reciprocal of the arithmetic mean of the 497*cda5da8dSAndroid Build Coastguard Worker reciprocals of the data. It can be used for averaging ratios or 498*cda5da8dSAndroid Build Coastguard Worker rates, for example speeds. 499*cda5da8dSAndroid Build Coastguard Worker 500*cda5da8dSAndroid Build Coastguard Worker Suppose a car travels 40 km/hr for 5 km and then speeds-up to 501*cda5da8dSAndroid Build Coastguard Worker 60 km/hr for another 5 km. What is the average speed? 502*cda5da8dSAndroid Build Coastguard Worker 503*cda5da8dSAndroid Build Coastguard Worker >>> harmonic_mean([40, 60]) 504*cda5da8dSAndroid Build Coastguard Worker 48.0 505*cda5da8dSAndroid Build Coastguard Worker 506*cda5da8dSAndroid Build Coastguard Worker Suppose a car travels 40 km/hr for 5 km, and when traffic clears, 507*cda5da8dSAndroid Build Coastguard Worker speeds-up to 60 km/hr for the remaining 30 km of the journey. What 508*cda5da8dSAndroid Build Coastguard Worker is the average speed? 509*cda5da8dSAndroid Build Coastguard Worker 510*cda5da8dSAndroid Build Coastguard Worker >>> harmonic_mean([40, 60], weights=[5, 30]) 511*cda5da8dSAndroid Build Coastguard Worker 56.0 512*cda5da8dSAndroid Build Coastguard Worker 513*cda5da8dSAndroid Build Coastguard Worker If ``data`` is empty, or any element is less than zero, 514*cda5da8dSAndroid Build Coastguard Worker ``harmonic_mean`` will raise ``StatisticsError``. 515*cda5da8dSAndroid Build Coastguard Worker """ 516*cda5da8dSAndroid Build Coastguard Worker if iter(data) is data: 517*cda5da8dSAndroid Build Coastguard Worker data = list(data) 518*cda5da8dSAndroid Build Coastguard Worker errmsg = 'harmonic mean does not support negative values' 519*cda5da8dSAndroid Build Coastguard Worker n = len(data) 520*cda5da8dSAndroid Build Coastguard Worker if n < 1: 521*cda5da8dSAndroid Build Coastguard Worker raise StatisticsError('harmonic_mean requires at least one data point') 522*cda5da8dSAndroid Build Coastguard Worker elif n == 1 and weights is None: 523*cda5da8dSAndroid Build Coastguard Worker x = data[0] 524*cda5da8dSAndroid Build Coastguard Worker if isinstance(x, (numbers.Real, Decimal)): 525*cda5da8dSAndroid Build Coastguard Worker if x < 0: 526*cda5da8dSAndroid Build Coastguard Worker raise StatisticsError(errmsg) 527*cda5da8dSAndroid Build Coastguard Worker return x 528*cda5da8dSAndroid Build Coastguard Worker else: 529*cda5da8dSAndroid Build Coastguard Worker raise TypeError('unsupported type') 530*cda5da8dSAndroid Build Coastguard Worker if weights is None: 531*cda5da8dSAndroid Build Coastguard Worker weights = repeat(1, n) 532*cda5da8dSAndroid Build Coastguard Worker sum_weights = n 533*cda5da8dSAndroid Build Coastguard Worker else: 534*cda5da8dSAndroid Build Coastguard Worker if iter(weights) is weights: 535*cda5da8dSAndroid Build Coastguard Worker weights = list(weights) 536*cda5da8dSAndroid Build Coastguard Worker if len(weights) != n: 537*cda5da8dSAndroid Build Coastguard Worker raise StatisticsError('Number of weights does not match data size') 538*cda5da8dSAndroid Build Coastguard Worker _, sum_weights, _ = _sum(w for w in _fail_neg(weights, errmsg)) 539*cda5da8dSAndroid Build Coastguard Worker try: 540*cda5da8dSAndroid Build Coastguard Worker data = _fail_neg(data, errmsg) 541*cda5da8dSAndroid Build Coastguard Worker T, total, count = _sum(w / x if w else 0 for w, x in zip(weights, data)) 542*cda5da8dSAndroid Build Coastguard Worker except ZeroDivisionError: 543*cda5da8dSAndroid Build Coastguard Worker return 0 544*cda5da8dSAndroid Build Coastguard Worker if total <= 0: 545*cda5da8dSAndroid Build Coastguard Worker raise StatisticsError('Weighted sum must be positive') 546*cda5da8dSAndroid Build Coastguard Worker return _convert(sum_weights / total, T) 547*cda5da8dSAndroid Build Coastguard Worker 548*cda5da8dSAndroid Build Coastguard Worker# FIXME: investigate ways to calculate medians without sorting? Quickselect? 549*cda5da8dSAndroid Build Coastguard Workerdef median(data): 550*cda5da8dSAndroid Build Coastguard Worker """Return the median (middle value) of numeric data. 551*cda5da8dSAndroid Build Coastguard Worker 552*cda5da8dSAndroid Build Coastguard Worker When the number of data points is odd, return the middle data point. 553*cda5da8dSAndroid Build Coastguard Worker When the number of data points is even, the median is interpolated by 554*cda5da8dSAndroid Build Coastguard Worker taking the average of the two middle values: 555*cda5da8dSAndroid Build Coastguard Worker 556*cda5da8dSAndroid Build Coastguard Worker >>> median([1, 3, 5]) 557*cda5da8dSAndroid Build Coastguard Worker 3 558*cda5da8dSAndroid Build Coastguard Worker >>> median([1, 3, 5, 7]) 559*cda5da8dSAndroid Build Coastguard Worker 4.0 560*cda5da8dSAndroid Build Coastguard Worker 561*cda5da8dSAndroid Build Coastguard Worker """ 562*cda5da8dSAndroid Build Coastguard Worker data = sorted(data) 563*cda5da8dSAndroid Build Coastguard Worker n = len(data) 564*cda5da8dSAndroid Build Coastguard Worker if n == 0: 565*cda5da8dSAndroid Build Coastguard Worker raise StatisticsError("no median for empty data") 566*cda5da8dSAndroid Build Coastguard Worker if n % 2 == 1: 567*cda5da8dSAndroid Build Coastguard Worker return data[n // 2] 568*cda5da8dSAndroid Build Coastguard Worker else: 569*cda5da8dSAndroid Build Coastguard Worker i = n // 2 570*cda5da8dSAndroid Build Coastguard Worker return (data[i - 1] + data[i]) / 2 571*cda5da8dSAndroid Build Coastguard Worker 572*cda5da8dSAndroid Build Coastguard Worker 573*cda5da8dSAndroid Build Coastguard Workerdef median_low(data): 574*cda5da8dSAndroid Build Coastguard Worker """Return the low median of numeric data. 575*cda5da8dSAndroid Build Coastguard Worker 576*cda5da8dSAndroid Build Coastguard Worker When the number of data points is odd, the middle value is returned. 577*cda5da8dSAndroid Build Coastguard Worker When it is even, the smaller of the two middle values is returned. 578*cda5da8dSAndroid Build Coastguard Worker 579*cda5da8dSAndroid Build Coastguard Worker >>> median_low([1, 3, 5]) 580*cda5da8dSAndroid Build Coastguard Worker 3 581*cda5da8dSAndroid Build Coastguard Worker >>> median_low([1, 3, 5, 7]) 582*cda5da8dSAndroid Build Coastguard Worker 3 583*cda5da8dSAndroid Build Coastguard Worker 584*cda5da8dSAndroid Build Coastguard Worker """ 585*cda5da8dSAndroid Build Coastguard Worker data = sorted(data) 586*cda5da8dSAndroid Build Coastguard Worker n = len(data) 587*cda5da8dSAndroid Build Coastguard Worker if n == 0: 588*cda5da8dSAndroid Build Coastguard Worker raise StatisticsError("no median for empty data") 589*cda5da8dSAndroid Build Coastguard Worker if n % 2 == 1: 590*cda5da8dSAndroid Build Coastguard Worker return data[n // 2] 591*cda5da8dSAndroid Build Coastguard Worker else: 592*cda5da8dSAndroid Build Coastguard Worker return data[n // 2 - 1] 593*cda5da8dSAndroid Build Coastguard Worker 594*cda5da8dSAndroid Build Coastguard Worker 595*cda5da8dSAndroid Build Coastguard Workerdef median_high(data): 596*cda5da8dSAndroid Build Coastguard Worker """Return the high median of data. 597*cda5da8dSAndroid Build Coastguard Worker 598*cda5da8dSAndroid Build Coastguard Worker When the number of data points is odd, the middle value is returned. 599*cda5da8dSAndroid Build Coastguard Worker When it is even, the larger of the two middle values is returned. 600*cda5da8dSAndroid Build Coastguard Worker 601*cda5da8dSAndroid Build Coastguard Worker >>> median_high([1, 3, 5]) 602*cda5da8dSAndroid Build Coastguard Worker 3 603*cda5da8dSAndroid Build Coastguard Worker >>> median_high([1, 3, 5, 7]) 604*cda5da8dSAndroid Build Coastguard Worker 5 605*cda5da8dSAndroid Build Coastguard Worker 606*cda5da8dSAndroid Build Coastguard Worker """ 607*cda5da8dSAndroid Build Coastguard Worker data = sorted(data) 608*cda5da8dSAndroid Build Coastguard Worker n = len(data) 609*cda5da8dSAndroid Build Coastguard Worker if n == 0: 610*cda5da8dSAndroid Build Coastguard Worker raise StatisticsError("no median for empty data") 611*cda5da8dSAndroid Build Coastguard Worker return data[n // 2] 612*cda5da8dSAndroid Build Coastguard Worker 613*cda5da8dSAndroid Build Coastguard Worker 614*cda5da8dSAndroid Build Coastguard Workerdef median_grouped(data, interval=1.0): 615*cda5da8dSAndroid Build Coastguard Worker """Estimates the median for numeric data binned around the midpoints 616*cda5da8dSAndroid Build Coastguard Worker of consecutive, fixed-width intervals. 617*cda5da8dSAndroid Build Coastguard Worker 618*cda5da8dSAndroid Build Coastguard Worker The *data* can be any iterable of numeric data with each value being 619*cda5da8dSAndroid Build Coastguard Worker exactly the midpoint of a bin. At least one value must be present. 620*cda5da8dSAndroid Build Coastguard Worker 621*cda5da8dSAndroid Build Coastguard Worker The *interval* is width of each bin. 622*cda5da8dSAndroid Build Coastguard Worker 623*cda5da8dSAndroid Build Coastguard Worker For example, demographic information may have been summarized into 624*cda5da8dSAndroid Build Coastguard Worker consecutive ten-year age groups with each group being represented 625*cda5da8dSAndroid Build Coastguard Worker by the 5-year midpoints of the intervals: 626*cda5da8dSAndroid Build Coastguard Worker 627*cda5da8dSAndroid Build Coastguard Worker >>> demographics = Counter({ 628*cda5da8dSAndroid Build Coastguard Worker ... 25: 172, # 20 to 30 years old 629*cda5da8dSAndroid Build Coastguard Worker ... 35: 484, # 30 to 40 years old 630*cda5da8dSAndroid Build Coastguard Worker ... 45: 387, # 40 to 50 years old 631*cda5da8dSAndroid Build Coastguard Worker ... 55: 22, # 50 to 60 years old 632*cda5da8dSAndroid Build Coastguard Worker ... 65: 6, # 60 to 70 years old 633*cda5da8dSAndroid Build Coastguard Worker ... }) 634*cda5da8dSAndroid Build Coastguard Worker 635*cda5da8dSAndroid Build Coastguard Worker The 50th percentile (median) is the 536th person out of the 1071 636*cda5da8dSAndroid Build Coastguard Worker member cohort. That person is in the 30 to 40 year old age group. 637*cda5da8dSAndroid Build Coastguard Worker 638*cda5da8dSAndroid Build Coastguard Worker The regular median() function would assume that everyone in the 639*cda5da8dSAndroid Build Coastguard Worker tricenarian age group was exactly 35 years old. A more tenable 640*cda5da8dSAndroid Build Coastguard Worker assumption is that the 484 members of that age group are evenly 641*cda5da8dSAndroid Build Coastguard Worker distributed between 30 and 40. For that, we use median_grouped(). 642*cda5da8dSAndroid Build Coastguard Worker 643*cda5da8dSAndroid Build Coastguard Worker >>> data = list(demographics.elements()) 644*cda5da8dSAndroid Build Coastguard Worker >>> median(data) 645*cda5da8dSAndroid Build Coastguard Worker 35 646*cda5da8dSAndroid Build Coastguard Worker >>> round(median_grouped(data, interval=10), 1) 647*cda5da8dSAndroid Build Coastguard Worker 37.5 648*cda5da8dSAndroid Build Coastguard Worker 649*cda5da8dSAndroid Build Coastguard Worker The caller is responsible for making sure the data points are separated 650*cda5da8dSAndroid Build Coastguard Worker by exact multiples of *interval*. This is essential for getting a 651*cda5da8dSAndroid Build Coastguard Worker correct result. The function does not check this precondition. 652*cda5da8dSAndroid Build Coastguard Worker 653*cda5da8dSAndroid Build Coastguard Worker Inputs may be any numeric type that can be coerced to a float during 654*cda5da8dSAndroid Build Coastguard Worker the interpolation step. 655*cda5da8dSAndroid Build Coastguard Worker 656*cda5da8dSAndroid Build Coastguard Worker """ 657*cda5da8dSAndroid Build Coastguard Worker data = sorted(data) 658*cda5da8dSAndroid Build Coastguard Worker n = len(data) 659*cda5da8dSAndroid Build Coastguard Worker if not n: 660*cda5da8dSAndroid Build Coastguard Worker raise StatisticsError("no median for empty data") 661*cda5da8dSAndroid Build Coastguard Worker 662*cda5da8dSAndroid Build Coastguard Worker # Find the value at the midpoint. Remember this corresponds to the 663*cda5da8dSAndroid Build Coastguard Worker # midpoint of the class interval. 664*cda5da8dSAndroid Build Coastguard Worker x = data[n // 2] 665*cda5da8dSAndroid Build Coastguard Worker 666*cda5da8dSAndroid Build Coastguard Worker # Using O(log n) bisection, find where all the x values occur in the data. 667*cda5da8dSAndroid Build Coastguard Worker # All x will lie within data[i:j]. 668*cda5da8dSAndroid Build Coastguard Worker i = bisect_left(data, x) 669*cda5da8dSAndroid Build Coastguard Worker j = bisect_right(data, x, lo=i) 670*cda5da8dSAndroid Build Coastguard Worker 671*cda5da8dSAndroid Build Coastguard Worker # Coerce to floats, raising a TypeError if not possible 672*cda5da8dSAndroid Build Coastguard Worker try: 673*cda5da8dSAndroid Build Coastguard Worker interval = float(interval) 674*cda5da8dSAndroid Build Coastguard Worker x = float(x) 675*cda5da8dSAndroid Build Coastguard Worker except ValueError: 676*cda5da8dSAndroid Build Coastguard Worker raise TypeError(f'Value cannot be converted to a float') 677*cda5da8dSAndroid Build Coastguard Worker 678*cda5da8dSAndroid Build Coastguard Worker # Interpolate the median using the formula found at: 679*cda5da8dSAndroid Build Coastguard Worker # https://www.cuemath.com/data/median-of-grouped-data/ 680*cda5da8dSAndroid Build Coastguard Worker L = x - interval / 2.0 # Lower limit of the median interval 681*cda5da8dSAndroid Build Coastguard Worker cf = i # Cumulative frequency of the preceding interval 682*cda5da8dSAndroid Build Coastguard Worker f = j - i # Number of elements in the median internal 683*cda5da8dSAndroid Build Coastguard Worker return L + interval * (n / 2 - cf) / f 684*cda5da8dSAndroid Build Coastguard Worker 685*cda5da8dSAndroid Build Coastguard Worker 686*cda5da8dSAndroid Build Coastguard Workerdef mode(data): 687*cda5da8dSAndroid Build Coastguard Worker """Return the most common data point from discrete or nominal data. 688*cda5da8dSAndroid Build Coastguard Worker 689*cda5da8dSAndroid Build Coastguard Worker ``mode`` assumes discrete data, and returns a single value. This is the 690*cda5da8dSAndroid Build Coastguard Worker standard treatment of the mode as commonly taught in schools: 691*cda5da8dSAndroid Build Coastguard Worker 692*cda5da8dSAndroid Build Coastguard Worker >>> mode([1, 1, 2, 3, 3, 3, 3, 4]) 693*cda5da8dSAndroid Build Coastguard Worker 3 694*cda5da8dSAndroid Build Coastguard Worker 695*cda5da8dSAndroid Build Coastguard Worker This also works with nominal (non-numeric) data: 696*cda5da8dSAndroid Build Coastguard Worker 697*cda5da8dSAndroid Build Coastguard Worker >>> mode(["red", "blue", "blue", "red", "green", "red", "red"]) 698*cda5da8dSAndroid Build Coastguard Worker 'red' 699*cda5da8dSAndroid Build Coastguard Worker 700*cda5da8dSAndroid Build Coastguard Worker If there are multiple modes with same frequency, return the first one 701*cda5da8dSAndroid Build Coastguard Worker encountered: 702*cda5da8dSAndroid Build Coastguard Worker 703*cda5da8dSAndroid Build Coastguard Worker >>> mode(['red', 'red', 'green', 'blue', 'blue']) 704*cda5da8dSAndroid Build Coastguard Worker 'red' 705*cda5da8dSAndroid Build Coastguard Worker 706*cda5da8dSAndroid Build Coastguard Worker If *data* is empty, ``mode``, raises StatisticsError. 707*cda5da8dSAndroid Build Coastguard Worker 708*cda5da8dSAndroid Build Coastguard Worker """ 709*cda5da8dSAndroid Build Coastguard Worker pairs = Counter(iter(data)).most_common(1) 710*cda5da8dSAndroid Build Coastguard Worker try: 711*cda5da8dSAndroid Build Coastguard Worker return pairs[0][0] 712*cda5da8dSAndroid Build Coastguard Worker except IndexError: 713*cda5da8dSAndroid Build Coastguard Worker raise StatisticsError('no mode for empty data') from None 714*cda5da8dSAndroid Build Coastguard Worker 715*cda5da8dSAndroid Build Coastguard Worker 716*cda5da8dSAndroid Build Coastguard Workerdef multimode(data): 717*cda5da8dSAndroid Build Coastguard Worker """Return a list of the most frequently occurring values. 718*cda5da8dSAndroid Build Coastguard Worker 719*cda5da8dSAndroid Build Coastguard Worker Will return more than one result if there are multiple modes 720*cda5da8dSAndroid Build Coastguard Worker or an empty list if *data* is empty. 721*cda5da8dSAndroid Build Coastguard Worker 722*cda5da8dSAndroid Build Coastguard Worker >>> multimode('aabbbbbbbbcc') 723*cda5da8dSAndroid Build Coastguard Worker ['b'] 724*cda5da8dSAndroid Build Coastguard Worker >>> multimode('aabbbbccddddeeffffgg') 725*cda5da8dSAndroid Build Coastguard Worker ['b', 'd', 'f'] 726*cda5da8dSAndroid Build Coastguard Worker >>> multimode('') 727*cda5da8dSAndroid Build Coastguard Worker [] 728*cda5da8dSAndroid Build Coastguard Worker """ 729*cda5da8dSAndroid Build Coastguard Worker counts = Counter(iter(data)) 730*cda5da8dSAndroid Build Coastguard Worker if not counts: 731*cda5da8dSAndroid Build Coastguard Worker return [] 732*cda5da8dSAndroid Build Coastguard Worker maxcount = max(counts.values()) 733*cda5da8dSAndroid Build Coastguard Worker return [value for value, count in counts.items() if count == maxcount] 734*cda5da8dSAndroid Build Coastguard Worker 735*cda5da8dSAndroid Build Coastguard Worker 736*cda5da8dSAndroid Build Coastguard Worker# Notes on methods for computing quantiles 737*cda5da8dSAndroid Build Coastguard Worker# ---------------------------------------- 738*cda5da8dSAndroid Build Coastguard Worker# 739*cda5da8dSAndroid Build Coastguard Worker# There is no one perfect way to compute quantiles. Here we offer 740*cda5da8dSAndroid Build Coastguard Worker# two methods that serve common needs. Most other packages 741*cda5da8dSAndroid Build Coastguard Worker# surveyed offered at least one or both of these two, making them 742*cda5da8dSAndroid Build Coastguard Worker# "standard" in the sense of "widely-adopted and reproducible". 743*cda5da8dSAndroid Build Coastguard Worker# They are also easy to explain, easy to compute manually, and have 744*cda5da8dSAndroid Build Coastguard Worker# straight-forward interpretations that aren't surprising. 745*cda5da8dSAndroid Build Coastguard Worker 746*cda5da8dSAndroid Build Coastguard Worker# The default method is known as "R6", "PERCENTILE.EXC", or "expected 747*cda5da8dSAndroid Build Coastguard Worker# value of rank order statistics". The alternative method is known as 748*cda5da8dSAndroid Build Coastguard Worker# "R7", "PERCENTILE.INC", or "mode of rank order statistics". 749*cda5da8dSAndroid Build Coastguard Worker 750*cda5da8dSAndroid Build Coastguard Worker# For sample data where there is a positive probability for values 751*cda5da8dSAndroid Build Coastguard Worker# beyond the range of the data, the R6 exclusive method is a 752*cda5da8dSAndroid Build Coastguard Worker# reasonable choice. Consider a random sample of nine values from a 753*cda5da8dSAndroid Build Coastguard Worker# population with a uniform distribution from 0.0 to 1.0. The 754*cda5da8dSAndroid Build Coastguard Worker# distribution of the third ranked sample point is described by 755*cda5da8dSAndroid Build Coastguard Worker# betavariate(alpha=3, beta=7) which has mode=0.250, median=0.286, and 756*cda5da8dSAndroid Build Coastguard Worker# mean=0.300. Only the latter (which corresponds with R6) gives the 757*cda5da8dSAndroid Build Coastguard Worker# desired cut point with 30% of the population falling below that 758*cda5da8dSAndroid Build Coastguard Worker# value, making it comparable to a result from an inv_cdf() function. 759*cda5da8dSAndroid Build Coastguard Worker# The R6 exclusive method is also idempotent. 760*cda5da8dSAndroid Build Coastguard Worker 761*cda5da8dSAndroid Build Coastguard Worker# For describing population data where the end points are known to 762*cda5da8dSAndroid Build Coastguard Worker# be included in the data, the R7 inclusive method is a reasonable 763*cda5da8dSAndroid Build Coastguard Worker# choice. Instead of the mean, it uses the mode of the beta 764*cda5da8dSAndroid Build Coastguard Worker# distribution for the interior points. Per Hyndman & Fan, "One nice 765*cda5da8dSAndroid Build Coastguard Worker# property is that the vertices of Q7(p) divide the range into n - 1 766*cda5da8dSAndroid Build Coastguard Worker# intervals, and exactly 100p% of the intervals lie to the left of 767*cda5da8dSAndroid Build Coastguard Worker# Q7(p) and 100(1 - p)% of the intervals lie to the right of Q7(p)." 768*cda5da8dSAndroid Build Coastguard Worker 769*cda5da8dSAndroid Build Coastguard Worker# If needed, other methods could be added. However, for now, the 770*cda5da8dSAndroid Build Coastguard Worker# position is that fewer options make for easier choices and that 771*cda5da8dSAndroid Build Coastguard Worker# external packages can be used for anything more advanced. 772*cda5da8dSAndroid Build Coastguard Worker 773*cda5da8dSAndroid Build Coastguard Workerdef quantiles(data, *, n=4, method='exclusive'): 774*cda5da8dSAndroid Build Coastguard Worker """Divide *data* into *n* continuous intervals with equal probability. 775*cda5da8dSAndroid Build Coastguard Worker 776*cda5da8dSAndroid Build Coastguard Worker Returns a list of (n - 1) cut points separating the intervals. 777*cda5da8dSAndroid Build Coastguard Worker 778*cda5da8dSAndroid Build Coastguard Worker Set *n* to 4 for quartiles (the default). Set *n* to 10 for deciles. 779*cda5da8dSAndroid Build Coastguard Worker Set *n* to 100 for percentiles which gives the 99 cuts points that 780*cda5da8dSAndroid Build Coastguard Worker separate *data* in to 100 equal sized groups. 781*cda5da8dSAndroid Build Coastguard Worker 782*cda5da8dSAndroid Build Coastguard Worker The *data* can be any iterable containing sample. 783*cda5da8dSAndroid Build Coastguard Worker The cut points are linearly interpolated between data points. 784*cda5da8dSAndroid Build Coastguard Worker 785*cda5da8dSAndroid Build Coastguard Worker If *method* is set to *inclusive*, *data* is treated as population 786*cda5da8dSAndroid Build Coastguard Worker data. The minimum value is treated as the 0th percentile and the 787*cda5da8dSAndroid Build Coastguard Worker maximum value is treated as the 100th percentile. 788*cda5da8dSAndroid Build Coastguard Worker """ 789*cda5da8dSAndroid Build Coastguard Worker if n < 1: 790*cda5da8dSAndroid Build Coastguard Worker raise StatisticsError('n must be at least 1') 791*cda5da8dSAndroid Build Coastguard Worker data = sorted(data) 792*cda5da8dSAndroid Build Coastguard Worker ld = len(data) 793*cda5da8dSAndroid Build Coastguard Worker if ld < 2: 794*cda5da8dSAndroid Build Coastguard Worker raise StatisticsError('must have at least two data points') 795*cda5da8dSAndroid Build Coastguard Worker if method == 'inclusive': 796*cda5da8dSAndroid Build Coastguard Worker m = ld - 1 797*cda5da8dSAndroid Build Coastguard Worker result = [] 798*cda5da8dSAndroid Build Coastguard Worker for i in range(1, n): 799*cda5da8dSAndroid Build Coastguard Worker j, delta = divmod(i * m, n) 800*cda5da8dSAndroid Build Coastguard Worker interpolated = (data[j] * (n - delta) + data[j + 1] * delta) / n 801*cda5da8dSAndroid Build Coastguard Worker result.append(interpolated) 802*cda5da8dSAndroid Build Coastguard Worker return result 803*cda5da8dSAndroid Build Coastguard Worker if method == 'exclusive': 804*cda5da8dSAndroid Build Coastguard Worker m = ld + 1 805*cda5da8dSAndroid Build Coastguard Worker result = [] 806*cda5da8dSAndroid Build Coastguard Worker for i in range(1, n): 807*cda5da8dSAndroid Build Coastguard Worker j = i * m // n # rescale i to m/n 808*cda5da8dSAndroid Build Coastguard Worker j = 1 if j < 1 else ld-1 if j > ld-1 else j # clamp to 1 .. ld-1 809*cda5da8dSAndroid Build Coastguard Worker delta = i*m - j*n # exact integer math 810*cda5da8dSAndroid Build Coastguard Worker interpolated = (data[j - 1] * (n - delta) + data[j] * delta) / n 811*cda5da8dSAndroid Build Coastguard Worker result.append(interpolated) 812*cda5da8dSAndroid Build Coastguard Worker return result 813*cda5da8dSAndroid Build Coastguard Worker raise ValueError(f'Unknown method: {method!r}') 814*cda5da8dSAndroid Build Coastguard Worker 815*cda5da8dSAndroid Build Coastguard Worker 816*cda5da8dSAndroid Build Coastguard Worker# === Measures of spread === 817*cda5da8dSAndroid Build Coastguard Worker 818*cda5da8dSAndroid Build Coastguard Worker# See http://mathworld.wolfram.com/Variance.html 819*cda5da8dSAndroid Build Coastguard Worker# http://mathworld.wolfram.com/SampleVariance.html 820*cda5da8dSAndroid Build Coastguard Worker 821*cda5da8dSAndroid Build Coastguard Worker 822*cda5da8dSAndroid Build Coastguard Workerdef variance(data, xbar=None): 823*cda5da8dSAndroid Build Coastguard Worker """Return the sample variance of data. 824*cda5da8dSAndroid Build Coastguard Worker 825*cda5da8dSAndroid Build Coastguard Worker data should be an iterable of Real-valued numbers, with at least two 826*cda5da8dSAndroid Build Coastguard Worker values. The optional argument xbar, if given, should be the mean of 827*cda5da8dSAndroid Build Coastguard Worker the data. If it is missing or None, the mean is automatically calculated. 828*cda5da8dSAndroid Build Coastguard Worker 829*cda5da8dSAndroid Build Coastguard Worker Use this function when your data is a sample from a population. To 830*cda5da8dSAndroid Build Coastguard Worker calculate the variance from the entire population, see ``pvariance``. 831*cda5da8dSAndroid Build Coastguard Worker 832*cda5da8dSAndroid Build Coastguard Worker Examples: 833*cda5da8dSAndroid Build Coastguard Worker 834*cda5da8dSAndroid Build Coastguard Worker >>> data = [2.75, 1.75, 1.25, 0.25, 0.5, 1.25, 3.5] 835*cda5da8dSAndroid Build Coastguard Worker >>> variance(data) 836*cda5da8dSAndroid Build Coastguard Worker 1.3720238095238095 837*cda5da8dSAndroid Build Coastguard Worker 838*cda5da8dSAndroid Build Coastguard Worker If you have already calculated the mean of your data, you can pass it as 839*cda5da8dSAndroid Build Coastguard Worker the optional second argument ``xbar`` to avoid recalculating it: 840*cda5da8dSAndroid Build Coastguard Worker 841*cda5da8dSAndroid Build Coastguard Worker >>> m = mean(data) 842*cda5da8dSAndroid Build Coastguard Worker >>> variance(data, m) 843*cda5da8dSAndroid Build Coastguard Worker 1.3720238095238095 844*cda5da8dSAndroid Build Coastguard Worker 845*cda5da8dSAndroid Build Coastguard Worker This function does not check that ``xbar`` is actually the mean of 846*cda5da8dSAndroid Build Coastguard Worker ``data``. Giving arbitrary values for ``xbar`` may lead to invalid or 847*cda5da8dSAndroid Build Coastguard Worker impossible results. 848*cda5da8dSAndroid Build Coastguard Worker 849*cda5da8dSAndroid Build Coastguard Worker Decimals and Fractions are supported: 850*cda5da8dSAndroid Build Coastguard Worker 851*cda5da8dSAndroid Build Coastguard Worker >>> from decimal import Decimal as D 852*cda5da8dSAndroid Build Coastguard Worker >>> variance([D("27.5"), D("30.25"), D("30.25"), D("34.5"), D("41.75")]) 853*cda5da8dSAndroid Build Coastguard Worker Decimal('31.01875') 854*cda5da8dSAndroid Build Coastguard Worker 855*cda5da8dSAndroid Build Coastguard Worker >>> from fractions import Fraction as F 856*cda5da8dSAndroid Build Coastguard Worker >>> variance([F(1, 6), F(1, 2), F(5, 3)]) 857*cda5da8dSAndroid Build Coastguard Worker Fraction(67, 108) 858*cda5da8dSAndroid Build Coastguard Worker 859*cda5da8dSAndroid Build Coastguard Worker """ 860*cda5da8dSAndroid Build Coastguard Worker T, ss, c, n = _ss(data, xbar) 861*cda5da8dSAndroid Build Coastguard Worker if n < 2: 862*cda5da8dSAndroid Build Coastguard Worker raise StatisticsError('variance requires at least two data points') 863*cda5da8dSAndroid Build Coastguard Worker return _convert(ss / (n - 1), T) 864*cda5da8dSAndroid Build Coastguard Worker 865*cda5da8dSAndroid Build Coastguard Worker 866*cda5da8dSAndroid Build Coastguard Workerdef pvariance(data, mu=None): 867*cda5da8dSAndroid Build Coastguard Worker """Return the population variance of ``data``. 868*cda5da8dSAndroid Build Coastguard Worker 869*cda5da8dSAndroid Build Coastguard Worker data should be a sequence or iterable of Real-valued numbers, with at least one 870*cda5da8dSAndroid Build Coastguard Worker value. The optional argument mu, if given, should be the mean of 871*cda5da8dSAndroid Build Coastguard Worker the data. If it is missing or None, the mean is automatically calculated. 872*cda5da8dSAndroid Build Coastguard Worker 873*cda5da8dSAndroid Build Coastguard Worker Use this function to calculate the variance from the entire population. 874*cda5da8dSAndroid Build Coastguard Worker To estimate the variance from a sample, the ``variance`` function is 875*cda5da8dSAndroid Build Coastguard Worker usually a better choice. 876*cda5da8dSAndroid Build Coastguard Worker 877*cda5da8dSAndroid Build Coastguard Worker Examples: 878*cda5da8dSAndroid Build Coastguard Worker 879*cda5da8dSAndroid Build Coastguard Worker >>> data = [0.0, 0.25, 0.25, 1.25, 1.5, 1.75, 2.75, 3.25] 880*cda5da8dSAndroid Build Coastguard Worker >>> pvariance(data) 881*cda5da8dSAndroid Build Coastguard Worker 1.25 882*cda5da8dSAndroid Build Coastguard Worker 883*cda5da8dSAndroid Build Coastguard Worker If you have already calculated the mean of the data, you can pass it as 884*cda5da8dSAndroid Build Coastguard Worker the optional second argument to avoid recalculating it: 885*cda5da8dSAndroid Build Coastguard Worker 886*cda5da8dSAndroid Build Coastguard Worker >>> mu = mean(data) 887*cda5da8dSAndroid Build Coastguard Worker >>> pvariance(data, mu) 888*cda5da8dSAndroid Build Coastguard Worker 1.25 889*cda5da8dSAndroid Build Coastguard Worker 890*cda5da8dSAndroid Build Coastguard Worker Decimals and Fractions are supported: 891*cda5da8dSAndroid Build Coastguard Worker 892*cda5da8dSAndroid Build Coastguard Worker >>> from decimal import Decimal as D 893*cda5da8dSAndroid Build Coastguard Worker >>> pvariance([D("27.5"), D("30.25"), D("30.25"), D("34.5"), D("41.75")]) 894*cda5da8dSAndroid Build Coastguard Worker Decimal('24.815') 895*cda5da8dSAndroid Build Coastguard Worker 896*cda5da8dSAndroid Build Coastguard Worker >>> from fractions import Fraction as F 897*cda5da8dSAndroid Build Coastguard Worker >>> pvariance([F(1, 4), F(5, 4), F(1, 2)]) 898*cda5da8dSAndroid Build Coastguard Worker Fraction(13, 72) 899*cda5da8dSAndroid Build Coastguard Worker 900*cda5da8dSAndroid Build Coastguard Worker """ 901*cda5da8dSAndroid Build Coastguard Worker T, ss, c, n = _ss(data, mu) 902*cda5da8dSAndroid Build Coastguard Worker if n < 1: 903*cda5da8dSAndroid Build Coastguard Worker raise StatisticsError('pvariance requires at least one data point') 904*cda5da8dSAndroid Build Coastguard Worker return _convert(ss / n, T) 905*cda5da8dSAndroid Build Coastguard Worker 906*cda5da8dSAndroid Build Coastguard Worker 907*cda5da8dSAndroid Build Coastguard Workerdef stdev(data, xbar=None): 908*cda5da8dSAndroid Build Coastguard Worker """Return the square root of the sample variance. 909*cda5da8dSAndroid Build Coastguard Worker 910*cda5da8dSAndroid Build Coastguard Worker See ``variance`` for arguments and other details. 911*cda5da8dSAndroid Build Coastguard Worker 912*cda5da8dSAndroid Build Coastguard Worker >>> stdev([1.5, 2.5, 2.5, 2.75, 3.25, 4.75]) 913*cda5da8dSAndroid Build Coastguard Worker 1.0810874155219827 914*cda5da8dSAndroid Build Coastguard Worker 915*cda5da8dSAndroid Build Coastguard Worker """ 916*cda5da8dSAndroid Build Coastguard Worker T, ss, c, n = _ss(data, xbar) 917*cda5da8dSAndroid Build Coastguard Worker if n < 2: 918*cda5da8dSAndroid Build Coastguard Worker raise StatisticsError('stdev requires at least two data points') 919*cda5da8dSAndroid Build Coastguard Worker mss = ss / (n - 1) 920*cda5da8dSAndroid Build Coastguard Worker if issubclass(T, Decimal): 921*cda5da8dSAndroid Build Coastguard Worker return _decimal_sqrt_of_frac(mss.numerator, mss.denominator) 922*cda5da8dSAndroid Build Coastguard Worker return _float_sqrt_of_frac(mss.numerator, mss.denominator) 923*cda5da8dSAndroid Build Coastguard Worker 924*cda5da8dSAndroid Build Coastguard Worker 925*cda5da8dSAndroid Build Coastguard Workerdef pstdev(data, mu=None): 926*cda5da8dSAndroid Build Coastguard Worker """Return the square root of the population variance. 927*cda5da8dSAndroid Build Coastguard Worker 928*cda5da8dSAndroid Build Coastguard Worker See ``pvariance`` for arguments and other details. 929*cda5da8dSAndroid Build Coastguard Worker 930*cda5da8dSAndroid Build Coastguard Worker >>> pstdev([1.5, 2.5, 2.5, 2.75, 3.25, 4.75]) 931*cda5da8dSAndroid Build Coastguard Worker 0.986893273527251 932*cda5da8dSAndroid Build Coastguard Worker 933*cda5da8dSAndroid Build Coastguard Worker """ 934*cda5da8dSAndroid Build Coastguard Worker T, ss, c, n = _ss(data, mu) 935*cda5da8dSAndroid Build Coastguard Worker if n < 1: 936*cda5da8dSAndroid Build Coastguard Worker raise StatisticsError('pstdev requires at least one data point') 937*cda5da8dSAndroid Build Coastguard Worker mss = ss / n 938*cda5da8dSAndroid Build Coastguard Worker if issubclass(T, Decimal): 939*cda5da8dSAndroid Build Coastguard Worker return _decimal_sqrt_of_frac(mss.numerator, mss.denominator) 940*cda5da8dSAndroid Build Coastguard Worker return _float_sqrt_of_frac(mss.numerator, mss.denominator) 941*cda5da8dSAndroid Build Coastguard Worker 942*cda5da8dSAndroid Build Coastguard Worker 943*cda5da8dSAndroid Build Coastguard Workerdef _mean_stdev(data): 944*cda5da8dSAndroid Build Coastguard Worker """In one pass, compute the mean and sample standard deviation as floats.""" 945*cda5da8dSAndroid Build Coastguard Worker T, ss, xbar, n = _ss(data) 946*cda5da8dSAndroid Build Coastguard Worker if n < 2: 947*cda5da8dSAndroid Build Coastguard Worker raise StatisticsError('stdev requires at least two data points') 948*cda5da8dSAndroid Build Coastguard Worker mss = ss / (n - 1) 949*cda5da8dSAndroid Build Coastguard Worker try: 950*cda5da8dSAndroid Build Coastguard Worker return float(xbar), _float_sqrt_of_frac(mss.numerator, mss.denominator) 951*cda5da8dSAndroid Build Coastguard Worker except AttributeError: 952*cda5da8dSAndroid Build Coastguard Worker # Handle Nans and Infs gracefully 953*cda5da8dSAndroid Build Coastguard Worker return float(xbar), float(xbar) / float(ss) 954*cda5da8dSAndroid Build Coastguard Worker 955*cda5da8dSAndroid Build Coastguard Worker 956*cda5da8dSAndroid Build Coastguard Worker# === Statistics for relations between two inputs === 957*cda5da8dSAndroid Build Coastguard Worker 958*cda5da8dSAndroid Build Coastguard Worker# See https://en.wikipedia.org/wiki/Covariance 959*cda5da8dSAndroid Build Coastguard Worker# https://en.wikipedia.org/wiki/Pearson_correlation_coefficient 960*cda5da8dSAndroid Build Coastguard Worker# https://en.wikipedia.org/wiki/Simple_linear_regression 961*cda5da8dSAndroid Build Coastguard Worker 962*cda5da8dSAndroid Build Coastguard Worker 963*cda5da8dSAndroid Build Coastguard Workerdef covariance(x, y, /): 964*cda5da8dSAndroid Build Coastguard Worker """Covariance 965*cda5da8dSAndroid Build Coastguard Worker 966*cda5da8dSAndroid Build Coastguard Worker Return the sample covariance of two inputs *x* and *y*. Covariance 967*cda5da8dSAndroid Build Coastguard Worker is a measure of the joint variability of two inputs. 968*cda5da8dSAndroid Build Coastguard Worker 969*cda5da8dSAndroid Build Coastguard Worker >>> x = [1, 2, 3, 4, 5, 6, 7, 8, 9] 970*cda5da8dSAndroid Build Coastguard Worker >>> y = [1, 2, 3, 1, 2, 3, 1, 2, 3] 971*cda5da8dSAndroid Build Coastguard Worker >>> covariance(x, y) 972*cda5da8dSAndroid Build Coastguard Worker 0.75 973*cda5da8dSAndroid Build Coastguard Worker >>> z = [9, 8, 7, 6, 5, 4, 3, 2, 1] 974*cda5da8dSAndroid Build Coastguard Worker >>> covariance(x, z) 975*cda5da8dSAndroid Build Coastguard Worker -7.5 976*cda5da8dSAndroid Build Coastguard Worker >>> covariance(z, x) 977*cda5da8dSAndroid Build Coastguard Worker -7.5 978*cda5da8dSAndroid Build Coastguard Worker 979*cda5da8dSAndroid Build Coastguard Worker """ 980*cda5da8dSAndroid Build Coastguard Worker n = len(x) 981*cda5da8dSAndroid Build Coastguard Worker if len(y) != n: 982*cda5da8dSAndroid Build Coastguard Worker raise StatisticsError('covariance requires that both inputs have same number of data points') 983*cda5da8dSAndroid Build Coastguard Worker if n < 2: 984*cda5da8dSAndroid Build Coastguard Worker raise StatisticsError('covariance requires at least two data points') 985*cda5da8dSAndroid Build Coastguard Worker xbar = fsum(x) / n 986*cda5da8dSAndroid Build Coastguard Worker ybar = fsum(y) / n 987*cda5da8dSAndroid Build Coastguard Worker sxy = fsum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) 988*cda5da8dSAndroid Build Coastguard Worker return sxy / (n - 1) 989*cda5da8dSAndroid Build Coastguard Worker 990*cda5da8dSAndroid Build Coastguard Worker 991*cda5da8dSAndroid Build Coastguard Workerdef correlation(x, y, /): 992*cda5da8dSAndroid Build Coastguard Worker """Pearson's correlation coefficient 993*cda5da8dSAndroid Build Coastguard Worker 994*cda5da8dSAndroid Build Coastguard Worker Return the Pearson's correlation coefficient for two inputs. Pearson's 995*cda5da8dSAndroid Build Coastguard Worker correlation coefficient *r* takes values between -1 and +1. It measures the 996*cda5da8dSAndroid Build Coastguard Worker strength and direction of the linear relationship, where +1 means very 997*cda5da8dSAndroid Build Coastguard Worker strong, positive linear relationship, -1 very strong, negative linear 998*cda5da8dSAndroid Build Coastguard Worker relationship, and 0 no linear relationship. 999*cda5da8dSAndroid Build Coastguard Worker 1000*cda5da8dSAndroid Build Coastguard Worker >>> x = [1, 2, 3, 4, 5, 6, 7, 8, 9] 1001*cda5da8dSAndroid Build Coastguard Worker >>> y = [9, 8, 7, 6, 5, 4, 3, 2, 1] 1002*cda5da8dSAndroid Build Coastguard Worker >>> correlation(x, x) 1003*cda5da8dSAndroid Build Coastguard Worker 1.0 1004*cda5da8dSAndroid Build Coastguard Worker >>> correlation(x, y) 1005*cda5da8dSAndroid Build Coastguard Worker -1.0 1006*cda5da8dSAndroid Build Coastguard Worker 1007*cda5da8dSAndroid Build Coastguard Worker """ 1008*cda5da8dSAndroid Build Coastguard Worker n = len(x) 1009*cda5da8dSAndroid Build Coastguard Worker if len(y) != n: 1010*cda5da8dSAndroid Build Coastguard Worker raise StatisticsError('correlation requires that both inputs have same number of data points') 1011*cda5da8dSAndroid Build Coastguard Worker if n < 2: 1012*cda5da8dSAndroid Build Coastguard Worker raise StatisticsError('correlation requires at least two data points') 1013*cda5da8dSAndroid Build Coastguard Worker xbar = fsum(x) / n 1014*cda5da8dSAndroid Build Coastguard Worker ybar = fsum(y) / n 1015*cda5da8dSAndroid Build Coastguard Worker sxy = fsum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) 1016*cda5da8dSAndroid Build Coastguard Worker sxx = fsum((d := xi - xbar) * d for xi in x) 1017*cda5da8dSAndroid Build Coastguard Worker syy = fsum((d := yi - ybar) * d for yi in y) 1018*cda5da8dSAndroid Build Coastguard Worker try: 1019*cda5da8dSAndroid Build Coastguard Worker return sxy / sqrt(sxx * syy) 1020*cda5da8dSAndroid Build Coastguard Worker except ZeroDivisionError: 1021*cda5da8dSAndroid Build Coastguard Worker raise StatisticsError('at least one of the inputs is constant') 1022*cda5da8dSAndroid Build Coastguard Worker 1023*cda5da8dSAndroid Build Coastguard Worker 1024*cda5da8dSAndroid Build Coastguard WorkerLinearRegression = namedtuple('LinearRegression', ('slope', 'intercept')) 1025*cda5da8dSAndroid Build Coastguard Worker 1026*cda5da8dSAndroid Build Coastguard Worker 1027*cda5da8dSAndroid Build Coastguard Workerdef linear_regression(x, y, /, *, proportional=False): 1028*cda5da8dSAndroid Build Coastguard Worker """Slope and intercept for simple linear regression. 1029*cda5da8dSAndroid Build Coastguard Worker 1030*cda5da8dSAndroid Build Coastguard Worker Return the slope and intercept of simple linear regression 1031*cda5da8dSAndroid Build Coastguard Worker parameters estimated using ordinary least squares. Simple linear 1032*cda5da8dSAndroid Build Coastguard Worker regression describes relationship between an independent variable 1033*cda5da8dSAndroid Build Coastguard Worker *x* and a dependent variable *y* in terms of a linear function: 1034*cda5da8dSAndroid Build Coastguard Worker 1035*cda5da8dSAndroid Build Coastguard Worker y = slope * x + intercept + noise 1036*cda5da8dSAndroid Build Coastguard Worker 1037*cda5da8dSAndroid Build Coastguard Worker where *slope* and *intercept* are the regression parameters that are 1038*cda5da8dSAndroid Build Coastguard Worker estimated, and noise represents the variability of the data that was 1039*cda5da8dSAndroid Build Coastguard Worker not explained by the linear regression (it is equal to the 1040*cda5da8dSAndroid Build Coastguard Worker difference between predicted and actual values of the dependent 1041*cda5da8dSAndroid Build Coastguard Worker variable). 1042*cda5da8dSAndroid Build Coastguard Worker 1043*cda5da8dSAndroid Build Coastguard Worker The parameters are returned as a named tuple. 1044*cda5da8dSAndroid Build Coastguard Worker 1045*cda5da8dSAndroid Build Coastguard Worker >>> x = [1, 2, 3, 4, 5] 1046*cda5da8dSAndroid Build Coastguard Worker >>> noise = NormalDist().samples(5, seed=42) 1047*cda5da8dSAndroid Build Coastguard Worker >>> y = [3 * x[i] + 2 + noise[i] for i in range(5)] 1048*cda5da8dSAndroid Build Coastguard Worker >>> linear_regression(x, y) #doctest: +ELLIPSIS 1049*cda5da8dSAndroid Build Coastguard Worker LinearRegression(slope=3.09078914170..., intercept=1.75684970486...) 1050*cda5da8dSAndroid Build Coastguard Worker 1051*cda5da8dSAndroid Build Coastguard Worker If *proportional* is true, the independent variable *x* and the 1052*cda5da8dSAndroid Build Coastguard Worker dependent variable *y* are assumed to be directly proportional. 1053*cda5da8dSAndroid Build Coastguard Worker The data is fit to a line passing through the origin. 1054*cda5da8dSAndroid Build Coastguard Worker 1055*cda5da8dSAndroid Build Coastguard Worker Since the *intercept* will always be 0.0, the underlying linear 1056*cda5da8dSAndroid Build Coastguard Worker function simplifies to: 1057*cda5da8dSAndroid Build Coastguard Worker 1058*cda5da8dSAndroid Build Coastguard Worker y = slope * x + noise 1059*cda5da8dSAndroid Build Coastguard Worker 1060*cda5da8dSAndroid Build Coastguard Worker >>> y = [3 * x[i] + noise[i] for i in range(5)] 1061*cda5da8dSAndroid Build Coastguard Worker >>> linear_regression(x, y, proportional=True) #doctest: +ELLIPSIS 1062*cda5da8dSAndroid Build Coastguard Worker LinearRegression(slope=3.02447542484..., intercept=0.0) 1063*cda5da8dSAndroid Build Coastguard Worker 1064*cda5da8dSAndroid Build Coastguard Worker """ 1065*cda5da8dSAndroid Build Coastguard Worker n = len(x) 1066*cda5da8dSAndroid Build Coastguard Worker if len(y) != n: 1067*cda5da8dSAndroid Build Coastguard Worker raise StatisticsError('linear regression requires that both inputs have same number of data points') 1068*cda5da8dSAndroid Build Coastguard Worker if n < 2: 1069*cda5da8dSAndroid Build Coastguard Worker raise StatisticsError('linear regression requires at least two data points') 1070*cda5da8dSAndroid Build Coastguard Worker if proportional: 1071*cda5da8dSAndroid Build Coastguard Worker sxy = fsum(xi * yi for xi, yi in zip(x, y)) 1072*cda5da8dSAndroid Build Coastguard Worker sxx = fsum(xi * xi for xi in x) 1073*cda5da8dSAndroid Build Coastguard Worker else: 1074*cda5da8dSAndroid Build Coastguard Worker xbar = fsum(x) / n 1075*cda5da8dSAndroid Build Coastguard Worker ybar = fsum(y) / n 1076*cda5da8dSAndroid Build Coastguard Worker sxy = fsum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) 1077*cda5da8dSAndroid Build Coastguard Worker sxx = fsum((d := xi - xbar) * d for xi in x) 1078*cda5da8dSAndroid Build Coastguard Worker try: 1079*cda5da8dSAndroid Build Coastguard Worker slope = sxy / sxx # equivalent to: covariance(x, y) / variance(x) 1080*cda5da8dSAndroid Build Coastguard Worker except ZeroDivisionError: 1081*cda5da8dSAndroid Build Coastguard Worker raise StatisticsError('x is constant') 1082*cda5da8dSAndroid Build Coastguard Worker intercept = 0.0 if proportional else ybar - slope * xbar 1083*cda5da8dSAndroid Build Coastguard Worker return LinearRegression(slope=slope, intercept=intercept) 1084*cda5da8dSAndroid Build Coastguard Worker 1085*cda5da8dSAndroid Build Coastguard Worker 1086*cda5da8dSAndroid Build Coastguard Worker## Normal Distribution ##################################################### 1087*cda5da8dSAndroid Build Coastguard Worker 1088*cda5da8dSAndroid Build Coastguard Worker 1089*cda5da8dSAndroid Build Coastguard Workerdef _normal_dist_inv_cdf(p, mu, sigma): 1090*cda5da8dSAndroid Build Coastguard Worker # There is no closed-form solution to the inverse CDF for the normal 1091*cda5da8dSAndroid Build Coastguard Worker # distribution, so we use a rational approximation instead: 1092*cda5da8dSAndroid Build Coastguard Worker # Wichura, M.J. (1988). "Algorithm AS241: The Percentage Points of the 1093*cda5da8dSAndroid Build Coastguard Worker # Normal Distribution". Applied Statistics. Blackwell Publishing. 37 1094*cda5da8dSAndroid Build Coastguard Worker # (3): 477–484. doi:10.2307/2347330. JSTOR 2347330. 1095*cda5da8dSAndroid Build Coastguard Worker q = p - 0.5 1096*cda5da8dSAndroid Build Coastguard Worker if fabs(q) <= 0.425: 1097*cda5da8dSAndroid Build Coastguard Worker r = 0.180625 - q * q 1098*cda5da8dSAndroid Build Coastguard Worker # Hash sum: 55.88319_28806_14901_4439 1099*cda5da8dSAndroid Build Coastguard Worker num = (((((((2.50908_09287_30122_6727e+3 * r + 1100*cda5da8dSAndroid Build Coastguard Worker 3.34305_75583_58812_8105e+4) * r + 1101*cda5da8dSAndroid Build Coastguard Worker 6.72657_70927_00870_0853e+4) * r + 1102*cda5da8dSAndroid Build Coastguard Worker 4.59219_53931_54987_1457e+4) * r + 1103*cda5da8dSAndroid Build Coastguard Worker 1.37316_93765_50946_1125e+4) * r + 1104*cda5da8dSAndroid Build Coastguard Worker 1.97159_09503_06551_4427e+3) * r + 1105*cda5da8dSAndroid Build Coastguard Worker 1.33141_66789_17843_7745e+2) * r + 1106*cda5da8dSAndroid Build Coastguard Worker 3.38713_28727_96366_6080e+0) * q 1107*cda5da8dSAndroid Build Coastguard Worker den = (((((((5.22649_52788_52854_5610e+3 * r + 1108*cda5da8dSAndroid Build Coastguard Worker 2.87290_85735_72194_2674e+4) * r + 1109*cda5da8dSAndroid Build Coastguard Worker 3.93078_95800_09271_0610e+4) * r + 1110*cda5da8dSAndroid Build Coastguard Worker 2.12137_94301_58659_5867e+4) * r + 1111*cda5da8dSAndroid Build Coastguard Worker 5.39419_60214_24751_1077e+3) * r + 1112*cda5da8dSAndroid Build Coastguard Worker 6.87187_00749_20579_0830e+2) * r + 1113*cda5da8dSAndroid Build Coastguard Worker 4.23133_30701_60091_1252e+1) * r + 1114*cda5da8dSAndroid Build Coastguard Worker 1.0) 1115*cda5da8dSAndroid Build Coastguard Worker x = num / den 1116*cda5da8dSAndroid Build Coastguard Worker return mu + (x * sigma) 1117*cda5da8dSAndroid Build Coastguard Worker r = p if q <= 0.0 else 1.0 - p 1118*cda5da8dSAndroid Build Coastguard Worker r = sqrt(-log(r)) 1119*cda5da8dSAndroid Build Coastguard Worker if r <= 5.0: 1120*cda5da8dSAndroid Build Coastguard Worker r = r - 1.6 1121*cda5da8dSAndroid Build Coastguard Worker # Hash sum: 49.33206_50330_16102_89036 1122*cda5da8dSAndroid Build Coastguard Worker num = (((((((7.74545_01427_83414_07640e-4 * r + 1123*cda5da8dSAndroid Build Coastguard Worker 2.27238_44989_26918_45833e-2) * r + 1124*cda5da8dSAndroid Build Coastguard Worker 2.41780_72517_74506_11770e-1) * r + 1125*cda5da8dSAndroid Build Coastguard Worker 1.27045_82524_52368_38258e+0) * r + 1126*cda5da8dSAndroid Build Coastguard Worker 3.64784_83247_63204_60504e+0) * r + 1127*cda5da8dSAndroid Build Coastguard Worker 5.76949_72214_60691_40550e+0) * r + 1128*cda5da8dSAndroid Build Coastguard Worker 4.63033_78461_56545_29590e+0) * r + 1129*cda5da8dSAndroid Build Coastguard Worker 1.42343_71107_49683_57734e+0) 1130*cda5da8dSAndroid Build Coastguard Worker den = (((((((1.05075_00716_44416_84324e-9 * r + 1131*cda5da8dSAndroid Build Coastguard Worker 5.47593_80849_95344_94600e-4) * r + 1132*cda5da8dSAndroid Build Coastguard Worker 1.51986_66563_61645_71966e-2) * r + 1133*cda5da8dSAndroid Build Coastguard Worker 1.48103_97642_74800_74590e-1) * r + 1134*cda5da8dSAndroid Build Coastguard Worker 6.89767_33498_51000_04550e-1) * r + 1135*cda5da8dSAndroid Build Coastguard Worker 1.67638_48301_83803_84940e+0) * r + 1136*cda5da8dSAndroid Build Coastguard Worker 2.05319_16266_37758_82187e+0) * r + 1137*cda5da8dSAndroid Build Coastguard Worker 1.0) 1138*cda5da8dSAndroid Build Coastguard Worker else: 1139*cda5da8dSAndroid Build Coastguard Worker r = r - 5.0 1140*cda5da8dSAndroid Build Coastguard Worker # Hash sum: 47.52583_31754_92896_71629 1141*cda5da8dSAndroid Build Coastguard Worker num = (((((((2.01033_43992_92288_13265e-7 * r + 1142*cda5da8dSAndroid Build Coastguard Worker 2.71155_55687_43487_57815e-5) * r + 1143*cda5da8dSAndroid Build Coastguard Worker 1.24266_09473_88078_43860e-3) * r + 1144*cda5da8dSAndroid Build Coastguard Worker 2.65321_89526_57612_30930e-2) * r + 1145*cda5da8dSAndroid Build Coastguard Worker 2.96560_57182_85048_91230e-1) * r + 1146*cda5da8dSAndroid Build Coastguard Worker 1.78482_65399_17291_33580e+0) * r + 1147*cda5da8dSAndroid Build Coastguard Worker 5.46378_49111_64114_36990e+0) * r + 1148*cda5da8dSAndroid Build Coastguard Worker 6.65790_46435_01103_77720e+0) 1149*cda5da8dSAndroid Build Coastguard Worker den = (((((((2.04426_31033_89939_78564e-15 * r + 1150*cda5da8dSAndroid Build Coastguard Worker 1.42151_17583_16445_88870e-7) * r + 1151*cda5da8dSAndroid Build Coastguard Worker 1.84631_83175_10054_68180e-5) * r + 1152*cda5da8dSAndroid Build Coastguard Worker 7.86869_13114_56132_59100e-4) * r + 1153*cda5da8dSAndroid Build Coastguard Worker 1.48753_61290_85061_48525e-2) * r + 1154*cda5da8dSAndroid Build Coastguard Worker 1.36929_88092_27358_05310e-1) * r + 1155*cda5da8dSAndroid Build Coastguard Worker 5.99832_20655_58879_37690e-1) * r + 1156*cda5da8dSAndroid Build Coastguard Worker 1.0) 1157*cda5da8dSAndroid Build Coastguard Worker x = num / den 1158*cda5da8dSAndroid Build Coastguard Worker if q < 0.0: 1159*cda5da8dSAndroid Build Coastguard Worker x = -x 1160*cda5da8dSAndroid Build Coastguard Worker return mu + (x * sigma) 1161*cda5da8dSAndroid Build Coastguard Worker 1162*cda5da8dSAndroid Build Coastguard Worker 1163*cda5da8dSAndroid Build Coastguard Worker# If available, use C implementation 1164*cda5da8dSAndroid Build Coastguard Workertry: 1165*cda5da8dSAndroid Build Coastguard Worker from _statistics import _normal_dist_inv_cdf 1166*cda5da8dSAndroid Build Coastguard Workerexcept ImportError: 1167*cda5da8dSAndroid Build Coastguard Worker pass 1168*cda5da8dSAndroid Build Coastguard Worker 1169*cda5da8dSAndroid Build Coastguard Worker 1170*cda5da8dSAndroid Build Coastguard Workerclass NormalDist: 1171*cda5da8dSAndroid Build Coastguard Worker "Normal distribution of a random variable" 1172*cda5da8dSAndroid Build Coastguard Worker # https://en.wikipedia.org/wiki/Normal_distribution 1173*cda5da8dSAndroid Build Coastguard Worker # https://en.wikipedia.org/wiki/Variance#Properties 1174*cda5da8dSAndroid Build Coastguard Worker 1175*cda5da8dSAndroid Build Coastguard Worker __slots__ = { 1176*cda5da8dSAndroid Build Coastguard Worker '_mu': 'Arithmetic mean of a normal distribution', 1177*cda5da8dSAndroid Build Coastguard Worker '_sigma': 'Standard deviation of a normal distribution', 1178*cda5da8dSAndroid Build Coastguard Worker } 1179*cda5da8dSAndroid Build Coastguard Worker 1180*cda5da8dSAndroid Build Coastguard Worker def __init__(self, mu=0.0, sigma=1.0): 1181*cda5da8dSAndroid Build Coastguard Worker "NormalDist where mu is the mean and sigma is the standard deviation." 1182*cda5da8dSAndroid Build Coastguard Worker if sigma < 0.0: 1183*cda5da8dSAndroid Build Coastguard Worker raise StatisticsError('sigma must be non-negative') 1184*cda5da8dSAndroid Build Coastguard Worker self._mu = float(mu) 1185*cda5da8dSAndroid Build Coastguard Worker self._sigma = float(sigma) 1186*cda5da8dSAndroid Build Coastguard Worker 1187*cda5da8dSAndroid Build Coastguard Worker @classmethod 1188*cda5da8dSAndroid Build Coastguard Worker def from_samples(cls, data): 1189*cda5da8dSAndroid Build Coastguard Worker "Make a normal distribution instance from sample data." 1190*cda5da8dSAndroid Build Coastguard Worker return cls(*_mean_stdev(data)) 1191*cda5da8dSAndroid Build Coastguard Worker 1192*cda5da8dSAndroid Build Coastguard Worker def samples(self, n, *, seed=None): 1193*cda5da8dSAndroid Build Coastguard Worker "Generate *n* samples for a given mean and standard deviation." 1194*cda5da8dSAndroid Build Coastguard Worker gauss = random.gauss if seed is None else random.Random(seed).gauss 1195*cda5da8dSAndroid Build Coastguard Worker mu, sigma = self._mu, self._sigma 1196*cda5da8dSAndroid Build Coastguard Worker return [gauss(mu, sigma) for i in range(n)] 1197*cda5da8dSAndroid Build Coastguard Worker 1198*cda5da8dSAndroid Build Coastguard Worker def pdf(self, x): 1199*cda5da8dSAndroid Build Coastguard Worker "Probability density function. P(x <= X < x+dx) / dx" 1200*cda5da8dSAndroid Build Coastguard Worker variance = self._sigma * self._sigma 1201*cda5da8dSAndroid Build Coastguard Worker if not variance: 1202*cda5da8dSAndroid Build Coastguard Worker raise StatisticsError('pdf() not defined when sigma is zero') 1203*cda5da8dSAndroid Build Coastguard Worker diff = x - self._mu 1204*cda5da8dSAndroid Build Coastguard Worker return exp(diff * diff / (-2.0 * variance)) / sqrt(tau * variance) 1205*cda5da8dSAndroid Build Coastguard Worker 1206*cda5da8dSAndroid Build Coastguard Worker def cdf(self, x): 1207*cda5da8dSAndroid Build Coastguard Worker "Cumulative distribution function. P(X <= x)" 1208*cda5da8dSAndroid Build Coastguard Worker if not self._sigma: 1209*cda5da8dSAndroid Build Coastguard Worker raise StatisticsError('cdf() not defined when sigma is zero') 1210*cda5da8dSAndroid Build Coastguard Worker return 0.5 * (1.0 + erf((x - self._mu) / (self._sigma * _SQRT2))) 1211*cda5da8dSAndroid Build Coastguard Worker 1212*cda5da8dSAndroid Build Coastguard Worker def inv_cdf(self, p): 1213*cda5da8dSAndroid Build Coastguard Worker """Inverse cumulative distribution function. x : P(X <= x) = p 1214*cda5da8dSAndroid Build Coastguard Worker 1215*cda5da8dSAndroid Build Coastguard Worker Finds the value of the random variable such that the probability of 1216*cda5da8dSAndroid Build Coastguard Worker the variable being less than or equal to that value equals the given 1217*cda5da8dSAndroid Build Coastguard Worker probability. 1218*cda5da8dSAndroid Build Coastguard Worker 1219*cda5da8dSAndroid Build Coastguard Worker This function is also called the percent point function or quantile 1220*cda5da8dSAndroid Build Coastguard Worker function. 1221*cda5da8dSAndroid Build Coastguard Worker """ 1222*cda5da8dSAndroid Build Coastguard Worker if p <= 0.0 or p >= 1.0: 1223*cda5da8dSAndroid Build Coastguard Worker raise StatisticsError('p must be in the range 0.0 < p < 1.0') 1224*cda5da8dSAndroid Build Coastguard Worker if self._sigma <= 0.0: 1225*cda5da8dSAndroid Build Coastguard Worker raise StatisticsError('cdf() not defined when sigma at or below zero') 1226*cda5da8dSAndroid Build Coastguard Worker return _normal_dist_inv_cdf(p, self._mu, self._sigma) 1227*cda5da8dSAndroid Build Coastguard Worker 1228*cda5da8dSAndroid Build Coastguard Worker def quantiles(self, n=4): 1229*cda5da8dSAndroid Build Coastguard Worker """Divide into *n* continuous intervals with equal probability. 1230*cda5da8dSAndroid Build Coastguard Worker 1231*cda5da8dSAndroid Build Coastguard Worker Returns a list of (n - 1) cut points separating the intervals. 1232*cda5da8dSAndroid Build Coastguard Worker 1233*cda5da8dSAndroid Build Coastguard Worker Set *n* to 4 for quartiles (the default). Set *n* to 10 for deciles. 1234*cda5da8dSAndroid Build Coastguard Worker Set *n* to 100 for percentiles which gives the 99 cuts points that 1235*cda5da8dSAndroid Build Coastguard Worker separate the normal distribution in to 100 equal sized groups. 1236*cda5da8dSAndroid Build Coastguard Worker """ 1237*cda5da8dSAndroid Build Coastguard Worker return [self.inv_cdf(i / n) for i in range(1, n)] 1238*cda5da8dSAndroid Build Coastguard Worker 1239*cda5da8dSAndroid Build Coastguard Worker def overlap(self, other): 1240*cda5da8dSAndroid Build Coastguard Worker """Compute the overlapping coefficient (OVL) between two normal distributions. 1241*cda5da8dSAndroid Build Coastguard Worker 1242*cda5da8dSAndroid Build Coastguard Worker Measures the agreement between two normal probability distributions. 1243*cda5da8dSAndroid Build Coastguard Worker Returns a value between 0.0 and 1.0 giving the overlapping area in 1244*cda5da8dSAndroid Build Coastguard Worker the two underlying probability density functions. 1245*cda5da8dSAndroid Build Coastguard Worker 1246*cda5da8dSAndroid Build Coastguard Worker >>> N1 = NormalDist(2.4, 1.6) 1247*cda5da8dSAndroid Build Coastguard Worker >>> N2 = NormalDist(3.2, 2.0) 1248*cda5da8dSAndroid Build Coastguard Worker >>> N1.overlap(N2) 1249*cda5da8dSAndroid Build Coastguard Worker 0.8035050657330205 1250*cda5da8dSAndroid Build Coastguard Worker """ 1251*cda5da8dSAndroid Build Coastguard Worker # See: "The overlapping coefficient as a measure of agreement between 1252*cda5da8dSAndroid Build Coastguard Worker # probability distributions and point estimation of the overlap of two 1253*cda5da8dSAndroid Build Coastguard Worker # normal densities" -- Henry F. Inman and Edwin L. Bradley Jr 1254*cda5da8dSAndroid Build Coastguard Worker # http://dx.doi.org/10.1080/03610928908830127 1255*cda5da8dSAndroid Build Coastguard Worker if not isinstance(other, NormalDist): 1256*cda5da8dSAndroid Build Coastguard Worker raise TypeError('Expected another NormalDist instance') 1257*cda5da8dSAndroid Build Coastguard Worker X, Y = self, other 1258*cda5da8dSAndroid Build Coastguard Worker if (Y._sigma, Y._mu) < (X._sigma, X._mu): # sort to assure commutativity 1259*cda5da8dSAndroid Build Coastguard Worker X, Y = Y, X 1260*cda5da8dSAndroid Build Coastguard Worker X_var, Y_var = X.variance, Y.variance 1261*cda5da8dSAndroid Build Coastguard Worker if not X_var or not Y_var: 1262*cda5da8dSAndroid Build Coastguard Worker raise StatisticsError('overlap() not defined when sigma is zero') 1263*cda5da8dSAndroid Build Coastguard Worker dv = Y_var - X_var 1264*cda5da8dSAndroid Build Coastguard Worker dm = fabs(Y._mu - X._mu) 1265*cda5da8dSAndroid Build Coastguard Worker if not dv: 1266*cda5da8dSAndroid Build Coastguard Worker return 1.0 - erf(dm / (2.0 * X._sigma * _SQRT2)) 1267*cda5da8dSAndroid Build Coastguard Worker a = X._mu * Y_var - Y._mu * X_var 1268*cda5da8dSAndroid Build Coastguard Worker b = X._sigma * Y._sigma * sqrt(dm * dm + dv * log(Y_var / X_var)) 1269*cda5da8dSAndroid Build Coastguard Worker x1 = (a + b) / dv 1270*cda5da8dSAndroid Build Coastguard Worker x2 = (a - b) / dv 1271*cda5da8dSAndroid Build Coastguard Worker return 1.0 - (fabs(Y.cdf(x1) - X.cdf(x1)) + fabs(Y.cdf(x2) - X.cdf(x2))) 1272*cda5da8dSAndroid Build Coastguard Worker 1273*cda5da8dSAndroid Build Coastguard Worker def zscore(self, x): 1274*cda5da8dSAndroid Build Coastguard Worker """Compute the Standard Score. (x - mean) / stdev 1275*cda5da8dSAndroid Build Coastguard Worker 1276*cda5da8dSAndroid Build Coastguard Worker Describes *x* in terms of the number of standard deviations 1277*cda5da8dSAndroid Build Coastguard Worker above or below the mean of the normal distribution. 1278*cda5da8dSAndroid Build Coastguard Worker """ 1279*cda5da8dSAndroid Build Coastguard Worker # https://www.statisticshowto.com/probability-and-statistics/z-score/ 1280*cda5da8dSAndroid Build Coastguard Worker if not self._sigma: 1281*cda5da8dSAndroid Build Coastguard Worker raise StatisticsError('zscore() not defined when sigma is zero') 1282*cda5da8dSAndroid Build Coastguard Worker return (x - self._mu) / self._sigma 1283*cda5da8dSAndroid Build Coastguard Worker 1284*cda5da8dSAndroid Build Coastguard Worker @property 1285*cda5da8dSAndroid Build Coastguard Worker def mean(self): 1286*cda5da8dSAndroid Build Coastguard Worker "Arithmetic mean of the normal distribution." 1287*cda5da8dSAndroid Build Coastguard Worker return self._mu 1288*cda5da8dSAndroid Build Coastguard Worker 1289*cda5da8dSAndroid Build Coastguard Worker @property 1290*cda5da8dSAndroid Build Coastguard Worker def median(self): 1291*cda5da8dSAndroid Build Coastguard Worker "Return the median of the normal distribution" 1292*cda5da8dSAndroid Build Coastguard Worker return self._mu 1293*cda5da8dSAndroid Build Coastguard Worker 1294*cda5da8dSAndroid Build Coastguard Worker @property 1295*cda5da8dSAndroid Build Coastguard Worker def mode(self): 1296*cda5da8dSAndroid Build Coastguard Worker """Return the mode of the normal distribution 1297*cda5da8dSAndroid Build Coastguard Worker 1298*cda5da8dSAndroid Build Coastguard Worker The mode is the value x where which the probability density 1299*cda5da8dSAndroid Build Coastguard Worker function (pdf) takes its maximum value. 1300*cda5da8dSAndroid Build Coastguard Worker """ 1301*cda5da8dSAndroid Build Coastguard Worker return self._mu 1302*cda5da8dSAndroid Build Coastguard Worker 1303*cda5da8dSAndroid Build Coastguard Worker @property 1304*cda5da8dSAndroid Build Coastguard Worker def stdev(self): 1305*cda5da8dSAndroid Build Coastguard Worker "Standard deviation of the normal distribution." 1306*cda5da8dSAndroid Build Coastguard Worker return self._sigma 1307*cda5da8dSAndroid Build Coastguard Worker 1308*cda5da8dSAndroid Build Coastguard Worker @property 1309*cda5da8dSAndroid Build Coastguard Worker def variance(self): 1310*cda5da8dSAndroid Build Coastguard Worker "Square of the standard deviation." 1311*cda5da8dSAndroid Build Coastguard Worker return self._sigma * self._sigma 1312*cda5da8dSAndroid Build Coastguard Worker 1313*cda5da8dSAndroid Build Coastguard Worker def __add__(x1, x2): 1314*cda5da8dSAndroid Build Coastguard Worker """Add a constant or another NormalDist instance. 1315*cda5da8dSAndroid Build Coastguard Worker 1316*cda5da8dSAndroid Build Coastguard Worker If *other* is a constant, translate mu by the constant, 1317*cda5da8dSAndroid Build Coastguard Worker leaving sigma unchanged. 1318*cda5da8dSAndroid Build Coastguard Worker 1319*cda5da8dSAndroid Build Coastguard Worker If *other* is a NormalDist, add both the means and the variances. 1320*cda5da8dSAndroid Build Coastguard Worker Mathematically, this works only if the two distributions are 1321*cda5da8dSAndroid Build Coastguard Worker independent or if they are jointly normally distributed. 1322*cda5da8dSAndroid Build Coastguard Worker """ 1323*cda5da8dSAndroid Build Coastguard Worker if isinstance(x2, NormalDist): 1324*cda5da8dSAndroid Build Coastguard Worker return NormalDist(x1._mu + x2._mu, hypot(x1._sigma, x2._sigma)) 1325*cda5da8dSAndroid Build Coastguard Worker return NormalDist(x1._mu + x2, x1._sigma) 1326*cda5da8dSAndroid Build Coastguard Worker 1327*cda5da8dSAndroid Build Coastguard Worker def __sub__(x1, x2): 1328*cda5da8dSAndroid Build Coastguard Worker """Subtract a constant or another NormalDist instance. 1329*cda5da8dSAndroid Build Coastguard Worker 1330*cda5da8dSAndroid Build Coastguard Worker If *other* is a constant, translate by the constant mu, 1331*cda5da8dSAndroid Build Coastguard Worker leaving sigma unchanged. 1332*cda5da8dSAndroid Build Coastguard Worker 1333*cda5da8dSAndroid Build Coastguard Worker If *other* is a NormalDist, subtract the means and add the variances. 1334*cda5da8dSAndroid Build Coastguard Worker Mathematically, this works only if the two distributions are 1335*cda5da8dSAndroid Build Coastguard Worker independent or if they are jointly normally distributed. 1336*cda5da8dSAndroid Build Coastguard Worker """ 1337*cda5da8dSAndroid Build Coastguard Worker if isinstance(x2, NormalDist): 1338*cda5da8dSAndroid Build Coastguard Worker return NormalDist(x1._mu - x2._mu, hypot(x1._sigma, x2._sigma)) 1339*cda5da8dSAndroid Build Coastguard Worker return NormalDist(x1._mu - x2, x1._sigma) 1340*cda5da8dSAndroid Build Coastguard Worker 1341*cda5da8dSAndroid Build Coastguard Worker def __mul__(x1, x2): 1342*cda5da8dSAndroid Build Coastguard Worker """Multiply both mu and sigma by a constant. 1343*cda5da8dSAndroid Build Coastguard Worker 1344*cda5da8dSAndroid Build Coastguard Worker Used for rescaling, perhaps to change measurement units. 1345*cda5da8dSAndroid Build Coastguard Worker Sigma is scaled with the absolute value of the constant. 1346*cda5da8dSAndroid Build Coastguard Worker """ 1347*cda5da8dSAndroid Build Coastguard Worker return NormalDist(x1._mu * x2, x1._sigma * fabs(x2)) 1348*cda5da8dSAndroid Build Coastguard Worker 1349*cda5da8dSAndroid Build Coastguard Worker def __truediv__(x1, x2): 1350*cda5da8dSAndroid Build Coastguard Worker """Divide both mu and sigma by a constant. 1351*cda5da8dSAndroid Build Coastguard Worker 1352*cda5da8dSAndroid Build Coastguard Worker Used for rescaling, perhaps to change measurement units. 1353*cda5da8dSAndroid Build Coastguard Worker Sigma is scaled with the absolute value of the constant. 1354*cda5da8dSAndroid Build Coastguard Worker """ 1355*cda5da8dSAndroid Build Coastguard Worker return NormalDist(x1._mu / x2, x1._sigma / fabs(x2)) 1356*cda5da8dSAndroid Build Coastguard Worker 1357*cda5da8dSAndroid Build Coastguard Worker def __pos__(x1): 1358*cda5da8dSAndroid Build Coastguard Worker "Return a copy of the instance." 1359*cda5da8dSAndroid Build Coastguard Worker return NormalDist(x1._mu, x1._sigma) 1360*cda5da8dSAndroid Build Coastguard Worker 1361*cda5da8dSAndroid Build Coastguard Worker def __neg__(x1): 1362*cda5da8dSAndroid Build Coastguard Worker "Negates mu while keeping sigma the same." 1363*cda5da8dSAndroid Build Coastguard Worker return NormalDist(-x1._mu, x1._sigma) 1364*cda5da8dSAndroid Build Coastguard Worker 1365*cda5da8dSAndroid Build Coastguard Worker __radd__ = __add__ 1366*cda5da8dSAndroid Build Coastguard Worker 1367*cda5da8dSAndroid Build Coastguard Worker def __rsub__(x1, x2): 1368*cda5da8dSAndroid Build Coastguard Worker "Subtract a NormalDist from a constant or another NormalDist." 1369*cda5da8dSAndroid Build Coastguard Worker return -(x1 - x2) 1370*cda5da8dSAndroid Build Coastguard Worker 1371*cda5da8dSAndroid Build Coastguard Worker __rmul__ = __mul__ 1372*cda5da8dSAndroid Build Coastguard Worker 1373*cda5da8dSAndroid Build Coastguard Worker def __eq__(x1, x2): 1374*cda5da8dSAndroid Build Coastguard Worker "Two NormalDist objects are equal if their mu and sigma are both equal." 1375*cda5da8dSAndroid Build Coastguard Worker if not isinstance(x2, NormalDist): 1376*cda5da8dSAndroid Build Coastguard Worker return NotImplemented 1377*cda5da8dSAndroid Build Coastguard Worker return x1._mu == x2._mu and x1._sigma == x2._sigma 1378*cda5da8dSAndroid Build Coastguard Worker 1379*cda5da8dSAndroid Build Coastguard Worker def __hash__(self): 1380*cda5da8dSAndroid Build Coastguard Worker "NormalDist objects hash equal if their mu and sigma are both equal." 1381*cda5da8dSAndroid Build Coastguard Worker return hash((self._mu, self._sigma)) 1382*cda5da8dSAndroid Build Coastguard Worker 1383*cda5da8dSAndroid Build Coastguard Worker def __repr__(self): 1384*cda5da8dSAndroid Build Coastguard Worker return f'{type(self).__name__}(mu={self._mu!r}, sigma={self._sigma!r})' 1385*cda5da8dSAndroid Build Coastguard Worker 1386*cda5da8dSAndroid Build Coastguard Worker def __getstate__(self): 1387*cda5da8dSAndroid Build Coastguard Worker return self._mu, self._sigma 1388*cda5da8dSAndroid Build Coastguard Worker 1389*cda5da8dSAndroid Build Coastguard Worker def __setstate__(self, state): 1390*cda5da8dSAndroid Build Coastguard Worker self._mu, self._sigma = state 1391