
Module stats


Created on Tue Jul 1 16:30:38 2014

Copyright (c) 2013-2014, CEA/DSV/I2BM/Neurospin. All rights reserved.


Author: Tommy Löfstedt

License: BSD 3-clause.

Functions

multivariate_normal(mu, Sigma, n=1)
    Generates n random vectors from the multivariate normal distribution with mean mu and covariance matrix Sigma.

sensitivity(cond, test)
    A test's ability to identify a condition correctly.

specificity(cond, test)
    A test's ability to exclude a condition correctly.

ppv(cond, test)
    A test's ability to correctly identify positive outcomes.

precision(*args, **kwargs)
    Deprecated; use ppv instead.

npv(cond, test)
    A test's ability to correctly identify negative outcomes.

accuracy(cond, test)
    The degree of correctly estimated outcomes.

F_score(cond, test)
    A measure of a test's accuracy: the harmonic mean of the precision and sensitivity.

alpha(cond, test)
    False positive rate or type I error.

beta(cond, test)
    False negative rate or type II error.

power(cond, test)
    Statistical power for a test.

likelihood_ratio_positive(cond, test)
    Assesses the value of performing a diagnostic test for the positive outcome.

likelihood_ratio_negative(cond, test)
    Assesses the value of performing a diagnostic test for the negative outcome.

fleiss_kappa(W, k)
    Computes Fleiss' kappa for a set of variables classified into k categories by a number of different raters.
Variables
  __package__ = 'parsimony.utils'
Function Details

multivariate_normal(mu, Sigma, n=1)

Generates n random vectors from the multivariate normal distribution
with mean mu and covariance matrix Sigma.

This function is faster (roughly 11 times faster for a 600-by-4000 matrix
on my computer) than numpy.random.multivariate_normal. This method differs
from numpy's function in that it uses the Cholesky factorisation. Note
that this requires the covariance matrix to be positive definite, as
opposed to positive semi-definite in the numpy case.

See details at: https://en.wikipedia.org/wiki/Multivariate_normal_distribution#Drawing_values_from_the_distribution
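The Cholesky approach described above can be sketched in a few lines of
plain NumPy. This is an illustrative sketch, not the library's exact
implementation; the helper name mvn_cholesky is made up here.

```python
import numpy as np

def mvn_cholesky(mu, Sigma, n=1):
    """Draw n samples from N(mu, Sigma) via the Cholesky factorisation.

    Illustrative sketch only; requires Sigma to be positive definite.
    """
    p = Sigma.shape[0]
    L = np.linalg.cholesky(Sigma)      # Sigma = L L^T; fails if not positive definite
    Z = np.random.randn(n, p)          # n rows of independent standard normals
    return mu.reshape(1, p) + Z @ L.T  # each row is mu + L z, a sample from N(mu, Sigma)
```

Each row of the result is one sample, matching the (n, p) layout used in
the example below.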

Parameters
----------
mu : Numpy array, shape (p, 1). The mean vector.

Sigma : Numpy array, shape (p, p). The covariance matrix.

n : Integer. The number of multivariate normal vectors to generate.

Example
-------
>>> import parsimony.utils.stats as stats
>>> import numpy as np
>>> np.random.seed(42)
>>> n, p = 50000, 100
>>> mu = np.random.rand(p, 1)
>>> alpha = 0.01
>>> Sigma = alpha * np.random.rand(p, p) + (1 - alpha) * np.eye(p, p)
>>> M = stats.multivariate_normal(mu, Sigma, n)
>>> mean = np.mean(M, axis=0)
>>> S = np.dot((M - mean).T, (M - mean)) * (1.0 / float(n - 1))
>>> round(np.linalg.norm(Sigma - S), 14)
0.51886218849785

sensitivity(cond, test)

A test's ability to identify a condition correctly.

Also called true positive rate or recall.

Parameters
----------
cond : Numpy array, boolean or 0/1 integer. The "true", known condition.

test : Numpy array, boolean or 0/1 integer. The estimated outcome.

Example
-------
>>> import parsimony.utils.stats as stats
>>> import numpy as np
>>> np.random.seed(42)
>>> p = 2030
>>> cond = np.zeros((p, 1))
>>> test = np.zeros((p, 1))
>>> stats.sensitivity(cond, test)
1.0
>>> stats.sensitivity(cond, np.logical_not(test))
1.0
>>> cond[:30] = 1.0
>>> test[:30] = 1.0
>>> test[:10] = 0.0
>>> test[-180:] = 1.0
>>> round(stats.sensitivity(cond, test), 2)
0.67
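In confusion-matrix terms, sensitivity is TP / (TP + FN). A plain-NumPy
sketch follows; the helper name and the convention of returning 1.0 when
cond contains no positives are assumptions, chosen to match the doctest
above.

```python
import numpy as np

def sensitivity_sketch(cond, test):
    """True positive rate: TP / (TP + FN)."""
    cond = np.asarray(cond, dtype=bool)
    test = np.asarray(test, dtype=bool)
    TP = np.sum(cond & test)   # condition present and detected
    FN = np.sum(cond & ~test)  # condition present but missed
    if TP + FN == 0:
        return 1.0  # no positives in cond: recall is vacuously perfect
    return float(TP) / float(TP + FN)
```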

specificity(cond, test)

A test's ability to exclude a condition correctly.

Also called true negative rate.

Parameters
----------
cond : Numpy array, boolean or 0/1 integer. The "true", known condition.

test : Numpy array, boolean or 0/1 integer. The estimated outcome.

Example
-------
>>> import parsimony.utils.stats as stats
>>> import numpy as np
>>> np.random.seed(42)
>>> p = 2030
>>> cond = np.zeros((p, 1))
>>> test = np.zeros((p, 1))
>>> stats.specificity(cond, test)
1.0
>>> stats.specificity(cond, np.logical_not(test))
0.0
>>> cond[:30] = 1.0
>>> test[:30] = 1.0
>>> test[:10] = 0.0
>>> test[-180:] = 1.0
>>> round(stats.specificity(cond, test), 2)
0.91

ppv(cond, test)

A test's ability to correctly identify positive outcomes.

Short for positive predictive value. Also called precision.

Parameters
----------
cond : Numpy array, boolean or 0/1 integer. The "true", known condition.

test : Numpy array, boolean or 0/1 integer. The estimated outcome.

Example
-------
>>> import parsimony.utils.stats as stats
>>> import numpy as np
>>> np.random.seed(42)
>>> p = 2030
>>> cond = np.zeros((p, 1))
>>> test = np.zeros((p, 1))
>>> stats.ppv(cond, test)
1.0
>>> stats.ppv(cond, np.logical_not(test))
0.0
>>> cond[:30] = 1.0
>>> test[:30] = 1.0
>>> test[:10] = 0.0
>>> test[-180:] = 1.0
>>> round(stats.ppv(cond, test), 2)
0.1

precision(*args, **kwargs)

Decorators:
  • @deprecated("ppv")
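The @deprecated decorator itself is not documented on this page. A
minimal sketch of how such a decorator typically works is shown below;
this is a hypothetical implementation, not the library's own, and the
precision body here is a stand-in.

```python
import functools
import warnings

def deprecated(replacement):
    """Mark a function as deprecated in favour of `replacement`."""
    def decorator(f):
        @functools.wraps(f)
        def wrapper(*args, **kwargs):
            warnings.warn("%s is deprecated; use %s instead."
                          % (f.__name__, replacement),
                          DeprecationWarning, stacklevel=2)
            return f(*args, **kwargs)
        return wrapper
    return decorator

@deprecated("ppv")
def precision(*args, **kwargs):
    # In the real module this delegates to ppv.
    return "delegated"
```

Calling the decorated function emits a DeprecationWarning and then runs
the wrapped function unchanged.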

npv(cond, test)

A test's ability to correctly identify negative outcomes.

The negative predictive value, NPV.

Parameters
----------
cond : Numpy array, boolean or 0/1 integer. The "true", known condition.

test : Numpy array, boolean or 0/1 integer. The estimated outcome.

Example
-------
>>> import parsimony.utils.stats as stats
>>> import numpy as np
>>> np.random.seed(42)
>>> p = 2030
>>> cond = np.zeros((p, 1))
>>> test = np.zeros((p, 1))
>>> stats.npv(cond, test)
1.0
>>> stats.npv(cond, np.logical_not(test))
1.0
>>> cond[:30] = 1.0
>>> test[:30] = 1.0
>>> test[:10] = 0.0
>>> test[-180:] = 1.0
>>> round(stats.npv(cond, test), 3)
0.995

accuracy(cond, test)

The degree of correctly estimated outcomes.

Parameters
----------
cond : Numpy array, boolean or 0/1 integer. The "true", known condition.

test : Numpy array, boolean or 0/1 integer. The estimated outcome.

Example
-------
>>> import parsimony.utils.stats as stats
>>> import numpy as np
>>> np.random.seed(42)
>>> p = 2030
>>> cond = np.zeros((p, 1))
>>> test = np.zeros((p, 1))
>>> stats.accuracy(cond, test)
1.0
>>> stats.accuracy(cond, np.logical_not(test))
0.0
>>> cond[:30] = 1.0
>>> test[:30] = 1.0
>>> test[:10] = 0.0
>>> test[-180:] = 1.0
>>> round(stats.accuracy(cond, test), 2)
0.91

F_score(cond, test)

A measure of a test's accuracy: the harmonic mean of the precision and
sensitivity.

Also known as the F1 score.

Parameters
----------
cond : Numpy array, boolean or 0/1 integer. The "true", known condition.

test : Numpy array, boolean or 0/1 integer. The estimated outcome.

Example
-------
>>> import parsimony.utils.stats as stats
>>> import numpy as np
>>> np.random.seed(42)
>>> p = 2030
>>> cond = np.zeros((p, 1))
>>> test = np.zeros((p, 1))
>>> stats.F_score(cond, test)
1.0
>>> stats.F_score(cond, np.logical_not(test))
0.0
>>> cond[:30] = 1.0
>>> test[:30] = 1.0
>>> test[:10] = 0.0
>>> test[-180:] = 1.0
>>> round(stats.F_score(cond, test), 2)
0.17

alpha(cond, test)

False positive rate or type I error.

Parameters
----------
cond : Numpy array, boolean or 0/1 integer. The "true", known condition.

test : Numpy array, boolean or 0/1 integer. The estimated outcome.

Example
-------
>>> import parsimony.utils.stats as stats
>>> import numpy as np
>>> np.random.seed(42)
>>> p = 2030
>>> cond = np.zeros((p, 1))
>>> test = np.zeros((p, 1))
>>> stats.alpha(cond, test)
0.0
>>> stats.alpha(cond, np.logical_not(test))
1.0
>>> cond[:30] = 1.0
>>> test[:30] = 1.0
>>> test[:10] = 0.0
>>> test[-180:] = 1.0
>>> round(stats.alpha(cond, test), 2)
0.09

beta(cond, test)

False negative rate or type II error.

Parameters
----------
cond : Numpy array, boolean or 0/1 integer. The "true", known condition.

test : Numpy array, boolean or 0/1 integer. The estimated outcome.

Example
-------
>>> import parsimony.utils.stats as stats
>>> import numpy as np
>>> np.random.seed(42)
>>> p = 2030
>>> cond = np.zeros((p, 1))
>>> test = np.zeros((p, 1))
>>> stats.beta(cond, test)
0.0
>>> stats.beta(cond, np.logical_not(test))
0.0
>>> cond[:30] = 1.0
>>> test[:30] = 1.0
>>> test[:10] = 0.0
>>> test[-180:] = 1.0
>>> round(stats.beta(cond, test), 2)
0.33

power(cond, test)

Statistical power for a test. The probability that it correctly rejects
the null hypothesis when the null hypothesis is false.

Parameters
----------
cond : Numpy array, boolean or 0/1 integer. The "true", known condition.

test : Numpy array, boolean or 0/1 integer. The estimated outcome.

Example
-------
>>> import parsimony.utils.stats as stats
>>> import numpy as np
>>> np.random.seed(42)
>>> p = 2030
>>> cond = np.zeros((p, 1))
>>> test = np.zeros((p, 1))
>>> stats.power(cond, test)
1.0
>>> stats.power(cond, np.logical_not(test))
1.0
>>> cond[:30] = 1.0
>>> test[:30] = 1.0
>>> test[:10] = 0.0
>>> test[-180:] = 1.0
>>> round(stats.power(cond, test), 2)
0.67

likelihood_ratio_positive(cond, test)

Assesses the value of performing a diagnostic test for the positive
outcome.

Parameters
----------
cond : Numpy array, boolean or 0/1 integer. The "true", known condition.

test : Numpy array, boolean or 0/1 integer. The estimated outcome.

Example
-------
>>> import parsimony.utils.stats as stats
>>> import numpy as np
>>> np.random.seed(42)
>>> p = 2030
>>> cond = np.zeros((p, 1))
>>> test = np.zeros((p, 1))
>>> stats.likelihood_ratio_positive(cond, test)
inf
>>> stats.likelihood_ratio_positive(cond, np.logical_not(test))
1.0
>>> cond[:30] = 1.0
>>> test[:30] = 1.0
>>> test[:10] = 0.0
>>> test[-180:] = 1.0
>>> round(stats.likelihood_ratio_positive(cond, test), 1)
7.4
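The positive likelihood ratio is sensitivity / (1 - specificity), i.e.
the true positive rate divided by the false positive rate. The sketch
below reproduces the values in the example above; the helper name and
the edge-case conventions are assumptions.

```python
import numpy as np

def lr_positive_sketch(cond, test):
    """Positive likelihood ratio: sensitivity / (1 - specificity)."""
    cond = np.asarray(cond, dtype=bool)
    test = np.asarray(test, dtype=bool)
    TP = np.sum(cond & test)
    FN = np.sum(cond & ~test)
    FP = np.sum(~cond & test)
    TN = np.sum(~cond & ~test)
    sens = float(TP) / float(TP + FN) if TP + FN > 0 else 1.0
    fpr = float(FP) / float(FP + TN) if FP + TN > 0 else 0.0
    return sens / fpr if fpr > 0.0 else float("inf")
```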

likelihood_ratio_negative(cond, test)

Assesses the value of performing a diagnostic test for the negative
outcome.

Parameters
----------
cond : Numpy array, boolean or 0/1 integer. The "true", known condition.

test : Numpy array, boolean or 0/1 integer. The estimated outcome.

Example
-------
>>> import parsimony.utils.stats as stats
>>> import numpy as np
>>> np.random.seed(42)
>>> p = 2030
>>> cond = np.zeros((p, 1))
>>> test = np.zeros((p, 1))
>>> stats.likelihood_ratio_negative(cond, test)
0.0
>>> stats.likelihood_ratio_negative(cond, np.logical_not(test))
inf
>>> cond[:30] = 1.0
>>> test[:30] = 1.0
>>> test[:10] = 0.0
>>> test[-180:] = 1.0
>>> round(stats.likelihood_ratio_negative(cond, test), 2)
0.37

fleiss_kappa(W, k)

Computes Fleiss' kappa for a set of variables classified into k
categories by a number of different raters.

Parameters
----------
W : Numpy array, shape (variables, raters). Each entry is a category
    label in 0, ..., k - 1.

k : Integer. The number of categories.
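For illustration, Fleiss' kappa can be computed from the standard
formula as follows. This is an independent sketch built from the
textbook definition, not necessarily identical to this module's
implementation.

```python
import numpy as np

def fleiss_kappa_sketch(W, k):
    """Fleiss' kappa from the standard formula.

    W : (N, R) integer array; W[i, j] is the category (0, ..., k - 1)
        that rater j assigned to variable i.
    """
    W = np.asarray(W, dtype=int)
    N, R = W.shape
    # counts[i, c]: number of raters assigning variable i to category c
    counts = np.zeros((N, k))
    for c in range(k):
        counts[:, c] = np.sum(W == c, axis=1)
    p_c = np.sum(counts, axis=0) / float(N * R)   # overall category proportions
    P_i = (np.sum(counts * (counts - 1), axis=1)
           / float(R * (R - 1)))                  # per-variable agreement
    P_bar = np.mean(P_i)                          # mean observed agreement
    P_e = np.sum(p_c ** 2)                        # agreement expected by chance
    return (P_bar - P_e) / (1.0 - P_e)
```

Perfect agreement among raters yields kappa = 1; agreement at chance
level yields 0.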