load(n_samples=100,
shape=(30, 30, 1),
r2=0.75,
sigma_spatial_smoothing=1,
obj_pix_ratio=2.0,
model='independant',
random_seed=None)
Generates regression samples (images + target variable) and beta.
The covariance structure is controlled at both the pixel
level (spatial smoothing) and the object level. Objects are groups
of pixels sharing a covariance that stems from a latent variable.
beta is non-null within objects (the default is five dots).
Then y is obtained as y = X * beta + noise, where beta is scaled such
that r_square(y, X * beta) = r2 and noise is sampled from N(0, 1).
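The scaling of beta can be sketched as follows. This is a minimal illustration, not the library's implementation: it assumes r_square is the squared Pearson correlation and picks the factor from the population relation var(signal) / (var(signal) + var(noise)) = r2. The helper names pearson_r2 and scale_factor are hypothetical.

```python
import math
import random


def pearson_r2(a, b):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov * cov / (va * vb)


def scale_factor(signal, r2, noise_var=1.0):
    """Factor s such that var(s * signal) / (var(s * signal) + noise_var) == r2."""
    n = len(signal)
    mean = sum(signal) / n
    var = sum((v - mean) ** 2 for v in signal) / n
    return math.sqrt(r2 / (1.0 - r2) * noise_var / var)


rng = random.Random(0)
n_samples, n_features, r2 = 5000, 5, 0.75

# Toy design matrix: one column per "dot" (assumption: no spatial structure).
X = [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(n_samples)]
beta = [1.0, 1.0, 1.0, -1.0, -1.0]

# Unscaled signal X * beta, then the factor that yields the requested r2.
signal = [sum(x * b for x, b in zip(row, beta)) for row in X]
s = scale_factor(signal, r2)
beta_scaled = [s * b for b in beta]

# y = X * beta_scaled + noise, with noise ~ N(0, 1).
noise = [rng.gauss(0, 1) for _ in range(n_samples)]
y = [s * v + e for v, e in zip(signal, noise)]

# Empirical R-squared; close to r2 but not exactly equal on a finite sample.
achieved = pearson_r2(y, [s * v for v in signal])
```

With a finite sample the achieved value only approximates r2; an exact match would need the factor to be solved on the sampled noise itself.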
Parameters
----------
n_samples: Integer. Number of samples. Default is 100.
shape: Tuple or list of sample shape. Default is (30, 30, 1).
r2: Float. The desired R-squared (explained variance), i.e.
r_square(y, X * beta) = r2 (default is .75).
sigma_spatial_smoothing: Float. Standard deviation of the Gaussian kernel
(default is 1). High values promote spatial correlation between pixels.
model: String or dict (default is "independant").
If model is "independant":
# Each point has an independent latent
l1=1., l2=1., l3=1., l4=1., l5=1.,
# No shared variance
l12=0., l45=0., l12345=0.,
# Five dots contribute equally
b1=1., b2=1., b3=1., b4=-1., b5=-1.
If model is a dictionary:
update (overwrite) the "independant" model with the dictionary's parameters.
Example: set the betas of points 4 and 5 to 1:
dict(b4=1., b5=1.)
If model is "redundant":
# Point-level signal in dots 1 and 2 stems from a shared latent
l1=0., l2=0., l12=1.,
# l3 is independent
l3=1.,
# Point-level signal in dots 4 and 5 stems from a shared latent
l4=0., l5=0., l45=1.,
# No global shared variance
l12345=0.,
# Five dots contribute equally
b1=1., b2=1., b3=1., b4=-1., b5=-1.
If model is "suppressor":
# Point-level signal in dot 2 stems only from the shared latent
l1=1., l2=0., l12=1.,
# l3 is independent
l3=1.,
# Point-level signal in dot 5 stems only from the shared latent
l4=1., l5=0., l45=1.,
# No global shared variance
l12345=0.,
# Dot 2 suppresses shared signal with dot 1, dot 5 suppresses dot 4
b1=1., b2=-1., b3=1., b4=1., b5=-1.
y = X1 - X2 + X3 + X4 - X5 + noise
y = l1 + l12 - l12 + l3 + l4 + l45 - l45 + noise
y = l1 + l3 + l4 + noise
So pixels of X2 and X5 are not correlated with the target y, and they
will not be detected by univariate analysis. However, they
are useful since they suppress the unwanted variance that stems
from latents l12 and l45.
obj_pix_ratio: Float. Controls the ratio between object-level signal
and pixel-level signal for pixels within objects. If
obj_pix_ratio == 1 then 100% of the signal of pixels within
the same object is shared (i.e. there is no pixel-level signal). If
obj_pix_ratio == 0 then all the signal is pixel-specific.
A high obj_pix_ratio promotes spatial correlation between
pixels of the same object.
random_seed: None or integer. See numpy.random.seed(). If not None, it can
be used to obtain reproducible samples.
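The "suppressor" model described under the model parameter can be illustrated with a small simulation. This is a simplified sketch under stated assumptions: each dot is reduced to a single value, and the pixel-level noise and spatial smoothing are ignored. The helper corr is hypothetical and not part of the library.

```python
import math
import random


def corr(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


rng = random.Random(0)
n_samples = 10000
X1, X2, y = [], [], []

for _ in range(n_samples):
    # Latents active in the "suppressor" model (l2 = l5 = l12345 = 0).
    l1, l3, l4, l12, l45 = (rng.gauss(0, 1) for _ in range(5))
    # One value per dot (assumption: no pixel-level term).
    x1, x2, x3, x4, x5 = l1 + l12, l12, l3, l4 + l45, l45
    noise = rng.gauss(0, 1)
    X1.append(x1)
    X2.append(x2)
    # b1=1, b2=-1, b3=1, b4=1, b5=-1, so l12 and l45 cancel out of y.
    y.append(x1 - x2 + x3 + x4 - x5 + noise)

corr_x1_y = corr(X1, y)  # clearly positive: X1 carries l1, which is in y
corr_x2_y = corr(X2, y)  # close to zero: X2 only carries l12, absent from y
```

X2 looks irrelevant in a univariate test, yet dropping it would leave l12 in the prediction X1 + X3 + X4 and add unexplained variance.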
Returns
-------
X3d: Numpy array of shape [n_sample, shape]. The input features.
y: Numpy array of shape [n_sample, 1]. The target variable.
beta3d: Numpy array of shape [shape,]. It is the beta such that
y = X * beta + noise.
Details
-------
The general procedure is:
1) For each pixel i, generate independent variables Xi ~ N(0, 1)
2) Add object level structure corresponding to the five dots:
- Sample eight latent variables ~ N(0, 1): l1, l2, l3, l4, l5, l12, l45,
l12345.
l1: latent (shared variance) for all pixels of point 1.
...
l5: latent (shared variance) for all pixels of point 5.
l12: latent (shared variance) for all pixels of points 1 and 2.
l45: latent (shared variance) for all pixels of points 4 and 5.
l12345: latent (shared variance) for all pixels of points 1, 2, 3, 4
and 5.
- Pixel i of dots X1, X2, X3, X4, X5 is sampled as:
X1i = l1 + l12 + l12345 + Xi
X2i = l2 + l12 + l12345 + Xi
X3i = l3 + l12345 + Xi
X4i = l4 + l45 + l12345 + Xi
X5i = l5 + l45 + l12345 + Xi
Note that:
Pixels of dot X1 share a common variance that stems from l1, l12
and l12345.
Pixels of dot X2 share a common variance that stems from l2, l12
and l12345.
Pixels of dot X1 and pixels of dot X2 share a common variance that
stems from l12 and l12345.
etc.
4) Spatial Smoothing.
5) Model: y = X beta + noise
- Betas are null outside dots, and b1, b2, b3, b4, b5 within dots
- Sample noise ~ N(0, 1)
- Compute X beta then scale beta such that: r_squared(y, X beta) = r2
Return X, y, beta
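Steps 1 and 2 of the procedure can be sketched as below. This is an illustration only, not the library's code: it assumes all eight latents have unit variance, uses two pixels per dot, and skips spatial smoothing. The names dot_latents, sample_pixels and corr are hypothetical.

```python
import math
import random


def corr(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


rng = random.Random(42)
n_samples = 20000
latent_names = ["l1", "l2", "l3", "l4", "l5", "l12", "l45", "l12345"]

# Which latents feed each dot, following X1i ... X5i in the Details section.
dot_latents = {
    1: ("l1", "l12", "l12345"),
    2: ("l2", "l12", "l12345"),
    3: ("l3", "l12345"),
    4: ("l4", "l45", "l12345"),
    5: ("l5", "l45", "l12345"),
}


def sample_pixels():
    """Draw one sample: two pixels per dot, each = sum of latents + Xi."""
    lat = {name: rng.gauss(0, 1) for name in latent_names}
    return {
        dot: [sum(lat[k] for k in keys) + rng.gauss(0, 1) for _ in range(2)]
        for dot, keys in dot_latents.items()
    }


samples = [sample_pixels() for _ in range(n_samples)]
x1a = [s[1][0] for s in samples]
x1b = [s[1][1] for s in samples]
x2a = [s[2][0] for s in samples]
x3a = [s[3][0] for s in samples]

within_dot1 = corr(x1a, x1b)  # theoretical value 3/4 (l1, l12, l12345 shared)
dot1_dot2 = corr(x1a, x2a)    # theoretical value 2/4 (l12, l12345 shared)
dot1_dot3 = corr(x1a, x3a)    # theoretical value 1/sqrt(12) (only l12345 shared)
```

The ordering within-dot > dots 1 and 2 > dots 1 and 3 reflects how many latents each pair of pixels has in common.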
Examples
--------
>>> import numpy as np
>>> import matplotlib.pyplot as plot
>>> from parsimony import datasets
>>> n_samples = 100
>>> shape = (11, 11, 1)
>>> X3d, y, beta3d = datasets.regression.dice5.load(n_samples=n_samples,
... shape=shape, r2=.5, random_seed=1)