parsimony.datasets.regression.dice5

load(n_samples=100, shape=(30, 30, 1), r2=0.75, sigma_spatial_smoothing=1, obj_pix_ratio=2.0, model='independant', random_seed=None)

Generates regression samples (images + target variable) and beta.

The covariance structure is controlled both at the pixel
level (spatial smoothing) and at the object level. Objects are groups
of pixels sharing a covariance that stems from a latent variable.
beta is non-null within objects (default is five dots).
Then y is obtained as y = X * beta + noise, where beta is scaled such
that r_square(y, X * beta) = r2 and noise is sampled according to N(0, 1).
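
The scaling of beta described above can be sketched as follows. This is a
minimal illustration, not the library's implementation: it assumes that
r_square is the squared Pearson correlation between y and X * beta, it uses
a closed-form scale that ignores the sample covariance between signal and
noise, and the names X, beta and r2_target are hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features = 5000, 20

# Hypothetical stand-in data: X is a flat feature matrix, beta is non-null
# on five features only (a stand-in for the five dots).
X = rng.standard_normal((n_samples, n_features))
beta = np.zeros(n_features)
beta[:5] = [1.0, 1.0, 1.0, -1.0, -1.0]
noise = rng.standard_normal(n_samples)  # noise ~ N(0, 1)
r2_target = 0.75

# Choose the scale so that var(signal) / (var(signal) + var(noise)) equals
# r2_target (assumption: signal and noise are uncorrelated).
signal = X @ beta
scale = np.sqrt(r2_target * noise.var() / ((1.0 - r2_target) * signal.var()))
beta_scaled = scale * beta

y = X @ beta_scaled + noise
r2_hat = np.corrcoef(y, X @ beta_scaled)[0, 1] ** 2
```

With a finite sample, r2_hat is only approximately equal to r2_target,
because the sampled noise is never exactly uncorrelated with the signal.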

Parameters
----------
n_samples: Integer. Number of samples. Default is 100.

shape: Tuple or list of sample shape. Default is (30, 30, 1).

r2: Float. The desired R-squared (explained variance), i.e.:
        r_square(y, X * beta) = r2. Default is 0.75.

sigma_spatial_smoothing: Float. Standard deviation of the Gaussian kernel
        (default is 1). Higher values promote spatial correlation between
        pixels.

model:  string or a dict (default "independant")
    If model is "independant":
        # Each point has an independent latent
        l1=1., l2=1., l3=1., l4=1., l5=1.,
        # No shared variance
        l12=0., l45=0., l12345=0.,
        # Five dots contribute equally
        b1=1., b2=1., b3=1., b4=-1., b5=-1.

    If model is a dictionary:
        The parameters of the "independant" model are updated (overwritten)
        by the dictionary entries.
        Example: set the betas of points 4 and 5 to 1:
        dict(b4=1., b5=1.)

    If model is "redundant":
        # Point-level signal in dots 1 and 2 stems from a shared latent
        l1=0., l2=0., l12=1.,
        # l3 is independent
        l3=1.,
        # Point-level signal in dots 4 and 5 stems from a shared latent
        l4=0., l5=0., l45=1.,
        # No global shared variance
        l12345 = 0.,
        # Five dots contribute equally
        b1=1., b2=1., b3=1., b4=-1., b5=-1.

    If model is "suppressor":
        # Point-level signal in dot 2 stems only from the shared latent
        l1=1, l2=0., l12=1.,
        # l3 is independent
        l3 = 1.,
        # Point-level signal in dot 5 stems only from the shared latent
        l4=1., l5=0., l45=1.,
        # No global shared variance
        l12345 = 0.,
        # Dot 2 suppresses shared signal with dot 1, dot 5 suppresses dot 4
        b1=1., b2=-1., b3=1., b4=1., b5=-1.

        y = X1       - X2  + X3 + X4       - X5  + noise
        y = l1 + l12 - l12 + l3 + l4 + l45 - l45 + noise
        y = l1 + l3 + l4 + noise
        Pixels of X2 and X5 are therefore not correlated with the target y,
        so they will not be detected by univariate analysis. However, they
        are useful because they suppress the unwanted variance that stems
        from latents l12 and l45.

obj_pix_ratio: Float. Controls the ratio between object-level signal
        and pixel-level signal for pixels within objects. If
        obj_pix_ratio == 1, then 100% of the signal of pixels within
        the same object is shared (i.e., there is no pixel-level signal).
        If obj_pix_ratio == 0, then all the signal is pixel-specific.
        A high obj_pix_ratio promotes spatial correlation between
        pixels of the same object. Default is 2.0.

random_seed: None or integer. See numpy.random.seed(). If not None, it can
        be used to obtain reproducible samples.
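
The suppressor effect described under the "suppressor" model can be checked
numerically. The sketch below is a simplification under stated assumptions:
it works directly on the latents with one value per dot, so it ignores
pixel-level noise and spatial smoothing, and the helper r_squared is a
hypothetical name, not part of the library.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000

# Latents and noise, all ~ N(0, 1); l2 = l5 = 0 in the "suppressor" setting.
l1, l3, l4, l12, l45, noise = rng.standard_normal((6, n))

X1, X2, X3 = l1 + l12, l12, l3
X4, X5 = l4 + l45, l45
y = X1 - X2 + X3 + X4 - X5 + noise  # equals l1 + l3 + l4 + noise

# Univariate view: X1 correlates with y, the suppressor X2 does not.
corr_x1_y = np.corrcoef(X1, y)[0, 1]
corr_x2_y = np.corrcoef(X2, y)[0, 1]


def r_squared(features, target):
    """R-squared of an ordinary least-squares fit with an intercept."""
    A = np.column_stack(features + [np.ones_like(target)])
    coef, *_ = np.linalg.lstsq(A, target, rcond=None)
    residual = target - A @ coef
    return 1.0 - residual.var() / target.var()


# Multivariate view: dropping the suppressors X2 and X5 degrades the fit.
r2_all = r_squared([X1, X2, X3, X4, X5], y)
r2_without_suppressors = r_squared([X1, X3, X4], y)
```

In this simplified setting the expected R-squared is 0.75 with all five
dots and 0.5 without X2 and X5, although X2 and X5 are individually
uncorrelated with y.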

Returns
-------
X3d: NumPy array of shape [n_samples, shape]. The input features.

y: NumPy array of shape [n_samples, 1]. The target variable.

beta3d: NumPy array of shape [shape,]. It is the beta such that
        y = X * beta + noise.
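
To relate the returned image-shaped arrays to the matrix form
y = X * beta + noise, they can be flattened as sketched below. The arrays
here are random stand-ins with the documented shapes, not actual output of
load(); the names X, beta and y_signal are hypothetical.

```python
import numpy as np

n_samples, shape = 100, (11, 11, 1)
rng = np.random.default_rng(0)

# Stand-ins with the documented shapes (not actual dice5 output).
X3d = rng.standard_normal((n_samples,) + shape)
beta3d = rng.standard_normal(shape)

# Flatten each image to a row of features and beta to a column vector.
X = X3d.reshape(n_samples, -1)   # (n_samples, n_features)
beta = beta3d.reshape(-1, 1)     # (n_features, 1)
y_signal = X @ beta              # (n_samples, 1), same layout as y
```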

Details
-------
The general procedure is:

    1) For each pixel i, generate independent variables Xi ~ N(0, 1)

    2) Add object level structure corresponding to the five dots:
       - Sample eight latent variables ~ N(0, 1): l1, l2, l3, l4, l5, l12,
       l45, l12345.
       l1: latent (shared variance) for all pixels of point 1.
       ...
       l5: latent (shared variance) for all pixels of point 5.
       l12: latent (shared variance) for all pixels of point 1 and 2.
       l45: latent (shared variance) for all pixels of point 4 and 5 .
       l12345: latent (shared variance) for all pixels of point 1, 2, 3, 4
       and 5.

       - Pixel i of each dot X1, X2, X3, X4, X5 is sampled as:
         X1i = l1 + l12 + l12345 + Xi
         X2i = l2 + l12 + l12345 + Xi
         X3i = l3 + l12345 + Xi
         X4i = l4 + l45 + l12345 + Xi
         X5i = l5 + l45 + l12345 + Xi
         Note that:
         Pixels of dot X1 share a common variance that stems from l1, l12
         and l12345.
         Pixels of dot X2 share a common variance that stems from l2, l12
         and l12345.
         Pixels of dot X1 and pixels of dot X2 share a common variance that
         stems from l12 and l12345.
         etc.

    3) Spatial smoothing.

    4) Model: y = X beta + noise
    - Betas are null outside dots, and equal to b1, b2, b3, b4, b5 within dots
    - Sample noise ~ N(0, 1)
    - Compute X beta, then scale beta such that r_squared(y, X beta) = r2
    Return X, y, beta
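
Steps 1 and 2 of the procedure above can be sketched with NumPy to show the
correlation structure they induce. This is an illustration under stated
assumptions, not the library's code: every latent enters with unit weight
as in the formulas above, each dot is reduced to a hypothetical block of
four pixels, and spatial smoothing and the target y are left out.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, pix_per_dot = 20000, 4

# Step 1: independent pixel-level variables Xi ~ N(0, 1), one block per dot.
pixels = rng.standard_normal((5, n_samples, pix_per_dot))

# Step 2: eight latents ~ N(0, 1); each has shape (n_samples, 1) so that it
# broadcasts over all pixels of the dots it belongs to.
l1, l2, l3, l4, l5, l12, l45, l12345 = rng.standard_normal((8, n_samples, 1))

X1 = l1 + l12 + l12345 + pixels[0]
X2 = l2 + l12 + l12345 + pixels[1]
X3 = l3 + l12345 + pixels[2]
X4 = l4 + l45 + l12345 + pixels[3]
X5 = l5 + l45 + l12345 + pixels[4]


def corr(a, b):
    """Pearson correlation between two 1-D samples."""
    return np.corrcoef(a, b)[0, 1]


within_dot1 = corr(X1[:, 0], X1[:, 1])  # shared latents: l1, l12, l12345
dot1_dot2 = corr(X1[:, 0], X2[:, 0])    # shared latents: l12, l12345
dot1_dot4 = corr(X1[:, 0], X4[:, 0])    # shared latent: l12345 only
```

With unit weights each pixel of dots 1, 2, 4 and 5 has variance 4, so the
expected correlations are 3/4 within dot 1, 2/4 between dots 1 and 2, and
1/4 between dots 1 and 4.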


Examples
--------
>>> import numpy as np
>>> import matplotlib.pyplot as plot
>>> from parsimony import datasets
>>> n_samples = 100
>>> shape = (11, 11, 1)
>>> X3d, y, beta3d = datasets.regression.dice5.load(n_samples=n_samples,
...     shape=shape, r2=.5, random_seed=1)

Module dice5

load(n_samples=100, shape=(30, 30, 1), r2=0.75, sigma_spatial_smoothing=1, obj_pix_ratio=2.0, model='independant', random_seed=None)

dice_five_with_union_of_pairs(shape)