load(size=[[100, 100]],
     rho=[0.05],
     delta=0.1,
     eps=None,
     density=0.5,
     snr=100.0,
     locally_smooth=False)
Generates random data for regression problems. The data follow a
regression model of the form
y = X.beta + e.
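A minimal NumPy sketch of this model, assuming Gaussian noise and, for simplicity, uncorrelated columns in X (the actual generator builds X from the correlation structures described under rho, delta, eps and locally_smooth). The noise is scaled with the Var(e) definition given under snr. The helper `simulate_response` is hypothetical and not part of this module.

```python
import numpy as np


def simulate_response(X, beta, snr, rng):
    """Hypothetical helper: return (y, e) with y = X.beta + e.

    The noise variance follows the snr definition in this docstring:
    Var(e) = (||X.beta||^2 / (n - 1)) / snr.
    """
    n = X.shape[0]
    signal = X @ beta                                # noiseless part X.beta
    var_e = (np.sum(signal ** 2) / (n - 1)) / snr    # target noise variance
    e = rng.normal(0.0, np.sqrt(var_e), size=n)      # Gaussian noise (assumed)
    return signal + e, e


rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))  # uncorrelated columns: a simplification
beta = rng.standard_normal(20)
y, e = simulate_response(X, beta, snr=100.0, rng=rng)
```

A larger snr shrinks the noise variance, so y moves closer to the noiseless signal X.beta.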
Parameters
----------
size : A list or a list of lists. The shapes of the block matrices to
generate. All blocks must have the same number of rows.
rho : A scalar, or a list with one value per block, giving the average
correlation, i.e. the mean of the off-diagonal elements of the
correlation matrix S.
delta : Baseline noise between groups. Only used if the number of groups is
greater than one and locally_smooth=False. The baseline noise is
computed as
delta * rho_min,
where rho_min is the smallest value in rho, and you must provide a
delta such that 0 <= delta < 1.
eps : Maximum entry-wise random noise. This parameter determines the
distribution of the noise. The noise is approximately normally
distributed. If locally_smooth=False, the mean is
delta * rho_min
and the variance is
(eps * (1 - max(rho))) ** 2.0 / 10.
If locally_smooth=True, the mean is zero and the variance is
(eps * (1.0 - max(rho)) / (1.0 + max(rho))) ** 2.0 / 10.
You can thus control the noise through this parameter, but note that
you must have
0 <= eps < 1.
density : Determines the fraction of nonzero entries in the regression
vector. With density=1.0 the regression vector is dense, and with
density=0.0 it is the zero vector. Note, however, that you should
choose density such that
density * p >= 1,
where p is the total number of columns of X (the sum of the column
counts given in size).
snr : The signal-to-noise ratio. The dependent variable is computed as
y = X.beta + e,
where the noise is scaled so that
Var(e) = (||X.beta||² / (n - 1)) / snr
and n is the number of samples (rows of X).
locally_smooth : If True, uses ToeplitzCorrelation (with "local
smoothing"); if False, uses ConstantCorrelation.
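A rough NumPy sketch of the two correlation structures named under locally_smooth. Both helpers are hypothetical. `constant_correlation` omits the eps noise term, assumes one rho value per block, and places the baseline delta * min(rho) between blocks. `toeplitz_correlation` assumes the common form rho ** |i - j|, which may differ from what ToeplitzCorrelation actually builds. Rows of X are then drawn from a zero-mean normal distribution with correlation matrix S.

```python
import numpy as np


def constant_correlation(sizes, rho, delta):
    """Hypothetical sketch: rho[k] inside block k, delta * min(rho) between blocks.

    The eps noise term of the real generator is left out.
    """
    p = sum(sizes)
    S = np.full((p, p), delta * min(rho))   # baseline correlation between groups
    start = 0
    for size_k, rho_k in zip(sizes, rho):
        stop = start + size_k
        S[start:stop, start:stop] = rho_k   # constant correlation within block k
        start = stop
    np.fill_diagonal(S, 1.0)                # unit variances on the diagonal
    return S


def toeplitz_correlation(p, rho):
    """Hypothetical sketch: entries rho ** |i - j| (an assumed Toeplitz form)."""
    idx = np.arange(p)
    return rho ** np.abs(idx[:, None] - idx[None, :])


S = constant_correlation([3, 2], rho=[0.5, 0.3], delta=0.1)
T = toeplitz_correlation(4, 0.5)

rng = np.random.default_rng(0)
X = rng.multivariate_normal(np.zeros(S.shape[0]), S, size=100)  # rows ~ N(0, S)
```

In the constant structure every pair of variables in a block is equally correlated, while the Toeplitz form lets correlation decay with distance between column indices, which is what gives neighbouring variables a "locally smooth" relationship.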
Returns
-------
X : The matrix of independent variables.
y : The dependent variable.
beta : The regression vector.
e : The noise/residual vector.
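The number of nonzero entries in the returned beta is governed by density. A minimal sketch, assuming the nonzeros sit at uniformly random positions with standard-normal values; `sparse_beta` is a hypothetical helper, and raising an error when density * p < 1 is a design choice made here, not documented behaviour.

```python
import numpy as np


def sparse_beta(p, density, rng):
    """Hypothetical helper: regression vector with about density * p nonzeros."""
    if density * p < 1:
        # Mirrors the guidance that density * p should be at least 1.
        raise ValueError("density * p must be at least 1")
    k = int(round(density * p))
    beta = np.zeros(p)
    support = rng.choice(p, size=k, replace=False)  # positions of the nonzeros
    beta[support] = rng.standard_normal(k)          # nonzero coefficient values
    return beta


rng = np.random.default_rng(0)
beta = sparse_beta(200, density=0.5, rng=rng)
```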