.. _tutorials:

General framework
=================

Parsimony comprises three principal parts: **algorithms**, **functions** and
**estimators**.

* **functions** define the loss functions and penalties, and combinations of
  those, that are to be minimised. These represent the underlying mathematical
  problem formulations.

* **algorithms** define minimising algorithms (e.g., the Fast Iterative
  Shrinkage-Thresholding Algorithm (FISTA) [FISTA2009]_, COntinuation of
  NESTerov's smoothing Algorithm (CONESTA), the Excessive Gap Method, etc.)
  that are used to minimise a given function.

* **estimators** define a combination of functions and an algorithm. This is
  the higher-level entry point for the user. Estimators combine a function
  (loss function plus penalties) with one or more algorithms that can be used
  to minimise the given function.

Parsimony currently comprises the following parts:

* Functions

  * Loss functions

    * Linear regression
    * Logistic regression

  * Penalties

    * L1 (Lasso)
    * L2 (Ridge)
    * Total Variation (TV)
    * Overlapping group lasso (GL)

* Algorithms

  * Iterative Shrinkage-Thresholding Algorithm (ISTA)
  * Fast Iterative Shrinkage-Thresholding Algorithm (FISTA)
  * COntinuation of NESTerov's smoothing Algorithm (CONESTA)
  * Excessive Gap Method

* Estimators

  * LinearRegression
  * Lasso
  * ElasticNet
  * LinearRegressionL1L2TV
  * LinearRegressionL1L2GL
  * LogisticRegressionL1L2TV
  * LogisticRegressionL1L2GL
  * LinearRegressionL2SmoothedL1TV

Estimators
==========

Simulated dataset for regression
---------------------------------

We build a simple simulated dataset for the regression problem
:math:`y = X\beta + \text{noise}`:

.. code-block:: python

    import numpy as np
    import parsimony.utils.start_vectors as start_vectors

    np.random.seed(42)

    # The shape of the simulated three-dimensional image:
    shape = (4, 4, 4)
    # The number of samples:
    num_samples = 50
    # The number of features per sample:
    num_ft = shape[0] * shape[1] * shape[2]

    # Define X randomly as simulated data.
    X = np.random.rand(num_samples, num_ft)

    # Define beta randomly.
    start_vector = start_vectors.RandomStartVector(normalise=False,
                                                   limits=(-1, 1))
    beta = start_vector.get_vector(num_ft)
    beta = np.sort(beta, axis=0)
    beta[np.abs(beta) < 0.2] = 0.0

    # Define y by adding noise.
    y = np.dot(X, beta) + 0.1 * np.random.randn(num_samples, 1)

In the following sections, we will try to recover :math:`\beta` using
different loss functions and penalties.

Linear regression
------------------

Knowing :math:`X` and :math:`y`, we want to find :math:`\beta` by minimising
the ordinary least squares (OLS) loss function. Note that the default option
is to minimise the mean squared error, not the sum of squared errors:

.. math::

    \min \frac{1}{n}\|y - X\beta\|^2_2

.. code-block:: python

    import parsimony.estimators as estimators

    ols_estimator = estimators.LinearRegression()
    ols_estimator.fit(X, y)
    print("Estimated beta error =", np.linalg.norm(ols_estimator.beta - beta))

Ridge regression (L2 penalty)
------------------------------

We add an :math:`\ell_2` constraint with ridge regression coefficient
:math:`l_2 = 0.1` and minimise

.. math::

    \min\left(\frac{1}{2n}\|y - X\beta\|_2^2
              + \frac{l_2}{2}\|\beta\|_2^2\right).

.. code-block:: python

    import parsimony.estimators as estimators

    l2 = 0.1  # l2 ridge regression coefficient
    ridge_estimator = estimators.RidgeRegression(l2)
    ridge_estimator.fit(X, y)
    print("Estimated beta error =", np.linalg.norm(ridge_estimator.beta - beta))
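Fitted estimators can also be used for prediction. The following is a minimal
sketch, assuming the fitted ``ridge_estimator`` from above and that the
regression estimators expose a ``predict`` method; it simply compares
in-sample predictions with :math:`y`.

.. code-block:: python

    # Minimal sketch (assumption: fitted regression estimators provide
    # predict()); compute in-sample predictions and their error.
    y_pred = ridge_estimator.predict(X)
    print("Mean squared prediction error =", np.mean((y_pred - y) ** 2))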
Lasso regression (L1 penalty)
------------------------------

Similarly, you can use an :math:`\ell_1` penalty and minimise

.. math::

    \min\left(\frac{1}{2n}\|y - X\beta\|_2^2 + l_1\|\beta\|_1\right).

.. code-block:: python

    import parsimony.estimators as estimators

    l1 = 0.1  # l1 lasso coefficient
    lasso_estimator = estimators.Lasso(l1)
    lasso_estimator.fit(X, y)
    print("Estimated beta error =", np.linalg.norm(lasso_estimator.beta - beta))

Elastic net regression (L1 + L2 penalties)
-------------------------------------------

You can combine :math:`\ell_1` and :math:`\ell_2` penalties with coefficients
:math:`\alpha = 0.1` (global penalty) and :math:`l = 0.1` (:math:`\ell_1`
ratio) and minimise

.. math::

    \min\left(\frac{1}{2n}\|y - X\beta\|_2^2
              + \alpha\left(\frac{1 - l}{2}\|\beta\|_2^2
                            + l\|\beta\|_1\right)\right).

.. code-block:: python

    import parsimony.estimators as estimators

    alpha = 0.1  # global penalty
    l = 0.1      # l1 ratio (lasso)
    enet_estimator = estimators.ElasticNet(l=l, alpha=alpha)
    enet_estimator.fit(X, y)
    print("Estimated beta error =", np.linalg.norm(enet_estimator.beta - beta))

Elastic net regression + TV (L1 + L2 + TV penalties)
-----------------------------------------------------

You can combine :math:`\ell_1`, :math:`\ell_2` and TV penalties with
coefficients :math:`l_1 = 0.1`, :math:`l_2 = 0.1` and :math:`l_{TV} = 0.1`
and minimise

.. math::

    \min\left(\frac{1}{2n}\|y - X\beta\|_2^2 + \frac{l_2}{2}\|\beta\|_2^2
              + l_1\|\beta\|_1 + l_{TV}\,\mathrm{TV}(\beta)\right).

.. code-block:: python

    import parsimony.estimators as estimators
    import parsimony.functions.nesterov.tv as tv_helper

    l1 = 0.1  # l1 penalty
    l2 = 0.1  # l2 penalty
    tv = 0.1  # tv penalty
    # Build the linear operator for the TV penalty from the 3D image shape.
    A, n_compacts = tv_helper.linear_operator_from_shape(shape)
    tvenet_estimator = estimators.LinearRegressionL1L2TV(l1=l1, l2=l2, tv=tv,
                                                         A=A)
    tvenet_estimator.fit(X, y)
    print("Estimated beta error =", np.linalg.norm(tvenet_estimator.beta - beta))

Elastic net regression + Group Lasso (L1 + L2 + GL penalties)
--------------------------------------------------------------

We change the :math:`\mathrm{TV}` constraint to an overlapping group lasso
constraint, :math:`\mathrm{GL}`, and instead minimise

.. math::

    \min\left(\frac{1}{2n}\|y - X\beta\|_2^2 + \frac{k}{2}\|\beta\|_2^2
              + l\|\beta\|_1 + g\,\mathrm{GL}(\beta)\right).

.. code-block:: python

    import parsimony.estimators as estimators
    import parsimony.algorithms as algorithms
    import parsimony.functions.nesterov.gl as gl

    k = 0.0  # l2 ridge regression coefficient
    l = 0.1  # l1 lasso coefficient
    g = 0.1  # group lasso coefficient
    # Two overlapping groups: the first two thirds and the last two thirds
    # of the features.
    groups = [list(range(0, 2 * num_ft // 3)),
              list(range(num_ft // 3, num_ft))]
    A = gl.linear_operator_from_groups(num_ft, groups)
    estimator = estimators.LinearRegressionL1L2GL(
        k, l, g, A=A,
        algorithm=algorithms.proximal.FISTA(),
        algorithm_params=dict(max_iter=1000))
    res = estimator.fit(X, y)
    print("Estimated beta error =", np.linalg.norm(estimator.beta - beta))

Simulated dataset for classification
-------------------------------------

We build a simple simulated dataset for classification:

.. code-block:: python

    import numpy as np

    np.random.seed(42)

    # The shape of the simulated three-dimensional image:
    shape = (4, 4, 4)
    # The number of samples:
    num_samples = 50
    # The number of features per sample:
    num_ft = shape[0] * shape[1] * shape[2]

    # Define X randomly as simulated data.
    X = np.random.rand(num_samples, num_ft)
    # Define y as zeros or ones.
    y = np.random.randint(0, 2, (num_samples, 1))
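Note that these labels are drawn independently of :math:`X`, which is enough
to exercise the estimators below but gives them no structure to recover. If
you prefer labels that actually depend on :math:`X`, a minimal sketch (using
a hypothetical sparse weight vector ``beta_true`` and a logistic link; only
NumPy is assumed) is:

.. code-block:: python

    # Hypothetical alternative: labels drawn from a logistic model in X.
    beta_true = np.zeros((num_ft, 1))
    beta_true[:num_ft // 4] = 1.0  # arbitrary sparse weights
    proba = 1.0 / (1.0 + np.exp(-np.dot(X - 0.5, beta_true)))
    y_alt = (np.random.rand(num_samples, 1) < proba).astype(int)
    # Use y_alt in place of y below if you want labels with signal.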
Logistic regression
--------------------

Knowing :math:`X` and :math:`y`, we want to find the weight vector
:math:`\beta` by minimising the logistic regression loss function

.. math::

    \min \frac{1}{n}\sum_{i=1}^n\log\left(1 + e^{-y_i(\beta^Tx_i)}\right)

.. code-block:: python

    import parsimony.estimators as estimators

    lr_estimator = estimators.LogisticRegression()
    lr_estimator.fit(X, y)
    print("Estimated prediction rate =", lr_estimator.score(X, y))

Ridge logistic regression (L2 penalty)
---------------------------------------

We add an :math:`\ell_2` constraint with ridge coefficient :math:`l_2 = 0.1`
and minimise

.. math::

    \min\left(\frac{1}{n}\sum_{i=1}^n\log\left(1 + e^{-y_i(\beta^Tx_i)}\right)
              + \frac{l_2}{2}\|\beta\|_2^2\right).

Note that there is no specific estimator for this case, but you can use the
generic LogisticRegressionL1L2TV estimator with the :math:`l_1` and
:math:`\mathrm{TV}` coefficients set to zero.

.. code-block:: python

    import parsimony.estimators as estimators
    import parsimony.functions.nesterov.tv as tv_helper

    l2 = 0.1  # l2 ridge regression coefficient
    A, n_compacts = tv_helper.linear_operator_from_shape(shape)
    ridge_lr_estimator = estimators.LogisticRegressionL1L2TV(l1=0, l2=l2,
                                                             tv=0, A=A)
    ridge_lr_estimator.fit(X, y)
    print("Estimated prediction rate =", ridge_lr_estimator.score(X, y))

Ridge + Lasso logistic regression (L1 + L2 penalties)
------------------------------------------------------

Similarly, you can add an :math:`\ell_1` constraint with coefficients
:math:`l_1 = 0.1` and :math:`l_2 = 0.1` and instead minimise

.. math::

    \min\left(\frac{1}{n}\sum_{i=1}^n\log\left(1 + e^{-y_i(\beta^Tx_i)}\right)
              + \frac{l_2}{2}\|\beta\|_2^2 + l_1\|\beta\|_1\right).

.. code-block:: python

    import parsimony.estimators as estimators
    import parsimony.functions.nesterov.tv as tv_helper

    l1 = 0.1  # l1 lasso coefficient
    l2 = 0.1  # l2 ridge regression coefficient
    A, n_compacts = tv_helper.linear_operator_from_shape(shape)
    enet_lr_estimator = estimators.LogisticRegressionL1L2TV(l1=l1, l2=l2,
                                                            tv=0, A=A)
    enet_lr_estimator.fit(X, y)
    print("Estimated prediction rate =", enet_lr_estimator.score(X, y))

Logistic regression with L1 + L2 and TV penalties
--------------------------------------------------

Finally, you can add a :math:`\mathrm{TV}` penalty and minimise

.. math::

    \min\left(\frac{1}{n}\sum_{i=1}^n\log\left(1 + e^{-y_i(\beta^Tx_i)}\right)
              + \frac{l_2}{2}\|\beta\|_2^2 + l_1\|\beta\|_1
              + l_{TV}\,\mathrm{TV}(\beta)\right).

.. code-block:: python

    import parsimony.estimators as estimators
    import parsimony.functions.nesterov.tv as tv_helper

    l1 = 0.1  # l1 lasso coefficient
    l2 = 0.1  # l2 ridge regression coefficient
    tv = 0.1  # tv coefficient
    A, n_compacts = tv_helper.linear_operator_from_shape(shape)
    enettv_lr_estimator = estimators.LogisticRegressionL1L2TV(l1=l1, l2=l2,
                                                              tv=tv, A=A)
    enettv_lr_estimator.fit(X, y)
    print("Estimated prediction rate =", enettv_lr_estimator.score(X, y))
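Since the features correspond to a three-dimensional image of shape
``(4, 4, 4)``, the estimated weight vector can be reshaped back to that shape
to inspect its spatial structure. A minimal sketch, assuming the fitted
``enettv_lr_estimator`` from above stores its weights in ``beta`` as a
``(num_ft, 1)`` array (as the regression estimators above do):

.. code-block:: python

    # Minimal sketch: view the fitted weights as a 3D map.
    beta_map = np.reshape(enettv_lr_estimator.beta, shape)
    print("Number of non-zero weights =", np.count_nonzero(beta_map))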
Logistic regression with L1 + L2 and GL penalties
--------------------------------------------------

We change the :math:`\mathrm{TV}` constraint to an overlapping group lasso
constraint and instead minimise

.. math::

    \min\left(\frac{1}{n}\sum_{i=1}^n\log\left(1 + e^{-y_i(\beta^Tx_i)}\right)
              + \frac{k}{2}\|\beta\|_2^2 + l\|\beta\|_1
              + g\,\mathrm{GL}(\beta)\right)

.. code-block:: python

    import parsimony.estimators as estimators
    import parsimony.algorithms as algorithms
    import parsimony.functions.nesterov.gl as gl

    k = 0.0  # l2 ridge regression coefficient
    l = 0.1  # l1 lasso coefficient
    g = 0.1  # group lasso coefficient
    groups = [list(range(0, 2 * num_ft // 3)),
              list(range(num_ft // 3, num_ft))]
    A = gl.linear_operator_from_groups(num_ft, groups)
    estimator = estimators.LogisticRegressionL1L2GL(
        k, l, g, A=A,
        algorithm=algorithms.proximal.FISTA(),
        algorithm_params=dict(max_iter=1000))
    res = estimator.fit(X, y)
    print("Estimated prediction rate =", estimator.score(X, y))

Algorithms
==========

FISTA
-----

FISTA [FISTA2009]_ is the algorithm that was applied in the previous sections;
for example, it was passed explicitly to the group lasso estimators above via
``algorithm=algorithms.proximal.FISTA()``.

CONESTA
-------

In this section, we switch to CONESTA to minimise the same L1 + L2 + TV
penalised linear regression problem.

.. code-block:: python

    import parsimony.estimators as estimators
    import parsimony.algorithms as algorithms
    import parsimony.functions.nesterov.tv as tv

    k = 0.0  # l2 ridge regression coefficient
    l = 0.1  # l1 lasso coefficient
    g = 0.1  # tv coefficient
    Atv, n_compacts = tv.linear_operator_from_shape(shape)
    tvl1l2_conesta = estimators.LinearRegressionL1L2TV(
        k, l, g, A=Atv,
        algorithm=algorithms.proximal.CONESTA())
    res = tvl1l2_conesta.fit(X, y)
    print("Estimated beta error =", np.linalg.norm(tvl1l2_conesta.beta - beta))

Excessive gap method
---------------------

The Excessive Gap Method currently only works with the function
``LinearRegressionL2SmoothedL1TV``. For this algorithm to work, :math:`k`
must be positive.

.. code-block:: python

    import parsimony.estimators as estimators
    import parsimony.functions.nesterov.l1tv as l1tv
    import parsimony.algorithms.primaldual as primaldual

    # A combines the L1 part (A[0]) with the TV parts (A[1:]).
    A = l1tv.linear_operator_from_shape(shape, num_ft, penalty_start=0)

    k = 0.05  # ridge regression coefficient
    l = 0.05  # l1 coefficient
    g = 0.05  # tv coefficient
    rr_smoothed_l1_tv = estimators.LinearRegressionL2SmoothedL1TV(
        k, l, g, A=A,
        algorithm=primaldual.ExcessiveGapMethod(max_iter=1000))
    res = rr_smoothed_l1_tv.fit(X, y)
    print("Estimated beta error =", np.linalg.norm(rr_smoothed_l1_tv.beta - beta))

References
==========

.. [FISTA2009] Amir Beck and Marc Teboulle, *A Fast Iterative
   Shrinkage-Thresholding Algorithm for Linear Inverse Problems*, SIAM Journal
   on Imaging Sciences, 2009.

.. [NESTA2011] Stephen Becker, Jerome Bobin and Emmanuel J. Candes, *NESTA: A
   Fast and Accurate First-Order Method for Sparse Recovery*, SIAM Journal on
   Imaging Sciences, 2011.