The PLSR modeling approach

Partial least squares regression of biological data

Arthur Goldsipe

Partial least squares regression (PLS regression or PLSR) has proven to be a powerful data-driven modeling technique for many biological problems. We typically use PLSR to develop a model that predicts responses from signals (part of the cue-signal-response paradigm).

What is PLSR?

PLSR is a multivariate regression technique that easily handles many correlated and/or noisy variables. PLSR works by iteratively finding a direction in the X (input or signal) space that has the maximum covariance with a direction in the Y (output or response) space. Like PCA (principal component analysis), it reduces the dimensionality of the data by identifying latent (hidden) variables that correspond to eigenvectors in an eigenvalue problem.
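The iterative procedure described above can be sketched in a few lines of Python with NumPy. This is only a minimal illustration under stated assumptions, not the implementation used by any of the tools listed on this page: the function names pls_fit and pls_predict are hypothetical, the data are only mean-centered (no variance scaling), and each weight vector is taken as the dominant left singular vector of X'Y, which is one common way to pose the eigenvalue problem.

```python
import numpy as np

def pls_fit(X, Y, n_components):
    """Fit a PLS regression model (illustrative sketch, hypothetical helper).

    X is (n_observations, n_signals); Y is (n_observations, n_responses).
    Returns the coefficient matrix B and the column means used for centering.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    E, F = X - x_mean, Y - y_mean  # residuals, deflated after each component
    W, P, Q = [], [], []
    for _ in range(n_components):
        # Direction in X space with maximal covariance with Y: the dominant
        # left singular vector of E'F (an eigenvector of E'F F'E).
        U, s, Vt = np.linalg.svd(E.T @ F, full_matrices=False)
        w = U[:, 0]
        t = E @ w                # scores: the latent variable for this component
        tt = t @ t
        p = E.T @ t / tt         # X loadings
        q = F.T @ t / tt         # Y loadings
        E = E - np.outer(t, p)   # remove what this component explains
        F = F - np.outer(t, q)
        W.append(w)
        P.append(p)
        Q.append(q)
    W, P, Q = (np.column_stack(M) for M in (W, P, Q))
    # Regression coefficients in the original (centered) signal space.
    B = W @ np.linalg.solve(P.T @ W, Q.T)
    return B, x_mean, y_mean

def pls_predict(X, B, x_mean, y_mean):
    """Predict responses for new signal measurements."""
    return (np.asarray(X, dtype=float) - x_mean) @ B + y_mean

# Synthetic example: 8 correlated signals and 3 responses driven by 2 latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(30, 2))
X = latent @ rng.normal(size=(2, 8)) + 0.01 * rng.normal(size=(30, 8))
Y = latent @ rng.normal(size=(2, 3)) + 0.01 * rng.normal(size=(30, 3))
B, x_mean, y_mean = pls_fit(X, Y, n_components=2)
Y_hat = pls_predict(X, B, x_mean, y_mean)
```

In practice the number of components would be chosen by cross-validation rather than fixed in advance, and signals measured on different scales are usually variance-scaled before fitting.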

What are some advantages of PLSR?

  • It can be used when there are many signal measurements.
  • It can handle and identify correlations between signals.
  • It can handle modest amounts of missing data (and estimate the missing values).
  • It can identify outliers.
  • It can identify similarities between treatment conditions or observations.
  • It can identify important signal measurements, which may be useful for model reduction or future experimental design.
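As an illustration of the last point, one widely used measure of signal importance is the variable importance in projection (VIP) score. The page does not name a specific measure, so the choice of VIP here is an assumption, as is the hypothetical helper name pls1_vip. The sketch below handles a single response and mean-centers the data without variance scaling.

```python
import numpy as np

def pls1_vip(X, y, n_components):
    """Return one VIP score per column of X from a single-response PLS fit.

    Illustrative sketch (hypothetical helper); assumes y is a 1-D array.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    E = X - X.mean(axis=0)   # centered signals, deflated per component
    f = y - y.mean()         # centered response, deflated per component
    n_vars = E.shape[1]
    weighted = np.zeros(n_vars)
    total_ss = 0.0
    for _ in range(n_components):
        w = E.T @ f
        w /= np.linalg.norm(w)   # unit-length weight vector
        t = E @ w                # scores
        tt = t @ t
        p = E.T @ t / tt         # X loadings
        q = (f @ t) / tt         # response loading (scalar)
        ss = q**2 * tt           # response sum of squares explained by this component
        weighted += ss * w**2
        total_ss += ss
        E = E - np.outer(t, p)
        f = f - q * t
    return np.sqrt(n_vars * weighted / total_ss)

# Synthetic example: only the first of five signals drives the response.
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 5))
y = 3.0 * X[:, 0] + 0.1 * rng.normal(size=40)
vip = pls1_vip(X, y, n_components=2)
```

Because each weight vector has unit length, the squared VIP scores average to 1, so scores well above 1 are often read as marking the more influential signals. That threshold is a rule of thumb, not a formal test.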

Software tools

  • SBPipeline's DataRail: A toolbox for handling biological data that includes support for PLSR
  • The N-way Toolbox: A free MATLAB toolbox for several multivariate models, including a multi-way generalization of PLSR
  • The Multi-block Toolbox: A free MATLAB toolbox for multi-block PLSR
  • SIMCA-P: A commercial software tool for performing PCA and PLSR in an interactive, GUI-based environment
  • PLS_Toolbox/Solo: A commercial MATLAB toolbox and a stand-alone software tool for PLSR
  • MVARTOOLS: A free, basic MATLAB toolbox for PCA, PCR, and PLSR

Tutorials and Reviews

This short MATLAB tutorial demonstrates the basics of performing and analyzing PLSR. An accompanying project for SIMCA-P is also available.

A nice introductory paper by Svante Wold and Nouna Kettaneh of Umetrics was presented at COMPSTAT 2004. The paper and presentation are both freely available courtesy of Umetrics.

References

  1. Albeck, J., MacBeath, G., White, F., Sorger, P., Lauffenburger, D., and Gaudet, S. (2006). Collecting and organizing systematic sets of protein data. Nature Reviews Molecular Cell Biology 7, 803-812.
  2. Bro, R. (1996). Multiway calibration. Multilinear PLS. Journal of Chemometrics 10, 47-61.
  3. Gaudet, S., Janes, K. A., Albeck, J. G., Pace, E. A., Lauffenburger, D. A., and Sorger, P. K. (2005). A compendium of signals and responses triggered by prodeath and prosurvival cytokines. Mol Cell Proteomics 4, 1569-1590.
  4. Geladi, P., and Kowalski, B. (1986). Partial least-squares regression: a tutorial. Analytica Chimica Acta 185, 1-17.
  5. Janes, K., Kelly, J., Gaudet, S., Albeck, J., Sorger, P., and Lauffenburger, D. (2004). Cue-Signal-Response Analysis of TNF-Induced Apoptosis by Partial Least Squares Regression of Dynamic Multivariate Data. Journal of Computational Biology 11, 544-561.
  6. Janes, K. (2005). Quantitative analysis of the cytokine-mediated apoptosis-survival cell decision process.
  7. Janes, K., Albeck, J., Gaudet, S., Sorger, P., Lauffenburger, D., and Yaffe, M. (2005). A Systems Model of Signaling Identifies a Molecular Basis Set for Cytokine-Induced Apoptosis. Science 310, 1646-1653.
  8. Janes, K., and Yaffe, M. (2006). Data-driven modelling of signal-transduction networks. Nature Reviews Molecular Cell Biology 7, 820-828.
  9. Janes, K. A., and Lauffenburger, D. A. (2006). A biological approach to computational models of proteomic networks. Curr Opin Chem Biol 10, 73-80.
  10. Jaqaman, K., and Danuser, G. (2006). Linking data to models: data regression. Nature Reviews Molecular Cell Biology 7, 813-819.
  11. Kumar, N., Wolf-Yadlin, A., White, F. M., and Lauffenburger, D. A. (2007). Modeling HER2 Effects on Cell Behavior from Mass Spectrometry Phosphotyrosine Data. PLoS Comput Biol 3.
  12. Martens, H., and Martens, M. (2001). Multivariate Analysis of Quality. An Introduction. Measurement Science and Technology 12, 1746-1746.
  13. Wold, S., and Kettaneh, N. (2004). The PLS method (partial least squares projections to latent structures) and its applications in industrial RDP (research, development, and production), COMPSTAT 2004, 16th Symposium of IASC, August 23-27, 2004.

©2008 Cell Decision Process Center all rights reserved
This page last modified on 2008-02-06