Postdoctoral, University of Colorado, 1970; Postdoctoral, University of Minnesota, 1967; Ph.D., Genetics, University of Wisconsin, 1966; Dip., Genetics, University of Edinburgh, UK, 1962
Protein expression profiles contain a wealth of information about an organism and its environment. The profiles are highly specific and can be reduced to a small number of diagnostic proteins which classify stressful conditions nearly perfectly. Every type and level of stress studied so far results in a specific pattern of protein expression. Indeed so abundant is the information that there are many sets of proteins which yield 90-100% accuracy in diagnosis and prognosis.
Diagnostic proteomics can be descriptive, based on subtraction assays, and it may also include various types of multivariable analyses. Our earlier work included subtraction assays using imaging software (PDQ, Melanie) (2-5,7-9,13,16). Recently we have begun to treat the proteome as a system with component protein variables, using Artificial Neural Networks (ANN) and statistics (5,6).
The goal in both cases is to separate irrelevant variability (among tissues, samples or organisms) from that among treatments of interest. With imaging, the approach is to eliminate such variability by forming composite images that include only proteins found in all samples from a treatment and never present in another treatment or control. In effect we are removing the error term. We refer to the result of such a subtraction assay as the Protein Expression Signature (PES) (8). The PES may contain proteins specifically repressed as well as induced by the treatment.
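The subtraction step described above can be sketched with set operations. This is a minimal illustration, not the imaging software's procedure: it assumes each sample has already been reduced to a set of protein (spot) identifiers, and the function name, treatment names and identifiers are all hypothetical. It covers presence only, so it finds induced proteins; repressed proteins would need the mirror-image comparison.

```python
from typing import Dict, List, Set


def protein_expression_signature(
    samples_by_treatment: Dict[str, List[Set[str]]], treatment: str
) -> Set[str]:
    """Proteins found in every sample of `treatment` and in no sample of any other group."""
    # Proteins present in all replicate samples of the treatment of interest.
    consistent = set.intersection(*samples_by_treatment[treatment])
    # Proteins seen in at least one sample of any other treatment or the control.
    elsewhere: Set[str] = set()
    for name, group in samples_by_treatment.items():
        if name != treatment:
            elsewhere |= set().union(*group)
    return consistent - elsewhere


# Made-up spot identifiers for a control and two treatments, two samples each.
samples = {
    "control": [{"p1", "p2", "p3"}, {"p1", "p2", "p4"}],
    "heat": [{"p1", "p5", "p6"}, {"p1", "p5", "p3"}],
    "copper": [{"p1", "p7", "p6"}, {"p1", "p7"}],
}

heat_pes = protein_expression_signature(samples, "heat")
copper_pes = protein_expression_signature(samples, "copper")
print(sorted(heat_pes), sorted(copper_pes))
```

Taking the intersection within a treatment discards sample-to-sample variability, which is the sense in which the error term is removed.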
In ANN and statistical analyses error variation is preserved by using protein data from individual subjects.
Proteins in the PES may be identified with mass spectrometry at this stage or, as we prefer, the analysis can proceed to considering the proteome as a multivariable system.
The data, transformed digitally, enter an Artificial Neural Network model, basically an iterative version of multiple regression, where outputs (treatments) are mapped on inputs (proteins) as in the very simple example shown. The inputs may number in the thousands, which makes the method useful for proteins as well as for genomic (microarray) data.
The details of ANN are readily accessible in the literature (Refs 5, 14, 15 and many others). A brief account from our perspective will appear in Bradley et al (2008) (6). In essence the ANN model is built on a training set of data, from which it learns the definitive variables. If it has learned properly, it will correctly classify the output classes in a blind test.
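The train-then-blind-test cycle can be illustrated with the simplest possible network: a single logistic unit with no hidden layer, fitted by gradient descent. This is only a sketch under stated assumptions. A practical ANN would add hidden units and far more inputs, and the function names, protein values and class labels below are invented for illustration.

```python
import math


def predict_prob(w, b, x):
    """Output of one logistic unit for a vector of protein inputs."""
    z = b + sum(wj * xj for wj, xj in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))


def train_unit(X, y, epochs=2000, lr=0.5):
    """Fit the unit by batch gradient descent on the training set."""
    n_inputs = len(X[0])
    w, b = [0.0] * n_inputs, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_inputs, 0.0
        for xi, yi in zip(X, y):
            err = predict_prob(w, b, xi) - yi  # prediction error for this subject
            for j in range(n_inputs):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


# Made-up training data: three protein abundances per subject, 1 = stressed, 0 = control.
train_X = [[0.9, 0.2, 0.5], [0.8, 0.4, 0.4], [0.7, 0.1, 0.6], [0.85, 0.3, 0.5],
           [0.2, 0.3, 0.5], [0.1, 0.2, 0.4], [0.3, 0.4, 0.6], [0.15, 0.1, 0.5]]
train_y = [1, 1, 1, 1, 0, 0, 0, 0]

# Blind set: subjects the model never saw during training.
blind_X = [[0.75, 0.25, 0.5], [0.225, 0.25, 0.55]]
blind_y = [1, 0]

w, b = train_unit(train_X, train_y)
blind_pred = [1 if predict_prob(w, b, x) >= 0.5 else 0 for x in blind_X]
blind_accuracy = sum(p == t for p, t in zip(blind_pred, blind_y)) / len(blind_y)
print(blind_accuracy)
```

Keeping the blind subjects out of `train_unit` entirely is the point of the exercise: accuracy on data the model has already seen says little about whether it has learned the definitive variables.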
In a final step, currently in a pending patent application (Michael O’Neill), the proteins most potent in diagnosis can be identified by slightly disturbing the system one input at a time and observing the effect on the output values. The net result is a subset of highly predictive protein inputs usable in a simple assay based on corresponding capture molecules in a kit or array.
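The general idea of disturbing one input at a time can be sketched as follows. This is a generic sensitivity analysis written for illustration and is not the procedure in the patent application, whose details are not given here. The stand-in model, its weights, the step size `delta` and all names are assumptions.

```python
import math


def toy_model(x):
    """Hypothetical trained model: a fixed logistic unit over three protein inputs."""
    z = 4.0 * x[0] - 0.5 * x[1] + 0.0 * x[2] - 1.5
    return 1.0 / (1.0 + math.exp(-z))


def input_sensitivity(model, samples, delta=0.01):
    """Mean absolute change in model output when each input is nudged by `delta`."""
    n_inputs = len(samples[0])
    scores = []
    for j in range(n_inputs):
        total = 0.0
        for x in samples:
            nudged = list(x)
            nudged[j] += delta  # disturb only input j, leave the others unchanged
            total += abs(model(nudged) - model(x))
        scores.append(total / len(samples))
    return scores


# Made-up protein profiles for three subjects.
profiles = [[0.9, 0.2, 0.5], [0.4, 0.3, 0.6], [0.1, 0.2, 0.4]]

scores = input_sensitivity(toy_model, profiles)
# Input indices ordered from most to least influential on the output.
ranking = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
print(ranking, scores)
```

Inputs at the top of such a ranking would be the candidates for a reduced assay. Inputs whose perturbation leaves the output unchanged contribute nothing to the classification.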
Beyond ANN, multivariable statistics can be used to validate the results (17), or in some cases as a method to indicate whether there are significant differences among treatments in the proteomic responses (19, 20). Significance tests can be derived from multiple ANN models. However, ANN models assume no correlations among inputs, which may be relevant when constructing weighted sets of protein or genomic indicators.
In a thoughtful review Spiegelman et al (18) have discussed the potential of “chemometrics” and statistics in improving biomarker discovery. They document weaknesses in some studies using iterative methods, particularly in validation. These are also cited by Wagner (20). The flaws noted are clear, and quantitative methods are now seen to add accuracy to diagnostic proteomics.
Some background references on protein expression, genomic and proteomic analysis:
1. Anderson, T.J., Tchernyshyov, I., Diez, R., Cole, R.N., Geman, D., Dang, C.V. and Winslow, R.L. (2007). Discovering robust protein biomarkers for disease from relative expression reversals in 2-D DIGE data. Proteomics 7:1197-1207
2. Bond, J.A., C.M. Gonzalez and B.P. Bradley, (1993) Age-dependent expression of heat shock protein in D. magna. Comp. Phy. & Bioc. 106: 93-98.
3. Bradley, B.P. (1993) Measuring phenotypic and genetic variation, demonstrating adaptation. 27th European Marine Biology Symposium, Dublin, Ireland. Sept. 7-11, 1992. JAPAGA, Dublin pp. 3-14.
4. Bradley, B.P., J.-A. Bond, C.M. Gonzalez and B.E. Tepper, (1994) Complex mixture analysis using protein expression as a qualitative and quantitative tool. Env. Tox. and Chem. 13:1043-1050.
5. Bradley, B.P., Brown, D.C., Iamonte, T.N., Boyd, S.M. and O’Neill, M.C., (1996). Protein Patterns and Toxicity Identification, in “Biomarkers and Risk Assessment” ASTM STP 1306, David A. Bengtson and Diane S. Henshel, Eds., American Society for Testing and Materials, Philadelphia.
6. Bradley, B.P., B. Kalampanyl and M.C. O’Neill (2008) Protein Expression Profiling in Methods in Molecular Biology (Eds D. Sheahen and R. Ryther) Humana Press (To appear fall 2008)
7. Bradley, B.P., M.A. Lane and C.M. Gonzalez, (1992). A molecular mechanism of adaptation in an estuarine copepod. Neth. J. Sea Research, 30:1-6.
8. Bradley, B.P., E.A. Shrader, D.G. Kimmel and J.C. Meiller. (2002). Protein Expression Signatures: an application of proteomics. Marine Environmental Res. 54: 373-377.
9. Brown, D.C. and B.P. Bradley, (1995). Genetic and physiological regulation of HSP70. Marine Env. Res. 39: 181-184.
10. Choi, Y.L., Tsukasaki, K., O’Neill, M.C. et al (2007) A genomic analysis of adult T cell Leukemia. Oncogene 26: 1245-1255
11. Cowan, M.L. and J. Vera (2008) Proteomics: Advances in Biomarker Discovery. Expert Rev. Proteomics 5:21-23
12. Djavan, Bob et al (2002) Novel artificial neural network for early detection of prostate cancer. J. Clinical Oncology 20:921-929
13. Kimmel, D.G. and B.P. Bradley (2001). Temperature and salinity stress in Eurytemora affinis: Defining ecological limits using protein expression. J. Exp. Mar. Biol. Ecol. 266:135-149
14. Lancashire, Lee et al (2005) Using chemometrics and statistics to improve proteomics biomarker discovery. J. Proteome Research 5:461-462
15. O’Neill, M. and Song, L. (2003) Neural network analysis of lymphoma microarray data: prognosis and diagnosis near-perfect. BMC Bioinformatics 4:13-25
16. Shrader, E.A., Henry, T.R., Greeley, M.S., and Bradley, B.P. (2003) Proteomics in zebrafish exposed to endocrine disrupting chemicals. Ecotoxicology 12:485-488
17. Smit, Suzanne et al (2007) Assessing the validity of proteomics based biomarkers. Analytica Chimica Acta 592:210-217.
18. Spiegelman, C.H., Ruth Pfeiffer and Mitchell Gail (2005) Using chemometrics and statistics to improve proteomics biomarker discovery. J. Proteomics Res. 5:461-462
19. Urfer, W., Grzegorczyk, M. and Jung, K. (2006) Statistics for proteomics: A review of tools for analyzing experimental data. Practical Proteomics 1-2:48-55
20. Wagner, Michael et al (2004) Computational protein biomarker prediction: a case study for prostate cancer. BMC Bioinformatics 5:26
U.S. Patents 5149634 (Bioassay for Environmental Quality) (1992), 5250413 (Sublethal Bioassay for Environmental Quality) (1993) and 6653135 (Dynamic Protein Signature Assay) (2003)