From: "David Livingstone davel^_^chemquestuk.com"
To: CCL
Subject: CCL: Randomisation (and model performance checks)
Date: Wed, 01 Dec 2010 14:46:50 -0000

Sorry about the delay in posting this comment - my first attempt seems to have disappeared somewhere!

Yvonne nicely explained how to do Y scrambling and what it means (a check
for chance effects) but there are more things to do when assessing models.
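
As a rough illustration of the Y scrambling check mentioned above, here is a
minimal sketch. It assumes an ordinary least-squares model and synthetic
data; the function names (r_squared, y_scramble) and the number of rounds are
hypothetical choices for illustration, not anything taken from Yvonne's post.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an ordinary least-squares fit with an intercept term."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

def y_scramble(X, y, n_rounds=200, seed=0):
    """Refit the model on randomly permuted responses (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    r2_true = r_squared(X, y)
    # Each round breaks the X-y relationship but keeps the y distribution.
    r2_perm = np.array([r_squared(X, rng.permutation(y))
                        for _ in range(n_rounds)])
    # Fraction of scrambled fits that do at least as well as the real one.
    p_value = (np.sum(r2_perm >= r2_true) + 1) / (n_rounds + 1)
    return r2_true, r2_perm, p_value

# Synthetic data (an assumption): 30 compounds, 3 descriptors, real signal.
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.3, size=30)

r2_true, r2_perm, p_value = y_scramble(X, y)
print(f"R2 (real y) = {r2_true:.3f}")
print(f"R2 (scrambled y), mean = {r2_perm.mean():.3f}, max = {r2_perm.max():.3f}")
print(f"empirical p-value = {p_value:.4f}")
```

If the scrambled fits come close to the real one, the original model may
owe much of its apparent quality to chance.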

Cross-validation (leave-one-out or leave-many-out) gives some measure of
fitting performance but not of prediction. The Model Applicability Domain
(AD) gives some idea of how reliable an individual prediction can be
expected to be. Division of the data into sets (e.g. training and test)
gives some general measures of modelling performance.
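
The three checks can be sketched together as below. This is only a sketch
under stated assumptions: an ordinary least-squares model, synthetic data, a
single training/test split, and leverage as the AD measure with the commonly
used warning limit h* = 3(p+1)/n. Leverage is just one of several ways of
defining an AD, and all function names here are hypothetical.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares coefficients, intercept first."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_ols(coef, X):
    return coef[0] + X @ coef[1:]

def q2_loo(X, y):
    """Leave-one-out cross-validated q^2 = 1 - PRESS / SS_total."""
    n = len(y)
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i
        coef = fit_ols(X[keep], y[keep])
        press += (y[i] - predict_ols(coef, X[i:i + 1])[0]) ** 2
    return 1.0 - press / np.sum((y - y.mean()) ** 2)

def leverages(X_train, X_query):
    """Leverage of each query row relative to the training descriptors."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    Q = np.column_stack([np.ones(len(X_query)), X_query])
    G = np.linalg.inv(A.T @ A)
    return np.einsum("ij,jk,ik->i", Q, G, Q)

# Synthetic data (an assumption): 40 compounds, 3 descriptors.
rng = np.random.default_rng(2)
X = rng.normal(size=(40, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.3, size=40)

# Division into sets: first 30 rows for training, last 10 held out.
X_train, y_train = X[:30], y[:30]
X_test, y_test = X[30:], y[30:]

coef = fit_ols(X_train, y_train)
resid_fit = y_train - predict_ols(coef, X_train)
r2_fit = 1.0 - (resid_fit @ resid_fit) / np.sum((y_train - y_train.mean()) ** 2)
q2 = q2_loo(X_train, y_train)

# External R^2 on the held-out set, referenced to the training mean.
resid_test = y_test - predict_ols(coef, X_test)
r2_test = 1.0 - (resid_test @ resid_test) / np.sum((y_test - y_train.mean()) ** 2)

# Applicability domain: flag query compounds with leverage above h*.
h_star = 3 * (X_train.shape[1] + 1) / len(X_train)
far_compound = np.array([[10.0, 10.0, 10.0]])  # deliberately far from training data
lev_far = leverages(X_train, far_compound)[0]

print(f"R2 fit = {r2_fit:.3f}, q2 (LOO) = {q2:.3f}, R2 test = {r2_test:.3f}")
print(f"h* = {h_star:.2f}, leverage of far compound = {lev_far:.2f}")
```

The q2 value only reuses the training compounds, whereas the held-out set
and the leverage check say something about compounds the model has not seen.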

I recently updated a handbook of data modelling (A Practical Guide to
Scientific Data Analysis, Wiley, Nov. 2009) which covers some of these
topics and also shows how to use a lot of multivariate methods. Check out
my website for contents (www.chemquestuk.com) and the Wiley website for
excerpts or to order!
(http://eu.wiley.com/WileyCDA/WileyTitle/productCd-0470851538,descCd-tableOfContents.html).

Cheers,

 Dave.


--
D.J. Livingstone                ChemQuest
                       Delamere House, 1 Royal Crescent,
                       Sandown. Isle of Wight UK PO36 8LZ

Phone: +44 (0)1983 406832
e-mail davel^chemquestuk.com     www.chemquestuk.com
------------------------------------------------------------------
  