CCL: Randomisation (and model performance checks)



Sorry about the delay in posting this comment - my first attempt seems to have disappeared somewhere!

Yvonne nicely explained how to do Y-scrambling and what it tells you (it is
a check for chance effects), but there is more to do when assessing models.
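
For anyone who wants to try the scrambling check, here is a minimal sketch
in Python. It is only an illustration: the data are made up, the "model" is
a one-descriptor straight-line fit, and the names r_squared and y_scramble
are my own, not taken from any package.

```python
import random

def r_squared(x, y):
    """Squared Pearson correlation, i.e. R^2 of a one-descriptor linear fit."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

def y_scramble(x, y, n_perm=200, seed=0):
    """Return the real R^2 and the R^2 values from n_perm shuffled responses."""
    rng = random.Random(seed)
    real = r_squared(x, y)
    scrambled = []
    for _ in range(n_perm):
        y_perm = y[:]            # copy so the original order is untouched
        rng.shuffle(y_perm)      # break the link between x and y
        scrambled.append(r_squared(x, y_perm))
    return real, scrambled

# Made-up data (an assumption for illustration): y depends on x plus noise
data_rng = random.Random(1)
x = [float(i) for i in range(20)]
y = [2.0 * xi + data_rng.gauss(0.0, 3.0) for xi in x]

real_r2, scrambled_r2 = y_scramble(x, y)
# Fraction of scrambled fits that match or beat the real one
frac_as_good = sum(r2 >= real_r2 for r2 in scrambled_r2) / len(scrambled_r2)
print(f"real R^2 = {real_r2:.3f}, scrambled >= real: {frac_as_good:.3f}")
```

If a noticeable fraction of the scrambled fits do as well as the real one,
the original model may owe a lot to chance.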

Cross-validation (leave-one-out or leave-many-out) gives some measure of
fitting performance, but not of prediction. The model Applicability Domain
(AD) gives some idea of how reliable an individual prediction can be
expected to be. Division of the data into sets (e.g. training and test
sets) gives some general measures of modelling performance.
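
A rough sketch of these three checks in Python, again with made-up data
and a one-descriptor linear model. The function names are hypothetical,
the "domain" check is the crudest possible one (is the new descriptor
value inside the training range?), and the test-set R^2 uses the test-set
mean as its baseline, which is only one of several definitions in use.

```python
import random

def fit_line(x, y):
    """Ordinary least-squares slope and intercept for one descriptor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def loo_q2(x, y):
    """Leave-one-out cross-validated q^2 = 1 - PRESS / SS_total."""
    n = len(x)
    my = sum(y) / n
    press = 0.0
    for i in range(n):
        # Refit without compound i, then predict compound i
        slope, intercept = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (slope * x[i] + intercept)) ** 2
    ss_total = sum((b - my) ** 2 for b in y)
    return 1.0 - press / ss_total

def external_r2(x_tr, y_tr, x_te, y_te):
    """R^2 on held-out compounds (baseline: mean of the test-set responses)."""
    slope, intercept = fit_line(x_tr, y_tr)
    mean_te = sum(y_te) / len(y_te)
    ss_res = sum((b - (slope * a + intercept)) ** 2 for a, b in zip(x_te, y_te))
    ss_tot = sum((b - mean_te) ** 2 for b in y_te)
    return 1.0 - ss_res / ss_tot

def in_domain(x_new, x_train):
    """Crude applicability-domain check: inside the training descriptor range?"""
    return min(x_train) <= x_new <= max(x_train)

# Made-up data (an assumption for illustration)
rng = random.Random(42)
x_all = [0.5 * i for i in range(30)]
y_all = [1.5 * xi + rng.gauss(0.0, 1.0) for xi in x_all]

# Random division into a training set (20) and a test set (10)
idx = list(range(30))
rng.shuffle(idx)
train, test = idx[:20], idx[20:]
x_tr, y_tr = [x_all[i] for i in train], [y_all[i] for i in train]
x_te, y_te = [x_all[i] for i in test], [y_all[i] for i in test]

q2 = loo_q2(x_tr, y_tr)
r2_ext = external_r2(x_tr, y_tr, x_te, y_te)
print(f"LOO q^2 = {q2:.3f}, test-set R^2 = {r2_ext:.3f}")
print("7.0 in domain:", in_domain(7.0, x_tr))
print("50.0 in domain:", in_domain(50.0, x_tr))
```

Real applicability-domain methods (leverage, distance to the training set
and so on) are more refined than a simple range check, but the idea is the
same: flag predictions for compounds unlike those used to build the model.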

I recently updated a handbook of data modelling (A Practical Guide to
Scientific Data Analysis, Wiley, Nov. 2009) which covers some of these
topics and also shows how to use a lot of multivariate methods. Check out
my website for contents (www.chemquestuk.com) and the Wiley website for
excerpts or to order:
http://eu.wiley.com/WileyCDA/WileyTitle/productCd-0470851538,descCd-tableOfContents.html

Cheers,

 Dave.


--
D.J. Livingstone                ChemQuest
                       Delamere House, 1 Royal Crescent,
                       Sandown. Isle of Wight UK PO36 8LZ

Phone: +44 (0)1983 406832
e-mail davel^chemquestuk.com     www.chemquestuk.com
------------------------------------------------------------------