*From*: d.turner -AatT- sheffield.ac.uk
*Subject*: Re: CCL:cross-validation and prediction with PLS
*Date*: Thu, 9 Sep 1999 13:50:33 +0100

Dear Dr Winkler

I have noted your comments with interest. I am very curious about your

suggestion that Bayesian NN QSAR methods don't (may not) require

validation. That cross-validation (CV) may not be needed I can understand,

but NO validation at all? How do you know that your model has any value

without some external criterion? Or have I missed something here?

I have some more questions/comments:-

>

> 1. What is the best way to choose a validation set (or test for that matter) is

> leave-one-out, leave-N-out etc?

As far as I know one doesn't use CV (LOO/LNO) to choose validation/test sets.
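
To illustrate the distinction, here is a minimal Python sketch. The function names `split_external` and `loo_folds`, the 20% test fraction and the random selection are all illustrative assumptions, not anything taken from this discussion; random selection simply stands in for whatever selection method (clustering, experimental design) one prefers. The external test set is set aside once, and the leave-one-out folds are then drawn only from the remaining training compounds.

```python
import random

def split_external(n_compounds, test_fraction=0.2, seed=0):
    """Set aside an external test set once, before any modelling."""
    idx = list(range(n_compounds))
    random.Random(seed).shuffle(idx)
    n_test = max(1, round(n_compounds * test_fraction))
    # Returns (training indices, external test indices).
    return sorted(idx[n_test:]), sorted(idx[:n_test])

def loo_folds(train_idx):
    """Leave-one-out folds drawn from the training compounds only."""
    for left_out in train_idx:
        yield [i for i in train_idx if i != left_out], left_out

train, test = split_external(20)
folds = list(loo_folds(train))
print(len(train), "training compounds,", len(test), "external test compounds")
```

CV here only ever sees the training compounds; the external set plays no part until a final model exists.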

> 2. Should clustering be used to choose a representative test set and, if so,

experimental design as well as what is more usually called clustering?

> number of compounds and the square of the number of independent variables,

> which has important implications for large data sets (eg combichem/HTS data)

> models, all slightly different. Which is the 'best' or 'true' model?

PLS latent variables (LV) to use in a final all-observations-included analysis the

regression coefficients from which predictions can be made. Or is your

"N" an LV indicator?

I look forward to hearing your comments
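
For concreteness, the use of CV to pick the number of PLS latent variables before a final all-observations analysis could be sketched as below. This is only a minimal illustration under stated assumptions: a single-response PLS1 fitted by NIPALS-style deflation in plain Python, LOO q2 as the selection criterion, and an invented noise-free toy data set. The names `pls1_fit`, `pls1_predict`, `loo_q2` and `choose_lv_and_refit` are hypothetical, and predictions are made by sequential deflation rather than from an explicit regression-coefficient vector (the two give the same predictions).

```python
def pls1_fit(X, y, n_lv):
    """Fit PLS1 with up to n_lv latent variables (NIPALS-style deflation)."""
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(m)] for row in X]  # centred X
    f = [v - y_mean for v in y]                                # centred y
    comps = []
    for _ in range(n_lv):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(m)]
        norm = sum(v * v for v in w) ** 0.5
        if norm < 1e-12:  # nothing left in y that X can explain
            break
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(m)) for i in range(n)]
        tt = sum(v * v for v in t)
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(m)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        comps.append((w, p, q))
    return x_mean, y_mean, comps

def pls1_predict(model, x):
    """Predict one compound by deflating its descriptor vector LV by LV."""
    x_mean, y_mean, comps = model
    e = [xj - mj for xj, mj in zip(x, x_mean)]
    y_hat = y_mean
    for w, p, q in comps:
        t = sum(ej * wj for ej, wj in zip(e, w))
        y_hat += q * t
        e = [ej - t * pj for ej, pj in zip(e, p)]
    return y_hat

def loo_q2(X, y, n_lv):
    """Leave-one-out q2 = 1 - PRESS / SS for a given number of LVs."""
    n = len(X)
    y_mean = sum(y) / n
    press = 0.0
    for i in range(n):
        model = pls1_fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:], n_lv)
        press += (y[i] - pls1_predict(model, X[i])) ** 2
    return 1.0 - press / sum((v - y_mean) ** 2 for v in y)

def choose_lv_and_refit(X, y, max_lv):
    """Use LOO only to pick the LV count, then refit on all observations."""
    q2 = {a: loo_q2(X, y, a) for a in range(1, max_lv + 1)}
    best = max(q2, key=q2.get)
    return best, q2, pls1_fit(X, y, best)

# Invented toy data: 8 compounds, 3 descriptors, noise-free linear response.
X = [[1, 2, 0], [2, 1, 1], [3, 5, 2], [4, 3, 1],
     [5, 8, 3], [6, 4, 0], [7, 9, 2], [8, 6, 4]]
y = [2 * a - b + 0.5 * c + 1 for a, b, c in X]

best_lv, q2_by_lv, final_model = choose_lv_and_refit(X, y, max_lv=3)
print("LVs chosen by LOO:", best_lv)
```

Note that the N leave-one-out models are discarded in this scheme; they serve only to choose the LV count, and the single model used for prediction is the one refitted on all observations.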

Regards

Dave Turner

> regularized neural nets for QSAR. We find that they overcome virtually all of

> the problems with PLS QSAR models as they give the single statistically best

> model possible for the data set. In addition there are good theoretical reasons

> why they do not require cross validation or test sets. We are investigating this

> for QSAR and preliminary results suggest that this is the case.

>

> We have published some of this work recently:

>

> [74] New QSAR Methods Applied to Structure-Activity Mapping and

> Combinatorial Chemistry, Burden, F.R. and Winkler, D.A. J. Chem. Inf.

> Comput. Sci. 39, 236 (1999).

> [75] The Computer Simulation of High Throughput Screening of Bioactive

> Molecules, F.R. Burden, D.A. Winkler, in Molecular Modelling and Prediction

> of Bioactivity (K. Gundertofte and F.S. Jorgensen eds), Plenum Press 1998.

> [80] Robust QSAR Models Using Bayesian Regularised Artificial Neural

> Networks, Burden, F.R. and Winkler, D.A. J. Med. Chem. 42(16), 3183-3187 (1999).

> [81] A QSAR Model for the Acute Toxicity of Substituted Benzenes towards

> Tetrahymena Pyriformis using Bayesian Regularized Neural Networks. F R.

> Burden* David A. Winkler, Chem. Res. Toxicol., in press.

> [82] Robust QSAR Models from Novel Descriptors and Bayesian Regularized

> Neural Networks, Winkler, D.A, Burden, F.R. Mol. Simul. 1999 in press.

> [87] Do QSAR Models using Bayesian Regularized Artificial Neural Networks

> Really Need Validation? Winkler, D.A. and Burden, F.R. J. Chem. Inf.

> Comput. Sci., in preparation.

>

> Cheers,

>

> Dave

>

> Dr. David A. Winkler Email: dave.winkler -AatT- molsci.csiro.au

> Senior Principal Research Scientist Voice: 61-3-9545-2477

> CSIRO Molecular Science Fax: 61-3-9545-2446

> Private Bag 10,Clayton South MDC 3169 http://www.csiro.au

> Australia http://www.molsci.csiro.au

>

Dr David Turner
Dept of Information Studies, Sheffield University
Sheffield, S10 2TN, UK
Tel. 0114 2 222 650
E-mail: D.Turner -AatT- sheffield.ac.uk
Fax: 0114 2 780 300