From owner-chemistry@ccl.net Mon Mar 19 07:19:01 2007
From: "Alex Allardyce aa() chemaxon.com"
To: CCL
Subject: CCL: Second call for papers: ChemAxon's 2007 User Group Meeting and Training Day, June 13-14 and June 12, Budapest, Hungary
Message-Id: <-33830-070319071411-24329-NTUNpc5PjXoJlHlL7cdmaA]![server.ccl.net>
X-Original-From: Alex Allardyce
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Date: Mon, 19 Mar 2007 07:13:53 -0400
MIME-Version: 1.0

Sent to CCL by: Alex Allardyce [aa##chemaxon.com]
Hi,

Please excuse multiple postings.

ChemAxon's 2007 User Group Meeting will be held on Wednesday and Thursday, June 13-14, at the Gellert Hotel Spa in Budapest, Hungary. The meeting will be preceded by a training day on June 12.

The User Group Meeting will feature oral and poster presentations from ChemAxon users, scientists and developers, as well as several social events where attendees can mix and speak with staff and peers.

We are inviting those interested in presenting at the meeting to submit abstracts for review. The deadline for receipt of abstracts is March 26th for oral presentations and June 1st for posters. To find out more and submit your abstract please visit:

http://www.chemaxon.com/UGM/07/index.html

The training day will give hands-on experience with ChemAxon end user applications and take attendees through common tasks in the discovery and optimization process. To close the training day we will hold an open "Live FAQ" session for all meeting attendees.

We hope you can participate in what has proven to be a most interesting and enjoyable cheminformatics event.

See you there.
Alex

--
Alex Allardyce
Dir. Marketing, ChemAxon
Cell-US: 1-857-544-0541
skype: alex_allardyce

From owner-chemistry@ccl.net Mon Mar 19 08:40:01 2007
From: "Cross, Simon scross{=}tripos.com"
To: CCL
Subject: CCL: Babel problem (?)
Message-Id: <-33831-070319055404-18010-W6JwP/u5Itz7S7Bx7y0iGQ * server.ccl.net>
X-Original-From: "Cross, Simon"
Content-class: urn:content-classes:message
Content-Transfer-Encoding: 8bit
Content-Type: text/plain; charset="US-ASCII"
Date: Mon, 19 Mar 2007 04:21:55 -0500
MIME-Version: 1.0

Sent to CCL by: "Cross, Simon" [scross_-_tripos.com]
Martin,

I'm not sure what the Babel problem is, but a simple alternative would be to import the SDF directly into the SYBYL spreadsheet. Even simpler would be to select SDF as the ligand input file format from the docking interface.

If it is command-line translation you are after, SYBYL's dbtranslate utility should also be available to you.

Regards,
Simon

------------------------------------
Dr Simon Cross
Product Manager
Tripos UK Ltd
scross-#-tripos.com
Tel +44 1908 650021
Mob +44 7980 572278
Fax +44 1908 650001
------------------------------------
SYBYL, UNITY, and other products are exclusive property of Tripos & are protected by trademarks, copyrights, & patents as appropriate. The information in this e-mail is confidential and may have associated legal obligations. It is intended for the exclusive attention of the addressee stated above and should not be copied or disclosed to any other. If you have received this transmission in error, please contact the sender.

-----Original Message-----
> From: owner-chemistry-#-ccl.net [mailto:owner-chemistry-#-ccl.net]
Sent: Friday, March 16, 2007 5:16 PM
To: Cross, Simon
Subject: CCL: Babel problem (?)

Sent to CCL by: "Martin Lindh" [martinlindh-$-yahoo.se]
Hi

I prepare ligands using LigPrep (Maestro). Since I want to dock these ligands in FlexX (Sybyl), I export the ligands from Maestro as a .sdf file. I then use Babel to convert the .sdf file to a multi .mol2 file. The .mol2 file is then imported into Sybyl.

During this process some(!) of the ligands get strange mol2 formatting.

Part of a molecule that is translated correctly into the mol2-file:

@<TRIPOS>ATOM
      1 P1     -7.7746    0.3497    0.1484 P.3     1 <1>     0.1085
      2 O1     -7.2207    1.7666   -0.1209 O.2     1 <1>    -0.4607
      3 O2     -7.9571   -0.5515   -1.0931 O.co2   1 <1>    -0.3575

Part of a molecule that is NOT translated correctly in mol2-file:

     28 C      14.2736    6.1106   -1.8492 C.ar    0 UNK0    0.1493
     29 N      14.3973    5.3090   -0.7250 N.ar    0 UNK0   -0.2079
     30 N      13.7198    8.3248   -1.0353 N.pl3   0 UNK0   -0.3413
     31 C1*    15.1231    2.8403   -3.4280 C.3     1 UNK1    0.1683
     32 O4*    15.0326    3.1353   -4.7858 O.3     1 UNK1   -0.3456
     33 C4*    13.7316    2.8114   -5.2536 C.3     1 UNK1    0.1126

Has anyone seen this problem, and do you have suggestions for what to do?

Thank you in advance
Martin

http://www.ccl.net/cgi-bin/ccl/send_ccl_message
http://www.ccl.net/chemistry/sub_unsub.shtml
http://www.ccl.net/spammers.txt

From owner-chemistry@ccl.net Mon Mar 19 09:15:02 2007
From: "Igor Novak inovak[]csu.edu.au"
To: CCL
Subject: CCL:G: Gaussian 03 under SuSE linux 10.1
Message-Id: <-33832-070319023722-2665-5cY3l86rit5A2Rgsg74OWQ^server.ccl.net>
X-Original-From: "Igor Novak"
Date: Mon, 19 Mar 2007 02:37:18 -0400

Sent to CCL by: "Igor Novak" [inovak##csu.edu.au]
Dear netters,

I am running Gaussian 03 under SuSE 64-bit linux 10.1. My new system has 4GB RAM, an AMD64 Athlon CPU and a 300GB HD. A G3B3 job on a molecule containing 7 heavy atoms (a diazide) goes through OK. When I add two more azide groups (a tetraazide), the job starts to access the HD continuously (the light is on all the time) or aborts without any error message in the output. The same tetraazide input on an older PC with identical specifications (300GB HD, 4GB RAM, SuSE linux 10.1) goes through OK.

Has anyone else experienced similar hardware problems?
I would be grateful for any suggestions.

Regards
I.Novak
Charles Sturt University
Orange NSW, Australia

From owner-chemistry@ccl.net Mon Mar 19 10:36:01 2007
From: "Geoffrey Hutchison grh25]*[cornell.edu"
To: CCL
Subject: CCL: Babel problem (?)
Message-Id: <-33833-070319102843-9394-n7GvuWHlafkbL+ZVFjvEpw---server.ccl.net>
X-Original-From: Geoffrey Hutchison
Content-Transfer-Encoding: 8bit
Content-Type: text/plain; charset=ISO-8859-1; delsp=yes; format=flowed
Date: Mon, 19 Mar 2007 10:28:07 -0400
Mime-Version: 1.0 (Apple Message framework v752.2)

Sent to CCL by: Geoffrey Hutchison [grh25:_:cornell.edu]
> I prepare ligands using LigPrep (Maestro). Since I want to dock these
> ligands in FlexX (Sybyl), I export the ligands from Maestro as a .sdf
> file. I then use Babel to convert the .sdf file to a multi .mol2 file.
> The .mol2 file is then imported into Sybyl.
>
> During this process some(!) of the ligands get strange mol2 formatting.

You don't mention what version of Babel you're using, but I'll take a guess that it's some version of Open Babel. (Old versions of babel don't even attempt to assign chain/residue information.)

There are several ways to report problems with Open Babel. We have our own mailing lists, and you can easily post questions here:
http://openbabel.sourceforge.net/wiki/Mailing_lists

In this case, it looks like a bug, so I strongly suggest you report it to the bug tracker (ideally with a test file):
http://sourceforge.net/tracker/?atid=428740&group_id=40728&func=browse

My guess is that this is related to a bug I fixed last week, but it would be very helpful to have a file to test.

Thanks,
-Geoff

--
-Dr. Geoffrey Hutchison
Cornell University, Department of Chemistry and Chemical Biology
Abruña Group
http://abruna.chem.cornell.edu/

From owner-chemistry@ccl.net Mon Mar 19 11:15:01 2007
From: "Kalaiselvan Anbarasan kalaianbaccl ~ gmail.com"
To: CCL
Subject: CCL: ADF - error
Message-Id: <-33834-070319111211-26972-xbItW1+VDHdwJx0tg2R2Lg__server.ccl.net>
X-Original-From: "Kalaiselvan Anbarasan"
Date: Mon, 19 Mar 2007 11:12:07 -0400

Sent to CCL by: "Kalaiselvan Anbarasan" [kalaianbaccl---gmail.com]
Hello,

In ADF, during a numerical frequency calculation, I got the error MakeAOIndex after a few geometry updates.
Can anyone tell me what kind of error this is and how to overcome it?

Thank you in advance,

Kalaiselvan

From owner-chemistry@ccl.net Mon Mar 19 12:17:01 2007
From: "Dan Maftei dan.maftei/a\chem.uaic.ro"
To: CCL
Subject: CCL:G: Gaussian 03 under SuSE linux 10.1
Message-Id: <-33835-070319105642-19190-d/2jGWKT4UoMSAmG4rFWjw[]server.ccl.net>
X-Original-From: Dan Maftei
Content-Type: multipart/mixed; boundary="------------040902080705030404050303"
Date: Mon, 19 Mar 2007 16:54:26 +0200
MIME-Version: 1.0

Sent to CCL by: Dan Maftei [dan.maftei{}chem.uaic.ro]
This is a multi-part message in MIME format.
--------------040902080705030404050303
Content-Type: text/plain; charset=ISO-8859-2; format=flowed
Content-Transfer-Encoding: 7bit

On the 64bit machine (the one generating the problem) try "dmesg". It may be a G03/l1.exe problem (check for segfaults, invalid instructions etc.).
If g03 was compiled with the AMD64 switch on (i.e. pgf77 -tp k8-64) you may encounter this kind of problem.

Igor Novak inovak[]csu.edu.au wrote:
> Sent to CCL by: "Igor Novak" [inovak##csu.edu.au]
> Dear netters,
> I am running Gaussian 03 under SuSE 64-bit linux 10.1. My new system has 4GB RAM, AMD64 Athlon CPU, 300GB HD. When running G3B3 job containing 7 heavy atoms (diazide) it goes through OK. When I add two more azide groups (tetraazide) the job starts to access HD continuously (light is on all the time) or aborts without any error message in the output. The same tetraazide input on an older PC with identical specifications 300GB HD, 4GB RAM, SuSE linux 10.1, goes through OK.
> Has anyone else experienced similar hardware problems?
> I would be grateful for any suggestions.
>
> Regards
> I.Novak
> Charles Sturt University
> Orange NSW, Australia

--------------040902080705030404050303
Content-Type: text/x-vcard; charset=utf-8; name="dan.maftei.vcf"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="dan.maftei.vcf"

begin:vcard
fn:Dan Maftei
n:Maftei;Dan
org:"Alexandru Ioan Cuza" University;Physical, Theoretical and Materials Chemistry
adr:Nr. 11;;Bd. Carol 1, ;Iasi;;700506;Romania
email;internet:dan.maftei=chem.uaic.ro
title:Faculty of Chemistry
tel;work:+40232-201307
tel;cell:+40740-262227
x-mozilla-html:FALSE
url:http://www.chem.uaic.ro/~danmaftei
version:2.1
end:vcard

--------------040902080705030404050303--

From owner-chemistry@ccl.net Mon Mar 19 14:40:00 2007
From: "Jozsef Csontos jozsefcsontos * creighton.edu"
To: CCL
Subject: CCL:G: Gaussian 03 under SuSE linux 10.1
Message-Id: <-33836-070319105240-17163-jG+2aJsEGotFu9VG/hR7PA-#-server.ccl.net>
X-Original-From: Jozsef Csontos
Content-Transfer-Encoding: 7bit
Content-Type: text/plain
Date: Mon, 19 Mar 2007 09:52:21 -0500
Mime-Version: 1.0

Sent to CCL by: Jozsef Csontos [jozsefcsontos=creighton.edu]
Hi,

if your older configuration is a 32bit system, then the difference is that the 64bit one needs twice the memory for the same job because of the 64bit memory addressing. This could be the source of the intensive hard disk I/O.
This would be my guess, I hope it helps,

Jozsef

On Mon, 2007-03-19 at 02:37 -0400, Igor Novak inovak[]csu.edu.au wrote:
> Sent to CCL by: "Igor Novak" [inovak##csu.edu.au]
> Dear netters,
> I am running Gaussian 03 under SuSE 64-bit linux 10.1. My new system has 4GB RAM, AMD64 Athlon CPU, 300GB HD. When running G3B3 job containing 7 heavy atoms (diazide) it goes through OK. When I add two more azide groups (tetraazide) the job starts to access HD continuously (light is on all the time) or aborts without any error message in the output. The same tetraazide input on an older PC with identical specifications 300GB HD, 4GB RAM, SuSE linux 10.1, goes through OK.
> Has anyone else experienced similar hardware problems?
> I would be grateful for any suggestions.
>
> Regards
> I.Novak
> Charles Sturt University
> Orange NSW, Australia

--
Jozsef Csontos, Ph.D. (jozsefcsontos_at_creighton.edu)
Department of Biomedical Sciences
Creighton University, Omaha, NE

From owner-chemistry@ccl.net Mon Mar 19 15:55:02 2007
From: "Stan van Gisbergen vangisbergen(~)scm.com"
To: CCL
Subject: CCL: ADF - error
Message-Id: <-33837-070319131622-25168-8jiVtsqljuZtRc5Gs2Jbrw+*+server.ccl.net>
X-Original-From: Stan van Gisbergen
Content-Type: multipart/alternative; boundary=Apple-Mail-135-592645138
Date: Mon, 19 Mar 2007 17:14:17 +0100
Mime-Version: 1.0 (Apple Message framework v752.2)

Sent to CCL by: Stan van Gisbergen [vangisbergen-$-scm.com]
--Apple-Mail-135-592645138
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed

Dear Kalaiselvan,

Please send any ADF-related questions to our support E-mail address support++scm.com or post them on the ADF user forum on our website www.scm.com. You will then quickly get an answer.

Thank you.

Best regards,

Stan van Gisbergen

On Mar 19, 2007, at 4:12 PM, Kalaiselvan Anbarasan kalaianbaccl ~ gmail.com wrote:

> Sent to CCL by: "Kalaiselvan Anbarasan" [kalaianbaccl---gmail.com]
> Hello,
> In ADF, during the numerical frequency calculation after few
> geometry updates i got the error as MakeAOIndex.
> Can anyone suggest me, the type of error and how to overcome it.
>
> Thank you in advance,
>
> Kalaiselvan
>
> -= This is automatically added to each message by the mailing script =-
> To recover the email address of the author of the message, please change
> the strange characters on the top line to the ++ sign. You can also
> look up the X-Original-From: line in the mail header.
>
> E-mail to subscribers: CHEMISTRY++ccl.net or use:
>       http://www.ccl.net/cgi-bin/ccl/send_ccl_message
>
> E-mail to administrators: CHEMISTRY-REQUEST++ccl.net or use
>       http://www.ccl.net/cgi-bin/ccl/send_ccl_message
>
> Subscribe/Unsubscribe:
>       http://www.ccl.net/chemistry/sub_unsub.shtml
>
> Before posting, check wait time at: http://www.ccl.net
>
> Job: http://www.ccl.net/jobs
> Conferences: http://server.ccl.net/chemistry/announcements/conferences/
>
> Search Messages: http://www.ccl.net/htdig (login: ccl, Password: search)
>
> If your mail bounces from CCL with 5.7.1 error, check:
>       http://www.ccl.net/spammers.txt
>
> RTFI: http://www.ccl.net/chemistry/aboutccl/instructions/
>
> -+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Dr. S.J.A. van Gisbergen
Scientific Computing & Modelling NV
Theoretical Chemistry, Vrije Universiteit
De Boelelaan 1083
1081 HV Amsterdam
The Netherlands
vangisbergen++scm.com
http://www.scm.com
T: +31-20-5987626
F: +31-20-5987629

--Apple-Mail-135-592645138--

From owner-chemistry@ccl.net Mon Mar 19 16:30:01 2007
From: "aa aa**chemaxon.hu"
To: CCL
Subject: CCL: program to split sdf file
Message-Id: <-33838-070319112907-8026-QTFxv2D0Qumo+vtFwUoKQQ,server.ccl.net>
X-Original-From: aa
Content-Type: multipart/alternative; boundary="------------010709000102070601030800"
Date: Mon, 19 Mar 2007 10:36:18 -0400
MIME-Version: 1.0

Sent to CCL by: aa [aa : chemaxon.hu]
This is a multi-part message in MIME format.
--------------010709000102070601030800
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

Hi,

Unix has loads of ways of doing this, some of which have already been suggested.
But if you want a more graphical way of doing things, you might want to look at ChemAxon's Instant JChem (http://www.chemaxon.com/product/ijc.html), which lets you easily view and query chemical data imported from an SD file.

The data can be queried (in your case, just for IDs 1-3,000 etc., but much more complex queries are possible) and the results then exported to an SD file.

And yes, it is free!
Alex

Fan,Huajun hjfan^^^pvamu.edu wrote:
>
> Hi, Does anyone know any programs (preferably free) that can split a
> big sdf file into smaller files? I got a sdf file containing 30,000
> molecules and want to do a DOCK5. It is too big even to read it
> through. I want to split it into 10 smaller files that contains 3,000
> each. Is it possible? The newest version of Babel seems not available
> of this split function for SDF format.
>
> Thanks in advance.
>
> **Hua-Jun **
>

--------------010709000102070601030800--

From owner-chemistry@ccl.net Mon Mar 19 20:31:00 2007
From: "Iain MacDougall iain.macdougall##studentmail.newcastle.edu.au"
To: CCL
Subject: CCL: comparing database results
Message-Id: <-33839-070319202054-27459-0TPh7aCun2NxYKmI5//6Yg:+:server.ccl.net>
X-Original-From: "Iain MacDougall"
Date: Mon, 19 Mar 2007 20:20:50 -0400

Sent to CCL by: "Iain MacDougall" [iain.macdougall\a/studentmail.newcastle.edu.au]
I have two sets of compounds from in silico database searching. I have compared the compounds and found that none are identical; however, I wish to find similar hits as well. I can do this by eye for small numbers of hits, but this will be impossible for some of my larger hitlists. Does anyone know of freeware that will do some kind of structural comparison/similarity searching for me?
Thanks for your help!

From owner-chemistry@ccl.net Mon Mar 19 22:13:01 2007
From: "Wolf-D. Ihlenfeldt wdi{:}xemistry.com"
To: CCL
Subject: CCL: comparing database results
Message-Id: <-33840-070319220912-26233-Zl0gDvQ6g/4cmzGe+c/ohg*server.ccl.net>
X-Original-From: "Wolf-D. Ihlenfeldt"
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset="us-ascii"
Date: Mon, 19 Mar 2007 22:08:48 -0400
MIME-Version: 1.0

Sent to CCL by: "Wolf-D. Ihlenfeldt" [wdi~~xemistry.com]
Hi Iain,

that can easily be done with a small CACTVS script (CACTVS is free for academic use). In what format do you have your data, and what kind of similarity criterion do you want to use?

> -----Original Message-----
> From: owner-chemistry_-_ccl.net [mailto:owner-chemistry_-_ccl.net]
> Sent: Monday, March 19, 2007 8:21 PM
> To: Ihlenfeldt, W.d.
> Subject: CCL: comparing database results
>
> Sent to CCL by: "Iain MacDougall"
> [iain.macdougall\a/studentmail.newcastle.edu.au]
> I have two sets of compounds from in silico database
> searching. I have compared the compounds and have found that
> none are identical, however I wish to find similar hits as
> well. I can do this by eye for small numbers of hits but this
> will be impossible for some of my larger hitlists. Does
> anyone know of freeware that will do some kind of structural
> comparison/similarity searching for me?
> Thanks for your help!

From owner-chemistry@ccl.net Mon Mar 19 22:47:01 2007
From: "Brunsteiner, Michael micb+*+uic.edu"
To: CCL
Subject: CCL: comparing database results
Message-Id: <-33841-070319212448-20660-FHsR0kEfpNMolHKgY9NF9g**server.ccl.net>
X-Original-From: "Brunsteiner, Michael"
Content-Transfer-Encoding: 8bit
Content-Type: text/plain;charset=iso-8859-1
Date: Mon, 19 Mar 2007 20:24:28 -0500 (CDT)
MIME-Version: 1.0

Sent to CCL by: "Brunsteiner, Michael" [micb^_^uic.edu]
On Mon, March 19, 2007 19:20, Iain MacDougall iain.macdougall##studentmail.newcastle.edu.au wrote:
>
> Sent to CCL by: "Iain MacDougall"
> [iain.macdougall\a/studentmail.newcastle.edu.au]
> I have two sets of compounds from in silico database searching. I have
> compared the compounds and have found that none are identical, however I
> wish to find similar hits as well. I can do this by eye for small numbers
> of hits but this will be impossible for some of my larger hitlists. Does
> anyone know of freeware that will do some kind of structural
> comparison/similarity searching for me?
> Thanks for your help!

You might want to have a look at ROCS (http://www.eyesopen.com/docs/html/rocs/) from OpenEye; it's free for academia.

mic

From owner-chemistry@ccl.net Mon Mar 19 23:42:01 2007
From: "Wolf-D. Ihlenfeldt wdi=xemistry.com"
To: CCL
Subject: CCL: program to split sdf file
Message-Id: <-33842-070319234024-13232-soHYqi9usu+Ci4/M6g4sHA/./server.ccl.net>
X-Original-From: "Wolf-D. Ihlenfeldt"
Content-Type: multipart/alternative; boundary="----=_NextPart_000_0AF8_01C76A7F.E91421B0"
Date: Mon, 19 Mar 2007 23:39:58 -0400
MIME-Version: 1.0

Sent to CCL by: "Wolf-D. Ihlenfeldt" [wdi|a|xemistry.com]
This message is in MIME format. Since your mail reader does not understand this format, some or all of this message may not be legible.
------=_NextPart_000_0AF8_01C76A7F.E91421B0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

There have now been quite a number of helpful suggestions on how to perform the split, but none of them is really robust, which is essential once files become too big for visual inspection of the results. Remember: Murphy lives!

Take the following completely legal SD file record:

----

-ISIS-  03190722292D
$$$$
  1  0  0  0  0  0  0  0  0  0999 V2000
   -2.3167   -0.2167    0.0000 Au  0  0  0  1  0  0  1  0  0  0  0  0
M  END
>  <PRICE>
$$$$

$$$$
------

This is a single record, with multiple "$$$$" lines in places where they are *not* record terminators: one is the comment line of the header block, one is the value of the PRICE data item, and only the last one ends the record. All simple string search methods with awk or similar tools which simply look for the $$$$ line will fail on this. Never assume that such records do not exist. I *have* seen $$$$ in SD data lines before.

A record splitter thus needs more chemical intelligence to process such files.

OpenBabel has been suggested for this problem. There are several problems with that proposal:

a) It does not scale to really big files, because Babel has no method to output multiple files. Every batch is a separate command and needs to skip to the first copy position. Not a problem with a few thousand cpds, but ultimately this approaches an n**2 performance law, and if you need to split your full PubChem 10 mil cpds download, you have a problem.

b) While Babel is smart enough to read and output the first records from an SD input file with repeated records as above (almost) correctly, its skip function seems to have less brainpower and gets confused. It simply quits silently, without any message. It is not possible to output records starting after the first record of the above repeated multi-record test file, or after encountering such a record anywhere in the skipped part. A bad thing if something like this happens in the middle of your 500 MB file, which you cannot edit.

c) While on superficial inspection the Babel output looks correct when run on the first records, a closer look shows that critical information has been lost. Babel needs to read records into its internal data structure before output via conversion. However, its Molfile parser is rather simple and supports few of the more advanced Molfile encoding conventions. In this case, Babel silently drops the critical H0 designator flag (plus a second flag) which lets a Molfile reader distinguish between metal Au and AuH3 with implicit H. So after the pass through Babel, the compound has changed, without any notification, from metal Au to AuH3. That can be a problem.

OK, enough criticism, here is constructive help:

-----snip---store as script.tcl---
# Arguments: input file name, number of records per output file
set fname [lindex $argv 0]
set fhin [molfile open $fname]
set setsize [lindex $argv 1]
set startrec 1
while 1 {
        # Output name: <root>_<first>_<last><ext>, next to the source file
        set fhout [open [file rootname $fname]_${startrec}_[expr {$startrec+$setsize-1}][file extension $fname] w]
        # Copy the next batch of records; an error signals end of input
        if {[catch {molfile copy $fhin $fhout $setsize}]} {
                close $fhout
                exit
        }
        close $fhout
        incr startrec $setsize
}
-------

Above is a really simple (and not user-proofed, no parameter checking) script for the CACTVS toolkit (www.xemistry.com/academic). Run it with the generic script interpreter from the packages as

csts -f script.tcl filename.sdf setsize

For an input file "myfile.sdf" and a set size of 100, the script will output a set of files like "myfile_1_100.sdf", "myfile_101_200.sdf", etc. in the same directory as the source file.

This script:

a) Processes the above sample file (or any other input file) without changing a single byte in the split records, similar to line-copying awk scripts etc. The record copy function does not decode and re-encode the data; it just keeps an eye on the passing data to detect proper record boundaries.

b) Does not need to know anything about the input file format. It will autodetect the format (independent of the suffix) and work with any supported multi-record format.

W. D. Ihlenfeldt
Xemistry GmbH
wdi-x-xemistry.com

_____

> From: owner-chemistry-x-ccl.net [mailto:owner-chemistry-x-ccl.net]
Sent: Monday, March 19, 2007 10:36 AM
To: Ihlenfeldt, W.d.
Subject: CCL: program to split sdf file

HI,

Unix has loads of ways of doing this, some of which have already been suggested.
But, if you want a more graphical way of doing things then you might want to look a ChemAxon's Instant JChem (http://www.chemaxon.com/product/ijc.html) which lets you easily view and query chemical data imported from an SD file.

The data can be queried (in your case, just for IDs 1-3,000 etc, but much more complex queries are possible) and then the results exported to a SD file.

And yes, it is free!
Alex

Fan,Huajun hjfan^^^pvamu.edu wrote:

Hi, Does anyone know any programs (preferably free) that can split a big sdf file into smaller files? I got a sdf file containing 30,000 molecules and want to do a DOCK5. It is too big even to read it through. I want to split it into 10 samller files that contains 3,000 each. Is it possible? The newest version of Babel seems not available of this split function for SDF format.

Thanks in advance.

Hua-Jun

------=_NextPart_000_0AF8_01C76A7F.E91421B0--
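
For readers without CACTVS, the record-boundary idea from the last message (a "$$$$" line only ends a record when it is not header or data-item text) can be sketched in plain Python. This is only an illustration under stated assumptions, not the CACTVS implementation: the names iter_sd_records and split_sd_file are hypothetical, the first three lines of every record are assumed to be the free-text header block, a data value is assumed to run up to the next blank line, and free-text lines inside the connection table (for example atom aliases) are not handled.

```python
from itertools import islice
from pathlib import Path
from typing import Iterable, Iterator, List


def iter_sd_records(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield each SD record as a list of its lines, copied unchanged."""
    record: List[str] = []
    in_data_value = False  # True while inside the value lines of a data item
    for line in lines:
        record.append(line)
        text = line.rstrip("\r\n")
        if len(record) <= 3:
            # Header block (name, program stamp, comment) is free text,
            # so "$$$$" here is not a terminator.
            continue
        if in_data_value:
            # Assumption: a data value ends at the next blank line.
            if text.strip() == "":
                in_data_value = False
            continue
        if text.startswith(">"):
            in_data_value = True      # data header such as ">  <PRICE>"
        elif text.startswith("$$$$"):
            yield record              # genuine record terminator
            record = []
    if any(l.strip() for l in record):
        yield record                  # trailing record without terminator


def split_sd_file(path: str, set_size: int) -> List[str]:
    """Write batches of set_size records next to the input file."""
    src = Path(path)
    written: List[str] = []
    start = 1
    # latin-1 plus newline="" keeps every byte and line ending as it was.
    with src.open("r", encoding="latin-1", newline="") as fh:
        records = iter_sd_records(fh)
        while True:
            batch = list(islice(records, set_size))
            if not batch:
                break
            end = start + len(batch) - 1
            out = src.with_name(f"{src.stem}_{start}_{end}{src.suffix}")
            with out.open("w", encoding="latin-1", newline="") as fout:
                for rec in batch:
                    fout.writelines(rec)
            written.append(str(out))
            start = end + 1
    return written
```

Calling split_sd_file("myfile.sdf", 3000) would write myfile_1_3000.sdf, myfile_3001_6000.sdf and so on in one pass over the input. Unlike the Tcl script, the last file is named after the records it actually contains, and the text is copied line by line without being parsed into molecules, so nothing in the connection table is reinterpreted.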