From owner-chemistry@ccl.net Tue Sep 20 07:58:00 2016
From: "Yury Minenkov yury.minenkov ~~ gmail.com"
To: CCL
Subject: CCL: substructure search
Message-Id: <-52380-160920052644-12950-WFClny3615vpaWI5NX71kg|,|server.ccl.net>
X-Original-From: "Yury Minenkov"
Date: Tue, 20 Sep 2016 05:26:43 -0400

Sent to CCL by: "Yury Minenkov" [yury.minenkov=-=gmail.com]

Dear colleagues,

I would like to ask a (basic) question that sits on the border between computational chemistry, chemoinformatics and drug design: I am interested in substructure search algorithms.

I start with two things:

a) A small XYZ molecular fragment for which I know the connectivity (I know how the atoms are connected, but not the bond orders; these are not that important to me at the beginning).

b) XYZ coordinates of many different molecules for which I also know the connectivity (again, not the bond orders).

I want to search for the given fragment in each of the XYZ molecular files I have. Something similar is available in the Cambridge Structural Database (CSD).

I am quite certain that this is a general problem and that there should be many ready-made solutions. Are there any libraries (preferably free and open source, with a C API) in which such a substructure search is implemented? Any codes? Or algorithms that are easy to implement?

I once tried a few Python-based implementations built on SMARTS/SMILES, but they failed for transition metal complexes. That is why I believe a 3D search would be better.

Thank you in advance! Sorry if this question is too general.
With kind regards,
Yury

From owner-chemistry@ccl.net Tue Sep 20 09:20:00 2016
From: "Michel Petitjean petitjean.chiral(a)gmail.com"
To: CCL
Subject: CCL: substructure search
Message-Id: <-52381-160920091850-5445-ZWjn3nWCETWbgD3NUZv58w{:}server.ccl.net>
X-Original-From: Michel Petitjean
Content-Transfer-Encoding: 8bit
Content-Type: text/plain; charset=UTF-8
Date: Tue, 20 Sep 2016 15:18:41 +0200
MIME-Version: 1.0

Sent to CCL by: Michel Petitjean [petitjean.chiral,,gmail.com]

Dear Yuri,

A combined 2D and 3D substructure search is not trivial to do. Since you have XYZ files, may I suggest that you first do a 3D search with the CSR freeware:
http://petitjeanmichel.free.fr/itoweb.petitjean.freeware.html

Best regards,
Michel.

Michel Petitjean
MTi, INSERM UMR-S 973, University Paris 7,
35 rue Helene Brion, 75205 Paris Cedex 13, France.
Phone: +331 5727 8434; Fax: +331 5727 8372
E-mail: petitjean.chiral:_:gmail.com (preferred), michel.petitjean:_:univ-paris-diderot.fr
http://petitjeanmichel.free.fr/itoweb.petitjean.html

2016-09-20 11:26 GMT+02:00 Yury Minenkov yury.minenkov ~~ gmail.com :
> Sent to CCL by: "Yury Minenkov" [yury.minenkov=-=gmail.com]
> Dear colleagues,
>
> I would like to ask a (basic) question that sits on the border between computational chemistry, chemoinformatics and drug design: I am interested in substructure search algorithms.
>
> I start with two things:
>
> a) A small XYZ molecular fragment for which I know the connectivity (I know how the atoms are connected, but not the bond orders; these are not that important to me at the beginning).
>
> b) XYZ coordinates of many different molecules for which I also know the connectivity (again, not the bond orders).
>
> I want to search for the given fragment in each of the XYZ molecular files I have. Something similar is available in the Cambridge Structural Database (CSD).
>
> I am quite certain that this is a general problem and that there should be many ready-made solutions. Are there any libraries (preferably free and open source, with a C API) in which such a substructure search is implemented? Any codes? Or algorithms that are easy to implement?
>
> I once tried a few Python-based implementations built on SMARTS/SMILES, but they failed for transition metal complexes. That is why I believe a 3D search would be better.
>
> Thank you in advance! Sorry if this question is too general.
>
> With kind regards,
> Yury

From owner-chemistry@ccl.net Tue Sep 20 09:55:00 2016
From: "Filippov, Igor (NIH/NLM/NCBI) C filippov..ncbi.nlm.nih.gov"
To: CCL
Subject: CCL:G: substructure search
Message-Id: <-52382-160920090817-984-+OoiMN6fSgbzGqkl1RuueA a server.ccl.net>
X-Original-From: "Filippov, Igor (NIH/NLM/NCBI) [C]"
Content-Language: en-US
Content-Transfer-Encoding: 8bit
Content-Type: text/plain; charset="us-ascii"
Date: Tue, 20 Sep 2016 13:08:08 +0000
MIME-Version: 1.0

Sent to CCL by: "Filippov, Igor (NIH/NLM/NCBI) [C]" [filippov#ncbi.nlm.nih.gov]

It is indeed a common task.
Take a look at the following thread from a couple of years ago:
https://www.mail-archive.com/openbabel-discuss(~)lists.sourceforge.net/msg03911.html

The gist of that exchange: here is an Open Babel-based tool, not particularly fast but easy to study and modify:
https://github.com/openbabel/contributed/tree/master/c%2B%2B/mcs-cliquer

And here is a very advanced tool for fast MCS (maximum common substructure) search based on RDKit:
https://bitbucket.org/dalke/fmcs

Hope this helps,
Igor

-----Original Message-----
From: owner-chemistry+igorf==helix.nih.gov(~)ccl.net [mailto:owner-chemistry+igorf==helix.nih.gov(~)ccl.net] On Behalf Of Yury Minenkov yury.minenkov ~~ gmail.com
Sent: Tuesday, September 20, 2016 5:27 AM
To: Filippov, Igor (NIH/NCI/Helix)
Subject: CCL: substructure search

Sent to CCL by: "Yury Minenkov" [yury.minenkov=-=gmail.com]

Dear colleagues,

I would like to ask a (basic) question that sits on the border between computational chemistry, chemoinformatics and drug design: I am interested in substructure search algorithms.

I start with two things:

a) A small XYZ molecular fragment for which I know the connectivity (I know how the atoms are connected, but not the bond orders; these are not that important to me at the beginning).

b) XYZ coordinates of many different molecules for which I also know the connectivity (again, not the bond orders).

I want to search for the given fragment in each of the XYZ molecular files I have. Something similar is available in the Cambridge Structural Database (CSD).

I am quite certain that this is a general problem and that there should be many ready-made solutions. Are there any libraries (preferably free and open source, with a C API) in which such a substructure search is implemented? Any codes? Or algorithms that are easy to implement?

I once tried a few Python-based implementations built on SMARTS/SMILES, but they failed for transition metal complexes.
That is why I believe a 3D search would be better.

Thank you in advance! Sorry if this question is too general.

With kind regards,
Yury

http://www.ccl.net/cgi-bin/ccl/send_ccl_message
http://www.ccl.net/chemistry/sub_unsub.shtml
http://www.ccl.net/spammers.txt

From owner-chemistry@ccl.net Tue Sep 20 14:03:01 2016
From: "Francois BERENGER francois.berenger:+:inria.fr"
To: CCL
Subject: CCL: substructure search
Message-Id: <-52383-160920092619-9273-DBsI8WNQQN3+DfoQ711M8A[*]server.ccl.net>
X-Original-From: Francois BERENGER
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset=windows-1252
Date: Tue, 20 Sep 2016 15:26:12 +0200
MIME-Version: 1.0

Sent to CCL by: Francois BERENGER [francois.berenger===inria.fr]

Maybe the fmcsR package for R does what you need. Cf.
http://bioinformatics.oxfordjournals.org/content/29/21/2792.long

On 09/20/2016 11:26 AM, Yury Minenkov yury.minenkov ~~ gmail.com wrote:
> Sent to CCL by: "Yury Minenkov" [yury.minenkov=-=gmail.com]
> Dear colleagues,
>
> I would like to ask a (basic) question that sits on the border between computational chemistry, chemoinformatics and drug design: I am interested in substructure search algorithms.
>
> I start with two things:
>
> a) A small XYZ molecular fragment for which I know the connectivity (I know how the atoms are connected, but not the bond orders; these are not that important to me at the beginning).
>
> b) XYZ coordinates of many different molecules for which I also know the connectivity (again, not the bond orders).
>
> I want to search for the given fragment in each of the XYZ molecular files I have. Something similar is available in the Cambridge Structural Database (CSD).
>
> I am quite certain that this is a general problem and that there should be many ready-made solutions. Are there any libraries (preferably free and open source, with a C API) in which such a substructure search is implemented?
> Any codes? Or algorithms that are easy to implement?

I don't think those algorithms (maximum common substructure search) are easy to implement. They might also be terribly inefficient.

> I once tried a few Python-based implementations built on SMARTS/SMILES, but they failed for transition metal complexes. That is why I believe a 3D search would be better.
>
> Thank you in advance! Sorry if this question is too general.
>
> With kind regards,
> Yury

--
Regards,
Francois.
"When in doubt, use more types"

From owner-chemistry@ccl.net Tue Sep 20 19:28:01 2016
From: "Markus Sitzmann markus.sitzmann*gmail.com"
To: CCL
Subject: CCL:G: substructure search
Message-Id: <-52384-160920145248-21964-r/QrF3YEqxwUle6vJ3GFsg(~)server.ccl.net>
X-Original-From: Markus Sitzmann
Content-Type: multipart/alternative; boundary=001a114a4888c3ea26053cf4ee27
Date: Tue, 20 Sep 2016 20:52:35 +0200
MIME-Version: 1.0

Sent to CCL by: Markus Sitzmann [markus.sitzmann\a/gmail.com]

--001a114a4888c3ea26053cf4ee27
Content-Type: text/plain; charset=UTF-8

Hi Yury,

As far as I understand your question, you are just looking for a substructure search based on pure connectivity (2D). I think this paper is an excellent starting point, with a lot of references:
https://jcheminf.springeropen.com/articles/10.1186/1758-2946-4-13

As you can see from the references, people have been working on this problem for a long time (and it is still a problem of interest: everybody would be happy to have better performance).

I think what Igor mentioned (MCS) is a special case of substructure searching (in short, for MCS you only have molecules that you compare with each other, without a specific "query" structure, and the point is to find the "query", or "maximum common substructure", in a given set of molecules).

If you are really looking for actual "3D substructure searching", where the conformation of the atoms matters, that is a completely different beast (but that is not how I understand your question).
If you are looking for "open" implementations, you should look at Open Babel, RDKit or CDK (the latter, however, is Java). There are also plenty of commercial implementations.

Markus

On Tue, Sep 20, 2016 at 3:08 PM, Filippov, Igor (NIH/NLM/NCBI) C filippov..ncbi.nlm.nih.gov wrote:
>
> Sent to CCL by: "Filippov, Igor (NIH/NLM/NCBI) [C]" [filippov#ncbi.nlm.nih.gov]
> It is indeed a common task. Take a look at the following thread from a couple of years ago:
> https://www.mail-archive.com/openbabel-discuss[-]lists.sourceforge.net/msg03911.html
>
> The gist of that exchange: here is an Open Babel-based tool, not particularly fast but easy to study and modify:
> https://github.com/openbabel/contributed/tree/master/c%2B%2B/mcs-cliquer
>
> And here is a very advanced tool for fast MCS (maximum common substructure) search based on RDKit:
> https://bitbucket.org/dalke/fmcs
>
> Hope this helps,
> Igor
>
> -----Original Message-----
> From: owner-chemistry+igorf==helix.nih.gov[-]ccl.net [mailto:owner-chemistry+igorf==helix.nih.gov[-]ccl.net] On Behalf Of Yury Minenkov yury.minenkov ~~ gmail.com
> Sent: Tuesday, September 20, 2016 5:27 AM
> To: Filippov, Igor (NIH/NCI/Helix)
> Subject: CCL: substructure search
>
> Sent to CCL by: "Yury Minenkov" [yury.minenkov=-=gmail.com]
>
> Dear colleagues,
>
> I would like to ask a (basic) question that sits on the border between computational chemistry, chemoinformatics and drug design: I am interested in substructure search algorithms.
>
> I start with two things:
>
> a) A small XYZ molecular fragment for which I know the connectivity (I know how the atoms are connected, but not the bond orders; these are not that important to me at the beginning).
>
> b) XYZ coordinates of many different molecules for which I also know the connectivity (again, not the bond orders).
>
> I want to search for the given fragment in each of the XYZ molecular files I have.
> Something similar is available in the Cambridge Structural Database (CSD).
>
> I am quite certain that this is a general problem and that there should be many ready-made solutions. Are there any libraries (preferably free and open source, with a C API) in which such a substructure search is implemented? Any codes? Or algorithms that are easy to implement?
>
> I once tried a few Python-based implementations built on SMARTS/SMILES, but they failed for transition metal complexes. That is why I believe a 3D search would be better.
>
> Thank you in advance! Sorry if this question is too general.
>
> With kind regards,
> Yury

--001a114a4888c3ea26053cf4ee27
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
--001a114a4888c3ea26053cf4ee27--
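
For the connectivity-only reading of the question (the one Markus describes), the search boils down to matching a small element-labelled graph inside a larger one. The following is only a minimal sketch of that idea and is not taken from any of the tools mentioned in this thread. The function names (`build_adjacency`, `find_substructure`), the data layout (a list of element symbols plus a list of bonded index pairs) and the toy Fe-C-O example are all assumptions made for illustration. It is a plain backtracking search, exponential in the worst case, that ignores bond orders and 3D geometry, so it treats a metal centre like any other labelled atom.

```python
from typing import Dict, List, Optional, Sequence, Set, Tuple

Bond = Tuple[int, int]  # pair of 0-based atom indices


def build_adjacency(n_atoms: int, bonds: Sequence[Bond]) -> List[Set[int]]:
    """Turn a bond list into one neighbour set per atom."""
    adj: List[Set[int]] = [set() for _ in range(n_atoms)]
    for i, j in bonds:
        adj[i].add(j)
        adj[j].add(i)
    return adj


def find_substructure(
    frag_elems: Sequence[str],
    frag_bonds: Sequence[Bond],
    mol_elems: Sequence[str],
    mol_bonds: Sequence[Bond],
) -> Optional[Dict[int, int]]:
    """Return one mapping {fragment atom -> molecule atom}, or None.

    Every fragment bond must also be a bond in the molecule; the
    molecule may contain extra atoms and extra bonds.
    """
    f_adj = build_adjacency(len(frag_elems), frag_bonds)
    m_adj = build_adjacency(len(mol_elems), mol_bonds)

    # Visit fragment atoms in breadth-first order so that each new atom
    # is usually bonded to one that has already been placed.
    order: List[int] = []
    seen: Set[int] = set()
    for start in range(len(frag_elems)):
        if start in seen:
            continue
        queue = [start]
        seen.add(start)
        while queue:
            atom = queue.pop(0)
            order.append(atom)
            for nbr in sorted(f_adj[atom]):
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)

    mapping: Dict[int, int] = {}
    used: Set[int] = set()

    def extend(pos: int) -> bool:
        if pos == len(order):
            return True  # every fragment atom has been placed
        fa = order[pos]
        for ma in range(len(mol_elems)):
            if ma in used or mol_elems[ma] != frag_elems[fa]:
                continue
            if len(m_adj[ma]) < len(f_adj[fa]):
                continue  # too few neighbours to host this atom
            # Bonds to already-placed fragment atoms must exist in the molecule.
            if all(mapping[fb] in m_adj[ma] for fb in f_adj[fa] if fb in mapping):
                mapping[fa] = ma
                used.add(ma)
                if extend(pos + 1):
                    return True
                del mapping[fa]  # backtrack
                used.discard(ma)
        return False

    return dict(mapping) if extend(0) else None


# Toy example (made-up connectivity): look for an Fe-C-O unit.
fragment = (["Fe", "C", "O"], [(0, 1), (1, 2)])
complex_ = (["Cl", "Fe", "C", "O"], [(0, 1), (1, 2), (2, 3)])
water = (["O", "H", "H"], [(0, 1), (0, 2)])

print(find_substructure(*fragment, *complex_))
print(find_substructure(*fragment, *water))
```

In the toy example the fragment atoms 0, 1, 2 are mapped onto molecule atoms 1, 2, 3 of the chloro complex, while the search in water returns None. For large molecule sets, the libraries named in the thread are likely to be much faster than a hand-written search like this one.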