From owner-chemistry@ccl.net Mon Mar 11 16:08:00 2019
From: "steve heller steve||hellers.com"
To: CCL
Subject: CCL: InChI Organometallics RFP
Message-Id: <-53641-190311160321-6873-giX5Lua7hbwF+i/9DZMjYQ{=}server.ccl.net>
X-Original-From: "steve heller"
Date: Mon, 11 Mar 2019 16:03:18 -0400

Sent to CCL by: "steve heller" [steve^^^hellers.com]

For more details, please contact Steve Heller. steve,,heller.com

===

Request for proposal: A new InChI layer for organometallic and coordination compounds

1. Introduction and background

Originally developed by the International Union of Pure and Applied Chemistry (IUPAC), the IUPAC International Chemical Identifier (InChI) is a character string generated by computer algorithm. It is a tool to be used in software applications designed and developed by those who choose to use it. The InChI algorithm turns chemical structures into machine-readable strings of information. InChIs are unique to the compound they describe and can encode absolute stereochemistry, making chemicals and chemistry machine-readable and discoverable. A simple analogy is that InChI is the barcode for chemistry and chemical structures.

The InChI format and algorithm are non-proprietary and the software is open source, with ongoing development done by both a part-time InChI developer and by the community. A number of IUPAC working groups are currently extending the standard to areas of chemistry that are not yet handled by the InChI algorithm.

One particularly tricky area of chemistry is that of organometallic and coordination structures, particularly the representation of bonding. The effects of the existing disconnection and canonicalization code in the InChI on organometallic and coordination structures are not well understood, and we need more clarity before embarking on major changes.

2. Objective

The working group invites detailed prototyping of a proposed organometallic and coordination layer for the InChI based on the existing code.
The aim is to investigate the degree to which the existing InChI code can be used to provide a basis for InChIs that better describe the bonding in organometallic and coordination compounds.

The first deliverable will be a piece of code in any mainstream programming language that either (a) takes as input V2000 files and/or InChIs that have been generated with the reconnected layer or (b) interacts directly with the InChI API, and produces an organometallic and coordination layer following the scheme in the appendix. For ease of programming, it is accepted that the atom numbering can be taken from the standard InChI assignment after metals have been disconnected, reflecting that used in the hydrogen layer. There will usually be several species within the InChI for one organometallic system, some of which may occur more than once, such as the cyclopentadienyl species in ferrocene. If two species that are different when bound to a metal atom are currently considered identical by InChI following disconnection, then a way for the InChI to distinguish between them will need to be devised and tested. Ultimately the organometallic and coordination layer will need to list the inorganic bonds, defined by species and atom numbers, in a way that avoids ambiguities. The developer should feel free to modify the precise details of this layer in order to clarify ambiguities that arise in practice.

The second deliverable will be a report describing how the code works and the difficulties, ambiguities and failures found along the way.

The InChI Trust will supply a small test set of structures in V2000 format and the CCDC will provide access to the Cambridge Structural Database through their Python API to allow a much wider range of structures to be explored.

Out of scope for this RFP:
a. Modifications to the InChI code itself.
b. Inference of bonds that have not been explicitly drawn in the V2000 file.
c.
Stereochemistry, though we hope that the canonical layer will be useful for this task by providing a canonical ordering of metal bonds.

3. Cost

Bidders should propose an offer at a fixed price in US dollars.

4. Timeframe

Bidders should specify the timeframe (in weeks) for the deliverable.

Appendix: worked examples

Prototype InChIs below begin with InChI=1SO/ in place of InChI=1S/. Not covered below is the case where two species that are different when bound to a metal atom turn out to be indistinguishable by the InChI code following disconnection. One method could be for the higher priority species to be the one with a bond to the lowest numbered atom.

Example: Ethyl magnesium bromide

Ethyl magnesium bromide, a Grignard reagent, may be drawn as illustrated. Ether molecules are an important part of the structure, but are often omitted. The InChI algorithm treats the bromide and magnesium as separate species, as it does the ethyl group and the ethers. The standard InChI for the first case is:

InChI=1S/C2H5.BrH.Mg/c1-2;;/h1H2,2H3;1H;/q;;+1/p-1

In this proposal, the organometallic layer adds a bond from the ethyl radical (species 1) to the magnesium (species 3). The ethyl radical has two carbon atoms with numbering given by the hydrogen layer: one for the methyl group and two for the methylene. The inorganic bond, therefore, needs to go from species one, atom 2, which we can write as 1.2, to the only atom in the third species. This can be written simply as 3, since the species number uniquely identifies its single atom. This inorganic bond can be written: 1.2-3. The lower number is placed first. The other bond in this example, from the bromine (species 2) to the magnesium (species 3), can be written 2-3. Sorting these with the smallest first gives the canonical order: 1.2-3;2-3.
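The ordering rule just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the tuple representation of a bond endpoint are hypothetical, and comparing endpoints first by species number and then by atom number (with a missing atom number treated as 0) is one possible reading of "smallest first", not something the proposal specifies.

```python
from typing import List, Optional, Tuple

# An endpoint is (species number, atom number). The atom number is None
# when the species has a single atom, as for Mg in the example above.
Endpoint = Tuple[int, Optional[int]]
Bond = Tuple[Endpoint, Endpoint]


def endpoint_key(ep: Endpoint) -> Tuple[int, int]:
    """Sort key (assumption): species first, then atom; no atom sorts as 0."""
    species, atom = ep
    return (species, atom if atom is not None else 0)


def format_endpoint(ep: Endpoint) -> str:
    """Write an endpoint as 'species' or 'species.atom'."""
    species, atom = ep
    return str(species) if atom is None else f"{species}.{atom}"


def canonical_bonds(bonds: List[Bond]) -> str:
    """Put the lower endpoint first in each bond, then sort the bonds."""
    ordered = [tuple(sorted(bond, key=endpoint_key)) for bond in bonds]
    ordered.sort(key=lambda b: (endpoint_key(b[0]), endpoint_key(b[1])))
    return ";".join(
        f"{format_endpoint(a)}-{format_endpoint(b)}" for a, b in ordered
    )


# Ethyl magnesium bromide, with the bonds supplied in arbitrary order.
grignard = [((3, None), (2, None)), ((3, None), (1, 2))]
print(canonical_bonds(grignard))  # prints 1.2-3;2-3
```

Keeping endpoints as tuples rather than strings avoids comparing "10" and "2" lexically once species or atom numbers reach two digits.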
The organometallic layer is:

/om1.2-3;2-3

Since both bonds are to the same atom, the magnesium cation (species three), this could be abbreviated:

/om(1.2,2)3

Does this compact form introduce any ambiguity? Testing is needed. The first entry within the parentheses is used to determine the order of the layer, so (1.2,2)3 is chosen over 3(1.2,2) as the smallest number is first.

The same structure can be applied to the InChI which describes the ether molecules:

InChI=1S/2C4H10O.C2H5.BrH.Mg/c2*1-3-5-4-2;1-2;;/h2*3-4H2,1-2H3;1H2,2H3;1H;/q;;;;+1/p-1

The two ether molecules are listed first in the InChI and are identical. The InChI does not distinguish them, but the organometallic layer may need to distinguish identical molecules which differ only in their inorganic bonds. For this reason the species are numbered: (1) ether; (2) ether; (3) ethyl radical; (4) bromide; (5) magnesium cation. In both ethers, the oxygen that interacts with the magnesium is atom five. The organometallic layer is:

/om1.5-5;2.5-5;3.2-5;4-5

The alternative form is substantially shorter in this case:

/om(1.5,2.5,3.2,4)5

Diethyl magnesium can be described in a similar way:

InChI=1SO/2C2H5.Mg/c2*1-2;/h2*1H2,2H3;/om(1.2,2.2)3

Example: Ferrocene

Ferrocene, which has two aromatic cyclopentadienyl groups, can be drawn in many different ways. The simplest standard InChI is:

InChI=1S/2C5H5.Fe/c2*1-2-4-5-3-1;/h2*1-5H;

This illustrates a feature of the InChI approach. Double bonds are not localized and all of the carbon atoms in this structure are identical. The InChI does not express a view as to the best positions for a double bond, and this has to be inferred from the available information. This may well lead to a unique representation, but is very useful in cases for which several reasonable representations exist, because it avoids having to choose between them.

Ferrocene may also be regarded as an iron dication, interacting with two cyclopentadienyl anions (right hand structures in the figure).
This generates an InChI with a charge layer (/q) but is otherwise identical to the alternative:

InChI=1S/2C5H5.Fe/c2*1-2-4-5-3-1;/h2*1-5H;/q2*-1;+2

In the case where the iron atom has been drawn interacting with all the carbon atoms equally, the prototype InChI is:

InChI=1SO/2C5H5.Fe/c2*1-2-4-5-3-1;/h2*1-5H;/om(1.(1,2,3,4,5),2.(1,2,3,4,5))3

In this example, all of the carbons are identical. This type of description can be used for other π-complexes, using the maximum number of inorganic bonds. For example, a π-allyl species should have three metal-carbon bonds, rather than one next to a double bond, or one σ-bond and one metal-π interaction.

Example: Grubbs catalyst

This ruthenium catalyst, often called Grubbs II, is described by its InChI as a mixture of six species:

InChI=1S/C21H26N2.C18H33P.C7H6.2ClH.Ru/c1-14-9-16(3)20(17(4)10-14)22-7-8-23(13-22)21-18(5)11-15(2)12-19(21)6;1-4-10-16(11-5-1)19(17-12-6-2-7-13-17)18-14-8-3-9-15-18;1-7-5-3-2-4-6-7;;;/h9-12H,7-8H2,1-6H3;16-18H,1-15H2;1-6H;2*1H;/q;;;;;+2/p-2

The organometallic layer only indicates that there is a bond between the metal atom and the non-metal atom, not what kind of bond it is. Following the process outlined above, the organometallic layer for this molecule is:

/om(1.13,2.19,3.1,4,5)6

Example: metallo-porphyrins

Metallo-porphyrins do not contain carbon-metal bonds and so are excluded from some definitions of organometallics. We include them here as they are molecules that chemists are likely to want to describe and for which an InChI is likely to be useful. The InChI is rather complicated:

InChI=1S/C20H12N4.Mg/c1-2-14-10-16-5-6-18(23-16)12-20-8-7-19(24-20)11-17-4-3-15(22-17)9-13(1)21-14;/h1-12H;/q-2;+2/b13-9-,14-10-,15-9-,16-10-,17-11-,18-12-,19-11-,20-12-;

The organometallic layer, however, is rather simple. There are four nitrogen-magnesium interactions, and the rule of drawing the maximum number of sensible inorganic bonds needs to be applied.
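When counting the bonds a compact layer implies, it can help to expand the parenthesised forms used in the examples so far, such as (1.2,2)3 and (1.(1,2,3,4,5),2.(1,2,3,4,5))3, back into individual bonds. The Python sketch below does this under an assumed grammar, "(item,item,...)target", where each item is a species number, "species.atom", or "species.(atom,atom,...)". That grammar is inferred from the worked examples and is not a specification; the function names are hypothetical, and forms without outer parentheses are not handled.

```python
import re
from typing import List


def split_top_level(text: str) -> List[str]:
    """Split on commas that are not inside parentheses."""
    parts: List[str] = []
    current: List[str] = []
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


def expand_item(item: str) -> List[str]:
    """Expand 's', 's.a' or 's.(a,b,...)' into individual endpoints."""
    grouped = re.fullmatch(r"(\d+)\.\(([\d,]+)\)", item)
    if grouped:
        species, atoms = grouped.groups()
        return [f"{species}.{atom}" for atom in atoms.split(",")]
    if re.fullmatch(r"\d+(\.\d+)?", item):
        return [item]
    raise ValueError(f"unrecognised endpoint: {item!r}")


def expand_compact(layer: str) -> List[str]:
    """Expand '(item,...)target' into a list of 'endpoint-target' bonds."""
    match = re.fullmatch(r"\((.*)\)(\d+(?:\.\d+)?)", layer)
    if not match:
        raise ValueError(f"not in the assumed compact form: {layer!r}")
    inner, target = match.groups()
    return [
        f"{endpoint}-{target}"
        for item in split_top_level(inner)
        for endpoint in expand_item(item)
    ]


# Ferrocene: two rings of five carbons, each carbon bonded to species 3.
ferrocene = expand_compact("(1.(1,2,3,4,5),2.(1,2,3,4,5))3")
print(len(ferrocene))  # prints 10
```

Expanding and re-compacting a layer in this way is one cheap test of whether the compact form loses information for a given structure.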
Four bonds are described in the organometallic layer, hence the InChI will be:

InChI=1SO/C20H12N4.Mg/c1-2-14-10-16-5-6-18(23-16)12-20-8-7-19(24-20)11-17-4-3-15(22-17)9-13(1)21-14;/h1-12H;/q-2;+2/b13-9-,14-10-,15-9-,16-10-,17-11-,18-12-,19-11-,20-12-;/om1.(21,22,23,24)2

More details available from Steve Heller