From: Mark Forster
Organization: NIBSC
Date: Fri, 30 Apr 1999 11:55:19 +0100
To: Gerald Loeffler, chemistry(+ at +)www.ccl.net
Subject: Re: CCL:XML for Bioinformatics Data

Dear Gerald

That is a nice summary of the capabilities and possibilities offered by
XML. Some work in this area has already been done: for more information
on the Biosequence Markup Language (BSML), see the WWW page of Visual
Genomics Inc. at

  http://www.visualgenomics.com/bsml/index.html

A BSML browser and examples are available for download.

What is not currently clear to me is whether a given markup language
must be approved by the World Wide Web Consortium (W3C). The
Mathematical Markup Language (MathML) 1.0 (http://www.w3.org/Math/) was
released as a W3C Recommendation in April 1998, but is this required?

Gerald Loeffler wrote:
> Hi!
>
> Recently, I've been working a lot with XML (see http://www.w3c.org/xml/
> and e.g. http://www.ibm.com/xml/), which is a standard, human-readable,
> extensible markup language that is rapidly becoming _the_ method of
> choice for exchange and storage of any kind of data and documents.
> It seems to me that XML would simply be _perfect_ for data exchange and
> maybe even data storage in bioinformatics (see end of message for a note
> on chemistry and CML).
>
> E.g. (off the top of my head), a DNA/protein sequence similarity search
> engine (e.g. NCBI's BLAST server) might return its search results in the
> form of an XML document that could look like this:
>
> [markup tags lost in the archived copy; only element contents survive]
>     protein
>     GAVLIFYWSTQ
>     FASTA3
>     SwissProt
>     -12
>     -2
>
>       HPS_HUMAN
>       homo sapiens
>       11
>       GAEVLFYWTDQ
>       129.3
>
>       PA24_MOUSE
>       mus musculus
>       8
>       VFIFYWTT
>       133.3
>
> There are several important points here:
>
> 1) Without knowing what this XML document is about, a program can assert
> that it is well-formed! These programs exist, are free and are
> applicable to all XML documents!
>
> 2) The rules for the nesting and naming of the tags in XML documents of
> this type can be formally defined in XML. The above document would be of
> type "seq-sim-search-results" and you could easily write a formal
> definition (in a DTD file) that says that such a document must contain a
> "query" and a "hits" tag; the "query" tag in turn must contain exactly
> one of each "type", "seq", ... The "hits" tag in turn may contain 0 or
> more "hit" tags which in turn ...
>
> 3) Having a formal definition of documents of this type, a program can
> verify that our above XML document complies with the formal definition
> (is valid). These programs exist, are free and are applicable to all XML
> documents!
>
> 4) Free utilities exist (e.g. IBM's xml4j) that can programmatically
> write and read (parse) any XML document and thus give a program access
> to the structure and content of the document!! (No more Perl parsers for
> BLAST output!!)
>
> 5) This file is human-readable! (in contrast to a CORBA struct or a
> serialized Java object!)
>
> 6) Modern WWW browsers can (if a style sheet is supplied) directly
> display this XML document.
> For old browsers, the XML document can easily be converted to HTML for
> display.
>
> I think you get the idea.
>
> Does such an XML-based approach sound reasonable?
> What does this approach leave to be desired?
> Are efforts underway in this direction?
> Wouldn't it be a better world if we all used XML (-:
>
> I know that XML is currently being used for chemistry-related data (CML,
> see http://www.xml-cml.org/), but I haven't heard of any efforts in the
> area of Bioinformatics. So please view this message as targeted towards
> the Bioinformatics community that is not served by CML. (CML has a
> DNA/protein sequence tag.)
>
> cheers,
> gerald
> --
> Gerald Loeffler
> Email: Gerald.Loeffler(+ at +)vienna.at
> Smail: Apollo Imaging, Marchettigasse 7, A-1060 Vienna, Austria
> Phone: +43 676 3289588 (+43 1 5952333 27)
> Fax: +43 1 5952333 20
> Keywords: Java, CORBA, OOA&D, Databases, Bioinformatics,
> Computational Biology, Computational Biophysics
>
> "We have nothing to report except that we are wretched."
> (Thomas Bernhard)
> -= This is automatically added to each message by mailing script =-
> CHEMISTRY %-% at %-% ccl.net -- To Everybody | CHEMISTRY-REQUEST %-% at %-% ccl.net -- To Admins
> MAILSERV _-at-_)ccl.net -- HELP CHEMISTRY or HELP SEARCH
> CHEMISTRY-SEARCH(+ at +)ccl.net -- archive search | Gopher: gopher.ccl.net 70
> Ftp: ftp.ccl.net | WWW: http://www.ccl.net/chemistry/ | Jan: jkl-!at!-ccl.net

--
Dr Mark J Forster Ph.D.
Principal Scientist
Informatics Laboratory
National Institute for Biological Standards and Control
Blanche Lane, South Mimms,
Hertfordshire EN6 3QG, United Kingdom.
Tel +44 (0)1707 654753
FAX +44 (0)1707 646730
E-mail mforster:~at~:nibsc.ac.uk
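A minimal sketch of points 1) and 4) in the quoted message, written in Python with the standard library's xml.etree.ElementTree module as a stand-in for a Java parser such as xml4j. Because the markup of the original example did not survive, the document below is a hypothetical reconstruction: the element names seq-sim-search-results, query, type, seq, hits and hit come from the message, while id, organism and score are assumed names, and reading 129.3 and 133.3 as scores is likewise an assumption. ElementTree only checks well-formedness; validating against a DTD (point 3) would need a validating parser, which this sketch does not attempt.

```python
import xml.etree.ElementTree as ET

# Hypothetical search-result document. "seq-sim-search-results", "query",
# "type", "seq", "hits" and "hit" are the names used in the message;
# "id", "organism" and "score" are assumed names for illustration only.
DOC = """<seq-sim-search-results>
  <query>
    <type>protein</type>
    <seq>GAVLIFYWSTQ</seq>
  </query>
  <hits>
    <hit><id>HPS_HUMAN</id><organism>homo sapiens</organism><seq>GAEVLFYWTDQ</seq><score>129.3</score></hit>
    <hit><id>PA24_MOUSE</id><organism>mus musculus</organism><seq>VFIFYWTT</seq><score>133.3</score></hit>
  </hits>
</seq-sim-search-results>"""


def is_well_formed(text):
    """Point 1: a generic parser checks well-formedness without knowing the document type."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def read_hits(text):
    """Point 4: read structure and content generically, with no BLAST-specific parser."""
    root = ET.fromstring(text)
    # "query/seq" picks the query's own <seq>, not the <seq> inside each hit.
    query_seq = root.findtext("query/seq")
    hits = [
        (hit.findtext("id"), hit.findtext("organism"), float(hit.findtext("score")))
        for hit in root.iter("hit")
    ]
    return query_seq, hits


if __name__ == "__main__":
    print(is_well_formed(DOC))                   # True
    print(is_well_formed("<hits><hit></hits>"))  # False: mismatched tags
    query_seq, hits = read_hits(DOC)
    print(query_seq)
    for hit_id, organism, score in hits:
        print(hit_id, organism, score)
```

Nothing in read_hits is specific to one search engine: any program that agrees on the element names can read the results with the same few calls.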