From owner-chemistry@ccl.net Thu Dec 30 05:01:01 2021
From: "Yang Guo guoyang0123()gmail.com"
To: CCL
Subject: CCL:G: Quantum chemistry interoperability library?
Message-Id: <-54552-211230045652-22245-zvc/0W50e3k0kb9hUXh1FA%%server.ccl.net>
Date: Thu, 30 Dec 2021 17:56:31 +0800
Sent to CCL by: Yang Guo [guoyang0123- -gmail.com]

Dear Susi,

This is a really interesting topic. Creating such a file format is not a simple task, especially if you want to take molecular symmetry into account.

For C1 symmetry, I do have some scripts that handle format conversion. Many other people have also developed tools, for example "Molecular Orbital KIT" by one of my former colleagues.

Here are some of the problems I have run into:
1. The ordering of the angular momentum functions differs greatly between programs.
2. Cartesian and spherical basis sets may not be easy to handle at the same time.
3. For high angular momentum shells, the transformation matrix between Cartesian and spherical basis functions is not unique; different packages use different conventions.

I look forward to such tools, especially if symmetry is considered.

Best,
Yang

On Thu, Dec 30, 2021 at 10:10, Susi Lehtola susi.lehtola[A]alumni.helsinki.fi wrote:

> Sent to CCL by: Susi Lehtola [susi.lehtola{=}alumni.helsinki.fi]
> Hello,
>
> I am again hitting my head against the wall, since I am having trouble
> passing data from one quantum chemistry code to another.
>
> What we are missing as a community is a standard interoperability
> library for passing basis set and wave function data from one program to
> another.
> The de facto standard is GAUSSIAN's formatted checkpoint
> library, but it also has some deficiencies; for instance, it's not
> machine precision.
>
> Because the library should store all the necessary data for at least SCF
> wave functions, that is, the Gaussian basis set and the molecular
> orbitals (MOs) and their occupation numbers, an interface to this
> library could also serve as a tool for checkpointing calculations that
> have not converged.
>
> Some pieces of the necessary functionality are certainly around. Many
> quantum chemistry programs have implemented their own formatted
> checkpoint i/o functions. Quantum chemistry analysis programs have
> needed to implement their own parsers for the .fchk / MOLDEN / etc formats.
>
> I do not think that writing such a common interface library should be
> too difficult. All that is needed is
>
> 1. a data structure that is able to express the data in a common format and
>
> 2. input and output functions to translate the data from/to specific
> quantum chemistry code formats.
>
> In addition to the .fchk and molden formats, the common interface
> library should also be able to read various programs' native data files
> from disk, like DENS and XDENS in TURBOMOLE.
>
> Since many pieces of the puzzle are already around, and the problem
> affects the whole community, I would like to get everyone's feedback on
> this idea.
>
> If there was, say, a portable open-source C++ library with C, Fortran
> and Python frontends for handling molecular wave function data, would
> you be willing to use it in your own program package? What kinds of
> features would you need in it? Does such a library already exist?
>
> Susi
>
> PS. I work as a Software Scientist at the Molecular Sciences Software
> Institute at Virginia Tech (http://molssi.org), but I am sending this
> message from my Helsinki address since it's what I've used here for
> close to a decade.
> --
> ------------------------------------------------------------------
> Mr. Susi Lehtola, PhD              Adjunct Professor
> susi.lehtola##alumni.helsinki.fi   University of Helsinki
> http://susilehtola.github.io/      Finland
> ------------------------------------------------------------------
> Susi Lehtola, FT                   dosentti
> susi.lehtola##alumni.helsinki.fi   Helsingin yliopisto
> http://susilehtola.github.io/
> ------------------------------------------------------------------
>
> -= This is automatically added to each message by the mailing script =-
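Yang's first point can be illustrated with a single Cartesian d shell. The sketch below assumes two orderings of the six components: "xx, yy, zz, xy, xz, yz", the order commonly used for Cartesian d functions in Molden files, and the lexicographic order "xx, xy, xz, yy, yz, zz" used by some integral codes. Both lists, and the helper names `permutation` and `reorder_rows`, are illustrative assumptions; each program's own documentation is the authority on its convention. Under these assumptions, converting MO coefficients amounts to permuting the rows that belong to each shell.

```python
from typing import List, Sequence

# Assumed orderings of the Cartesian d components (verify per program).
ORDER_A = ["xx", "yy", "zz", "xy", "xz", "yz"]  # Molden-style order
ORDER_B = ["xx", "xy", "xz", "yy", "yz", "zz"]  # lexicographic in (lx, ly, lz), descending


def permutation(src: Sequence[str], dst: Sequence[str]) -> List[int]:
    """Return perm such that dst[i] == src[perm[i]]."""
    index = {label: i for i, label in enumerate(src)}
    return [index[label] for label in dst]


def reorder_rows(coeffs: Sequence[Sequence[float]],
                 src: Sequence[str], dst: Sequence[str]) -> List[List[float]]:
    """Reorder the rows (basis functions) of an AO x MO coefficient block."""
    perm = permutation(src, dst)
    return [list(coeffs[p]) for p in perm]


# One MO (a single column) with a distinct coefficient on each d component.
mo_in_a = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]  # xx, yy, zz, xy, xz, yz
mo_in_b = reorder_rows(mo_in_a, ORDER_A, ORDER_B)
print([row[0] for row in mo_in_b])  # [1.0, 4.0, 5.0, 2.0, 6.0, 3.0]
```

Applying the inverse mapping (`reorder_rows(mo_in_b, ORDER_B, ORDER_A)`) recovers the original rows, so a converter only needs one ordering table per program and shell type.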
From owner-chemistry@ccl.net Thu Dec 30 09:00:00 2021
From: "Geoffrey Hutchison geoff.hutchison\a/gmail.com"
To: CCL
Subject: CCL:G: Quantum chemistry interoperability library?
Message-Id: <-54553-211230085109-341-NdopeChUVFi4TjkXX+YN4w],[server.ccl.net>
Date: Thu, 30 Dec 2021 08:51:01 -0500
Sent to CCL by: Geoffrey Hutchison [geoff.hutchison###gmail.com]

This was at least part of the point of the QC JSON schema.

MolSSI sponsored at least one workshop where this was discussed and a draft was written.

That workshop, several years ago, brought together many developers: quantum chemistry programs (NWChem, Q-Chem, Psi4, Molpro, GAMESS... I know pretty much the entire community was invited), analysis programs (Horton), and community programs (cclib, Open Babel, Avogadro, Jmol, etc.). It was a big enough meeting that I'm sure I missed some people.

Issues of normalization, metadata, input files (e.g. keywords), program-specific features, etc. were all discussed.

The idea was to first get out a "better Molden" interchange.

So if you (personally) and/or MolSSI want to push this again, it would be great.

Perhaps either a virtual meeting or a discussion forum can get things moving again?

Cheers,
-Geoff

P.S. The work and some discussion is here: https://github.com/MolSSI/QCSchema

---
Prof. Geoffrey Hutchison
Department of Chemistry
University of Pittsburgh
tel: (412) 648-0492
email: geoffh[*]pitt.edu
twitter: [*]ghutchis
web: https://hutchison.chem.pitt.edu/

> On Dec 29, 2021, at 8:01 PM, Susi Lehtola susi.lehtola[A]alumni.helsinki.fi wrote:
> [...]
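The normalization issues mentioned in this thread (and Yang's second and third points) can be made concrete. For a primitive Cartesian Gaussian x^lx y^ly z^lz exp(-α r²), the normalization constant is N = (2α/π)^(3/4) (4α)^(L/2) / √[(2lx-1)!! (2ly-1)!! (2lz-1)!!] with L = lx + ly + lz, so within a d shell the xx and xy components differ by a factor of √3. A program that normalizes each component individually and one that uses a single constant per shell will therefore write MO coefficients that differ by such factors. The Python sketch below (all helper names are hypothetical) computes these constants and checks that the real d-type solid harmonics in Racah normalization, in one common phase convention (d0 = z² - (x²+y²)/2, d+1 = √3 xz, d-1 = √3 yz, d+2 = (√3/2)(x² - y²), d-2 = √3 xy), come out orthonormal when written in individually normalized Cartesians. It says nothing about which phase or ordering convention any particular program uses.

```python
import math
from typing import Dict, List, Tuple

Powers = Tuple[int, int, int]  # (lx, ly, lz)


def double_factorial(n: int) -> int:
    """n!! with the convention (-1)!! = 0!! = 1."""
    result = 1
    while n > 1:
        result *= n
        n -= 2
    return result


def cartesian_norm(alpha: float, powers: Powers) -> float:
    """Normalization constant of the primitive x^lx y^ly z^lz exp(-alpha r^2)."""
    denom = 1
    for p in powers:
        denom *= double_factorial(2 * p - 1)
    return (2 * alpha / math.pi) ** 0.75 * (4 * alpha) ** (sum(powers) / 2) / math.sqrt(denom)


def monomial_overlap(a: Powers, b: Powers) -> int:
    """Overlap of two same-shell monomials, dropping the common factor
    (pi / (2 alpha))**1.5 / (4 alpha)**L shared by every pair in the shell."""
    result = 1
    for p, q in zip(a, b):
        if (p + q) % 2:
            return 0  # odd total power: the integral vanishes by symmetry
        result *= double_factorial(p + q - 1)
    return result


# Cartesian d components in lexicographic order (an assumed convention).
D_CART: List[Powers] = [(2, 0, 0), (1, 1, 0), (1, 0, 1), (0, 2, 0), (0, 1, 1), (0, 0, 2)]
S3 = math.sqrt(3.0)
# Real solid harmonics (Racah normalization, one common phase convention)
# given as coefficients of bare monomials.
D_SPH: Dict[str, Dict[Powers, float]] = {
    "d0": {(0, 0, 2): 1.0, (2, 0, 0): -0.5, (0, 2, 0): -0.5},
    "d+1": {(1, 0, 1): S3},
    "d-1": {(0, 1, 1): S3},
    "d+2": {(2, 0, 0): S3 / 2, (0, 2, 0): -S3 / 2},
    "d-2": {(1, 1, 0): S3},
}


def cart_overlap(i: int, j: int) -> float:
    """Overlap of the individually normalized Cartesian functions i and j."""
    a, b = D_CART[i], D_CART[j]
    return monomial_overlap(a, b) / math.sqrt(monomial_overlap(a, a) * monomial_overlap(b, b))


def in_normalized_cartesians(monomials: Dict[Powers, float]) -> List[float]:
    """Re-express a monomial combination over normalized Cartesians, at unit norm."""
    # In the units of monomial_overlap, a bare monomial equals
    # sqrt(M_kk) times its normalized Cartesian counterpart.
    c = [monomials.get(p, 0.0) * math.sqrt(monomial_overlap(p, p)) for p in D_CART]
    norm2 = sum(c[i] * c[j] * cart_overlap(i, j) for i in range(6) for j in range(6))
    return [x / math.sqrt(norm2) for x in c]


T = {name: in_normalized_cartesians(m) for name, m in D_SPH.items()}


def sph_overlap(m: str, n: str) -> float:
    return sum(T[m][i] * T[n][j] * cart_overlap(i, j) for i in range(6) for j in range(6))


max_dev = max(abs(sph_overlap(m, n) - (1.0 if m == n else 0.0)) for m in T for n in T)
print(f"max deviation of the d overlap from identity: {max_dev:.1e}")

alpha = 0.8  # arbitrary exponent
ratio = cartesian_norm(alpha, (2, 0, 0)) / cartesian_norm(alpha, (1, 1, 0))
print(ratio, 1 / S3)  # N(xx) / N(xy) equals 1/sqrt(3)
```

The rows of `T` are one possible Cartesian-to-spherical transformation for a d shell; changing the component ordering, the per-component normalization, or the sign convention of individual harmonics changes the stored numbers without changing the functions themselves.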
From owner-chemistry@ccl.net Thu Dec 30 10:41:00 2021
From: "Michaelo Frisch frisch+*+gaussian.com"
To: CCL
Subject: CCL:G: Quantum chemistry interoperability library?
Message-Id: <-54554-211230103718-4877-+gA8Eam0qFCSFnFgsNmH1Q[]server.ccl.net>
Date: Thu, 30 Dec 2021 10:37:16 -0500
Sent to CCL by: "Michaelo Frisch" [frisch]~[gaussian.com]

A couple of people have pointed out the need for such an interoperability library and noted some of the issues involved. We have attempted to address these issues.

As was noted previously, people often use Gaussian formatted checkpoint files for this purpose, but these have several deficiencies. fchk files were originally designed to facilitate post-processing of results. In that context, modest precision is adequate and using text files has several advantages. For communicating intermediate results when different programs work together on a calculation, however, the loss of precision is unacceptable. Also, for large calculations, text files are verbose even with modest precision, and slow to process: just reading in and parsing the 1 GB fchk file from a 3000-atom frequency calculation can be tedious. Finally, since fchk files were originally intended for post-processing results from Gaussian, some of the data in them is stored in a way which reflects Gaussian's internal data structures and is not intuitive for people working in other environments.

To address these problems, we have a new file format. We originally called this a "matrix element file" but have switched to the name "binary array file", which is more descriptive of its structure. Like the fchk file, this is a self-defining file, but it is binary, so full precision is retained and reading and writing the file is much faster.
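The precision argument can be illustrated in a few lines. The sketch writes doubles once as fixed-width text with eight digits after the decimal point (roughly the precision of formatted checkpoint real arrays; treat that correspondence as an assumption) and once as raw little-endian IEEE doubles in a simple self-describing record (name length, name, element count, data). That record layout and the helper names are hypothetical; it is not the actual Gaussian binary array file format, which is documented at https://gaussian.com/interfacing.

```python
import struct
from typing import List, Tuple


def to_text(values: List[float]) -> str:
    """Fixed-width scientific notation with 8 digits after the decimal point."""
    return " ".join(f"{v:16.8E}" for v in values)


def from_text(text: str) -> List[float]:
    return [float(tok) for tok in text.split()]


def write_record(name: str, values: List[float]) -> bytes:
    """Hypothetical record: uint32 name length, ASCII name, uint32 count, doubles."""
    encoded = name.encode("ascii")
    header = struct.pack("<I", len(encoded)) + encoded + struct.pack("<I", len(values))
    return header + struct.pack(f"<{len(values)}d", *values)


def read_record(blob: bytes) -> Tuple[str, List[float]]:
    (name_len,) = struct.unpack_from("<I", blob, 0)
    offset = 4
    name = blob[offset:offset + name_len].decode("ascii")
    offset += name_len
    (count,) = struct.unpack_from("<I", blob, offset)
    offset += 4
    values = list(struct.unpack_from(f"<{count}d", blob, offset))
    return name, values


values = [1.0 / 3.0, -2.0 / 7.0, 1.0e-12 / 3.0]
text_round_trip = from_text(to_text(values))
name, binary_round_trip = read_record(write_record("example array", values))

print(binary_round_trip == values)  # True: bit-for-bit identical
print(text_round_trip == values)    # False: truncated to about 9 significant digits
```

The text route keeps roughly nine significant digits, which is fine for plotting or population analysis but not for restarting an SCF to tight convergence; the binary route stores each double as its exact 8 bytes.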
The information which is common to all atomistic simulations is stored in a straightforward arrangement which is easy to move back and forth from any program's internal data structures. Gaussian 16 can use these files for interfacing via its External keyword, and data can be moved to and from Gaussian's internal data using the formchk and unfchk utilities.

We also provide a library to read, write, and use these files which is completely separate from Gaussian and which is open source. It is provided under a slightly modified version of the Mozilla license, which permits incorporation into other software with or without distribution of source, as long as proper attribution is made. This provides easy access to these files from compiled languages such as Fortran and C, and from Perl and Python. The Python interface is fully object-oriented.

Details about the file format, and a zip file with the open-source interfacing library for download, can be found at https://gaussian.com/interfacing

This version is set up to build using make. A new version, which includes installation of the Python interface via wheels and smooth integration with Jupyter notebooks, will be made available next month.

Mike Frisch

From owner-chemistry@ccl.net Thu Dec 30 11:16:01 2021
From: "Dr.N Sukumar n.sukumar++snu.edu.in"
To: CCL
Subject: CCL:G: Quantum chemistry interoperability library?
Message-Id: <-54555-211230022450-23004-vRT4Udv54SHWpq/TgaEbRA*server.ccl.net>
Date: Thu, 30 Dec 2021 12:54:30 +0530
Sent to CCL by: "Dr.N Sukumar" [n.sukumar]![snu.edu.in]

There is also significant disparity in how different codes (Gaussian, Orca, AIMAll, Multiwfn) write or parse wavefunction (WFN) files from post-Hartree-Fock (CASSCF, CCSD) calculations.

N. Sukumar
Professor of Chemistry
Director, Center for Informatics
Shiv Nadar University, India
https://chemistry.snu.edu.in/people/faculty/n-sukumar

"To call a physical system non-linear is like calling the majority of animals non-elephants" - Stan Ulam

On Thu, Dec 30, 2021 at 7:39 AM Susi Lehtola susi.lehtola[A]alumni.helsinki.fi wrote:
> [...]
From owner-chemistry@ccl.net Thu Dec 30 11:51:01 2021
From: "Susi Lehtola susi.lehtola(-)alumni.helsinki.fi"
To: CCL
Subject: CCL: Quantum chemistry interoperability library?
Message-Id: <-54556-211230110834-19857-mN1jBFwbHcoqr9tDQ0xO2w++server.ccl.net>
Date: Thu, 30 Dec 2021 11:08:22 -0500
Sent to CCL by: Susi Lehtola [susi.lehtola[*]alumni.helsinki.fi]

On 12/30/21 08:51, Geoffrey Hutchison geoff.hutchisona/gmail.com wrote:
> This was at least part of the point of the QC JSON schema.
>
> MolSSI sponsored at least one workshop where this was discussed and a
> draft was written.
>
> At that workshop several years ago, in which many developers, including
> quantum programs (NWChem, Q-Chem, Psi4, Molpro, GAMESS .. I know pretty
> much the entire community was invited), analysis programs (Horton),
> community programs (cclib, Open Babel, Avogadro, Jmol, etc.) were there.
> It was a big enough meeting that I'm sure I missed some people.
>
> Issues of normalization, metadata, input files (e.g. keywords),
> program-specific features, etc were all discussed.
>
> The idea was to first get out a "better Molden" interchange.
>
> So if you (personally) and/or MolSSI want to push this again, it would
> be great.
>
> Perhaps either a virtual meeting or discussion forum can get things
> moving again?

Definitely.

My point is that the issues with the various basis function orderings and normalizations already exist in the QCSchema / QCEngine infrastructure, since it allows you to store molecular orbital coefficients.

I am interested in an implementation at a lower level: a communication library that is easy to hook into for reading and writing wave function data. The internal library format could be something like QCSchema, i.e., data stored as JSON, but the code could also have i/o for Molden and formatted checkpoint files.
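A minimal sketch of the kind of data structure discussed in this thread (basis set, MO coefficients, occupation numbers), serialized as JSON in the spirit of QCSchema. The class and field names (`Shell`, `WaveFunction`, `ao_ordering`, and so on) are hypothetical and do not follow the actual QCSchema layout; the idea is only that storing the component-ordering convention next to the coefficients makes the file self-describing. Python's `json` module writes floats using `repr`, which round-trips IEEE doubles exactly, so a JSON text file does not have to lose precision.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class Shell:
    center: int                # index of the atom carrying this shell
    l: int                     # angular momentum quantum number
    pure: bool                 # True: spherical components, False: Cartesian
    exponents: List[float]
    coefficients: List[float]  # contraction coefficients


@dataclass
class WaveFunction:
    atomic_numbers: List[int]
    coordinates_bohr: List[List[float]]
    shells: List[Shell]
    ao_ordering: str                    # name of the component-ordering convention
    mo_coefficients: List[List[float]]  # rows: basis functions, columns: MOs
    occupations: List[float]

    def to_json(self) -> str:
        # asdict() also converts the nested Shell objects to plain dicts
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, text: str) -> "WaveFunction":
        data = json.loads(text)
        data["shells"] = [Shell(**s) for s in data["shells"]]
        return cls(**data)


# Illustrative numbers only: two atoms with one s shell each.
c = 0.5 ** 0.5
wf = WaveFunction(
    atomic_numbers=[1, 1],
    coordinates_bohr=[[0.0, 0.0, 0.0], [0.0, 0.0, 1.4]],
    shells=[Shell(0, 0, True, [1.0 / 3.0], [1.0]), Shell(1, 0, True, [1.0 / 3.0], [1.0])],
    ao_ordering="example-convention",
    mo_coefficients=[[c, c], [c, -c]],
    occupations=[2.0, 0.0],
)
restored = WaveFunction.from_json(wf.to_json())
print(restored == wf)  # True: every float survives the JSON round trip exactly
```

Readers and writers for Molden or fchk would then translate to and from this one structure, applying the ordering and normalization conversions on the way in and out.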

Yang Guo pointed out that "The transformation matrix from Cartesian and
spherical basis function is not unique, for high angular momentum
shells. Different conventions are used by different packages." I don't
think I have ever run into this kind of an issue. As far as I know, the
fundamental transformations are the same in all codes; the only
differences arise from the normalizations and the orderings of the
Cartesian and spherical functions, which have MANY standards.
--
------------------------------------------------------------------
Mr. Susi Lehtola, PhD                Adjunct Professor
susi.lehtola===alumni.helsinki.fi    University of Helsinki
http://susilehtola.github.io/        Finland
------------------------------------------------------------------
Susi Lehtola, FT                     dosentti
susi.lehtola===alumni.helsinki.fi    Helsingin yliopisto
http://susilehtola.github.io/
------------------------------------------------------------------

From owner-chemistry@ccl.net Thu Dec 30 12:43:01 2021
From: "Reichert, David reichertd#%#wustl.edu"
To: CCL
Subject: CCL:G: Quantum chemistry interoperability library?
Message-Id: <-54557-211229223700-3045-fIze3919eeE0EKyJbz2PwQ[*]server.ccl.net>
X-Original-From: "Reichert, David"
Date: Thu, 30 Dec 2021 03:36:41 +0000

Sent to CCL by: "Reichert, David" [reichertd---wustl.edu]

I've no first-hand experience with it, but doesn't the cclib package
(written in Python) handle most of what you'd like to see?

-Dave

David E. Reichert, PhD
Associate Professor of Radiology
Washington University School of Medicine
reichertd(!)wustl.edu
https://reichertlab.info/

________________________________
The materials in this message are private and may contain Protected
Healthcare Information or other information of a sensitive nature. If
you are not the intended recipient, be advised that any unauthorized
use, disclosure, copying or the taking of any action in reliance on the
contents of this information is strictly prohibited. If you have
received this email in error, please immediately notify the sender via
telephone or return mail.

From owner-chemistry@ccl.net Thu Dec 30 13:17:00 2021
From: "Brian Skinn brian.skinn###gmail.com"
To: CCL
Subject: CCL:G: Quantum chemistry interoperability library?
Message-Id: <-54558-211230131108-15475-IGWnYIicdIUBLUY+548zzA^^server.ccl.net>
X-Original-From: Brian Skinn
Date: Thu, 30 Dec 2021 13:10:51 -0500

Sent to CCL by: Brian Skinn [brian.skinn[a]gmail.com]

Mike (and others),

Given the significant work that has already been done on this
'interfacing' file format, this suggestion comes very late in the game,
but -- was any consideration given to the HDF5 file format
(https://www.hdfgroup.org/solutions/hdf5/) for this?

It has the advantage of being a well-established format in other
disciplines, with mature bindings available for a number of languages
such as Python.

It also provides a number of functional features such as direct
indexing into on-disk files, chunked storage and retrieval (if I recall
correctly, at least), on-the-fly (de)compression, and support for
working with files larger than the available memory.

I could see it having been considered and found unsuitable for the
purpose, though -- such that a domain-specific format like
'interfacing' is preferable on the merits.

Anyway, just a thought. I will be happy to see *any* such standardized
format widely implemented, for my own use!

-Brian

On Thu, Dec 30, 2021 at 1:01 PM Michaelo Frisch frisch+*+gaussian.com <owner-chemistry(!)ccl.net> wrote:
>
> Sent to CCL by: "Michaelo Frisch" [frisch]~[gaussian.com]
> A couple of people have pointed out the need for such an
> interoperability library and noted some of the issues involved. We
> have attempted to address these issues.
>
> As was noted previously, people often use Gaussian formatted
> checkpoint files for this purpose, but these have several
> deficiencies. fchk files were originally designed to facilitate
> post-processing of results. In this context, modest precision is
> adequate and using text files has several advantages. For
> communication of intermediate results as part of having different
> programs work together on a calculation, the loss of precision is
> unacceptable. Also, for large calculations, text files are verbose
> even with modest precision, and slow to process. Just reading in and
> parsing the 1 GB fchk file from a 3000-atom frequency calculation can
> be tedious. Also, since fchk files were originally intended for
> post-processing results from Gaussian, some of the data in them is
> stored in a way which reflects Gaussian's internal data structures and
> is not intuitive for people working in other environments.
>
> To address these problems, we have a new file format. We originally
> called this a "matrix element file" but have switched to the name
> "binary array file", which is more descriptive of its structure. Like
> the fchk file, this is a self-defining file, but it is binary so that
> full precision can be retained and reading/writing the file is much
> faster. The information which is common to all atomistic simulations
> is stored in a straightforward arrangement which is easy to move back
> and forth from any program's internal data structures.
>
> Gaussian 16 can use these files for interfacing via its External
> keyword, and data can be moved to and from Gaussian's internal data
> using the formchk and unfchk utilities.
>
> We also provide a library to read, write, and use these files which
> is completely separate from Gaussian and which is open-source. It is
> provided under a slightly modified version of the Mozilla license,
> which permits incorporation in other software with or without
> distribution of source, as long as proper attribution is made. This
> provides for easy access to these files from compiled languages such
> as Fortran and C, and from Perl and Python. The Python interface is
> fully object-oriented.
>
> Details about the file format and download of a zip file with the
> open-source interfacing library can be found at
> https://gaussian.com/interfacing
> This version is set up to build using make. A new version which
> includes installation of the Python interface via wheels and smooth
> integration with Jupyter notebooks will be made available next month.
>
> Mike Frisch
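
The precision argument in the quoted message can be illustrated with a small sketch comparing a fixed-width text round trip with a binary one. The E16.8 field width is an assumption modeled on fchk-style real arrays, and the record layout below (label length, label, element count, little-endian doubles) is hypothetical; it is not the layout of Gaussian's binary array file, which is documented at the URL given in the message.

```python
# Sketch: text vs. binary round trip of double-precision data.
# Assumptions: E16.8-style text fields (5 per line) and a hypothetical
# self-describing binary record; neither is Gaussian's actual format.
import struct

FIELD = 16  # width of one text field


def to_text(values: list[float]) -> str:
    """Write values as fixed-width E16.8 fields, five per line."""
    fields = [f"{v:{FIELD}.8E}" for v in values]
    return "\n".join("".join(fields[i:i + 5]) for i in range(0, len(fields), 5))


def from_text(text: str) -> list[float]:
    """Parse fixed-width fields back into floats (columns, not whitespace)."""
    return [float(line[i:i + FIELD])
            for line in text.splitlines()
            for i in range(0, len(line), FIELD)]


def to_record(label: str, values: list[float]) -> bytes:
    """Record = u32 label length, ASCII label, u64 count, little-endian doubles."""
    name = label.encode("ascii")
    header = struct.pack("<I", len(name)) + name + struct.pack("<Q", len(values))
    return header + struct.pack(f"<{len(values)}d", *values)


def from_record(blob: bytes) -> tuple[str, list[float]]:
    """Inverse of to_record: the header says how much data follows."""
    (nlen,) = struct.unpack_from("<I", blob, 0)
    label = blob[4:4 + nlen].decode("ascii")
    (count,) = struct.unpack_from("<Q", blob, 4 + nlen)
    values = list(struct.unpack_from(f"<{count}d", blob, 12 + nlen))
    return label, values


if __name__ == "__main__":
    coeffs = [1.0 / 3.0, -0.1234567890123456, 2.0 ** -40]
    via_text = from_text(to_text(coeffs))
    label, via_binary = from_record(to_record("MO coefficients", coeffs))
    print("max text error:", max(abs(a - b) for a, b in zip(coeffs, via_text)))
    print("binary exact:", via_binary == coeffs)  # True: bits are copied verbatim
```

The text path keeps only nine significant digits, while the binary path reproduces every bit; because the record header states the label and element count, a reader needs no external schema to find the data, which is what "self-defining" buys.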

From owner-chemistry@ccl.net Thu Dec 30 13:52:00 2021
From: "Rzepa, Henry S h.rzepa~!~imperial.ac.uk"
To: CCL
Subject: CCL:G: Quantum chemistry interoperability library?
Message-Id: <-54559-211230133651-3874-PSTPEK2lD5Kw09N9xYwJtg .. server.ccl.net>
X-Original-From: "Rzepa, Henry S"
Date: Thu, 30 Dec 2021 18:36:39 +0000

Sent to CCL by: "Rzepa, Henry S" [h.rzepa^^imperial.ac.uk]

This is great news. It fits nicely into another topic of great
interest, the generation of FAIR data in all areas of chemistry. In this
context, the binary array file maps very nicely onto both the I of FAIR
(interoperability) and the R (re-use).

From the point of view of publishing, e.g., a computational chemistry
dataset in a data repository as part of the FAIR procedures associated
with publishing results in a journal, adding a binary array file to,
e.g., the .FCH, .WFN and .WFX files normally part of such a
computational file set is a real step forward in the "FAIRification" of
computational data.

In a related context, it would be useful to try to gain agreement on a
media type for such a file. In 1994 we proposed the chemical/x-.....
notation for such media files, and usage of this type continues to this
day. So if we wish to incorporate the binary array file (file suffix
.mat) into this scheme, we need some suggestions for this usage.
Nominally, it could be chemical/x-qc-binary-array-file at its simplest.

I might also add that an IUPAC working party is currently drafting a set
of recommendations for FAIR metadata in the area of NMR spectroscopy,
work which is expected to be published for discussion in 2022 or perhaps
2023 under the auspices of IUPAC. Perhaps such a working party in QC
should be proposed? If anyone intends to go to WATOC next year (2022!
hopefully), perhaps a BOF meeting to discuss further could be arranged?

Henry
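
As a small illustration of how a media type attaches to a file suffix in software, Python's standard mimetypes module can map the .mat suffix to the nominal type suggested above. This is only a sketch of a local, per-process mapping: chemical/x-qc-binary-array-file is the message's suggestion rather than a registered type, and .mat is also the suffix of MATLAB data files, so such a mapping can collide with existing ones.

```python
# Sketch: attach the nominal media type from the message to the .mat suffix.
# "chemical/x-qc-binary-array-file" is a suggestion, not a registered type,
# and the mapping only affects the current Python process.
import mimetypes

QC_BINARY_ARRAY = "chemical/x-qc-binary-array-file"


def register_qc_types() -> None:
    """Map .mat to the suggested type, replacing any existing mapping
    (for example one for MATLAB data files) in this process."""
    mimetypes.add_type(QC_BINARY_ARRAY, ".mat")


if __name__ == "__main__":
    register_qc_types()
    print(mimetypes.guess_type("water_b3lyp.mat"))
    # ('chemical/x-qc-binary-array-file', None)
```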

From owner-chemistry@ccl.net Thu Dec 30 14:34:00 2021
From: "Kenneth Ruud kenneth.ruud===uit.no"
To: CCL
Subject: CCL:G: Quantum chemistry interoperability library?
Message-Id: <-54560-211230143231-20094-1o9enmOXiGEeBxE4sfyklw[-]server.ccl.net>
X-Original-From: Kenneth Ruud
Date: Thu, 30 Dec 2021 19:32:16 +0000

Sent to CCL by: Kenneth Ruud [kenneth.ruud ~ uit.no]

There was an effort some years ago to develop this kind of interface
software, Q5Cost; see e.g. https://pubs.acs.org/doi/10.1021/ci7000567
and https://doi.org/10.1002/jcc.23492. This was driven by the need to
communicate correlated wave functions between different ab initio
programs which, as already pointed out in this thread, is a non-trivial
task. We have it implemented in Dalton.

However, I have tried searching around but failed to find a place to
download the code, though I have not searched very hard (this is from
the days before DOIs were assigned to program releases, sadly).

Anyway, even if the code cannot be found, the papers may at least
indicate some of the issues that need to be considered.

Best regards,

Kenneth

Prof. Kenneth Ruud
Hylleraas Centre for Quantum Molecular Sciences
UiT The Arctic University of Norway
Mobile: +47 90098353
Homepage: https://uit.no/go/target/41020

There was an effort some years ago on developing thi= s kind of interface software, Q5Cost, see e.g. https://pubs.acs.org= /doi/10.1021/ci7000567 and  https://doi.org/10.1002/jcc.= 23492. This was driven by the need to communicate correlated wave functions between different ab initio = programs which, as already pointed out in this thread, is a non-trivial tas= k. We have it implemented in Dalton.

 

However, I have been trying to search around but fai= led to find place to download the code, though I have not searched very har= d (so this is from the days before assigning DOI=92s to program releases, sadly).

 

Anyway, even if the code cannot be found, the papers= may at least indicate some of the issues that needs to be considered.=

 

 

Best regards,

 

Kenneth

 

Prof. Kenneth Ruud

Hylleraas Centre for Quantum Molecular S= ciences

UiT The Arctic University of Norway=

Mobile: +47 90098353

Homepage: ht= tps://uit.no/go/target/41020

 

From: owner-chemistry+ken= neth.ruud=3D=3Duit.no-$-ccl.net <owner-chemistry+kenneth.ruud=3D=3Duit.no-$-= ccl.net> on behalf of Susi Lehtola susi.lehtola[A]alumni.helsinki.fi <= ;owner-chemistry-$-ccl.net>
Date: Thursday, 30 December 2021 at 03:09
To: Kenneth Ruud <kenneth.ruud-$-uit.no>
Subject: CCL:G: Quantum chemistry interoperability library?


Sent to CCL by: Susi Lehtola [susi.lehtola{=3D}alumni.helsinki.fi]
Hello,



I am again hitting my head against the wall, since I am having trouble
passing data from one quantum chemistry code to another.

What we are missing as a community is a standard interoperability
library for passing basis set and wave function data from one program to another. The de facto standard is GAUSSIAN's formatted checkpoint
library, but it also has some deficiencies; for instance, it's not
machine precision.

Because the library should store all the necessary data for at least SCF wave functions, that is, the Gaussian basis set and the molecular
orbitals (MOs) and their occupation numbers, an interface to this
library could also serve as a tool for checkpointing calculations that
have not converged.

Some pieces of the necessary functionality are certainly around. Many
quantum chemistry programs have implemented their own formatted
checkpoint i/o functions. Quantum chemistry analysis programs have
needed to implement their own parsers for the .fchk / MOLDEN / etc formats.=

I do not think that writing such a common interface library should be
too difficult. All that is needed is

1. a data structure that is able to express the data in a common format and=

2. input and output functions to translate the data from/to specific
quantum chemistry code formats.

In addition to the .fchk and molden formats, the common interface
library should also be able to read various programs' native data files
> from disk, like DENS and XDENS in TURBOMOLE.

Since many pieces of the puzzle are already around, and the problem
affects the whole community, I would like to get everyone's feedback on
this idea.

If there was, say, a portable open-source C++ library with C, Fortran
and Python frontends for handling molecular wave function data, would
you be willing to use it in your own program package? What kinds of
features would you need in it? Does such a library already exist?

Susi

PS. I work as a Software Scientist at the Molecular Sciences Software
Institute at Virginia Tech (http://molssi.org= ), but I am sending this
message from my Helsinki address since it's what I've used here for
close to a decade.
--
------------------------------------------------------------------
Mr. Susi Lehtola, PhD         =     Adjunct Professor
susi.lehtola##alumni.helsinki.fi   University of Helsinki
http://susilehtola.github.io/=      Finland
------------------------------------------------------------------



-= This is automatically added to each message by the mailing script =-
      http://www.ccl.net/cgi-bin/ccl/send_ccl_message
      http://www.ccl.net/chemistry/sub_unsub.shtml

Before posting, check wait time at: http://www.ccl.net

Job: http://www.ccl.net/jobs
Conferences: http://server.ccl.net/chemistry/announcements/conferences/

Search Messages: http://www.ccl.net/chemistry/searchccl/index.shtml

RTFI: http://www.ccl.net/chemistry/aboutccl/instructions/

From owner-chemistry@ccl.net Thu Dec 30 15:34:00 2021
From: "Michael Frisch frisch-*-gaussian.com"
To: CCL
Subject: CCL:G: Quantum chemistry interoperability library?
Message-Id: <-54561-211230145248-6339-q8aLiDFD69GoyX8uXb1XkA*server.ccl.net>
X-Original-From: "Michael Frisch"
Date: Thu, 30 Dec 2021 14:52:46 -0500

Sent to CCL by: "Michael Frisch" [frisch%a%gaussian.com]
> "Susi Lehtola susi.lehtola(-)alumni.helsinki.fi" wrote:
>
> Definitely. My point is that the issues with the various basis function
> orderings and normalizations already exist in QCSchema / QCEngine
> infrastructure, since it allows you to store molecular orbital
> coefficients.
>
> I am interested in an implementation at a lower level: a communication
> library that is easy to hook up to to read and write wave function data.
> The internal library format could be something like QCSchema i.e. data
> stored as JSON, but the code could also have i/o for Molden and
> formatted checkpoint.
>
> Yang Guo pointed out that "The transformation matrix from Cartesian and
> spherical basis function is not unique, for high angular momentum
> shells. Different conventions are used by different packages." I don't
> think I have ever run into this kind of an issue. As far as I know, the
> fundamental transformations are the same in all codes; the only
> differences arise from the normalizations and the orderings of the
> cartesians and spherical functions, which have MANY standards.

Our interface format does include the order of the AO functions as well as whether they are pure or Cartesian. This is sufficient for Cartesian functions. However, there are different sign conventions for the pure functions and we don't currently have a flag to distinguish them. The pure functions in Gaussian follow the convention in Schlegel and Frisch, IJQC 54, 83 (1995).
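
Once the AO order and the pure/Cartesian flag are stored, converting between codes comes down to a signed permutation. The AO rows of the MO coefficient matrix are reordered, and where two sign conventions differ, the affected rows are flipped. The Python sketch below builds the permutation from Cartesian exponent triples. `cartesian_order` generates one common ordering (xx, xy, xz, yy, yz, zz for d), and `MOLDEN_D` is the xx, yy, zz, xy, xz, yz order used for Cartesian d functions in Molden files. Which pure functions need a sign flip depends on the two conventions involved, so the sketch takes the signs as an input rather than guessing them.

```python
from typing import List, Optional, Sequence, Tuple

Powers = Tuple[int, int, int]  # exponents (i, j, k) of x^i y^j z^k

def cartesian_order(l: int) -> List[Powers]:
    """One common ordering; for l = 2 this is xx, xy, xz, yy, yz, zz."""
    return [(l - i, i - j, j) for i in range(l + 1) for j in range(i + 1)]

# Cartesian d order used in Molden files: xx, yy, zz, xy, xz, yz.
MOLDEN_D: List[Powers] = [(2, 0, 0), (0, 2, 0), (0, 0, 2), (1, 1, 0), (1, 0, 1), (0, 1, 1)]

def permutation(src: Sequence[Powers], dst: Sequence[Powers]) -> List[int]:
    """perm[k] is the position in src of the function placed at position k of dst."""
    src = list(src)
    return [src.index(f) for f in dst]

def transform_rows(coeffs: List[List[float]], perm: List[int],
                   signs: Optional[List[float]] = None) -> List[List[float]]:
    """Apply C' = T C with T a signed permutation; signs[k] multiplies output row k.

    The same T transforms an AO density matrix as P' = T P T^T.
    """
    if signs is None:
        signs = [1.0] * len(perm)
    return [[signs[k] * x for x in coeffs[perm[k]]] for k in range(len(perm))]

# Rows labelled by their position in cartesian_order(2), one MO column each.
C = [[float(k)] for k in range(6)]
perm = permutation(cartesian_order(2), MOLDEN_D)
print(perm, transform_rows(C, perm))
```

The same functions work for any shell once both orderings are written out as exponent triples. Pure functions would need their own labels, for example (l, m) pairs.
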
I think many codes have the same convention for d functions, but I have no information about higher angular momenta. Some conventions have a factor of (-1)^L, so d's might be the same between two codes and f's different, even if each code is consistent in applying its convention.

For contracted AO basis functions we store coefficients of normalized primitives. Any well-defined convention is fine for this because most codes need to convert back and forth anyway.

Non-single-determinant wavefunctions are a very messy problem which we have not tried to address. Even within Gaussian there are different determinant-based CI algorithms for CASSCF which order determinants differently, and there is the same issue with respect to spin-eigenfunctions. Then there are also a multitude of conventions for the choice of spin-eigenfunctions as well as sign conventions. Handling this in full generality would require storing a lot of information about the type and order of spin functions as well as the orbital configuration that goes with each CI coefficient. Then, to make the information useful for interoperability, one would have to be able to transform between all the schemes. Doing all this would be useful but a big project.

Mike Frisch

From owner-chemistry@ccl.net Thu Dec 30 16:09:00 2021
From: "Michael Frisch frisch/a\gaussian.com"
To: CCL
Subject: CCL: Quantum chemistry interoperability library?
Message-Id: <-54562-211230153145-31414-+Aup92+3DB8zleQQxoFlzQ]-[server.ccl.net>
X-Original-From: "Michael Frisch"
Date: Thu, 30 Dec 2021 15:31:43 -0500

Sent to CCL by: "Michael Frisch" [frisch-x-gaussian.com]
> "Brian Skinn brian.skinn###gmail.com" wrote:
>
> Sent to CCL by: Brian Skinn [brian.skinn[a]gmail.com]
>
> Mike (and others),
>
> Given the significant work that has already been done on this 'interfacing'
> file format, this suggestion is very late to the gate, but -- was any
> consideration made of the HDF5 file format (
> https://www.hdfgroup.org/solutions/hdf5/) for this?

The lower-level (Fortran and C) interfaces we provide do allow for processing data in chunks. The higher-level Python interface expects to keep things in memory, but the lower-level functionality is available in Python if someone wants it.

The HDF5 people have thought about some important issues and there would be some good points to using their stuff. However, we are concerned about ease of use and minimizing the barrier to adopting what we've done. A person with a Fortran or C program can learn a dozen or so simple calls, link with our library using whichever of the common compilers they're already using, and be ready to go. The Python interface uses a couple of common packages like numpy but again fits easily into a variety of ways people are likely to be working. Sitting on top of another package would add another barrier to people using the interface.

Mike Frisch

From owner-chemistry@ccl.net Thu Dec 30 16:44:00 2021
From: "Tian Lu sobereva#sina.com"
To: CCL
Subject: CCL: Quantum chemistry interoperability library?
Message-Id: <-54563-211230140922-722-23YxUK1RSLbJBE9almMgnA : server.ccl.net>
X-Original-From: "Tian Lu"
Date: Thu, 30 Dec 2021 14:09:20 -0500

Sent to CCL by: "Tian Lu" [sobereva(-)sina.com]
It is worth mentioning that last year I proposed a new file format (mwfn) for wavefunction storage and exchange. A detailed description of this format, as well as example files, is available at https://doi.org/10.26434/chemrxiv-2021-lt04f-v5. A comparison between different wavefunction formats, including wfn, wfx, fch, molden, mkl and NBO.47, is given in the appendix of this document. The purpose of defining the mwfn format is to provide an ideal format for recording wavefunctions and transferring them between different quantum chemistry and wavefunction analysis programs. This format is supported by the current version of the Multiwfn code.

Currently, a very popular format for storing wavefunctions is Molden (the input file of the Molden software). However, it has many limitations and problems, for example:

(1) Nuclear charge information is not explicitly recorded. This is quite troublesome if pseudopotentials are used.
(2) Matrices (e.g. Fock matrix, density matrix, various integral matrices) cannot be recorded, although they are needed in many post-processing analyses.
(3) Cell information cannot be recorded. This makes direct analysis of periodic wavefunctions infeasible.
(4) The format is loosely defined, leading to severe compatibility problems. (I feel strongly about this point from developing the Multiwfn wavefunction analysis code. Molden files produced by many quantum chemistry codes were found to be non-standard, making loading fail or leading to wrong analysis results. I spent a lot of time making my code compatible with Molden files produced by as many programs as possible. Also, because of the loose definition of the Molden format, loading efficiency has been sacrificed to a certain extent for compatibility.)
(5) Only basis functions with angular momentum up to g are formally supported. However, today's very high-precision calculations sometimes involve h functions.
(6) Only a single wavefunction can be recorded. Therefore, wavefunctions produced during scans or molecular dynamics have to be recorded individually in different files.

Some of the above issues are not present in the well-known "fch" file, but the fch file has additional limitations, such as the lack of dedicated fields for recording orbital occupation numbers and orbital irreducible representations. In addition, the fch format often contains a lot of irrelevant information. Therefore, in my opinion, fch is also not well suited as a general-purpose format for recording wavefunctions.

The limitations of existing wavefunction formats were fully considered when defining the mwfn format, so the problems mentioned above do not exist in it. Moreover, the mwfn format is clear, concise and human-readable, and thus it is fairly easy to write and load. I hope that this format will be widely supported by quantum chemistry programs in the future and will replace the old Molden format.

Finally, it is worth noting that, as mentioned in the document introducing the mwfn format, the Multiwfn code provides a sanity check for input mwfn files. An mwfn file exported by a new code should be able to pass this check. Therefore, potential problems such as improper normalization and incorrect ordering of basis functions in a shell can be easily detected and thus fully avoided.

Best regards,

Tian Lu
Beijing Kein Research Center for Natural Sciences
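
The normalization part of such a sanity check can be illustrated with a short sketch. It assumes that the contraction coefficients refer to normalized primitives, which is the convention Mike Frisch describes above. Two normalized Gaussian primitives with exponents a and b and the same angular part then have overlap (2*sqrt(a*b)/(a+b))**(l + 3/2). That formula lets the self-overlap of a contracted shell be checked without any integral library. It holds for spherical functions and for Cartesian components that are each normalized individually. The function names are hypothetical and do not reproduce Multiwfn's actual check.

```python
import math
from typing import List, Sequence

def primitive_overlap(a: float, b: float, l: int) -> float:
    """Overlap of two normalized Gaussian primitives with the same angular part."""
    return (2.0 * math.sqrt(a * b) / (a + b)) ** (l + 1.5)

def self_overlap(exps: Sequence[float], coefs: Sequence[float], l: int) -> float:
    """<phi|phi> for a contraction whose coefficients refer to normalized primitives."""
    return sum(ci * cj * primitive_overlap(ai, aj, l)
               for ai, ci in zip(exps, coefs)
               for aj, cj in zip(exps, coefs))

def check_shell(exps: Sequence[float], coefs: Sequence[float], l: int,
                tol: float = 1e-6) -> None:
    """Raise ValueError if the contracted function is not unit-normalized."""
    s = self_overlap(exps, coefs, l)
    if abs(s - 1.0) > tol:
        raise ValueError(f"l={l} shell: <phi|phi> = {s:.8f}, expected 1")

def renormalize(exps: Sequence[float], coefs: Sequence[float], l: int) -> List[float]:
    """Scale the contraction coefficients so that <phi|phi> = 1."""
    s = self_overlap(exps, coefs, l)
    return [c / math.sqrt(s) for c in coefs]

exps = [3.42525091, 0.62391373, 0.16885540]   # example exponents
coefs = renormalize(exps, [1.0, 2.0, 3.0], 0)  # arbitrary starting coefficients
check_shell(exps, coefs, 0)                    # passes after renormalization
```

A writer could run `check_shell` on every shell before exporting a file. Errors in the ordering of functions within a shell need a separate check, such as comparing against the declared order.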