Assignment on-line Documentation


The Assignment Module, Version 0.1

15 October 1996 version

SET-UP

Presentation

This module implements a set of very simple assignment tools, which may nevertheless prove useful. It is written entirely in the Gifa macro language, and as such can be fully adapted to your needs. Right now, it is principally aimed at protein and peptide assignment. Extending it to oligonucleotides and sugars is probably a simple matter of extending the basic residue data-bases. However, I have no idea whether it can help in the assignment of other kinds of organic molecules.

You will not find any fancy tools or automatic assignment here. The only help provided is a set of tools that let you visualise several spectra at the same time, add notes to peaks, draw lines to help in the visual search for alignments, and store the information in several data-bases: one for assigned peaks, one for spins and one for spin-systems (a spin-system being simply a set of spins). For the moment, this module works only with 2D data-sets, and is aimed mostly at homonuclear spectroscopy (however, I'm sure it can be used for 2D heteronuclear spectroscopy).

However, thanks to the versatility of Gifa (calling UNIX from Gifa, creating/reading files, etc.), it is quite easy to adapt this framework to your own needs, for instance by calling your favorite automatic assignment tools from within this set-up.

File set-up

A complete assignment is kept in a special directory, called a project. The project, which may reside anywhere on the disk, holds several files and directories used for storing information; it may also contain any file that the user wishes to keep.

In the project directory, you will typically find two files. The file parameters is a macro which is executed when the project is selected; it contains all the definitions and some basic environment variables. The file zoom_window contains the zoom window coordinates for the 'multi-zoom' tool.

Two mandatory directories also reside in the project.

The db directory holds the data-bases in dbm format, as described below. The primary sequence is also found in this directory, in a file called primary. Its format is as follows: one residue per line, in one-letter code.

The spectra directory holds all the spectra files associated with the project. Typically, to save disk space, links to the actual experiment files are stored here rather than the complete files.
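As an illustration of the primary file format described above, here is a minimal Python sketch that reads such a file into a sequence string. It is not part of the Gifa module. The function names parse_primary and read_primary are hypothetical, and the handling of blank lines and lower-case letters is an assumption, since the format is only specified as one residue per line.

```python
from pathlib import Path


def parse_primary(text):
    """Turn the content of a 'primary' file into a one-letter-code string.

    Assumption: blank lines are ignored and letters are upper-cased.
    """
    residues = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        code = line.strip()
        if not code:
            continue  # tolerate empty lines (assumption)
        if len(code) != 1 or not code.isalpha():
            raise ValueError(f"line {line_number}: expected a one-letter code, got {code!r}")
        residues.append(code.upper())
    return "".join(residues)


def read_primary(project_dir):
    """Read <project_dir>/db/primary, the location given in the text above."""
    return parse_primary((Path(project_dir) / "db" / "primary").read_text())
```

For example, a file containing the three lines M, K and V yields the string "MKV".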

Assignment Information Structure

Assignment information is stored as sets of peaks, spins and spin systems, with the following structure :

data-bases

Except for the primary structure file presented above, there are 3 kinds of data-base files in the db directory. They are all in dbm format, and each of them is thus composed of two files, *.pag and *.dir, which should not be modified directly. The dbm format is a generic UNIX format for flat data-bases. For instance, these files can very conveniently be accessed with the perl language.

The name_of_experiment.pag and name_of_experiment.dir files hold the peak data-base for a given experiment. An entry in the peak data-base stores all the pertinent information for a given peak. Peak entries can be created by copying them from the peak-picker, or during the assignment process. Each peak entry also stores two pointers to the spin data-base, pointing to the parent spins of this peak.

Spin.pag and Spin.dir form the spin data-base. A spin is stored as a chemical shift, a name, and the spin-system to which it belongs.

Finally, Spin_sys.pag and Spin_sys.dir form the spin-system data-base. For each entry, the spin-system type, the index in the primary sequence, and the list of the spins are stored.

In all the data-bases, entries are referred to by a numerical id that ranges from 1 to the highest value used. The highest id used is stored in a special entry indexed as "LARGEST". However, due to the very nature of the dbm format used for the files (and of the associative arrays used internally in Gifa), the ids need not be contiguous. So if an entry is deleted, the numbering of the other entries is unaffected.
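The id scheme can be modelled outside Gifa with an ordinary Python dictionary standing in for a dbm file. This is only a sketch: the helper names new_entry and delete_entry are hypothetical, keys and values are kept as strings because dbm stores text, and treating a missing "LARGEST" entry as 0 is an assumption about an empty data-base.

```python
def new_entry(db, value):
    """Store value under the next free id and update the LARGEST entry."""
    new_id = int(db.get("LARGEST", "0")) + 1  # assumption: empty base counts from 0
    db[str(new_id)] = value
    db["LARGEST"] = str(new_id)  # updated only once the entry is stored
    return new_id


def delete_entry(db, entry_id):
    """Remove an entry; the other ids and LARGEST are left untouched."""
    del db[str(entry_id)]


spins = {}
new_entry(spins, "first entry")   # gets id 1
new_entry(spins, "second entry")  # gets id 2
delete_entry(spins, 1)            # id 2 keeps its number
new_entry(spins, "third entry")   # gets id 3, id 1 is not reused
```

Since LARGEST only ever grows, a deleted id is never handed out again, which keeps the pointers stored in other entries unambiguous.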

build list

The build list is one of the main tools for progressing in the assignment work. The idea is to make a list of all the peaks within a spin-system (in the TOCSY sense). Once this list is complete, it is possible to promote the list to a new spin-system which is then entered in the data-base.

The build list is managed with a set of tools found in the graph tools menu. The marker tool lets you detect peak alignments, and create a spin for each alignment. The list can be printed or shown directly on screen. And of course, the list can be promoted to a spin-system.

MENUS DESCRIPTION

When entering the assignment mode, Gifa sets up an assignment environment with the basic menus, the Peak menus and 3 additional menus that give access to all the commands needed to perform spectral assignment. The macro env_att.g actually sets up everything for assignment.

Project menu

This menu lets you create or select a project, and more generally perform all the operations that are global to the project.

New Project

Creates a new project: a directory containing all the empty data-bases. You will also be prompted for the primary sequence of the protein under study. The primary sequence can be given literally in 1-letter code, read from a file (1- or 3-letter code), or read from a PDB file.

Select Project

Lets you select any previously created project. Only one project can be used at a time.

When selecting a project, the dialog box gives you the choice to create a backup of the current state of the project (simply a tar file), to recover from a previous backup (thus deleting the current state), or to do nothing.

After selection, the number of assigned systems is displayed on screen, then the multi-zoom tool and the File Selector tool are opened.

Change param

A set of parameters is stored with the project; these parameters can be changed from here. You can define the alignment distance for spin assignment, the distance tolerance for mouse clicking, and several display parameters.

Save data-bases

The assignment data-bases are permanently kept on file. However, in case of a program crash, the last modifications may be lost. Clicking here secures the latest entries.

File Selector

All the spectra used in the project can be accessed from here. A spectrum can be loaded in memory; at the same time, the associated peak data-base is loaded. A spectrum can also be only shown (see the SHOW command in the documentation), which displays it on screen without loading it in memory.

When a file is loaded in memory, the content of the related peak data-base is displayed on screen.

Note that you can also use the super2d tool (in the display menu) to display several spectra superimposed.

Add spectra

From here, you can add a spectrum to the list of the currently used spectra. "Adding" a spectrum consists in either copying or linking it into the dedicated directory (see above, File set-up). Linking creates a UNIX soft-link, which stores only the address of the file and thus saves a significant amount of disk space.

multi zoom

This is a small utility, which may well be your principal tool when working on an assignment. It lets you define zoom zones, and rapidly jump to one of the defined zones. The set-up can be stored on disk and reloaded later on. It is also possible to draw the defined zones.

choose base color

Most display commands use a contrasting color (see SCOLOR in the documentation). This utility lets you define which color will be used.

Quit Assignment

Use this entry if you want to quit the assignment module and restore a normal set-up.

data-bases menu

copy pk to db

This command copies the content of the peak table (obtained with the Peak picking tool) to the peak data-base. The peaks will be there, but of course without assignment. This lets you load a first set of peaks, for instance the finger print region, from which the assignment work can proceed.

This command can be issued several times and at any moment during the assignment work, thus adding peaks to the assignment data-base.

Show database

This command displays on the spectral screen all the peaks in the current peak data-base.

Find a peak

After selecting this command, the program will wait until you click on a peak on the spectrum, and will high-light the selected peak as well as print its id in the terminal screen.

Edit a peak

The previously high-lighted peak can then be edited with this command. You directly see the content of the assignment data-base, and can actually modify it. If the peak is already assigned, you will be able to see/edit the corresponding spins.

Add a peak

This command waits until you click on the spectral window, and creates a new entry in the peak data-base. The peak can then be edited or removed.

List all peaks

This command produces a listing of all the peaks in the data-base. The file is stored in the current project.

Find a spin

After selecting this command, the program will wait until you click on the spectrum, and will high-light the closest spin.

Edit a spin

The previously high-lighted spin can then be edited with this command. You directly see the content of the assignment data-base, and can actually modify it. Related peaks and spin system can also be edited.

Add a spin

This command waits until you click on the spectral window, and creates a new entry in the spin data-base. The spin can then be edited or removed.

List all spins

This command produces a list of all the spins in the data-base.

Find a system

After selecting this command, the program will wait until you click on a peak on the spectrum, and will high-light the related spin-system.

Edit a system

The previously high-lighted spin-system can then be edited with this command. You directly see the content of the assignment data-base, and can actually modify it. Related peaks and spins can also be edited.

List all systems

This command produces a list of all the spin-systems in the data-base.

List assignment

This command produces a formatted list of the current state of the assignment in the data-base (only sequentially assigned spin-systems are listed here).

Show Primary Seq

This command builds a form box, with one line per residue in the primary sequence of the molecule under study. Each assigned residue is associated with a button that shows the corresponding spin-system on screen.

graph tools menu

Point

This is equivalent to the standard point macro : you can click on the current spectrum, and the coordinates of the clicked point are printed, and a cross is drawn at the click point location. You exit the point command by clicking on the third button of the mouse.

Marker

This is the main tool for detecting peak alignment and for building the build list. When activating the command, you are prompted to click on each spectral location that you want to put in the marker. When finished, click on the third button of the mouse. A box is then created, which will remain on screen as long as you do not close it purposely.

From the marker box you can choose to redraw all the horizontal and vertical lines connecting the selected points. You can choose whether diagonal-symmetric locations are considered, and choose the color. You can show all the peaks in the data-base lying at the intersection of a horizontal and a vertical line, as well as create missing peaks. Finally, you can add all the peaks detected at the intersections to the build list.
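To make the geometry of the marker concrete, here is a small Python sketch of how the intersections could be computed and matched against peaks. It is an illustration only, not the code of the Gifa macro: the function names, the (f1, f2) tuple representation, the dictionary of peaks and the single tolerance tol are all assumptions.

```python
from itertools import product


def marker_intersections(points, symmetric=False):
    """Return every crossing of the lines drawn through the clicked points.

    points is a list of (f1, f2) positions in ppm. Each point contributes
    one line at constant f1 and one at constant f2. With symmetric=True the
    mirror image of each point across the diagonal is added as well.
    """
    f1_lines = {f1 for f1, _ in points}
    f2_lines = {f2 for _, f2 in points}
    if symmetric:
        f1_lines |= {f2 for _, f2 in points}
        f2_lines |= {f1 for f1, _ in points}
    return sorted(product(f1_lines, f2_lines))


def peaks_at_intersections(peaks, intersections, tol):
    """Return the ids of the peaks lying within tol ppm of an intersection.

    peaks maps a peak id to its (f1, f2) position (hypothetical layout).
    """
    hits = []
    for peak_id, (f1, f2) in peaks.items():
        if any(abs(f1 - g1) <= tol and abs(f2 - g2) <= tol for g1, g2 in intersections):
            hits.append(peak_id)
    return sorted(hits)
```

With two clicked points sharing the same f1, the non-symmetric grid has two intersections, while the symmetric one has nine, which is why the symmetric option finds peaks on both sides of the diagonal.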

Reset build list

This command empties the build-list, which contains peaks.

Print build list

This command simply prints the content of the build-list, by showing the ids of the peaks in the build-list.

Show build list

This will display, on the spectral window, the peaks in the current build-list.

promote

This is the command that uses the current build list to create a new spin-system. First, alignments are detected among the peaks in the build list, and a spin is looked up for each spectral coordinate. If needed, new spins are created. Then a new spin-system is created.

Finally, the spin-system editing tool is opened, from which you can modify the spin-system parameters and edit each individual spin.
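The alignment step can be pictured with the following Python sketch, which groups the coordinates of the build-list peaks into candidate spins. This is a deliberately simple one-dimensional grouping, written under stated assumptions, and not the algorithm actually used by the promote macro: the function name group_shifts is hypothetical, f1 and f2 are assumed to share the same ppm scale (homonuclear case), and tol plays the role of the alignment distance parameter.

```python
def group_shifts(peaks, tol):
    """Group peak coordinates into candidate spins.

    peaks is an iterable of (f1, f2) positions in ppm. Coordinates are
    sorted, and a value joins the current group when it lies within tol
    of the previous value; otherwise it starts a new group. Returns the
    mean chemical shift of each group, in increasing order.
    """
    values = sorted(v for peak in peaks for v in peak)
    groups = []
    for v in values:
        if groups and v - groups[-1][-1] <= tol:
            groups[-1].append(v)
        else:
            groups.append([v])
    return [sum(g) / len(g) for g in groups]


# Three made-up peaks of one spin-system: HN-HA, HN-HB and HA-HB.
build_list = [(8.20, 4.50), (8.21, 1.90), (4.51, 1.91)]
spin_shifts = group_shifts(build_list, tol=0.02)  # three spins
```

Note that chaining neighbours in this way can merge a long run of closely spaced values into one group, so a real implementation would probably need a stricter criterion.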

TYPICAL ASSIGNMENT WORK

With this set-up, a typical assignment work on a small protein, done the Wüthrich way, consists in

You will notice that the program slows down as the data-base gets bigger. This is why it is not recommended to start with a large peak-picking, and then to handle a big data-base throughout the whole process.

MACRO PROGRAMMING

The complete assignment module is written in nothing but macros. So you could have written it yourself ! At least you can modify it to fit your needs. Here is some help for doing so.

All the macros are in /usr/local/gifa/macro/att; this directory is added to the GifaPath when entering the module. Static information (list of possible atom names, residue names, etc.) is defined in the basic_db.g and build_static_db.g macros. You can very simply adapt them for new residues.

The dbm files are opened when the project is selected, and bound to the associative arrays att[], spin[] and sys[] for the peak data-base, the spin data-base, and the spin-system data-base respectively. Entries are thus of the form $att[$peak_id], for instance. The different pieces of information are stored as blank-separated fields in the variable. The coding is the following :

$att[#att] = f1 f2 amp #spin1 #spin2 type note

$spin[#spin] = delta name #sys note

$sys[#sys] = index type list_of_spin note

where

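#att, #spin and #sys denote the ids (indexes) of the entries;

#spin1 and #spin2 are the ids of the two parent spins of the peak;

f1 and f2 are coordinates in ppm;

amp is the peak amplitude in arbitrary units;

type, in a peak entry, codes for the kind of experiment;

delta is the chemical shift in ppm;

index is the residue number in the primary sequence;

type, in a spin-system entry, is the name of the residue;

list_of_spin is the list of the spins belonging to the spin-system;

note is a free field, which you can use for any purpose.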
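For readers who access the dbm files from outside Gifa, the field layout above can be decoded as in the following Python sketch. It is only an illustration: the class and function names are hypothetical, the sample values (including the type code "NOESY") are made up, and the spin and spin-system pointers are kept as strings because the encoding of an unassigned pointer is not specified here. Spin-system entries are left out, since the exact layout of list_of_spin is not given.

```python
from dataclasses import dataclass


@dataclass
class Peak:
    f1: float   # ppm
    f2: float   # ppm
    amp: float  # arbitrary units
    spin1: str  # id of the first parent spin
    spin2: str  # id of the second parent spin
    type: str   # code for the kind of experiment
    note: str   # free text, may contain blanks


@dataclass
class Spin:
    delta: float  # chemical shift in ppm
    name: str
    sys: str      # id of the spin-system
    note: str     # free text, may contain blanks


def parse_peak(entry):
    """Split 'f1 f2 amp #spin1 #spin2 type note' into a Peak."""
    fields = entry.split(None, 6)          # the note keeps its inner blanks
    fields += [""] * (7 - len(fields))     # trailing fields may be missing
    f1, f2, amp, spin1, spin2, kind, note = fields
    return Peak(float(f1), float(f2), float(amp), spin1, spin2, kind, note)


def parse_spin(entry):
    """Split 'delta name #sys note' into a Spin."""
    fields = entry.split(None, 3)
    fields += [""] * (4 - len(fields))
    delta, name, sys_id, note = fields
    return Spin(float(delta), name, sys_id, note)


peak = parse_peak("8.25 4.31 1200000 12 13 NOESY weak, near diagonal")
```

Limiting the number of splits is the design choice that matters: because note is the last field, everything after the fixed fields is kept together, even when it contains blanks.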

Each of the dbm associative arrays att[], spin[] and sys[] contains the special entry LARGEST, which holds the largest id (#att, #spin, or #sys) assigned so far. So, when creating a new entry (a spin in this example), you are supposed to do something like :

set new_id = ($spin["LARGEST"] + 1)
set $spin[$new_id] = "New entry ..."
set $spin["LARGEST"] = $new_id ; updated only if no error occurred

When programming a function that scans the whole data-base, you will probably end up writing something like :

foreach i in att                   ; let's scan all the peak as an example
   if ($i s! "LARGEST") then       ; don't forget this one !
      set peak = $att[$i]          ; this is the complete entry
      ; then parse the entry, this is one way :
      set f1 = (head($peak)) set peak = (tail($peak))
      set f2 = (head($peak)) set peak = (tail($peak))
      set amp = (head($peak)) set peak = (tail($peak))
      ; etc...
   endif
endfor

If you manage to make something useful, you can transmit it to me so that I will make it available to other users.

CAVEAT

Note that this is very preliminary work. People have been using this tool here in our lab; however, I'm sure there are still a lot of bugs.