Up to now we have only seen how processing is performed on the data-set currently held in main memory. This way of working is fast and efficient for small data-sets, but inefficient for large ones. For this reason, since V3.0, the Gifa program has been able to work on files rather than in memory. It is thus possible to process much larger data-sets that would not fit into main memory.
This is done through a cache-memory system implemented in the Gifa program. This feature has a double advantage: it permits the processing of very large data-sets (with no size limitation, as far as the software is concerned), and it speeds up the processing of large data-sets that could otherwise be processed in memory.
With Gifa V4.0 the cache system has been fully rewritten in order to extend its capabilities and improve its reliability. Most of the limitations of the previous version of the cache have been removed, and you should notice a large speed increase when working on large files. However, the user interface has been kept unchanged as much as possible, except for the GETC and PUTC commands, whose syntax is now closer to that of the other Gifa commands.
The cache memory works on the standard file format (the format also accessed through the READ / WRITE commands). It is based on two principles:
*a block structure of 2D and 3D data-set files, which permits random access to the data in any direction;
*a memory area devoted to minimising the number of disk accesses (the cache-memory system).
In the previous version of the cache (Cache version 1.0, in Gifa 3.x and in pre-releases of Gifa 4.0), the blocks had a fixed size of 4096 words (16 kbytes), corresponding to 16x16x16 in 3D, 64x64 in 2D and 4k in 1D. This is no longer true: in the present version (Cache version 2.0, since Gifa 4.0), the blocks are adapted to the experiment and to the computer. The block size may take any value optimised for the current hardware (typically between 4 kbytes and 16 kbytes), and the way the blocks are divided depends on the sizes of the experiment, in order to optimise access speed in all the spectroscopic dimensions.
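The following Python sketch shows why a block layout permits access in any direction. It counts how many blocks must be read to extract one row or one column of a 2D. The function names and the block-choosing rule are hypothetical illustrations, not Gifa's actual algorithm. `choose_block` simply shares a fixed number of words among the axes so that every axis is cut into roughly the same number of blocks, which is one plausible reading of "optimise access speed in all the spectroscopic dimensions".

```python
import math

def blocks_touched(shape, block, axis):
    """Number of blocks read to extract one complete line of a 2D data set.

    shape = (n1, n2) sizes along F1 and F2; block = (b1, b2) block sizes.
    axis = 1: a row (runs along F2); axis = 0: a column (runs along F1).
    """
    n1, n2 = shape
    b1, b2 = block
    if axis == 1:
        return math.ceil(n2 / b2)
    return math.ceil(n1 / b1)

def choose_block(shape, words):
    """Hypothetical heuristic (not Gifa's actual rule): share a block of
    `words` points (a power of two) among the axes, repeatedly doubling
    the block along the axis that currently needs the most blocks."""
    block = [1] * len(shape)
    while math.prod(block) < words:
        i = max(range(len(shape)), key=lambda k: shape[k] / block[k])
        if block[i] >= shape[i]:
            break  # every axis is already covered by a single block
        block[i] *= 2
    return tuple(block)

# A plain row-major file is the special case block = (1, n2):
print(blocks_touched((1024, 4096), (1, 4096), 0))   # column: 1024
print(blocks_touched((1024, 4096), (64, 64), 0))    # column: 16
print(blocks_touched((1024, 4096), (64, 64), 1))    # row: 64
print(choose_block((1024, 4096), 4096))             # (32, 128)
```

With this heuristic, a 1024 x 4096 data set with 4096-word blocks gets 32 x 128 blocks, so a row and a column each cost 32 block reads. A square 1024 x 1024 data set gets the 64 x 64 blocks of the former cache, and a 256 x 256 x 256 cube gets 16 x 16 x 16.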
The standard format also features a header which holds all the parameters of the data (such as dimensionality, sizes, spectral widths, etc.). This header is in text format and can easily be displayed with the more command. Users can add their own information to the header with the PUTHEADER command and read it back with GETHEADER. The header has a fixed size, equal to the size of a data block. The cache system can be used with several files at the same time.
Cache memory access is much faster than disk access, and the block structure of the file speeds up the processing of large data-sets. For instance, when accessing 2 successive planes of a 3D, the first plane is loaded from the disk, and the blocks read for it also contain the second plane, which is then already held in the cache memory. There is no real limit on the size of data the cache system can handle; however, each file opened with the cache system uses some room in the computer memory, enough to hold several lines (planes) of the 2D (3D) experiment.
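This effect can be sketched by counting disk reads while walking through the planes of a 3D. The sketch assumes the 16 x 16 x 16 blocks of the former cache and a least-recently-used (LRU) replacement policy. The text does not say which policy Gifa actually uses, and `count_disk_reads` is a hypothetical helper.

```python
from collections import OrderedDict

def count_disk_reads(planes, depth, blocks_per_plane, cache_blocks):
    """Count the blocks read from disk when the given planes of a 3D are
    accessed in order, assuming blocks `depth` points thick along the plane
    axis and an LRU cache (an assumption) holding at most `cache_blocks` blocks."""
    cache = OrderedDict()
    reads = 0
    for p in planes:
        layer = p // depth                       # layer of blocks containing plane p
        for b in range(blocks_per_plane):
            key = (layer, b)
            if key in cache:
                cache.move_to_end(key)           # hit: no disk access
            else:
                reads += 1                       # miss: block loaded from disk
                cache[key] = True
                if len(cache) > cache_blocks:
                    cache.popitem(last=False)    # drop least recently used block
    return reads

# 256 x 256 x 256 data set in 16 x 16 x 16 blocks: 16 * 16 = 256 blocks per plane
print(count_disk_reads([0, 1], 16, 256, 512))      # 256: plane 1 is already cached
print(count_disk_reads(range(16), 16, 256, 512))   # 256: 16 planes for one plane's cost
print(count_disk_reads([0, 1], 16, 256, 100))      # 512: cache smaller than one plane
```

The last call shows why the cache must be able to hold at least one complete line (plane): below that size, every access goes back to the disk.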
It is important to note that writing to the cache is not equivalent to writing to disk. The content of modified blocks is stored on disk when needed, but the content of the file on disk is not guaranteed to be up to date when working through the cache system (this is very different from the WRITE command). This has no effect when working from within Gifa itself, since all file accesses go through the cache system, which ensures the coherence of the data. However, it may matter in certain cases, such as:
*accessing the file from another program (possibly another Gifa);
*a power failure of the computer;
*a bug in Gifa (?).
When needed, it is possible to "flush" the cache and to copy to disk all the modified blocks with the commands FLUSH and FLUSHCACHE.
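The write-back behaviour described above can be illustrated with a small Python model: modified blocks reach the disk only when they are evicted or flushed. This is a sketch under stated assumptions. The LRU eviction policy, the `BlockCache` class and its methods are hypothetical, and a dictionary stands in for the files on disk. `flush` and `flush_all` mimic the roles of FLUSH and FLUSHCACHE.

```python
from collections import OrderedDict

class BlockCache:
    """Minimal write-back LRU cache of blocks shared by several files (a sketch).

    `disk` is a dict (file, block_no) -> data standing in for the real files.
    """

    def __init__(self, capacity, disk):
        assert capacity >= 1
        self.capacity = capacity
        self.disk = disk
        self.blocks = OrderedDict()   # (file, block_no) -> [data, dirty flag]

    def _evict(self):
        while len(self.blocks) > self.capacity:
            key, (data, dirty) = self.blocks.popitem(last=False)  # least recently used
            if dirty:
                self.disk[key] = data   # a modified block reaches the disk only here

    def read(self, file, n):
        key = (file, n)
        if key in self.blocks:
            self.blocks.move_to_end(key)
        else:
            self.blocks[key] = [self.disk.get(key), False]   # load from disk
            self._evict()
        return self.blocks[key][0]

    def write(self, file, n, data):
        key = (file, n)
        self.blocks[key] = [data, True]   # modified in memory only
        self.blocks.move_to_end(key)
        self._evict()

    def flush(self, file):
        """Analogue of FLUSH: write the modified blocks of one file to disk."""
        for (f, n), entry in self.blocks.items():
            if f == file and entry[1]:
                self.disk[(f, n)] = entry[0]
                entry[1] = False

    def flush_all(self):
        """Analogue of FLUSHCACHE: write the modified blocks of every file."""
        for f in {f for f, _ in self.blocks}:
            self.flush(f)

disk = {("fileA", 0): "old"}
cache = BlockCache(2, disk)
cache.write("fileA", 0, "new")
print(cache.read("fileA", 0), disk[("fileA", 0)])   # new old
cache.flush("fileA")
print(disk[("fileA", 0)])                           # new
```

Until the flush, another program reading the file would still see `old`, which is the situation listed above.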
Processing data-sets with the cache memory system usually requires a macro that scans through the complete data-set to complete the operation. A set of macros is provided for efficient and easy processing (see below).
Working with the cache system consists of creating a file in standard format (with WRITE or with NEWFILEC), connecting to the file without actually reading it (with JOIN), and applying the processing either row by row (1D and 2D) or plane by plane (3D).
The JOIN command connects the program to a file in standard format without actually loading any data into memory. Its effect is to load the contexts that describe the connected file (such as size, dimensionality, itype, etc.; see the paragraph on variables below). Several files can be JOINed independently; the contexts always hold the parameters of the last JOINed file. DISJOIN disconnects the program from the currently connected file, and any modified data of the data-set are saved to the file. LISTFILEC outputs the list of the currently JOINed files.
The dataset command lists the values of the contexts describing the last JOINed data-set. PUTHEADER modifies a specific parameter of the currently connected data-set; the modification is stored directly in the file (after a FLUSH or a DISJOIN). The parameters handled by Gifa (as returned by dataset) can be modified, but any other parameter can also be put into the header with this command. GETHEADER reads the value of a parameter from the file header; the value read is available in the $c_header variable.
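As a rough illustration of a fixed-size text header, the following sketch stores `key = value` lines padded with blanks to one block, and reads or updates a single entry. The line layout, the block size and the key names (`Dim`, `Si1`, `Si2`, `Sample`) are assumptions made for the example, not the actual Gifa header format. `get_header` and `put_header` only mimic the roles of GETHEADER and PUTHEADER.

```python
BLOCK_BYTES = 4096   # hypothetical block size; Gifa adapts it to the machine

def _parse(raw):
    """Return the key = value pairs found in a text header (assumed layout)."""
    params = {}
    for line in raw.decode("ascii").splitlines():
        key, sep, value = line.partition("=")
        if sep:                          # skip the blank padding
            params[key.strip()] = value.strip()
    return params

def make_header(params, block_bytes=BLOCK_BYTES):
    """Write the parameters as text lines padded with blanks to one block."""
    raw = "".join(f"{k} = {v}\n" for k, v in params.items()).encode("ascii")
    if len(raw) > block_bytes:
        raise ValueError("header does not fit in one block")
    return raw.ljust(block_bytes, b" ")

def get_header(raw, key):
    """Analogue of GETHEADER: value of `key`, or None if absent."""
    return _parse(raw).get(key)

def put_header(raw, key, value, block_bytes=BLOCK_BYTES):
    """Analogue of PUTHEADER: any key may be added, not only Gifa's own."""
    params = _parse(raw)
    params[key] = value
    return make_header(params, block_bytes)

header = make_header({"Dim": 2, "Si1": 512, "Si2": 1024})
header = put_header(header, "Sample", "lysozyme 2 mM")
print(len(header), get_header(header, "Si1"), get_header(header, "Sample"))
# 4096 512 lysozyme 2 mM
```

Because the header always occupies exactly one block, it can be rewritten in place without moving the data blocks that follow it.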
The GETC and PUTC commands move data back and forth between the file and the main working memory.
GETC loads data from the file into memory; PUTC copies the content of the memory to the file. Both commands have a similar syntax. They can handle data areas as well as complete lines, planes and cubes. The action taken depends on the dimensionality of the JOINed file and on the value of DIM in the Gifa working context:
Rows give the value of DIM; columns give the dimensionality of the JOINed file.

DIM \ file |  1D              |  2D              |  3D
1D         |  1D area         |  1D area         |  1D area
2D         |  not applicable  |  2D area         |  2D area
3D         |  not applicable  |  not applicable  |  3D area
GETC / PUTC low up
in 1D, where low and up determine the area to transfer, or
GETC / PUTC axis ... index ... low up ...
in 2D and 3D, where the number of axes, indexes and coordinates depends on the kind of transfer.
See the manual page of each command for the detailed syntax.
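The table reduces to a single rule: an area of dimensionality DIM can be transferred only when the JOINed file has at least DIM dimensions. A minimal Python restatement follows. The function name is hypothetical and the function only encodes the table; it does not perform any transfer.

```python
def transfer_kind(file_dim, dim):
    """What GETC/PUTC transfer for a JOINed file of dimensionality `file_dim`
    when the working context has DIM = `dim` (restating the table)."""
    if dim not in (1, 2, 3) or file_dim not in (1, 2, 3):
        raise ValueError("dimensions must be 1, 2 or 3")
    if dim > file_dim:
        return "not applicable"
    return f"{dim}D area"

for dim in (1, 2, 3):
    print(dim, [transfer_kind(f, dim) for f in (1, 2, 3)])
```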
The SHOWC command displays a currently JOINed data-set without actually loading it into main memory. All the current display parameters are used, except the scale, which has to be given to the command. The coordinates of the current zoom window are converted to ppm and used for the display of the JOINed data-set. This permits the display of spectra acquired in very different conditions.
SHOWC does not load the whole data-set into memory for display; instead, it reads the data it needs through the cache memory. This is why it is a little slower than the regular display.
This command is used in the super1d and super2d macros, which overlay several displays on screen.
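The zoom-window conversion used by SHOWC can be sketched as a round trip through ppm. Points of the data set in memory are converted to ppm, then to points of the JOINed data set. The linear point-to-frequency convention below (point 1 at the left, highest-frequency edge) and the parameter names are assumptions for the sketch; Gifa's exact convention may differ.

```python
def point_to_ppm(i, size, specw, offset, freq):
    """Assumed linear convention: point 1 lies at offset + specw (left edge),
    point `size` at offset (right edge); specw, offset in Hz, freq in MHz."""
    hz = offset + specw * (size - i) / (size - 1)
    return hz / freq

def ppm_to_point(ppm, size, specw, offset, freq):
    """Inverse of point_to_ppm (may return a fractional point index)."""
    hz = ppm * freq
    return size - (hz - offset) * (size - 1) / specw

def map_zoom(zoom, src, dst):
    """Convert a zoom window (left, right), in points of data set `src`,
    into the points of data set `dst` that cover the same ppm range."""
    return tuple(ppm_to_point(point_to_ppm(p, **src), **dst) for p in zoom)

# Two hypothetical spectra recorded with different fields and spectral widths
src = dict(size=1024, specw=6000.0, offset=-1000.0, freq=600.0)
dst = dict(size=2048, specw=5000.0, offset=-500.0, freq=500.0)
print(map_zoom((100, 400), src, dst))
```

Working in ppm rather than in points is what makes the overlay meaningful: the same chemical-shift range is shown even when the two spectra differ in size, spectral width or spectrometer frequency.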
The GETC and PUTC commands do not read and write data directly from and to the disk, but through the cache memory. If some data are modified, the cache system takes care of updating the file when needed. However, in certain cases, an up-to-date file on disk may be needed.
FLUSH writes to disk the modified data of the currently JOINed file. FLUSHCACHE flushes all the currently JOINed files.
The NEWFILEC command creates a template file in standard format and reserves room for the data. It prompts the user for all the parameters needed to create the file. The file can then be filled with the PUTC command.
The READC and WRITEC commands have already been seen in a previous paragraph. They are strictly equivalent to the READ and WRITE commands. They could have been written as macros, using the JOIN and GETC commands for READC, and NEWFILEC and PUTC for WRITEC.
A set of macros is provided to process data on file: proc2d and proc3d, in /usr/local/gifa/macro.
Each macro requires the name of the input file, the name of the output file, the axis to process and the commands to apply.
proc2d in_file out_file axis 'list of commands'
processes the data row by row or column by column depending on axis (either F1 or F2). You can also use proc2d interactively, being prompted for each value. In this case, the list of commands can span several lines; terminate it with ^D on an empty last line. The commands are regular Gifa commands, in 1D mode; macros are valid.
proc3d in_file out_file axis 'list of commands'
is equivalent to proc2d, but processes the 3D plane by plane. Be careful: the commands you enter are executed in 2D mode, and whatever plane you choose for processing the 3D, the commands refer to the axes of that plane as F1 and F2.
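The scanning loop that proc2d performs can be sketched in Python on an in-memory 2D stored as a list of rows. Each line is fetched (the role of GETC), passed through the list of 1D commands, and stored into the output (the role of PUTC). This is a hypothetical model, not the macro itself. It represents the commands as Python functions and assumes that every command returns a line of the same length.

```python
def proc2d(data_in, axis, commands):
    """Sketch of the proc2d loop: apply 1D `commands` to every row (axis 'F2')
    or every column (axis 'F1') of a 2D list of rows.
    Assumption: each command returns a line of the same length."""
    n1, n2 = len(data_in), len(data_in[0])
    data_out = [[0.0] * n2 for _ in range(n1)]
    if axis == "F2":                              # row by row
        for i in range(n1):
            line = list(data_in[i])               # like GETC on row i
            for cmd in commands:
                line = cmd(line)
            data_out[i] = line                    # like PUTC on row i
    elif axis == "F1":                            # column by column
        for j in range(n2):
            line = [data_in[i][j] for i in range(n1)]
            for cmd in commands:
                line = cmd(line)
            for i in range(n1):
                data_out[i][j] = line[i]
    else:
        raise ValueError("axis must be 'F1' or 'F2'")
    return data_out

def double(v):
    return [2 * x for x in v]

def reverse(v):
    return v[::-1]

print(proc2d([[1, 2], [3, 4]], "F2", [double]))    # [[2, 4], [6, 8]]
print(proc2d([[1, 2], [3, 4]], "F1", [reverse]))   # [[3, 4], [1, 2]]
```

On a real file, the inner loop would read and write one line at a time through the cache, so only a few lines need to be in memory at any moment.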