logo Sirocco

Coastal ocean modelling

Sequoia User Manual

Installation

1. DISTRIBUTION TYPES

SEQUOIA is distributed in two forms: an installer, or individual tarballs.

2. INSTALLER

If you get an installer, uncompress it at the current location and run make install. Normally, the installer will contain the right tarballs and be configured by default for your type of application (analysis kernel, etc.), but of course the implementation will require some tuning and some interfacing with a numerical model. If you want to check or change the configuration, have a look at what we say in the next sections.

3. INDIVIDUAL TARBALLS

If you get individual tarballs, they will be of the following types:

sequoia-tree-VV.tgz Install that archive first, this is the main SEQUOIA directory tree (VV=version)
sequoia-kernel-shared-VV.tgz Shared kernel library (VV=version)
sequoia-sap-VV.tgz SAP (sequential analysis platform) library (VV=version)
sequoia-doc-VV.tgz Documentation (VV=version)
sequoia-KKKK-VV.tgz Specific analysis kernel library (KKKK=kernel) (VV=version)
sequoia-KKKK-ulibUUUU-VV.tgz Specific user library (ulibUUUU=ULIB type) (KKKK=kernel) (VV=version)

To run SEQUOIA, you must have (1) the tree, (2) the shared kernel library, (3) the SAP library, (4) one analysis kernel of your choice, and (5) one corresponding user library. Check the README files in each archive for Release Notes and Compatibility information.

At this time the following Analysis Kernels are available:

mantaray Reduced-order analysis with 3-D multivariate EOFs (akin to SEEK filter analysis)
Kernel archive name: sequoia-mantaray-VV.tar.gz
sofa Reduced-order analysis with 1-D vertical multivariate EOFs (akin to SOFA analysis)
Kernel archive name: sequoia-sofa-VV.tar.gz
beluga Dual-space (observation-space) Kalman-type analysis
Can be run in Ensemble Filter mode
Kernel archive name: sequoia-sofa-VV.tar.gz

User Libraries contain all of the user-defined code and parameters, including the interface with the numerical model, as well as the main programs (launchers). Typically at this time the following types of User Libraries (ULIBs) are available (note that some types of ULIBs only exist for some specific kernels and not others):

"ulib1" User Library 1. Use this library as a skeleton for building your own User Library. (Contains a "null model" interface to be replaced by yours.) Note that the numerical model library itself does not have to be placed in an ULIB -- its location can be defined via the macro EXTLIBS in sequoia.conf
"ulib-ktest" User Library for kernel tests. This shortcuts the SAP library and directly interfaces with the analysis kernel.
"ulib-CCCC" User Library for particular test cases (CCCC=name of test case). This tests pretty much all of an installed and configured SEQUOIA system.

Here are the general installation steps if you get individual tarballs (if you get an installer, this will be done automatically for you):

(1) Copy the sequoia-tree tarball to the appropriate location of your choice. Then simply unpack the files (tar xzf sequoia-tree-VV.tgz). This will create the SEQUOIA directory tree, including the documentation. cd to the root of the tree. Some of the main files and directories at the root of the tree are shown below:

sequoia.conf.default The default configuration file
Makefile The main Makefile
README Information on the Contents, License, and Release Notes on this version
gpl.htm The GPL license -- DO NOT REMOVE
components/ The directory where plug-in components are installed
work/ Work directory where executables and parameter files normally are and where execution will occur, unless the WORKDIR macro is changed in sequoia.conf

(2) Copy the desired tarballs to the components/ directory. cd components and unpack those tarballs.

Configuration

Here are the general configuration steps, to be repeated each time you modify the configuration:

(1) Optionally, install and unpack new tarballs in the components/ directory

(2) At the tree root, edit sequoia.conf and specify your own configuration parameters (new kernel, new ULIB, new compiler, these sorts of things)

(3) make config will configure your installation. Note that this also creates direct links to the active components: shak/ akern/ ulib/ sap/ doc/ . Then to access the active ULIB, no need to go to components/ and to the long-name directory for that ULIB, just use cd ulib from the tree root. Please note that make config must be issued whenever something is changed in the configuration in sequoia.conf, such as upgrading or adding a component, or moving the whole tree to a new location.

(4) make from the tree root will compile everything and make the executable. make cleanall will clean all objects and libraries. make tarballall will make all tarballs, ready to be copied to a safe location.

Typically, you will have to interface a numerical model to work with with SEQUOIA. This is usually done by creating a new ULIB which will contain that interface. It is a good idea to start from an existing "ulib1" package for your analysis kernel (see what is said about the various ULIBs in the Installation section). Rename the existing "ulib1" package, reconfigure SEQUOIA to use it, and modify it to suit the needs of your interface. Guidelines on how to build an interface for your numerical model are given in the Coupling section. At this time, we are aware of existing interfaces for the following numerical models: SYMPHONIE, POLCOMS, POM, OPA/NEMO, MOM (the last three have been interfaced with the previous SOFA code and would require additional work to be coupled with SEQUOIA).

Testing

See what we say about ULIB libraries in the Installation section. See which testing ULIB tarballs are available for your kernel. Install those tarballs if necessary and reconfigure your installation.

To run the tests, see the README file and Makefile in the corresponding ULIB directory.

Bug Fixes

See the README file associated with the particular component you are interested in. 

Known Problems and Workarounds

See the README file associated with the particular component you are interested in.

KP01 -- sofa kernel -- Need to introduce the notion of vertical distance making it possible to select several observations on the same vertical profile. Right now this is not possible since dist_rel==0 in that case. -- Workaround: set AK_IF_THIN==.FALSE. but this makes the calculations costlier. This notion has been introduced in version 1.2 of the kernels.

KP02 -- shared kernel library -- The code does not work near the pole since distances use the local cartesian approximation. -- Workaround: none in this version, unless the pole is rotated externally prior to analysis.

Release Notes and Version History

See the README file associated with the particular component you are interested in. 

Command Line

When the SEQUOIA system has been properly compiled, a launcher is available for execution.

General case: The launcher is made at the location pointed to by the WORKDIR macro in sequoia.conf. SEQUOIA can then be launched as follows:

launcher_name parameter_file

where "launcher_name" is the name of the launcher (defined by the LAUNCHER macro), and "parameter_file" is the name of the input parameter file.

Ensemble assimilation case: Each identical instance of the program is invoked with a member rank:

launcher_name -m|--master|member_rank parameter_file

Member instances can be run on the same machine or on different machines (for instance on a cluster with fast network connections).

Case of testing ULIBs: The executable is made at the root of the ULIB directory, or in a subdirectory. See the README file associated with the ULIB.

Output Files

In addition to exchanging information with the numerical model, the execution of SEQUOIA produces several types of output files depending on your configuration, including:

  • A log output with detailed information about the configuration and the run (this is output to stdout but can be redirected to a file).
  • An Off-Line Analysis file named label.ola, where "label" is defined in the run parameters. The caribou program which is made along with the launcher and has been introduced in version 1.2.1 can be used to explore those files. More functions should appear in the next versions.

Parameter Modules

Fortran MODULEs containing static parameters can be found in almost all SEQUOIA components. They are named after something_mod.f . Here are a few of them which you may want to change if needed. Comments are provided in each file. In most cases, the default values are sufficient. In some particular cases (in red below), the default choices must be reviewed with more care as they may not work for your configuration.

shak/ ak_constants_mod.f (general constants)
ak_efs_mod.f (ensemble forecasting system integration)
ak_event_mod.f (message level and logging)
ak_geo_mod.f (topology and grid definition parameters)
ak_kdtree_lib.f (k-D tree search)
ak_obs_mod.f (data-related parameters including maximum sizes, etc.)
akern/ ak_params_mod.f (numerical switches)
ak_stats_mod.f (error statistical parameters and sizes)

Parameter File

SEQUOIA reads its variable parameters from the parameter file specified on the Command Line. Some of those parameters are needed by the Analysis Kernel while some others are needed by SAP. The input format, as well as the definition of each parameter, are respectively given in two user-modifiable files in ULIB: ulib/u_io_ak_prm.f and ulib/u_io_sap_prm.f. The standard format in "ulib1" uses FORTRAN NAMELIST statements. Each NAMELIST corresponds to the same-name group below.

Please note that the list of parameters can change in future versions. The ultimate references remain ulib/u_io_ak_prm.f and ulib/u_io_sap_prm.f .

"LOG" GROUP: IDENTIFICATION PARAMETERS

CHARACTER(LEN=800) :: LABEL (sap/sap_mod.f)
The label of the run, which will be used to build output filenames. Default= "saprun".

"OBS" GROUP: OBSERVATION-RELATED PARAMETERS

INTEGER :: AK_S (shak/ak_obs_mod.f)
Number of data files (also called datasets) in input. Default=1, maximum=AK_DATASETS.

CHARACTER(LEN=800) :: FNAME(AK_S) (shak/ak_obs_mod.f)
The names of the files containing data to be analyzed by SEQUOIA. No default.

INTEGER :: DFORMAT(AK_S) (shak/ak_obs_mod.f)
Specifies the data format associated with each data file. this is passed to u_io_data.f which must then select the appropriate format processing. No default.

INTEGER :: STATUS(AK_S) (shak/ak_obs_mod.f)
Specifies the status of observations in the dataset. Useful choices are indicated below (note that the actual numeric values are defined in shak/ak_obs_mod.f and can be different from the values shown in the table below!). No default.

STATUS
0 (AK_REG) this dataset contains normal observations to be read (this dataset-level status definition will be superseded by observation-level status definitions if they are different)
-1 (AK_VERIF) this dataset contains verification observations which will be used to produce an innovation, but that part of the innovation will not be used in the analyses (this dataset-level status definition will supersede observation-level status definitions unless they are "bogus" in which case an error will be reported)
2 (AK_GZR) this dataset contains GZR (zero-guess) "bogus" observations: at these locations the model guess will be assumed to be zero (equivalent to objective analysis with no prior guess) (this dataset-level status definition will supersede observation-level status definitions)
3 (AK_OZR) this dataset contains OZR (observe zero) "bogus" observations, which come back to "observing" the climatology when assimilating fluctuations -- this is a "statistical damping term" (this dataset-level status definition will supersede observation-level status definitions)
5 (AK_ONC) this dataset contains ONC (observe no change) "bogus" observations: at these locations the misfits are assumed to be zero (which is an information in itself) -- useful for regularization and to avoid changing fields at some locations (this dataset-level status definition will supersede observation-level status definitions)

REAL :: AMPMAX (shak/ak_obs_mod.f)
The maximum absolute value of the input data. Data values outside of the range [-AMPMAX,+AMPMAX] are discarded. Default=1e10.
NOTE: In its present form, AMPMAX is not very useful as it does not depend on the dataset or data type, but it has been added for compatibility reasons with SOFA. It is better to use ulib/u_range.f which allows type-dependent skimming of the innovation vector.

INTEGER :: QCRANGE(2) (shak/ak_obs_mod.f)
The acceptable min/max values for the data quality flag. Default=(/ 0, 0 /).

INTEGER :: NSEL(AK_S) (shak/ak_obs_mod.f) (sofa kernel only)
The maximum number of influential observations per dataset in the suboptimal data selection. Default=20.

REAL :: REDUN(AK_S) (shak/ak_obs_mod.f) (sofa kernel only)
The informational redundancy factor in the suboptimal data selection. Default=1.0.

"REGMASK" GROUP: REGIONAL MASKS

INTEGER :: NMASKS (shak/ak_geo_mod.f)
The number of regional masks to be defined for diagnostic calculations. Default=0.
NOTE: This feature is yet unimplemented

"SEQ" GROUP: TIME SEQUENCING PARAMETERS (see also this comment on time units)

REAL :: TTMIN (sap/sap_mod.f)
The starting date of the run. No default.

INTEGER :: NTIMES (sap/sap_mod.f)
The number of integration/analysis cycles. No default.

REAL :: TINT (sap/sap_mod.f)
The interval between analyses. No default.

LOGICAL :: IF_SMOOTHER (sap/sap_mod.f)
Chooses how the innovation in the past and future will be handled.  Default=.FALSE.

IF_SMOOTHER
.FALSE. filter mode. Only innovation prior to the analysis time is used in the correction. The innovation is calculated as differences between the data and the model at the same time and location as the observations.
.TRUE. smoother mode. The innovation within TINT time distance of the analysis time (both in past and in future) is used. This requires running the model some time in the "future" in order to calculate the model-data differences.

SEQUOIA/Model Coupling Reference

1. FORTRAN INTERFACES

The detailed Fortran INTERFACEs to user-written and user-callable routines are contained in ulib/u_interface_mod.f. That file contains the arguments lists, the types and description of the routine parameters, and the returned types of FUNCTIONs.

Please note that the INTERFACEs can change in future versions. The ultimate reference remains ulib/u_interface_mod.f.

2. ESTIMATION GRID

SEQUOIA can handle both finite-element grids made up of triangular elements, and finite-difference grids in which the cells are quadrangular. Finite-difference quadrangles are actually split into two finite-element triangles internally. Both types of cells can therefore be mixed in a single grid. This is useful for instance in order to reproduce a smoother coastline in coastal areas.

The estimation grid can have any geometry and any topology, including islands or lakes, and can lie in a plane or on a sphere. In the spherical case it can be periodic. However the built-in geometric routines will fail in polar or quasi-polar situations.

At this time, there is another important limitation: SEQUOIA assumes that all state vector variables are available at each node of the estimation grid. This is due to the built-in interpolation algorithm which interpolates over triangles to observation locations. As a consequence, if required, the user must interpolate the missing model variables at SEQUOIA grid nodes in routines such as ulib/u_yf.f. For instance, if the numerical model is solved on an Arakawa B grid, the user has two solutions:

  • Set up SEQUOIA on one of the overlapping model grids, i.e. either the velocity grid (on the nodes of which the velocity components are defined) or the mass grid (where all scalar variables are defined); whenever u_yf() asks the model for a forecast variable at a location where it does not exist (because it is defined on the "other grid"), the user must interpolate that variable on the B grid; then the user must interpolate the SEQUOIA correction on the whole B grid after analysis
  • Set up SEQUOIA on the grid made up of the velocity grid + the mass grid; a similar interpolation must occur in u_yf(); but there is no need for any further interpolation of the correction since it is available on both components of the B grid.

The same type of comment holds for some types of finite elements where variables are defined at the middle of triangle sides and not at vertices.

The model grid is declared in the user-written routine ulib/u_ingrid.f. That routine must define the grid variables and arrays listed in shak/ak_geo_mod.f: grid-level properties, nodes and the way they are connected, grid parameters, etc. There are two ways of doing this in practice in u_ingrid.f:

  • Simply USE ak_geo_mod and set the required variables and arrays using the comments given in shak/ak_geo_mod.f
  • Use the kernel calls ak_geo_declare_grid(), ak_geo_declare_node(), ak_geo_declare_Dcell() (where D=3 or D=4 depending on whether a finite-element cell or finite-difference cell is declared).

The second method is more error-proof and is preferred (it is the method used in the distribution files). Here are the INTERFACEs for those kernel calls as well as the description of the call parameters:

LOGICAL FUNCTION ak_geo_declare_grid( ak_nodes, ak_cells, ak_mvs, coord_type, pradius, orglon, orglat) Declares grid-level parameters. See argument details in shak/ak_geo_mod.f .
INTEGER :: ak_nodes Number of allocated grid nodes (this must be equal to or larger than the actual number of grid nodes)
INTEGER :: ak_cells Number of allocated triangular cells (this must be equal to or larger than the actual number of triangular cells) (reminder: one quadrangular cell = two triangular cells!)
INTEGER :: ak_mvs Number of allocated MVS slots on the vertical, i.e. max. number of variables on the vertical in the state vector
INTEGER :: coord_type Horizontal coordinate type: 0=cartesian, 1=tangent-plane(e.g. beta-plane) 2=spherical
REAL :: pradius

When coord_type == 1, this is the planet radius in kilometers -- A few choices: 

  • Earth, mean 6370.949 "Handbook of chemistry and Physics"
  • Earth, mean 6378.137 "Half major axis of WGS84"
  • Earth, equator 6366.198
REAL :: orglon, orglat When coord_type == 1, these coordinates are the longitude and latitude of the origin of the tangent plane (point of tangency).
LOGICAL FUNCTION ak_geo_declare_node( node_id, xnod, ynod, znod, nmvs, rmvs, tmvs) Declares an individual node. The grid-level parameters must have been defined beforehand. See argument details in shak/ak_geo_mod.f .
INTEGER :: node_id A unique node identification number
REAL :: xnod, ynod Horizontal node coordinates
REAL, DIMENSION(nmvs) :: znod Vertical node coordinates ("depths" or "altitudes") in Packed MVS format, i.e. for all elements of the Reduced Multivariate Vertical Sequence (MVS)
INTEGER :: nmvs The number of elements of the Reduced Multivariate Vertical Sequence (MVS) at that location (this is typically the number of active state vector elements at that location, for instance the total size of the state vector minus the number of elements which are "under the seafloor")
REAL, DIMENSION(nmvs) :: rmvs The Reduced Multivariate Vertical Sequence (MVS) at that location
REAL, DIMENSION(nmvs) :: tmvs The variable types in Packed MVS format, i.e. for all elements of the Reduced Multivariate Vertical Sequence (MVS) at that location.
LOGICAL FUNCTION ak_geo_declare_3cell( node_ids) Declares an individual triangular cell. The corresponding nodes must have been declared beforehand. See argument details in shak/ak_geo_mod.f .
INTEGER, DIMENSION(3) :: node_ids The identification numbers of the nodes located at the vertices of the triangle.
LOGICAL FUNCTION ak_geo_declare_4cell( node_ids) Declares an individual quadrangular cell. The corresponding nodes must have been declared beforehand. See argument details in shak/ak_geo_mod.f .
INTEGER, DIMENSION(4) :: node_ids The identification numbers of the nodes located at the vertices of the quadrangle.
WARNING: Nodes must follow each other and be ordered in anticlockwise (direct) sequence around the quadrangle.

3. VERTICAL ORDERING: THE MULTIVARIATE VERTICAL SEQUENCE (MVS)

SEQUOIA uses hypotheses on how things are ordered on the vertical. This is explained in more detail now.

Let us consider the list of prognostic model variables at one (x,y,t) location. Depending on the model, this list could look like {u(z),v(z),w(z),h,T(z),S(z)} with classic notations. In order to be restarted after correction, the model will need the new values of all these variables (which we call the "model state") derived in a consistent manner.

It is generally possible to identify a short list of corrections which will in turn infer corrections on the rest of the model state. For instance, if we assumed geostrophy on increments, it would be sufficient to calculate corrections on {T(z),S(z),h}, or on {T(z),S(z),bstr} where h is the surface height anomaly and bstr is the barotropic transport streamfunction (for a rigid-lid model). We will call this short list of variables defined at any (x,y,t) location the "full state" or "filter state". We will call that sequence of variables the Multivariate Vertical Sequence, or MVS. For instance, the MVS might be:

  • 31 levels of temperature
  • 31 levels of salinity
  • 1 instance of barotropic transport streamfunction

which makes a sequence with a length of 63. Of course, sone of the "slots" in the sequence might be empty at some locations (e.g. when some model levels are not present because of bottom topography or even land).

Note that the full state needs not be composed of prognostic variables only. Diagnostic variables are also appropriate if they permit to go back to the complete model state and if they are observable.

At some location (x,y,t), SEQUOIA will expect a profile to appear in "Packed MVS" format. Let us assume that in the previous example there are only two model levels available. Therefore, only five slots will be filled in the MVS. Such a profile stored in MVS format would use too much space. In Packed MVS format, two vectors are defined: a vector of values, and a vector of pointers to the original MVS sequence. For our example:

  • vector_1 = {T(1),T(2),S(1),S(2),bstr}
  • vector_2 = {1,2,32,33,63}

vector_2 above is called a Reduced MVS or "RMVS" in the SEQUOIA system. vector_1 is said to be in a Packed MVS format. The index allowing to access elements of the RMVS and of MVS-packed arrays is called MVS index (in our example, the MVS index can vary between 1 and 5 at that location).

The Reduced MVS must be defined at every model gridpoint during the definition of the estimation grid.

4. VARIABLE TYPES AND OBSERVATION OPERATORS

As many variable types can be defined in SEQUOIA as needed within the limit of AK_TYPE in shak/ak_obs_mod.f. For instance, temperature, salinity, humidity, x/y components of wind, x/y components of currents, sea elevation, etc. are variable types. Variable types must be defined:

  • At each node of the estimation grid, to qualify the model variables which are available there
  • For each input observation. An observation operator must be provided for each observational variable type. Observation operators must be entered by the user in ulib/u_obsop.f (simple examples as well as comments are given in the distribution, see also the summary of user-written routines). In SEQUOIA, observation operators are local and act only in the vertical and cross-variable dimensions.

Obviously, variable types definitions for the model must match the definitions for the observations (e.g. if temperature is type 2, if will be 2 both for the model and for the observations).

5. K-D TREE SEARCH

By default, SEQUOIA uses fast search routines based on k-D tree search (akin to multidimensional dichotomy). The parameters to switch off k-D tree search or to modify its behavior are located in shak/ak_kdtree_lib.f. If k-D tree search is switched off, slower exhaustive search is used. The results should be the same, but k-D tree search is much faster if the data count and/or grid size are large.

6. FORECAST ERROR STATISTICS

The required forecast error statistics are defined dynamically at each analysis time in ulib/u_setstats.f. They can therefore be time-dependent and calculated as part of the integration of a filter. SEQUOIA can obtain those error statistics in either one of the following ways:

  • If IF_EFS=.TRUE. (Ensemble execution), the kernel will use ensemble statistics calculated from member samples gathered, e.g. from the cluster nodes (see ensemble execution). In this way, an Ensemble filter is programmed. For instance, for a full-order kernel such as BELUGA, an Ensemble Kalman Filter can be programmed. If the kernel is a reduced-order kernel, a Reduced-Order Ensemble Kalman Filter is programmed.
  • Otherwise, by default, error statistics will have to be directly specified in ulib/u_setstats.f. This will be the case if some sort of Optimal Interpolation scheme is programmed. If the kernel is a reduced-order kernel, a Reduced-Order Optimal Interpolation scheme will be programmed.

The actual statistics, fields and parameters which must be set inside ulib/u_setstats.f depend on the type of analysis kernel which is used. Examples follow: (see the distribution versions of ulib/u_setstats.f for more details)

  • "sofa" reduced-order kernel: inverse of the simplification operator on the vertical (s_inv) in Packed MVS format, correlation radii (rcx, rcy, rct), size of the influence bubble (multiplicative factors rix_factor, riy_factor, rit_factor)
  • "mantaray" reduced-order kernel: inverse of the 3-D simplification operator (s_inv) in Packed MVS format, size of the influence bubble if analysis is local (ri_x, ri_y)
  • "beluga" local, full-order kernel: local representers and representer matrix, size of influence bubble (ri_x, ri_y)

The reduced-order kernels (sofa, mantaray) also call ulib/u_ef_std.f which must return the forecast error standard deviation in reduced state space.

7. DIMENSIONALITY

One natural way of solving an assimilation problem with SEQUOIA is to keep everything dimensional: the observation operator in u_obsop(), the forecast error statistics in u_setstats(), the analyzed state increment returned by ak_dxa(). However most analysis kernels involve an algebraic inversion of some matrix which can be ill-conditioned in multivariate problems depending on how units are chosen. It might therefore be more efficient to use a nondimensional version of the state vector. This comes back to changing the norm for the problem at hand.

In univariate problems, the norm of the state vector is simply the sum of its squares. In multivariate estimation problems, a norm which will not depend on the choice of units must be defined. In general, this comes back to defining a constant scale for each variable type (e.g., 0.5°C for temperature, etc.) and defining a derived norm by (1) normalizing the vector with the scales, then (2) calculating a standard norm (sum of squares). When temperature (or salinity, etc.) profiles are included in the state vector, it is good practice to divide the corresponding values by the square root of the number of levels so that the "profiles" part of the state vector does not dominate the norm.

If we nondimensionalize our problem along those lines:

  • The values returned by u_obsop() (i.e. the elements of the observation operator) must be dimensional and must therefore have been multiplied by the ad hoc scales inside u_obsop()
  • The forecast error statistics in u_setstats() must be nondimensional (they can be calculated from dimensional state vector samples whose elements have been divided by the scales)
  • The analyzed state increment returned by ak_dxa() is nondimensional and must be multiplied by the scales before using it for correcting the model forecast
  • The observations, innovation vector and observational error statistics remain dimensional.

The first two steps above are done automatically by the ak_geo_declare_scales() call (see shak/ak_geo_declare_scales.f).

8. COORDINATES

Grid coordinates and data coordinates must be in the same coordinate system (the coordinate system is set as part of the estimation grid definition). For horizontal coordinates:

coord_type == 0 all horizontal coordinates must be in cartesian units (e.g. km)
coord_type == 1,2 all horizontal coordinates must be in degrees; the first coordinate is assumed to be longitude, and the second to be latitude

The vertical coordinate is arbitrary but must be the same everywhere. The sole exception is the "ulib1" version of u_obsop.f which assumes that vertical coordinates are downward-increasing depths (but that can be easily changed).

The horizontal scales (influence and correlation radii) needed by some kernels must be in cartesian coordinates (e.g. km).

9. TIME

All times in SEQUOIA are assumed to be Julian Dates or fractions thereof.   All times (parameter file, data files, Fortran calls) must be in the same Julian calendar.  Besides this, everything is possible, including using seconds or centuries as units; the time sequencing will be correct provided that everything is in the same unit.

10. SEQUENCING AND COUPLING WITH THE NUMERICAL MODEL

In the SEQUOIA system, the SAP component takes care of the sequencing and of the macro-coupling with (1) the analysis kernel, and (2) the numerical model.

SAP needs to control the model's basic functions: initialization, integration, and conclusion. To that end, three LOGICAL FUNCTIONs must be provided by the user as part of the User Library to serve as an interface between SEQUOIA and the numerical model:

  • u_modini(i_efs,t) must initialize the model for member i_efs at time t
  • u_modrun(i_efs,t1,t2,t3) must correct the model fields at time t1 using the SEQUOIA analysis and integrate the model from time t1 to time t3 while saving a restart file at intermediate time t2
  • u_modend(i_efs,t) must conclude the model run at time t
  • In addition, the routine u_postan(), usually empty, is called as a post-analysis step during each assimilation cycle.

Again, details can be found in the distribution ULIBs, and in particular in ulib/u_interface_mod.f and u_nullmodel.f.

One possible handy way to split the model calculations into those three functions is to turn the main model PROGRAM into a SUBROUTINE and then to add ENTRY statements corresponding to those three sections. (The ENTRY statement is obsolete in Fortran-90, but several models are still written in Fortran-77.)

Inside u_modrun(i_efs,t1,t2,t3), we will find the following sections:

  • Retrieve the restart fields at time t1
  • Call ak_dxa() (see ulib/u_interface_mod.f) and correct the restart fields
  • Time-loop from time t1 to time t3
    • At the end of each time step, call ak_dy() to store the current innovation components
  • At time t2, store fields to restart from on next iteration

One last item: at any time, the model must be able to provide a proxy ("model equivalent") to any given observation which occurred during the current time step. This is done by means of u_yf(), which must be provided by the user. Note that u_yf() is a LOGICAL FUNCTION and can return .FALSE., which will result in rejecting the observation from the upcoming analysis. This mechanism permits for instance to omit observations which lie in a dynamically masked area such as a river plume, etc.

11. ENSEMBLE EXECUTION

At this time, Ensemble execution is only implemented with the BELUGA kernel, yielding a Local Ensemble Kalman Filter (LEnKF). There is no theoretical or practical obstacle to modifying the other kernels for Ensemble operation as well, but such an application has not been developed in SEQUOIA yet and would have to be programmed by the user.

Interprocess member synchronization is done by means of the ASEYA library, a part of the shared kernel library.

The ASEYA library has been developed for SEQUOIA but is available separately. It typically runs ensemble members as slaves of a master process which organizes "pow-wows" at different execution stages to synchronize their execution. The processes (identical instances of a SEQUOIA executable) can run on the same machine or on different machines (for instance on a cluster, with fast internal network connections). Typically in an ASEYA implementation a slave process is an ensemble member, while the master process takes care of the collection of data produced by the members in order to calculate the appropriate statistics for data assimilation. However, the master process can also be an ensemble member (with member rank 0) which will run the numerical model as well.

ASEYA has several nice features hardly found elsewhere:

  • Delays and time-outs are managed and programmable (e.g. faulty communications do not hang Ensemble calculations forever)
  • Hanging or crashed or otherwise unsatisfactory members or cluster nodes can be manually dropped at any time by the user, or automatically dropped after a time-out, making it possible to continue execution of the Ensemble with the remaining members and nodes
  • Any member can kill itself (abort) as it encounters trouble, with the result that it will automatically and gracefully be dropped by the master and by the other members, allowing the execution to continue
  • If the master aborts, all members will automatically abort
  • It is possible to manually and simultaneously lock all members (e.g. in order to examine results or to see which ones should be dropped), and also to simultaneously abort all members.

This particular implementation of ASEYA is based on files. All members and the master must run in the same directory, which requires the use of NFS or similar technology on a cluster. It is then up to the user to decide how large data transfers to and from the master node are handled -- as soon as they are ready, or as batches. Of course each member can also have its private files not shared with the master node (not cross-exported).

Typically the steps needed for Ensemble execution of SEQUOIA on a cluster are as follows:

  • Choose a master node, or the front-end, and export a working directory from that machine to all other cluster nodes where members will be run
  • "make" the same SEQUOIA executable on all machines (make once and copy if architectures are the same)
  • Run an instance of that executable on all machines, specifying the member rank on the command line.

More detailed information can be found in shak/aseya_lib.f .

12. SUMMARY OF USER-WRITTEN ROUTINES

SEQUOIA calls routines which must be written by the user. They must be contained in the user's customized User Library (ULIB). Their list and functions are summarized in the table below. The third columns shows the default versions which can be found in the "ulib1" version of the User Library. Detailed Fortran INTERFACEs can be found in ulib/u_interface_mod.f.

ROUTINE NAME FUNCTION DISTRIBUTION (sequoia-KKKK-ulib1-VV.tar.gz)
u_ingrid sets the estimation grid u_ingrid_FD.f: cartesian, rectangular, non-periodic FD grid with one variable type and trivial MVS
u_io_ak_prm loads analysis kernel parameters u_io_ak_prm.f: fully functional loader with NAMELIST input
u_io_sap_prm loads SAP parameters u_io_sap_prm.f: fully functional loader with NAMELIST input
u_io_data loads data for the entire run u_io_data.f: fully functional loader with choice of free-format input (DFORMAT==0) or binary input (DFORMAT==1)
u_modini initializes the numerical model (cold start) u_nullmodel.f: empty
u_modrun corrects the numerical model (warm start) and integrates it forward u_nullmodel.f: only calls ak_dy() after each time step
u_postan perform post-analysis tasks u_nullmodel.f: empty
u_modend perform numerical model conclusion tasks u_nullmodel.f: empty
u_yf returns model forecast in data space u_nullmodel.f: returns zero (null model)
u_xf returns model forecast in state space (needed by EFS only) u_nullmodel.f: returns zero (null model)
u_obsop observation operator (see also Dimensionality) u_obsop.f: simple observation operator with choice of sea-level (TYPE==1), temperature (TYPE==2), salinity (TYPE==3), u-velocity (TYPE=4), v-velocity (TYPE=5), vertical average (TYPE=6)
u_efs "ulib1": not used
"ulib2": EFS integration; collect members results across cluster and calculate Ensemble assimilation statistics
u_efs.f:
"ulib1": skeleton
"ulib2": contains skeletons of results query routines
u_setstats sets the forecast error statistics (see also Dimensionality) u_setstats.f: kernel-dependent simple statistics (e.g., stationary, homogeneous)
u_ef_std returns the reduced state space forecast error standard deviation in reduced order kernels (mantaray, sofa) (for reduced-order kernels:) u_ef_std.f: kernel-dependent simple statistics (e.g., stationary, homogeneous)
u_range checks if innovation is within acceptable range u_range.f: no check (pass-thru)

NOTE: u_wrola() and ulib/u_ola_lib.f do not exist anymore in version 1.2.1 (the OLA features are handled directly by the kernel)

13. SUMMARY OF USER-CALLABLE ROUTINES

The table below lists SEQUOIA routines which need to be called by user code as part of the coupling between SEQUOIA and the numerical model:

ROUTINE NAME FUNCTION CALLER
ak_dxa gets the analyzed state increment (see also Dimensionality) u_modrun: called initially by the numerical model in order to correct its forecast fields after an analysis
ak_dy updates the innovation vector at current model time u_modrun: called by the numerical model at the very end of each time step
ak_ea gets the analysis error variance (see also Dimensionality) only for information -- not used in calculations

14. EVENT LOGGING

Error reporting and most informational output is under the responsibility of the SEQUOIA event handler shak/ak_event.f. Events recognized in the standard version of ak_event() include: start of run, end of run, informational message, warning signal, fatal error signal. The behavior of SEQUOIA in those occurrences can be easily changed by modifying the code.

  • Since all error codes are different, it is easy for instance in an operational system to intercept a particular error signal and launch a recovery procedure.
  • It is also straightforward to add start-up tasks and conclusion tasks.