Variational Data Assimilation in A Class of Environmental Disasters: Mathematical Aspects

Variational data assimilation can be a powerful allied in order to estimate the determination of the trajectory of a pollutant accidentally spilled on the sea surface, when a set of observation of the resulting pollutant spot is available, as well as a computational model of the sea surface dynamics of the region of the accident. In order to solve the computational problem, the adjoint equations method will be of the paramount importance. Here it is presented the mathematical foundations of the problem, a brief discussion of some computational issues, and an very simple example is discussed.


Introduction
In several accidents involving the spillage of pollutants in the sea surface, one of the main problems faced by the authorities is to determine where to combat the resulting spot.The experience have shown this is a hardy problem.For example, in 2000, there was a terrible accident involving the Brazilian company, Petrobras, in a bay known as Baía de Guanabara, in Rio de Janeiro.An large amount of oil leaked from Reduc (see Figure 1), and, despite all the efforts, it reached in less than 48 hours I do Governador, São Gonc ¸alo and Niterói, as in a real cat and mouse game.It was estimated some mangrove with a rich biodiversity would need some 20-30 years to recover its original state, in an optimist scenario.At that time, we were trying to understand the mathematical foundations of Variational data assimilation in the meteorological context, as a way to improve the numerical weather predictions, and we thought that technique could be used to obtain a better estimation of the trajectory of the oil spot, using the available information.In fact, from the mathematical view point, both problems have the same structure, namely, a set of partial differential equations, with initial and boundary conditions, describing the problem dynamics, and also we could have a set of information of the the object we want to estimate, that is, meteorological variables, in one case, and the shape and location of the spot, in the other.Data assimilation, in its two main approach, sequential and variational, establishes that the assimilation of all available information (roughly, observations) will improve the predictive ability of the model, and that is a passive process, in the sense that it does not interfere with the model itself.
The key point in the variational data assimilation is the assumption of the existence of an initial condition (or model state) such that, when this condition is used as the initial data of the model, the subsequent states of the model will minimize a functional that measures the discrepancy between model and observed states, and, if so, the procedure is more than adequate as a first step to solve the problem of the oil spillage problem.
Although the data Assimilation method can be invaluable in computational simulations, since observations of the systems act ont the numerical model as restriction which the solution must verify, with the exception of a few articles and books, the mathematical formalism is not always clearly presented, and this is not an appropriated situation to the development of the method, since important aspects for its future must be placed in solid theoretical foundations.Therefore, is our intention to present a rigorous mathematical formulation not only for Variational Data Assimilation but also Adjoint Equation Method as well, so as to make them perfectly understood.
In order to realize the numerical control of the space-time evolution of a system involving a geophysical flow it is necessary to know its initial condition, which is, the numerical counterpart of the real initial state or the initial configuration of the system under study.In the case of geophysical flows, the dimension and the geographic location of the spatial domain problems make the distribution of an observational network difficult, what causes a scarcity of data about the phenomenon studied or even turns the initial configuration unavailable for computational simulations (Talagrand, 1997).As in those flows are presented phenomena of difficult representation in a numerical model, and even in analytical models, being, in certain situations, processes highly sensitives to initial conditions variations , to establish an initial condition adequate for simulations, in this case, is of the crucial importance for the results.
In fact, although we have not considered here the question, we should mention that due to the range of space-time scales present in oceanographic problems, any numerical model designed to solve one of those problems will face numerical instabilities related to the fractal structures inherent in those situations (Candy, 2009) For an adequate initial condition of the numerical experiment, one means that one which generates the numerical model states that fit, in accordance with the desired precision, the available observations.In order to decide whether or not an initial condition is appropriated to certain simulation, one should consider, in addition to the flow dynamics, the full set of observations of the system being studied.That is exactly what is performed by the methodology known as Data Assimilation, approach in which all the knowledge on a system being analyzed, its dynamics and observations, is combined in order to produce an initial condition that minimize the error between the model generated configurations and the correspondent known configurations, in a sense to be precise later.
A problem involving geophysical flows in its discrete form produce sa very high number of variables, so that, straight away, Variational Data Assimilation is a methodology computationally intractable.A solution for this difficulty can be obtained with the use of the Adjoint Equations Method, which adequately insert the problem of the initial condition determination to simulations of geophysical flows in the domain of Optimal Control Theory , and its mathematical foundations are analyzed here with the necessary accuracy.A second consequence of the resulting methodology of combining Variational Data Assimilation with Adjoint Equation Method is the transformation of a minimization problem with restrictions, as originally proposed by the Variational Data Assimilation formalism, in a minimization problem without, what makes it possible the use of more agile routines, as L-BFGS (Liu and Nocedal, 1989), in the determination of an optimal initial condition.
In the first section of this work, the mathematical foundations of two crucial methods are presented, Variational Data Assimilation and Adjoint Equations Method.In the second section, in order to fix ideas, one works with a one-dimensional model which permits the exploration of several aspects of the methods used and also to develop explicitly the adjoint of a given equation.Finally, in the third section, one describes a bi-dimensional numerical experiment, showing with some detail (i) an algebraic formalism, fundamental in the development of a computational program to perform the experiment, formalism that has already produced, in the Computational Linear Algebra domain, a new line of investigation, known as Automatic Differentiation and as Automatic Adjoint Generation (Faure and Papegay, 1998), and (ii) the structure of a FORTRAN code, created by the author, in order to realize the Variational Data Assimilation of a flow described by the bi-dimensional advection-diffusion equation.

The Mathematical Approach Definition
, the dual vector space of R n , it follows from Linear Algebra the existence and uniqueness of the vector where ⟨, ⟩ is the canonical inner product in R n .

The Data Assimilation Method
Let S be an evolutionary system, observed during the time interval [t 1 , t 2 ], from which is available a set of observations, all of them collected in the same time interval and distributed in the spatial domain of interest on S.
Besides, it is known the dynamics of S. Based on those pieces of information, one wants to determine the initial condition of the modeled S, or the numerical representation of the configuration of S at the time t 1 , the starting moment for a computational simulation of S generates configurations to approach, within an acceptable precision, to the available observations of S at correspondent instants of time, in order to obtain a configuration of the model at t f > t 2 which reproduces the real configuration of S at time t f , also within an acceptable precision.From a conceptual point of view, this process is similar to the Best Linear Unbiased Estimation (Sorensen, 1980) of the configuration of S at t f , given the S configurations within [t 1 , t 2 ], here solved in a deterministic approach as an optimal control problem, in which the initial conditions is the control data.At first, one considers the continuous version of the Variational Data Assimilation problem, which admits the appropriate mathematical treatment.In this case, the mathematical objects are: • the system observations: given by Z : operator defined on the vector space V provided with the inner product ⟨, ⟩, where [t 1 , t 2 ] is the assimilation interval or assimilation window.
• the dynamics of the system S, described by where • the "weight" function W : [t 1 , t 2 ] → L(V), where L(V) is the vector space of the linear operators in V, that results from the statistics information of the instruments used to collect the data on S , and that, each t ∈ [t 1 , t 2 ], associate the injective linear operator W(t) : V → V.
• the quadratic functional Then the Variational Data Assimilation problem is to find a trajectory of the system states S, X : [t 1 , t 2 ] × V → V such that X is the solution of equation ( 1) which minimize the functional (2), in other words, we arrive to the following minimization problem with constrait: Problem M R : find the solution of equation ( 1) minimizing the functional (2) It would be much more interesting (and, in many problems, the only feasible way) if instead of the solution X ∈ E one searches the solution X I ∈ E I , where operators in V, which could be called the space of the initial configurations of S, that minimizes the restriction of the functional J to E I .
However, it is not known an explicit relation between J and X I , since J is not a function of X.Nevertheless, if the problem in equation ( 1) is well posed, in the Hadamard sense, then the knowledge of X and the knowledge X I are equivalent.The Adjoint Equation Method provides, from the relation between X and X I , the differential of the restriction of J to E I .

The Adjoint Equations Method
One rewrites the functional in equation( 1)3 as where As the differential of J in X ∈ E is a linear functional in the Hilbert space E with the inner product by using the Riesz' Representation Theorem (Kreyszig, 1989), one obtains, where ∇ X J and ∇ X T are the gradients of J and T , respectively, with respect to X. Now, one considers the linear version of the equation ( 1) resulting from the substitution of the operator F for its first order approximation in an equilibrium point, omitted from the equation, and in which it was used the notation ( ∂F ∂X ) for the differential Of F, and its adjoint equation is where ( ∂F ∂X ) * is the adjoint operator of ∂F ∂X .
Then one obtains the following result: is the solution of the following unconstraint minimization problem: Problem M U : find δx ∈ E I minimizing ⟨∇ X I J, δx⟩.
Proof : Let Y : [t 1 , t 2 ] × V → V be the solution of ( 5) such that Y(t 2 ) = 0 and δX ∈ E I is a perturbation of any solution of equation ( 6).Then it follows that Equation ( 7) is the main point to be addressed by the Adjoint Equation in this context, and its meaning must be clear.The right side of the equality is the expression of the differential of J (relative to X), dJ(X).As X I is the projection of X onto E I , the left side of equation ( 7) is the expression of the differential of the restriction of J (relative to X I ) to E I .By the uniqueness of the representation of the last differential as an inner product in E I , it follows that Y I is precisely the gradient of the restriction of the funtional J to E I , that is, Y I is the projection of ∇ X J onto E I .Then, one obtains Therefore, using the Adjoint Equation Method, one projects the solution set which represents, from the computational viewpoint,a considerable reduction in the number of variables in the discrete problem.Besides, this projection transforms a constraint minimization problem in one unconstraint problem, namely, the problem of minimizing equation ( 2) with the restriction equation (1), becomes The problem M R , involving the quadratic functional J, defined in equation ( 2), has the existence of its solution guaranteed but not its uniqueness.In order to obtain the uniqueness of the solution of M R , and of M U , one redefines the problem using the following quadratic functional: where, X b is an available estimate of the initial configuration of X, and B ∈ L(V) is the operator determined by the available statistical information on the estimate of X b and Then, one obtains the following result: Theorem 2: Given Problem M R ′ : find the solution of equation ( 1) that minimizes equation ( 10), there exists a unique solution of M R ′ , and X 1 = X E I ∈ E I is the solution of the following unconstrait minimization problem Problem M U ′ : find the δX 1 ∈ E I that minimizes ⟨∇ X I J ′ , δX 1 ⟩ E I .
Proof: It is only necessary to redefine the inner product on E as Let X b be the available estimate of the initial configuration of X.The problem of optimal solution will be iteratively solved by shearching the perturbation δX I ∈ E I of δX ∈ E such that (X b + δX) will be the solution of M U ′ .
Let us emphasize once again this very important point: we need the adjoint code (of a giving code) in order to obtain de differential of the functional J with respect to the initial conditions X I ∈ E I

Implementing the Adjoint Equation Method
Another interesting and also important feature of the implementation of Variational Data Assimilation is the development of automatic routines to perform the data assimilation using the Adjoint Equation Method.Nowadays, due to the several academic and industrial application of the method and considering, the extension of the work needed to produce such computational routine, the automatic generation of the adjoint to a given code can be realized by computational codes developed for this task (Faure and Papegay, 1998).As a good knowledge of the rules used in the preparation of the linear and the adjoint programs to a given computational code precede the correct use of such automatic routines, it is presented here some illustrative examples of the rules to be followed when manually generating these codes.Of course, the rules are the same used by automatic routines.

The Linear Code
A computational code in, exempli gratia, FORTRAN language is nothing but an ordered sequence of command lines and sometimes one or more subroutines or functions, which are also made of an ordered sequence of command lines.One begins defining a rule to obtain the adjoint instruction to a given command line, and, in the sequel, it is derived the adjoint instruction to the instruction resulting of two consecutive command lines.Let one consider the following FORTRAN command line, Z=2*X+Y**3 which can be identified with the fuction or even with the function a very useful process in order to clarify the many stages in the work out of the adjoint instruction and its implementation, decreasing the possibility of errors when writing the computational code.
Although equation ( 11) and equation ( 12) are mathematically equivalents, notation in the latter is much more convenient to write the computational codes.
As g is a nonlinear operator, and nonlinear operators do not posses adjoint, when a command line is involves one nonlinear operation, the first action to do is to take the linear version of the nonlinear declaration.This can be achieved simply taking the differential of the operator, or in matrix notation, much more useful for the adjoint process, considering the jacobian matrix of the operator.In the case of function g, its jacobian is given by Then the linear of the instruction g, dg, is where the subscripts o and i denote, respectively, the output and the input amounts stored in the variables, just after and before the execution of the command line, that is, Continuing to associate each instruction line of the code to a function, which means that a program is a composition of functions, so the linear version of the given program is obtained by using the Chain Rule of the Calculus, while the adjoint of a given code, by multiplying the reverse order the adjoint matrices that correspond to each command line.The processes, although simple, demands careful, since one outside variable in a command line can or not be defined using the same variable (in this case, seem as na inside variable), as it will be considered later.
Consider, for example, the following sequence of instructions with the associated functions that we will refer as i 1 and i 2 , respectively.
To make clearer the processes of taking the linear (and later also the adjoint) of a given instruction, as well as the writing of the correspondent instructions in FORTRAN, one adopts the following convention: let E i , E in and E o , respectively, the input, the intermediate and the output spaces, that is, the sequence i 2 • i 1 , meaning, applying first i 1 and then i 2 , will represent by Therefore, writing v i ∈ E i , v in ∈ E in and v 0 ∈ E o , and as its known the relations between them, given the perturbation data Computing the jacobian matrices of d f 2 (v in ) and d f 1 (v i ), we obtain And in matricial form, the sequencial instructions produce, being d i = (dx i , dy i , dz i , dw i ), such that Leaving the input and output indexes (which are not written in a real code) and discarding the unchanged variables, as dx = dx and dy = dy, the linear instruction of instructions i 1 and i 2 above are the following FORTRAN instructions: as it was expected.

The Adjoint Code
The linear code is an intermediate step in order to obtain the adjoint of a given code.Considering again the sequence of i 1 and i 2 instructions, we have the following adjoint operators Considering that v o = (x i + 2y i ) and leaving the indexes, one obtains the adjoint instruction in FORTRAN as: A more complex situation, demanding a careful scrutiny, arises when one of the variables appears in one command line as an output variable and as an input variable in the following command line, will be considered in the sequel.So, let i 3 and i 4 be, respectively, the instructions: Consider again the input, intermediate and output spaces, E i , E in and E o , all of them isomorphs to R 2 , and the functions representing the instructions i 3 and i 4 We define the function associate with the sequential action of i 3 and i 4 instructions, which gives, for any v i ∈ E i , And, applying the Chain Rule, we obtain, for a given vector Therefore, expression ( 16) is given by Now we can proceed to derive the adjoint instructions of i 3 and i 4 .Taking the adjoint of matrices in (17),we have ) , and, then, we obtain ( 1 y 2 0 2xy

A Very Simple Numerical Experiment
One studied the oil stain trajectory determination based on a a posteriori simulation of the Guanabara Bay accident in 2000.We use a number of 16 computer generated oil spot as the the observation within the time interval [t 1 , t 2 ], were t 1 = 19 : 00h (January, 18) and t 2 = 19 : 00h (January, 22).
The transport of the oil spot was described by the bi-dimensional advection-diffusion which is the universal transport model of oil in the sea surface (Lehr and Cekirge, 1980).
One has integrated then the Equation ( 18) using as initial condition the artificial oil spill (Figure 2), and, after the integration of Equation ( 18) (in G(U)), we storage these configurations as observations of the oil stain, for example, the configurations obtained at the times t = 6 × i, i = 1, ..., 16, which is one observation (for t = 19 : 00h, January, 19) and is shown in (Figure 3).After we performed the direct integration, we use a perturbed initial configuration, shown in (Figure 4), as new initial condition, integrate Equation ( 18), again using G(U) routine, and we compare the discrepancies between the correspondents configurations, that is, "observational" configurations and new computer generated with the perturbed initial conditions at the same time in J(G(U)) routine.If the value measured after J(G(U)) routine is above the a priori tolerance, the main program calls the subroutine [D(G(U))] * , in order to obtain the integration backward in time of the adjoint Equation ( 6), which gives the gradient of the functional J in relation to the initial conditions of the problem, which is then sent to the minimization subroutine (L-BFGS), giving a new initial configuration U which, after integration in G(U), possible will decrease the value of J(G(U)).When the error generated by the two configurations, model generated and observations, measured in J(G(U)), is bellow the tolerance, the process of Data Assimilation stopped, producing the optimal initial condition (Figure 5).

Discussion
Despite the simplicity of the numerical experiment we realized, Data Assimilation, process in which observations of the studied systems are considered as conditions to the computational simulation must verify, together with Adjoint Equation Method, produce a methodology to obtain reliable and highly accurate computational configurations of a system at a feasible computational cost and within a time interval that allows an effective decision in an oil spill.A further question, also in the realm of the formalism presented here, is the optimal location of measurement instruments, enhancing an existing network.