Copulas and Dependency Measures for Multivariate Linnik Distributions

Alexandru Belu1, Wesley Maddox2 & Wojbor A. Woyczynski3
1 Bloomberg, LP., Financial Software Engineering, New York, NY, USA
2 Department of Statistical Science, Cornell University, Ithaca, NY 14853, USA
3 Department of Mathematics, Applied Mathematics and Statistics, and Center for Stochastic and Chaotic Processes in Science and Technology, Case Western Reserve University, Cleveland, OH 44122, USA

In this section we recall the basic classical concepts related to Lévy processes and infinitely divisible distributions. Two special cases, one-dimensional and multivariate, are considered: the α-stable processes and the α-Linnik processes. Then the fundamental properties of the two classes are provided and compared with each other. In particular, the differences in the tail behavior of their distributions and their fractional moment properties are explained. A study of the nonlinear evolution equations driven by the infinitesimal generators of Linnik processes can be found in Gunaratnam and Woyczynski (2015).

Preliminaries on General One-Dimensional Lévy Processes and Infinitely Divisible Distributions
A stochastic process X_t, t ≥ 0, is called a Lévy process (see, e.g., Bertoin (1996), Samorodnitsky and Taqqu (1994), and Sato (1999)) if:
(i) X_0 = 0, with probability 1;
(ii) X_t has independent increments, i.e., for any n ∈ N, and any 0 < t_1 < t_2 < ··· < t_n, the random variables X_{t_2} − X_{t_1}, X_{t_3} − X_{t_2}, ..., X_{t_n} − X_{t_{n−1}} are independent;
(iii) X_t has stationary increments, i.e., for any t > s ≥ 0, X_t − X_s is equal in distribution to X_{t−s};
(iv) the trajectories of X_t are right continuous with left limits, with probability 1.
The one-dimensional distributions of Lévy processes are infinitely divisible (ID), which, in terms of the characteristic functions, can be expressed as the equality

ϕ_{X_t}(ξ) = (ϕ_{X_{t/n}}(ξ))^n, (1.1)

for every n ≥ 1, t ≥ 0. A more detailed description of the structure of characteristic functions of infinitely divisible distributions is given by the following classical Lévy-Khinchine Representation Theorem:

THEOREM 2.1. A random variable X has an infinitely divisible distribution if, and only if, there exist µ ∈ R, σ ∈ R_+, and a nonnegative measure Λ on R\{0} satisfying the condition ∫_R (1 ∧ |x|^2) Λ(dx) < ∞, such that its characteristic function is of the form

ϕ_X(ξ) := E[e^{iξX}] = e^{ψ(ξ)}, (1.2)

where the characteristic exponent

ψ(ξ) = iµξ − σ^2 ξ^2 / 2 + ∫_{R\{0}} (e^{iξx} − 1 − iξx 1_{|x|≤1}) Λ(dx). (1.3)

The triplet (µ, σ, Λ) is called the characteristic triplet of X, ψ(ξ) the characteristic exponent of X, µ ∈ R the drift coefficient, σ ≥ 0 the Gaussian, or diffusion, coefficient, and Λ the Lévy measure of X. The Lévy measure describes the "intensity" of jumps of a certain size of the Lévy process.

One-Dimensional α-Stable Distributions
The self-similar (symmetric) Lévy probability distributions form the family of the classical α-stable distributions (denoted S(α, c) in this paper) with characteristic functions of the form

ϕ_S(ξ; α, c) = exp(−|cξ|^α), (1.4)

where the constant 0 < α ≤ 2 is called the index of stability, and c is the scale parameter. In general, the probability density functions (PDFs), f_S(x; α, c), of α-stable random variables do not have clean closed-form representations in terms of elementary functions. Apart from the obvious Gaussian case α = 2, we would like to mention the Cauchy case, α = 1, in which the PDF is of the form

f_S(x; 1, c) = c / (π(c^2 + x^2)). (1.5)

The Lévy measure Λ_S of the distribution of S(α, c) is absolutely continuous with respect to the Lebesgue measure, and its density is given by the expression

λ_S(x) = C |x|^{−(1+α)}, x ≠ 0,

where the easily determined constant C = c^α (2 ∫_0^∞ (1 − cos v) v^{−(1+α)} dv)^{−1}.
The importance of α-stable distributions comes from their universality as approximants of the distributions of sums of large numbers of independent, identically distributed random variables. Indeed, the fundamental central limit theorem states that if X_1, X_2, ..., X_n, ... are independent, identically distributed (i.i.d.) random variables, then their sums converge in distribution (after suitable rescaling and centering) to an α-stable distribution for some α ∈ (0, 2]. This fact is directly related to the "stability" property: if X_1, ..., X_n are i.i.d. α-stable, then for suitable constants a_n > 0 and b_n the normalized sum a_n^{−1}(X_1 + ··· + X_n − b_n) has the same distribution as each of the X_i's. A method of simulating stable random variables can be found in Chambers, Mallows and Stuck (1976).
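The Chambers-Mallows-Stuck method cited above can be sketched, for the symmetric case S(α, c) used in this paper, as follows. This is a minimal illustration; the function name is ours, not from the cited paper.

```python
import math
import random

def sample_symmetric_stable(alpha, c=1.0, rng=None):
    """One draw from the symmetric stable law S(alpha, c), ch.f. exp(-|c xi|^alpha),
    via the Chambers-Mallows-Stuck transformation of a uniform angle and an
    independent standard exponential variable."""
    rng = rng or random
    u = rng.uniform(-math.pi / 2, math.pi / 2)  # uniform angle
    e = rng.expovariate(1.0)                    # standard exponential
    if alpha == 1.0:
        return c * math.tan(u)                  # Cauchy case
    x = (math.sin(alpha * u) / math.cos(u) ** (1.0 / alpha)) \
        * (math.cos((1.0 - alpha) * u) / e) ** ((1.0 - alpha) / alpha)
    return c * x
```

For α = 2 the formula reduces to 2 sin(U)·√E, which is N(0, 2c²), consistent with ϕ_S(ξ; 2, c) = exp(−c²ξ²); for α = 1 it reduces to the Cauchy case.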
The α-Linnik distributions have a similar stability property, but under random geometric summation. Indeed (see also Kozubowski (1994)), we have the following simple result:

PROPOSITION 2.1. Let X, X_1, X_2, ... be i.i.d. random variables, and let N be a random variable, independent of X_1, X_2, ..., with the geometric distribution with mean 1/p, 0 < p < 1, that is, P(N = n) = p(1 − p)^{n−1}, n = 1, 2, .... Then the following two statements are equivalent: (i) X is α-stable with respect to geometric summation, i.e., p^{1/α}(X_1 + ··· + X_N) is equal in distribution to X; (ii) X has the α-Linnik distribution L(α, γ), i.e., its characteristic function is of the form ϕ_L(ξ; α, γ) = (1 + |γξ|^α)^{−1}.

Proof: The verification is straightforward via the characteristic function.

Two elementary, but crucial, observations are in order here:

Remark 2.1. The α-Linnik distributions are in the domain of attraction of the α-stable distribution; that is, if Z_n = n^{−1/α}(X_1 + ··· + X_n) and X_i ∼ L(α, γ) for all i, then the distribution of Z_n converges to an α-stable distribution as n tends to infinity. This can again be verified by working with characteristic functions.

Remark 2.2. If Z is a standard exponential random variable, and Y ∼ S(α, c) is an independent α-stable random variable, then Z^{1/α} Y is a Linnik random variable with the distribution L(α, γ), and γ = c; see Devroye (1990).

Remark 2.3. If N is a normal N(0, γ) random variable, and M_{α/2} is an independent Mittag-Leffler random variable, then (2 M_{α/2})^{1/2} N is a Linnik random variable with the distribution L(α, γ). Recall that the Mittag-Leffler distribution with parameter α is typically defined in terms of its distribution function F(x) = 1 − E_α(−x^α), x ≥ 0, where E_α(z) = Σ_{k≥0} z^k / Γ(1 + αk) is the standard Mittag-Leffler special function.
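Remark 2.2 gives a practical recipe for Linnik sampling: multiply an α-stable draw by an independent exponential raised to the power 1/α. A minimal sketch (the stable draw uses the Chambers-Mallows-Stuck formula; all names are ours):

```python
import math
import random

def sample_linnik(alpha, gamma=1.0, rng=None):
    """One draw from L(alpha, gamma) via Remark 2.2: Z^(1/alpha) * Y with
    Z standard exponential and Y ~ S(alpha, gamma) independent."""
    rng = rng or random
    # Chambers-Mallows-Stuck draw from the symmetric stable S(alpha, gamma)
    u = rng.uniform(-math.pi / 2, math.pi / 2)
    e = rng.expovariate(1.0)
    if alpha == 1.0:
        y = gamma * math.tan(u)
    else:
        y = gamma * (math.sin(alpha * u) / math.cos(u) ** (1.0 / alpha)) \
            * (math.cos((1.0 - alpha) * u) / e) ** ((1.0 - alpha) / alpha)
    z = rng.expovariate(1.0)  # independent standard exponential
    return z ** (1.0 / alpha) * y
```

A quick sanity check is that the empirical characteristic function at a point ξ should approach (1 + |γξ|^α)^{−1}.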

Comparison of Tails and Lower-order Fractional Moments of α-Stable and α-Linnik Distributions
For 0 < α < 2, the tail behavior of the distribution of Y ∼ S(α, c) is as follows (see Samorodnitsky and Taqqu (1994)):

P(Y > y) ∼ (C_α/2) c^α y^{−α}, as y → ∞, (1.15)

where C_α = (1 − α)/(Γ(2 − α) cos(πα/2)) for α ≠ 1, and C_1 = 2/π. In the α-Linnik case we have a similar result.

Lévy Measure of the Linnik Distribution
There is a simple relationship between the characteristic function ϕ_S(ξ; α, 1) of the symmetric stable distribution and the characteristic function ϕ_L(ξ; α, 1) of the Linnik distribution:

ϕ_L(ξ; α, 1) = (1 − log ϕ_S(ξ; α, 1))^{−1}. (1.21)

This fact permits expressing the density λ_L of the Lévy measure Λ_L of the α-Linnik distribution in terms of the α-stable distribution (see also Kozubowski and Rachev (1999)): via the Frullani integral representation of log(1 + |ξ|^α), one obtains

λ_L(x) = ∫_0^∞ e^{−t} t^{−1−1/α} f_S(t^{−1/α} x; α, 1) dt,

where f_S(·; α, 1) is the density of X ∼ S(α, 1).
Finally, using the tail asymptotics (2.14) of S(α, c), one obtains for the Lévy measure density of the α-Linnik distribution the asymptotics λ_L(x) ∼ const · |x|^{−(1+α)}, as |x| → ∞.

Remark 2.5. Generalized Linnik Distributions. Since Linnik distributions belong to the much larger class of geometric stable distributions (which also includes the Mittag-Leffler distributions), various generalizations of the Linnik distribution exist. A commonly used generalization, called the asymmetric Linnik distribution, is defined by the characteristic function given in (1.24).

Multivariate Linnik Distributions
Finally, we turn to the description of the multivariate d-dimensional Linnik distributions, which will be the main object of investigation in the remainder of this paper. Despite the potential usefulness of these models in active applied research, not much effort has been made so far to develop methods for simulation and parameter estimation for these distributions.
Following Anderson (1992), we define a multivariate Linnik distribution L(α, Ω) via its characteristic function

ϕ(ξ) = (1 + (ξ^T Ω ξ)^{α/2})^{−1},

where ξ^T = (ξ_1, ..., ξ_d). The two parameters here are: α, which represents the thickness of the tail of the distribution, serving the same purpose as in the univariate α-stable and α-Linnik distributions, and the matrix Ω. Here, Ω is assumed to be symmetric and positive definite, much like the covariance matrix that characterizes a multivariate normal distribution. In the one-dimensional case, this characterization reduces to Eq. (2.7), with γ = Ω^{1/2}.
An integral expression for the PDF corresponding to this characteristic function, involving the Bessel function of the third kind K_ν, is given in Ostrovskii (1995) and Kozubowski (2000). When α = 2 and d = 2, it aligns with the standard bivariate distribution function of the Laplace distribution. However, due to its complexity, this expression is typically of limited usefulness.
Remark 2.5. In analogy to the one-dimensional representation of the Linnik distribution described in Remark 2.3, we also have the following multivariate representation: If N is a normal N(0, Ω) random vector, and M_{α/2} is an independent Mittag-Leffler random variable, then (2 M_{α/2})^{1/2} N is a Linnik random vector with the distribution L(α, Ω).

Basic Facts About Copulas
For a random vector (X_1, X_2, ..., X_d) with continuous marginal cumulative distribution functions (CDFs) F_1, ..., F_d, the copula function is defined as the joint CDF of the random vector

(U_1, U_2, ..., U_d) := (F_1(X_1), F_2(X_2), ..., F_d(X_d)).

So, the copula function contains complete information about the dependence structure between the components of the random vector (X_1, X_2, ..., X_d). The classical 1959 Sklar's Theorem (see, e.g., Nelsen (2006)) states that every multivariate cumulative distribution function H(x_1, ..., x_d) can be expressed in terms of its marginal CDFs, F_i(x_i), i = 1, ..., d, by the formula

H(x_1, ..., x_d) = C(F_1(x_1), ..., F_d(x_d)). (3.1)

Nice and concise proofs of this result, in both the bivariate and multidimensional cases, are given in Nelsen (2006). Since (U_1, U_2, ..., U_d) have uniform marginal distributions on the unit interval I = [0, 1], a copula function in d dimensions is a function mapping [0, 1]^d into the unit interval [0, 1] with the following basic properties:
(i) For every u ∈ I^d, if at least one u_i = 0, then C(u) = 0. This is the so-called groundedness property of copulas.
(ii) Similarly, if all coordinates u_i = 1 except possibly u_k, then C(u) = u_k.
(iii) The copula function is uniformly continuous on its domain; in fact, for every u and v in I^d, |C(u) − C(v)| ≤ Σ_{i=1}^d |u_i − v_i|.
(iv) Additionally, the one-dimensional margins, C_k(u) := C(1, ..., 1, u, 1, ..., 1), all exist and are equal to u, i.e., C_k(u) = u.
Taken together, these properties show that the copula function C(u) is grounded and increasing in every dimension (termed n-increasing) (Nelsen (2006)).
While (3.1) defines the joint distribution function in terms of the marginals and the copula function, the inverse also holds.
Taking the generalized inverses of the marginal distributions gives the copula function in terms of the marginals and the joint distribution:

C(u_1, ..., u_d) = H(F_1^{−1}(u_1), ..., F_d^{−1}(u_d)).

It is worthwhile to remember this relationship, as it is important in both sampling and estimation procedures for copulas (Nelsen (2006)).
Similarly, the joint probability density function (PDF), f(x_1, ..., x_d), is also well-defined in terms of copulas. By taking successive partial derivatives ∂/∂u_i of the copula function, we can define the copula density function as

c(u_1, ..., u_d) = ∂^d C(u_1, ..., u_d) / (∂u_1 ··· ∂u_d).

Thus, a simple application of the chain rule of differentiation gives the formula for the joint PDF:

f(x_1, ..., x_d) = c(F_1(x_1), ..., F_d(x_d)) ∏_{i=1}^d f_i(x_i).

If the copula density is well-defined and easy to use, this gives a powerful tool for maximum likelihood estimation and sampling. However, in many cases, even when the copula density exists, finding it analytically may be intractable.

Examples
In this subsection, several examples of copula functions are given to make the preceding discussion on copulas less abstract.
Example 3.2.1. Independence Copula. The most basic copula is the independence copula, which codifies the relationship between independent random variables. Its copula function is simply

Π(u_1, ..., u_d) = u_1 u_2 ··· u_d. (2.4)

Example 3.2.2. Gaussian Copulas. A more widely used and useful copula function is the Gaussian copula, which encodes a Gaussian relationship between distributions of random variables. Its copula function is given by

C(u_1, ..., u_d) = Φ_{0,Σ}(Φ^{−1}(u_1), ..., Φ^{−1}(u_d)),

where Φ_{0,Σ} is the distribution function of the multivariate zero-mean Gaussian vector with covariance matrix Σ, and Φ^{−1} is the quantile function of the standard normal one-dimensional distribution. This construction is efficient for both simulation and estimation, as the only parameters which must be estimated for the copula are the (easily estimated) rank-based correlations between the marginal distributions.
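Sampling from a Gaussian copula follows directly from this construction: draw a correlated Gaussian vector and push each coordinate through the standard normal CDF. A minimal sketch using numpy and scipy (the function name is ours):

```python
import numpy as np
from scipy.stats import norm

def sample_gaussian_copula(R, n, seed=0):
    """Draw n points with uniform margins whose dependence is the Gaussian
    copula with correlation matrix R."""
    rng = np.random.default_rng(seed)
    z = rng.multivariate_normal(np.zeros(len(R)), np.asarray(R), size=n)
    return norm.cdf(z)  # probability integral transform to uniform margins
```

Applying marginal quantile functions to the resulting uniforms then produces samples with any desired marginals and Gaussian dependence.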
Through manipulation of the Gaussian density function, it is possible to obtain the copula density function (see Zezula (2009)). The multivariate density function of the Gaussian distribution with zero mean and covariance matrix Σ can be rewritten, after the standardization u_i = x_i/σ_ii, in terms of the correlation matrix R corresponding to Σ, R_ij = Σ_ij / √(Σ_ii Σ_jj). In view of Eq. 2.3, the copula density function in this case is simply of the form

c(u_1, ..., u_d) = |R|^{−1/2} exp(−(1/2) q^T (R^{−1} − I) q), where q = (Φ^{−1}(u_1), ..., Φ^{−1}(u_d))^T.

This density function is highly useful, as it can be used for either simulation or analytic purposes.
The Gaussian copula is part of the larger class of elliptical copula functions, which follow the same type of structure (and thus share simulation and estimation algorithms). Additionally, the ease of simulation and estimation for these copulas has led them to become widely used across mathematical finance (where they were the core of the original CreditMetrics model; see Li (2000)), and they are increasingly used in the machine learning literature (see Rey (2015), Wilson and Ghahramani (2010)).
Example 3.2.3. Elliptical Copulas. A random vector X is said to be elliptically distributed if its characteristic function ϕ(ξ) satisfies the functional equation

ϕ(ξ) = e^{iξ^T µ} ψ(ξ^T Σ ξ),

for some location parameter µ, some nonnegative definite matrix Σ, and some scalar function ψ. In terms of density functions, an elliptical distribution can be written in the form

f(x) = a · g((x − µ)^T Σ^{−1} (x − µ)),

where a represents a scale parameter, g is a scalar function, µ is the median vector, and Σ is a positive definite matrix.
Alternatively, a random d-dimensional vector X has an elliptical distribution if it admits a representation

X = µ + R A^T S,

where µ ∈ R^d, A is a k × d matrix, and R is a positive random variable (the random radius) multiplying the k-dimensional random vector S, which is uniformly distributed on the unit sphere.
An elliptical copula is thus defined as

C(u_1, ..., u_d) = F(F_1^{−1}(u_1), ..., F_d^{−1}(u_d)),

where F is the distribution function of the related elliptical distribution and F_1^{−1}, ..., F_d^{−1} are the corresponding univariate quantile functions (see Mai and Scherer (2012) for more details).
Example 3.2.4. Empirical Copulas. The empirical copula is a parameter-less copula function that can be defined directly from a sample of random variables (see Nelsen (2006)). The following definition, formulated just in the two-dimensional case, can easily be extended to d dimensions.
The empirical copula of a sample of size n from a continuous bivariate distribution is defined as

C_n(i/n, j/n) = (1/n) · #{(x, y) in the sample : x ≤ x_(i) and y ≤ y_(j)},

where x_(i) and y_(j) represent the i-th and j-th order statistics of the bivariate sample.
Alternatively and equivalently (see Ruschendorf (2009)), the empirical copula can be described in terms of the univariate empirical distribution function, which is simply F_n(x) = (1/n) Σ_{i=1}^n 1(x_i ≤ x). If we let U_i = F_n(x_i), so that clearly U_i ∈ [0, 1], then the empirical copula function is simply

C_n(u_1, ..., u_d) = (1/n) Σ_{i=1}^n 1(U_{i1} ≤ u_1, ..., U_{id} ≤ u_d),

where U_{ij} is the transformed j-th coordinate of the i-th sample point. It is worth noting that the empirical copula function has been shown to be a consistent estimator of the true copula function, and that it converges uniformly to the true copula function. Additionally, for any ϵ > 0, the probability that the supremum distance between the true copula and the empirical copula exceeds ϵ decreases exponentially to zero in the sample size (see Poczos, Ghahramani and Schneider (2012)). As several standard measures of dependence can be written in terms of the related copula function, this result shows that numerical calculations using the empirical copula will provide strong estimates of these measures of dependence.
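The rank-based form above is straightforward to evaluate from data: convert each column to normalized ranks and count the points dominated by the evaluation point. A small sketch (names ours):

```python
import numpy as np

def empirical_copula(data, points):
    """Evaluate the empirical copula C_n of an (n, d) sample at each row of
    the (m, d) array `points`."""
    data = np.asarray(data, float)
    n = data.shape[0]
    # pseudo-observations U_ij = F_nj(x_ij): normalized ranks within each column
    pseudo = (np.argsort(np.argsort(data, axis=0), axis=0) + 1) / n
    points = np.atleast_2d(points)
    return np.array([(pseudo <= p).all(axis=1).mean() for p in points])
```

For a comonotone sample (identical columns), C_n(u, u) = u on the diagonal, matching the theoretical upper bound min(u, v).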

Multivariate Simulation
Several different methods for sampling from copula distributions exist.Drawing samples in the bivariate case is often accomplished conditionally (see Chapter 1 of Mai and Scherer (2012)), while in higher dimensions, algorithms typically are dependent on the class that the copula belongs to (see Chapters 2-5 of Mai and Scherer (2012)).In this subsection, only one simulation algorithm is provided due to its usefulness in the rest of the paper.
Happily, work from the theory of one-dimensional geometrically stable distributions can be easily extended, so that it is possible to simulate multivariate Linnik distributions (see Kozubowski (2000)). In one dimension, another equivalence holds for the Linnik distribution:

Y_{α,γ} = (2 M_{α/2})^{1/2} X, in distribution,

where M_{α/2} is a Mittag-Leffler random variable and X ∼ N(0, γ). This relation is due to the geometric stability of Linnik random variables and follows through manipulations of the mixture representation of Linnik random variables. It easily gives rise to a generator for the multivariate Linnik distribution,

Y = (2 M_{α/2})^{1/2} X,

where M_{α/2} is distributed as above, and X ∼ N_d(0, Σ).
Thus, it is also very easy to simulate from the multivariate Linnik distribution, since it is simple to draw random samples from a multivariate normal distribution. This gives rise to the following simple algorithm for sampling from the multivariate Linnik distribution:
(a) Draw m ∼ Mittag-Leffler(α/2).
(b) Draw x ∼ N_d(0, Σ).
(c) Return y = (2m)^{1/2} x.
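The algorithm above can be sketched end to end. The Mittag-Leffler draw in step (a) is the only nonstandard piece; one standard route from the literature (an assumption here, not spelled out in the text) is M_ρ = Z^{1/ρ} S_ρ, with Z standard exponential and S_ρ positive ρ-stable generated by Kanter's formula. All names are ours.

```python
import numpy as np

def _positive_stable(rho, n, rng):
    """Kanter's representation of the standard positive rho-stable law
    (Laplace transform exp(-s^rho)), 0 < rho <= 1."""
    u = rng.uniform(0.0, np.pi, size=n)
    e = rng.exponential(size=n)
    return (np.sin(rho * u) / np.sin(u) ** (1.0 / rho)) \
        * (np.sin((1.0 - rho) * u) / e) ** ((1.0 - rho) / rho)

def sample_multivariate_linnik(alpha, Sigma, n, seed=0):
    """Draw n vectors from L(alpha, Sigma) as sqrt(2 M) * X with
    M ~ Mittag-Leffler(alpha/2) and X ~ N_d(0, Sigma) independent."""
    rng = np.random.default_rng(seed)
    rho = alpha / 2.0
    m = rng.exponential(size=n) ** (1.0 / rho) * _positive_stable(rho, n, rng)
    x = rng.multivariate_normal(np.zeros(len(Sigma)), np.asarray(Sigma), size=n)
    return np.sqrt(2.0 * m)[:, None] * x
```

A sanity check is the empirical characteristic function, which should approach (1 + (ξ^T Σ ξ)^{α/2})^{−1}; for α = 2 the mixing variable reduces to an exponential and the output is multivariate Laplace.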

A Multivariate Linnik Copula
In this section a novel elliptical Linnik copula is presented.The first subsection focuses on the derivation of this copula through the fact that a Linnik distribution is part of the larger family of elliptical distributions.Next, two methods of drawing samples from this copula are described.Finally, a numerical estimation procedure is presented.

Elliptical Linnik Copulas
A copula can be constructed from the multivariate Linnik distribution using the knowledge that a multivariate Linnik distribution is elliptical. Thus, the Linnik distribution fits the definition of an elliptical distribution given in Section 3. Additionally, using Eqn (2.25) we can see that the relevant scalar function ψ is of the form ψ(t) = (1 + t^{α/2})^{−1}, so that the characteristic function is ϕ(ξ) = (1 + (ξ^T Σ ξ)^{α/2})^{−1}. Then, a Linnik copula follows from the definition of elliptical copulas:

C(u_1, ..., u_d) = F_{α,Σ}(F_1^{−1}(u_1), ..., F_d^{−1}(u_d)),

where F_{α,Σ} is the multivariate distribution function of the Linnik distribution and F_1^{−1}, ..., F_d^{−1} are the quantile functions of the Linnik distribution with unit spread. Unfortunately, no analytic form can be easily found with this construction. Even the copula density function is difficult to find analytically and may require numerical differentiation.

Simulation
In this subsection, two methods of simulating data using the above defined copula are presented.One utilizes the Linnik distribution function, while the other bypasses this to utilize the empirical distribution function.They are demonstrated to be practically equivalent.
The first approach uses the inverse transform of Sklar's Theorem (in the bivariate case),

(x_1, x_2) = (F_1^{−1}(u_1), F_2^{−1}(u_2)),

where the marginal distributions of the x_i are L(α, γ_i), α is the characteristic exponent of the distribution, and the γ_i are the diagonal entries of the covariance matrix Σ. This approach gives rise to a relatively straightforward simulation algorithm for drawing samples from a Linnik copula (with any marginals). First, draw samples from the multivariate Linnik distribution (using the algorithm from Section 3.3). Then, numerically evaluate the distribution function of the univariate Linnik distribution at these samples. Finally, one only needs to apply the quantile functions of the desired marginal distributions to obtain samples from the copula-based distribution. Of course, this method assumes that the quantiles of the marginal distributions are known, but in practice this is a low barrier. This method is, of course, equivalent to the standard method for drawing samples from an elliptical copula, given in Example 3.2.3.
A scatterplot of 1000 samples from the Linnik copula with parameter α = 1 can be produced with this method. One issue with the sampling method described above is that the probability evaluation step is quite slow and potentially inaccurate. For example, producing the scatterplot of just 1000 samples from a two-dimensional Linnik distribution takes about 29 seconds.
An alternative method is to approximate the probabilities for each of the samples using the empirical cdf. This is a very similar approach to the empirical copula function, and can be understood as estimating an empirical copula from samples of the Linnik distribution, and then using that copula to draw samples from another distribution.
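This empirical-cdf shortcut is easy to implement: replace each coordinate by its normalized rank (the empirical copula transform) and push the resulting pseudo-uniform scores through whatever marginal quantile function is desired. A sketch, with normal target marginals chosen only as an example:

```python
import numpy as np
from scipy.stats import norm, rankdata

def empirical_copula_resample(samples, marginal_ppf=norm.ppf):
    """Map an (n, d) sample to pseudo-uniform scores via the empirical cdf
    of each column, then to new marginals via a target quantile function."""
    samples = np.asarray(samples, float)
    n = samples.shape[0]
    u = rankdata(samples, axis=0) / (n + 1)  # in (0, 1), avoids infinite quantiles
    return marginal_ppf(u)
```

Because the transform is monotone in each coordinate, the rank-based dependence structure of the input sample is preserved exactly.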
In practice, this method produces very similar results to the actual Linnik copula, but is much quicker.An experiment with 10 replications of the Kullback-Leibler (KL) distance between samples of increasing size from the empirical cdf copula and the exact copula was performed, with results shown in Fig. 4.4.
The results of this experiment show the quick convergence of samples from the empirical copula to the Linnik copula for sample sizes of at least 325.This suggests that when the Linnik copula is modeled on real world data sets, using the empirical copula is an effective, speedy and highly accurate alternative.This convergence result is due to the convergence properties of the empirical copula to the true copula function (see Eqn. 2.7).

General Properties of Dependency Measures
Quantifying the dependence between random variables is a fundamental problem in statistics, and many related issues remain unsolved. Commonly used metrics often pick up only a limited class of associations, are computationally inefficient, or suffer from a lack of interpretability. In this section, we describe the requirements for dependency measures, review several popular measures of statistical dependence, and present results of experiments estimating dependency measures for Linnik copulas.
In his classic 1959 paper on dependency, Alfred Renyi identified seven properties that a measure of association, δ_{X,Y}, must satisfy for any two random variables X and Y:
1. δ_{X,Y} is defined for every pair X and Y.

2. Symmetry: δ_{X,Y} = δ_{Y,X}.
3. Normalization: 0 ≤ δ_{X,Y} ≤ 1.
4. Independence: δ X,Y = 0 if and only if X and Y are independent.
5. Monotonicity: δ X,Y = 1 if and only if X and Y are almost surely strictly monotone functions of each other.
6. Invariance: if f and g are almost surely strictly monotone functions on the range of X and the range of Y, respectively (i.e., bijective Borel-measurable functions), then δ_{f(X),g(Y)} = δ_{X,Y}.
7. Pointwise convergence: if X_n, Y_n are sequences of random variables with joint distributions H_n that converge pointwise to the joint distribution H of (X, Y), then δ_{X_n,Y_n} → δ_{X,Y}.
Several variants of this definition exist (see, e.g., Nelsen (2006), and Lopez-Paz (2013)), but the key details in all those definitions are the same.

Useful Dependency Metrics
In this subsection, several dependency metrics that account for various types of dependence are reviewed. Where possible, the type of statistical dependence that they capture is also given. For further detail on the first three metrics and their relation to copulas, see Chapter 5 of Nelsen (2006).

Pearson's Correlation
Pearson's correlation is a measure of the linear dependence between two random variables X and Y. Recall from basic statistics that Pearson's correlation is a moment-based measure, defined in the theoretical case as

ρ_{X,Y} = Cov(X, Y) / (σ_X σ_Y).
The existence of the second moments is a necessary condition for the existence of this correlation metric. However, for all finite samples, the empirical correlation is well defined.

Kendall's Tau
Similar to the preceding metric, Kendall's tau is another basic measure of dependence between two random variables. Defined simply, it measures the association between "large" values of a random variable X and "large" values of a random variable Y, that is, the association between the ranks of the variables. In the sample version, it is defined as

τ = (c − d) / (n(n − 1)/2),

where c is the number of concordant pairs in the sample, i.e., those pairs such that x_i < x_j and y_i < y_j, and d is the number of discordant pairs in the sample, which is simply the number of pairs in the sample that are not concordant.
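The sample definition can be coded directly by counting concordant and discordant pairs; for real use, library routines such as scipy.stats.kendalltau are preferable. A sketch (name ours):

```python
from itertools import combinations

def kendall_tau(x, y):
    """Sample Kendall's tau: (concordant - discordant) / (n choose 2).
    Assumes no ties, matching the definition in the text."""
    n = len(x)
    c = d = 0
    for i, j in combinations(range(n), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            c += 1  # concordant pair
        elif s < 0:
            d += 1  # discordant pair
    return (c - d) / (n * (n - 1) / 2)
```

On tie-free data this agrees with scipy's implementation; perfectly reversed rankings give τ = −1.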
In the case when the copula function C(u, v) between the random variables is known, an explicit population version can be derived:

τ = 4 ∫∫_{[0,1]^2} C(u, v) dC(u, v) − 1.

Spearman's Rho
Spearman's rho rank correlation coefficient is a third basic measure of dependence between two random variables. Similar to Kendall's tau, it operates on the ranks of the random variables, but it is defined like the standard Pearson's correlation coefficient, just over the ranked (ordered) random variables instead of the random variables themselves. From a random sample, it can be defined as

S_ρ = Cov(rank(x), rank(y)) / (σ_{rank(x)} σ_{rank(y)}) = 1 − 6 Σ_i d_i^2 / (n(n^2 − 1)),

where σ_{rank(x)} and σ_{rank(y)} are the standard deviations of the rank variables, and d_i = rank(x_i) − rank(y_i). The second equality is valid only if all ranks are distinct integers.
Similar to Kendall's tau, if the copula function describing two random variables is known, then we have the following expression for the Spearman's rho correlation coefficient:

S_ρ = 12 ∫∫_{[0,1]^2} C(u, v) du dv − 3.

This expression gives rise to bounds on the relation between Spearman's rho and Kendall's tau; in particular, −1 ≤ 3τ − 2S_ρ ≤ 1 (see also Nelsen (2006)).

Szekely's Energy Correlation
Szekely's distance correlation is an energy-based measure of correlation that tests for independence via the characteristic functions of two random variables (see Szekely, Rizzo and Bakirov (2007)). Simply put, it supports a hypothesis test against the null hypothesis that the joint characteristic function and the product of the two marginal characteristic functions are equal to each other. Under the assumption that both X and Y have finite first moments, the distance correlation between two random variables (or potentially vectors) X and Y is given by the formula

dCor(X, Y) = dCov(X, Y) / √(dVar(X) dVar(Y)), (4.2)

where dCov(X, Y) is the distance covariance, which can be represented in terms of characteristic functions as

dCov^2(X, Y) = (1/(c_p c_q)) ∫∫ |ϕ_{X,Y}(s, t) − ϕ_X(s) ϕ_Y(t)|^2 |s|^{−(1+p)} |t|^{−(1+q)} ds dt,

where c_p = c_q = π/Γ(1) = π if both distributions are one-dimensional. Similarly, dVar(X) = dCov(X, X). One can define a variant of the distance correlation that requires only the existence of moments of fractional order β ∈ (0, 2). This can be defined explicitly in the multivariate case as well; in the bivariate case, the β-distance covariance replaces the weight |s|^{−2} |t|^{−2} by |s|^{−(1+β)} |t|^{−(1+β)} (with the corresponding normalizing constants). Now, only the β-th moment needs to exist in order for the distance covariance to be finite. The distance correlation can then be constructed from the distance covariance and variance in exactly the same way as in Equation 4.2. Unsurprisingly, this metric can be computed numerically with an algorithm similar to that for the standard distance correlation; the only substitution is that the β-th moment replaces the first moment in the integral.
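The sample distance correlation has a simple O(n²) formula via double-centered pairwise distance matrices; a sketch for one-dimensional samples (name ours):

```python
import numpy as np

def distance_correlation(x, y):
    """Sample distance correlation of two 1-D samples (Szekely, Rizzo and
    Bakirov (2007)), computed from double-centered distance matrices."""
    x = np.asarray(x, float)
    y = np.asarray(y, float)

    def centered(v):
        D = np.abs(v[:, None] - v[None, :])  # pairwise distances
        return D - D.mean(axis=0) - D.mean(axis=1)[:, None] + D.mean()

    A, B = centered(x), centered(y)
    dcov2 = (A * B).mean()                           # squared distance covariance
    dvar2 = np.sqrt((A * A).mean() * (B * B).mean()) # product of distance variances
    return np.sqrt(dcov2 / dvar2)
```

An exactly linear relationship yields a distance correlation of 1, and the statistic always lies in [0, 1].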

Schweizer and Wolff's Sigma
Schweizer and Wolff's sigma is another copula-based method of quantifying dependence. It is defined by the formula (see Nelsen (2006))

σ = 12 ∫∫_{[0,1]^2} |C(u, v) − uv| du dv,

and, geometrically, this can be interpreted as taking the L^1-norm of the distance between the specified copula function and the independence copula. Unfortunately, this expression requires explicit knowledge of the copula that describes the data. However, when the copula is not known, the method can still be used via an application of the empirical copula function.
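With the empirical copula in hand, sigma can be approximated by a Riemann sum over a grid of the unit square. A sketch (the grid size is an arbitrary choice, and the function name is ours):

```python
import numpy as np

def schweizer_wolff_sigma(x, y, grid=50):
    """Approximate sigma = 12 * int int |C(u,v) - uv| du dv, with C replaced
    by the empirical copula of the bivariate sample (x, y)."""
    n = len(x)
    u = (np.argsort(np.argsort(x)) + 1) / n  # normalized ranks of x
    v = (np.argsort(np.argsort(y)) + 1) / n  # normalized ranks of y
    gs = np.arange(1, grid + 1) / grid
    total = 0.0
    for a in gs:
        for b in gs:
            c_ab = np.mean((u <= a) & (v <= b))  # empirical copula at (a, b)
            total += abs(c_ab - a * b)
    return 12.0 * total / grid ** 2
```

For comonotone data the empirical copula approaches min(u, v), whose sigma is 12(1/3 − 1/4) = 1, the maximal value.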

Randomized Dependence Coefficient and Tail Dependence
The Randomized Dependence Coefficient (RDC) is a relatively new method for determining the dependence between two random variables, first published in the 2013 paper by Lopez-Paz, Hennig and Schoelkopf. It is described as measuring the "largest canonical correlation between k randomly chosen non-linear projections of their [empirical] copula transformations". This coefficient comes out of the machine learning literature, where it has been successfully used for feature selection among high-dimensional random variables. For details, see the above-cited paper.
The tail dependence of a bivariate distribution of (X, Y) with distribution functions F and G measures how dependent the tails of the distributions are on each other (see Nelsen (2006)). The lower tail dependence is the limit of the (conditional) probability that Y is less than the 100u-th percentile of G given that X is less than or equal to the 100u-th percentile of F, that is,

λ_L = lim_{u→0+} P(Y ≤ G^{−1}(u) | X ≤ F^{−1}(u)).

Similarly, the upper tail dependence is the limit of the conditional probability that Y is greater than the 100u-th percentile of G given that X is greater than or equal to the 100u-th percentile of F, that is,

λ_U = lim_{u→1−} P(Y > G^{−1}(u) | X > F^{−1}(u)).

These have copula-based definitions, where

λ_L = lim_{u→0+} C(u, u)/u, and λ_U = lim_{u→1−} (1 − 2u + C(u, u))/(1 − u).

Estimates of these coefficients could potentially be evaluated using the empirical copula. Unfortunately, such estimates may not perform well in practice; see Frahm, Junker, and Schmidt (2005) for a nice discussion of the difficulties in estimating tail dependence from samples of bivariate distributions.

Numerical Experiments Evaluating Dependency Metrics
To test the effectiveness of the above dependency metrics in capturing the type of association in the Linnik copula, we conducted numerical experiments in which 40 samples of increasing sample size (from 2 to 5000) were drawn from a Linnik copula with α = 1.4 and Σ = (1, 0.4; 0.4, 1). Then, the dependence metrics described in subsections 5.2.1-5.2.6 were evaluated over these samples. The purpose of the test was twofold: first, to see which metrics (if any) would converge over the increasing sample size, and second, to see what values the different dependence metrics would produce. The results are shown in Fig. 5.1.
Interestingly, the randomized dependence coefficient (RDC) showed the highest value of dependence. This may be a property of the definition of this coefficient. Experiments in Lopez-Paz, Hennig and Schoelkopf (2013) showed that the RDC gave estimates similar to Pearson's rho when the relationship is Gaussian. Perhaps the RDC acts in an additive manner, increasing its value when the relationship is more than Gaussian.
Pearson's correlation, distance correlation and Spearman's rho all quickly reached values of about 0.33. This may be due to the value of Σ_12 in the experiment, but the connection is currently unknown. Kendall's tau sat around 0.25 for the entire experiment, and it is again unknown what the meaning of this value is. Finally, Schweizer and Wolff's sigma did not converge over the entirety of the experiment, for unknown reasons. Overall, the RDC seems to outperform the other metrics in these experiments.

Estimation of Parameters for Multivariate Linnik Distributions
We have tested several estimation approaches, including empirical characteristic functions and the method of logarithmic moments, which, although efficient in the one-dimensional case (see Kozubowski (2001), and Cahoy (2012, 2013)), did not perform well in the multivariate case. The only approach which displayed good convergence to the theoretical values of the multivariate parameters involved was the divergence minimization method described below.

Multivariate Parameter Estimation Using Divergence Minimization
Approaching parametric estimation as divergence minimization has a long history, explored in greater detail in Basu, Shioya, and Park (2011). With this method, parametric estimation is done through numerical minimization of a given divergence between the population data and simulated data (with the given parameters). That is, to produce estimates of the parameters, we only need to solve the minimization problem

arg min_θ d(p(x), q(x; θ)),

where p(x) is the true population distribution function and q(x; θ) is the distribution function with parameters θ. In practice, we do not know the population distribution, but have only samples from it. Thus, we must use a discrete approximation of the divergence function d(·, ·) and samples from the distribution q(x; θ).
For estimation of the parameters of the multivariate Linnik distribution, we can easily generate a large sample with given parameters, and as an easily computable divergence function we selected the Kullback-Leibler (KL) divergence (see Kullback and Leibler (1951)), defined as

D_KL(p ‖ q) = ∫ p(x) log(p(x)/q(x)) dx,

where p(x) and q(x) are density functions whose arguments may or may not be vector-valued (see Joyce (2011)). This divergence function is widely used across the mathematical literature due to its nice properties. In particular, a maximum likelihood estimator will minimize the KL divergence between the distributions, and directly minimizing the KL divergence will produce an estimator identical to the maximum likelihood estimator (see Joyce (2011), and Basu, Shioya, and Park (2011)). A useful numerical implementation of the KL divergence in high dimensions is in the FNN R package, which uses a k-nearest-neighbors approach to compute the divergence in high dimensions (see Boltz, Debreuve and Barlaud (2007)).
Given a dataset drawn from p(x), the term E_{p(x)}[log p(x)] is constant, so minimizing the KL divergence amounts to maximizing E_{p(x)}[log q(x)]. If the distributions p and q belong to the same family, this is just maximizing the log-likelihood of the data, which corresponds to maximum likelihood estimation.
One popular algorithm for optimization, particularly in a noisy setting, is the simultaneous perturbation stochastic approximation (SPSA) algorithm invented by James Spall (1998). SPSA is an iterative procedure in which the parameter value is updated by subtracting an estimate of the gradient at each step. The gradient estimate is calculated through a simultaneous random perturbation of the current parameter value, which reduces the number of loss-function evaluations to two at each step (versus 2p for a traditional finite-difference gradient in p dimensions). Ultimately, this vastly reduces the computational time of the algorithm.
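A bare-bones SPSA update can be sketched as follows, using the standard gain-sequence exponents suggested by Spall; the specific constants here are illustrative, not from the paper.

```python
import numpy as np

def spsa_minimize(loss, theta0, n_iter=2000, a=0.1, c=0.1, seed=0):
    """Minimal SPSA: two loss evaluations per step give a stochastic gradient
    estimate from a simultaneous Rademacher perturbation of all coordinates."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, float).copy()
    for k in range(1, n_iter + 1):
        ak = a / k ** 0.602  # step-size gain (Spall's standard exponent)
        ck = c / k ** 0.101  # perturbation gain
        delta = rng.choice([-1.0, 1.0], size=theta.shape)
        g = (loss(theta + ck * delta) - loss(theta - ck * delta)) / (2.0 * ck * delta)
        theta -= ak * g
    return theta
```

In the estimation experiments below, `loss` would be the estimated KL divergence between the data and a sample simulated at θ; here a quadratic stands in for illustration.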

Test of Numerical Estimation Procedure
Starting values of the parameters for the SPSA algorithm were chosen as follows: α = 0.8, Σ_12 = 0.4, and σ_1^2 = σ_2^2 = 1. Due to the multi-dimensional optimization and the repeated KL-divergence calculations, this procedure is not instantaneous, and takes a fairly long time to run. A plot of the time for each iteration of the experiment is shown in Fig. 6.1. The estimated parameter values for each iteration are shown in Fig. 6.2. Finally, the KL distance was recalculated and is shown in Fig. 6.3. With a few exceptions, the time of estimation grows linearly with the sample size, ranging from about 1 second for a sample size of 10 to about 250 seconds for the largest sample size of 50,000. Overall, KL divergence minimization gives more accurate results than the method-of-moments estimator, but at the price of speed. This is largely due to the fact that estimating the KL divergence requires a numerical integration. In practice, this speed trade-off should not be too severe, but it could potentially be an issue in high dimensions, or when the parameters must be updated over time.

Figure 4.1.Plots of different numerically computed dependency metrics for Linnik copula as functions of the number of samples.

Figure 5.1.Plot of the time needed for each iteration of parameter estimation experiments using divergence minimization.

Figure 5.3.Plot of calculated KL distance for experiments using divergence minimization.Potentially negative values are due to error in numerical calculation.