A Diagnostic Test Based on a 9-Component Mixture Gaussian Copula Model

In this paper, we investigate the performance of a diagnostic test based on a mixture Gaussian Copula that incorporates a Markov Chain. Suppose that, in the context of an infectious disease, there are three states: Susceptible (S), Infected (I), and Recovered (R). We compare the performance of this approach with that of the ROC (Receiver Operating Characteristic) curve, which is commonly used in diagnostic studies.


Introduction
In biometry, especially in epidemiological modeling, the SIR (Susceptible, Infected, and Recovered) model has been in use for a long time. Each compartment, S (Susceptible), I (Infected), and R (Recovered), has its own rate: the rate of susceptibility $s(t)$, the rate of infection $i(t)$, and the rate of recovery $r(t)$ are modeled by using differential equations. This requires a good understanding of differential equations and partial differential equations. In addition, this approach at times requires iterative numerical methods to obtain the solutions. On the other hand, Copula models require neither differential equations nor iterative numerical methods. They are probabilistic models and are fairly easy to manage compared to other existing mathematical models such as the SIR model. Copula-based methods originated from the pioneering work of Sklar (1959). Copulas use the dependence structure and the direction of the association for modeling. There are several Copulas, and they differ in their properties. Copula models fall under one of two major categories: the Archimedean family and the non-Archimedean family. In addition, there are Copulas for discrete variables and for continuous variables. Copulas have applications in Actuarial Science, Biometry, Economics, Finance, Engineering, etc. In Actuarial Science, Bowers et al. (1997), Cox and Oakes (1984), David and Moeschberger (1978), and Carriere (1994) used Copulas to construct competing risk models. Marshall and Olkin (1988) used Copulas to construct machine frailty models. Zheng and Klein (1995) considered the use of Copula models in the context of survival models.
In Epidemiology, especially in the case of malaria, Demongeot et al. (2013) used an Archimedean Copula known as the Gumbel Copula to study the interaction among the SIR compartments through the rate of susceptibility, the rate of infection, and the rate of recovery, all of which were based on the Ross-Macdonald model. These rates are mainly governed by differential equations. They used the Gumbel Copula model to derive the conditional distribution for the transition from one compartment to another. Moreover, Copulas have applications in Quantile Regression. For a literature review on Copulas, interested readers are referred to Nelsen (2006).
In this paper, the focus is on studying the suitability of a diagnostic test, built on a Copula model, in the context of an epidemic disease. Nanthakumar (2013) used a mixture Gaussian Copula model to study the suitability of diagnostic tests in the context of a two-state Markov Chain. Before that, Pundir (2011), Krzanowski and Hand (2009), Gonen (2007), Pepe (2003), Zhou et al. (2002), Shultz (1995), and others investigated the use of the Receiver Operating Characteristic curve (ROC curve) to study the suitability of diagnostic tests based on single and multiple variables.
This paper extends the earlier results obtained by Nanthakumar (2013) to a three-compartment situation. The objective here is to evaluate the diagnostic ability of the pre-treatment measurement $V_1$ taken at time $t_1$ and the post-treatment measurement $V_2$ taken at time $t_2$. In this regard, we will use the probability that … discussion and conclusion in Section 4.

Methodology
We believe that these three compartments (states) follow a transition pattern according to the following transition probability matrix. … $(V_1, V_2)$. This is where we need the Copula. Here, we use a mixture Gaussian Copula, which captures the transition among the states and at the same time gives a reasonable approximation of this probability. As noted earlier, let $V_1$ be the measurement taken at time 1 (for example, temperature) and $V_2$ be the measurement taken at time 2.
This leads to the result that the pre-treatment measurement $V_1$ taken at time $t_1$ follows a mixture normal distribution, as indicated below. … Similarly, under the assumption that the transition takes place according to the first-order Markov Chain described earlier, the post-treatment measurement $V_2$ taken at time $t_2$ follows a different mixture normal distribution. … As we can see, each marginal distribution is a three-component mixture of normal distributions. It therefore appears reasonable to model the joint distribution of $(V_1, V_2)$ as a nine-component mixture of bivariate Gaussian Copulas.
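To make the weight structure concrete, the following Python sketch shows how a first-order Markov Chain can link the three mixture weights at $t_1$ to the three weights at $t_2$ and to nine joint weights. The transition matrix `P`, the initial probabilities `pi1`, and the rule $w_{ij} = \pi_i p_{ij}$ are illustrative assumptions made here; they are not values or formulas taken from this paper.

```python
import numpy as np

def joint_state_weights(pi1, P):
    """Return (pi2, W) for a first-order Markov chain over (S, I, R).

    pi1 : state probabilities at time t_1 (hypothetical input)
    P   : 3x3 transition matrix, rows sum to 1 (hypothetical input)
    pi2 : state probabilities at time t_2, computed as pi1 @ P
    W   : 3x3 matrix with W[i, j] = pi1[i] * P[i, j], i.e. the probability
          of being in state i at t_1 and in state j at t_2
    """
    pi1 = np.asarray(pi1, dtype=float)
    P = np.asarray(P, dtype=float)
    if not np.allclose(P.sum(axis=1), 1.0):
        raise ValueError("each row of P must sum to 1")
    if not np.isclose(pi1.sum(), 1.0):
        raise ValueError("pi1 must sum to 1")
    pi2 = pi1 @ P
    W = pi1[:, None] * P
    return pi2, W

# Illustrative numbers only (assumed, not estimated from data).
P = np.array([
    [0.7, 0.2, 0.1],   # from S to (S, I, R)
    [0.1, 0.5, 0.4],   # from I to (S, I, R)
    [0.2, 0.1, 0.7],   # from R to (S, I, R)
])
pi1 = np.array([0.5, 0.3, 0.2])

pi2, W = joint_state_weights(pi1, P)
print("weights at t_2:", pi2)
print("nine joint weights:\n", W)
```

Summing `W` over its columns recovers `pi1` and summing over its rows recovers `pi2`, which mirrors the requirement that a joint model collapse to the two three-component marginals.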

Mixture of Bivariate Gaussian Copulas
Here, we define the nine-component mixture Gaussian Copulas.
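For orientation, a standard textbook form of the bivariate Gaussian Copula, its density, and a nine-component mixture built from it can be written as follows. The symbols $w_{ij}$ and $\rho_{ij}$ are notation assumed here for the mixing weights and the component correlations; the exact parameterization used in this paper may differ.

```latex
% Bivariate Gaussian copula with correlation \rho (standard definition);
% \Phi is the standard normal CDF, \Phi_2 the standard bivariate normal CDF.
C_{\rho}(u, v) = \Phi_{2}\!\left(\Phi^{-1}(u),\, \Phi^{-1}(v);\, \rho\right),
\qquad 0 < u, v < 1 .

% Its density, with x = \Phi^{-1}(u) and y = \Phi^{-1}(v).
c_{\rho}(u, v) = \frac{1}{\sqrt{1-\rho^{2}}}
  \exp\!\left( -\frac{\rho^{2}\,(x^{2} + y^{2}) - 2\rho x y}{2\,(1-\rho^{2})} \right).

% Nine-component mixture over the state pairs (i, j); w_{ij} is assumed notation.
C(u, v) = \sum_{i \in \{S, I, R\}} \sum_{j \in \{S, I, R\}} w_{ij}\, C_{\rho_{ij}}(u, v),
\qquad w_{ij} \ge 0, \quad \sum_{i} \sum_{j} w_{ij} = 1 .
```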
Remark: A Copula is supposed to yield the marginal distributions when the data are collapsed. Therefore, collapsing the data and then equating the marginal distributions yields the following equations. … An ordering between $V_2$ and $V_1$ is an indication of improvement as a result of this treatment in this diagnostic study. So, the interest is in computing the probability of that ordering of $V_2$ and $V_1$.
Note that based on this copula density,
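As a rough illustration of how such a probability can be evaluated, the Python sketch below uses a simplified stand-in for the model: a nine-component mixture of bivariate normal distributions, which is what a Gaussian Copula with normal margins gives within each component. It assumes that the event of interest is $V_2 < V_1$; that direction, the function name, and all parameter values are assumptions made for the example and are not taken from this paper.

```python
import math

def normal_cdf(z):
    """Standard normal CDF computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_v2_less_than_v1(W, mu1, sd1, mu2, sd2, rho):
    """P(V2 < V1) under a mixture of bivariate normals (simplified sketch).

    W[i][j]   : weight of component (i, j); all weights sum to 1
    mu1, sd1  : mean and standard deviation of V1 in state i
    mu2, sd2  : mean and standard deviation of V2 in state j
    rho[i][j] : correlation of (V1, V2) in component (i, j), |rho| < 1
    """
    total = 0.0
    for i in range(len(mu1)):
        for j in range(len(mu2)):
            # Within a component, D = V1 - V2 is normal, so P(D > 0)
            # equals Phi(mean_d / sd_d).
            mean_d = mu1[i] - mu2[j]
            var_d = sd1[i] ** 2 + sd2[j] ** 2 - 2.0 * rho[i][j] * sd1[i] * sd2[j]
            total += W[i][j] * normal_cdf(mean_d / math.sqrt(var_d))
    return total

# Hypothetical inputs: equal weights and identical means in every state.
W = [[1.0 / 9.0] * 3 for _ in range(3)]
mu = [37.0, 37.0, 37.0]
sd = [0.5, 0.8, 0.6]
rho = [[0.3] * 3 for _ in range(3)]

print(prob_v2_less_than_v1(W, mu, sd, mu, sd, rho))  # approximately 0.5
```

With identical means in every state, each component contributes $\Phi(0) = 0.5$, so the mixture probability is about 0.5 whatever the weights and correlations are. Shifting the means of `mu1` above those of `mu2` pushes the value toward 1.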

Numerical Example
As we know, infectious diseases such as cholera, malaria, SARS, and COVID-19 have the potential to affect the human population from time to time. There are mainly three possible states for these infectious diseases: susceptible (S), infected (I), and recovered (R). Suppose that an infectious disease is prevalent in a region and that, because of this, people are advised to take a preventive medicine (treatment). To assess the effectiveness of this treatment, two measurements are taken from the same individual: a pre-treatment measurement $V_1$ at time $t_1$ and a post-treatment measurement $V_2$ at time $t_2$. Here, we assume that the state-to-state transition takes place according to a Markov Chain as described earlier.
Here we assume the estimates of … . This was the case in the previous study conducted by this author in the context of a two-component mixture.

ROC Curve
We can also study the effectiveness of the diagnostic test by using the ROC curve. For this, we use the area under the ROC curve (AUROC). In this analysis, we draw the ROC curve (y versus x) and evaluate the area under it, which is a measure of the performance of the diagnostic test. The area under the ROC curve is about 0.5, which is in agreement with the estimate given by the mixture-Gaussian Copula model.
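The area under an empirical ROC curve can be computed without drawing the curve, because it equals the Mann-Whitney statistic: the proportion of pairs in which a value from one group exceeds a value from the other, with ties counted as one half. The Python sketch below illustrates this under the assumption that the pre-treatment values play the role of one group and the post-treatment values the other; the function name and the data are hypothetical, and this pairwise comparison ignores the within-individual pairing of the measurements.

```python
def empirical_auc(group_a, group_b):
    """Mann-Whitney estimate of P(A > B) + 0.5 * P(A == B).

    This equals the area under the empirical ROC curve obtained when
    larger values are taken as evidence for membership in group_a.
    """
    if not group_a or not group_b:
        raise ValueError("both groups must be non-empty")
    score = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                score += 1.0
            elif a == b:
                score += 0.5
    return score / (len(group_a) * len(group_b))

# Hypothetical temperature readings (not data from this study).
v1 = [38.2, 37.9, 38.5, 37.4]   # pre-treatment
v2 = [37.1, 37.6, 38.0, 36.9]   # post-treatment

print(empirical_auc(v1, v2))
```

A value near 0.5 means the two sets of measurements are hard to tell apart, while values near 0 or 1 indicate a clear separation in one direction or the other.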

Conclusion and Discussion
The purpose of this study was to see whether the mixture-Gaussian Copula model can be used to evaluate the diagnostic ability of the pre-treatment and post-treatment measurements $(V_1, V_2)$ when the joint distribution of these measurements is unknown. Here, we use a 9-component mixture Gaussian Copula to model the joint distribution of $(V_1, V_2)$. As seen from this study, the mixture-Gaussian Copula model does fairly well in evaluating the diagnostic ability of the pre-treatment and post-treatment measurements. This is supported by the empirical and the ROC-curve-based estimates of the probability comparing $V_2$ with $V_1$, which measures the diagnostic ability of this test.