Multivariate Relationships Between Physiologic and Anthropometric Variables: A Data Based Analysis

Canonical correlation analysis (CCA) is a widely used and appropriate method for establishing the relationship between two sets of variables measured on the same subjects. In this study we consider two sets of variables consisting of different types of measurements: one set has three physiologic variables and the other has eighteen anthropometric variables (listed with abbreviations in section 3.1). The aim of this study is to evaluate the relationship between the two sets and to identify the factors that influence it. The analysis revealed that the first two canonical correlations were significant and that WT, APC, TVC, CCN, MUAC and WC (anthropometric variables) are the risk factors for SBP and DBP (physiologic variables). Furthermore, a General Linear Model (GLM) fitted on these risk factors indicated that CCN and WC are highly significant factors influencing the physiologic set. Thus the combined model (CCA+GLM) identifies the most important factors influencing the physiologic variables.


Introduction
Blood pressure is an important risk factor in cardiovascular and renal diseases. It is known that an increase in blood pressure is associated with increased cardiac risk (Stamler et al., 1993; Vasan et al., 2001). High blood pressure is one of the causes of sudden death and is considered a common problem all over the world. Several anthropometric factors are related to the blood pressure level (Cassani et al., 2009). Genetic factors, low physical activity, dietary habits, environment and socio-economic conditions have also been identified as highly influential factors (Arkwright et al., 1982; Melby et al., 1991; Mitchell et al., 1996; Bhat et al., 2013; Yu et al., 2000). Many researchers have established a significant relationship between lipid profiles and blood pressure (Akhtar et al., 2006; Kannel, 1985). Some researchers have pointed out that stress and strain are important factors in increasing blood pressure (Cesana et al., 1996). Furthermore, high blood pressure has been linked to diseases such as anaemia and diabetes (Badaruddoza & Kumar, 2009; Kaur & Kochar, 2010). Some researchers have analyzed a few cardiovascular diseases in relation to fish oil and omega-3 fatty acids, which help control high blood pressure (Kris-Etherton et al., 2002). Apart from these, anthropometric variables including adiposity, body mass index, obesity and abdominal obesity, waist circumference, waist hip ratio and a few skinfold thicknesses are highly associated with blood pressure (Bose et al., 2003; Dalton et al., 2003; Feldstein et al., 2005; Gus et al., 2004; Gustat et al., 2000; Haslam & James, 2005; James, 2008; The obesity in Asia Collaboration, 2008; Kannel et al., 1967; Kim et al., 2006; Kopelman, 2000; Lee et al., 2008; Zhao et al., 2000). Other results show that the association between body mass index and blood pressure depends on age and gender (Benetou et al., 2004), and that waist circumference and body mass index are predictors of hypertension (Peixoto et al., 2006).
The existing literature shows that most studies have relied on univariate methods. Only a few studies have used multivariate analysis to explore the effect of different sets of anthropometric variables (such as body circumferences, skinfold thicknesses and body fatness variables) on cardiovascular factors (Mueller et al., 1991; Sangi et al., 1992). Several risk factors (anthropometric variables) related to blood pressure (physiologic variables) have been detected, but these factors have not always been significant across populations because of differences in objectives, parameters, location, physical activity and environment. In India there have been many studies on populations in different locations which show a significant relationship between blood pressure and anthropometric measurements. To our knowledge, no evidence has so far been reported that establishes a composite effect of the anthropometric set on the physiologic set and vice versa. Hence canonical correlation analysis (CCA) is considered an appropriate statistical tool for assessing the effect of one set on the other and vice versa.
Data were collected from a fishing community in a coastal area where both men and women catch fish and consume sea fish regularly as a major food item. In the study community, only 20% of subjects had blood pressure above normal. We further examined the association between the combined variables of the anthropometric set and the physiologic set. The methodology of data collection is described in sections 2.1 to 2.5.

Reasons for Choosing Canonical Correlation Analyses
Neither GLM nor stepwise multiple regression can effectively explain the combined effect of one set of variables on the other set and vice versa. We have therefore adopted canonical correlation analysis to explain this relationship. In addition, the variables within each set are significantly correlated among themselves.

Objectives of This Study
(i) To find the influencing factors (i.e. the significant risk factors from the set of anthropometric variables) for blood pressure and to establish the relationship between the physiologic set and the anthropometric set.
(ii) To describe the relationships between the variables in the first set and the variables in the second set.
(iii) To determine how many dimensions (canonical variates) are necessary to understand the association between the two sets of variables.

Study Location
Data for the present study were collected from the fishing community of nine coastal villages of West Bengal and Orissa, namely Dattapur, Khadalgobra, Bilamudia, Dahadaya, Gadadharpur, Gangadharpur, Jatimati and Podima of East Midnapur district, West Bengal, and Udaypur of Balasore district, Orissa. These nine villages are situated within 5 kilometres of Digha, a tourist spot of East Midnapur, West Bengal.

Study Population
The villages were selected on the basis of cultural homogeneity with respect to occupation, socio-economic condition and environment. The main occupation of the villagers was fishing. For this study a total of 719 subjects of both sexes, aged 18 to 77 years, were selected from the fishing community of nine contiguous villages. Of these subjects, 347 were male and 372 were female. Given the purpose of the study, a complete enumeration of the villages, comprising 537 fishermen households, was carried out. The vast majority of the study population was illiterate and belonged to the lower earning group, and thus to a low socio-economic class.

Socio-Demographic Variables
Information on age, occupation, income and educational status was obtained from all the subjects with the help of a pre-tested questionnaire. From the questions related to the economic condition of the households, monthly earnings from the principal occupation were considered. Per Capita Monthly Income (PCMI) was calculated by dividing monthly income by total number of households.

Anthropometry Variables
Anthropometric measurements were obtained from the adult members (18 to 77 years) of the fishing community by trained investigators following the standard method (Weiner & Lourie, 1981). According to this method, body weight (in kg) was taken on a spring weighing machine, with the subject standing erect on it in light apparel. Height was measured as the vertical distance from the floor to the vertex using an anthropometer, taking care that it was kept absolutely vertical. The reading was recorded in centimetres (cm) and fractions thereof. Circumferential body dimensions (mid upper arm, waist, hip, chest and calf) were measured with a non-elastic tape. Biceps, triceps, supra-iliac, sub-scapular, abdominal and calf skinfolds were measured using a Harpenden skinfold caliper following the standard techniques recommended by Weiner & Lourie (1981).

Physiologic Variables
After the subject had rested for ten minutes, blood pressure was measured with a mercury sphygmomanometer in a sitting position, with the right forearm placed horizontally on the table. A cuff of appropriate size was fitted on the arm of the subject. The readings were then taken as recommended by the American Heart Association, 1981. The pulse rate was measured for 60 seconds. Each measurement was taken three consecutive times and the average value was used to reduce technical error.

Methods
Let $X$ and $Y$ be two sets of variables, with $p$ variables in set 1, $X = (X_1, X_2, X_3, X_4, \ldots, X_p)$, and $q$ variables in set 2, $Y = (Y_1, Y_2, Y_3, Y_4, \ldots, Y_q)$, where $p \le q$. Define two sets of linear combinations, $U$ and $V$, where $U$ corresponds to linear combinations of $X$ and $V$ to linear combinations of $Y$. Each member of $U$ is paired with a member of $V$: $U_1$ is a linear combination of the $p$ $X$ variables and $V_1$ is a linear combination of the $q$ $Y$ variables. Similarly, $U_2$ is a linear combination of the $p$ $X$ variables and $V_2$ is a linear combination of the $q$ $Y$ variables, and so on.

More generally
The $r$th pair of canonical variables is the pair of linear combinations $U_r = (a^{(r)})^T X$ and $V_r = (b^{(r)})^T Y$ that each have unit variance, are uncorrelated with the first $(r - 1)$ pairs of canonical variables, and have maximum correlation.
The correlation between $U_i$ and $V_i$ is calculated from the following formula: $\rho_i^* = \mathrm{cov}(U_i, V_i) / \sqrt{\mathrm{var}(U_i)\,\mathrm{var}(V_i)}$. The largest of these, $\rho_1^*$, is called the first canonical correlation (http://onlinecourses.science.psu.edu/stat505/nod/63).
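The definition above can be sketched numerically. The following is a minimal illustration, assuming NumPy and synthetic data rather than the study data; the function name `canonical_correlations` and the QR plus SVD route are choices made for this sketch and are not necessarily the procedure or software used by the authors.

```python
import numpy as np

def canonical_correlations(X, Y):
    """Canonical correlations and weights for data X (n x p) and Y (n x q).

    Hypothetical helper for illustration only.
    """
    n = X.shape[0]
    Xc = X - X.mean(axis=0)          # centre each column
    Yc = Y - Y.mean(axis=0)
    Qx, Rx = np.linalg.qr(Xc)        # orthonormal basis of each column space
    Qy, Ry = np.linalg.qr(Yc)
    # Singular values of Qx^T Qy are the canonical correlations (descending)
    W, rho, Zt = np.linalg.svd(Qx.T @ Qy, full_matrices=False)
    # Back-transform to weights; the scaling gives unit sample variance
    A = np.linalg.solve(Rx, W) * np.sqrt(n - 1)
    B = np.linalg.solve(Ry, Zt.T) * np.sqrt(n - 1)
    return rho, A, B

# Synthetic example: both sets share one latent factor z (an assumption)
rng = np.random.default_rng(0)
n = 500
z = rng.normal(size=n)
X = np.column_stack([z + rng.normal(size=n) for _ in range(3)])
Y = np.column_stack([z + rng.normal(size=n) for _ in range(5)])

rho, A, B = canonical_correlations(X, Y)
U1 = (X - X.mean(axis=0)) @ A[:, 0]  # first canonical variate of X
V1 = (Y - Y.mean(axis=0)) @ B[:, 0]  # first canonical variate of Y
print(rho)
```

In this sketch the ordinary correlation between `U1` and `V1` equals `rho[0]`, which is the first canonical correlation in the sense defined above.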

Results
Table 1 shows the descriptive statistics (minimum, maximum, mean and standard deviation) of each variable in the two sets. In this study the average systolic and diastolic blood pressures were 114.28 ± 14.92 and 75.89 ± 11.41 respectively, while the average pulse rate was 78.354 ± 9.90. The averages for the other variables can be described in a similar way. Among all the variables, the variability in the physiologic variables is high, owing to the wide variability in age.
A test of dimensionality is used to assess the significance of the canonical dimensions. In general, the number of canonical dimensions equals the number of variables in the smaller set; however, the number of significant dimensions may be smaller. Here dimension reduction analysis was carried out using the Wilks' Lambda test and is shown in Table 4. The test revealed that the three dimensions taken together are significant (Wilks' Λ = 0.5636, F = 7.37, P < 0.001). Dimensions 2 and 3 combined are also significant (Wilks' Λ = 0.8341, F = 3.68, P < 0.001), but dimension 3 alone is not significant (P = 0.153). Thus there are only two significant dimensions (the first and the second).
Table 5 presents the raw canonical coefficients (weights) given to the variables of the two sets, which maximize the canonical correlations between the sets. The magnitudes of the canonical coefficients are used to assess the relative importance of individual variables in a canonical variate: they represent the contribution of each variable to the corresponding canonical variable. These magnitudes also depend on the variances of the corresponding variables. In the anthropometric set, the contributions of APC and MUAC to the first canonical variate are larger than those of the other variables, and the contributions of BAD, BID and SKS cannot be ignored. In the physiologic set, the weights for SBP and DBP are positive in the first dimension while the weight for PR is negative. In the second dimension the weights given to DBP and PR are positive but the weight for SBP is negative.
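The Wilks' Lambda values used in the dimensionality test can be cross-checked from the squared canonical correlations, since the statistic for dimensions k to m is the product of (1 - R²) over those dimensions. The sketch below assumes NumPy and uses the squared canonical correlations quoted in the text accompanying Table 3 (0.3243, 0.1385, 0.0319); the function name `wilks_lambda` is hypothetical, and the F approximation is not reproduced here.

```python
import numpy as np

# Squared canonical correlations as quoted in the text for Table 3
r_squared = np.array([0.3243, 0.1385, 0.0319])

def wilks_lambda(r2):
    """Wilks' Lambda for dimensions k..m jointly, for k = 1, ..., m.

    Entry k-1 is the product of (1 - r2[j]) over j >= k-1.
    """
    one_minus = 1.0 - np.asarray(r2, dtype=float)
    # Reverse cumulative product: tail products of (1 - R^2)
    return np.cumprod(one_minus[::-1])[::-1]

lambdas = wilks_lambda(r_squared)
print(lambdas)
```

With these rounded inputs the first two entries come out at approximately 0.564 and 0.834, in line with the reported 0.5636 and 0.8341 up to rounding.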
In this coefficient table SBP and DBP contribute positively to the canonical variate, whereas PR plays an inverse role in the first dimension. For example, for the independent variable WT, a one unit increase in WT leads to a 0.0126 unit decrease in the score on the first dependent canonical variate ($V_1$) when the other variables in the model are held constant.
Table 6 shows the correlations between the observed variables and their own canonical variates (i.e. canonical loadings) for all three dimensions. In the first dimension the physiologic variables SBP and DBP have high canonical loadings, exceeding 0.8, resulting in a high shared variance (0.707), while the loading for PR is negative. This indicates a high intercorrelation between SBP and DBP and suggests that both, or either, of these measures are representative of the effects of the physiologic variables. In the anthropometric set, the canonical loadings in dimension 1 range from 0.0389 to 0.8358. Six variables (WT, APC, TVC, CCN, MUAC and WC) have high loadings, exceeding 0.77, which indicates that these variables have a definite influence on the canonical variate ($U_1$). Similarly, in the second dimension the canonical loadings for SKB, SKT, SKS, SKI, SKA, SKC and FM are larger than those of the other variables, so their relative contribution to the canonical variate ($U_2$) is greater.
Table 8 shows the redundancy index for the independent and dependent canonical variates of the first two canonical functions. The redundancy index measures the ability of a set of independent variables (taken together) to explain the variation in a set of dependent variables (taken all at a time). From the table it can be seen that the redundancy index for the first dependent canonical variate is larger than that for the corresponding independent canonical variate. This is due to the relatively low shared variance in the independent variables and not to the squared canonical correlation ($R^2$). The redundancy analysis for the second canonical function is rather different from that of the first. This may be due to the low second canonical correlation (0.3721). Moreover, both sets have low shared variance (0.0297 and 0.3759 for the independent and dependent variables respectively). Although the second canonical function is statistically significant, it is not of practical significance, as it explains only a small proportion of the variation in the dependent set and vice versa. Thus the redundancy analysis indicates that only the first function should be accepted.
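The redundancy argument can be made concrete with a short sketch, assuming the standard definition in which the redundancy index of a canonical variate is its shared variance (the mean squared loading) multiplied by the squared canonical correlation. The function names are hypothetical, and the inputs are the second-function figures quoted in the text (shared variances 0.0297 and 0.3759, canonical correlation 0.3721); the outputs are illustrative and are not copied from Table 8.

```python
import numpy as np

def shared_variance(loadings):
    """Mean of the squared canonical loadings of one set on its own variate."""
    loadings = np.asarray(loadings, dtype=float)
    return float(np.mean(loadings ** 2))

def redundancy_index(shared_var, canonical_r):
    """Shared variance times squared canonical correlation (assumed definition)."""
    return shared_var * canonical_r ** 2

# Second canonical function, figures as quoted in the text
r2_second = 0.3721
redundancy_independent = redundancy_index(0.0297, r2_second)
redundancy_dependent = redundancy_index(0.3759, r2_second)
print(redundancy_independent, redundancy_dependent)
```

Under this definition both values for the second function are small (roughly 0.004 and 0.05), which is consistent with the statement that the second function is statistically but not practically significant.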
Table 9 gives the results of the sensitivity analysis for the independent set, in which the canonical loadings are examined for stability after independent variables are deleted from the analysis one at a time. After deletion, the eigenvalues, canonical correlations and redundancy remain essentially unchanged and consistent in each of the three cases, in which AGE, HT and BAD respectively are removed from the analysis. Table 10 shows the results of the general linear model with WT, APC, TVC, CCN, MUAC and WC as independent variables and the physiologic set as dependent variables. The results show that only two of the independent variables, CCN and WC, are significant when SBP, DBP and PR are taken as the physiologic variables in the model. Figure 2 and Figure 3 show the scatter plots of the first and second pairs of canonical variates respectively.
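The drop-one sensitivity check described above can be sketched as follows. This is a minimal illustration on synthetic data, assuming NumPy; the helper names, the variable labels (`x1`, `x2`, `x3`, `noise`) and the use of the first canonical correlation as the stability measure are assumptions of this sketch, not the authors' exact procedure, which also tracked eigenvalues, loadings and redundancy.

```python
import numpy as np

def first_canonical_correlation(X, Y):
    """Largest canonical correlation between X (n x p) and Y (n x q)."""
    Qx, _ = np.linalg.qr(X - X.mean(axis=0))
    Qy, _ = np.linalg.qr(Y - Y.mean(axis=0))
    return float(np.linalg.svd(Qx.T @ Qy, compute_uv=False)[0])

def drop_one_sensitivity(X, Y, names):
    """Refit after deleting each column of X in turn."""
    full = first_canonical_correlation(X, Y)
    reduced = {
        name: first_canonical_correlation(np.delete(X, j, axis=1), Y)
        for j, name in enumerate(names)
    }
    return full, reduced

# Synthetic data: three informative columns and one pure-noise column
rng = np.random.default_rng(1)
n = 500
z = rng.normal(size=n)
X = np.column_stack([
    z + rng.normal(size=n),
    z + rng.normal(size=n),
    z + rng.normal(size=n),
    rng.normal(size=n),
])
Y = np.column_stack([z + rng.normal(size=n), z + rng.normal(size=n)])
names = ["x1", "x2", "x3", "noise"]

full, reduced = drop_one_sensitivity(X, Y, names)
for name, value in reduced.items():
    print(f"without {name}: {value:.4f} (full model: {full:.4f})")
```

Deleting a variable can never raise the first canonical correlation, so a result is stable in this sense when the drop after deletion is negligible, as it is for the noise column here.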

Discussion
Cardiovascular disease is considered one of the most significant diseases. Many researchers have indicated the role of multiple factors in causing cardiovascular diseases, among which blood pressure and pulse rate are noteworthy. In this study we sought to identify the risk factors among the anthropometric variables that influence the physiologic set, which in turn contributes to cardiovascular diseases. These results are based on subjects of both sexes from a fishing community. With respect to the first two objectives of the study, the results reveal a significant relation between the anthropometric set (all the variables taken together) and the physiologic set (all the variables taken together), as shown in Table 3. This means that there is a very strong association between the two multidimensional sets of variables. Among the physiologic variables, the two blood pressures are without doubt highly associated with each other. Pulse rate is only weakly related to systolic and diastolic blood pressure.
On the other hand, among the anthropometric variables, WT, CCN, MUAC, WC, CC and HIPC are consistently and strongly related with each other. FM is highly correlated with WT, CCN, MUAC, WC, HIPC and all skinfolds. Also, BAD is highly correlated with HT. All skinfold variables are highly correlated among themselves. The rest of the variables in this set are moderately related, but a few show negative relations, such as SKB, SKT and SKC with HT; SKT, SKC and SKB with BAD; and SKT and SKA with TVC. Low correlations are observed for a few variables such as AGE, BAD, BID and PR. Furthermore, both systolic and diastolic blood pressures are highly related with all the anthropometric variables, while pulse rate is negatively related with most of the variables except BID and all skinfolds. However, the canonical correlation result shows a positive relationship between the two sets. Also, the variables WT, APC, TVC, CCN, MUAC and WC are the major contributors to the anthropometric set, and SBP and DBP are the major contributors to the physiologic set. The significant association between the two sets of variables indicates that the variables of one set (taken together) affect the variables of the other set (taken together) and vice versa. Thus the role of risk factors comes into play. A risk factor is something that increases the chance of developing a disease, disorder or condition. This result is notable for this population, since the study population consists only of fishermen, who reside near the sea and consume fish regularly. Here we have considered the anthropometric variables as potential risk factors for the physiologic variables and vice versa. In the first screening (using bivariate correlations) we found that they are significantly associated with the physiologic variables. Using four consecutive steps (canonical weights, canonical loadings, canonical cross loadings and redundancy analysis) we found that the anthropometric variables WT, APC, TVC, CCN, MUAC and WC have the maximum association with the physiologic variables DBP and SBP. This result was then confirmed by sensitivity analysis, which produced similar results. The dimension analysis shows that the first two dimensions are important in establishing the relationship between the two sets, while the redundancy analysis shows that only the first dimension is practically significant.
Later, a general linear model was fitted with the most important anthropometric variables (WT, APC, TVC, CCN, MUAC and WC) as independent variables and the physiologic set as dependent variables. Only two of the six variables, CCN and WC, were significant.
Furthermore, multiple stepwise regression models were fitted taking SBP, DBP and PR as dependent variables and all anthropometric variables as independent variables. The predictors AGE, CCN and APC were significant in the model when SBP was the dependent variable. When DBP was the dependent variable, the predictors AGE, CCN, WC and SKS were important, and when pulse rate was the dependent variable, HT, BID, MUAC, SKC and FM were retained in the model.
Apart from this, another interesting result was observed for hypertensive subjects (SBP > 120 mmHg and DBP > 80 mmHg). From the results of this analysis (not shown here), restricted to hypertensive subjects, we found that the predictors AGE, APC and CCN are significant when SBP is the dependent variable; WC and MUAC are significant for DBP; and AGE, HT, MUAC, WC and FM are significant when pulse rate is the dependent variable.

Limitation
Some limitations of the study may be noted. First, the ages of a few respondents were recorded without birth records.
Second, the behaviour of blood pressure may differ between the sexes, which were not considered separately.
Finally, this was a cross-sectional study, and such studies often contain errors, inconsistencies in responses or measurements, outliers and similar problems, although the investigators were aware of these problems and made their best effort to eliminate them.

Conclusion
Based on the canonical correlation analysis (a form of multivariate analysis) of the available data, we infer that the two sets are closely related to each other. It should be noted that only six anthropometric variables out of nineteen make a large contribution to this relationship. From the general linear model we found that CCN and WC were the most significant factors influencing the physiologic variables. Combining the results of both analyses, we can say that CCN and WC were the major influencing factors for blood pressure. Multiple stepwise regression analysis, however, yields different risk factors. The combined model (CCA+GLM) may therefore also be appropriate for other populations.
The influencing factors were also found to be different in hypertensive subjects. Maintaining normal values of the anthropometric indicators, especially WT, APC, TVC, CCN, MUAC and WC, may therefore help the population in controlling blood pressure and, in turn, cardiovascular diseases.
For generalization, different populations from different locations need to be studied with respect to various socio-economic and behavioural concomitants.

Figure 1. Relationship among variables with their canonical variables and first canonical correlation

Table 1. Descriptive statistics among the variables of two sets

Table 2 provides the bivariate correlations among all the variables in the two sets. Most of them are positively correlated with high significance (p < 0.001), and a few variables are negatively correlated and also significant (p < 0.05). Out of twenty-two variables only nine are significantly associated with age (p < 0.05). Except for age, all the variables are significantly correlated with WT. The variables CCN, MUAC, HIPC and CC are strongly associated (r > 0.80) with it. Also, APC, TVC, CCN, MUAC, WC, HIPC and CC are highly correlated among themselves. All skinfold variables are positively associated among themselves. From the table we can see that eleven variables are negatively correlated with PR, but eight variables are not significantly related to it. Systolic and diastolic blood pressures are significantly (p < 0.01) correlated with all the variables in the two sets.

Table 2. Correlation coefficient (r) among the variables of two sets

Table 3 contains information about the eigenvalues, percentage of variance, canonical correlations and their squares. The squared canonical correlations reveal that 32.43% of the variation in $V_1$ is explained by the variation in $U_1$ and 13.85% of the variation in $V_2$ is explained by $U_2$, whereas only 3.19% of the variation in $V_3$ is explained by $U_3$. Out of the three canonical correlations only two are significant as tested by the Wilks' Lambda test. The first canonical correlation is the most important, as 71.25% of the variance is shared in the first canonical function, whereas the second function deals with the residual variance left from the first. The first canonical eigenvalue is 0.4799, which reflects the ratio of explained to unexplained variance for the first canonical correlation. The second and third canonical eigenvalues are 0.1607 and 0.0329 respectively, which give the corresponding ratios for the second and third canonical correlations. It is interesting to note that the first two canonical correlations explain 95.11% of the variance.
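The eigenvalues and percentages of variance quoted for Table 3 follow from the squared canonical correlations through the standard relation eigenvalue = R² / (1 - R²). The sketch below assumes NumPy and uses the rounded squared correlations quoted in the text, so the results agree with the reported figures only up to rounding.

```python
import numpy as np

# Squared canonical correlations as quoted in the text for Table 3
r_squared = np.array([0.3243, 0.1385, 0.0319])

# Eigenvalue of each canonical function: explained over unexplained variance
eigenvalues = r_squared / (1.0 - r_squared)

# Share of each function in the total of the eigenvalues, and running total
percent_variance = 100.0 * eigenvalues / eigenvalues.sum()
cumulative_percent = np.cumsum(percent_variance)

for k in range(len(r_squared)):
    print(
        f"function {k + 1}: eigenvalue={eigenvalues[k]:.4f}, "
        f"percent={percent_variance[k]:.2f}, cumulative={cumulative_percent[k]:.2f}"
    )
```

Run on these inputs, the eigenvalues are about 0.48, 0.16 and 0.03, the first function accounts for about 71% of their total, and the first two together for about 95%, matching the reported 71.25% and 95.11% to within rounding.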

Table 4. Dimension reduction analysis (multivariate test of significance)

Table 5. Raw canonical coefficient of Set-1 and Set-2

Table 6. Correlation between observed variables and their own canonical variables (canonical loading)

Table 7 contains the correlation coefficients (cross loadings) between the observed variables of one set and the canonical variates of the other set. In dimension 1 the dependent variables SBP and DBP have a high association with the canonical variate of the opposite set. Likewise, WT, APC, TVC, CCN, MUAC and WC exhibit high correlations with the other set (the physiologic set) in the same dimension, giving results similar to those for the loadings. In the second dimension DBP and PR are more strongly correlated with the independent canonical variate, and SKB, SKT, SKS, SKI, SKA, SKC and FM are comparatively more correlated with the dependent canonical variate. These results reflect high shared variance between the two sets. In the first dimension all variables except PR have positive coefficients, i.e. a direct relationship.
The six highest cross loadings of the first independent canonical variate correspond to the variables with the highest loadings as well. Thus all relationships are direct except the one for PR. Examining all the results (loadings and cross loadings), we can infer that the independent variables WT, APC, TVC, CCN, MUAC and WC have the maximum association with the dependent variables SBP and DBP and vice versa.

Table 7. Correlation between observed variables with the other canonical variables (canonical cross loading)

Table 8. Redundancy analysis for independent and dependent canonical variates in first and second dimensions

Table 9. Sensitivity analysis

Table 10. General linear model