A multivariate gamma distribution applied to compositional data analysis
Abstract
Parametric compositional data analysis in a high-dimensional simplex can be performed with the Dirichlet distribution or, if the Dirichlet is not appropriate, with the logistic normal distribution. In this paper, a multivariate gamma (MGAM) distribution is proposed as an alternative distribution for compositional data. In addition, the MGAM distribution is extended to a multivariate extreme value (MEV) distribution, and goodness-of-fit statistics are calculated for comparison against the logistic normal distribution. An application is considered in which the amount of gas produced by a coal gasification facility depends crucially on the size distribution of the coal, which is measured as compositional data and characterised by six variables. The observed sample space is divided into three regions of high (H), standard (S) and low (L) gas production by choosing appropriate thresholds, and new observations are classified among the regions.
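As a brief illustration of the setting, a composition is a vector of positive parts that sum to one, and the logistic normal distribution models the additive log-ratio (alr) of such a vector as multivariate normal. The sketch below, in Python with NumPy, closes a six-part vector onto the simplex and maps it to alr coordinates and back. The function names and the numerical values are hypothetical and chosen only for illustration; they are not data or code from this paper.

```python
import numpy as np

def closure(x):
    """Rescale positive parts so that they sum to one (a point on the simplex)."""
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=-1, keepdims=True)

def alr(comp):
    """Additive log-ratio: log of each part relative to the last part."""
    comp = np.asarray(comp, dtype=float)
    return np.log(comp[..., :-1] / comp[..., -1:])

def alr_inv(y):
    """Inverse alr: map unconstrained coordinates back onto the simplex."""
    y = np.asarray(y, dtype=float)
    ones = np.ones(y.shape[:-1] + (1,))
    return closure(np.concatenate([np.exp(y), ones], axis=-1))

# Hypothetical six-part size distribution (illustrative values only).
raw = np.array([12.0, 30.0, 25.0, 18.0, 10.0, 5.0])
comp = closure(raw)        # six parts summing to one
coords = alr(comp)         # five unconstrained real coordinates
recovered = alr_inv(coords)
```

A six-part composition has five free coordinates, so `coords` has length five, and `recovered` agrees with `comp` up to floating-point error. Under the logistic normal model, a multivariate normal distribution would be fitted to such alr coordinates.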