# A new approach to compositional data analysis

For many years, methods for analysing compositional data have been dominated by Dirichlet distribution regression and Aitchison’s log-odds transformation method. These approaches have several limitations. The new “probability-ratio” approach presented here overcomes some of these limitations and permits any distribution whose support is (0, 1) to be applied to the analysis of compositional data. Given a composition \(\pi_1, \ldots, \pi_K\), where \(\sum_{k=1}^{K} \pi_k = 1\), we generate \(K - 1\) transformed variables \(\nu_k = W\left(\pi_j, j = 1, \ldots, K\right)\), \(k = 1, \ldots, K-1\), such that \(0 < \nu_k < 1\) and the \(\nu_k\) are not sum-constrained. The \(\nu_k\) may then be modelled jointly via copulas whose marginal distributions can be any distribution with support (0, 1), such as the beta or members of the CDF-Quantile family.

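A typical example of a suitable \(\nu_k\), equivalent to Aitchison’s (1986) “additive log-ratio” transformation, is \( \nu_k = \pi_k / \left( \pi_k + \pi_K \right) \) for \(k = 1, \ldots, K-1\): its logit, \(\log\left[ \nu_k / (1 - \nu_k) \right] = \log\left( \pi_k / \pi_K \right)\), is exactly the additive log-ratio. The probability-ratio method’s strengths are as follows: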

- It includes all of the log-ratio transforms and many others such as stick-breaking.
- It expands the variety of distributions for modelling compositional data without having to construct such distributions de novo or add more parameters to a model.
- Via the CDF-Quantile family, and unlike conventional quantile regression, all quantiles can be jointly modelled with the same number of parameters as an equivalent log-ratio model of conditional means.
- It models dispersion routinely, which is generally not done in the log-ratio tradition and is restricted in the Dirichlet regression method.
- It is easier to interpret than the log-ratio method.
- Unlike the Dirichlet, the copula model is not limited to negative association parameters.
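
To make the transformation step concrete, the following is a minimal sketch of the probability-ratio mapping \(\nu_k = \pi_k / (\pi_k + \pi_K)\) given earlier, together with its inverse and a stick-breaking variant. The function names (`prob_ratio`, `prob_ratio_inverse`, `stick_breaking`) and the example composition are illustrative assumptions rather than part of any published implementation, and the stick-breaking form used here (each part divided by the sum of itself and all later parts) is one common convention. The copula modelling step is not shown.

```python
import math

def prob_ratio(pi):
    """Map a composition (parts sum to 1) to K-1 values in (0, 1):
    nu_k = pi_k / (pi_k + pi_K), using the last part as the reference."""
    ref = pi[-1]
    return [p / (p + ref) for p in pi[:-1]]

def prob_ratio_inverse(nu):
    """Recover the composition from the nu_k.
    nu_k / (1 - nu_k) equals pi_k / pi_K, so normalising the odds
    (with 1 standing in for the reference part) returns the pi_k."""
    odds = [v / (1.0 - v) for v in nu]
    denom = 1.0 + sum(odds)
    return [o / denom for o in odds] + [1.0 / denom]

def stick_breaking(pi):
    """Assumed stick-breaking form: nu_k = pi_k / (pi_k + ... + pi_K)."""
    nu = []
    remaining = sum(pi)
    for p in pi[:-1]:
        nu.append(p / remaining)
        remaining -= p
    return nu

# Hypothetical three-part composition, for illustration only.
pi = [0.2, 0.3, 0.5]
nu = prob_ratio(pi)

# Each nu_k lies strictly between 0 and 1, and the nu_k need not sum to 1.
print(nu)

# The logit of nu_k matches the additive log-ratio log(pi_k / pi_K).
alr = [math.log(p / pi[-1]) for p in pi[:-1]]
logit_nu = [math.log(v / (1.0 - v)) for v in nu]
print(all(math.isclose(a, b) for a, b in zip(alr, logit_nu)))

# The mapping is invertible, so no information in the composition is lost.
print(prob_ratio_inverse(nu))
```

Because each \(\nu_k\) lies in (0, 1) without a sum constraint, it can in principle be given any marginal distribution on the unit interval, which is the property the list above relies on.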