Australasian Mathematical Psychology Conference 2019

# A new approach to compositional data analysis

Michael Smithson
Psychology, The Australian National University

For many years, methods for analysing compositional data have been dominated by Dirichlet distribution regression and Aitchison’s log-odds transformation method. These approaches have several limitations. The new “probability-ratio” approach presented here overcomes some of these limitations and permits any distribution whose support is (0, 1) to be applied to the analysis of compositional data. Given compositional data $$\pi_k$$, $$k = 1, \ldots, K$$, where $$\sum\limits_{k = 1}^K \pi_k = 1$$, we generate $$K - 1$$ transformed variables $$\nu_k = W\left( \pi_j, j = 1, \ldots, K \right)$$, such that $$0 < \nu_k < 1$$ and the $$\nu_k$$ are not sum-constrained. The $$\nu_k$$ may then be modelled via copulas whose marginal distributions can be any distribution whose support is (0, 1), such as the beta or the CDF-Quantile family.

A typical example of suitable $$\nu_k$$ is equivalent to Aitchison’s (1986) “additive log-ratio” transformation: $$\nu_k = \pi_k / \left( \pi_k + \pi_K \right)$$, for $$k = 1, \ldots, K-1$$. The equivalence holds because the logit of this $$\nu_k$$ is the log-ratio $$\log \left( \pi_k / \pi_K \right)$$. The probability-ratio method’s strengths are as follows:

• It includes all of the log-ratio transforms and many others such as stick-breaking.
• It expands the variety of distributions for modelling compositional data without having to construct such distributions de novo or add more parameters to a model.
• Via the CDF-Quantile family, all quantiles can be jointly modelled with the same number of parameters as an equivalent log-ratio model of conditional means (unlike conventional quantile regression).
• It models dispersion routinely, which generally is not done in the log-ratio tradition and is restricted in the Dirichlet regression method.
• It is easier to interpret than the log-ratio method.
• Unlike the Dirichlet distribution, under which the components are always negatively correlated, the copula model is not limited to negative association parameters.
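
As an illustration of the transformation step only (not the copula modelling), the following is a minimal sketch of the probability-ratio example given above, $$\nu_k = \pi_k / \left( \pi_k + \pi_K \right)$$, together with its inverse. The function names are hypothetical and NumPy is assumed. The stick-breaking variant uses one common convention, $$\nu_k = \pi_k / \left( 1 - \sum_{j < k} \pi_j \right)$$, which is an assumption and may differ from the form intended in the talk.

```python
import numpy as np


def probability_ratio(pi):
    """Map compositions (last axis sums to 1) to K-1 values in (0, 1).

    Uses nu_k = pi_k / (pi_k + pi_K), with the last part as the reference.
    """
    pi = np.asarray(pi, dtype=float)
    return pi[..., :-1] / (pi[..., :-1] + pi[..., -1:])


def inverse_probability_ratio(nu):
    """Recover the K-part composition from the K-1 probability ratios."""
    nu = np.asarray(nu, dtype=float)
    odds = nu / (1.0 - nu)  # equals pi_k / pi_K
    denom = 1.0 + odds.sum(axis=-1, keepdims=True)
    return np.concatenate([odds / denom, 1.0 / denom], axis=-1)


def stick_breaking(pi):
    """Assumed convention: nu_k = pi_k / (1 - sum of the parts before k)."""
    pi = np.asarray(pi, dtype=float)
    head = pi[..., :-1]
    # Stick remaining before part k is broken off: 1 - sum_{j<k} pi_j.
    remaining = 1.0 - (np.cumsum(head, axis=-1) - head)
    return head / remaining


# Toy three-part composition (made-up values, for illustration only).
pi = np.array([0.2, 0.3, 0.5])
nu = probability_ratio(pi)

# The logit of nu_k matches the additive log-ratio log(pi_k / pi_K).
logit_nu = np.log(nu / (1.0 - nu))
alr = np.log(pi[:-1] / pi[-1])

# The transformation is invertible, so no information is lost.
pi_back = inverse_probability_ratio(nu)
```

Each $$\nu_k$$ produced this way lies in (0, 1) and the set is not sum-constrained, so each could in principle be given a (0, 1)-supported marginal, such as the beta, before the margins are linked by a copula.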