## Abstract

We follow up on last week’s post on using Gapminder data to study the world’s income distribution. To assess the inequality of this distribution, we compute its Gini coefficient by Monte Carlo approximation and visualize the result as a time series. Furthermore, we animate the association between Gini coefficient and homicide rate per country using the new version of gganimate.

## Introduction

One of the main messages of the chapter ‘The Gap Instinct’ of the book Factfulness is that the ‘we’ and ‘them’ classification of countries is no longer justified, because ‘they’ have converged towards the same levels in key indicators such as life expectancy, child mortality and births per female. The difference between countries is, hence, not what many imagine it to be: there is less inequality and no real gap. While reading, I became curious about the following: what if countries became more equal, but inequality within countries simultaneously became bigger? This question was also raised indirectly in a Disqus comment by F. Weidemann on the post Factfulness: Building Gapminder Income Mountains. The aim of the present post is to investigate this hypothesis using the Gapminder data by calculating Gini coefficients. Furthermore, we use the country-specific Gini coefficients to investigate the association with the number of homicides in the country.

## Gini coefficient

There are different ways to measure income inequality, both in terms of which response you consider and which statistical summary you compute for it. Without going into the details of this discussion, we use the GDP/capita in Purchasing Power Parities (PPP) measured in so-called international dollars (fixed prices 2011). In other words, comparisons between years and countries are possible, because the response is adjusted for inflation and for differences in the cost of living.

The Gini coefficient is a statistical measure to quantify inequality. In what follows we shall focus on computing the Gini coefficient for a continuous probability distribution with a known probability density function. Let the probability density function of the non-negative continuous income distribution be defined by $$f$$, then the Gini coefficient is given as half the relative mean difference:

$G = \frac{1}{2\mu}\int_0^\infty \int_0^\infty |x-y| \> f(x) \> f(y) \> dx\> dy, \quad\text{where}\quad \mu = \int_{0}^\infty x\cdot f(x) dx.$

Depending on $$f$$ it might be possible to solve these integrals analytically. However, a straightforward computational approach is to use Monte Carlo sampling, as we shall see shortly. Personally, I find the above relative mean difference presentation of the Gini index much more intuitive than the area argument using the Lorenz curve. From the equation it also becomes clear that the Gini coefficient is invariant to multiplicative changes in the income: if everybody increases their income by a factor $$k>0$$ then the Gini coefficient remains the same, because $$|k x - k y| = k | x - y|$$ and $$E(k \cdot X) = k \mu$$ and, hence, $$k$$ cancels from numerator and denominator.
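As a quick sanity check of the formula, consider a toy case that is not part of the Gapminder analysis: incomes uniformly distributed on $$[0,1]$$, i.e. $$f(x)=1$$ on that interval. Here both integrals can be solved by hand:

$\mu = \int_0^1 x \> dx = \frac{1}{2}, \qquad \int_0^1 \int_0^1 |x-y| \> dx \> dy = 2 \int_0^1 \int_0^y (y-x) \> dx \> dy = 2 \int_0^1 \frac{y^2}{2} \> dy = \frac{1}{3}, \qquad G = \frac{1}{2 \cdot \frac{1}{2}} \cdot \frac{1}{3} = \frac{1}{3}.$

By the scale invariance argument, the same value $$G=1/3$$ is obtained for a uniform distribution on $$[0,b]$$ for any $$b>0$$.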

The above formula indirectly also states how to compute the Gini coefficient for a discrete sample of size $$n$$ and with incomes $$x_1,\ldots, x_n$$: $G = \frac{\sum_{i=1}^n \sum_{j=1}^n |x_i - x_j| \frac{1}{n} \frac{1}{n}}{2 \sum_{i=1}^n x_i \frac{1}{n}} = \frac{\sum_{i=1}^n \sum_{j=1}^n |x_i - x_j|}{2 n \sum_{j=1}^n x_j}.$
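The sample formula translates almost literally into R. The following is a minimal sketch, where the function name `gini_sample` and the toy income vector are made up for illustration. It builds all pairwise differences at once and therefore needs memory of order $$n^2$$, so it is only meant for small samples:

```r
## Gini coefficient of a sample via the double-sum formula.
## gini_sample is a hypothetical helper name, not part of any package.
gini_sample <- function(x) {
  stopifnot(is.numeric(x), all(x >= 0), sum(x) > 0)
  n <- length(x)
  ## outer() builds the n x n matrix of all pairwise differences x_i - x_j
  sum(abs(outer(x, x, "-"))) / (2 * n * sum(x))
}

incomes <- c(1, 2, 3, 4)      # toy data
gini_sample(incomes)          # 0.25
gini_sample(10 * incomes)     # also 0.25: scaling all incomes leaves G unchanged
gini_sample(rep(5, 4))        # 0: everybody earns the same
```

For the toy vector the pairwise absolute differences sum to 20 and the denominator is $$2 \cdot 4 \cdot 10 = 80$$, which gives the 0.25 shown in the comment.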

#### Approximating the integral by Monte Carlo

If one is able to easily sample from $$f$$, then instead of solving the integral analytically one can use $$K$$ pairs $$(x_k, y_k)$$, with both components drawn independently at random from $$f$$, to approximate the double integral:

$G \approx \frac{1}{2\mu K} \sum_{k=1}^K |x_k - y_k|, \quad\text{where}\quad x_k \stackrel{\text{iid}}{\sim} f \text{ and } y_k \stackrel{\text{iid}}{\sim} f,$ where for our mixture model $\mu = \sum_{i=1}^{192} w_i \> E(X_i) = \sum_{i=1}^{192} w_i \exp\left(\mu_i + \frac{1}{2}\sigma_i^2\right).$ This allows us to compute $$G$$ even in case of a complex $$f$$ such as the log-normal mixture distribution. As always, the larger $$K$$ is, the better the Monte Carlo approximation is.
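To get a feeling for the accuracy of the approximation, one can try it on a case where the answer is known. The sketch below assumes a single log-normal distribution (not the mixture), for which the Gini coefficient has the closed form $$2\Phi(\sigma/\sqrt{2})-1$$ with $$\Phi$$ the standard normal cdf. The parameter values and variable names are chosen for illustration only:

```r
## Monte Carlo Gini for a single log-normal, compared with the closed form.
set.seed(42)                      # for reproducibility of the sketch
K <- 1e5                          # number of Monte Carlo pairs
meanlog <- 0                      # illustrative parameters
sdlog <- 1

x <- rlnorm(K, meanlog = meanlog, sdlog = sdlog)
y <- rlnorm(K, meanlog = meanlog, sdlog = sdlog)

mu <- exp(meanlog + sdlog^2 / 2)  # mean of the log-normal distribution
gini_mc <- mean(abs(x - y)) / (2 * mu)
gini_exact <- 2 * pnorm(sdlog / sqrt(2)) - 1

c(mc = gini_mc, exact = gini_exact)
```

With $$\sigma = 1$$ the exact value is about 0.52, and the Monte Carlo estimate should agree with it to roughly two decimal places for this $$K$$.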

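```r
library(dplyr)  ## provides %>%, group_by() and do()
## Precision of the Monte Carlo approximation is controlled by the number of samples
K <- 1e6

## Compute Gini index of world income per year (gm and rmix() as in the previous post)
gini_year <- gm %>% group_by(year) %>% do({
  x <- rmix(K, meanlog=.$meanlog, sdlog=.$sdlog, w=.$w)
  y <- rmix(K, meanlog=.$meanlog, sdlog=.$sdlog, w=.$w)
  int <- mean( abs(x-y) )  ## Monte Carlo estimate of E|X - Y|
  ## Remaining lines reconstructed from the formulas for mu and G above;
  ## the column name gini is an assumption
  mu <- sum(.$w * exp(.$meanlog + 1/2 * .$sdlog^2))
  data.frame(gini = 1/(2*mu) * int)
})
```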
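The helper `rmix` used above comes from the previous post and is not shown here. For readers who want to experiment without it, the following self-contained sketch uses a hypothetical stand-in, `rmix_sketch`, which first picks a mixture component with probability $$w_i$$ and then draws from that component's log-normal distribution. The three-component parameter table is made up and does not contain Gapminder values:

```r
## Hypothetical stand-in for rmix(): sample from a log-normal mixture.
rmix_sketch <- function(n, meanlog, sdlog, w) {
  ## Pick a component index for each draw, with probabilities w
  i <- sample(seq_along(w), size = n, replace = TRUE, prob = w)
  rlnorm(n, meanlog = meanlog[i], sdlog = sdlog[i])
}

## Monte Carlo Gini for a mixture described by columns meanlog, sdlog, w
gini_mix <- function(p, K = 1e5) {
  x <- rmix_sketch(K, p$meanlog, p$sdlog, p$w)
  y <- rmix_sketch(K, p$meanlog, p$sdlog, p$w)
  mu <- sum(p$w * exp(p$meanlog + p$sdlog^2 / 2))  # mean of the mixture
  mean(abs(x - y)) / (2 * mu)
}

## Toy parameters, invented for illustration (weights sum to one)
toy <- data.frame(meanlog = c(7, 9, 10),
                  sdlog   = c(0.8, 0.6, 0.5),
                  w       = c(0.5, 0.3, 0.2))

set.seed(1)
gini_mix(toy)
```

Sampling the component index first keeps the code short and works for any number of components, which is convenient when the mixture has one component per country.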

## Discussion

Based on the available Gapminder data, we showed that over the last 25 years the Gini coefficient for the world’s income distribution has decreased. For several individual countries, however, the opposite dynamics are observed. One particular concern is the share that the richest 1% have of the overall wealth: more than 50%.

## Literature

Fajnzylber, P., D. Lederman, and N. Loayza. 2002. “Inequality and Violent Crime.” Journal of Law and Economics 45 (April). http://siteresources.worldbank.org/DEC/Resources/Crime&Inequality.pdf.