Naming Uncertainty: Ranks and Collisions

This page contains additional material for the article:

Höhle, M. (2017), Rank uncertainty: Why the “most popular” baby names might not be the most popular. Significance, 14: 30–33. doi:10.1111/j.1740-9713.2017.01037.x
 

Pre-Print Version of the Article

Data

Software and Computations


Background 


Bonus Material

Instead of tables, word clouds can also be used to visualize the frequencies of the 2015 baby names. The size of each name is proportional to the number of births with that name in 2015.

Figure: Word clouds with text size proportional to the number of girls and boys born.

Based on the ONS Baby Names Statistics data, which are available for the years 1996-2016 and contain all names occurring more than twice in a given year, we can compute the yearly collision probability. Note: The computed probability is an upper bound on the actual collision probability, because no statistics are available for names occurring only once or twice. However, for 2015 this information was obtained from the ONS (see above); upon request, the ONS in 2016 incorporated this information directly into their bulletin. As a consequence, we can compute the correct collision probability for 2015-2016, as done in the article. The resulting time series plot can be compared to similar plots for the US 1880-2014 as well as Sweden 1998-2016.

Time series of the collision probability.
Figure: Collision probability for the cohorts born 1996-2016, based on the baby names datasets containing all names with more than two uses in a given year. The orangish line in the figure indicates the collision probability for 2015-2016 computed on all names, i.e. including those names occurring just once or twice in a given year.
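
To illustrate why dropping the rare names yields an upper bound, here is a minimal Python sketch. It is not the code used for the article. It assumes the simplest definition of a collision, namely that two newborns drawn at random (with replacement) from a cohort share the same name, so the probability is the sum of the squared name proportions. The article may use a different definition, e.g. one based on a group of more than two children. The function name and the toy counts are made up for illustration and are not ONS data.

```python
def collision_probability(counts):
    """Probability that two randomly drawn newborns share a name.

    Assumption (not taken from the article): draws are independent and
    with replacement, so the probability is sum_i p_i^2, where
    p_i = count_i / total.
    """
    counts = list(counts)  # allow generators; we iterate twice
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one birth")
    return sum((c / total) ** 2 for c in counts)


# Hypothetical toy cohort: name -> number of births (NOT real ONS data).
toy_counts = {
    "Olivia": 50, "Amelia": 30, "Emily": 10, "Ava": 5, "Zoe": 3,
    "RareA": 2, "RareB": 2, "RareC": 1,
}

# Probability based on all names, including those used once or twice.
full = collision_probability(toy_counts.values())

# Probability based only on names used more than twice, mimicking
# the published tables that omit the rare names.
truncated = collision_probability(c for c in toy_counts.values() if c > 2)

print(f"all names:        {full:.4f}")
print(f"names with > 2:   {truncated:.4f}")
```

In this toy cohort the truncated value is larger than the value based on all names. Removing the rare names shrinks the denominator more than it shrinks the sum of squared counts, which is why the figures computed from the truncated tables overstate the collision probability.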