- An exploration of suicide rates and how they vary across demographic cohorts. Linear regression is used to model the relationship between suicide rates and per capita GDP, and a small positive effect is found.
- A brief illustration of the problem of overfitting in neural network classification, showing that denser is not always better. The 'Human Activity Recognition' dataset, composed of smartphone accelerometer readings recorded during different activities, is used.
- An examination of some common shrinkage methods used to combat overfitting, specifically the LASSO, ridge regression, and the elastic net. The techniques are motivated by common issues that arise when estimating a known real-world parameter.
- Logistic regression is introduced as a classification technique, alongside a discussion of revealed preferences. The illustration uses a speed-dating dataset generated experimentally as part of a paper by two professors at Columbia University.
- Two common NLP techniques for topic modelling are explored. These are applied to the 'A Million News Headlines' dataset, a corpus of over one million news headlines published by the ABC.