Between 2007 and 2009, researchers collected data on penguins on three islands in the Palmer Archipelago in Antarctica: Biscoe, Dream, and Torgersen. The penguins dataset has data for 342 penguins from 3 different species: Chinstrap, Gentoo, and Adélie. It includes the following variables:
species
: The penguin’s species (Chinstrap, Gentoo, and Adélie)

island
: The island where the penguin lives (Biscoe, Dream, and Torgersen)

bill_length_mm
: The length of the penguin’s bill, in millimeters (distance from the penguin’s face to the tip of the bill)

bill_depth_mm
: The depth of the penguin’s bill, in millimeters (height of the bill; distance from the bottom of the bill to the top of the bill)

flipper_length_mm
: The length of the penguin’s flippers, in millimeters

body_mass_g
: The weight of the penguin, in grams

sex
: The sex of the penguin

year
: The year the observation was made

Knowing the difference between bill length and bill depth is tricky if you’re not a bird expert (I’m not!), so here’s a helpful diagram:
Penguin bill length vs. bill depth
We first need to clean the data a little. Some of the observations are missing the sex of the penguin!
Missing data will mess up our regression models, so we remove any rows with missing sex. That also takes care of the missing values in the other variables, since the rows missing those values were also missing the sex.
We’ll save this clean data as a CSV file so we can use it in other analyses (or other files within this analysis).
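As a rough illustration of this cleaning step, here is a minimal Python sketch that uses only the standard library. The sample rows, the `NA` marker for missing values, and the `drop_missing_sex` helper are all assumptions made up for the example; this is not the code behind the original analysis.

```python
import csv
import io

# Hypothetical rows standing in for the raw data; "NA" marks a missing value.
RAW_CSV = """species,island,bill_depth_mm,body_mass_g,sex
Adelie,Torgersen,18.7,3750,male
Adelie,Torgersen,NA,NA,NA
Gentoo,Biscoe,13.2,4500,female
Gentoo,Biscoe,14.1,4400,NA
"""


def drop_missing_sex(rows):
    """Keep only the rows whose sex value is present."""
    return [row for row in rows if row["sex"] not in ("", "NA")]


rows = list(csv.DictReader(io.StringIO(RAW_CSV)))
clean_rows = drop_missing_sex(rows)

# Write the cleaned rows back out as CSV text.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=list(rows[0]))
writer.writeheader()
writer.writerows(clean_rows)
clean_csv = out.getvalue()
```

To write to a real file instead of an in-memory buffer, swap `io.StringIO()` for `open(path, "w", newline="")`, where `path` is whatever file name you choose.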
First we’ll look for any patterns in the data. Maybe specific species are heavier or have longer flippers or longer or deeper bills?
Penguin weight by species
It looks like Gentoo penguins are substantially heavier on average than the other two species. Gentoo penguins weigh an average of 5,092 grams, while Adelie and Chinstrap penguins weigh an average of 3,706 and 3,733 grams, respectively.
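A per-species average like this is a grouped mean. The sketch below shows the idea in plain Python; the `observations` values and the `mean_by_group` helper are hypothetical and chosen only for illustration, so the numbers do not match the real dataset.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (species, body_mass_g) pairs; the real dataset has hundreds of rows.
observations = [
    ("Adelie", 3700), ("Adelie", 3800),
    ("Chinstrap", 3650), ("Chinstrap", 3750),
    ("Gentoo", 5000), ("Gentoo", 5200),
]


def mean_by_group(pairs):
    """Return {group: mean value} for an iterable of (group, value) pairs."""
    groups = defaultdict(list)
    for group, value in pairs:
        groups[group].append(value)
    return {group: mean(values) for group, values in groups.items()}


mean_mass = mean_by_group(observations)

# The species with the largest average body mass.
heaviest = max(mean_mass, key=mean_mass.get)
```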
Next we’ll look at bill depth (again, this refers to the distance between the top and bottom of the bill) across species:
Penguin bill depth by species
Again, Gentoo penguins are quite distinctive and have the shallowest bills. On average, Gentoo bills are 15 millimeters deep, while Adelie and Chinstrap penguins have bills that are 18.3 and 18.4 millimeters deep, respectively.
Are there any patterns in where these birds live?
Penguin location by species
Neat! Gentoo penguins are only on Biscoe Island, Chinstrap penguins are only on Dream Island, and Adelie penguins live on all three of the islands in the dataset—and they’re all alone on Torgersen Island.
We’ve seen that Gentoo penguins are pretty distinctive: they are heavier and have shallower bills. What’s the overall relationship between bill depth and body mass? Are Gentoos still distinctive?
According to this plot, it looks like there’s a negative relationship between bill depth and body mass—as bills get taller, penguins get lighter. We can create a regression model to see the exact relationship. We’ll use this model:
\[ \widehat{\text{Body mass}} = \beta_0 + \beta_1 \text{Bill depth} + \epsilon \]
term | estimate | std.error | statistic | p.value |
---|---|---|---|---|
(Intercept) | 7519.9808 | 342.32506 | 21.967368 | 0 |
bill_depth_mm | -193.0061 | 19.81378 | -9.741003 | 0 |
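For readers curious where an intercept and slope like these come from, here is a minimal sketch of ordinary least squares for one predictor. The `fit_simple_ols` helper and the five data points are hypothetical; they illustrate the standard closed-form solution and are not the code or data that produced the table.

```python
def fit_simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope: sum of cross-deviations divided by the sum of squared x deviations.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    # The fitted line always passes through the point of means.
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical bill depths (mm) and body masses (g), for illustration only.
bill_depth = [14.0, 15.0, 16.0, 17.0, 18.0]
body_mass = [5100, 4900, 4500, 4200, 3800]
intercept, slope = fit_simple_ols(bill_depth, body_mass)
```

The slope is read the same way as the `bill_depth_mm` estimate: the average change in body mass associated with a 1-mm change in bill depth.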
Based on this model, a 1-mm increase in bill depth is associated with a 193 gram decrease in body weight, on average.
However, that’s wrong! The coefficient for \(\beta_1\) is negative here, but we’re not accounting for species. If we look at the original scatterplot, the trend line does go down, but if we color the points by species, we can see that the relationship is actually positive within species.
The dark red line shows the trend when not considering species, while the yellow, purple, and blue lines show the within-species trends. The directions reverse! This is a great example of something called Simpson’s Paradox, which according to Wikipedia means that
…a trend appears in several groups of data but disappears or reverses when the groups are combined.
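The reversal is easy to reproduce with a toy example. In the sketch below, the two groups, their values, and the `ols_slope` helper are all hypothetical; the data are built so that each group has a positive slope while the pooled data have a negative one.

```python
def ols_slope(x, y):
    """Least-squares slope of y on x."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return s_xy / s_xx


# Hypothetical (bill_depth_mm, body_mass_g) values for two made-up groups.
groups = {
    "deep_bill_light": ([17.0, 18.0, 19.0], [3500, 3700, 3900]),
    "shallow_bill_heavy": ([14.0, 15.0, 16.0], [4800, 5000, 5200]),
}

# Slope fitted separately inside each group.
within_slopes = {name: ols_slope(x, y) for name, (x, y) in groups.items()}

# Pool both groups and refit, ignoring group membership.
pooled_x = [xi for x, _ in groups.values() for xi in x]
pooled_y = [yi for _, y in groups.values() for yi in y]
pooled_slope = ols_slope(pooled_x, pooled_y)
```

Within each group, deeper bills go with heavier birds, but the group with shallower bills is much heavier overall, so the pooled line slopes downward.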
If we control for species in a new regression model, we can see the positive relationship between bill depth and body mass. Here’s the new model:
\[ \widehat{\text{Body mass}} = \beta_0 + \beta_1 \text{Bill depth} + \beta_2 \text{Species} + \epsilon \]
term | estimate | std.error | statistic | p.value |
---|---|---|---|---|
(Intercept) | -1000.845936 | 324.96796 | -3.0798295 | 0.0022457 |
bill_depth_mm | 256.551128 | 17.63746 | 14.5458078 | 0.0000000 |
speciesChinstrap | 8.111481 | 52.87375 | 0.1534122 | 0.8781672 |
speciesGentoo | 2245.878347 | 73.95557 | 30.3679398 | 0.0000000 |
It worked! After controlling for species, on average, a 1-mm increase in bill depth is associated with a 256.6 gram increase in weight. Also, interestingly, the coefficients for Chinstrap and Gentoo penguins show how these species’ weights differ from the Adelie baseline at the same bill depth. Compared to Adelie penguins with the same bill depth, Chinstrap penguins are only 8.1 grams heavier on average (a difference that is not statistically significant, p = 0.88), while Gentoo penguins are 2,246 grams heavier.
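To make these coefficients concrete, the sketch below plugs them into the model equation to get predicted body masses. The estimates are copied from the table above; the `predict_body_mass` helper and the example bill depths are assumptions added for illustration.

```python
# Coefficients copied from the regression table above; Adelie is the baseline species.
INTERCEPT = -1000.845936
BILL_DEPTH_COEF = 256.551128
SPECIES_COEF = {"Adelie": 0.0, "Chinstrap": 8.111481, "Gentoo": 2245.878347}


def predict_body_mass(bill_depth_mm, species):
    """Predicted body mass in grams for a given bill depth and species."""
    return INTERCEPT + BILL_DEPTH_COEF * bill_depth_mm + SPECIES_COEF[species]


adelie_18 = predict_body_mass(18.0, "Adelie")
adelie_19 = predict_body_mass(19.0, "Adelie")
gentoo_18 = predict_body_mass(18.0, "Gentoo")

# Same species, 1 mm deeper bill: the gap is the bill depth coefficient.
depth_effect = adelie_19 - adelie_18
# Same bill depth, different species: the gap is the Gentoo coefficient.
gentoo_effect = gentoo_18 - adelie_18
```

Holding species fixed, the two Adelie predictions differ by the bill depth coefficient; holding bill depth fixed, the Gentoo and Adelie predictions differ by the Gentoo coefficient.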
Therefore, Gentoo penguins are neat. They are heavier than the other two species, have shallower bills, and live only on Biscoe Island.
Also, there seems to be a fairly strong relationship between bill depth and body weight. Within all three species, penguins with taller bills tend to be heavier. This relationship can get hidden by Simpson’s Paradox if we don’t look at within-species trends though.
The end.
Penguins!