# Randomization<br>and matching

**Session 7**

---

# Plan for today

.box-4.medium.sp-after-half[The magic of randomization]

.box-5.medium.sp-after-half[How to analyze RCTs]

.box-6.medium.sp-after-half[The "gold" standard]

.box-1.medium.sp-after-half[Adjustment with matching]

---

name: magic-randomization
class: center middle section-title section-title-4 animated fadeIn

# The magic<br>of randomization

---

# Why randomize?

.box-4.large[Fundamental problem<br>of causal inference]

$$
\delta_i = Y_i^1 - Y_i^0 \quad \text{in real life is} \quad \delta_i = Y_i^1 - ???
$$

---

# Why randomize?

.box-inv-4.medium[Comparing average outcomes only works<br>if groups that received/didn't receive<br>treatment look the same]

---

# Why randomize?

.box-inv-4[With big enough samples, the magic of randomization<br>helps make comparison groups comparable]

.center[
<figure>
  <img src="img/07/wb-4-1.png" alt="Figure 4.1 from WB book" title="Figure 4.1 from WB book" width="80%">
</figure>
]
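
A minimal simulation of this idea, assuming a made-up `age` covariate and a hypothetical helper called `balance_gap()`. It randomly assigns people to two groups and measures how far apart the group averages end up at different sample sizes:

```r
# Hypothetical simulation: how different are the two groups' average ages
# after random assignment, at different sample sizes?
set.seed(1234)

balance_gap <- function(n) {
  age <- rnorm(n, mean = 35, sd = 10)                     # made-up covariate
  treated <- sample(rep(c(TRUE, FALSE), length.out = n))  # random assignment
  abs(mean(age[treated]) - mean(age[!treated]))           # gap between groups
}

sample_sizes <- c(10, 100, 1000, 10000)

# Average gap across 500 simulated experiments per sample size
avg_gap <- sapply(sample_sizes, function(n) mean(replicate(500, balance_gap(n))))

data.frame(n = sample_sizes, avg_gap = round(avg_gap, 2))
```

The average gap should shrink as `n` grows, which is the "big enough samples" part of the claim.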

---

# RCTs and DAGs

$$
E[\text{Malaria infection rate}\ |\ do(\text{Mosquito net})]
$$

.pull-left[
.box-4.smaller[Observational DAG]
<img src="07-slides_files/figure-html/observational-dag-1.png" width="90%" style="display: block; margin: auto;" />
]

.pull-right[
.box-4.smaller[Experimental DAG]
<img src="07-slides_files/figure-html/experimental-dag-1.png" width="90%" style="display: block; margin: auto;" />
]

.box-inv-4.small[When you *do*() X, delete all arrows into X; **confounders don't influence treatment!**]

---

# How to randomize?

.center[
<figure>
  <img src="img/07/wb-4-4.png" alt="Figure 4.4 from WB book" title="Figure 4.4 from WB book" width="75%">
</figure>
]

---

# Random assignment

.box-inv-4.medium[Coins]

.box-inv-4.medium[Dice]

.box-inv-4.medium[Unbiased lottery]

.box-inv-4.medium[Random numbers + threshold]

.box-inv-4.medium[Atmospheric noise]

.box-4.tiny[random.org]
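
A rough sketch of two of these approaches in base R. The `people` data frame and its column names are made up for illustration:

```r
set.seed(7)
n <- 800
people <- data.frame(person = 1:n)  # hypothetical participant list

# Random numbers + threshold: each person gets a draw between 0 and 1;
# anyone below 0.5 goes to treatment (group sizes will vary a little)
people$draw <- runif(n)
people$group_threshold <- ifelse(people$draw < 0.5, "Treatment", "Control")

# Unbiased lottery: shuffle a fixed set of labels so that
# exactly half of the people land in each group
people$group_lottery <- sample(rep(c("Treatment", "Control"), each = n / 2))

table(people$group_threshold)
table(people$group_lottery)
```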

---

# How big of a sample?

.center.small[
<table>
 <thead>
  <tr>
   <th style="text-align:left;"> Person </th>
   <th style="text-align:left;"> Group </th>
   <th style="text-align:center;"> Before </th>
   <th style="text-align:center;"> After </th>
   <th style="text-align:center;"> Difference </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> 295 </td>
   <td style="text-align:left;"> Control </td>
   <td style="text-align:center;"> 122.09 </td>
   <td style="text-align:center;"> 229.04 </td>
   <td style="text-align:center;"> 106.95 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 126 </td>
   <td style="text-align:left;"> Treatment </td>
   <td style="text-align:center;"> 205.60 </td>
   <td style="text-align:center;"> 199.84 </td>
   <td style="text-align:center;"> -5.76 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 400 </td>
   <td style="text-align:left;"> Control </td>
   <td style="text-align:center;"> 133.25 </td>
   <td style="text-align:center;"> 130.40 </td>
   <td style="text-align:center;"> -2.85 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 94 </td>
   <td style="text-align:left;"> Treatment </td>
   <td style="text-align:center;"> 270.11 </td>
   <td style="text-align:center;"> 206.56 </td>
   <td style="text-align:center;"> -63.54 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 250 </td>
   <td style="text-align:left;"> Control </td>
   <td style="text-align:center;"> 344.37 </td>
   <td style="text-align:center;"> 222.89 </td>
   <td style="text-align:center;"> -121.49 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 59 </td>
   <td style="text-align:left;"> Treatment </td>
   <td style="text-align:center;"> 312.41 </td>
   <td style="text-align:center;"> 268.06 </td>
   <td style="text-align:center;"> -44.35 </td>
  </tr>
</tbody>
</table>
]

---

# Power

---

# What's the right sample size?

.center[
<figure>
  <img src="img/06/power-search.png" alt="Google power calculator" title="Google power calculator" width="50%">
</figure>
]
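
Base R also has a built-in calculator, `power.t.test()`. The sketch below assumes a two-group comparison of means; the effect sizes and standard deviation are invented numbers, not values from any real study:

```r
# Hypothetical inputs: we hope to detect a 20-unit difference in an outcome
# whose standard deviation is about 75, with 80% power at alpha = 0.05
needed <- power.t.test(delta = 20, sd = 75, sig.level = 0.05, power = 0.80)
ceiling(needed$n)  # participants needed in *each* group

# Smaller effects are harder to detect, so they need bigger samples
needed_small <- power.t.test(delta = 10, sd = 75, sig.level = 0.05, power = 0.80)
ceiling(needed_small$n)
```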

---

layout: false
name: rct-how
class: center middle section-title section-title-5 animated fadeIn

# How to analyze RCTs

---

# How to analyze RCTs

.box-inv-5.sp-after[Surprisingly easy, statistically!]

---

# Example RCT

```r
imaginary_program
```

```
## # A tibble: 800 × 6
##    person treatment   age sex    income_after male_num
##     <int> <chr>     <dbl> <chr>         <dbl>    <dbl>
##  1    498 Control      45 Female         179.        0
##  2    308 Treatment    37 Male           247.        1
##  3    677 Control      35 Female         369.        0
##  4     31 Treatment    39 Female         203.        0
##  5    543 Control      36 Female         190.        0
##  6    434 Control      30 Female         278.        0
##  7    234 Treatment    28 Male           356.        1
##  8    272 Treatment    45 Male           260.        1
##  9    523 Control      49 Female         174.        0
## 10    649 Control      49 Male           224.        1
## # … with 790 more rows
## # ℹ Use `print(n = ...)` to see more rows
```

---

# 1. Check balance

```r
imaginary_program %>% 
  group_by(treatment) %>% 
  summarize(avg_age = mean(age),
            prop_male = mean(sex == "Male"))
```

```
## # A tibble: 2 × 3
##   treatment avg_age prop_male
##   <chr>       <dbl>     <dbl>
## 1 Control      35.1     0.562
## 2 Treatment    35.1     0.512
```

---

# 1. Check balance

```r
ggplot(imaginary_program, 
       aes(x = treatment, y = age, 
           color = treatment)) +
  stat_summary(geom = "pointrange", 
               fun.data = "mean_se", 
               fun.args = list(mult=1.96)) +
  guides(color = "none") +
  labs(x = NULL, y = "Age")
```

---

# 1. Check balance

```r
ggplot(imaginary_program, 
       aes(x = treatment, y = male_num, 
           color = treatment)) +
  stat_summary(geom = "pointrange", 
               fun.data = "mean_se", 
               fun.args = list(mult=1.96)) +
  guides(color = "none") +
  labs(x = NULL, y = "Proportion male")
```

---

# 2. Calculate difference

```r
imaginary_program %>% 
  group_by(treatment) %>% 
  summarize(avg_outcome = mean(income_after))
```

```
## # A tibble: 2 × 2
##   treatment avg_outcome
##   <chr>           <dbl>
## 1 Control          205.
## 2 Treatment        251.
```

```r
251 - 205
```

```
## [1] 46
```

```r
rct_model <- lm(income_after ~ treatment, 
                data = imaginary_program)
tidy(rct_model)
```

```
## # A tibble: 2 × 3
##   term               estimate std.error
##   <chr>                 <dbl>     <dbl>
## 1 (Intercept)           205.       3.66
## 2 treatmentTreatment     46.0      5.17
```
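
The two approaches give the same number: with a single binary treatment variable, the regression coefficient *is* the difference in group means. A small check on simulated data, where `fake_rct` and its true effect of 50 are invented for illustration:

```r
set.seed(2021)
n <- 800

# Hypothetical RCT: random assignment, baseline income of 200,
# and a true treatment effect of 50
fake_rct <- data.frame(
  treatment = sample(rep(c("Control", "Treatment"), each = n / 2))
)
fake_rct$income_after <- 200 +
  50 * (fake_rct$treatment == "Treatment") +
  rnorm(n, sd = 70)

# Difference in group means, by hand
group_means <- tapply(fake_rct$income_after, fake_rct$treatment, mean)
diff_means <- unname(group_means["Treatment"] - group_means["Control"])

# Same thing with regression, which also gives a confidence interval
fake_model <- lm(income_after ~ treatment, data = fake_rct)
coef(fake_model)["treatmentTreatment"]
confint(fake_model)["treatmentTreatment", ]
```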

---

# 2a. Show difference

```r
ggplot(imaginary_program, 
       aes(x = treatment, 
           y = income_after, 
           color = treatment)) +
  stat_summary(geom = "pointrange", 
               fun.data = "mean_se", 
               fun.args = list(mult=1.96)) +
  guides(color = "none") +
  labs(x = NULL, y = "Income")
```

---

# Should you control for stuff?

.box-inv-5.large[No!]

---

layout: false
name: gold-standard
class: center middle section-title section-title-6 animated fadeIn

# The "gold" standard

---

# Types of research

.box-inv-6.medium.sp-after[Experimental studies vs.<br>observational studies]

.box-inv-6.medium[Which is better?]

---

.center[
<figure>
  <img src="img/07/nyt-wellness-program.png" alt="NYT wellness program report" title="NYT wellness program report" width="75%">
</figure>
]

???

https://www.nytimes.com/2018/08/06/upshot/employer-wellness-programs-randomized-trials.html

---

???

https://twitter.com/MIT/status/1183752282988564480

---

&nbsp;

.box-6.large.sp-after[RCTs are great!]

.box-6.large[Super impractical to do<br>all the time though!]

---

# "Gold standard"

.box-inv-6.medium.sp-after["Gold standard" implies that all<br>causal inferences will be valid if<br>you do the experiment right]

---

# Moving to Opportunity

.center[
<figure>
  <img src="img/07/hud.jpg" alt="HUD HQ in DC" title="HUD HQ in DC" width="70%">
</figure>
]

???

**MTO** - main question = does your neighborhood matter?

Subtext = kids who grow up in poor neighborhoods do worse - lots of theories about why. The problems might reside in the family, or in something inherent in the neighborhood: social capital or other benefits of the community (like being in the network of people who know where the jobs for people with low education are)

Alternative explanations:

- Poor parents don't do well in the labor market, so they live in cheap neighborhoods, where they're surrounded by the same type of people
- Racism and discrimination

Randomly assign people to where they live - ideally from birth or even pre-birth - but families that we can possibly randomly assign already have kids (required for receiving public housing assistance)

So they randomly assign families in public housing who are willing to move and accept risk of having that move controlled - no other way to really do this - you'd have to pay middle-class and higher people a ton of money to get them to move

Then randomly assign them to (1) stay, (2) move to anywhere that would take voucher, (3) anywhere with less than 10% poverty rate + get relocation counseling

People were hoping to get option 2, since people already weren't choosing 3 - it's uncomfortable to move to a neighborhood where you don't fit in

https://commons.wikimedia.org/wiki/File:Housing_and_Urban_Development_headquarters.jpg

---

# RCTs and validity

.box-inv-6.medium[Randomization fixes a ton of<br>internal validity issues]

.pull-left[
.box-6[**Selection**<br>Treatment and control<br>groups are comparable;<br>people don't self-select]
]

.pull-right[
.box-6[**Trends**<br>Maturation, secular<br>trends, seasonality,<br>regression to the mean<br>all generally average out]
]

---

# RCTs and validity

.box-inv-6.medium[RCTs don't fix attrition!]

.box-inv-6.medium.sp-before[If attrition is correlated<br>with treatment, that's bad]

.box-6[People might drop out because of the treatment,<br>or because they were/weren't assigned to the control group]

???

You don't have data on them. NA values. Even if people don't comply (like move to a private school), you can still get data on them, so that's okay. People might drop out randomly, and that would be fine. But if attrition is correlated with the treatment, then it's bad. People might drop because of the treatment, or because they didn't get the control group. Impossible to sign the bias—could be tons of different reasons.

---

# Addressing attrition

.box-inv-6.less-medium[Recruit as effectively as possible]

.box-6.sp-after[You don't just want weird/WEIRD participants]

.box-inv-6.less-medium[Get people on board]

.box-6.sp-after[Get participants invested in the experiment]

.box-inv-6.less-medium[Collect as much baseline information as possible]

???

- Recruit as effectively as possible (so you don't get volunteers that routinely sign up for randomized experiments). Use money for recruitment

- Get people on board. Treatment might be less/more enjoyable than the control group, so people will feel lucky or unlucky to be in the treatment group - explain why the experiment itself is important. Make them invested in getting the experiment to work. Science! Why you want the right answer beyond whether or not the treatment works

- Collect as much baseline information as possible before assigning to treatment and control groups - doesn't help reduce attrition like getting people on board, but it lets us see if attrition is random with respect to preexisting characteristics - checks for randomization failures. Don't rerandomize. Randomize and then suck it up.

---

# RCTs and validity

.box-inv-6.less-medium[Randomization failures]

.box-inv-6.less-medium[Noncompliance]

.box-6[Some people assigned to treatment won't take it;<br>some people assigned to control will take it]
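
A hedged sketch of why noncompliance matters, using invented data. Here `motivated` is a made-up unobserved trait that affects both who actually takes the treatment and the outcome. Comparing people by what they were *assigned* to (intent to treat) keeps the benefits of randomization; comparing by what they actually *took* does not:

```r
set.seed(99)
n <- 2000

assigned  <- rbinom(n, 1, 0.5)  # random assignment
motivated <- rbinom(n, 1, 0.5)  # hypothetical unobserved trait

# Noncompliance: motivated people are more likely to take the treatment,
# and a few motivated people in the control group get it anyway
p_take <- ifelse(assigned == 1,
                 ifelse(motivated == 1, 0.9, 0.4),
                 ifelse(motivated == 1, 0.2, 0))
took <- rbinom(n, 1, p_take)

# True effect of actually taking the treatment is 10
outcome <- 100 + 10 * took + 15 * motivated + rnorm(n, sd = 5)

# Intent to treat: compare by assignment (diluted, but not confounded)
itt <- mean(outcome[assigned == 1]) - mean(outcome[assigned == 0])

# "As treated": compare by what people actually did (confounded by motivation)
as_treated <- mean(outcome[took == 1]) - mean(outcome[took == 0])

c(itt = itt, as_treated = as_treated)
```

In this setup the ITT estimate is smaller than the true effect of 10 because not everyone complies, while the as-treated comparison overshoots because motivation sneaks back in as a confounder.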

???

Intent to treat (ITT) is probably the most policy-relevant measure - if there's a low compliance rate but a good ITT effect, you can try to make the program nicer or better

---

# Other limitations

.box-inv-6.medium.sp-after[RCTs don't magically fix construct validity<br>or statistical conclusion validity]

.box-inv-6.medium.sp-after[RCTs **definitely** don't<br>magically fix external validity]

???

MTO varied income, not race, since it's illegal to tell people they can only move to a white neighborhood. So they used income instead of race. Keys under the light issue, since the original issue was about race

Scalability issue with STAR. It wasn't hard to hire 40 more teachers in TN. California couldn't find enough teachers, so they emergency certified "a bunch of morons," which messed up the program effect

---

???

https://www.vox.com/future-perfect/2019/10/14/20913928/nobel-prize-economics-duflo-banerjee-kremer

---

# When to randomly assign

???

- When demand for treatment exceeds supply or treatment must be phased in over time (instead of doing the closest place first, etc.)
- When people don't know what they want to do = equipoise (medical trials have to be in equipoise - it's unethical to randomly withhold a treatment that's clearly beneficial)
- When the local culture is favorable to random assignment - prime people to be more comfortable with it
- When you are a nondemocratic monopoly provider - if you're the only one with the treatment, you decide who gets it - like Google and Facebook and A/B testing
- When people won't know (as long as it's ethical) - resume name experiments, e-mails to politicians
- When lotteries are going to happen anyway

---

# When to <span style="color: #F6D645;">not</span> randomly assign

???

Past: effects of city segregation or political regime type

Universal phenomena: climate change, social norms

---

layout: false
name: matching
class: center middle section-title section-title-1 animated fadeIn

# Adjustment<br>with matching

---

.center[
<figure>
  <img src="img/05/mm-matching.png" alt="Matching table from Mastering 'Metrics" title="Matching table from Mastering 'Metrics" width="70%">
</figure>
]

---

# Why match?

.box-inv-1.medium[Reduce model dependence]

.box-1.sp-after[Imbalance → model dependence → researcher discretion → bias]

.box-inv-1.medium.sp-after[Compare apples to apples]

.box-inv-1.medium[It's a way to adjust for backdoors!]

---

.smaller[
$$
\color{white}{\beta_0 \text{E}^2} \text{Outcome} = \beta_0 + \beta_1 \text{Education} + \beta_2 \text{Treatment} \color{white}{\beta_0 \text{E}^2}
$$
]

---

.smaller[
$$
\text{Outcome} = \beta_0 + \beta_1 \text{Education} + \beta_2 \text{Education}^2 + \beta_3 \text{Treatment}
$$
]

---

# General process for matching

.box-inv-1.medium[Step 1. Preprocessing]

.box-1.sp-after[Use what you know about the DAG to decide what to match or trim on!]

.box-inv-1.medium[Step 2. Estimation]

.box-1[Use the new trimmed/preprocessed data to build a model,<br>calculate difference in means, etc.]

---

# Different methods

.box-inv-1.medium[Nearest neighbor matching (NN)]

.box-1.small.sp-after[Mahalanobis distance / Euclidean distance]

.box-inv-1.medium.sp-after-half[~~Propensity score matching (PSM)~~]

.box-inv-1.medium.sp-after-half[Inverse probability weighting (IPW)]

.box-inv-1.small[(and lots of other methods we're not covering!)]

---

# Nearest neighbor matching

.box-inv-1.medium.sp-after[Find untreated observations that are<br>very close/similar to treated<br>observations based on confounders]

.box-inv-1.medium[Lots of mathy ways to measure distance]
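
One of those mathy ways, sketched with base R's `mahalanobis()` and the built-in `mtcars` data. Treating `am` (manual transmission) as the "treatment" and `mpg` and `hp` as the confounders is only an assumption for illustration. This is a bare-bones nearest-neighbor search with replacement, not a full matching workflow:

```r
# Pretend manual transmission is the "treatment" (illustrative assumption)
confounders <- c("mpg", "hp")
treated  <- mtcars[mtcars$am == 1, confounders]
controls <- mtcars[mtcars$am == 0, confounders]

# Covariance of the confounders, used to scale the distances
cov_all <- cov(mtcars[, confounders])

# For each treated car, find the untreated car with the smallest
# (squared) Mahalanobis distance; controls can be reused
nearest <- sapply(seq_len(nrow(treated)), function(i) {
  d <- mahalanobis(controls, center = unlist(treated[i, ]), cov = cov_all)
  rownames(controls)[which.min(d)]
})

data.frame(treated = rownames(treated), matched_control = nearest)
```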

---

???

https://www.cnbc.com/2020/02/05/70percent-chance-of-recession-in-next-six-months-study-from-mit-and-state-street-finds.html

---

# Matching and eugenics

.box-inv-1.medium[Prasanta Chandra Mahalanobis]

.pull-left.center[
<figure>
  <img src="img/07/mahalanobis.png" alt="Prasanta Chandra Mahalanobis" title="Prasanta Chandra Mahalanobis" width="50%">
</figure>
]

???

https://en.wikipedia.org/wiki/Prasanta_Chandra_Mahalanobis#/media/File:PCMahalanobis.png

---

# Potential problems with matching

.center[
<figure>
  <img src="07-slides_files/figure-html/edu-age-matched-1.png" alt="Lost data" title="Lost data" width="50%">
</figure>
]

---

# Propensity scores

.box-inv-1.medium[Predict the probability of<br>assignment to treatment using a model]

.box-1.sp-after[Logistic regression, probit regression, machine learning, etc.]

.box-1.smaller[Here's logistic regression:]

`$$\operatorname{log} \frac{p_\text{Treated}}{1 - p_\text{Treated}} = \beta_0 + \beta_1 \text{Education} + \beta_2 \text{Age}$$`

---

.smaller[
`$$\operatorname{log} \frac{p_\text{Manual}}{1 - p_\text{Manual}} = \beta_0 + \beta_1 \text{MPG}$$`
]

```r
model_transmission <- glm(am ~ mpg, data = mtcars, family = binomial(link = "logit"))
```

---

.box-1[Odds ratios .tiny[(e<sup>β</sup>; centered around 1: 1.5 means 50% higher odds; 0.75 means 25% lower odds)]]

.pull-left-narrow.small-code[

```r
tidy(model_transmission)

tidy(model_transmission, 
     exponentiate = TRUE)
```
]

.pull-right-wide.small-code[

```
## # A tibble: 2 × 5
##   term        estimate std.error statistic p.value
##   <chr>          <dbl>     <dbl>     <dbl>   <dbl>
## 1 (Intercept)   -6.60      2.35      -2.81 0.00498
## 2 mpg            0.307     0.115      2.67 0.00751
```

```
## # A tibble: 2 × 5
##   term        estimate std.error statistic p.value
##   <chr>          <dbl>     <dbl>     <dbl>   <dbl>
## 1 (Intercept)  0.00136     2.35      -2.81 0.00498
## 2 mpg          1.36        0.115      2.67 0.00751
```
]

---

.box-inv-1.smaller[Plug all the values of MPG into the model and find the predicted probability of manual transmission]

```r
augment(model_transmission, data = mtcars, type.predict = "response")
```

.pull-left.small-code[

```r
## # A tibble: 32 x 3
##      mpg    am .fitted
##    <dbl> <dbl>   <dbl>
##  1  21       1  0.461 
##  2  21       1  0.461 
##  3  22.8     1  0.598 
##  4  21.4     0  0.492 
##  5  18.7     0  0.297 
##  6  18.1     0  0.260 
*##  7  14.3     0  0.0986
*##  8  24.4     0  0.708
##  9  22.8     0  0.598 
## 10  19.2     0  0.330 
## # … with 22 more rows
```

]

&nbsp;

.box-1.smaller[Row 7 is highly unlikely to be manual (1)]

.box-1.smaller[Row 8 is highly likely to be manual]

---

# Propensity score matching

.box-inv-1.medium.sp-after-half[Super popular method]

.box-inv-1.medium.sp-after-half[There are mathy reasons why it's not great<br>for matching *for identification purposes*]

.box-inv-1.medium.sp-after-half[Propensity scores are fine!<br>Using them for matching isn't!]

---

???

https://gking.harvard.edu/files/gking/files/pan1900011_rev.pdf

---

# Weighting

---

# Weighting

<table>
 <thead>
  <tr>
   <th style="text-align:left;">   </th>
   <th style="text-align:center;"> Young </th>
   <th style="text-align:center;"> Middle </th>
   <th style="text-align:center;"> Old </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> Population </td>
   <td style="text-align:center;"> 30% </td>
   <td style="text-align:center;"> 40% </td>
   <td style="text-align:center;"> 30% </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Sample </td>
   <td style="text-align:center;"> 60% </td>
   <td style="text-align:center;"> 30% </td>
   <td style="text-align:center;"> 10% </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Weight </td>
   <td style="text-align:center;"> &ensp;30 / 60&ensp;<br><strong>0.5</strong> </td>
   <td style="text-align:center;"> &ensp;40 / 30&ensp;<br><strong>1.333</strong> </td>
   <td style="text-align:center;"> &ensp;30 / 10&ensp;<br><strong>3</strong> </td>
  </tr>
</tbody>
</table>
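
The arithmetic in the table, as a quick sketch. The vector names are made up; the percentages come straight from the table above:

```r
# Shares from the table above
population   <- c(Young = 0.30, Middle = 0.40, Old = 0.30)
sample_share <- c(Young = 0.60, Middle = 0.30, Old = 0.10)

# Weight = population share / sample share
age_weights <- population / sample_share
round(age_weights, 3)

# After weighting, the sample shares line up with the population again
sample_share * age_weights
```

Over-sampled groups (young people here) get weights below 1; under-sampled groups get weights above 1.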

---

# Inverse probability weighting

.box-inv-1.medium[Use propensity scores to weight<br>observations by how "weird" they are]

.box-1.small[Observations with high probability of treatment<br>who don't get it (and vice versa) have higher weight]

$$
\frac{\text{Treatment}}{\text{Propensity}} + \frac{1 - \text{Treatment}}{1 - \text{Propensity}}
$$

---

```r
augment(model_transmission, data = mtcars, type.predict = "response") %>%
  select(mpg, am, propensity = .fitted) %>%
*  mutate(ip_weight = (am / propensity) + ((1 - am) / (1 - propensity)))
```

.pull-left.small-code[

```r
## # A tibble: 32 x 4
##      mpg    am propensity ip_weight
##    <dbl> <dbl>      <dbl>     <dbl>
##  1  21       1     0.461       2.17
##  2  21       1     0.461       2.17
##  3  22.8     1     0.598       1.67
##  4  21.4     0     0.492       1.97
##  5  18.7     0     0.297       1.42
##  6  18.1     0     0.260       1.35
*##  7  14.3     0     0.0986      1.11
*##  8  24.4     0     0.708       3.43
##  9  22.8     0     0.598       2.49
## 10  19.2     0     0.330       1.49
## # … with 22 more rows
```

]

&nbsp;

.box-1.smaller[Row 7 is highly unlikely to be manual and isn't.<br>**Boring! Low IPW.**]

.box-1.smaller[Row 8 is highly likely to be manual, but isn't.<br>**That's weird! High IPW.**]
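
Once the weights exist, they go into the outcome model. A rough sketch on simulated data, where `confounder`, `treatment`, `outcome`, and the true effect of 2 are all invented for illustration:

```r
set.seed(42)
n <- 5000

# Hypothetical setup: one confounder drives both treatment and outcome
confounder <- rnorm(n)
treatment  <- rbinom(n, 1, plogis(0.8 * confounder))
outcome    <- 2 * treatment + 3 * confounder + rnorm(n)  # true effect = 2

# Naive comparison ignores the backdoor through the confounder
naive <- coef(lm(outcome ~ treatment))["treatment"]

# Step 1: model treatment assignment to get propensity scores
propensity <- fitted(glm(treatment ~ confounder,
                         family = binomial(link = "logit")))

# Step 2: build inverse probability weights (same formula as above)
ip_weight <- (treatment / propensity) + ((1 - treatment) / (1 - propensity))

# Step 3: use the weights in the outcome model
weighted <- coef(lm(outcome ~ treatment, weights = ip_weight))["treatment"]

c(naive = unname(naive), ipw = unname(weighted))
```

In this simulation the naive estimate is far too large, while the weighted estimate should land near the true effect. Standard errors from a plain weighted `lm()` are not adjusted for the estimated weights, so treat them with caution.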

---

.box-1.huge[Examples!]