# Regression discontinuity I

**Session 10**

---

# Plan for today

.box-2.medium.sp-after-half[Arbitrary cutoffs and causal inference]

.box-4.medium.sp-after-half[Drawing lines and measuring gaps]

.box-6.medium.sp-after-half[Main RDD concerns]

---

name: arbitrary-cutoffs
class: center middle section-title section-title-2 animated fadeIn

# Arbitrary cutoffs<br>and causal inference

---

# Quasi-experiments again

.box-inv-2.sp-after[Instead of using carefully adjusted DAGs,<br>we can use *context* to isolate/identify the pathway between<br>treatment and outcome in observational data]

.box-2.sp-after[Treatment/control + before/after]

---

# Rules to access programs

.box-inv-2.medium[Lots of policies and programs are<br>based on arbitrary rules and thresholds]

.box-2[If you're above the threshold, you're in the program;<br>if you're below, you're not (or vice versa)]

---

# Key terms

.box-inv-2.medium[Running / forcing variable]

.box-2.sp-after[Index or measure that determines eligibility]

.box-inv-2.medium[Cutoff / cutpoint / threshold]

---

# Discontinuities everywhere!

.pull-left-wide.small[
<table>
 <thead>
  <tr>
   <th style="text-align:center;"> Household size </th>
   <th style="text-align:center;"> Poverty line (annual) </th>
   <th style="text-align:center;"> Poverty line (monthly) </th>
   <th style="text-align:center;"> 138% </th>
   <th style="text-align:center;"> 150% </th>
   <th style="text-align:center;"> 200% </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:center;"> 1 </td>
   <td style="text-align:center;"> $12,760 </td>
   <td style="text-align:center;"> $1,063 </td>
   <td style="text-align:center;"> $17,609 </td>
   <td style="text-align:center;"> $19,140 </td>
   <td style="text-align:center;"> $25,520 </td>
  </tr>
  <tr>
   <td style="text-align:center;"> 2 </td>
   <td style="text-align:center;"> $17,240 </td>
   <td style="text-align:center;"> $1,437 </td>
   <td style="text-align:center;"> $23,791 </td>
   <td style="text-align:center;"> $25,860 </td>
   <td style="text-align:center;"> $34,480 </td>
  </tr>
  <tr>
   <td style="text-align:center;"> 3 </td>
   <td style="text-align:center;"> $21,720 </td>
   <td style="text-align:center;"> $1,810 </td>
   <td style="text-align:center;"> $29,974 </td>
   <td style="text-align:center;"> $32,580 </td>
   <td style="text-align:center;"> $43,440 </td>
  </tr>
  <tr>
   <td style="text-align:center;"> 4 </td>
   <td style="text-align:center;"> $26,200 </td>
   <td style="text-align:center;"> $2,183 </td>
   <td style="text-align:center;"> $36,156 </td>
   <td style="text-align:center;"> $39,300 </td>
   <td style="text-align:center;"> $52,400 </td>
  </tr>
  <tr>
   <td style="text-align:center;"> 5 </td>
   <td style="text-align:center;"> $30,680 </td>
   <td style="text-align:center;"> $2,557 </td>
   <td style="text-align:center;"> $42,338 </td>
   <td style="text-align:center;"> $46,020 </td>
   <td style="text-align:center;"> $61,360 </td>
  </tr>
  <tr>
   <td style="text-align:center;"> 6 </td>
   <td style="text-align:center;"> $35,160 </td>
   <td style="text-align:center;"> $2,930 </td>
   <td style="text-align:center;"> $48,521 </td>
   <td style="text-align:center;"> $52,740 </td>
   <td style="text-align:center;"> $70,320 </td>
  </tr>
  <tr>
   <td style="text-align:center;"> 7 </td>
   <td style="text-align:center;"> $39,640 </td>
   <td style="text-align:center;"> $3,303 </td>
   <td style="text-align:center;"> $54,703 </td>
   <td style="text-align:center;"> $59,460 </td>
   <td style="text-align:center;"> $79,280 </td>
  </tr>
  <tr>
   <td style="text-align:center;"> 8 </td>
   <td style="text-align:center;"> $44,120 </td>
   <td style="text-align:center;"> $3,677 </td>
   <td style="text-align:center;"> $60,886 </td>
   <td style="text-align:center;"> $66,180 </td>
   <td style="text-align:center;"> $88,240 </td>
  </tr>
</tbody>
</table>
]

.pull-right-narrow[
.box-inv-2.smaller[**ACA subsidies**<br>138–400%*]

.box-inv-2.smaller[**CHIP**<br>200%]

.box-inv-2.smaller[**SNAP/Free lunch**<br>130%]

.box-inv-2.smaller[**Reduced lunch**<br>130–185%]
]

---

# Hypothetical tutoring program

.box-inv-2.medium[Students take an entrance exam]

.box-inv-2.medium[Those who score 70 or lower<br>get a free tutor for the year]

.box-inv-2.medium[Students then take an exit exam<br>at the end of the year]

---

# Causal inference intuition

.box-inv-2.medium[The people right before and right after the threshold are essentially the same]

---

# Causal inference intuition

.box-inv-2.medium[The people right before and right after the threshold are essentially the same]

.box-2.medium[Pseudo treatment and control groups!]

.box-inv-2.medium[Compare outcomes for those<br>right before/after, calculate difference]
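
---

# Causal inference intuition

A minimal sketch of that comparison in base R. It uses simulated data rather than the real `tutoring` dataset, and the sample size, the assumed 10-point effect, and the ±5 window are all assumptions made for illustration.

```r
# Simulated (hypothetical) tutoring data; not the real course dataset
set.seed(1234)
n <- 2000
entrance_exam <- runif(n, min = 30, max = 100)
tutoring <- entrance_exam <= 70   # 70 or lower gets a tutor

# Assumed true effect: tutoring adds 10 points to the exit exam
exit_exam <- 30 + 0.5 * entrance_exam + 10 * tutoring + rnorm(n, sd = 5)

# Keep only students within 5 points of the cutoff
near <- abs(entrance_exam - 70) <= 5

# Naive gap: mean outcome just below the cutoff minus mean just above
naive_gap <- mean(exit_exam[near & tutoring]) -
  mean(exit_exam[near & !tutoring])
naive_gap
```

In this simulation the raw difference should come out smaller than the assumed 10 points, because exit scores still rise with entrance scores inside the window. Fitting a line on each side of the cutoff accounts for that slope.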

---

# Geographic discontinuities

.center[
<figure>
  <img src="img/10/timezones-1.png" alt="Holbein time zones" title="Holbein time zones" width="100%">
</figure>
]

---

# Geographic discontinuities

.pull-left-wide.center[
<figure>
  <img src="img/10/timezones-2.png" alt="Holbein time zones" title="Holbein time zones" width="100%">
</figure>
]

---

# Time discontinuities

.pull-left-wide.center[
<figure>
  <img src="img/10/hospitals-1.png" alt="Hospital stays title" title="Hospital stays title" width="90%">
</figure>
]

.pull-right-narrow[
.box-inv-2[California requires that insurance cover two days of post-partum hospitalization]
]

---

# Time discontinuities

.center[
<figure>
  <img src="img/10/hospitals-2.png" alt="Hospital stays duration" title="Hospital stays duration" width="100%">
</figure>
]

---

# Time discontinuities

.pull-left-wide.center[
<figure>
  <img src="img/10/hospitals-3.png" alt="Hospital stays outcomes" title="Hospital stays outcomes" width="65%">
</figure>
]

---

# Test score discontinuities

.pull-left-wide.center[
<figure>
  <img src="img/10/flagship-1.png" alt="Flagship universities" title="Flagship universities" width="100%">
</figure>
]

.pull-right-narrow[
.box-inv-2[Does going to the main state university (e.g. UGA) make you earn more money?]
]

---

# Test score discontinuities

.pull-left.center[
<figure>
  <img src="img/10/flagship-2.png" alt="Flagship cutoff" title="Flagship cutoff" width="100%">
</figure>
]

.pull-right.center[
<figure>
  <img src="img/10/flagship-3.png" alt="Flagship outcome" title="Flagship outcome" width="100%">
</figure>
]

---

# RDDs are all the rage

.box-inv-2.medium[People love these things!]

.pull-left.center[
<figure>
  <img src="img/10/rdd-p-hacking.png" alt="RDD p-hacking" title="RDD p-hacking" width="80%">
</figure>
]

---

layout: false
name: lines-gaps
class: center middle section-title section-title-4 animated fadeIn

# Drawing lines<br>and measuring gaps

---

# Main goal of RD

.box-inv-4.medium[Measure the gap in outcome for<br>people on both sides of the cutpoint]

.box-inv-4.medium[Gap = **δ** =<br>local average treatment effect (LATE)]

---

![](10-slides_files/figure-html/tutoring-outcome-delta-1.png)

---

---

# Drawing lines

.box-inv-4.medium[The size of the gap depends on how<br>you draw the lines on each side of the cutoff]

.box-inv-4.medium.sp-after[The type of lines you choose can<br>change the estimate of δ—sometimes by a lot!]

.box-4.medium[There's no one right way to draw lines!]

---

# Line-drawing considerations

.box-inv-4.medium[Parametric vs. non-parametric lines]

.box-inv-4.medium[Measuring the gap]

.box-inv-4.medium[Bandwidths]

.box-inv-4.medium[Kernels]

---

# Parametric lines

.box-inv-4.medium[Formulas with *parameters*]

`$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2$$`

---

# Parametric lines

.box-inv-4.medium[Not just for straight lines!<br>Make them curvy with exponents or trigonometry]

`$$y = \beta_0 + \beta_1 x + \beta_2 \sin(x)$$`

---

# Parametric lines

&nbsp;

.box-inv-4.medium.sp-after[It's important to get the parameters right!]

.box-inv-4.medium[Line should fit the data pretty well]
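
---

# Parametric lines

One way to see why the functional form matters is to fit a straight line and a curve to the same data and compare them. This is a sketch on simulated data; the curved relationship and its coefficients are assumptions chosen for illustration.

```r
set.seed(1234)
x <- runif(500, min = -3, max = 3)
# Assumed truth: a curved (quadratic) relationship plus noise
y <- 2 + 1.5 * x + 0.8 * x^2 + rnorm(500, sd = 1)

fit_line  <- lm(y ~ x)            # y = b0 + b1*x
fit_curve <- lm(y ~ x + I(x^2))   # y = b0 + b1*x + b2*x^2

# Share of variation in y explained by each line
r2_line  <- summary(fit_line)$r.squared
r2_curve <- summary(fit_curve)$r.squared
c(line = r2_line, curve = r2_curve)
```

The straight line misses the bend that the quadratic term picks up. In an RD setting, a badly fitting line on either side of the cutoff can distort the estimated gap.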

---

# Nonparametric lines

.box-inv-4.medium[Lines without parameters]

.box-4[<span style="color: #F6D645;">Lo</span>cally <span style="color: #F6D645;">e</span>stimated/<span style="color: #F6D645;">we</span>ighted <span style="color: #F6D645;">s</span>catterplot <span style="color: #F6D645;">s</span>moothing (LOESS/LOWESS)<br>is a common method (but not the only one!)]
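
---

# Nonparametric lines

A small sketch of LOESS with base R's `loess()`, fit to simulated wavy data. The sine-shaped data and the `span` value are assumptions for illustration; no formula for the shape of the line is supplied.

```r
set.seed(1234)
x <- seq(0, 10, length.out = 300)
y <- sin(x) + rnorm(300, sd = 0.3)   # assumed wavy relationship plus noise

# span sets how much of the data each local fit uses:
# smaller values give wigglier lines, larger values give smoother ones
fit_loess <- loess(y ~ x, span = 0.3)
smoothed <- predict(fit_loess)

# Average distance between the smoothed line and the true curve
mean_error <- mean(abs(smoothed - sin(x)))
mean_error
```

The line follows the data without being told the relationship is sinusoidal, which is the appeal of nonparametric approaches. The tradeoff is that there is no single coefficient to read the gap from.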

---

# Measuring gap with parametric lines

.center[
<figure>
  <img src="10-slides_files/figure-html/tutoring-outcome-lines-1.png" alt="Parametric gap" title="Parametric gap" width="85%">
</figure>
]

---

# Measuring gap with parametric lines

.small[
<table>
 <thead>
  <tr>
   <th style="text-align:center;"> id </th>
   <th style="text-align:center;"> exit_exam </th>
   <th style="text-align:center;"> entrance_exam </th>
   <th style="text-align:center;"> entrance_centered </th>
   <th style="text-align:center;"> tutoring </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:center;"> 1 </td>
   <td style="text-align:center;"> 78 </td>
   <td style="text-align:center;"> 92 </td>
   <td style="text-align:center;"> 22 </td>
   <td style="text-align:center;"> FALSE </td>
  </tr>
  <tr>
   <td style="text-align:center;"> 2 </td>
   <td style="text-align:center;"> 58 </td>
   <td style="text-align:center;"> 73 </td>
   <td style="text-align:center;"> 3 </td>
   <td style="text-align:center;"> FALSE </td>
  </tr>
  <tr>
   <td style="text-align:center;"> 3 </td>
   <td style="text-align:center;"> 62 </td>
   <td style="text-align:center;"> 54 </td>
   <td style="text-align:center;"> -16 </td>
   <td style="text-align:center;"> TRUE </td>
  </tr>
  <tr>
   <td style="text-align:center;"> 4 </td>
   <td style="text-align:center;"> 67 </td>
   <td style="text-align:center;"> 98 </td>
   <td style="text-align:center;"> 28 </td>
   <td style="text-align:center;"> FALSE </td>
  </tr>
  <tr>
   <td style="text-align:center;"> 5 </td>
   <td style="text-align:center;"> 54 </td>
   <td style="text-align:center;"> 70 </td>
   <td style="text-align:center;"> 0 </td>
   <td style="text-align:center;"> TRUE </td>
  </tr>
</tbody>
</table>
]

.small[
`$$y = \beta_0 + \beta_1 \text{Running variable (centered)} + \beta_2 \text{Indicator for treatment}$$`
]
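
---

# Measuring gap with parametric lines

A short worked step showing why the coefficient on the treatment indicator is the gap. Here `\(x_c\)` is shorthand (introduced only for this derivation) for the centered running variable, which equals 0 exactly at the cutoff.

`$$\begin{aligned} \hat{y}_{\text{treated at cutoff}} &= \beta_0 + \beta_1 (0) + \beta_2 (1) = \beta_0 + \beta_2 \\ \hat{y}_{\text{untreated at cutoff}} &= \beta_0 + \beta_1 (0) + \beta_2 (0) = \beta_0 \\ \delta &= (\beta_0 + \beta_2) - \beta_0 = \beta_2 \end{aligned}$$`

Centering is what makes this work: without it, `\(\beta_0\)` would be the predicted outcome at a running-variable value of 0 rather than at the cutoff. With the estimates reported for `model1` in this section (intercept 59.3, tutoring 11.0), the predicted exit exam score at the cutoff is about 59.3 without tutoring and 59.3 + 11.0 = 70.3 with it.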

---

# Measuring gap with parametric lines

.center[
<figure>
  <img src="10-slides_files/figure-html/tutoring-outcome-lines-1.png" alt="Parametric gap" title="Parametric gap" width="35%">
</figure>
]

```r
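library(tidyverse)  # %>% and mutate()
library(broom)      # tidy()

# Center the running variable so the cutoff (70) sits at 0
program_data <- tutoring %>% 
  mutate(entrance_centered = 
           entrance_exam - 70)

# The coefficient on tutoring is the gap (δ) at the cutoff
model1 <- lm(exit_exam ~ 
               entrance_centered + tutoring,
             data = program_data)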
```

```r
tidy(model1)
```

```
## # A tibble: 3 × 3
##   term              estimate std.error
##   <chr>                <dbl>     <dbl>
## 1 (Intercept)         59.3      0.440 
## 2 entrance_centered    0.514    0.0268
## 3 tutoringTRUE        11.0      0.802
```

---

# Measuring gap with nonparametric lines

.center[
<img src="10-slides_files/figure-html/tutoring-outcome-loess-1.png" width="80%" style="display: block; margin: auto;" />
]

---

# Measuring gap with nonparametric lines

.center[
<figure>
  <img src="10-slides_files/figure-html/tutoring-outcome-loess-1.png" alt="Nonparametric gap" title="Nonparametric gap" width="40%">
</figure>
]

```r
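# Gap is above-minus-below the cutoff; tutoring goes to scores <= 70, so the sign is flipped
summary(rdrobust::rdrobust(y = tutoring$exit_exam, x = tutoring$entrance_exam, c = 70))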
```

```
## =============================================================================
##         Method     Coef. Std. Err.         z     P>|z|      [ 95% C.I. ]       
## =============================================================================
##   Conventional    -9.992     1.708    -5.852     0.000   [-13.339 , -6.646]    
##         Robust         -         -    -4.992     0.000   [-14.244 , -6.212]    
## =============================================================================
```

---

# Bandwidths

.box-inv-4.medium[All you really care about is the<br>area right around the cutoff]

.box-4.sp-after[Observations far away don't matter<br>because they're not comparable]

.box-inv-4.medium[Bandwidth = window around cutoff]

---

# Bandwidths

.box-inv-4.medium.sp-after[Algorithms exist to choose optimal width]

.box-inv-4.medium[Also use common sense]

.box-4.sp-after[Maybe ±5 for the entrance exam?]

.box-inv-4.medium[For robustness, check what happens<br>if you double and halve the bandwidth]
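
---

# Bandwidths

The double-and-halve check can be sketched as a small loop over bandwidths. This uses simulated data with an assumed true effect of 10 points; `gap_at_bandwidth()` is a hypothetical helper written for this example, not a function from any package.

```r
# Simulated (hypothetical) tutoring data with an assumed 10-point effect
set.seed(1234)
n <- 20000
entrance_exam <- runif(n, min = 30, max = 100)
sim <- data.frame(
  entrance_centered = entrance_exam - 70,
  tutoring = entrance_exam <= 70
)
sim$exit_exam <- 30 + 0.5 * entrance_exam + 10 * sim$tutoring +
  rnorm(n, sd = 5)

# Fit the same parametric model using only rows within +/- bw of the cutoff
gap_at_bandwidth <- function(bw, data) {
  in_window <- subset(data, abs(entrance_centered) <= bw)
  model <- lm(exit_exam ~ entrance_centered + tutoring, data = in_window)
  unname(coef(model)["tutoringTRUE"])
}

# Baseline of +/- 5 points, plus half and double that
bandwidths <- c(half = 2.5, base = 5, double = 10)
gaps <- sapply(bandwidths, gap_at_bandwidth, data = sim)
gaps
```

If the three estimates are close to each other, the result is not very sensitive to the bandwidth. Large swings suggest the estimate depends heavily on which observations are counted as comparable.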

---

# Kernels

.box-inv-4.medium[Because we care the most about<br>observations right by the cutoff,<br>give more distant ones less weight]

.box-inv-4.medium[Kernel = method for assigning importance to<br>observations based on distance to the cutoff]
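
---

# Kernels

A sketch of two common kernels written as plain weight functions. The function names are hypothetical and the bandwidth of 5 is an assumption carried over from the entrance exam example.

```r
# Triangular kernel: weight falls linearly from 1 at the cutoff
# to 0 at the edge of the bandwidth
triangular_weight <- function(distance, bw) {
  pmax(0, 1 - abs(distance) / bw)
}

# Uniform kernel: everyone inside the bandwidth counts equally
uniform_weight <- function(distance, bw) {
  as.numeric(abs(distance) <= bw)
}

# Distances from the cutoff, in exam points
distances <- c(-10, -5, -2.5, 0, 2.5, 5, 10)

tri_w <- triangular_weight(distances, bw = 5)
uni_w <- uniform_weight(distances, bw = 5)
data.frame(distances, tri_w, uni_w)
```

Weights like these can be passed to `lm()` through its `weights` argument, so closer observations pull harder on the fitted line. `rdrobust()` has a `kernel` argument for the same purpose.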

---

# Try everything!

.box-inv-4.medium[Your estimate of δ depends on all these:]

&nbsp;

.box-4.medium[Try lots of different combinations!]

---

layout: false
name: main-concerns
class: center middle section-title section-title-6 animated fadeIn

# Main RDD concerns

---

# It's greedy!

.box-inv-6.medium[You need *lots* of data,<br>since you're throwing most of it away]

.center[
<figure>
  <img src="10-slides_files/figure-html/bandwidth-plots-1.png" alt="Different bandwidths" title="Different bandwidths" width="60%">
</figure>
]

---

# It's limited in scope!

.box-inv-6.medium[You're only measuring the ATE<br>for people in the bandwidth]

.box-6.medium[Local Average Treatment Effect (LATE)]

---

# It's limited in scope!

.box-inv-6.medium[You can't make population-level<br>claims with a LATE]

.box-inv-6.smaller[*(But can you really do that with RCTs or diff-in-diff?)*]

.box-6.medium["The realistic conclusion to draw is that<br>all quantitative empirical results<br>that we encounter are 'local'"]

.box-6.small[Angrist and Pischke, *Mostly Harmless Econometrics*, pp. 23–24]

---

# Graphics are neat!

---

# Which gaps are significant?

---

# All of them!

---

# Don't rely *only* on graphics

.pull-left[
.box-inv-6.medium[Make graphs,<br>but also find the<br>actual δ value]
]

.pull-right[
<img src="10-slides_files/figure-html/too-graphical-plot-3-single-1.png" width="100%" style="display: block; margin: auto;" />
]

---

# Manipulation!

.box-inv-6.medium[People might know about the cutoff<br>and change their behavior]

---

???

Data from <https://faculty.chicagobooth.edu/george.wu/research/marathon/data.htm> and <https://doi.org/10.1287/mnsc.2015.2417>

---

.center[
<figure>
  <img src="img/10/basketball.png" alt="NBA shot locations, 2014-15" title="NBA shot locations, 2014-15" width="60%">
</figure>
]

???

<https://fivethirtyeight.com/features/how-mapping-shots-in-the-nba-changed-it-forever/>

---

# Manipulation!

.box-inv-6.medium[Check with a McCrary density test]

.box-6.small[`rddensity::rdplotdensity()` in R]
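
---

# Manipulation!

The idea behind a density test can be illustrated with a much cruder check in base R: count observations just below and just above the cutoff and ask whether the split is lopsided. This is not the McCrary test itself. It is a simplified stand-in on simulated, unmanipulated scores, and the window of 2 points is an assumption.

```r
set.seed(1234)
# Hypothetical running variable with no sorting around the cutoff
scores <- runif(5000, min = 30, max = 100)
cutoff <- 70
window <- 2

just_below <- sum(scores > cutoff - window & scores <= cutoff)
just_above <- sum(scores > cutoff & scores <= cutoff + window)

# With no manipulation, an observation inside the window is about
# equally likely to fall on either side of the cutoff
check <- binom.test(just_below, just_below + just_above, p = 0.5)
c(below = just_below, above = just_above, p_value = check$p.value)
```

A pile-up on the side that gets the program would show up as a very uneven split. For real analyses, the formal version is the density test in the `rddensity` package, roughly `rddensity::rddensity(X = scores, c = 70)`, whose result can be passed to `rdplotdensity()` along with the running variable.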

---

# Noncompliance!

.box-inv-6.medium[People on the margin of the cutoff<br>might end up in/out of the program]

.box-6.sp-after[The ACA, subsidies, Medicaid, and 138% of the poverty line]

.box-inv-6.medium[Sharp vs. fuzzy discontinuities]

---

# Sharp discontinuity

---

# Fuzzy discontinuity

---

# Fuzzy discontinuities

.box-inv-6.medium[Address noncompliance with<br>instrumental variables<br>(more on this later!)]

.box-6.sp-after[Use an instrument for which side<br>of the cutoff people should be on]