Regression discontinuity I

Session 10

PMAP 8521: Program evaluation
Andrew Young School of Policy Studies

Plan for today

Arbitrary cutoffs and causal inference

Drawing lines and measuring gaps

Main RDD concerns

Arbitrary cutoffs
and causal inference

Quasi-experiments again

Instead of using carefully adjusted DAGs,
we can use context to isolate/identify the pathway between
treatment and outcome in observational data

Diff-in-diff was one kind of quasi-experiment

Treatment/control + before/after

Regression discontinuity designs (RDD) are another

Arbitrary rules determine access to programs

Rules to access programs

Lots of policies and programs are
based on arbitrary rules and thresholds

If you're above the threshold, you're in the program;
if you're below, you're not (or vice versa)

Key terms

Running / forcing variable

Index or measure that determines eligibility

Cutoff / cutpoint / threshold

Number that formally assigns access to program

Discontinuities everywhere!

Household size   Annual    Monthly   138%      150%      200%
1                $12,760   $1,063    $17,609   $19,140   $25,520
2                $17,240   $1,437    $23,791   $25,860   $34,480
3                $21,720   $1,810    $29,974   $32,580   $43,440
4                $26,200   $2,183    $36,156   $39,300   $52,400
5                $30,680   $2,557    $42,338   $46,020   $61,360
6                $35,160   $2,930    $48,521   $52,740   $70,320
7                $39,640   $3,303    $54,703   $59,460   $79,280
8                $44,120   $3,677    $60,886   $66,180   $88,240

Medicaid: 138%*
ACA subsidies: 138–400%*
CHIP: 200%
SNAP/Free lunch: 130%
Reduced lunch: 130–185%

Hypothetical tutoring program

Students take an entrance exam

Those who score 70 or lower
get a free tutor for the year

Students then take an exit exam
at the end of the year
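
To make the setup concrete, here is a minimal sketch that simulates data shaped like this hypothetical program. The sample size, score range, slope, noise, and the 10-point tutoring effect are all assumptions chosen for illustration; they are not the values behind the slides' `tutoring` dataset.

```r
# Hypothetical simulated data; none of these numbers come from the slides' dataset
set.seed(1234)
n <- 1000
cutoff <- 70

entrance_exam <- round(runif(n, min = 30, max = 100))  # running variable
tutoring <- entrance_exam <= cutoff                    # rule: 70 or lower gets a tutor

# Assumed outcome: exit scores rise with entrance scores,
# and tutoring adds 10 points (an arbitrary choice)
exit_exam <- 40 + 0.5 * (entrance_exam - cutoff) + 10 * tutoring + rnorm(n, sd = 5)

sim_tutoring <- data.frame(entrance_exam, tutoring, exit_exam)
head(sim_tutoring)

# Everyone at or below the cutoff is treated; nobody above it is
table(above_cutoff = sim_tutoring$entrance_exam > cutoff,
      tutored = sim_tutoring$tutoring)
```

The cross-tabulation should have two empty cells, which is what a strictly enforced rule looks like in data.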

Causal inference intuition

The people right before and right after the threshold are essentially the same

Pseudo treatment and control groups!

Compare outcomes for those
right before/after, calculate difference
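
The comparison described above can be sketched as a difference in means inside a narrow window. The data are simulated under assumed values (a true jump of 10 points at a cutoff of 70), and the ±2 point window is an arbitrary choice.

```r
# Simulated data under assumed values; the true jump at the cutoff is 10
set.seed(1234)
n <- 5000
entrance_exam <- runif(n, 30, 100)
tutoring <- entrance_exam <= 70
exit_exam <- 40 + 0.5 * (entrance_exam - 70) + 10 * tutoring + rnorm(n, sd = 5)

# Keep only students within 2 points of the cutoff
window <- 2
near <- abs(entrance_exam - 70) <= window

mean_treated <- mean(exit_exam[near & tutoring])
mean_control <- mean(exit_exam[near & !tutoring])

# Raw difference between the pseudo treatment and control groups
naive_gap <- mean_treated - mean_control
naive_gap
```

Even in a narrow window, the raw difference still mixes the jump with whatever slope exists inside the window, which is one reason to fit lines on each side instead of only comparing means.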

Geographic discontinuities

Holbein time zones

Lower turnout in counties on the eastern side of the boundary

Election schedules cause fluctuations in turnout

Time discontinuities

Hospital stays title

California requires that insurance cover two days of post-partum hospitalization

Does extra time in the hospital improve health outcomes?

Time discontinuities

Hospital stays duration

Delivering at 12:01 AM makes you stay longer in the hospital…

Time discontinuities

Hospital stays outcomes

 

…but delivering at 12:01 AM has no effect on readmission rates or mortality rates

Test score discontinuities

Flagship universities

Does going to the main state university (e.g. UGA) make you earn more money?

An SAT score threshold serves as an arbitrary cutoff for admission to the university

Test score discontinuities

Flagship cutoff

Cutoff seems rule-based

Flagship outcome

Earnings are slightly higher

RDDs are all the rage

People love these things!

They're intuitive, compelling, and highly graphical

RDD p-hacking

RDD is less susceptible to p-hacking and selective publication than diff-in-diff (DID) or instrumental variables (IV)

Drawing lines
and measuring gaps

Main goal of RD

Measure the gap in outcome for
people on both sides of the cutpoint

Gap = δ =
local average treatment effect (LATE)

Drawing lines

The size of the gap depends on how
you draw the lines on each side of the cutoff

The type of lines you choose can
change the estimate of δ—sometimes by a lot!

There's no one right way to draw lines!

Line-drawing considerations

Parametric vs. non-parametric lines

Measuring the gap

Bandwidths

Kernels

Parametric lines

Formulas with parameters

$y = mx + b$

$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2$

$y = 10 + 4x$

Parametric lines

Not just for straight lines!
Make curvy with exponents or trigonometry

$y = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^7$

$y = \beta_0 + \beta_1 x + \beta_2 \sin(x)$

$y = 120 - 3x + 0.07x^2$

$y = 300 - 25x + 0.65x^2 - 0.004x^3$

$y = 10 + 4x + 50 \times \sin\left(\frac{x}{4}\right)$

Parametric lines

 

It's important to get the parameters right!

Line should fit the data pretty well
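
One way to see why the choice of parameters matters is to fit two parametric lines to the same curved data and compare them. This is a sketch on simulated data; the curve's coefficients and the noise level are arbitrary assumptions.

```r
# Simulated curved data; coefficients chosen arbitrarily for illustration
set.seed(1234)
x <- runif(500, 0, 100)
y <- 120 - 3 * x + 0.07 * x^2 + rnorm(500, sd = 10)

# Straight line: y = b0 + b1 * x
fit_linear <- lm(y ~ x)

# Curved line: y = b0 + b1 * x + b2 * x^2
fit_quadratic <- lm(y ~ x + I(x^2))

# Share of the variation in y that each line explains
r2_linear <- summary(fit_linear)$r.squared
r2_quadratic <- summary(fit_quadratic)$r.squared
c(linear = r2_linear, quadratic = r2_quadratic)

# The quadratic fit should recover something close to the assumed curve
coef(fit_quadratic)
```

In-sample fit always improves as terms are added, so a higher R² alone does not justify a wigglier line; plotting the fitted line over the points is still worth doing.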

Nonparametric lines

Lines without parameters

Use the data to find the best line,
often with windows and moving averages

Locally estimated/weighted scatterplot smoothing (LOESS/LOWESS)
is a common method (but not the only one!)
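
As a rough sketch of the nonparametric idea, base R's `loess()` can fit a separate smooth line on each side of the cutoff, and the two fits can then be compared at the cutoff itself. The data are simulated under assumed values (true jump of 10), and `loess()` runs with its default span; this is not how `rdrobust` computes its estimate.

```r
# Simulated data under assumed values; the true jump at the cutoff is 10
set.seed(1234)
n <- 1000
entrance_exam <- runif(n, 30, 100)
tutoring <- entrance_exam <= 70
exit_exam <- 40 + 0.5 * (entrance_exam - 70) + 10 * tutoring + rnorm(n, sd = 5)
sim <- data.frame(entrance_exam, tutoring, exit_exam)

# surface = "direct" lets predict() evaluate at the edge of each side's data
loess_ctrl <- loess.control(surface = "direct")

fit_below <- loess(exit_exam ~ entrance_exam, data = sim[sim$tutoring, ],
                   control = loess_ctrl)
fit_above <- loess(exit_exam ~ entrance_exam, data = sim[!sim$tutoring, ],
                   control = loess_ctrl)

# Predicted exit score at the cutoff from each side, then the gap between them
at_cutoff <- data.frame(entrance_exam = 70)
gap_loess <- predict(fit_below, at_cutoff) - predict(fit_above, at_cutoff)
gap_loess
```

No slope or intercept is reported for either side, which is the sense in which these are lines without parameters.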

y=who knows?

Measuring gap with parametric lines

Easiest way: center the running variable around the threshold

id   exit_exam   entrance_exam   entrance_centered   tutoring
1    78          92              22                  FALSE
2    58          73              3                   FALSE
3    62          54              -16                 TRUE
4    67          98              28                  FALSE
5    54          70              0                   TRUE

$y = \beta_0 + \beta_1 \text{Running variable (centered)} + \beta_2 \text{Indicator for treatment}$
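
A short worked step shows why centering helps. It assumes the model exactly as written above, with one shared slope on both sides of the cutoff. At the cutoff the centered running variable equals 0, so the slope term drops out:

```latex
\begin{aligned}
\text{Treated, at the cutoff:}\quad y &= \beta_0 + \beta_1 (0) + \beta_2 (1) = \beta_0 + \beta_2 \\
\text{Untreated, at the cutoff:}\quad y &= \beta_0 + \beta_1 (0) + \beta_2 (0) = \beta_0 \\
\delta &= (\beta_0 + \beta_2) - \beta_0 = \beta_2
\end{aligned}
```

The coefficient on the treatment indicator is therefore the gap at the cutoff, and the intercept is the predicted outcome for an untreated person right at the threshold.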

library(tidyverse)
library(broom)

# Center the running variable so that 0 marks the cutoff
program_data <- tutoring %>%
  mutate(entrance_centered = entrance_exam - 70)

# The coefficient on tutoring is the gap (δ) at the cutoff
model1 <- lm(exit_exam ~ entrance_centered + tutoring,
             data = program_data)
tidy(model1)
## # A tibble: 3 × 3
##   term              estimate std.error
##   <chr>                <dbl>     <dbl>
## 1 (Intercept)         59.3      0.440
## 2 entrance_centered    0.514    0.0268
## 3 tutoringTRUE        11.0      0.802

Measuring gap with nonparametric lines

Can't read the gap off a single lm() coefficient; use the rdrobust R package instead

library(rdrobust)

# rdrobust() reports the outcome just above the cutoff minus the outcome just below it.
# Tutored students are the ones at or below 70, so a negative coefficient here
# means tutored students scored higher, in line with the parametric estimate.
summary(rdrobust(y = tutoring$exit_exam, x = tutoring$entrance_exam, c = 70))
## =============================================================================
##         Method     Coef. Std. Err.         z     P>|z|      [ 95% C.I. ]
## =============================================================================
##   Conventional    -9.992     1.708    -5.852     0.000   [-13.339 , -6.646]
##         Robust         -         -    -4.992     0.000   [-14.244 , -6.212]
## =============================================================================

Bandwidths

All you really care about is the
area right around the cutoff

Observations far away don't matter
because they're not comparable

Bandwidth = window around cutoff
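
In a parametric setup, a bandwidth can be applied by keeping only the rows inside the window before fitting the model. The sketch below uses simulated data with assumed values (true jump of 10) and an arbitrary bandwidth of 5 points.

```r
# Simulated data under assumed values; the true jump at the cutoff is 10
set.seed(1234)
n <- 5000
entrance_exam <- runif(n, 30, 100)
tutoring <- entrance_exam <= 70
exit_exam <- 40 + 0.5 * (entrance_exam - 70) + 10 * tutoring + rnorm(n, sd = 5)
sim <- data.frame(entrance_exam, tutoring, exit_exam)
sim$entrance_centered <- sim$entrance_exam - 70

# Keep only observations within 5 points of the cutoff
bandwidth <- 5
in_window <- sim[abs(sim$entrance_centered) <= bandwidth, ]

model_bw <- lm(exit_exam ~ entrance_centered + tutoring, data = in_window)
gap_bw <- coef(model_bw)[["tutoringTRUE"]]
gap_bw

# How much data the window discards
c(all_rows = nrow(sim), rows_in_window = nrow(in_window))
```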

Bandwidths

Algorithms exist to choose optimal width

Also use common sense

Maybe ±5 for the entrance exam?

For robustness, check what happens
if you double and halve the bandwidth
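
The doubling and halving check can be sketched as a loop over three bandwidths. The data are simulated under assumed values (true jump of 10), and the starting bandwidth of 5 is the common-sense guess from above rather than an algorithmically chosen width.

```r
# Simulated data under assumed values; the true jump at the cutoff is 10
set.seed(1234)
n <- 5000
entrance_exam <- runif(n, 30, 100)
tutoring <- entrance_exam <= 70
exit_exam <- 40 + 0.5 * (entrance_exam - 70) + 10 * tutoring + rnorm(n, sd = 5)
sim <- data.frame(entrance_exam, tutoring, exit_exam)
sim$entrance_centered <- sim$entrance_exam - 70

# Chosen bandwidth, plus half and double that width
bandwidths <- c(half = 2.5, chosen = 5, double = 10)

# Refit the same model inside each window and pull out the gap
gaps <- sapply(bandwidths, function(bw) {
  in_window <- sim[abs(sim$entrance_centered) <= bw, ]
  model <- lm(exit_exam ~ entrance_centered + tutoring, data = in_window)
  unname(coef(model)["tutoringTRUE"])
})
gaps
```

If the three estimates tell a similar story, the result is less likely to be an artifact of one particular window.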

Kernels

Because we care the most about
observations right by the cutoff,
give more distant ones less weight

Kernel = method for assigning importance to
observations based on distance to the cutoff
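
A kernel can be sketched as a weight column passed to `lm()`. The example builds a triangular kernel by hand: weight 1 at the cutoff, falling linearly to 0 at the edge of the bandwidth. The data are simulated under assumed values (true jump of 10), and the bandwidth of 10 is arbitrary.

```r
# Simulated data under assumed values; the true jump at the cutoff is 10
set.seed(1234)
n <- 5000
entrance_exam <- runif(n, 30, 100)
tutoring <- entrance_exam <= 70
exit_exam <- 40 + 0.5 * (entrance_exam - 70) + 10 * tutoring + rnorm(n, sd = 5)
sim <- data.frame(entrance_exam, tutoring, exit_exam)
sim$entrance_centered <- sim$entrance_exam - 70

# Triangular kernel: closer to the cutoff means more weight
bandwidth <- 10
sim$kernel_weight <- pmax(0, 1 - abs(sim$entrance_centered) / bandwidth)
in_window <- sim[sim$kernel_weight > 0, ]

# Uniform kernel: every observation in the window counts equally
model_uniform <- lm(exit_exam ~ entrance_centered + tutoring, data = in_window)

# Triangular kernel: same model, weighted by distance to the cutoff
model_triangular <- lm(exit_exam ~ entrance_centered + tutoring,
                       data = in_window, weights = kernel_weight)

gap_uniform <- coef(model_uniform)[["tutoringTRUE"]]
gap_triangular <- coef(model_triangular)[["tutoringTRUE"]]
c(uniform = gap_uniform, triangular = gap_triangular)
```

`rdrobust()` uses a triangular kernel by default, so a weighted fit like this is closer in spirit to what it does than an unweighted one.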

Try everything!

Your estimate of δ depends on all these:

Line type (parametric vs. nonparametric)

Bandwidth (wide vs. narrow)

Kernel weighting

Try lots of different combinations!

Main RDD concerns

It's greedy!

You need lots of data,
since you're throwing most of it away

Different bandwidths

It's limited in scope!

You're only measuring the ATE
for people in the bandwidth

Local Average Treatment Effect (LATE)

It's limited in scope!

You can't make population-level
claims with a LATE

(But can you really do that with RCTs or diff-in-diff?)

"The realistic conclusion to draw is that
all quantitative empirical results
that we encounter are 'local'"

Angrist and Pischke, Mostly Harmless Econometrics, pp. 23–24

Graphics are neat!

Which gaps are significant?

All of them!

Don't rely only on graphics

Super clear breaks are uncommon

Make graphs,
but also find the
actual δ value

Manipulation!

People might know about the cutoff
and change their behavior

People might fudge numbers or work to
cross the threshold to get in/out of program

If so, those right next to the cutoff are
no longer comparable treatment/control groups

NBA shot locations, 2014-15

Manipulation!

Check with a McCrary density test

rddensity::rdplotdensity() in R
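
The intuition behind a density check can be sketched in base R by counting observations just below and just above the cutoff. This is a deliberately crude stand-in and not the McCrary test or the `rddensity` estimator: it uses a simple binomial test on the two counts. The manipulation rule (half of the students who would score 70 to 73 drop three points to qualify) is an assumption invented for the simulation.

```r
set.seed(1234)
n <- 5000
honest_score <- runif(n, 30, 100)

# Assumed manipulation: half of the students just above 70 lower their score by 3
would_fudge <- honest_score > 70 & honest_score <= 73 & runif(n) < 0.5
manipulated_score <- ifelse(would_fudge, honest_score - 3, honest_score)

# Count observations in a 3-point window on each side of the cutoff
count_near_cutoff <- function(score, cutoff = 70, window = 3) {
  c(below = sum(score > cutoff - window & score <= cutoff),
    above = sum(score > cutoff & score <= cutoff + window))
}

honest_counts <- count_near_cutoff(honest_score)
manipulated_counts <- count_near_cutoff(manipulated_score)
rbind(honest = honest_counts, manipulated = manipulated_counts)

# Without sorting, a score near the cutoff is about equally likely to fall on either side
p_honest <- binom.test(honest_counts[["below"]], sum(honest_counts))$p.value
p_manipulated <- binom.test(manipulated_counts[["below"]],
                            sum(manipulated_counts))$p.value
c(honest = p_honest, manipulated = p_manipulated)
```

A pile-up on one side of the threshold is the warning sign; the dedicated density tests make the same comparison with smoothed density estimates rather than raw counts.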

Noncompliance!

People on the margin of the cutoff
might end up in/out of the program

The ACA, subsidies, Medicaid, and 138% of the poverty line

Sharp vs. fuzzy discontinuities

Sharp discontinuity

Perfect compliance

Fuzzy discontinuity

Imperfect compliance

Fuzzy discontinuities

Address noncompliance with
instrumental variables
(more on this later!)

Use an instrument for which side
of the cutoff people should be on

Effect is only for compliers near the cutoff
(complier LATE; doubly local effect)
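
The instrument logic can be sketched as a ratio of two gaps: the jump in the outcome at the cutoff divided by the jump in the probability of treatment. The data are simulated with assumed compliance rates (80% of eligible students take tutoring, 10% of ineligible students get it anyway), an assumed true effect of 10, and an arbitrary bandwidth of 10.

```r
set.seed(1234)
n <- 5000
entrance_centered <- runif(n, -40, 30)

# The instrument: which side of the cutoff a student should be on
eligible <- entrance_centered <= 0

# Assumed noncompliance on both sides of the cutoff
tutoring <- ifelse(eligible, runif(n) < 0.8, runif(n) < 0.1)
exit_exam <- 40 + 0.5 * entrance_centered + 10 * tutoring + rnorm(n, sd = 5)
sim <- data.frame(entrance_centered, eligible,
                  tutoring = as.numeric(tutoring), exit_exam)

bandwidth <- 10
in_window <- sim[abs(sim$entrance_centered) <= bandwidth, ]

# First stage: jump in the probability of treatment at the cutoff
first_stage <- lm(tutoring ~ entrance_centered + eligible, data = in_window)

# Reduced form: jump in the outcome at the cutoff
reduced_form <- lm(exit_exam ~ entrance_centered + eligible, data = in_window)

jump_in_treatment <- coef(first_stage)[["eligibleTRUE"]]
jump_in_outcome <- coef(reduced_form)[["eligibleTRUE"]]

# Scale the outcome gap by the share of people whose treatment the cutoff changed
fuzzy_gap <- jump_in_outcome / jump_in_treatment
c(jump_in_treatment = jump_in_treatment,
  jump_in_outcome = jump_in_outcome,
  fuzzy_gap = fuzzy_gap)
```

Because the cutoff only changes treatment status for some people, the outcome gap alone understates the effect; dividing by the treatment gap rescales it to the compliers near the cutoff. `rdrobust()` also accepts a `fuzzy` argument for this case.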
