After discussing the data and presenting descriptive statistics, you normally
turn to your empirical framework, that is, the research design you use to
answer your research question empirically.
An empirical framework consists of two related components: (i) an
estimation strategy (i.e., what is estimated, how it is estimated, and how
statistical inference is conducted), and (ii) an identification strategy (i.e.,
what feature of the data allows making a causal statement or, if that is not
possible, how we know we are getting close to making such a statement).
2.4.1 Estimation Strategy
An estimation strategy typically consists of the equations to be estimated in
an effort to answer a research question. Though it may be possible for a
savvy reader to recover the estimated equations in a paper by looking at the
tables therein, that is not always possible. At any rate, the amount of work a
reader should have to do should be kept to a minimum, so presenting the
equations to be estimated is very much the norm.
Ideally, those equations will be as parsimonious as possible. Although a
regression might include 10 to 15 control variables, it is best to put all of
those into a vector x of control variables. What deserves its own variable in
an equation to be shown in an estimation framework? For starters, the
dependent variable (labeled y) should be included along with the treatment
variable (labeled either D or T), the (vector of) controls (labeled x), an
intercept term (labeled α), and the error term (labeled ϵ).
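As a rough sketch of what such a parsimonious equation can look like, the display below collects the five components just listed. The unit subscript i and the coefficient labels γ and β are assumptions made here for illustration; x_i stands for the whole vector of controls, with β a conformable coefficient vector.

```latex
% Illustrative only: subscript i and coefficient names are assumed.
\begin{equation*}
  y_i = \alpha + \gamma D_i + \beta x_i + \epsilon_i
\end{equation*}
```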
Here are, in no particular order, a few other norms that are best followed:
• All variables should have the proper subscripts, usually labeled
i, j, k, l, and so forth, from the smallest (e.g., individual) to the
largest level (e.g., region).
• Latin letters should denote variables. Greek letters should denote
coefficients.
• If the estimation strategy subsection features several different
specifications of the same equation, coefficients should also have
subscripts. In other words, one should not reuse estimand
notation. If β is used to denote the coefficient of interest in a
regression of y on D, it should not be reused to denote the
coefficient of interest in a regression of y on D and x as well; the
two estimands being different, the notation used to denote them
should also be different. This is best done by adding numerical
subscripts to each coefficient, so that in the former specification,
the coefficient on D would be denoted β_0 and in the latter, β_1. Or
it can be done by adding letter subscripts to each coefficient, so
that for example β_r and β_s can respectively refer to reduced-form
and structural estimates of the same coefficient.
• The estimation strategy subsection should also specify what
estimation method is used to estimate each estimable equation.
We are generally interested in E(y|x), but E(y|x) could be
estimated in a number of different ways: parametrically,
semiparametrically, or nonparametrically. With a binary outcome
variable, the reader needs to know whether a linear probability
model, a probit, or a logit is estimated. In cases where it is
ambiguous, the estimator (e.g., least squares, maximum
likelihood, or generalized method of moments) also needs to be
specified.
• After presenting the estimable equations, it is a good idea to
discuss the relevant hypothesis tests. In a regression of the form
y = α + γD + βx + ϵ, (2.1)
for instance, the relevant hypothesis test would be of the form
H_0: γ = 0 versus H_A: γ ≠ 0. Here, note that a hypothesis test
always tests for an equality sign. So while a paper might test the
(theoretical) hypothesis that changing D from 0 to 1 causes an
increase in y (and further assesses by how much y increases in
response to the change in D), statistically speaking, the same
paper tests the (null) hypothesis that the association between D
and y is not statistically significantly different from zero.
• The estimation strategy subsection also needs to discuss inference,
meaning whether and how the standard errors are robust (and if
so, robust to what; it is not enough to say that the standard errors
are robust if the Huber-Sandwich-White correction is used, but it
is warranted to say that they are robust to heteroskedasticity),
whether and how they are clustered (and if so, at what level and
why; see Abadie et al. 2017 for a primer), and whether sampling
weights were used to bring the sample closer to the population of
interest (and if so, how they were constructed; see Solon et al.
2015 for a primer).
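
To make the last few norms concrete, here is a minimal sketch in Python (using NumPy) of estimating equation (2.1) by least squares, computing heteroskedasticity-robust and cluster-robust standard errors, and testing H_0: γ = 0. Everything about the data is an assumption made for illustration: the sample is simulated, the clusters and the cluster-level treatment assignment are hypothetical, the "true" coefficient values are arbitrary, and the HC1 and cluster small-sample scalings are one common choice among several. The p-value relies on a normal approximation.

```python
import math
import numpy as np

rng = np.random.default_rng(42)

# Simulated, purely illustrative data: G hypothetical clusters of m units
# each, treatment D assigned at the cluster level, and one control x.
G, m = 50, 20
n = G * m
groups = np.repeat(np.arange(G), m)
D = (groups % 2 == 0).astype(float)        # even-numbered clusters are treated
x = rng.normal(size=n)
cluster_shock = rng.normal(scale=0.5, size=G)[groups]
# Error term: a common cluster shock plus heteroskedastic noise.
eps = cluster_shock + rng.normal(size=n) * (1.0 + 0.5 * np.abs(x))
y = 1.0 + 2.0 * D + 0.5 * x + eps          # arbitrary "true" values, gamma = 2

# Design matrix for y = alpha + gamma*D + beta*x + eps.
X = np.column_stack([np.ones(n), D, x])
k = X.shape[1]

# Least squares point estimates.
XtX_inv = np.linalg.inv(X.T @ X)
coef = XtX_inv @ (X.T @ y)
resid = y - X @ coef

# Classical (homoskedastic) standard errors, shown for comparison only.
sigma2 = resid @ resid / (n - k)
se_classical = np.sqrt(np.diag(sigma2 * XtX_inv))

# Heteroskedasticity-robust (HC1) sandwich: bread * meat * bread.
meat_hc = (X * resid[:, None] ** 2).T @ X
V_hc1 = (n / (n - k)) * XtX_inv @ meat_hc @ XtX_inv
se_robust = np.sqrt(np.diag(V_hc1))

# Cluster-robust sandwich: sum the scores X_i * e_i within each cluster first.
scores = X * resid[:, None]
cluster_scores = np.zeros((G, k))
np.add.at(cluster_scores, groups, scores)
meat_cl = cluster_scores.T @ cluster_scores
V_cl = (G / (G - 1)) * ((n - 1) / (n - k)) * XtX_inv @ meat_cl @ XtX_inv
se_cluster = np.sqrt(np.diag(V_cl))

# Test H_0: gamma = 0 against H_A: gamma != 0 using the clustered SE.
gamma_hat = coef[1]
t_stat = gamma_hat / se_cluster[1]
p_value = math.erfc(abs(t_stat) / math.sqrt(2.0))  # two-sided, normal approx.

print(f"gamma_hat = {gamma_hat:.3f}")
print(f"SE classical = {se_classical[1]:.3f}, "
      f"HC1 = {se_robust[1]:.3f}, clustered = {se_cluster[1]:.3f}")
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```

In a write-up, the corresponding text would state the estimator (least squares), what the standard errors are robust to (heteroskedasticity), and the level of clustering and the reason for it, rather than describing the standard errors only as "robust."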