Instrumental variable (IV) analysis seems an attractive method to control for unmeasured confounding in observational epidemiological studies. Here, we provide an overview of the estimation methods of IV analysis and indicate their possible advantages and limitations. We found that two-stage least squares is the method of first choice if exposure and outcome are both continuous and show a linear relation. In case of a nonlinear relation, two-stage residual inclusion may be a suitable alternative. In settings with binary outcomes as well as nonlinear relations between exposure and outcome, generalized method of moments (GMM), structural mean models (SMM), and bivariate probit models perform well, yet GMM and SMM are generally more robust. The standard errors of the IV estimate can be estimated using a robust or bootstrap method. All estimation methods are prone to bias when the IV assumptions are violated. Researchers should be aware of the underlying assumptions of the estimation methods, as well as of the key assumptions of the IV, when interpreting the exposure effects estimated through IV analysis.
Keywords: Instrumental variables; Estimation method; Unobserved confounding; Epidemiology; Statistical methods; Observational studies; Causal inference
Instrumental variable (IV) analysis has primarily been used in economics and social science research as a tool for causal inference, but has begun to appear in epidemiologic research over the last decade to control for unmeasured confounding [1-6]. An IV is a variable that can be considered to mimic the treatment assignment process in a randomized study [7-10]. IV analysis generally involves a two-stage modelling approach to estimate exposure effects: in the first stage, the effect of the IV on exposure is estimated, whereas in the second stage, outcomes are compared in terms of the predicted exposure rather than the actual exposure [11]. To appraise the estimates obtained through IV analysis, it is important to understand the underlying methodology of the estimation methods used in IV analysis.
Over the last decade, several reviews of IV analysis have been published, covering various aspects including the key assumptions, estimated parameters, possible IVs, estimation methods, reporting of results, and the use of IVs in comparative effectiveness research [3,4,12-23]. We summarize these reviews in Table 1. However, none of these articles covered all possible estimation methods of IV analysis. Hence, we aimed to provide an overview of the estimation methods and to indicate their possible advantages and limitations. After a general introduction to the assumptions underlying IV analysis, we describe the methods that have been used in IV studies in medical research.
Author | Publication year | Journal name | Title | Main features
---|---|---|---|---
Greenland | 2000 | International Journal of Epidemiology | An introduction to instrumental variables for epidemiologists | Basic introduction with an empirical example; link with randomized studies with non-compliance; estimated bounds for the exposure effects
Martens et al. | 2006 | Epidemiology | Instrumental variables: application and limitations | Fundamental issues are described with several practical details using graphical representation
Hernan and Robins | 2006 | Epidemiology | Instruments for causal inference: an epidemiologist's dream? | Overview of IV analysis with explanation of several key assumptions; highlights limitations, with emphasis on the estimated parameters of IV analysis
Rassen et al. | 2009 | Journal of Clinical Epidemiology | Instrumental variables I: instrumental variables exploit natural variation in nonexperimental data to estimate causal relationships | Demonstrates how IV analysis arises from an analogous but potentially impossible RCT design; shows estimation of effects with an empirical example
Rassen et al. | 2009 | Journal of Clinical Epidemiology | Instrumental variables II: instrumental variable application—in 25 variations, the physician prescribing preference generally was strong and reduced covariate imbalance | Assesses the overall relationship between strength and imbalance of confounders between IV categories with an empirical example; assesses several possible IVs
Rassen et al. | 2009 | American Journal of Epidemiology | Instrumental variable analysis for estimation of treatment effects with dichotomous outcomes | Reviews commonly used IV estimation methods for binary outcomes and compares them in empirical examples
Brookhart et al. | 2010 | Pharmacoepidemiology and Drug Safety | Instrumental variable methods in comparative safety and effectiveness research | Guidance on reporting of IV analysis with an empirical example
Clarke and Windmeijer | 2010 | Journal of the American Statistical Association | Instrumental variable estimators for binary outcomes | Estimation methods of IV analysis for binary outcomes with mathematical descriptions
Chen and Briesacher | 2011 | Journal of Clinical Epidemiology | Use of instrumental variable in prescription drug research with observational data: a systematic review | Review of the practice of IV analysis in epidemiology
Palmer et al. | 2011 | American Journal of Epidemiology | Instrumental variable estimation of causal risk ratios and causal odds ratios in Mendelian randomization | Overview of commonly used IV estimation methods for continuous exposure; empirical example of a Mendelian randomization study
Davies et al. | 2013 | Epidemiology | Issues in the reporting and conduct of instrumental variable studies: a systematic review | Review of the practice of IV analysis in epidemiology; focus on the target parameter (e.g., RD, OR); reviews methods used to estimate standard errors; proposes a checklist of information to be reported by studies using instrumental variables
Swanson and Hernan | 2013 | Epidemiology | Commentary: How to report instrumental variable analyses (suggestions welcome) | Provides a flow chart for reporting of IV analyses
Baiocchi et al. | 2014 | Statistics in Medicine | Instrumental variable methods for causal inference | Generic tutorial and guidelines for IV analysis with an empirical example
Garabedian et al. | 2014 | Annals of Internal Medicine | Potential Bias of Instrumental Variable Analyses for Observational Comparative Effectiveness Research | Found that the results of IV analyses may be substantially biased if the IV and outcome are related through an unadjusted third variable (an "IV-outcome confounder"); cautions against overreliance on IV studies in comparative effectiveness research
Table 1: Introductory and review articles on instrumental variable analysis in epidemiologic studies (2000-2014)
Instrumental variables
The IV is an observed variable that is related to the exposure and related to the outcome only through the exposure. This resembles a randomized trial, in which treatment allocation typically almost perfectly coincides with the actual treatment received and (in case of a double-blind trial) treatment assignment affects the outcome only through the received treatment (hence the term pseudo-randomisation that is used for IV methods). This implies that an IV is neither directly nor indirectly (e.g., through observed or unobserved confounders) associated with the outcome [6,18,24]. Therefore, all observed and unobserved confounders should on average be equally distributed among different levels of the IV (similar to a randomized trial). These assumptions are illustrated in Figure 1. Along with these basic assumptions, there are other assumptions (i.e., homogeneous treatment effects, monotonicity) that are needed for point identification of IV estimates [14,19].
Figure 1: Schematic presentation of valid and invalid instrumental variables. X, Y, Z, and U denote the exposure, outcome, IV, and confounders (observed or unobserved), respectively. a) Z is associated with X and related to Y only through X (valid IV); b) Z is not associated with X (first IV assumption is violated); c) Z is not independent of confounders, i.e., Z has an indirect effect on Y (second IV assumption is violated); d) Z is not independent of Y given X and U, i.e., Z has a direct effect on Y (third IV assumption is violated)
Notation
Throughout this article, we use the following notation: $Y$ denotes the outcome, $X$ the exposure, and $Z$ the IV. $C$ and $U$ denote the (one or more) observed and unobserved confounding variables, respectively. $\hat{X}$ denotes the predicted value of the exposure. Finally, $\hat{\beta}_{IV}$ indicates the IV estimator, i.e., the estimator of the causal relation between exposure and outcome.
Estimation methods of IV analysis
Ratio estimator (RE)
In a study with a single binary IV, the RE (also called the Wald [25] or grouping estimator) can be applied, which is expressed as:

$$\hat{\beta}_{IV} = \frac{\bar{y}_1 - \bar{y}_0}{\bar{x}_1 - \bar{x}_0} \qquad (1)$$

$$\hat{\beta}_{IV} = \frac{\bar{y}_1 - \bar{y}_0}{P(X=1 \mid Z=1) - P(X=1 \mid Z=0)} \qquad (2)$$

$$\hat{\beta}_{IV} = \frac{P(Y=1 \mid Z=1) - P(Y=1 \mid Z=0)}{P(X=1 \mid Z=1) - P(X=1 \mid Z=0)} \qquad (3)$$

where $\bar{y}_1$ and $\bar{x}_1$ are the means of $Y$ and $X$, respectively, when $Z=1$, and $\bar{y}_0$ and $\bar{x}_0$ are the corresponding means when $Z=0$; $P(X=1 \mid Z=1) - P(X=1 \mid Z=0)$ is the difference in the probability of being exposed between $Z=1$ and $Z=0$; and $P(Y=1 \mid Z=1) - P(Y=1 \mid Z=0)$ is the risk difference of an event between $Z=1$ and $Z=0$. Equation (1) is suitable for settings with continuous exposure and continuous outcome, equation (2) for binary exposure and continuous outcome [26,27], and equation (3) for binary exposure and binary outcome.
The RE is a simple method to estimate exposure effects in an IV analysis. However, it is not suitable for multiple IVs or for situations in which measured confounders need to be adjusted for in the analysis.
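As a minimal illustration, the Wald estimator of equation (1) can be computed directly from sample means. The following Python sketch uses simulated data with an unobserved confounder; the variable names and the data-generating model are hypothetical and serve only to show the mechanics.

```python
import numpy as np

def wald_estimator(y, x, z):
    """Ratio (Wald) estimator for a single binary IV z.

    With binary x and/or y, the same ratio of mean differences
    yields equations (2) and (3)."""
    z = np.asarray(z, dtype=bool)
    num = y[z].mean() - y[~z].mean()   # difference in mean outcome between IV levels
    den = x[z].mean() - x[~z].mean()   # difference in mean exposure between IV levels
    return num / den

# Simulated example with an unmeasured confounder u (hypothetical model)
rng = np.random.default_rng(0)
n = 10_000
z = rng.binomial(1, 0.5, n)               # binary instrument
u = rng.normal(size=n)                    # unmeasured confounder
x = 0.5 * z + u + rng.normal(size=n)      # exposure affected by z and u
y = 2.0 * x + u + rng.normal(size=n)      # true exposure effect = 2

print(wald_estimator(y, x, z))            # close to 2; naive OLS of y on x would be biased
```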
Two-stage least squares method (2SLS)
The best-known two-stage method for IV analysis is the 2SLS method, which is traditionally used in IV analyses [10,28,29]. Unlike the ratio estimator, this method is able to adjust for measured confounders. The 2SLS estimator can be obtained from the following models:

$$X = \alpha_0 + \alpha_1 Z + \varepsilon_1 \qquad (4)$$

$$Y = \beta_0 + \beta_{IV} \hat{X} + \varepsilon_2 \qquad (5)$$

The first model estimates the effect of the IV on exposure, whereas in the second model outcomes are compared in terms of the predicted exposure $\hat{X}$ rather than the actual exposure. The latter model yields the estimated parameter $\hat{\beta}_{IV}$, which is the IV estimator. For a single IV, $\hat{\beta}_{IV}$ is equivalent to the estimators in equations (1), (2), and (3). In case of multiple IVs, information on these IVs can be simultaneously incorporated in model (4); $\hat{\beta}_{IV}$ is then a weighted average of the ratio estimators [30]. With multiple IVs, 2SLS may provide biased estimates [30-32], and another method, e.g., limited information maximum likelihood (LIML) [33], can be an alternative. One of the conditions of this method is that the error term should be homoscedastic (homogeneity of variance); in case of heteroscedasticity, other methods (e.g., generalized method of moments) can be considered [34]. Moreover, 2SLS may produce biased results in the case of binary variables or a non-linear relation between exposure and outcome (Table 2).
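A minimal sketch of the two stages in equations (4) and (5), reusing the simulated z, x, y from the ratio-estimator example above; note that the standard error reported by the naive second-stage fit is incorrect (see the section on standard errors below).

```python
import statsmodels.api as sm

# Stage 1: regress exposure on the IV (measured confounders could be added here).
stage1 = sm.OLS(x, sm.add_constant(z)).fit()
x_hat = stage1.fittedvalues

# Stage 2: regress the outcome on the predicted exposure.
stage2 = sm.OLS(y, sm.add_constant(x_hat)).fit()
print(stage2.params[1])   # IV estimate of the exposure effect (close to 2)

# Caveat: stage2's reported SE ignores the first-stage uncertainty.
```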
Method | Basic notion | Exposure effects | Strengths | Limitations
---|---|---|---|---
Ratio estimator (RE) | Appropriate when there is only one IV | RD, RR, OR | Simple estimation method; with a single binary IV and no other confounders, 2SLS = RE | Not suitable for multiple IVs or for adjustment of measured confounders
Two-stage least squares (2SLS) | Linear models without parametric assumptions on the error terms; for multiple IVs, the IV estimator is a weighted average of the ratio estimators | Estimator interpreted like a classical regression coefficient | Natural starting point of IV analysis; the estimate is asymptotically unbiased; widely used for binary exposure and outcome and provides the exposure effect on the risk difference scale; unlike RE, able to adjust for measured confounders | Biased results in binary cases or for non-linear models; for multiple IVs the 2SLS estimator is biased, and limited information maximum likelihood (LIML) can be an alternative; for smaller sample sizes, the LIML estimator is more efficient and consistent than 2SLS; 2SLS is a special case of GMM, and both yield the same results under homoscedastic error variance
Linear probability models (LPM) | Applied for binary outcome, exposure, and IV; the data are modelled using linear functions; for a single binary IV, the estimator is equivalent to the RE | RD | Simple to estimate and interpret as regression coefficients; the RD is consistent for the ACE | Predicted probabilities may fall outside the 0-1 range and may become negative for rare outcomes; assumes the marginal/incremental effect of exposure remains constant, which is logically impossible for binary outcomes
Two-stage predictor substitution (2SPS) | The rote extension of linear IV models to nonlinear models; targets a marginal (population-averaged) odds ratio; mimics 2SLS; non-linear least squares is used to estimate the parameters; for a linear model, 2SPS = 2SLS | RD, RR, OR | Suitable for non-linear associations between exposure and outcome | In practice, 2SPS in non-linear models does not always yield consistent exposure effects; parameter estimation is more difficult than for 2SLS; under a logistic regression model, 2SPS may not provide the causal OR
Two-stage residual inclusion (2SRI) | Includes the estimated unobservable confounder (residual) from the first stage as an additional variable along with the exposure in the second-stage model; also called the control function estimator; under a linear model, 2SRI = 2SLS = 2SPS | RD, RR, OR | Yields consistent estimates for linear and non-linear models; performs better than 2SPS; applicable in the specific case of a binary exposure with a binary or count outcome; with a log-linear second-stage model, the 2SRI estimator provides the causal RR | May give biased estimates when there is strong unmeasured confounding, as is usually the case in an IV analysis; under a logistic regression model, the 2SRI estimator may not provide the causal OR; generally requires the exposure to be continuous rather than binary, discrete, or censored
Two-stage logistic regression (2SLR) | Used when outcome and exposure are binary and the interest is in estimating an OR; fully parametric, with maximum likelihood used to estimate the parameters | OR | Parallel to 2SLS, using logistic instead of linear models in both stages | If the first-stage logistic model is not correctly specified, second-stage parameter estimates may be biased; the estimator does not provide the causal OR
Three-stage least squares (3SLS) | An extension of 2SLS, but unlike 2SLS all coefficients are estimated simultaneously, requiring three steps; if the errors in the two 2SLS equations are correlated, 3SLS can be a suitable alternative | RD, RR | More information is used, hence the estimators are likely to be more efficient than 2SLS | More vulnerable to misspecification of the error terms; very rarely applied in epidemiologic studies; estimation is more complicated than for 2SLS; becomes inconsistent if errors are heteroskedastic
Structural mean models (SMM) | SMMs use IVs via G-estimation and involve the assumption of conditional mean independence; additive SMMs are used for continuous outcomes and multiplicative SMMs for positive-valued outcomes; the MSMM assumes a log-linear model to estimate the risk ratio; the LSMM assumes a logistic regression model fitted by maximum likelihood | RD, RR, OR | Relaxes several modelling restrictions (constant treatment effects) required by the ratio estimator/two-stage methods; can be used with time-dependent instruments, exposures, and confounders; provides the average treatment effect in the treated | The assumption of no effect modification is impossible to verify; with a binary outcome, additive SMMs and the MSMM suffer from the limitations of linear and log-linear models (e.g., predicted response probabilities may fall outside the interval [0, 1])
Generalized method of moments (GMM) | A non-linear analogue of 2SLS; the standard IV (2SLS) estimator is a special case of the GMM estimator; makes assumptions about the moments of the error term; allows estimation of parameters in over-identified models (more IVs than exposure variables); parameters are estimated in an iterative process | RD, RR, OR | Requires specification only of certain moment conditions; applicable to linear and non-linear models; the non-linear GMM estimator is asymptotically more efficient than 2SLS; more robust and less sensitive to parametric conditions; works better than 2SLR when exposure and outcome are binary; more efficient than linear IV estimators under heteroskedasticity | The GMM estimator with a logistic regression model is not consistent for the causal OR due to non-collapsibility of the OR
Bivariate probit models (BPM) | A two-stage method, but unlike 2SLS it models the probabilities directly, restricted to [0, 1]; full-information maximum likelihood is used to estimate the parameters; accounts for the correlation between the errors | Probit coefficient | For binary outcome and exposure, BPM performs better than linear IV methods; the BPM estimator has no direct OR interpretation, but multiplying a probit coefficient by approximately 1.6 yields an approximation of the OR | When the error terms are not normally distributed or the average probability of the outcome variable is close to one or zero, the BPM estimator may not be consistent for the ACE
Table 2: Overview of commonly used estimation methods for IV analysis (basic notions, estimators, strengths, and limitations)
Linear probability model (LPM)
This method is a particular form of 2SLS in which the outcome, exposure, and IV are binary; it provides exposure effects on the risk difference scale. When there is a single binary IV, the estimator can be expressed as in equation (3) [13,35-37].
The LPM is simple to estimate, and its parameters can be interpreted as regression coefficients from a linear regression. However, LPM may provide ambiguous results in linear IV analysis because the common linear IV technique is designed for a continuous response [38]. It should be noted that an LPM with binary exposure and outcome may produce predicted values outside the 0-1 range [28]; for rare binary outcomes, some predicted probabilities may become negative [39]. In addition, the model assumes that the probability of success increases linearly with exposure, that is, the marginal or incremental effect of exposure remains constant [37], which is logically impossible for binary outcomes [14].
Two-stage predictor substitution (2SPS)
The two-stage predictor substitution is an extension of 2SLS to nonlinear models, which targets a marginal (population-averaged) odds ratio [36,40-42]. In the first stage, nonlinear least squares (NLS) or any other consistent estimation technique is used to estimate the relation between the IV and the exposure [43]. The predicted exposure status from the first-stage model then replaces the observed exposure as the principal covariate in the second-stage model for the outcome [43,44]. For a continuous exposure and outcome, 2SPS and 2SLS give similar results [24,36].
Two-stage residual inclusion (2SRI)
2SRI (also called the control function estimator) [45] is another two-stage method and was first suggested by Hausman [46]. The general notion of 2SRI is to include the error terms (residuals) from the first-stage model as an additional variable along with the exposure in the second-stage model [47]. The models in the first and second stage can be either linear or nonlinear. In case of linear models, the 2SRI estimate is equivalent to the 2SLS and 2SPS estimates [44,48]. However, with a logistic regression model (LRM), the 2SRI estimator may not provide the causal odds ratio due to non-collapsibility of the odds ratio.
2SRI yields consistent estimates for both linear and nonlinear models [49,50]. The advantage of 2SRI over 2SLS is that 2SLS is only consistent when the second-stage model is linear, whereas this restriction does not hold for 2SRI [43,51]. Moreover, this method shows more precise estimates than 2SPS [52].
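The following hedged sketch contrasts 2SPS and 2SRI with a logistic second stage on simulated data. As noted above, neither logistic-stage estimator is guaranteed to recover the causal odds ratio, so this only illustrates the mechanics; the data-generating model and all names are hypothetical.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 20_000
z = rng.binomial(1, 0.5, n)                   # binary instrument
u = rng.normal(size=n)                        # unmeasured confounder
x = 0.5 * z + u + rng.normal(size=n)          # continuous exposure
p = 1 / (1 + np.exp(-(0.5 * x + u)))          # outcome depends on x and u
y = rng.binomial(1, p)                        # binary outcome

stage1 = sm.OLS(x, sm.add_constant(z)).fit()
x_hat = stage1.fittedvalues
resid = stage1.resid

# 2SPS: substitute the predicted exposure into a logistic second stage.
two_sps = sm.Logit(y, sm.add_constant(x_hat)).fit(disp=0)

# 2SRI: keep the observed exposure and add the first-stage residual.
two_sri = sm.Logit(y, sm.add_constant(np.column_stack([x, resid]))).fit(disp=0)

print(two_sps.params[1], two_sri.params[1])   # conditional log-odds estimates
```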
Two-stage logistic regression (2SLR)
When both the outcome and exposure are binary and the interest is in using an IV to estimate odds ratios, 2SLR can be applied. It is similar to 2SLS, but uses logistic instead of linear models in both stages [4,53]. This method is fully parametric, and maximum likelihood estimation is used to estimate the parameters. If the first-stage logistic model is not correctly specified, the estimates from the second stage can be biased [54,55]. Also note that this method may not provide the causal odds ratio due to the non-collapsibility of the OR [19].
Three-stage least squares method (3SLS)
The 3SLS generalizes the 2SLS. Possible correlation of the errors ($\varepsilon_1$ and $\varepsilon_2$) in equations (4) and (5) is not taken into account by 2SLS. 3SLS accounts for this possible correlation and may improve the efficiency of the estimator [56,57]. Unlike 2SLS, in which the coefficients of the two equations are estimated separately, in 3SLS all coefficients are estimated simultaneously. This requires three steps. The first stage is similar to 2SLS, i.e., a linear regression of X on Z to obtain $\hat{X}$. In the second stage, the residuals of the second-stage 2SLS model are obtained to estimate the cross-model correlation matrix (the correlation between the error terms of both models). Finally, in the third stage, the estimated correlation matrix is used to obtain the IV estimator. When there is no correlation between the error terms of the 2SLS models, 3SLS reduces to 2SLS. However, 3SLS is more vulnerable to misspecification, since misspecification of the model in the first or second stage will affect the third-stage model [58].
Structural mean models (SMMs)
SMMs explicitly use counterfactuals or potential outcomes [52] and were originally proposed by Robins [59] in the context of randomized trials with non-compliance, to estimate the causal effects for the treated (exposed) individuals. SMMs are semi-parametric models and use IVs via G-estimation for identification and estimation of the causal parameter. This method involves the assumption of conditional mean independence (CMI) [14,19,60-62] and does not make distributional assumptions about the exposure [19]. An SMM with an identity link is called an additive SMM and can be used for continuous outcomes; a multiplicative SMM (MSMM) with a log-linear model can be used for positive-valued or binary outcomes to estimate the causal risk ratio [19,63]. Additionally, the logistic structural mean model (LSMM), developed by Vansteelandt and Goetghebeur [64] and Robins and Rotnitzky [65], can be used for binary outcomes to estimate the causal odds ratio [19,63].
For continuous outcome data, the IV estimator from the additive SMM can be expressed as equation (2), given that the assumptions of CMI and no effect modification by Z are fulfilled [14,62,66,67]. This estimator provides the average treatment effect in the treated individuals (ATT) [19,68].
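For a single IV and no covariates, the additive SMM G-estimate can be obtained by solving the estimating equation that makes the "exposure-free" outcome $Y - \psi X$ uncorrelated with $Z$. A minimal sketch under those simplifying assumptions (not a general G-estimation implementation):

```python
import numpy as np

def additive_smm(y, x, z):
    """G-estimation for the additive SMM with a single IV and no covariates.

    Solves sum((z - mean(z)) * (y - psi * x)) = 0 for psi, i.e. psi is
    chosen so that y - psi * x is uncorrelated with the instrument.
    For a single binary IV this coincides with the ratio estimator."""
    zc = z - z.mean()
    return (zc @ y) / (zc @ x)
```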
The advantage of this method is that it relaxes several of the modelling restrictions such as homogeneous treatment effects required by more classical methods such as RE/two-stage IV methods [14,19]. One of the key assumptions of this method is no effect modification, which is difficult to verify in practical situations [67].
SMMs have been extended by Robins [60] to a general setting of structural nested mean models (SNMM) for repeated measures at multiple time points. The SMMs are a subclass of the SNMM [59,69]. When instruments, exposures, and confounders are time-dependent, SNMM can be used to estimate causal effects of exposure on the outcome [14]. Details and mathematical formulations of SMMs are described elsewhere [14,19,63].
Generalized method of moments (GMM)
When applying GMM, a system of equations is set up, which is then solved numerically using computer algorithms. This technique was formalized by Hansen [70] and constitutes a broad class of estimation methods that allow for a larger number of equations (moment conditions) than parameters [4,53,71], which is not possible in the MSMM and LSMM [19]. More specifically, GMM allows estimation of parameters in an over-identified model (number of IVs greater than the number of exposures). GMM with a linear model can be similar to 2SLS [72], but GMM also provides a non-linear analogue of 2SLS [17], called multiplicative GMM (MGMM). Detailed explanations can be found elsewhere [4,19,53].
In general, the nonlinear optimal GMM estimator is asymptotically more efficient than 2SLS [73]. Since GMM is a moment-based method without parametric assumptions, it is less prone to model misspecification than 2SLR or bivariate probit models when exposure and outcome are binary [4]. In case of a linear model and a single IV, the GMM estimator is equivalent to 2SLS, the additive SMM, and LIML [53,66,74]. With a log-linear model (i.e., MGMM) [19], it is equivalent to the MSMM and provides the population causal risk ratio [19]. However, this estimator with a logistic regression model is not consistent for the causal odds ratio due to non-collapsibility of the odds ratio [17].
In case of a binary or count outcome, Palmer et al. [75] suggested a two-stage IV method in which the first stage is a linear regression and the second-stage model is a logistic or log-linear model [19]. Since IV analysis with logistic regression may not provide a consistent exposure effect, GMM with a log-linear model is preferable for estimating the causal risk ratio. Moreover, 2SRI [48] is also applicable in the setting of a count outcome.
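As an illustration of the moment-based logic, the following numpy sketch implements two-step GMM for the linear moment condition $E[Z'(y - X\beta)] = 0$, allowing more instruments than exposures. It assumes X and Z include a constant column; it is a sketch under simplifying assumptions, not a production implementation.

```python
import numpy as np

def linear_gmm(y, X, Z):
    """Two-step GMM for a linear model; X (n x k) and Z (n x m), m >= k,
    both including a constant column.

    Step 1 uses W = (Z'Z)^{-1}, which reproduces 2SLS; step 2 re-weights
    the moments with a heteroskedasticity-robust weight matrix."""
    n = len(y)
    A = X.T @ Z                                   # k x m cross-moment matrix
    W = np.linalg.inv(Z.T @ Z)                    # step-1 weight matrix (2SLS)
    beta = np.linalg.solve(A @ W @ A.T, A @ W @ (Z.T @ y))
    e = y - X @ beta                              # step-1 residuals
    S = (Z * e[:, None]).T @ (Z * e[:, None]) / n # robust moment covariance
    W = np.linalg.inv(S)                          # optimal weight matrix
    return np.linalg.solve(A @ W @ A.T, A @ W @ (Z.T @ y))
```

In the just-identified case (m = k) both steps collapse to the standard IV estimator, consistent with the remark above that 2SLS is a special case of GMM.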
Bivariate probit models (BPM)
When the outcome of interest is binary, so-called probit models can be applied for IV analysis. In contrast to 2SLS, probit models model the probabilities directly (i.e., they are restricted to (0, 1)) [4,30]. BPM can be applied in two stages, but unlike common two-stage estimation methods, it is estimated via full-information maximum likelihood, which takes into account the correlation between the error terms of the two equations [24]. A more detailed model description can be found elsewhere [4,30].
The interpretation of BPM parameters is not like that of ordinary regression model parameters (e.g., the logarithm of the odds ratio from a logistic model). However, by multiplying a probit coefficient by approximately 1.6 or 1.8, it can be made to approximate the corresponding logistic regression coefficient [4].
In case of a binary outcome, linear IV methods may yield biased results and BPM may be preferable [30,47]. Furthermore, BPM estimates are more efficient than 2SLS estimates, whereas 2SLS models are more robust to incorrect modelling assumptions regarding the bivariate normal distribution of the error terms [76,77]. However, when the distribution of the error terms is not normal, when the average probability of the outcome variable is close to one or zero, or when there is more than one exposure, the estimates from the BPM are generally not consistent for the average causal effect [30,77].
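Full-information maximum likelihood estimation of the BPM requires specialized routines; the sketch below does not implement FIML but merely illustrates the probit-to-logit scaling discussed above, using a simple two-stage probit substitution on simulated data (the data-generating model and all names are hypothetical).

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 20_000
z = rng.binomial(1, 0.5, n)
u = rng.normal(size=n)
x = (0.6 * z + u + rng.normal(size=n) > 0).astype(float)  # binary exposure
y = (0.8 * x + u + rng.normal(size=n) > 0).astype(int)    # binary outcome

# Two-stage substitution with a probit second stage (not FIML BPM).
x_hat = sm.OLS(x, sm.add_constant(z)).fit().fittedvalues
probit = sm.Probit(y, sm.add_constant(x_hat)).fit(disp=0)

print("probit coefficient:", probit.params[1])
print("approximate log-OR:", 1.6 * probit.params[1])      # rough logit-scale conversion
```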
Other estimation methods
Apart from the methods discussed above, the outcome variable in epidemiologic research may also be a time-to-event. For such outcomes, IV analysis has been applied using a two-stage method in which the second-stage model is a Cox proportional hazards model [78-80]. However, Brookhart et al. [3] stated that this approach to IV analysis is not motivated by a theoretical model and that, therefore, parameters obtained from it may not be causally interpretable. Examples of this approach are a study of the effect of rosiglitazone on (time to) cardiovascular hospitalization and all-cause mortality using facility-prescribing patterns as an IV [78], and a study of the effect of adjuvant chemotherapy on (time to) breast cancer recurrence using physician preference as an IV [79].
Standard error and characteristics of IV estimators
Consider two-stage models for IV analysis, in which the predicted value of the exposure from the first-stage model is included in the second-stage model. The uncertainty around this prediction is not taken into account in the latter model, which may therefore result in incorrect precision: typically, standard errors (SEs) of the IV estimate from the second-stage model are too small [24,30,44,45]. An alternative method to estimate a correct SE is the so-called sandwich variance estimator (robust SE), which involves cross products of the predicted treatment and a dispersion factor based on the observed treatment [49]. Most statistical software packages provide this sandwich variance estimate [10]. Angrist and Krueger [10] noted that these SEs are asymptotically valid, but in practice (with finite sample sizes) they are only approximately valid.
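For the just-identified linear case, the sandwich idea can be sketched as follows: the IV point estimate is combined with residuals computed from the observed (not the predicted) exposure. This is a minimal illustration under simplifying assumptions, not the exact estimator of [49].

```python
import numpy as np

def iv_robust_se(y, X, Z):
    """Heteroskedasticity-robust (sandwich) SEs for the just-identified
    linear IV estimator beta = (Z'X)^{-1} Z'y.

    X and Z are n x k matrices that both include a constant column.
    Residuals use the OBSERVED exposure, which is what a naive
    second-stage OLS gets wrong."""
    beta = np.linalg.solve(Z.T @ X, Z.T @ y)
    e = y - X @ beta                          # residuals with observed X
    meat = (Z * (e**2)[:, None]).T @ Z        # sum of e_i^2 * z_i z_i'
    bread = np.linalg.inv(Z.T @ X)
    V = bread @ meat @ bread.T                # sandwich covariance matrix
    return beta, np.sqrt(np.diag(V))
```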
An alternative way of estimating SEs is the bootstrap method [81], in which bootstrap samples of the original data are used to estimate the variation in the IV estimates and hence the SE [4,6,82-84]. It should be stressed that one of the weaknesses of the IV estimator is that it tends to display large SEs relative to the conventional regression estimator [13,85]. The IV estimator can also perform poorly in finite samples and show biased results [31], and this bias is amplified when the IV is weak [14,31].
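A minimal nonparametric bootstrap sketch for the SE of a single-IV linear estimate; `iv_estimate` is a hypothetical helper standing in for whichever estimator is used (here the single-IV linear form).

```python
import numpy as np

def iv_estimate(y, x, z):
    zc = z - z.mean()
    return (zc @ y) / (zc @ x)          # single-IV linear case (equals 2SLS)

def bootstrap_se(y, x, z, n_boot=2000, seed=0):
    rng = np.random.default_rng(seed)
    n = len(y)
    est = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, n)     # resample subjects with replacement
        est[b] = iv_estimate(y[idx], x[idx], z[idx])
    return est.std(ddof=1)              # bootstrap SE of the IV estimate
```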
Interpretation of exposure effects from IV analysis
Researchers may be interested in estimating the average treatment effect over the entire study population [27]. However, it has been argued that the basic assumptions of IV analysis are not sufficient to obtain point estimates of the causal effect of exposure on the outcome, but only upper and lower bounds of this parameter [14,86,87]. To achieve a point estimate of the average causal effect (ACE) over the entire study population, the additional strong assumption of homogeneity of the exposure effect across levels of the IV should be satisfied [52]. IV analysis captures the ATT under the assumption of no effect modification by the IV [52]. When exposure effects are not homogeneous across IV levels, then under the monotonicity assumption (i.e., the IV affects the exposure deterministically in one direction) the IV estimate quantifies the local average treatment effect (LATE) [88], which is informative only for a subset of the study population, namely those who comply with the IV [27,89-91].
Assessment of IVassumptions
As noted, IV analysis must satisfy three basic assumptions, and if these assumptions do not hold, results may be severely biased [3,13]. The first assumption (the IV is related to the exposure) is generally easier to check using available statistical methods than the other two. The second (the IV has no direct effect on the outcome) and third (the IV is independent of confounders) assumptions are unverifiable, or not directly testable, as they involve unobservable variables [1,13,18,19,68,76,92]. Some authors have proposed circumstantial evidence to support these assumptions [2,5,93,94]. Alternatively, for the third assumption, a falsification test based on the standardized difference can be applied [95].
To check the first assumption, the F-statistic from the first-stage linear regression model is widely used, although this statistic is highly affected by sample size [76,83,85]. A rule of thumb is that the first assumption holds if the F-statistic is greater than 10 [13,96,97]. Other measures of the strength of the association between IV and exposure include the first-stage regression coefficient of the IV [50,98], the R² of a linear first-stage model [15,78,83], the odds ratio [6,93], or the pseudo-R² of the first-stage model [76]. When the correlation between IV and exposure is not strong enough, IV analysis is likely to be biased (weak-IV bias, which increases with the weakness of the IV). A weak IV will also produce large SEs for the IV estimator [3,13,31,47,99].
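Assuming arrays x and z as in the earlier sketches, the first-stage F-statistic and R² can be read off a standard OLS fit:

```python
import statsmodels.api as sm

# First-stage diagnostics for IV strength (rule of thumb: F > 10).
stage1 = sm.OLS(x, sm.add_constant(z)).fit()
print("first-stage F: ", stage1.fvalue)     # F-statistic of the first stage
print("first-stage R2:", stage1.rsquared)   # exposure variance explained by the IV
```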
We have provided an overview of estimation methods for IV analysis, highlighting their strengths and limitations for epidemiological research. These methods share common aspects, yet each has its particularities. When the IV assumptions are violated, the sample size is small, or the IV models are not correctly specified, all methods tend to perform poorly and show biased results.
The methods can be categorized as moment-based or semi-parametric (e.g., 2SLS, GMM, SMM) versus likelihood-based (e.g., BPM, 2SLR, LSMM). Moment-based and semi-parametric methods are in general less efficient than likelihood-based methods; however, likelihood-based methods are more vulnerable to incorrect modelling assumptions, in which case moment-based methods are more robust. Although several IV methods may be applicable to the same combination of IV, exposure, and outcome in empirical data, the methods rest on different assumptions and estimate different target parameters, so the interpretation of the exposure effects differs [45]. Therefore, choosing an appropriate IV method requires careful attention [76].
To obtain the ACE, the LATE, or the ATT, extra assumptions beyond the basic IV assumptions must be fulfilled: homogeneity of the exposure effect (for the ACE), monotonicity in case of heterogeneous exposure effects (for the LATE), or no effect modification by the IV (for the ATT), respectively. These different assumptions result in the estimation of different causal effects; hence, researchers should be careful when interpreting IV estimates [14].
In randomized trials, the IV of treatment assignment satisfies the assumptions by design, but in observational studies this is not the case. In the latter situation, subject-matter knowledge and theoretical motivation (why is the IV related to treatment and unrelated to patients' characteristics and outcome?) should be given, especially regarding the second and third conditions underlying the IV method. If the IV is weakly related to the exposure and correlated with unmeasured variables, IV methods may yield biased results [100]. In addition, the main critique of any IV analysis is that the IV may affect the outcome through some pathway other than the exposure of interest [32]; this condition cannot be verified empirically.
From a methodological perspective, the IV method is a powerful statistical tool, provided that a valid IV is available and the IV analysis is correctly applied. In that case, it can provide a valid estimate in the presence of measured and unmeasured confounding. However, when confounding is strong, it is difficult to find an appropriate IV [13].
A limitation of our study is that we restricted ourselves to IV methods that are commonly used in epidemiologic research. We did not discuss nonparametric and Bayesian IV methods; we refer to the literature for examples of these methods [12,38,86,101-104]. Because of limited space, we also did not describe mathematical models with detailed derivations of the IV estimators for all methods.
In conclusion, IV analysis is a potentially powerful method to control for confounding (both measured and unmeasured). Some estimation methods (e.g., 2SLS, 2SRI) can be applied in many situations, whereas others (e.g., RE, BPM, 2SLR) can be applied only in a limited number of situations. Irrespective of the method used in a particular study, in order to provide a valid interpretation of the exposure effect on the outcome, researchers should be aware of the underlying methodology of the estimation method as well as the key assumptions of the IV.
Funding
The PROTECT project is supported by the Innovative Medicine Initiative Joint Undertaking (www.imi.europa.eu) under Grant Agreement no. 115004, resources of which are composed of a financial contribution from the European Union's Seventh Framework Programme (FP7/2007-2013) and EFPIA companies' in-kind contribution. In the context of the IMI Joint Undertaking (IMI JU), the Department of Pharmacoepidemiology, Utrecht University, also received a direct financial contribution from Pfizer. The views expressed are those of the authors only and not of their respective institutions or companies.
Conflicts of interest: Olaf Klungel has received unrestricted funding for pharmacoepidemiological research from the Dutch private-public funded Top Institute Pharma (TI Pharma Grant T6.101 Mondriaan).