Instrumental Cutsets

Efficient Identification in Linear SCM

September 06, 2019

We’re going to be presenting our paper, “Efficient Identification in Linear Structural Causal Models with Instrumental Cutsets” (with Bryant Chen and Elias Bareinboim), at NeurIPS 2019. The work is quite technical, requiring specialized background knowledge, so I’m offering this post as a lightweight introduction to one of our core results.

Let’s start simple. What do the words in the title even mean?

SCM and Identification

Suppose you’re trying to decide on the effectiveness of a drug by observing all patients who took it, and seeing if they were cured. You are comparing this with patients who took an older drug, to make sure that the new medicine is actually superior. We can draw a “causal graph”, which basically says that the medicine can cause patients to be cured, and being cured does not cause patients to have taken the medicine. This is denoted with an arrow from the medicine to the patient’s status:

However, that’s not the full picture. You see, this new drug is very expensive, so only rich people can afford it (this story is based in the USA). It turns out that wealthy people are also more likely to be cured even without treatment, simply because they do not need to keep working while sick, and can relax. This is called latent confounding - the amount of money a patient has, which was not gathered as part of the dataset, affects both whether the patient gets medicine, and whether the patient is cured. We denote such an effect with a dashed bidirected arrow between the two nodes in our graph:

The problem of identification is therefore the question: Given this causal graph, and the dataset, is it possible to find out the causal effect of taking the drug?

In the graph with latent confounding, the answer is no [1] - it is not possible to find out how much of the effect on health was due to taking the drug, and how much was due to the fact that those who were taking it were rich, and therefore more likely to get better anyways!

The goal of randomized clinical trials is therefore to get rid of this confounding - if people get the drug through the flip of a coin rather than the contents of their wallet, we have the situation in the first graph, which is identifiable, meaning that we can compute the effect of the drug on cure rates.

Unfortunately, a clinical trial is not always possible, whether due to ethical, monetary, or even feasibility issues. By creating algorithms that determine identifiability in arbitrary graphs, we allow researchers to simply gather as much data as they can, as well as the hypothesized causal relations between the data, and then stuff it all in a computer. The computer will either compute the causal effect, or tell the researchers that they’re out of luck [2]!

As an example, the graph above represents the situation where a doctor makes a treatment recommendation to the patient, sometimes recommending the new drug, and other times the old drug, independently of the patient’s financial status. It turns out that this model is not identifiable without further assumptions - if all the variables are binary, the effect can only be bounded, but cannot be uniquely determined [3]. That is, we might be able to say that the actual causal effect lies somewhere between a 50% increase in cure rate and a 10% decrease, meaning that we can’t even conclude that the drug isn’t harmful!

Linear SCM

The model shown above is called the “Instrumental Variable”, or IV, and becomes identifiable once we add a few assumptions about the mechanisms underlying the graph [4]. To demonstrate, let’s modify the above example:

Previously, the doctor recommended one drug or the other, the patient likewise took a binary dose (drug 1 or 2), and was either cured or not cured. Instead, suppose the doctor’s recommendation is continuous: the doctor can advocate the drug either lightly, strongly, or anywhere in between. Let’s assign a number to the strength of this advocacy:

D=how strongly the doctor advocated for the new drug

Similarly, now the patient chooses how much of the treatment regimen to take:

M=how long the patient took the new medicine

Next, instead of “cured” and “not cured”, let’s make the result be the count of a biomarker in the patient’s blood:

B=count of a target biomarker in the patient’s blood

Finally, let’s assume that the effects are linear. That is, the amount of medicine the patient takes is linearly related to how strongly the doctor believed the medicine would work. Similarly, the biomarkers in the patient’s blood are linearly related to the amount of medicine taken. Our job, then, is to solve for the amount that the blood marker will decrease per day of the treatment regimen. This is represented by λmb in the following system of equations:

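$$
\begin{aligned}
D &= \epsilon_d \\
M &= \lambda_{dm} D + \epsilon_m \\
B &= \lambda_{mb} M + \epsilon_b \\
&\epsilon_m, \epsilon_b \text{ correlated}
\end{aligned}
$$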

Here, ϵ represent latent error terms, and have different values for each patient (they are random variables), and λ are the causal effects, which are assumed to be identical for all patients. Of course, the only data we have is samples of (D,M,B) - we don’t actually know what any of the ϵ or λ are! Can we solve for λmb? Yes we can! We can use the dataset to find the covariances between the variables:

$$
\sigma_{ab} = E[(A - E[A])(B - E[B])] = E[AB] - E[A]E[B]
$$

To simplify the math (without loss of generality), let’s assume that D,M,B are normalized, meaning that the data has mean 0. This makes σab=E[AB].

In effect, we have turned a dataset of samples (D,M,B) into known values for the covariances σdm,σdb,σmb. Now, we can do a bit of mathematical magic:

$$
\sigma_{db} = E[DB] = E[D(\lambda_{mb} M + \epsilon_b)] = \lambda_{mb} E[DM] + E[D\epsilon_b]
$$

We can use the fact that D is uncorrelated with ϵb, and that the variables have mean 0 to conclude:

$$
\sigma_{db} = \lambda_{mb}\sigma_{dm} \quad\Longrightarrow\quad \lambda_{mb} = \frac{\sigma_{db}}{\sigma_{dm}}
$$

As you can see, despite the patients’ choice of drug and their biomarkers being confounded by wealth, we were able to extract the drug’s causal effect anyway. This property has made the instrumental variable a cornerstone of observational studies since its introduction, all the way back in 1928 [5].
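To see the formula at work, here is a minimal simulation sketch. Everything in it is an assumption made for illustration: the effect sizes (`lam_dm = 0.8`, `lam_mb = -1.5`), the Gaussian noise, and the way the hidden `wealth` variable enters both error terms. It draws samples of (D, M, B) from the linear model and compares the instrumental-variable ratio with a naive regression of B on M.

```python
import random

def cov(xs, ys):
    """Sample covariance of two equally long lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def simulate(n=100_000, lam_dm=0.8, lam_mb=-1.5, seed=0):
    """Draw n patients from the linear IV model and estimate lam_mb two ways."""
    rng = random.Random(seed)
    D, M, B = [], [], []
    for _ in range(n):
        wealth = rng.gauss(0, 1)  # latent confounder, never recorded in the dataset
        d = rng.gauss(0, 1)       # D = eps_d
        m = lam_dm * d + wealth + rng.gauss(0, 1)        # eps_m contains wealth
        b = lam_mb * m + 2.0 * wealth + rng.gauss(0, 1)  # eps_b contains wealth
        D.append(d)
        M.append(m)
        B.append(b)
    iv = cov(D, B) / cov(D, M)     # sigma_db / sigma_dm
    naive = cov(M, B) / cov(M, M)  # slope of B on M, biased by the confounder
    return iv, naive

iv, naive = simulate()
print(f"true effect: -1.5, IV estimate: {iv:.3f}, naive regression: {naive:.3f}")
```

With these made-up parameters the IV ratio should land close to the true -1.5, while the naive slope is pulled toward roughly -0.74 in the large-sample limit, because wealth pushes M and B in the same direction.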

In general, when you want to find the causal effect between M and B, you look for a node (D) called the “instrument”, which has all of its paths to B pass through M.

Unconditioned Instrumental Sets

Of course, if a researcher does not find an instrument in their causal graph, it does not mean that their target effect is not identifiable! An example of this is the instrumental set [6]:

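$$
\begin{aligned}
Z_1 &= \epsilon_{z_1} \\
Z_2 &= \epsilon_{z_2} \\
X_1 &= \lambda_{z_1x_1} Z_1 + \lambda_{z_2x_1} Z_2 + \epsilon_{x_1} \\
X_2 &= \lambda_{z_1x_2} Z_1 + \lambda_{z_2x_2} Z_2 + \epsilon_{x_2} \\
Y &= \lambda_{x_1y} X_1 + \lambda_{x_2y} X_2 + \epsilon_y \\
&\epsilon_{x_1}, \epsilon_y \text{ correlated} \qquad \epsilon_{x_2}, \epsilon_y \text{ correlated}
\end{aligned}
$$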

Notice that the edge x1→y has no instruments - both z1 and z2 have paths to y through x2. Nevertheless, using the same math as in the previous example, we can get:

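$$
\begin{aligned}
\sigma_{z_1y} &= \sigma_{z_1x_1}\lambda_{x_1y} + \sigma_{z_1x_2}\lambda_{x_2y} \\
\sigma_{z_2y} &= \sigma_{z_2x_1}\lambda_{x_1y} + \sigma_{z_2x_2}\lambda_{x_2y}
\end{aligned}
$$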

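This system of equations is solvable for $\lambda_{x_1y}$ if $\det \begin{pmatrix} \sigma_{z_1x_1} & \sigma_{z_1x_2} \\ \sigma_{z_2x_1} & \sigma_{z_2x_2} \end{pmatrix} \neq 0$. An example where this requirement is not satisfied is: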

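$$
\begin{aligned}
Z_1 &= \epsilon_{z_1} \\
Z_2 &= \lambda_{z_1z_2} Z_1 + \epsilon_{z_2} \\
X_1 &= \lambda_{z_2x_1} Z_2 + \epsilon_{x_1} \\
X_2 &= \lambda_{z_2x_2} Z_2 + \epsilon_{x_2} \\
Y &= \lambda_{x_1y} X_1 + \lambda_{x_2y} X_2 + \epsilon_y \\
&\epsilon_{x_1}, \epsilon_y \text{ correlated} \qquad \epsilon_{x_2}, \epsilon_y \text{ correlated}
\end{aligned}
$$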

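In this example, we can create an identical system of equations, but this system will now be degenerate: the two rows of the covariance matrix are proportional, $(\sigma_{z_1x_1}, \sigma_{z_1x_2}) = \lambda_{z_1z_2}(\sigma_{z_2x_1}, \sigma_{z_2x_2})$ when the variables are also scaled to unit variance, so the determinant is 0. In fact, $\lambda_{x_1y}$ is not identifiable here.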
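The two-instrument system can be solved mechanically once the covariances are known. The sketch below uses Cramer's rule on the 2×2 system and returns nothing when the determinant vanishes. The function name `solve_instrumental_set`, the tolerance, and all covariance values are hypothetical, chosen only so the arithmetic is easy to follow.

```python
def solve_instrumental_set(s_zx, s_zy, tol=1e-12):
    """Solve  s_zy = s_zx @ (lam_x1y, lam_x2y)  for a 2x2 covariance matrix.

    s_zx is [[s_z1x1, s_z1x2], [s_z2x1, s_z2x2]] and s_zy is [s_z1y, s_z2y].
    Returns None when the determinant is (numerically) zero.
    """
    (a, b), (c, d) = s_zx
    e, f = s_zy
    det = a * d - b * c
    if abs(det) < tol:
        return None  # degenerate system: the instruments carry the same information
    # Cramer's rule for the two unknown edge coefficients
    return (e * d - b * f) / det, (a * f - c * e) / det

# Made-up covariances consistent with lam_x1y = 2 and lam_x2y = -1
print(solve_instrumental_set([[0.5, 0.2], [0.1, 0.4]], [0.8, -0.2]))

# Proportional rows, as in the degenerate example: no unique solution
print(solve_instrumental_set([[0.05, 0.2], [0.1, 0.4]], [0.8, -0.2]))
```

The first call recovers values close to (2, -1) up to floating-point rounding; the second returns `None` because the rows of the matrix are proportional.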

Thankfully, there is a simple condition that is sufficient to determine whether the system is generically solvable with unconditioned instrumental sets. We choose two sets of the same size: one of candidate instruments, and one of the parents of Y. If all paths from each of the instruments to Y cross one of the chosen parents, and there is a way to set up non-intersecting paths from all the chosen instruments to the chosen parents, then we know that all chosen edges incoming to Y are solvable. To demonstrate, let’s look at a couple examples:

In the first example, z1 and z2 have non-intersecting paths (z1→x1 and z2→x2), and z1,z2 have all their paths to y cross either x1 or x2. This means that z1,z2 can be used as an instrumental set to solve for λx1y and λx2y.

In the second example, z2 has a path to x1, but it can still get to y through x2. z1 cannot help here, because all paths from z1 to x2 would cross z2! This means that we don’t have an instrumental set.

In the final example, both z1 and z2 have nonintersecting paths to x1 and x2, but z2 can still get to y by crossing through x3. Once again, there is no instrumental set here.

The Match-Block

This is where our paper starts: it isn’t clear how to efficiently find an unconditioned instrumental set. If we know which parents of the target node (y) are part of the set, then it is possible to find associated instruments using a max-flow [7]. Unfortunately, we don’t generally have this knowledge. Look at the graph here:

In this graph, only {x1,x2,x5} has an instrumental subset {z1,z2,z4}. How can we make a computer find it?

One way of approaching this is enumerating all subsets of edges incoming into the target node (i.e. all subsets of {x1,x2,x3,x4,x5}), and then checking if an instrumental set exists for those nodes. This enumeration takes 2^N time, where N is the number of edges incoming into that node - a node with 50 parents would require checking 1,125,899,906,842,624 different combinations. In other words, this requires exponential time, which is equivalent to “can’t be used for large problems”.
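A quick sketch of how fast that count grows. The helper name `count_parent_subsets` is hypothetical; it enumerates subsets the slow way for small inputs, and the last line evaluates the closed form for 50 parents.

```python
from itertools import combinations

def count_parent_subsets(n_parents):
    """Count candidate parent subsets by brute-force enumeration (small n only)."""
    parents = range(n_parents)
    return sum(1
               for size in range(n_parents + 1)
               for _ in combinations(parents, size))

print(count_parent_subsets(5))  # 32 subsets of {x1,...,x5}, counting the empty set
print(2 ** 50)                  # 1125899906842624
```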

We developed a very simple algorithm that quickly finds the set without needing to enumerate all possibilities. Here’s how it works. First, we find a max-flow from all the candidate instruments z to all the x:

Now, we notice that x4 had no flow to it. We proved that this is sufficient to guarantee that x4 is not part of any instrumental set. Furthermore, if x4 isn’t part of an instrumental set, then none of its ancestors can be instruments (all of them would have a path to x4). We therefore remove x4 and z3,z5 from candidacy, and run another max-flow:

Once again, x3 had no flow to it, so we disable it, and all of its ancestors (here, its ancestors are already disabled). Finally, we do one more max-flow:

All the remaining x values are still active, meaning that we have found the desired instrumental set!
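The loop described above can be sketched in a few dozen lines. This is an illustration under loud assumptions, not the implementation from the paper: it only looks at directed edges (the conditions involving latent confounders are ignored), the max-flow is a plain augmenting-path search over vertex-disjoint paths, and the edge list is a hypothetical graph built to echo the walkthrough rather than a copy of the figure. The names `disjoint_path_flow`, `ancestors`, and `match_block` are made up for this sketch.

```python
from collections import defaultdict

def disjoint_path_flow(edges, sources, sinks):
    """Max set of vertex-disjoint paths; returns (sources used, sinks used)."""
    cap = defaultdict(int)   # residual capacities
    adj = defaultdict(list)  # residual-graph adjacency (both directions)

    def add_edge(u, v):
        cap[(u, v)] += 1
        adj[u].append(v)
        adj[v].append(u)

    nodes = sorted({n for e in edges for n in e} | set(sources) | set(sinks))
    for n in nodes:                      # split each node so it carries one path
        add_edge((n, "in"), (n, "out"))
    for u, v in edges:
        add_edge((u, "out"), (v, "in"))
    for s in sorted(sources):
        add_edge("SRC", (s, "in"))
    for t in sorted(sinks):
        add_edge((t, "out"), "SNK")

    def augment(u, seen):                # depth-first search for one more path
        if u == "SNK":
            return True
        seen.add(u)
        for v in adj[u]:
            if v not in seen and cap[(u, v)] > 0 and augment(v, seen):
                cap[(u, v)] -= 1
                cap[(v, u)] += 1
                return True
        return False

    while augment("SRC", set()):
        pass
    used_sources = {s for s in sources if cap[("SRC", (s, "in"))] == 0}
    used_sinks = {t for t in sinks if cap[((t, "out"), "SNK")] == 0}
    return used_sources, used_sinks

def ancestors(edges, node):
    """All nodes with a directed path into `node`."""
    parents = defaultdict(list)
    for u, v in edges:
        parents[v].append(u)
    found, stack = set(), [node]
    while stack:
        for p in parents[stack.pop()]:
            if p not in found:
                found.add(p)
                stack.append(p)
    return found

def match_block(edges, instruments, parents_of_y):
    """Repeat max-flow, dropping unreached x's and their ancestors, until stable."""
    zs, xs = set(instruments), set(parents_of_y)
    while xs:
        used_zs, used_xs = disjoint_path_flow(edges, zs, xs)
        if used_xs == xs:                # every remaining x received flow
            return used_zs, xs
        for x in xs - used_xs:           # x without flow: disable its ancestors
            zs -= ancestors(edges, x)
        xs = used_xs
    return set(), set()

# Hypothetical graph: z3 can only reach x3 and x4 by passing through z5.
edges = [("z1", "x1"), ("z2", "x2"), ("z4", "x5"),
         ("z3", "z5"), ("z5", "x3"), ("z5", "x4")]
print(match_block(edges, ["z1", "z2", "z3", "z4", "z5"],
                  ["x1", "x2", "x3", "x4", "x5"]))
```

On this toy graph the loop needs three max-flows, as in the walkthrough, and ends with instruments {z1, z2, z4} matched to {x1, x2, x5}. Each round removes at least one x, so the number of max-flow calls is at most the number of parents of y, instead of one check per subset.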

Instrumental Cutsets

It turns out that we can get a lot of mileage out of this simple procedure. You see, most state-of-the-art identification algorithms have a similar structure to the instrumental set, in that they need to enumerate all subsets of edges incoming into a target node. We were able to adapt the match-block to these algorithms, and even extend it into a new method, which can solve for certain edges that no other efficient method we know of can solve! We called our method the “Instrumental Cutset”, and you can read about the details in our full paper on arXiv.


References

  1. Pearl, J. (2000). Causality: Models, Reasoning and Inference.
  2. Tian, J., & Pearl, J. (2002). A General Identification Condition for Causal Effects.
  3. Pearl, J. (1993). Aspects Of Graphical Models Connected With Causality.
  4. Angrist, J. D., Imbens, G. W., & Rubin, D. B. (1996). Identification of Causal Effects Using Instrumental Variables. Journal of the American Statistical Association, 91(434), 444–455. https://doi.org/10.2307/2291629
  5. Wright, P. G. (1928). Tariff on Animal and Vegetable Oils. Macmillan Company, New York.
  6. Brito, C., & Pearl, J. (2002). Generalized Instrumental Variables. Proceedings of the Eighteenth Conference on Uncertainty in Artificial Intelligence, 85–93.
  7. Foygel, R., Draisma, J., & Drton, M. (2012). Half-Trek Criterion for Generic Identifiability of Linear Structural Equation Models. The Annals of Statistics, 1682–1713.