## Instrumental Cutsets

### Efficient Identification in Linear SCM

We will be presenting our paper, “Efficient Identification in Linear Structural Causal Models with Instrumental Cutsets”, at NeurIPS 2019. The work is highly technical and requires a lot of specialized background knowledge, so I’m offering this post as a lightweight introduction to one of our core results, one that does not require you to dedicate a year of your life to the nitty-gritty of linear identification.

Let’s start simple. What do the words in the title even mean?

## SCM and Identification

Suppose you’re trying to determine the effectiveness of a drug by observing all patients who took it and seeing whether they were cured. You compare this with patients who took an older drug, to make sure that the new medicine is actually superior. We can draw a “causal graph”, which says that the medicine can *cause* patients to be cured, and not the other way around. This is denoted with an arrow from the medicine to the patient’s status:

However, that’s not the full picture. You see, this new drug is very expensive, so only rich people can afford it (this story is based in the USA). It turns out that wealthy people are also more likely to be cured even without treatment, simply because they do not need to keep working while sick, and can relax. This is called *latent confounding* - the amount of money a patient has, which was not gathered as part of the dataset, affects both whether the patient gets medicine, and whether the patient is cured. We denote such an effect with a dashed bidirected arrow between the two nodes in our graph:

The problem of identification is therefore the question: **Given this causal graph, and the dataset, is it possible to find out the causal effect of taking the drug?**

In the graph with latent confounding, the answer is *no* [1] - it is not possible to find out how much of the effect on health was due to taking the drug, and how much was due to the fact that those who were taking it were rich, and therefore more likely to get better anyways!

The goal of randomized clinical trials is therefore to get rid of this confounding - if people get the drug through the flip of a coin rather than the contents of their wallet, we have the situation in the first graph, which is *identifiable*, meaning that we can compute the effect of the drug on cure rates.

Unfortunately, a clinical trial is not always possible, whether due to ethical, monetary, or even feasibility issues. By creating algorithms that determine identifiability in arbitrary graphs, we allow researchers to simply gather as much data as they can, as well as the hypothesized causal relations between the data, and then stuff it all in a computer. The computer will either compute the causal effect, or tell the researchers that they’re out of luck [2]!

As an example, the graph above represents the situation where a doctor makes a treatment recommendation to the patient, sometimes recommending the new drug, and other times the old drug, independently of the patient’s financial status. It turns out that this model is not identifiable without further assumptions - if all the variables are binary, the effect can only be *bounded*, but cannot be uniquely determined [3]. That is, we might be able to say that the actual causal effect lies somewhere between a 50% increase in cure rate and a 10% *decrease*, meaning that we can’t even conclude that the drug isn’t harmful!

## Linear SCM

The model shown above is called the “Instrumental Variable”, or IV, and becomes identifiable once we add a few assumptions about the mechanisms underlying the graph [4]. To demonstrate, we will modify the mechanisms of the example.

Previously, the doctor recommended one drug or the other, the patient likewise made a binary choice (drug 1 or drug 2), and was either cured or not cured. Instead, suppose the doctor’s recommendation is continuous: the doctor can advocate for the drug lightly, strongly, or anywhere in between. Let’s assign a number to the strength of this advocacy:

Similarly, now the patient chooses how much of the treatment regimen to take:

Next, instead of “cured” and “not cured”, let’s make the result be the count of a biomarker in the patient’s blood:

Finally, let’s assume that the effects are linear. That is, the amount of medicine the patient takes is linearly related to how strongly the doctor believed the medicine would work. Similarly, the biomarkers in the patient’s blood are linearly related to the amount of medicine taken. Our job, then, is to solve for the amount that the blood marker will decrease per day of the treatment regimen. This is represented by $\lambda_{mb}$ in the following system of equations:
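Concretely, the structural equations can be written as follows (a reconstruction consistent with the derivation that follows; the confounding by wealth shows up as a correlation between $\epsilon_m$ and $\epsilon_b$, while $\epsilon_d$ is independent of both):

$$
\begin{aligned}
D &= \epsilon_d\\
M &= \lambda_{dm} D + \epsilon_m\\
B &= \lambda_{mb} M + \epsilon_b
\end{aligned}
$$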

Here, the $\epsilon$ are latent error terms, which take different values for each patient (they are random variables), while the $\lambda$ are the causal effects, which are constant across patients. Of course, the only data we have is samples of $(D,M,B)$ - we don’t actually know any of the $\epsilon$ or $\lambda$! Can we solve for $\lambda_{mb}$? Yes we can! We can use the dataset to find the covariances between the variables:
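Here $\sigma_{ab}$ is shorthand for a covariance, so the quantities we can estimate from data are:

$$\sigma_{dm} = \mathrm{Cov}(D,M), \qquad \sigma_{db} = \mathrm{Cov}(D,B), \qquad \sigma_{mb} = \mathrm{Cov}(M,B).$$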

To simplify the math (without loss of generality), let’s assume that $D,M,B$ are centered, meaning that each variable has mean 0. This makes $\sigma_{ab} = \mathbb{E}[AB]$.

In effect, we have turned a dataset of samples $(D,M,B)$ into known values for the covariances $\sigma_{dm},\sigma_{db},\sigma_{mb}$. Now, we can do a bit of mathematical magic:

We can use the fact that $D$ is uncorrelated with $\epsilon_b$, and that the variables have mean 0 to conclude:
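Spelling the step out (substituting $B = \lambda_{mb} M + \epsilon_b$ and using $\mathbb{E}[D\epsilon_b] = 0$):

$$\sigma_{db} = \mathbb{E}[DB] = \mathbb{E}\big[D(\lambda_{mb} M + \epsilon_b)\big] = \lambda_{mb}\,\mathbb{E}[DM] = \lambda_{mb}\,\sigma_{dm}, \qquad\text{so}\qquad \lambda_{mb} = \frac{\sigma_{db}}{\sigma_{dm}}.$$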

As you can see, despite the patients’ choice of drug and their biomarkers being confounded by wealth, we were able to extract the drug’s causal effect *anyways*. This property has made the instrumental variable a cornerstone of observational studies since its introduction, all the way back in 1928 [5].
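To see this numerically, here is a small simulation (the coefficients and noise model are made up for the demo) comparing the naive regression of $B$ on $M$ against the instrumental-variable ratio $\sigma_{db}/\sigma_{dm}$:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical true causal effects, chosen only for illustration.
lam_dm, lam_mb = 0.8, -0.5

wealth = rng.normal(size=n)                    # latent confounder (never observed)
D = rng.normal(size=n)                         # doctor's advocacy: the instrument
M = lam_dm * D + wealth + rng.normal(size=n)   # dose taken, confounded by wealth
B = lam_mb * M + wealth + rng.normal(size=n)   # biomarker, confounded by wealth

# Naive regression slope of B on M is biased by the wealth confounder.
naive = np.cov(M, B)[0, 1] / np.var(M)

# IV estimate: lambda_mb = sigma_db / sigma_dm.
iv = np.cov(D, B)[0, 1] / np.cov(D, M)[0, 1]

print(f"naive: {naive:+.3f}   IV: {iv:+.3f}   true: {lam_mb:+.3f}")
```

The naive slope is pulled away from the true effect by the confounder, while the IV ratio recovers it.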

In general, when you want to find the causal effect of $M$ on $B$, you look for a node ($D$), called an “instrument”, all of whose paths to $B$ pass through $M$.

## Unconditioned Instrumental Sets

Of course, if a researcher does not find an instrument in their causal graph, it does not mean that their target effect is not identifiable! An example of this is the instrumental set [6]:

Notice that $x_1\rightarrow y$ has no instruments - both $z_1$ and $z_2$ have paths to $y$ through $x_2$. Nevertheless, using the same math as in the previous example, we can get:
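Taking covariances of $y$ with each candidate instrument yields two linear equations in the two unknowns $\lambda_{x_1 y}$ and $\lambda_{x_2 y}$ (reconstructed to match the determinant condition that follows):

$$
\begin{aligned}
\sigma_{z_1 y} &= \lambda_{x_1 y}\,\sigma_{z_1 x_1} + \lambda_{x_2 y}\,\sigma_{z_1 x_2}\\
\sigma_{z_2 y} &= \lambda_{x_1 y}\,\sigma_{z_2 x_1} + \lambda_{x_2 y}\,\sigma_{z_2 x_2}
\end{aligned}
$$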

This system of equations is solvable for $\lambda_{x_1y}$ if $\det\begin{pmatrix} \sigma_{z_1x_1} & \sigma_{z_1x_2}\\ \sigma_{z_2x_1} & \sigma_{z_2x_2} \end{pmatrix} \ne 0$. An example where this requirement is not satisfied is:

In this example, we can create an identical system of equations, but this system will now be degenerate, meaning that $(\sigma_{z_1x_1},\sigma_{z_1x_2}) = \lambda_{z_1z_2}(\sigma_{z_2x_1},\sigma_{z_2x_2})$. In fact, $\lambda_{x_1y}$ is not identifiable here.

Thankfully, there is a simple condition that is sufficient to determine whether the system is generically solvable with unconditioned instrumental sets. We choose a set of $K$ candidate instruments, along with $K$ parents of $Y$. If every path from each instrument to $Y$ passes through one of the $K$ chosen parents, and there is also a set of $K$ non-intersecting (vertex-disjoint) paths from the instruments to the parents, then all $K$ chosen edges incoming to $Y$ are solvable. To demonstrate, let’s look at a couple of examples:

In the first example, $z_1$ and $z_2$ have non-intersecting paths ($z_1\rightarrow x_1$ and $z_2\rightarrow x_2$), and $z_1,z_2$ have all their paths to $y$ cross either $x_1$ or $x_2$. This means that $z_1,z_2$ can be used as an instrumental set to solve for $\lambda_{x_1y}$ and $\lambda_{x_2y}$.

In the second example, $z_2$ has a path to $x_1$, but it can still reach $y$ through $x_2$. $z_1$ cannot help here, because all paths from $z_1$ to $x_2$ would cross $z_2$! This means that we don’t have an instrumental set.

In the final example, both $z_1$ and $z_2$ have non-intersecting paths to $x_1$ and $x_2$, but $z_2$ can still reach $y$ by crossing through $x_3$. Once again, there is no instrumental set here.

## The Match-Block

This is where our paper starts: it isn’t clear how to *efficiently* find an unconditioned instrumental set. If we know *which* parents of the target node ($y$) are part of the subset, then it is possible to find associated instruments using a max-flow [7]. Unfortunately, we don’t generally have this knowledge. Look at the graph here:

In this graph, *only* $\{x_1,x_2,x_5\}$ has an instrumental subset $\{z_1,z_2,z_4\}$. How can we make a computer find it?

One way of approaching this is enumerating all subsets of edges incoming into the target node (i.e. all subsets of $\{x_1,x_2,x_3,x_4,x_5\}$), and then checking whether an instrumental set exists for those parents. This enumeration takes $2^N$ time, where $N$ is the number of edges incoming into the target node. In other words, it requires *exponential time*, and that simply won’t do.

We developed a very simple algorithm that quickly finds the set. Here’s how it works. First, we find a max-flow from *all* the candidate instruments $z$ to *all* the $x$:

Now, we realize that $x_4$ had no flow to it. This means that $x_4$ is not part of *any* instrumental set. Furthermore, if $x_4$ isn’t part of an instrumental set, then none of its ancestors can be instruments (all of them would have a path to $x_4$). We therefore remove $x_4$ and $z_3,z_5$ from candidacy, and run another max-flow:

Once again, $x_3$ had no flow to it, so we disable it, and all of its ancestors (here, its ancestors are already disabled). Finally, we do one more max-flow:

All the remaining $x$ values are still active, meaning that we have found the desired instrumental set!
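Under the hood this is just iterated max-flow. Here is a compact sketch using `networkx` (the graph at the end is a made-up toy example, not the figure’s graph; vertex-disjoint paths are counted with the standard node-splitting trick):

```python
import networkx as nx

def vertex_disjoint_flow(G, sources, sinks):
    """Largest set of vertex-disjoint directed paths from sources to sinks.

    Node splitting: each vertex v becomes an edge (v,'in')->(v,'out') with
    capacity 1, so no two paths in the max-flow can share a vertex.
    Returns the set of sinks that received flow.
    """
    H = nx.DiGraph()
    for v in G.nodes:
        H.add_edge((v, "in"), (v, "out"), capacity=1)
    for u, v in G.edges:
        H.add_edge((u, "out"), (v, "in"), capacity=1)
    for s in sources:
        H.add_edge("S", (s, "in"), capacity=1)   # super-source
    for t in sinks:
        H.add_edge((t, "out"), "T", capacity=1)  # super-sink
    _, flow = nx.maximum_flow(H, "S", "T")
    return {t for t in sinks if flow[(t, "out")]["T"] > 0}

def match_block(G, candidates, parents):
    """Iterated max-flow as described in the post: drop every parent with
    no flow, drop candidate instruments that are ancestors of a dropped
    parent, and repeat until all remaining parents are matched."""
    candidates, parents = set(candidates), set(parents)
    while parents:
        unmatched = parents - vertex_disjoint_flow(G, candidates, parents)
        if not unmatched:
            break
        for x in unmatched:
            candidates -= nx.ancestors(G, x)  # ancestors of x can't be instruments
            parents.discard(x)
    return candidates, parents

# Toy graph: z2 is the only candidate reaching x2 and x3, so at most one of
# them can get flow; the other is dropped along with its ancestor z2, which
# in turn leaves the remaining one unmatched as well.
G = nx.DiGraph([("z1", "x1"), ("z2", "x2"), ("z2", "x3")])
insts, pars = match_block(G, {"z1", "z2"}, {"x1", "x2", "x3"})
print(insts, pars)  # {'z1'} {'x1'}
```

Whichever of $x_2,x_3$ the first max-flow happens to match, the iteration converges to the same answer: only $x_1$ survives, with $z_1$ as its instrument.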

## Instrumental Cutsets

It turns out that we can get a lot of mileage out of this simple algorithm. You see, most state-of-the-art identification algorithms have a similar structure to the instrumental set, in that they need to enumerate all subsets of edges incoming into a target node. We were able to adapt the match-block to these algorithms, and even extend it into a new method that can solve for certain edges that no other efficient method we know of can handle! We called our method the “Instrumental Cutset”, and you can read about the details in our full paper on arXiv (forthcoming).

### References

- P. G. Wright, *Tariff on Animal and Vegetable Oils*. Macmillan Company, New York, 1928.
- J. Pearl, *Aspects of Graphical Models Connected with Causality*. 1993.
- J. Pearl, *Causality: Models, Reasoning and Inference*. 2000.
- C. Brito and J. Pearl, “Generalized Instrumental Variables,” in *Proceedings of the Eighteenth Conference on Uncertainty in Artificial Intelligence*, 2002, pp. 85–93.
- J. D. Angrist, G. W. Imbens, and D. B. Rubin, “Identification of Causal Effects Using Instrumental Variables,” *Journal of the American Statistical Association*, vol. 91, no. 434, pp. 444–455, 1996.
- R. Foygel, J. Draisma, and M. Drton, “Half-Trek Criterion for Generic Identifiability of Linear Structural Equation Models,” *The Annals of Statistics*, pp. 1682–1713, 2012.
- J. Tian and J. Pearl, “A General Identification Condition for Causal Effects,” 2002.