Seminar Series

November 5, 1999

Topic: Causal Discovery from Non-Experimental Data
Speaker: David Heckerman, Microsoft Research


Statisticians, in large part, make observations and use these observations to make predictions. For example, based on a statistical study, one can conclude that, if you smoke, then it is more likely that you will get lung cancer than if you don't smoke. Unfortunately, this sort of information is not all that useful to, say, health care professionals. What they want to know is, if you CHANGE your behavior and start smoking, will you increase your chances of getting lung cancer? It turns out that the notion of cause and effect lies at the heart of such questions. The tricky thing about cause is that it is not the same as correlation. Statisticians have been saying this for over a hundred years. So how do we discover causal relationships? One method that has been used for almost a century is the randomized trial. If we want to figure out whether or not smoking causes lung cancer, we take, say, one hundred people, pick half of them at random and make them smoke, make the other half not smoke, and see how many in each group get lung cancer. Of course, we can't really do this because it's unethical. But doctors, patients, and politicians are beginning to realize that randomized trials to test drugs and new surgical procedures are just about as unethical. After all, if you go to all the expense of testing a new drug, you probably think it is better than whatever is available already. Why should we take half the patients who need that drug and prevent them from taking it? In my talk, I will discuss statistically oriented methods for discovering cause and effect without the need for randomized trials. These approaches are based on graphical models called Bayes nets or DAGs. I will illustrate the methods on several real examples.