http://bayes.cs.ucla.edu/BOOK-2K/
In Causality, Judea Pearl puts forth a technical definition of causality: the knowledge needed to predict the consequences of changing the world locally. Pearl uses graphs to describe locality – each observable is directly connected to only a limited number of other observables in such a graph. Seen this way, the world is a huge web, and controlled experiments snip off bits of that web to create closed systems that can then be fully characterized. Pearl shows that, even without any snipping, there are scenarios where sub-webs can be fully characterized from observational studies alone. Two central concepts, d-separation and the back-door criterion, underlie most of the power of causal analysis, and would not take more than an hour to learn for anyone familiar with probability.
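To make the back-door criterion a little more concrete, here is a minimal sketch of it on a three-variable model Z -> X, Z -> Y, X -> Y, where Z satisfies the criterion for the effect of X on Y. The model, the probability tables, and the helper names (joint, prob, naive, backdoor) are all made up for illustration and do not come from the book. The adjustment computed is P(Y=1 | do(X=x)) = sum over z of P(Y=1 | X=x, Z=z) P(Z=z).

```python
from itertools import product

# Hypothetical numbers for the model Z -> X, Z -> Y, X -> Y.
P_Z = {0: 0.5, 1: 0.5}                       # P(Z=z)
P_X1_GIVEN_Z = {0: 0.2, 1: 0.8}              # P(X=1 | Z=z)
P_Y1_GIVEN_XZ = {(0, 0): 0.1, (1, 0): 0.3,   # P(Y=1 | X=x, Z=z), keyed by (x, z)
                 (0, 1): 0.5, (1, 1): 0.7}


def joint(x, y, z):
    """Observational joint P(X=x, Y=y, Z=z) from the factorisation."""
    px = P_X1_GIVEN_Z[z] if x == 1 else 1 - P_X1_GIVEN_Z[z]
    py = P_Y1_GIVEN_XZ[(x, z)] if y == 1 else 1 - P_Y1_GIVEN_XZ[(x, z)]
    return P_Z[z] * px * py


def prob(**fixed):
    """Probability of the event that fixes some of x, y, z; the rest are summed out."""
    total = 0.0
    for x, y, z in product((0, 1), repeat=3):
        assignment = {"x": x, "y": y, "z": z}
        if all(assignment[name] == value for name, value in fixed.items()):
            total += joint(x, y, z)
    return total


def naive(x):
    """Plain conditional P(Y=1 | X=x), which mixes in the influence of Z."""
    return prob(x=x, y=1) / prob(x=x)


def backdoor(x):
    """Back-door adjustment: sum_z P(Y=1 | X=x, Z=z) * P(Z=z)."""
    return sum(
        prob(x=x, y=1, z=z) / prob(x=x, z=z) * prob(z=z)
        for z in (0, 1)
    )
```

With these made-up tables, naive(1) - naive(0) comes out near 0.44, while backdoor(1) - backdoor(0) comes out near 0.2, which is the effect built into P_Y1_GIVEN_XZ. Everything here uses only the observational joint; no experiment is simulated.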
edit (1/19/2009):
I just finished the book, and have to say I am pretty happy with it for acquainting me with the concept of exogeneity and with the existence of the field of epidemiology. I am also imbued with a new respect for randomized experiments, because of their amazing ability to control for factors that are not even known.
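The point about randomization can be sketched with a small simulation. Everything below is an assumption made for illustration: a hidden binary factor that the analyst never records, a treatment with a built-in effect of 0.2 on the outcome rate, and the names TRUE_EFFECT, outcome, and effect_estimate. In the observational arm the hidden factor influences who gets treated; in the randomized arm a coin flip does.

```python
import random

TRUE_EFFECT = 0.2  # hypothetical: treatment raises P(outcome) by 0.2


def outcome(rng, treated, hidden):
    """Draw a binary outcome; the hidden factor adds 0.4 to the rate on its own."""
    p = 0.1 + TRUE_EFFECT * treated + 0.4 * hidden
    return int(rng.random() < p)


def effect_estimate(n, randomized, seed=0):
    """Difference in outcome rates between treated and untreated units.

    The hidden factor is drawn for every unit but never used in the estimate.
    """
    rng = random.Random(seed)
    hits = [0, 0]   # outcomes observed, indexed by treated (0 or 1)
    units = [0, 0]  # units seen, indexed by treated (0 or 1)
    for _ in range(n):
        hidden = int(rng.random() < 0.5)
        if randomized:
            treated = int(rng.random() < 0.5)                          # coin flip
        else:
            treated = int(rng.random() < (0.8 if hidden else 0.2))     # self-selection
        hits[treated] += outcome(rng, treated, hidden)
        units[treated] += 1
    return hits[1] / units[1] - hits[0] / units[0]
```

Under these assumptions, effect_estimate(50000, randomized=False) lands well above the built-in 0.2, because treated units are disproportionately those with the hidden factor, while effect_estimate(50000, randomized=True) lands close to 0.2 without the hidden factor ever being measured.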
I do, however, believe that even though Bayesian networks have more fundamental consequences for causality than for sampling, in noisy environments their significance still lies in their sampling behavior. The type of factorisable distribution represented by a Bayesian network can be sampled by sampling its constituent conditional distributions, which are lower-dimensional and converge faster.
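Sampling a factorised distribution piece by piece is usually called ancestral (or forward) sampling: visit the variables parents-first and draw each one from its own small conditional table. A minimal sketch follows, assuming a made-up three-node network (rain, sprinkler, wet) with invented probabilities; NETWORK, ORDER, sample_once, and estimate are hypothetical names, not anything from the book.

```python
import random

# Hypothetical network: rain -> sprinkler, (rain, sprinkler) -> wet.
# Each entry maps a variable to (parents, table), where the table gives
# P(variable = True) for each tuple of parent values.
NETWORK = {
    "rain": ((), {(): 0.2}),
    "sprinkler": (("rain",), {(True,): 0.01, (False,): 0.4}),
    "wet": (("rain", "sprinkler"), {
        (True, True): 0.99, (True, False): 0.8,
        (False, True): 0.9, (False, False): 0.0,
    }),
}
ORDER = ["rain", "sprinkler", "wet"]  # topological order: parents before children


def sample_once(rng):
    """Draw one joint sample by sampling each factor given its parents."""
    values = {}
    for name in ORDER:
        parents, table = NETWORK[name]
        p_true = table[tuple(values[p] for p in parents)]
        values[name] = rng.random() < p_true
    return values


def estimate(event, n, seed=0):
    """Monte Carlo estimate of P(event) from n ancestral samples."""
    rng = random.Random(seed)
    return sum(event(sample_once(rng)) for _ in range(n)) / n
```

For example, estimate(lambda s: s["wet"], 20000) should come out close to the exact value of roughly 0.448 implied by these tables. Each draw only ever touches a table over one variable and its parents, never the full joint.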
Good theories take believable assumptions and generate unexpected consequences from them. Assumptions can take many forms, and most of Causality is based on assumptions of locality of mechanism, which characterize the knowledge we have quite well. Near the end of the book, however, more time is spent on two further assumptions that are easy to make and yet quite useful – monotonicity and exogeneity.