Bayesian Belief Networks

Definition

A Bayesian network is a probabilistic model that represents a set of random variables and their conditional dependencies. They have extensive applications in areas such as

Law
For example, we know the probabilities of finding a certain piece of evidence given the defendant is guilty Pr(E|G) and innocent Pr(E|~G). But we need to infer the probability he's guilty given the evidence Pr(G|E).
Medicine
For example, we know the probabilities of a false positive and a false negative result on a test for a disease: Pr(test|~disease) and Pr(~test|disease). But we need to infer the probability the patient has the disease given a positive test result Pr(disease|test). - Many other diverse areas such as biology, sports betting, engineering and risk analysis.

History

Judea Pearl, professor of computer science and statistics at UCLA, introduced the modern idea of a Bayesian net in his 1985 paper: Bayesian Networks: A Model of Self-Activated Memory for Evidential Reasoning.

Seventy two years prior, the US Jurist John Henry Wigmore developed an informal version of the Bayesian belief network for analyzing evidence in court trials in the June 1913 article "The problem of proof" found in the Illinois law review, Volume 8 Number 2, pp. 77-103. Wigmore charts are still used to this day by lawyers to analyze evidence in court cases.

Harvard Professor Joel Blitzstein said, "The soul of statistics is conditioning." The precise mathematical method for "inverting" a conditional probability to determine the probability of an unobserved variable is Bayes Law, developed by the Reverend Thomas Bayes in his Essay towards solving a Problem in the Doctrine of Chances in 1763.

More than two millenium prior, the rhethoritician Gorgias (483—375 B.C.E.) presented the archetype of a legal argument based on the conditional probabilities connecting evidence to guilt. This speech is entitled The Defense of Palamedes. I have an essay on it: On Gorgias' Defense of Palamedes. The centrality of Palamedes in this seminal work is yet another important reason this software bears his name!

For modern, up-to-date approaches to graphically analyzing legal arguments using Bayesian networks, see

Fallacies

One motivation for automating Bayesian inference is that many people find it difficult to reason about conditional probabilities, and there are many common fallacies. Two prominent fallacies are

In the paper Probabilistic reasoning in clinical medicine: Problems and opportunities by David M. Eddy (1982), a hundred physicians were asked about the probability of having a disease based on a positive test result; 95% of these doctors committed the inverse fallacy in their answer. The Sally Clark case is a recent example of an expert witness committing the prosecutor's fallacy. See

Random variables

Palamedes allows a user to input relationships among a set of random variables. It then can answer probabilistic queries about them.

Right now, Palamedes' Bayes net model only supports binary random variables, e.g., guilty and ~guilty (meaning "not guilty" or "innocent"). In a future version, random variables will be able to take arbitrary values, e.g., Weather=snow, Weather=rain, or Weather=sunny.

Probability statements

Probabilities are input or queried into the system via statements of the form Pr(Conjunction1 of random variables | Conjunction2 of random variables). The bar | (meaning "given") and the second conjunction are optional. Here are a few examples:

CommandNotes
Pr(guilty) gets value0
Example of inputting an unconditional probability
Pr(evidence1|~guilty) gets value1
Example of inputting a conditional probability
Pr(guilty|evidence1,evidence2,evidence3)
Example of querying a conditional probability involving a conjunction of random variables

Conjunctions between random variables may be input using commas; the word "and" or the symbol &&; or the word "intersect" or the symbol .

Implementation

Exact and approximate methods exist for inferring unobserved variables in Bayesian networks. Common exact methods include exponential enumeration and variable elimination. Approximate methods include importance sampling, belief propagation, and stochastic MCMC simulation.

Palamedes uses an exact method that takes an AI/logic programming approach. It uses pattern matching to match user inputs against mathematical rules for making deductions until the current goal is achieved. The rules are simply what you would find in any probability and statistics textbook about conditional probabilities:

To see a table of all of the deductions made by the system, use the command bayes(). Note: These deductions are memoized. If you need to clear them because you've added new info or changed values, simply pass an argument to the bayes function, e.g., bayes(0).

This implementation is inspired by Ivan Bratko's interpreter for belief networks, figure 15.11, in the book Prolog Programming for Artificial Intelligence. It has comparable performance to variable elimination.

Comparison

Palamedes is one of many software applications available for Bayesian analysis.

Alternative Software

There are two main groups of alternative software:

How Palamedes stacks up against the competition

As a Bayesian calculator, Palamedes doesn't have graphical input or output. Its main limitation stems from the exact inference method it implements, which has computational complexity O(k*2^k). Approximate methods are needed to deal with large belief networks. In a future version, Palamedes will implement an MCMC method for dealing with large nets. Currenty, Palamedes only supports binary random variables, another limitation that will get relaxed in a future version.

As a Bayesian calculator, Palamedes still has several advantages. The input uses the same syntax you might see in a math textbook. And it will actually show you the derivations it made to reach its conclusions if you call the bayes() function. That's because the algorithm it uses is a rule-based AI system.

But Palamedes doesn't focus on being merely a Bayesian inference engine. It also has a dice roller similar to Troll. And it has many built-in probability and statistics functions, though nothing like R. It can even be used as a desktop calculator. And in the future, it will get computer algebra capabilities.

Additionally, Palamedes is Turing Complete. The language itself is array-based, rather like APL, though nowhere near as cryptic hopefully. It's as easy to manipulate an array or matrix as it is to manipulate a number.

Palamedes' main advantage is that it is written in pure JavaScript, so it runs in a web browser or at a command-prompt using Node.js. It is 100% free and open-source. And it has a relatively small footprint, only a couple hundred KB as opposed to 11.5-13.0MB for OpenBUGS or OpenMarkov.

Examples

Example 1

The following example of legal reasoning in a court case example comes from the paper Teaching an Application of Bayes' Rule for Legal Decision-Making: Measuring the Strength of Evidence, pp. 10-12.

CommandResults
/* 3.1.1 Calculation of Posterior Probability after Evidence 1 (E₁), p. 10 */
Pr( G)                  ← 0.5
Pr(~G)                  NB. Should be 0.5
Pr(E₁|G)                ← 1
Pr(E₁|~G)               ← 0.45
Pr(G|E₁)                NB. Should be 0.69
/* 3.2.1 Calculation of Posterior Probability after the Second Piece of Evidence (E₂), p. 11 */
Pr(~G|E₁)               NB. Should be 0.31
Pr(E₂|G and E₁)         ← 1
Pr(E₂|~G and E₁)        ← 0.21
Pr(G|E₁ and E₂)         NB. Should be 0.914
/* 3.3.1 Calculation of Posterior Probability after Evidence 3 (E₃), p. 12 */
Pr(~G|E₁ and E₂)        NB. Should be 0.086
Pr(E₃|G and E₁ and E₂)  ← 1
Pr(E₃|~G and E₁ and E₂) ← 0.017
Pr(G|E₁ and E₂ and E₃)  NB. Should be 0.998
Pr(G) ← 0.5
Pr(~G) → 0.5
Pr(E₁ | G) ← 1
Pr(E₁ | ~G) ← 0.45
Pr(G | E₁) → 0.6896551724137931

Pr(~G | E₁) → 0.31034482758620685
Pr(E₂ | E₁ ∩ G) ← 1
Pr(E₂ | E₁ ∩ ~G) ← 0.21
Pr(G | E₁ ∩ E₂) → 0.9136592051164916

Pr(~G | E₁ ∩ E₂) → 0.08634079488350843
Pr(E₃ | E₁ ∩ E₂ ∩ G) ← 1
Pr(E₃ | E₁ ∩ E₂ ∩ ~G) ← 0.017
Pr(G | E₁ ∩ E₂ ∩ E₃) → 0.998396076702777

Example 2

The following example comes from Figure 14.2 on page 512 in the book Artificial Intelligence: A Modern Approach by Stuart Russell and Peter Norvig.

Figure 14.2 from Artificial Intelligence--A Modern Approach (AIMA) by Norvig

CommandResults
/* Input probability tables */
Pr(JohnCalls| Alarm) gets 0.90
Pr(JohnCalls|~Alarm) gets 0.05
Pr(MaryCalls| Alarm) gets 0.70
Pr(MaryCalls|~Alarm) gets 0.01
Pr(Alarm| Burglary and  Earthquake) gets 0.95
Pr(Alarm| Burglary and ~Earthquake) gets 0.94
Pr(Alarm|~Burglary and  Earthquake) gets 0.29
Pr(Alarm|~Burglary and ~Earthquake) gets 0.001
Pr(Burglary) gets 0.001
Pr(Earthquake) gets 0.002

/* Query the system */
Pr(JohnCalls, MaryCalls, Alarm, ~Burglary, ~Earthquake)  NB. Should be 0.000628
Pr(Burglary | JohnCalls, MaryCalls)  NB. Should be 0.28

Pr(JohnCalls | Alarm) ← 0.90
Pr(JohnCalls | ~Alarm) ← 0.05
Pr(MaryCalls | Alarm) ← 0.70
Pr(MaryCalls | ~Alarm) ← 0.01
Pr(Alarm | Burglary ∩ Earthquake) ← 0.95
Pr(Alarm | Burglary ∩ ~Earthquake) ← 0.94
Pr(Alarm | Earthquake ∩ ~Burglary) ← 0.29
Pr(Alarm | ~Burglary ∩ ~Earthquake) ← 0.001
Pr(Burglary) ← 0.001
Pr(Earthquake) ← 0.002

Pr(Alarm ∩ JohnCalls ∩ MaryCalls ∩ ~Burglary ∩ ~Earthquake) → 0.0006281112599999999
Pr(Burglary | JohnCalls ∩ MaryCalls) → 0.28417183536439294

In order to see the deductions made by the system in Example 2, use the bayes() command. You'll see the following table:

bayes() → 
                                                         key     value
    Alarm ∩ JohnCalls ∩ MaryCalls ∩ ~Burglary ∩ ~Earthquake|     Pr(JohnCalls) * Pr(Alarm ∩ MaryCalls ∩ ~Burglary ∩ ~Earthquake | JohnCalls)
       Alarm ∩ MaryCalls ∩ ~Burglary ∩ ~Earthquake|JohnCalls     Pr(MaryCalls | JohnCalls) * Pr(Alarm ∩ ~Burglary ∩ ~Earthquake | JohnCalls ∩ MaryCalls)
       Alarm ∩ ~Burglary ∩ ~Earthquake|JohnCalls ∩ MaryCalls     Pr(Alarm | JohnCalls ∩ MaryCalls) * Pr(~Burglary ∩ ~Earthquake | Alarm ∩ JohnCalls ∩ MaryCalls)
                                                      Alarm|     Pr(Alarm | Burglary ∩ Earthquake) * Pr(Burglary ∩ Earthquake) + Pr(Alarm | Burglary ∩ ~Earthquake) * Pr(Burglary ∩ ~Earthquake) + Pr(Alarm | Earthquake ∩ ~Burglary) * Pr(Earthquake ∩ ~Burglary) + Pr(Alarm | ~Burglary ∩ ~Earthquake) * Pr(~Burglary ∩ ~Earthquake)
                                      Alarm|Alarm ∩ Burglary     1
                                     Alarm|Alarm ∩ JohnCalls     1
                         Alarm|Alarm ∩ JohnCalls ∩ ~Burglary     1
           Alarm|Alarm ∩ JohnCalls ∩ ~Burglary ∩ ~Earthquake     1
                                     Alarm|Alarm ∩ ~Burglary     1
                       Alarm|Alarm ∩ ~Burglary ∩ ~Earthquake     1
                                              Alarm|Burglary     Pr(Alarm | Burglary ∩ Earthquake) * Pr(Burglary ∩ Earthquake | Burglary) + Pr(Alarm | Burglary ∩ ~Earthquake) * Pr(Burglary ∩ ~Earthquake | Burglary) + Pr(Alarm | Earthquake ∩ ~Burglary) * Pr(Earthquake ∩ ~Burglary | Burglary) + Pr(Alarm | ~Burglary ∩ ~Earthquake) * Pr(~Burglary ∩ ~Earthquake | Burglary)
                                 Alarm|Burglary ∩ Earthquake     0.95
                                  Alarm|Burglary ∩ JohnCalls     Pr(Alarm | Burglary) * Pr(JohnCalls | Alarm ∩ Burglary) / Pr(JohnCalls | Burglary)
                                Alarm|Burglary ∩ ~Earthquake     0.94
                                Alarm|Earthquake ∩ ~Burglary     0.29
                                             Alarm|JohnCalls     Pr(Alarm) * Pr(JohnCalls | Alarm) / Pr(JohnCalls)
                                 Alarm|JohnCalls ∩ MaryCalls     Pr(Alarm | JohnCalls) * Pr(MaryCalls | Alarm ∩ JohnCalls) / Pr(MaryCalls | JohnCalls)
                                             Alarm|~Burglary     Pr(Alarm | Burglary ∩ Earthquake) * Pr(Burglary ∩ Earthquake | ~Burglary) + Pr(Alarm | Burglary ∩ ~Earthquake) * Pr(Burglary ∩ ~Earthquake | ~Burglary) + Pr(Alarm | Earthquake ∩ ~Burglary) * Pr(Earthquake ∩ ~Burglary | ~Burglary) + Pr(Alarm | ~Burglary ∩ ~Earthquake) * Pr(~Burglary ∩ ~Earthquake | ~Burglary)
                               Alarm|~Burglary ∩ ~Earthquake     0.001
                                      Burglary ∩ Earthquake|     Pr(Burglary) * Pr(Earthquake | Burglary)
                              Burglary ∩ Earthquake|Burglary     Pr(Burglary | Burglary) * Pr(Earthquake | Burglary)
                             Burglary ∩ Earthquake|~Burglary     Pr(Burglary | ~Burglary) * Pr(Earthquake | Burglary ∩ ~Burglary)
                                     Burglary ∩ ~Earthquake|     Pr(Burglary) * Pr(~Earthquake | Burglary)
                             Burglary ∩ ~Earthquake|Burglary     Pr(Burglary | Burglary) * Pr(~Earthquake | Burglary)
                            Burglary ∩ ~Earthquake|~Burglary     Pr(Burglary | ~Burglary) * Pr(~Earthquake | Burglary ∩ ~Burglary)
                                                   Burglary|     0.001
                                           Burglary|Burglary     1
                                          Burglary|JohnCalls     Pr(Burglary) * Pr(JohnCalls | Burglary) / Pr(JohnCalls)
                              Burglary|JohnCalls ∩ MaryCalls     Pr(Burglary | JohnCalls) * Pr(MaryCalls | Burglary ∩ JohnCalls) / Pr(MaryCalls | JohnCalls)
                                          Burglary|~Burglary     0
                                     Earthquake ∩ ~Burglary|     Pr(~Burglary) * Pr(Earthquake | ~Burglary)
                             Earthquake ∩ ~Burglary|Burglary     Pr(~Burglary | Burglary) * Pr(Earthquake | Burglary ∩ ~Burglary)
                            Earthquake ∩ ~Burglary|~Burglary     Pr(~Burglary | ~Burglary) * Pr(Earthquake | ~Burglary)
                                                 Earthquake|     0.002
                                         Earthquake|Burglary     Pr(Earthquake)
                             Earthquake|Burglary ∩ ~Burglary     Pr(Earthquake)
                                        Earthquake|~Burglary     1 - Pr(~Earthquake | ~Burglary)
                                                  JohnCalls|     Pr(JohnCalls | Alarm) * Pr(Alarm) + Pr(JohnCalls | ~Alarm) * Pr(~Alarm)
                                             JohnCalls|Alarm     0.90
                                  JohnCalls|Alarm ∩ Burglary     Pr(JohnCalls | Alarm) * Pr(Alarm | Alarm ∩ Burglary) + Pr(JohnCalls | ~Alarm) * Pr(~Alarm | Alarm ∩ Burglary)
                                 JohnCalls|Alarm ∩ ~Burglary     Pr(JohnCalls | Alarm) * Pr(Alarm | Alarm ∩ ~Burglary) + Pr(JohnCalls | ~Alarm) * Pr(~Alarm | Alarm ∩ ~Burglary)
                   JohnCalls|Alarm ∩ ~Burglary ∩ ~Earthquake     Pr(JohnCalls | Alarm) * Pr(Alarm | Alarm ∩ ~Burglary ∩ ~Earthquake) + Pr(JohnCalls | ~Alarm) * Pr(~Alarm | Alarm ∩ ~Burglary ∩ ~Earthquake)
                                          JohnCalls|Burglary     Pr(JohnCalls | Alarm) * Pr(Alarm | Burglary) + Pr(JohnCalls | ~Alarm) * Pr(~Alarm | Burglary)
                                            JohnCalls|~Alarm     0.05
                                             MaryCalls|Alarm     0.70
                                 MaryCalls|Alarm ∩ JohnCalls     Pr(MaryCalls | Alarm) * Pr(Alarm | Alarm ∩ JohnCalls) + Pr(MaryCalls | ~Alarm) * Pr(~Alarm | Alarm ∩ JohnCalls)
                     MaryCalls|Alarm ∩ JohnCalls ∩ ~Burglary     Pr(MaryCalls | Alarm) * Pr(Alarm | Alarm ∩ JohnCalls ∩ ~Burglary) + Pr(MaryCalls | ~Alarm) * Pr(~Alarm | Alarm ∩ JohnCalls ∩ ~Burglary)
       MaryCalls|Alarm ∩ JohnCalls ∩ ~Burglary ∩ ~Earthquake     Pr(MaryCalls | Alarm) * Pr(Alarm | Alarm ∩ JohnCalls ∩ ~Burglary ∩ ~Earthquake) + Pr(MaryCalls | ~Alarm) * Pr(~Alarm | Alarm ∩ JohnCalls ∩ ~Burglary ∩ ~Earthquake)
                              MaryCalls|Burglary ∩ JohnCalls     Pr(MaryCalls | Alarm) * Pr(Alarm | Burglary ∩ JohnCalls) + Pr(MaryCalls | ~Alarm) * Pr(~Alarm | Burglary ∩ JohnCalls)
                                         MaryCalls|JohnCalls     Pr(MaryCalls | Alarm) * Pr(Alarm | JohnCalls) + Pr(MaryCalls | ~Alarm) * Pr(~Alarm | JohnCalls)
                                            MaryCalls|~Alarm     0.01
                                                     ~Alarm|     1 - Pr(Alarm)
                                     ~Alarm|Alarm ∩ Burglary     0
                                    ~Alarm|Alarm ∩ JohnCalls     0
                        ~Alarm|Alarm ∩ JohnCalls ∩ ~Burglary     0
          ~Alarm|Alarm ∩ JohnCalls ∩ ~Burglary ∩ ~Earthquake     0
                                    ~Alarm|Alarm ∩ ~Burglary     0
                      ~Alarm|Alarm ∩ ~Burglary ∩ ~Earthquake     0
                                             ~Alarm|Burglary     1 - Pr(Alarm | Burglary)
                                 ~Alarm|Burglary ∩ JohnCalls     1 - Pr(Alarm | Burglary ∩ JohnCalls)
                                            ~Alarm|JohnCalls     1 - Pr(Alarm | JohnCalls)
                                    ~Burglary ∩ ~Earthquake|     Pr(~Burglary) * Pr(~Earthquake | ~Burglary)
       ~Burglary ∩ ~Earthquake|Alarm ∩ JohnCalls ∩ MaryCalls     Pr(~Burglary | Alarm ∩ JohnCalls ∩ MaryCalls) * Pr(~Earthquake | Alarm ∩ JohnCalls ∩ MaryCalls ∩ ~Burglary)
                            ~Burglary ∩ ~Earthquake|Burglary     Pr(~Burglary | Burglary) * Pr(~Earthquake | Burglary ∩ ~Burglary)
                           ~Burglary ∩ ~Earthquake|~Burglary     Pr(~Burglary | ~Burglary) * Pr(~Earthquake | ~Burglary)
                                                  ~Burglary|     1 - Pr(Burglary)
                                             ~Burglary|Alarm     Pr(~Burglary) * Pr(Alarm | ~Burglary) / Pr(Alarm)
                                 ~Burglary|Alarm ∩ JohnCalls     Pr(~Burglary | Alarm) * Pr(JohnCalls | Alarm ∩ ~Burglary) / Pr(JohnCalls | Alarm)
                     ~Burglary|Alarm ∩ JohnCalls ∩ MaryCalls     Pr(~Burglary | Alarm ∩ JohnCalls) * Pr(MaryCalls | Alarm ∩ JohnCalls ∩ ~Burglary) / Pr(MaryCalls | Alarm ∩ JohnCalls)
                                          ~Burglary|Burglary     0
                                         ~Burglary|~Burglary     1
       ~Earthquake|Alarm ∩ JohnCalls ∩ MaryCalls ∩ ~Burglary     Pr(~Earthquake | Alarm ∩ JohnCalls ∩ ~Burglary) * Pr(MaryCalls | Alarm ∩ JohnCalls ∩ ~Burglary ∩ ~Earthquake) / Pr(MaryCalls | Alarm ∩ JohnCalls ∩ ~Burglary)
                   ~Earthquake|Alarm ∩ JohnCalls ∩ ~Burglary     Pr(~Earthquake | Alarm ∩ ~Burglary) * Pr(JohnCalls | Alarm ∩ ~Burglary ∩ ~Earthquake) / Pr(JohnCalls | Alarm ∩ ~Burglary)
                               ~Earthquake|Alarm ∩ ~Burglary     Pr(~Earthquake | ~Burglary) * Pr(Alarm | ~Burglary ∩ ~Earthquake) / Pr(Alarm | ~Burglary)
                                        ~Earthquake|Burglary     1 - Pr(Earthquake | Burglary)
                            ~Earthquake|Burglary ∩ ~Burglary     1 - Pr(Earthquake | Burglary ∩ ~Burglary)
                                       ~Earthquake|~Burglary     1 - Pr(Earthquake)

To clear these deductions, use the command bayes(0).

Example 3

This example comes from an AI Lecture on YouTube about Bayesian networks, namely UNH CS 730 by Wheeler Ruml. Here is a diagram:

Bayes net headache example, explaining away by Wheeler Ruml UNH CS 730

And here is the corresponding Palamedes code:

/* Bayes net headache example, explaining away by Wheeler Ruml UNH CS 730 */
/* https://www.youtube.com/watch?v=uM34RlzxfOk */
/* Note that "F" stands for Fudge dice in Palamedes, so we spell out the random variable "Fighting" */
Pr(Fighting) gets 0.02
Pr(Cancer) gets 0.1
Pr(Bruised|Fighting) gets 0.9
Pr(Bruised|~Fighting) gets 0.001
Pr(Headache|Fighting,Cancer) gets 0.99
Pr(Headache|Fighting,~Cancer) gets 0.9
Pr(Headache|~Fighting,Cancer) gets 0.75
Pr(Headache|~Fighting,~Cancer) gets 0.05
Pr(Cancer|Headache) NB. about 0.5525 from video at 16 min 34 sec
Pr(Cancer|Headache,Bruised) NB. about 0.112 from video at 26 min 27 sec

The system responds with the correct answers:

Pr(Cancer | Headache) → 0.5558992487847989
Pr(Cancer | Bruised ∩ Headache) → 0.11259375227554067


AUTHORS BUGS DISCUSSION LICENSE NAME NEWS README SOURCE TODO TRY VERSION

Last update: Fri Sep 23 2016