A recreational exploration of a Naive Bayes classifier for email spam filtering, written in C++.
Inspired by Tsoding Daily's video on the subject.
I am using the Boost library to access standard algorithms.
The general probability formula (Bayes' theorem) is given as:

$$P(A | B) = \frac{P(B | A) \, P(A)}{P(B)}$$

Where:

- $P(A | B)$ is the probability of $A$ happening given $B$ has happened
- $P(B | A)$ is the probability of $B$ happening given $A$ has happened
- $P(A)$ is the probability of $A$ happening alone
- $P(B)$ is the probability of $B$ happening alone

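
As a minimal sketch, the formula can be written as a single C++ function. The name `bayes` and its parameter names are made up for illustration and are not taken from this project's code; the function only assumes that $P(B)$ is non-zero.

```cpp
#include <cassert>

// Bayes' theorem: P(A | B) = P(B | A) * P(A) / P(B).
// Hypothetical helper for illustration only.
// All arguments are probabilities in [0, 1]; P(B) must be non-zero.
double bayes(double p_b_given_a, double p_a, double p_b) {
    assert(p_b > 0.0 && "P(B) must be positive");
    return p_b_given_a * p_a / p_b;
}
```

For example, with the made-up values $P(B | A) = 0.8$, $P(A) = 0.3$ and $P(B) = 0.4$, calling `bayes(0.8, 0.3, 0.4)` gives approximately $0.6$.
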
For a spam filter, we have a set $ C =
With this, the general formula can be transformed as follows:

$$P(C | D) = \frac{P(D | C) \, P(C)}{P(D)}$$

Where:

- $D$ stands for document
- $P(C | D)$ is the probability of $C$ happening given $D$ has happened
- $P(D | C)$ is the probability of $D$ happening given $C$ has happened
- $P(C)$ is the probability of $C$ happening alone
- $P(D)$ is the probability of $D$ happening alone
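
The sketch below shows one way the document formula could be evaluated for every class at once. It is an illustration under stated assumptions, not this project's implementation: the struct `ClassEstimate`, the function `posteriors`, and the labels `"spam"` and `"ham"` are all hypothetical. It also assumes the classes are mutually exclusive and cover every case, so that $P(D)$ can be obtained by summing $P(D | c) \, P(c)$ over all classes $c$.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One candidate class c, with the two quantities the formula needs.
// Hypothetical type for illustration only.
struct ClassEstimate {
    std::string name;   // hypothetical label, e.g. "spam"
    double prior;       // P(c)
    double likelihood;  // P(D | c)
};

// Returns P(c | D) for every class, in the same order as the input.
// Assumption: the classes are mutually exclusive and exhaustive, so
// P(D) = sum over c of P(D | c) * P(c).
std::vector<double> posteriors(const std::vector<ClassEstimate>& classes) {
    double p_d = 0.0;
    for (const auto& c : classes) {
        p_d += c.likelihood * c.prior;
    }
    assert(p_d > 0.0 && "P(D) must be positive");

    std::vector<double> result;
    result.reserve(classes.size());
    for (const auto& c : classes) {
        result.push_back(c.likelihood * c.prior / p_d);
    }
    return result;
}
```

With the made-up inputs `{"spam", 0.4, 0.05}` and `{"ham", 0.6, 0.01}`, $P(D) = 0.4 \cdot 0.05 + 0.6 \cdot 0.01 = 0.026$, so the returned posteriors are roughly $0.77$ and $0.23$. Because $P(D)$ is the same for every class, a classifier that only needs the most likely class could skip the division and compare the numerators directly.
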