
Probability is the branch of pure mathematics which corresponds to statistics in applied mathematics. It began as the study of chance, and especially of games of chance, in 17th-century France with Blaise Pascal (1623–1662), who was paid by a nobleman to find out why he always lost when he made a particular bet (the answer was that he was offering even odds on a bet where the chances were really only 47% in his favour). Pascal went on to discover ‘his’ famous triangle (which was in fact known in other cultures long before), which makes calculations of this sort of probability very easy (calculations of the probability distribution of repetitions of a single event, like the probability of throwing exactly six sixes in twenty-one throws of a die). The problem with Pascal's work, and that of the probabilists who followed him, was that it was purely empirical; they used ‘common sense’ ideas of probabilities in mathematical models of real situations (the idea that an unbiased coin should have equal probability of showing heads or tails when tossed, for example), so that what they were doing was not really probability or statistics, but a mixture of both. This dependence on observation made many 19th-century mathematicians reluctant to allow probability a place as a branch of mathematics, rather than as part of physics.
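The die example can be worked through directly. What follows is only a minimal sketch in Python, assuming a fair die and independent throws; the function names `pascal_row` and `prob_exact_successes` are invented for this illustration. The entry in row 21, position 6 of the triangle counts the ways of choosing which six of the twenty-one throws show a six.

```python
from fractions import Fraction


def pascal_row(n):
    """Return row n of Pascal's triangle (row 0 is [1])."""
    row = [1]
    for _ in range(n):
        # Each inner entry is the sum of the two entries above it.
        row = [1] + [a + b for a, b in zip(row, row[1:])] + [1]
    return row


def prob_exact_successes(n, k, p):
    """Probability of exactly k successes in n independent trials,
    each succeeding with probability p."""
    ways = pascal_row(n)[k]  # number of ways to place the k successes
    return ways * p**k * (1 - p) ** (n - k)


# Exactly six sixes in twenty-one throws of a fair die (assumed fair).
p_six_sixes = prob_exact_successes(21, 6, Fraction(1, 6))
print(float(p_six_sixes))
```

Exact fractions are used so that the probabilities for every possible number of sixes, from 0 to 21, add up to exactly 1.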
This did not mean that probability was without its successes. Among these was Bayes's theorem, the basis of Bayesian statistics, which was a result long considered controversial. (Because probability was a science rather than a branch of mathematics, his results were open to question in a way that no mathematical theorem would be.) Probability also opened up new methods of calculation; an analysis of the patterns made by throwing sticks on a floor marked with parallel lines gave, for example, an entirely new way to estimate the number pi. Advances were made in the applications of probability to the world of insurance, where it was needed to calculate the ratio of premiums to claims needed for companies to make a profit; thus actuarial statistics was born.
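The stick-throwing experiment is usually known as Buffon's needle, and it can be imitated by simulation. The sketch below rests on stated assumptions: each stick is no longer than the gap between the lines, in which case the chance of a stick crossing a line is 2 × length / (pi × gap), and the function name `buffon_pi_estimate` and its parameters are invented here. The direction of each stick is drawn without using pi, so the estimate is not circular.

```python
import math
import random


def buffon_pi_estimate(throws, needle=1.0, spacing=1.0, seed=0):
    """Estimate pi by simulating sticks thrown on a lined floor.

    Assumes needle <= spacing, so a stick crosses at most one line.
    """
    if needle > spacing:
        raise ValueError("this sketch assumes needle <= spacing")
    rng = random.Random(seed)
    hits = 0
    for _ in range(throws):
        # Distance from the stick's centre to the nearest line.
        centre = rng.uniform(0.0, spacing / 2)
        # Pick a uniformly random direction by rejection sampling a
        # point in the quarter disc, which avoids using pi itself.
        while True:
            u, v = rng.random(), rng.random()
            r2 = u * u + v * v
            if 0.0 < r2 <= 1.0:
                break
        sin_theta = v / math.sqrt(r2)
        # The stick crosses a line if its half-length, projected at
        # right angles to the lines, reaches the nearest line.
        if centre <= (needle / 2) * sin_theta:
            hits += 1
    if hits == 0:
        raise ValueError("no crossings observed; use more throws")
    # Invert P(cross) = 2 * needle / (pi * spacing).
    return 2 * needle * throws / (spacing * hits)


estimate = buffon_pi_estimate(200_000)
print(estimate)
```

The estimate improves only slowly as the number of throws grows, so the method is of interest as a curiosity rather than as a practical way of computing pi.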
Probability was finally given a position in the field of mathematics when it was given an axiomatization by Pafnutiy Lvovitch Chebyshev (1821–1894). His analysis was quite revolutionary. Probability all takes place within ‘probability spaces’, which have three parts. There is the ‘sample space’, which is the set of all possible outcomes, then the set of combinations of events, and the ‘probability measure’, which assigns a number p(A) between 0 and 1 to each combination A of events from the sample space. The numbers that are assigned have to obey certain rules; for example, the probability that none of the events in the sample space happens is 0, and if A is a set of events, and ~A is the set of all events not in A, then p(~A) = 1 - p(A). Also, if A and B are mutually exclusive events (that is, A and B cannot both happen), p(A or B) = p(A) + p(B). From these simple rules the whole of probability can be derived. As usual in pure mathematics, no questions are asked about the nature of the objects in the probability space: they could be obtained by experiment, they could be given in a problem or be the products of thought and hypothesis; the only important thing is that they obey the rules. (Deciding what the objects in the sample space should be is really the role of statistics.)
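A small finite example may make the three parts concrete. The sketch below, in Python, takes a single throw of a fair die as an assumed sample space; the names `sample_space`, `weights`, `p` and `all_events` are chosen for this illustration only. It builds every combination of outcomes and checks the rules quoted above for each of them.

```python
from fractions import Fraction
from itertools import combinations

# Sample space: the six faces of a die (assumed fair for this sketch).
sample_space = frozenset(range(1, 7))
weights = {outcome: Fraction(1, 6) for outcome in sample_space}


def p(event):
    """Probability measure: add up the weights of the outcomes in the event."""
    return sum((weights[o] for o in event), Fraction(0))


def all_events(space):
    """Every combination of outcomes, from the empty set to the whole space."""
    items = sorted(space)
    return [
        frozenset(c)
        for r in range(len(items) + 1)
        for c in combinations(items, r)
    ]


events = all_events(sample_space)

# The probability that nothing in the sample space happens is 0.
assert p(frozenset()) == 0
# Every event has a probability between 0 and 1, and p(~A) = 1 - p(A).
for a in events:
    assert 0 <= p(a) <= 1
    assert p(sample_space - a) == 1 - p(a)

# Mutually exclusive events: p(A or B) = p(A) + p(B).
even = frozenset({2, 4, 6})
low_odd = frozenset({1, 3})
assert even.isdisjoint(low_odd)
assert p(even | low_odd) == p(even) + p(low_odd)
```

Nothing in the checks depends on the outcomes being the faces of a die; any other weights that add up to 1 would pass in the same way, which is the sense in which only the rules matter.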
One of the most important consequences of the axioms of probability is the Law of Large Numbers, which probabilists had used for many years before Chebyshev without actually realizing that it needed to be proved. In concrete terms, it states that in a large number of identical tests, the proportion of successful results approaches the probability of a successful result in each test. So, for example, tossing a coin two million times will result in about one million heads. (The same cannot be relied on for small numbers; in nine or ten tosses, seven heads could quite often happen.) This is the result that justifies actuarial statistics, which relies on the data provided by a large number of tests (out of a population of seventy million, for example, the proportion who die in one year from lung cancer) to calculate premiums using probabilities which are accurate because of the Law of Large Numbers. SMcL
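The behaviour described by the Law of Large Numbers can be watched in a simulation. This is a minimal sketch in Python, assuming a fair coin and a pseudo-random generator in place of real tosses; the function name `proportion_of_heads` is invented for the illustration.

```python
import random


def proportion_of_heads(tosses, seed=0):
    """Simulate tossing a fair coin and return the proportion of heads."""
    rng = random.Random(seed)
    # Each toss counts as a head with probability one half.
    heads = sum(rng.random() < 0.5 for _ in range(tosses))
    return heads / tosses


# Small runs can stray far from 0.5; long runs settle close to it.
for n in (10, 1_000, 100_000):
    print(n, proportion_of_heads(n))
```

Changing the `seed` argument gives a different sequence of tosses; the short runs vary a good deal from seed to seed, while the long runs stay close to one half.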
Further reading: I. Hacking, The Emergence of Probability.
