- Formula and equations
- Differences with the binomial distribution
- Examples
- Practical applications
- Approximating the binomial distribution with the Poisson distribution
- Solved exercises
- Exercise 1
- Solution c)
- Exercise 2
- Solution a)
- References
The Poisson distribution is a discrete probability distribution. It gives the probability that an event whose individual probability is small occurs a given number of times within a large sample and during a certain interval.
The Poisson distribution can often be used in place of the binomial distribution, as long as the following conditions are met: large sample and small probability.
Figure 1. Graph of the Poisson distribution for different parameters. Source: Wikimedia Commons.
Siméon-Denis Poisson (1781-1840) created the distribution that bears his name, which is very useful for modeling unpredictable events. Poisson published his results in 1837, in a study of the probability of erroneous criminal sentences.
Later, other researchers applied the distribution to other areas, for example, the number of stars found in a certain volume of space, or the probability that a soldier would die from the kick of a horse.
Formula and equations
The mathematical form of the Poisson distribution is as follows:
$$P(y = k) = \frac{\mu^{k}\, e^{-\mu}}{k!}$$
where:
- μ (sometimes denoted λ) is the mean, the single parameter of the distribution.
- e is Euler's number: e ≈ 2.71828.
- P is the probability of obtaining y = k.
- k is the number of successes: 0, 1, 2, 3, …
- n is the number of trials or events (the sample size).
Discrete random variables, as their name implies, depend on chance and only take discrete values: 0, 1, 2, 3, 4…, k.
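As a minimal sketch, the probability function can be evaluated directly with Python's standard library. The function name `poisson_pmf` and the mean μ = 2 used in the loop are illustrative choices, not part of the original text.

```python
import math

def poisson_pmf(k: int, mu: float) -> float:
    """Probability of exactly k events when the mean number of events is mu."""
    if k < 0:
        return 0.0
    return mu ** k * math.exp(-mu) / math.factorial(k)

# Probabilities of 0 to 5 events for an assumed mean of mu = 2
for k in range(6):
    print(k, round(poisson_pmf(k, 2.0), 4))
```

Because k has no upper limit, the probabilities only add up to 1 when the sum is carried far enough into the tail.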
The mean of the distribution is given by:
μ = np
where p is the probability of the event in a single trial.
The variance σ², which measures the spread of the data, is another important parameter. For the Poisson distribution it is:
σ² = μ
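The equality between the mean and the variance can be checked numerically. The sketch below sums the series up to a cutoff `k_max`, which is an assumption made for illustration; the neglected tail is negligible for the small means used here. The helper names are hypothetical.

```python
import math

def poisson_pmf(k, mu):
    """Probability of exactly k events when the mean number of events is mu."""
    return mu ** k * math.exp(-mu) / math.factorial(k)

def mean_and_variance(mu, k_max=100):
    """Approximate the mean and variance by summing the series up to k_max."""
    ks = range(k_max + 1)
    mean = sum(k * poisson_pmf(k, mu) for k in ks)
    variance = sum((k - mean) ** 2 * poisson_pmf(k, mu) for k in ks)
    return mean, variance

# Illustrative values of mu; both printed numbers should agree with mu
for mu in (0.5, 2.0, 8.0):
    mean, variance = mean_and_variance(mu)
    print(mu, round(mean, 6), round(variance, 6))
```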
Poisson determined that when n → ∞ and p → 0, the mean μ = np, also called the expected value, tends to a constant.
The Poisson model has the following properties:
- The events considered are independent of each other and occur randomly.
- The probability p of a certain event occurring during a specific period of time is very small: p → 0.
- The probability of more than one event occurring in a very short time interval is negligible.
- The average value approximates a constant given by: μ = np (n is the sample size).
- Since the variance σ² is equal to μ, the variability becomes greater as μ takes larger values.
- Events must occur at a uniform rate over the time interval used.
- The set of possible values of y is: 0, 1, 2, 3, 4, …
- The sum of independent variables that follow a Poisson distribution is also a Poisson variable. Its mean is the sum of the means of those variables.
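The last property can be illustrated for two variables. The sketch below assumes the two variables are independent, computes the distribution of their sum by convolution, and compares it with a single Poisson distribution whose mean is the sum of the two means. The means 1.5 and 2.5 and the function names are illustrative.

```python
import math

def poisson_pmf(k, mu):
    """Probability of exactly k events when the mean number of events is mu."""
    return mu ** k * math.exp(-mu) / math.factorial(k)

def pmf_of_sum(k, mu1, mu2):
    """P(X + Y = k) for independent X ~ Poisson(mu1) and Y ~ Poisson(mu2)."""
    return sum(poisson_pmf(j, mu1) * poisson_pmf(k - j, mu2) for j in range(k + 1))

mu1, mu2 = 1.5, 2.5  # illustrative means
for k in range(5):
    # The two printed columns should agree
    print(k, round(pmf_of_sum(k, mu1, mu2), 6), round(poisson_pmf(k, mu1 + mu2), 6))
```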
Differences with the binomial distribution
The Poisson distribution differs from the binomial distribution in the following important ways:
- The binomial distribution depends on both the sample size n and the probability p, whereas the Poisson distribution depends only on the mean μ.
- In a binomial distribution, the possible values of the random variable y are 0, 1, 2, …, n, whereas in the Poisson distribution there is no upper limit for these values.
Examples
Poisson initially applied his famous distribution to legal cases, but at the industrial level, one of its earliest uses was in brewing beer. In this process yeast cultures are used for fermentation.
Yeast consists of living cells, whose population varies over time. Brewing requires adding the right amount of yeast, so it is necessary to know the number of cells per unit of volume.
During World War II the Poisson distribution was used to find out whether the Germans were actually aiming at London from Calais, or just firing at random. This was important for the Allies in determining how good the technology available to the Nazis was.
Practical applications
The applications of the Poisson distribution always refer to counts in time or counts in space. Because the probability of occurrence is small, it is also known as the "law of rare events."
Here is a list of events that fall into one of these categories:
- Detection of the particles emitted in radioactive decay, which, like the growth of yeast cells, is an exponential function.
- Number of visits to a certain website.
- Arrivals of people at a queue to pay or be served (queueing theory).
- Number of cars that pass a certain point on a road during a given time interval.
- Mutations suffered by a certain DNA chain after exposure to radiation.
- Number of meteorites with a diameter greater than 1 m that fall in a year.
- Defects per square meter of a fabric.
- Number of blood cells in 1 cubic centimeter.
- Calls per minute to a telephone exchange.
- Chocolate chips present in 1 kg of cake batter.
- Number of trees infected by a certain parasite in 1 hectare of forest.
Figure 2. The number of cars passing through a point roughly follows a Poisson distribution. Source: Pixabay.
Note that these random variables represent the number of times an event occurs during a fixed period of time (calls per minute to the telephone exchange), or a given region of space (fabric defects per square meter).
These events, as has already been established, are independent of the time that has passed since the last occurrence.
Approximating the binomial distribution with the Poisson distribution
The Poisson distribution is a good approximation to the binomial distribution as long as:
- The sample size is large: n ≥ 100
- The probability p is small: p ≤ 0.1
- The mean is moderate: μ = np ≤ 10
When these conditions hold, the Poisson distribution is an excellent tool, since the binomial distribution can be difficult to apply.
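To see how close the approximation is, the sketch below compares the two distributions for n = 100 and p = 0.05, values chosen here only because they satisfy the three conditions (np = 5). The function names are illustrative, and `math.comb` requires Python 3.8 or later.

```python
import math

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_pmf(k, mu):
    """Probability of exactly k events when the mean number of events is mu."""
    return mu ** k * math.exp(-mu) / math.factorial(k)

n, p = 100, 0.05  # assumed example: n >= 100, p <= 0.1, np = 5 <= 10
mu = n * p

for k in range(11):
    print(k, round(binomial_pmf(k, n, p), 4), round(poisson_pmf(k, mu), 4))

# Largest pointwise difference between the two probability functions
max_diff = max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, mu)) for k in range(n + 1))
print("largest difference:", max_diff)
```

The Poisson side needs only μ, which is what makes it convenient when n is large and the binomial coefficients become unwieldy to compute by hand.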
Solved exercises
Exercise 1
A seismological study determined that during the last 100 years there were 93 large earthquakes around the world, each of magnitude at least 6.0 on the (logarithmic) Richter scale. Suppose that the Poisson distribution is a suitable model in this case. Find:
a) The average occurrence of large earthquakes per year.
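b) If P(y) is the probability that y large earthquakes occur during a randomly selected year, find the probabilities P(0) through P(7).
The average is μ = 93/100 = 0.93 large earthquakes per year. Substituting μ = 0.93 into the probability function gives each P(y), and the values fall off quickly as y grows: P(3), for instance, is considerably smaller than P(2).
The results are listed below: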
P (0) = 0.395, P (1) = 0.367, P (2) = 0.171, P (3) = 0.0529, P (4) = 0.0123, P (5) = 0.00229, P (6) = 0.000355, P (7) = 0.0000471.
For example, we could say that there is a 39.5% probability that no large earthquake will occur in a given year, or a 5.29% probability that 3 large earthquakes will occur in that year.
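The listed probabilities can be reproduced with a short script. This is a sketch that assumes the mean is the 93 earthquakes divided by the 100 years stated in the exercise; the function name is illustrative.

```python
import math

def poisson_pmf(k, mu):
    """Probability of exactly k events when the mean number of events is mu."""
    return mu ** k * math.exp(-mu) / math.factorial(k)

mu = 93 / 100  # average number of large earthquakes per year

# P(0) through P(7), printed to three significant figures
probs = [poisson_pmf(y, mu) for y in range(8)]
for y, prob in enumerate(probs):
    print(f"P({y}) = {prob:.3g}")
```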
Solution c)
c) The expected frequencies are obtained by multiplying each probability by n = 100 years:
39.5; 36.7; 17.1; 5.29; 1.23; 0.229; 0.0355 and 0.00471.
For example:
- A frequency of 39.5 means that 0 large earthquakes are expected in 39.5 out of 100 years, which is fairly close to the actual result of 47 years without any large earthquake.
Let's compare other Poisson results with the actual ones:
- The value of 36.7 means that about 37 of the 100 years are expected to have exactly 1 large earthquake. The actual result is 31 years with 1 large earthquake, a good match with the model.
- 17.1 years with 2 large earthquakes are expected, and there were in fact 13 such years, which is a close value.
Therefore the Poisson model is acceptable for this case.
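The comparison can be laid out side by side as follows. This sketch uses only the three observed counts quoted in the text (47, 31 and 13 years); the variable names are illustrative.

```python
import math

def poisson_pmf(k, mu):
    """Probability of exactly k events when the mean number of events is mu."""
    return mu ** k * math.exp(-mu) / math.factorial(k)

mu = 93 / 100   # large earthquakes per year
years = 100     # length of the study

# Observed number of years with 0, 1 and 2 large earthquakes, as quoted in the text
observed = {0: 47, 1: 31, 2: 13}

expected = {y: years * poisson_pmf(y, mu) for y in observed}
for y, obs in observed.items():
    print(f"{y} earthquakes: expected {expected[y]:.1f} years, observed {obs} years")
```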
Exercise 2
A company estimates that the number of components that fail before reaching 100 operating hours follows a Poisson distribution. If the average number of failures in that time is 8, find the following probabilities:
a) Exactly one component fails in 25 hours.
b) Fewer than two components fail in 50 hours.
c) At least three components fail in 125 hours.
Solution a)
a) The average number of failures in 100 hours is 8, so in 25 hours a quarter as many failures are expected, that is, 2 failures. This is the parameter μ.
The probability that exactly 1 component fails is requested. The random variable is "components that fail before 25 hours" and its value is y = 1. Substituting into the probability function:
$$P(y = 1) = \frac{2^{1}\, e^{-2}}{1!} \approx 0.27$$
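b) In 50 hours half as many failures are expected as in 100 hours, so μ = 4. The question asks for the probability that fewer than two components fail in 50 hours, not that exactly 2 components fail, so we must add the probabilities that:
- No component fails
- Exactly 1 component fails
$$P(y < 2) = P(0) + P(1) = e^{-4} + 4\,e^{-4} \approx 0.0916$$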
c) The parameter μ of the distribution in this case is:
μ = 8 + 2 = 10 failures in 125 hours.
$$P(y \geq 3) = 1 - P(0) - P(1) - P(2) = 1 - e^{-10}\left(1 + 10 + \frac{10^{2}}{2!}\right) \approx 0.9972$$
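All three parts follow the same pattern: rescale the mean to the length of the interval, then add or subtract the relevant probabilities. A minimal sketch, assuming the failure rate of 8 per 100 hours stays constant over time; the names `RATE_PER_HOUR` and `mean_failures` are illustrative.

```python
import math

def poisson_pmf(k, mu):
    """Probability of exactly k events when the mean number of events is mu."""
    return mu ** k * math.exp(-mu) / math.factorial(k)

RATE_PER_HOUR = 8 / 100  # 8 failures per 100 operating hours, from the exercise

def mean_failures(hours):
    """Expected number of failures in the given number of hours."""
    return RATE_PER_HOUR * hours

# a) exactly one failure in 25 hours (mu = 2)
p_a = poisson_pmf(1, mean_failures(25))

# b) fewer than two failures in 50 hours (mu = 4): P(0) + P(1)
p_b = sum(poisson_pmf(k, mean_failures(50)) for k in range(2))

# c) at least three failures in 125 hours (mu = 10): 1 - P(0) - P(1) - P(2)
p_c = 1 - sum(poisson_pmf(k, mean_failures(125)) for k in range(3))

print(f"a) {p_a:.4f}  b) {p_b:.4f}  c) {p_c:.4f}")
```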
References
- MathWorks. Poisson distribution. Retrieved from: es.mathworks.com
- Mendenhall, W. 1981. Statistics for Management and Economics. 3rd ed. Grupo Editorial Iberoamérica.
- Stat Trek. Teach yourself Statistics. Poisson Distribution. Retrieved from: stattrek.com
- Triola, M. 2012. Elementary Statistics. 11th ed. Pearson Education.
- Wikipedia. Poisson distribution. Retrieved from: en.wikipedia.org