Probability and Statistics
Probability and statistics deal with events or experiments whose outcomes are uncertain, and they assess the likelihood of the possible outcomes. Probability began in an effort to assess outcomes in gambling. We know from experience that if we toss a fair coin enough times, heads will come up about half the time and tails about half the time. The more trials there are, the more closely the proportion of heads approaches 1/2; that is, as the number of trials approaches infinity, the proportion of heads tends to 1/2, which is the probability of heads.
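The coin-toss claim can be checked with a short simulation. The following is a minimal sketch, assuming a fair coin modeled by Python's pseudorandom generator; the function name `heads_fraction` and the fixed seed are illustrative choices, not part of any standard method.

```python
import random


def heads_fraction(num_tosses, seed=0):
    """Simulate fair coin tosses and return the fraction that came up heads."""
    rng = random.Random(seed)  # fixed seed (an assumption) so runs are repeatable
    heads = sum(rng.random() < 0.5 for _ in range(num_tosses))
    return heads / num_tosses


# The fraction of heads should drift toward 0.5 as the number of tosses grows.
for n in (10, 100, 10_000, 100_000):
    print(n, heads_fraction(n))
```

With only 10 tosses the fraction can be far from 0.5; with 100,000 it is typically within a fraction of a percentage point.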
In simple situations, such as the toss of a coin, it is relatively easy to assign probabilities based on intuition. When we consider more complicated events, intuition becomes less reliable. Various methods of calculation then come into play to assign mathematical probabilities to outcomes. For example, permutations and combinations, which count the ordered arrangements and the unordered selections of the outcomes involved, are used to analyze many problems in probability. Probability has become an indispensable tool in statistics, physics, biology, social science, business, and many other fields.
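As a small illustration of how counting arrangements leads to a probability, the sketch below uses `math.perm` and `math.comb` from Python's standard library (available from Python 3.8). The helper `prob_exact_heads` is a hypothetical name introduced here, and it assumes a fair coin with independent tosses.

```python
from fractions import Fraction
from math import comb, perm


def prob_exact_heads(k, n):
    """Probability of exactly k heads in n independent tosses of a fair coin."""
    # comb(n, k) sequences contain exactly k heads, out of 2**n equally likely sequences.
    return Fraction(comb(n, k), 2 ** n)


print(perm(5, 3))              # ordered arrangements of 3 items chosen from 5 -> 60
print(comb(5, 3))              # unordered selections of 3 items chosen from 5 -> 10
print(prob_exact_heads(2, 4))  # exactly 2 heads in 4 tosses -> 3/8
```

Intuition might suggest that two heads in four tosses is the "expected" result and therefore very likely, yet the count shows it happens only three times in eight.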
Statistics is the organization and analysis of data for the purpose of simplification, comparison, and prediction. Statistical methods are used throughout most branches of human knowledge. A scientist may use statistics to bolster a theory, design an experiment, or test the significance of experimental results. Someone in business uses statistics to estimate sales and to control quality. A scholar may apply statistical methods to literary works. For example, he or she may use data on the frequency of particular words in order to determine the unknown author of a poem.
One of the best-known uses of statistics is as a predictor. The data collected from a sample group are used to predict the results from a larger group. Politicians use polls to evaluate their campaigns; biologists study animal populations by banding small numbers of captured animals; manufacturers maintain quality control on production lines by examining small samples of the manufactured products. The results of statistics are often given in the form of estimates together with a probability statement about how reliable each estimate is.
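One common way to attach a reliability statement to an estimate is a margin of error around a sample proportion. The sketch below assumes a simple random sample and uses the normal approximation with z = 1.96 for roughly 95% confidence; the function name and the poll figures are made up for illustration.

```python
import math


def proportion_estimate(successes, sample_size, z=1.96):
    """Return (estimated proportion, margin of error) under the normal approximation."""
    p_hat = successes / sample_size
    # Standard error of a sample proportion, scaled by z (1.96 for about 95% confidence).
    margin = z * math.sqrt(p_hat * (1 - p_hat) / sample_size)
    return p_hat, margin


# Hypothetical poll: 520 of 1,000 sampled voters favor a candidate.
p_hat, margin = proportion_estimate(520, 1000)
print(f"estimate {p_hat:.3f}, plus or minus {margin:.3f}")
```

For this made-up poll the margin is about three percentage points, so the sample alone does not establish that the candidate has majority support.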
The great usefulness of statistics as a predictor is possible because of the regularity exhibited by many natural processes and populations that at first glance appear to be highly irregular. If we measured the heights of North American adults, for example, and presented the results in a bar graph, certain regularities would begin to appear as the number of people being measured grew. The bar graph would become more and more regular, symmetrical, and bell-shaped. This curve has many names, including the normal distribution curve, the Gaussian distribution, and the bell-shaped curve.
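The way a bar graph settles into a bell shape as more people are measured can be sketched with simulated data. This example assumes heights follow a normal distribution with a hypothetical mean of 170 cm and standard deviation of 10 cm, so it shows how sample size smooths the graph rather than proving anything about real heights.

```python
import random
from collections import Counter

MEAN_CM = 170.0  # hypothetical average height, for illustration only
SD_CM = 10.0     # hypothetical spread, for illustration only


def height_histogram(num_people, bin_width=5, seed=0):
    """Count simulated heights per bin; each key is the lower edge of a bin in cm."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(num_people):
        height = rng.gauss(MEAN_CM, SD_CM)
        counts[int(height // bin_width) * bin_width] += 1
    return counts


def print_histogram(counts, width=50):
    """Draw a text bar chart scaled so the tallest bar is `width` characters long."""
    tallest = max(counts.values())
    for edge in sorted(counts):
        bar = "#" * round(counts[edge] / tallest * width)
        print(f"{edge:>4} cm | {bar}")


# A small sample looks ragged; a large one looks symmetrical and bell-shaped.
for n in (50, 50_000):
    print(f"{n} people")
    print_histogram(height_histogram(n))
```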
Statisticians use the term random variable to describe the outcome of an event that is unpredictable in advance, such as the percentage of adults who measure 5 ft 8 in or the effect of a lifetime of smoking on health. Statisticians are concerned with the variability of their data, that is, with how widely the values are spread around the center of the distribution. They ask whether most of the outcomes cluster around the middle, forming a tall, narrow curve, or scatter, forming a low, wide curve. One measure of variability is called the standard deviation. Statisticians also determine whether different variables increase together, such as packs of cigarettes smoked daily and likelihood of lung cancer, or whether they lack correlation. The study of the behavior of random variables is known as statistics.
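Both ideas, spread and correlation, can be computed directly. The sketch below uses `statistics.pstdev` from Python's standard library for the population standard deviation and a hand-written Pearson correlation coefficient; the function name `pearson_correlation` and all of the numbers are invented for illustration and are not real measurements.

```python
import math
import statistics


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient: +1 rise together, -1 move oppositely, 0 no linear link."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Two made-up data sets with the same mean (10) but different spread.
clustered = [9, 10, 10, 10, 11]
scattered = [2, 6, 10, 14, 18]
print(statistics.pstdev(clustered))  # small: values hug the middle
print(statistics.pstdev(scattered))  # large: values are spread out

# Made-up paired measurements that tend to increase together.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(pearson_correlation(x, y))
```

A coefficient near zero indicates only that the variables lack a linear relationship, and a high coefficient does not by itself show that one variable causes the other.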