Descriptive Statistics


Definition


Descriptive statistics form the basis of quantitative data analysis and offer researchers summaries of sample data, one variable at a time (univariate).[1] While inferential statistics can associate variables (correlation) or account for randomness in the population or process being studied, descriptive statistics simply describe what the data show and readily translate results into frequency distributions, percentages, and overall averages.

The strength of descriptive statistics is their ability to collect, organize and compare vast amounts of discrete categorical data and continuous, non-discrete (numerically infinite) data in a more manageable form. Unlike inferential statistics, gathering descriptive statistics is a fairly straightforward process that describes and summarizes a data set but does not correlate data or model statistical relationships among multiple variables (multivariate analysis) that might lead to inferred conclusions or a hypothesis.

Overall, descriptive statistics are used in systematic observations of central tendency and aim to describe subject data information in a manner that can be less subjectively evaluated by others. For students, descriptive statistics are highly relevant to academics since quantitative data engages the mind in novel ways of thinking about phenomena and information. Additionally, most professionals and their organizations report findings in statistical terms.[2]


Relevant Characteristics


In the hilarious introduction to Discovering Statistics Using SPSS, Dr. Andy Field assures us that, other than "evil professors" making us do it, stats are about learning how to answer interesting questions during the research process. Take, for example, the following questions: Does Mickey eat more cereal after watching TV? Before school or during weekends? Does Mickey eat the entire contents of his cereal box by himself, or does he share a portion with Chi-chi the household parrot? To answer the questions, you need to gather chunks of information: first, you collect relevant data (and to do that you need to identify things that can be measured) and then you analyze those data. Simple data analysis will describe what your data are doing (descriptive) and may inspire deeper statistical analysis (inferential) that will (1) generate a new theory, and (2) support your existing theory or give you cause to modify it (all features of inferential stats). As such, data collection, analysis and theory generation are intrinsically linked: theories lead to data collection/analysis, and data collection/analysis informs theories.[3]

In Mickey's example, the researcher's initial job will be to (1) pre-measure the contents of an unopened cereal box designated only for Mickey, (2) measure cereal consumption at designated intervals, (3) if possible (and if Mickey is being honest about it), measure how much cereal Mickey is giving away to the parrot, and (4) compare the overall measured results to obtain summarized answers. That is descriptive statistics: setting out question parameters (yes/no or specific subtype question sets) and recording results over time.

"Since problem-oriented studies often involve the systemic collection of quantitative data, the need for statistical methods of descriptions and inference become apparent." [4] This need is evident in modern anthropology as an academic discipline. Through the use of statistics, anthropology has grown from simple cultural comparisons to research of considerable depth and significance. The use of statistics also provides additional explanatory benefits, which assist the researcher in survey design and project development.

Statistical compilation is really a measurement of subject data behavior. Regardless of the discipline, data compilation first begins with a population (or process) sample deliberately anchored to an observational or experimental setting. Next, descriptive stats are used to identify discrete or finite categorical things, such as sex, age, and race, and to measure continuous, non-discrete things that can vary infinitely by amount or degree, such as blood pressure, shoe size, and IQ. In step three, the descriptive process identifies data behavior and measures the range of the variables and the location of central scores within a sample group.

For researchers, descriptive statistics measure and record the immediate behavior of data, but also reflect the variability and central tendency of scores over a given distribution. The three common measures of central tendency are the mean, the median, and the mode. The measure most commonly used in research is the mean, which is calculated by adding all the scores in a distribution and dividing that total by the number of scores obtained. When scores fall in a normal distribution, the mean is the most useful average because it gives a good sense of typical scores that researchers can use in subsequent, more sophisticated statistical tests. To account for extreme scores that do not reflect typicality in a distribution, researchers instead order the numerical values by size and identify the median, or middle score.[5] [6] The mode is the least sophisticated measure of central tendency and simply refers to the most frequently occurring score in a distribution.
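The three measures of central tendency can be sketched in a few lines of Python using the standard library's statistics module. The scores below are invented purely for illustration (they come from no study), and the final score is deliberately extreme to show how it pulls the mean but not the median.

```python
import statistics

# Hypothetical scores, invented for illustration; 110 is a deliberately extreme score.
scores = [70, 72, 75, 75, 78, 80, 110]

# Mean: add all the scores and divide by the number of scores (560 / 7).
mean_score = statistics.mean(scores)

# Median: the middle score once the values are ordered by size.
median_score = statistics.median(scores)

# Mode: the most frequently occurring score.
mode_score = statistics.mode(scores)

print("mean:", mean_score)
print("median:", median_score)
print("mode:", mode_score)
```

In this made-up sample the single extreme score lifts the mean above the median, which is why the median is often preferred when a distribution contains atypical scores.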

In descriptive statistics, range simply refers to the difference between the lowest and highest scores, while standard deviation measures how far scores typically fall, or "deviate," from the mean, or average score, in a data set. A z-score expresses that distance for a single score in standard deviation units. Plotted on an x-y axis, a small standard deviation appears as a steep bell-shaped curve and a large standard deviation as a relatively flat bell-shaped curve around the overall mean, or average score, computed for the data set. If a score falls:
  • one standard deviation above the mean, its standardized z-score on a chart would be +1.00 from the overall average score of the data set
  • half a standard deviation below the mean, its standardized z-score on a chart would be -0.50 from the overall average score of the data set[7]
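As a minimal sketch of range, standard deviation and z-scores, the Python below uses invented scores. The helper name z_score is hypothetical, and the choice of the population standard deviation (statistics.pstdev) is an assumption; statistics.stdev would give the sample estimate instead.

```python
import statistics

# Hypothetical scores, invented for illustration only.
scores = [60, 70, 80, 90, 100]

# Range: difference between the highest and lowest score.
score_range = max(scores) - min(scores)

mean_score = statistics.mean(scores)

# Population standard deviation (assumption: the scores are the whole group).
sd = statistics.pstdev(scores)


def z_score(score, mean, standard_deviation):
    """Distance of a score from the mean, in standard deviation units."""
    return (score - mean) / standard_deviation


# A score one standard deviation above the mean, and one half a
# standard deviation below it.
z_above = z_score(mean_score + sd, mean_score, sd)
z_below = z_score(mean_score - 0.5 * sd, mean_score, sd)

print("range:", score_range)
print("standard deviation:", round(sd, 2))
print("z above:", round(z_above, 2))
print("z below:", round(z_below, 2))
```

A positive z-score marks a score above the mean and a negative z-score marks one below it.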

Over the course of research, descriptive statistics can evolve from a basic method of summarizing sets of categorical (discrete) and continuous (numerically infinite) data into using standard deviation to infer a hypothesis from sample data. According to Trochim in his Research Methods Knowledge Base website for social scientists, inferential statistics try to reach conclusions that extend beyond the groundwork data summaries provided by descriptive statistics:
  1. With inferential statistics, you are trying to reach conclusions that extend beyond the immediate data alone
  2. We use inferential statistics to make judgments of the probability that an observed difference between groups is a dependable one or one that might have happened by chance in a study
  3. We use inferential statistics to make inferences from our data to more general conditions; we use descriptive statistics simply to describe what is going on with our data
  4. Most of the major inferential statistics come from a general family of statistical models known as the General Linear Model. This includes the t-test, Analysis of Variance (ANOVA), Analysis of Covariance (ANCOVA), regression analysis, and many of the multivariate methods like factor analysis, multidimensional scaling, cluster analysis, discriminant function analysis, and so on
  5. Given the importance of the General Linear Model, it is a good idea for any serious social researcher to become familiar with its workings[8]


Method Made Easy


A simple way of illustrating how descriptive statistics work is to think of how many base hits a player gets during a baseball game, in combination with a running tally of the total number of strikes, fouls and walks. In this example, the baseball game is the categorical data (finite information). Base hits, strikes, fouls and walks are numerical data that can be counted over the innings so that the mean, frequencies and standard deviation can be calculated and summarized by game's end.

In medical anthropology, a descriptive statistics sample might include measuring the number of lactating mothers aged 18-25 living within five miles of Quito (categorical, finite data), and whether they chose to breastfeed their young fully, partially or not at all over the course of one year (continuous numerical data chunks). By year's end, the medical anthropologist would be able to (1) tally his/her population sample of lactating women aged 18-25 in the designated area, (2) calculate the mean to obtain central tendency, and (3) determine the standard deviation. The mean is calculated by adding all the scores in a distribution and dividing by the number of scores obtained. The standard deviation then indicates how far scores typically deviate (+/-), or fall away, from the mean, and a z-score expresses any single score's distance from the mean in standard deviation units.
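The tallying step in the breastfeeding example can be sketched as a frequency distribution with percentages. The ten responses below are invented for illustration and are not real survey data; the category labels "fully", "partially" and "none" are assumed codings of the three feeding choices.

```python
from collections import Counter

# Hypothetical responses, invented for illustration only (not real data).
responses = [
    "fully", "partially", "fully", "none", "fully",
    "partially", "fully", "none", "fully", "partially",
]

# Frequency distribution: how many mothers fall into each category.
counts = Counter(responses)
total = len(responses)

# Percentages: each category's share of the whole sample.
percents = {choice: 100 * n / total for choice, n in counts.items()}

# Summary table, most frequent category first.
for choice, n in counts.most_common():
    print(f"{choice:<10} {n:>3} {percents[choice]:6.1f}%")
```

The same pattern applies to any categorical variable: count each category, then divide by the sample size.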

Recommendation: Once you have some working knowledge of basic statistics, numerous software packages (including various free versions) and innovative fingertip technology, such as Percentally, can simplify the sometimes arduous task of data collection. Available software and pocket technologies can be great time savers and make plugging in values and grappling with data a far less complex process.
Time-saving tip: Forgo paper in the field and use technology to immediately send all statistical data gathered up to the cloud for later use. Using stats with technology aids is good for your intellectual and mental health!


Advantages

Descriptive statistics is a powerful beast of burden: (1) it collects and summarizes vast amounts of data and information in a manageable and organized manner, (2) it is a fairly straightforward process that can easily translate results into frequency distributions, percentages and overall averages, (3) it establishes standard deviation, (4) it is useful when it may not be desirable to develop complex research models, (5) it deals with immediate data and single variables rather than trying to establish conclusions, (6) it can identify further ideas for research, (7) it is a good primer for learning about statistical processes, and (8) it can lay the groundwork for more complex statistical analysis.

Essentially, descriptive statistics are ideal for research models where simple observations and experimental data must be recorded. In general, descriptive stats are constructed from survey or subtype observations. Typical uses might be for (1) recording sample sizes, (2) gathering data from research/experiment subgroups, (3) collecting cultural, geographic, institutional, political or demographic data, and (4) tallying characteristics (e.g., average clinical age and proportional height and weight among research subjects). In research, the objective of descriptive stats is to gather and quantify observable information and to summarize the bulk of data findings (using bar charts or histograms) for other, more complex studies that might spawn from the original descriptive information. Bottom line: data summaries provided by descriptive statistics do the grunt work of gathering basic observational data and delivering control group (experiment) information, but inferential stats rise from descriptive statistics as more elegant and sophisticated instruments of thought, trained to hypothesize and infer conclusions beyond the obvious data.

When researchers remain curious about unearthing deeper questions regarding hidden and meaningful relationships among variables and want to learn more, they usually must migrate to using inferential statistics; but without the fundamental data gathered by descriptive statistics, complex research processes, such as data mining, as well as related probability and hypothesis-building theories, would not advance in society.

  • Descriptive stats used in an inferential statistics example:
Question: Does drinking Coke lower your sperm count?
In our Coke-as-potential-spermicide example, the primary researcher's job will have two phases: Phase I, the descriptive stage, and Phase II, the inferential stage. During the descriptive stage, the primary researcher will measure sperm motility, or movement, in a sample of sperm swimming in a petri dish doused with a pre-measured amount of Coke. S/he will then compare the results to other replicated sperm-to-Coke motility experiments and record/tally the measurements obtained, thus creating a statistic. In the inferential stage, an associate researcher will use the original statistics gathered by the primary researcher to craft a hypothesis to explain the variability within the experiment. S/he might conclude that the variations resulted from the following scenarios:
    • a) an experimenter conducting the same test increased the amount of Coke used in testing the swimming sperm
    • b) experimenters who previously conducted the experiment measured the sperm swimming in the Coke in dissimilar ways
    • c) the motility tests were performed on old, rather than fresh, healthy sperm swimmers.
After all appropriate data mining procedures used in inferential statistics are performed, the original primary and associate researchers will be able to analyze whether or not drinking Coke results in lowered sperm motility (adapted from Dr. Andy Field).
Conclusion: If you ask research questions that interest you, learning how to do statistics will become second nature.


Limitations

Descriptive statistics cannot identify the cause behind a phenomenon because they only describe and report observations; nor do they correlate/associate data or model any type of statistical relationship among variables. As a method, descriptive statistics do not account for randomness or provide statistical calculations that can lead to hypotheses or theories about the populations studied.

For example, you can use descriptive statistics to calculate a raw GPA score, but a raw GPA does not reflect how difficult the courses were or identify major fields and disciplines in which courses were taken. Therefore, every time you try to describe a large set of observations with a single descriptive statistics indicator, you run the risk of distorting the original data or losing important details.[9] [10]
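The loss of detail described in the GPA example can be illustrated with a small Python sketch. Both students and their grade points are hypothetical, and the GPA is computed as a simple unweighted mean on an assumed 4.0 scale (no credit-hour weighting).

```python
import statistics

# Hypothetical grade points on a 4.0 scale, invented for illustration.
student_a = [3.0, 3.0, 3.0, 3.0]   # consistent grades in every course
student_b = [4.0, 4.0, 2.0, 2.0]   # strong in some courses, weak in others

# Raw GPA as an unweighted mean (assumption: every course counts equally).
gpa_a = statistics.mean(student_a)
gpa_b = statistics.mean(student_b)

# The spread around each GPA, which the single GPA figure does not show.
spread_a = statistics.pstdev(student_a)
spread_b = statistics.pstdev(student_b)

print("GPA A:", gpa_a, "spread:", spread_a)
print("GPA B:", gpa_b, "spread:", spread_b)
```

Both students end up with the same raw GPA even though their records differ, so reporting a measure of spread alongside the average recovers some of the detail a single indicator hides.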


Method in Context


A public radio commentator once noted that "statistics are facts but not necessarily the truth." Any data set can be skewed or over-interpreted to enhance opinions, but numbers and technology can reveal, compare, and archive information in ways that drawings or conversations cannot. John Phillips gives students a good explanation of the value inherent to doing statistics:

  • To understand the meaning of any measurement in the social sciences, you must know at least two things about it. First, you must be able to describe the operation by which it was obtained, and second, you must be able to compare it with other measurements that have been obtained in the same way. [I am primarily concerned] with the second kind of knowledge. Statistical thinking deals with multiple measurements. It analyzes the relation between how many and how much – of frequencies to scores. If the basic element in measurement is a score, the corresponding concept in statistics is a distribution – in short, a frequency distribution. Such a distribution can be described by drawing a picture of it. [But] the method is cumbersome, however; so many ways have been devised to achieve roughly the same result through the use of numbers rather than diagrams. The most important advantage of numbers over diagrams is that they can be manipulated in ways that diagrams cannot.[11]


Online Resources


Basic Descriptive Statistics
Advanced Statistics
National Center for Health Statistics
WHO Global Health Statistics
Descriptive Stats YouTube Tutorial


Further Reading


Field, A. (2009). Discovering statistics using SPSS (3rd ed.). Thousand Oaks, CA: Sage.
Harris, M., & Taylor, G. (2004). Medical statistics made easy. London, England: Taylor & Francis.
Phillips, J.L. (2000). How to think about statistics (6th ed.). New York, NY: Henry Holt and Company.
Rubin, H., & Rubin, I. (2005). Qualitative interviewing: The art of hearing data (2nd ed.). Thousand Oaks, CA: Sage.
Salkind, N. (2000). Statistics for people who think they hate statistics. Thousand Oaks, CA: Sage.


References


  1. ^ Trochim, W. (2006). Descriptive Statistics. In Research Methods Knowledge Base (Analysis: Descriptive Statistics). Retrieved from http://www.socialresearchmethods.net/kb/statdesc.php
  2. ^ Phillips, J.L. (2000). How to think about statistics (6th ed.). New York, NY: Henry Holt and Company.
  3. ^ Field, A. (2009). Discovering statistics using SPSS (3rd ed.). Thousand Oaks, CA: Sage.
  4. ^ Chibnik, M. (1985). The use of statistics in sociocultural anthropology. Annual Review of Anthropology, 14, 135-157. doi: 10.1146/annurev.an.14.100185.001031
  5. ^ Beins, B.C. (2008). Statistics in the social sciences. In International encyclopedia of the social sciences. Retrieved from http://www.encyclopedia.com/doc/1G2-3045302603.html#
  6. ^ Beins, B.C. (2008). Statistics in the social sciences. In International encyclopedia of the social sciences. Retrieved from http://www.encyclopedia.com/doc/1G2-3045302603.html#G
  7. ^ Beins, B.C. (2008). Statistics in the social sciences. In International encyclopedia of the social sciences. Retrieved from http://www.encyclopedia.com/doc/1G2-3045302603.html#
  8. ^ Trochim, W. (2006). Descriptive Statistics. In Research Methods Knowledge Base (Analysis: Descriptive Statistics). Retrieved from http://www.socialresearchmethods.net/kb/statdesc.php
  9. ^ Trochim, W. (2006). Descriptive Statistics. In Research Methods Knowledge Base (Analysis: Descriptive Statistics). Retrieved from http://www.socialresearchmethods.net/kb/statdesc.php
  10. ^ Trochim, W. (2006). Descriptive Statistics. In Research Methods Knowledge Base (Analysis: Descriptive Statistics). Retrieved from http://www.socialresearchmethods.net/kb/statdesc.php
  11. ^ Phillips, J.L. (2000). How to think about statistics (6th ed.). New York, NY: Henry Holt and Company.