Data-Planet: Introduction to Data-Planet

Data-Planet Tutorial

Watch a video tutorial on Data-Planet below!

Or, click here for an excellent introduction to Data-Planet by Amanda Howell, Business Librarian, Andersen Library, University of Wisconsin-Whitewater. Thanks Amanda! 

Data and Statistics: Terminology and Examples

Terminology Basics 

Below you will find simple definitions of the basic terminology associated with data and statistics. The examples will link into DataSheets from Data-Planet Statistical Ready Reference. From the DataSheets, you can link into Data-Planet Statistical Datasets to explore the millions of time series available in the repository.

Data: Fundamentally, data are information. We typically use the term to refer to numeric files that are created and organized for analysis. There are two types of data: aggregate data and microdata.

  • Aggregate data are statistical summaries of data, meaning that the data have been analyzed in some way. The Data-Planet repository is an excellent resource for obtaining aggregated data.
  • Microdata are individual response data obtained in surveys and censuses: data points directly observed or collected from a specific unit of observation. Also known as raw data. ICPSR is an excellent resource for obtaining microdata files.
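
To make the distinction concrete, here is a minimal Python sketch of how microdata might be summarized into aggregate data. The respondent records, field names, and the `aggregate_mean_income` helper are all invented for illustration; they do not come from Data-Planet or ICPSR.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical microdata: one record per survey respondent (invented values).
microdata = [
    {"respondent": 1, "state": "AL", "income": 30000},
    {"respondent": 2, "state": "AL", "income": 50000},
    {"respondent": 3, "state": "WI", "income": 40000},
    {"respondent": 4, "state": "WI", "income": 60000},
]


def aggregate_mean_income(records):
    """Summarize individual responses into one mean income per state."""
    incomes_by_state = defaultdict(list)
    for record in records:
        incomes_by_state[record["state"]].append(record["income"])
    return {state: mean(values) for state, values in incomes_by_state.items()}


# Aggregate data: the individual respondents are no longer visible,
# only the statistical summary for each state.
aggregate = aggregate_mean_income(microdata)
print(aggregate)
```

The aggregate result keeps one number per state, which is the kind of summary a statistical repository typically publishes, while the original list is the kind of file a microdata archive would hold.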

Data point or datum: Singular of data. Refers to a single point of data. Example: the amount of aviation gasoline consumed by the transportation sector in the US in 2012

Quantitative data/variables: Information that can be handled numerically. Example: spending by US consumers on personal care products and services

Qualitative data/variables: Information that refers to the quality of something. Ethnographic research, participant observation, open-ended interviews, etc., may collect qualitative data. Often, however, some element of the results obtained via qualitative research can be handled numerically, eg, the number of observations made or interviews conducted. Example: periods when the US was in, vs was not in, a recession, 1850-2014. The quality of being in a recession is assigned a value of .01 and not being in a recession a value of .0, which makes it possible to display the data as a chart.
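
The idea of assigning a number to a quality can be sketched in a few lines of Python. This is only an illustration: the yearly labels are invented (they are not actual recession dates), the `encode` helper is hypothetical, and the codes 1 and 0 are an assumption chosen for readability rather than the values used in the DataSheet.

```python
# Hypothetical yearly labels, invented for illustration only.
labels_by_year = {
    2001: "recession",
    2002: "no recession",
    2003: "no recession",
    2004: "recession",
}

# Assumed numeric codes; any pair of distinct numbers would work.
CODES = {"recession": 1, "no recession": 0}


def encode(labels, codes):
    """Turn qualitative labels into (year, number) pairs in chronological order."""
    return [(year, codes[label]) for year, label in sorted(labels.items())]


series = encode(labels_by_year, CODES)
print(series)
```

Once each period carries a number, the result can be plotted like any other series, which is what makes a chart of a qualitative state possible.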

Indicator: Typically used as a synonym for statistics that describe some aspect of the socioeconomic environment of a society, eg, per capita income, unemployment rate, median years of education.

Statistic: A number that describes some characteristic, or status, of a variable, eg, a count or a percentage. Example: total nonfarm job starts in August 2014

Statistics: Numerical summaries of data that have been analyzed in some way. Example: ranking of airlines by percentage of flights arriving on-time into Huntsville International Airport in Alabama in 2013

Time series data: Any data arranged in chronological order. Example: Gross Domestic Product of Greece, 2000-2013

Variable: Any finding that can change or vary. Examples include anything that can be measured, such as the number of logging operations in Alabama.

  • Numerical variable: Usually refers to a variable whose possible values are numbers. Example: Bank Prime Loan Rate
  • Categorical variable: A variable that distinguishes among subjects by putting them in categories (eg, gender). Also called a discrete or nominal variable. Example: Female Infant Mortality Rate of Belarus (the mortality rate is numerical; the age and gender characteristics are categorical)
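
One practical consequence of the two variable types is how each is summarized: a numerical variable can be averaged, while a categorical variable is usually counted by category. The sketch below assumes a small set of invented records with hypothetical field names `sex` and `rate`; none of the values are real figures.

```python
from collections import Counter
from statistics import mean

# Hypothetical records, invented for illustration only.
records = [
    {"sex": "female", "rate": 3.0},
    {"sex": "male", "rate": 5.0},
    {"sex": "female", "rate": 4.0},
]

# Numerical variable: arithmetic such as a mean is meaningful.
rate_mean = mean(record["rate"] for record in records)

# Categorical variable: the natural summary is a count per category.
sex_counts = Counter(record["sex"] for record in records)

print(rate_mean)
print(dict(sex_counts))
```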

Terminology Used with Collections of Data

Data aggregation: A collection of data points and datasets. Example: a search on the broad category "energy resources and industries" retrieves results from multiple sources

Dataset: A collection of related data items, eg, the responses of survey participants. This term is used very loosely – the entire Census 2010 Summary File 1 can be considered a dataset as can any individual time series included in the Census 2010, eg, Table P20. Households by Presence of People Under 18 Years by Household Type by Age of People Under 18 Years

Database: A collection of data organized for research and retrieval. Examples: OECD Factbook; American Community Survey.

Time series: A set of measures of a single variable recorded over a period of time. Example: Hourly Mean Earnings of Civilian Workers – Mining Management, Professional, and Related Workers
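
As a rough illustration of a time series, the following Python sketch arranges measures of a single variable in chronological order and then computes the change from one period to the next. The observations are invented numbers, and the helper names `to_time_series` and `period_changes` are hypothetical.

```python
# Hypothetical (year, value) observations of one variable, deliberately unordered.
observations = [(2012, 105.0), (2010, 100.0), (2011, 110.0)]


def to_time_series(points):
    """Arrange observations in chronological order."""
    return sorted(points, key=lambda point: point[0])


def period_changes(series):
    """Return (year, change from the previous year) for each consecutive pair."""
    return [
        (later[0], later[1] - earlier[1])
        for earlier, later in zip(series, series[1:])
    ]


series = to_time_series(observations)
changes = period_changes(series)
print(series)
print(changes)
```

Ordering by time is what separates a time series from a plain collection of data points: only after sorting does a period-to-period comparison make sense.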

"Big Data" Terminology

Big data: A popular term used to describe the exponential growth and availability of structured and unstructured data derived from the increasing sophistication of operational and transactional systems, mobile media, and the Internet. Big data and its analysis have become key components of obtaining business intelligence in particular.

Data analytics: Generally used to refer to the analytical techniques and tools required to analyze massive amounts of data. Closely related to data mining, which refers to the extraction of information from business systems.   

Definition References:

School of Data. School of Data Handbook. What is Data? Accessed January 5, 2015, http://schoolofdata.org/handbook/courses/what-is-data/.

Troester, Mark (SAS). Big Data Meets Big Data Analytics. Accessed January 5, 2015, http://www.sas.com/content/dam/SAS/en_us/doc/whitepaper1/big-data-meets-big-data-analytics-105777.pdf.

Upton, Graham, and Ian Cook. 2008. A Dictionary of Statistics. Oxford University Press.

Vogt, W. Paul. 1999. Dictionary of Statistics & Methodology: A Nontechnical Guide for the Social Sciences, 2nd edition. SAGE Publications, Inc.

Western Libraries. Data and Statistics. Accessed January 5, 2015, http://www.lib.uwo.ca/madgic/dataandstatistics.html.