STATS: FINDING RAW DATA
FREE WEB TOOLS
GENERAL DATA SETS
The US Government pledged last year to make all government data available freely online. This site is the first stage and acts as a portal to all sorts of amazing information on everything from climate to crime.
Social science datasets provided by the University of California in conjunction with their research and courses.
Browse data sets about Health, Criminal Justice, Education, Politics, Business, Transportation, Military, Environment, or Finance.
The NLS, sponsored by the U.S. Bureau of Labor Statistics, are nationally representative surveys that follow the same sample of individuals from specific birth cohorts over time. The surveys collect data on labor market activity, schooling, sexual activity & fertility, program participation, health, crime, and much more. Raw datasets available.
IPUMS provides census and survey data from around the world integrated across time and space.
Compilation of data from sources including the World Health Organization and World Bank covering economic, medical and social statistics from around the world.
DEMOGRAPHIC DATA
The U.S. Census Bureau publishes reams of demographic data at the state, city, and even zip code level. It is a fantastic data set for students interested in creating geographic data visualizations.
An easy-to-use tool for investigating U.S. demographic trends, brought to you by the Social Science Data Analysis Network (SSDAN) at the University of Michigan. With graphics and exportable trend data, CensusScope is designed for both generalists and specialists.
If data about the lives of children around the world is of interest, UNICEF is a highly credible source. The organization’s public data sets touch upon nutrition, immunization, and education, among others.
ECONOMIC DATA
Many important economic indicators for the United States (like unemployment and inflation) can be found on the Bureau of Labor Statistics website. Most of the data can be segmented both by time and by geography.
USDA SALES TRENDS (RETAIL)
ERS provides information on food store sales and sales growth, the share of food sales by retail segment, and industry structure.
Federal Statistical Agencies and the private sector develop and maintain a wide variety of data on various aspects of food markets and consumer food choice behavior. Each data source is designed to cover some, but not all, of the information needed to support key policy issues facing the Nation.
This houses national and regional economic data, including gross domestic product and exchange rates.
Access to world economic data.
This includes data from the World Development Indicators, the OECD, and the Human Development Indicators, mostly covering economic data from around the world.
HEALTH DATA
125 years of US healthcare data including claim-level Medicare data, epidemiology and population statistics.
DATA RESOURCE CENTER FOR CHILD AND ADOLESCENT HEALTH
Includes national and state-level data on hundreds of child health indicators from the National Survey of Children’s Health, National Health Interview Survey Child Component, Survey of Pathways to Diagnosis and Services, and National Survey of Children with Special Health Care Needs.
WHO offers world hunger, health, and disease statistics.
NATIONAL CENTER FOR HEALTH STATISTICS (CDC)
NCHS is the Federal Government's principal vital and health statistics agency. NCHS data systems include data on vital events as well as information on health status, lifestyle and exposure to unhealthy influences, the onset and diagnosis of illness and disability, and the use of health care.
Provides access to a variety of Centers for Disease Control and Prevention reports, guidelines, and dozens of text-based and numeric databases, including vital statistics, environment, population, disease and disability, immunization, behavior and health, injuries, occupational health, and more. Export data: tab-delimited text file.
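Tab-delimited exports like the one mentioned above can be read with most statistics packages. As a minimal sketch in Python, assuming the export has a header row, the standard csv module can parse it. The function name and the column names below are hypothetical, and the sample text is placeholder content, not real data from any of these sites.

```python
import csv
import io


def read_tab_delimited(text):
    """Parse tab-delimited text into a list of dicts keyed by the header row."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [dict(row) for row in reader]


# Placeholder content standing in for an exported file (not real data).
sample = "Column A\tColumn B\nfoo\t1\nbar\t2\n"

rows = read_tab_delimited(sample)
for row in rows:
    # Every value arrives as a string; convert numeric columns before analysis.
    print(row["Column A"], int(row["Column B"]))
```

For a file saved to disk, the same reader works on a file object opened with open(path, newline=""). Real exports may include footnote or totals lines after the data, so it is worth inspecting the file before analysis.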
CENTERS FOR DISEASE CONTROL + PREVENTION
Data sets on causes of death across many different topic areas. You can also search by age, race, year, and more.
SUBSTANCE ABUSE AND MENTAL HEALTH DATA ARCHIVE (US DHHS)
SAMHDA provides public-use data files, file documentation, and access to restricted-use data files from numerous series on substance abuse and mental health in the US. Datasets are available via links provided in each series/survey.
SEER NATIONAL CANCER INSTITUTE
The U.S. government also has data about cancer incidence, again segmented by age, race, gender, year, and other factors. It comes from the National Cancer Institute’s Surveillance, Epidemiology, and End Results Program.
THE BROAD INSTITUTE CANCER DATA
Some of the institute's cancer data sets are freely available online.
A list of Medicare-serving hospitals with data on their quality of care.
CRIME DATA
Data published annually on criminal victimization; populations under correctional supervision; and mortality in prisons. Periodic data series include administration of law enforcement agencies and correctional facilities; characteristics of correctional populations; and special studies on other criminal justice topics.
If you’re interested in analyzing time series data, you can use this data set to chart changes in crime rates at the national level over a 20-year period. Alternatively, you can look at the data geographically.
CLIMATE DATA
Huge collection of environmental, meteorological and climate data sets from the US National Climatic Data Center. The world’s largest archive of weather data.
A number of datasets related to land, climate, solar radiance, and more.
MISCELLANEOUS
Offers a number of data sets on American life, technology, and the internet. Organized by year.
Reddit released a really interesting data set of every comment that has ever been made on the site. It’s over a terabyte of data uncompressed, so if you want a smaller data set to work with, Kaggle has hosted the comments from May 2015 on its site.