STATS: FINDING RAW DATA

FREE WEB TOOLS

GENERAL DATA SETS

DATA.GOV

The U.S. government has committed to making its data freely available online. Data.gov is the central portal to that data, offering a wide range of information on everything from climate to crime.

UC DATA

Social science datasets provided by the University of California in conjunction with its research and courses.

PROPUBLICA

Browse data sets about Health, Criminal Justice, Education, Politics, Business, Transportation, Military, Environment, or Finance.

NATIONAL LONGITUDINAL SURVEYS

The NLS, sponsored by the U.S. Bureau of Labor Statistics, are nationally representative surveys that follow the same sample of individuals from specific birth cohorts over time. The surveys collect data on labor market activity, schooling, sexual activity and fertility, program participation, health, crime, and much more. Raw datasets are available.

IPUMS DATA

IPUMS provides census and survey data from around the world integrated across time and space.

GAPMINDER

Compilation of data from sources including the World Health Organization and World Bank covering economic, medical and social statistics from around the world.

DEMOGRAPHIC DATA

UNITED STATES CENSUS DATA

The U.S. Census Bureau publishes reams of demographic data at the state, city, and even ZIP code level, making it a fantastic resource for students interested in creating geographic data visualizations.

CENSUSSCOPE

An easy-to-use tool for investigating U.S. demographic trends, brought to you by the Social Science Data Analysis Network (SSDAN) at the University of Michigan. With graphics and exportable trend data, CensusScope is designed for both generalists and specialists.

UNICEF

If you need data about the lives of children around the world, UNICEF is a highly credible source. The organization’s public data sets cover nutrition, immunization, and education, among other topics.

ECONOMIC DATA

US BUREAU OF LABOR STATISTICS

Many important economic indicators for the United States (like unemployment and inflation) can be found on the Bureau of Labor Statistics website. Most of the data can be segmented both by time and by geography.

USDA SALES TRENDS (RETAIL)

The USDA Economic Research Service (ERS) provides information on food store sales and sales growth, the share of food sales by retail segment, and industry structure.

USDA FOOD DATA

Federal statistical agencies and the private sector develop and maintain a wide variety of data on food markets and consumer food choice behavior. Each data source covers some, but not all, of the information needed to support key policy issues facing the nation.

BUREAU OF ECONOMIC ANALYSIS

BEA publishes national and regional economic data, including gross domestic product and exchange rates.

IMF ECONOMIC DATA

The International Monetary Fund’s data on economies around the world.

GOOGLE PUBLIC DATA EXPLORER

Includes data from the World Bank’s World Development Indicators, the OECD, and the UN’s Human Development Indicators, mostly economic data on countries around the world.

HEALTH DATA

HEALTHDATA.GOV

125 years of U.S. health care data, including claim-level Medicare data, epidemiology, and population statistics.

DATA RESOURCE CENTER FOR CHILD AND ADOLESCENT HEALTH

Includes national and state-level data on hundreds of child health indicators from the National Survey of Children’s Health, National Health Interview Survey Child Component, Survey of Pathways to Diagnosis and Services, and National Survey of Children with Special Health Care Needs.

WORLD HEALTH ORGANIZATION

WHO offers world hunger, health, and disease statistics.

NATIONAL CENTER FOR HEALTH STATISTICS (CDC)

NCHS is the Federal Government's principal vital and health statistics agency. NCHS data systems include data on vital events as well as information on health status, lifestyle and exposure to unhealthy influences, the onset and diagnosis of illness and disability, and the use of health care.

CDC WONDER

Provides access to a variety of Centers for Disease Control and Prevention (CDC) reports, guidelines, and dozens of text-based and numeric databases, covering vital statistics, environment, population, disease and disability, immunization, behavior and health, injuries, occupational health, and more. Data can be exported as tab-delimited text files.
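Because exports arrive as tab-delimited text, they can be loaded with Python’s standard `csv` module. The sketch below is a minimal example, assuming the first line of the export is a header row; the column names and values in `SAMPLE` are hypothetical placeholders, not real CDC WONDER fields or data. Dropping rows with missing or extra fields (for example, trailing note lines) is also an assumption, not part of any documented export format.

```python
import csv
import io

# Hypothetical stand-in for an exported file; real exports have their own columns.
SAMPLE = "Year\tRegion\tCount\n2019\tRegion A\t10\n2020\tRegion A\t12\nNote: placeholder\n"


def read_tab_delimited(text):
    """Parse tab-delimited text into one dict per row, keyed by the header line.

    DictReader fills missing fields with None and stores extra fields under
    the key None; such rows are treated as non-data lines and dropped
    (an assumption about the file, not a documented rule).
    """
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [row for row in reader if None not in row and None not in row.values()]


rows = read_tab_delimited(SAMPLE)
for row in rows:
    print(row["Year"], row["Region"], row["Count"])
```

For a file on disk, open it with `open(path, newline="", encoding="utf-8")` and pass its `.read()` text to the function, or hand the file object to `csv.DictReader` directly.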

CENTERS FOR DISEASE CONTROL AND PREVENTION

Data sets on causes of death across many topic areas, searchable by age, race, year, and more.

SUBSTANCE ABUSE AND MENTAL HEALTH DATA ARCHIVE (US DHHS)

SAMHDA provides public-use data files, file documentation, and access to restricted-use data files from numerous series on substance abuse and mental health in the US. Datasets are available via links provided in each series/survey.

SEER (NATIONAL CANCER INSTITUTE)

The U.S. government also has data about cancer incidence, again segmented by age, race, gender, year, and other factors. It comes from the National Cancer Institute’s Surveillance, Epidemiology, and End Results Program.

THE BROAD INSTITUTE CANCER DATA

Some of the institute's cancer data sets are freely available online.

HOSPITAL CARE

A list of hospitals that serve Medicare patients, with data on their quality of care.

CRIME DATA

BUREAU OF JUSTICE STATISTICS

Data published annually on criminal victimization; populations under correctional supervision; and mortality in prisons. Periodic data series include administration of law enforcement agencies and correctional facilities; characteristics of correctional populations; and special studies on other criminal justice topics.

FBI CRIME DATA

The FBI’s crime statistics are well suited to time series analysis: you can chart changes in crime rates at the national level over a 20-year period. Alternatively, you can look at the data geographically.

CLIMATE DATA

NATIONAL CLIMATIC DATA CENTER

A huge collection of environmental, meteorological, and climate data sets from the US National Climatic Data Center, which maintains the world’s largest archive of weather data.

EARTH DATA BY NASA

A number of datasets related to land, climate, solar radiance, and more.

MISCELLANEOUS

PEW RESEARCH CENTER

Offers a number of data sets on American life, technology, and the internet, organized by year.

REDDIT

A publicly released data set contains essentially every comment ever posted on Reddit. It’s over a terabyte of data uncompressed, so if you want a smaller data set to work with, Kaggle has hosted the comments from May 2015 on its site.