10 Great Places To Find Open, Free Datasets [2024 Guide]

Wondering where to find free and open datasets for your next data project? Look no further…

If you’re looking for a job in data analytics, you’ll need a portfolio to demonstrate your expertise. Of course, if you’re new to data analytics, you probably don’t have much expertise! Not to worry. The fact you might not have worked on a paid project yet doesn’t mean you can’t whip up a compelling portfolio using some practice datasets.

Fortunately, the Internet is awash with these, most of which are completely free to download (thanks to the open data initiative). In this post, we’ll highlight a few first-rate repositories where you can find data on everything from business to finance, planetary science and crime.

Prepare to geek out, and here we go:

1. Google Dataset Search

Type of data: Miscellaneous
Data compiled by: Google
Access: Free to search, but does include some fee-based search results
Sample dataset: Global price of coffee, 1990-present

It seems we turn to Google for everything these days, and data is no exception. Launched in 2018, Google Dataset Search is like Google’s standard search engine, but strictly for data.

While it’s not the best tool if you prefer to browse, if you have a particular topic or keyword in mind, it won’t disappoint. Google Dataset Search aggregates data from external sources, providing a clear summary of what’s available, a description of the data, who it’s provided by, and when it was last updated. It’s an excellent place to start.

2. Kaggle

Type of data: Miscellaneous
Data compiled by: Kaggle
Access: Free, but registration required
Sample dataset: Daily temperature of major cities

Like Google Dataset Search, Kaggle offers aggregated datasets, but it’s a community hub rather than a search engine. Kaggle launched in 2010 with a number of machine learning competitions, which subsequently solved problems for the likes of NASA and Ford.

It’s since evolved into a renowned open data platform, offering cloud-based collaboration for data scientists, as well as educational tools for teaching artificial intelligence and data analysis techniques…plus, of course, tonnes of great datasets covering almost any topic you can imagine.

3. Data.Gov

Type of data: Government
Data compiled by: US Federal Government
Access: Free, no registration required
Sample dataset: Lobster Report for Transshipment and Sales

Data.gov, launched in 2009, is the US Government’s portal for publicly available federal data. With over 200,000 datasets covering everything from climate change to crime, you can lose yourself in the database for hours.

For a government website, it has some surprisingly user-friendly search functions, including the ability to drill down by geographical area, organization type, and file format. Search results are also clearly labeled at federal, state, county, and city levels.
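If you’d rather script your searches than click through filters, here’s a minimal sketch in Python. It assumes Data.gov’s catalog exposes a CKAN-style search endpoint at the URL below and that `res_format` is the right filter field for file formats; both are assumptions worth checking against the current API documentation. The helper name `build_search_url` is our own, and the snippet only builds the URL rather than sending a request.

```python
from urllib.parse import urlencode

# Assumption: Data.gov's catalog offers a CKAN-style search endpoint at this address.
CATALOG_SEARCH_URL = "https://catalog.data.gov/api/3/action/package_search"


def build_search_url(keyword, file_format=None, rows=5):
    """Build (but do not send) a keyword search URL, optionally filtered by file format."""
    params = {"q": keyword, "rows": rows}
    if file_format:
        # Assumption: "res_format" is the filter field for a resource's file format.
        params["fq"] = f"res_format:{file_format}"
    return f"{CATALOG_SEARCH_URL}?{urlencode(params)}"


url = build_search_url("lobster", file_format="CSV")
print(url)
```

Paste the printed URL into a browser, or fetch it with your HTTP client of choice, to see whether the response matches what you expect.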

If you’re interested in more general data about the US population, you can also check out the US Census Bureau, which offers a rich selection of data on the people who live in the US, including their geography, education, and population growth.

4. Datahub.io

Type of data: Mostly business and finance
Data compiled by: Datahub
Access: Mostly free, no registration required
Sample dataset: Average mass of glaciers since 1945

The goal of many data analysts is to help drive savvy business decisions. As such, using economic or business datasets for your portfolio project might be worth considering.

While Datahub covers a variety of topics from climate change to entertainment, it mainly focuses on areas like stock market data, property prices, inflation, and logistics. Because many of the data on the portal are updated monthly (or even daily) you’ll always have something fresh to work with, as well as data that covers broad timescales.
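Monthly series like these are often easier to read once they’re rolled up by year. Here’s a rough sketch of that first step using only Python’s standard library. The sample values and the `date` and `value` column names are invented for illustration (they don’t come from a real Datahub file), and `yearly_averages` is a hypothetical helper.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical monthly series: column names and values are made up for this example.
SAMPLE_CSV = """date,value
2022-11,3.0
2022-12,5.0
2023-01,4.0
2023-02,6.0
"""


def yearly_averages(csv_text):
    """Group monthly rows by the year in their 'date' field and average each group."""
    by_year = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        year = row["date"][:4]  # "2022-11" -> "2022"
        by_year[year].append(float(row["value"]))
    return {year: mean(values) for year, values in by_year.items()}


averages = yearly_averages(SAMPLE_CSV)
print(averages)
```

With a real download, you’d swap the sample text for the file’s contents and adjust the column names to match.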

5. UCI Machine Learning Repository

Type of data: Machine learning
Data compiled by: University of California Irvine
Access: Free, no registration required
Sample dataset: Behavior of urban traffic in Sao Paulo, Brazil

Generalized repositories are great if you’re happy to browse. But if you’re seeking something more niche, why not specialize? Enter the UCI Machine Learning Repository.

The University of California Irvine launched the repository more than thirty years ago, but don’t let the 90s vibe mislead you. It has a strong reputation among students, teachers, and researchers as the go-to place for machine learning data.

Datasets are clearly categorized by task (e.g. classification, regression, or clustering), attribute type (e.g. categorical or numerical), data type, and area of expertise. This makes it easy to find something that’s suitable, whatever machine learning project you’re working on.
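To get a feel for the categorical versus numerical distinction, here’s a small sketch that guesses each column’s attribute type from its values. The sample rows and column names are invented for illustration (they aren’t taken from a real UCI dataset), and `column_types` is a hypothetical helper. The rule it uses, “numerical if every value parses as a number”, is a simplification.

```python
import csv
import io

# Hypothetical sample: rows and column names are made up for this example.
SAMPLE_CSV = """hour,slowness_pct,weather
7,4.1,clear
8,6.6,rain
9,8.7,rain
"""


def column_types(csv_text):
    """Label each column 'numerical' if all its values parse as floats, else 'categorical'."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    types = {}
    for name in rows[0]:
        try:
            for row in rows:
                float(row[name])
            types[name] = "numerical"
        except ValueError:
            types[name] = "categorical"
    return types


types = column_types(SAMPLE_CSV)
print(types)
```

Knowing which columns are which helps you decide early on whether a dataset suits a classification, regression, or clustering project.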

6. Earth Data

Type of data: Earth science
Data compiled by: NASA
Access: Free, no registration required
Sample dataset: Environmental conditions during fall moose hunting season in Alaska, 2000-2016

If you think space is awesome (let’s face it, space is awesome!) look no further than Earth Data. Publicly available since 1994, this repository provides access to all of NASA’s satellite observation data for our little blue planet.

As you can imagine, there’s plenty to peruse, from weather and climate measurements to atmospheric observations, ocean temperatures, vegetation mapping, and more. If Earth-based data isn’t your thing, NASA’s Planetary Data System takes things a step further with data from interplanetary missions, such as the Cassini probe (which orbited Saturn from 2004 to 2017). Who knows, you might even make a scientific discovery…

7. CERN Open Data Portal

Type of data: Particle Physics
Data compiled by: CERN
Access: Free, no registration required
Sample dataset: Higgs candidate collision events from 2011 and 2012

Want to demonstrate your ability to work with highly complex datasets? Head to the CERN Open Data Portal. It offers access to over two petabytes of information, including datasets from the Large Hadron Collider particle accelerator. Frankly, these data aren’t for the faint of heart, but if you’re interested in particle physics, they’re worth checking out.

While even the names of these datasets are pretty complex, each entry has a helpful breakdown of what’s included, as well as related datasets, and how to go about analyzing them. In many cases, they even provide sample code to get you started (thanks, CERN!).

8. Global Health Observatory Data Repository

Type of data: Health
Data compiled by: UN World Health Organization
Access: Free, no registration required
Sample dataset: Polio immunization coverage estimates by region

The Global Health Observatory data repository is the UN WHO’s gateway to health-related statistics from across the globe. If you’re looking to break into the healthcare industry (a key focus for many data scientists, especially in the area of machine learning), these datasets are a good option for your portfolio.

Covering everything from malaria to HIV/AIDS, antimicrobial resistance, and vaccination rates, the portal even has a nice little feature that lets you preview data tables before downloading them. Not strictly necessary, but definitely nice to have!

9. BFI film industry statistics

Type of data: Entertainment and film
Data compiled by: British Film Institute
Access: Free, no registration required
Sample dataset: Weekend box office figures from 2001-present

If you’re looking for some data that are a bit more digestible, the next few should be right up your street. First off: the British Film Institute industry statistics. Throughout the year, the BFI accrues and releases data on everything from UK box office figures, to audience demographics, home entertainment, movie production costs, and more.

The best part, though, is their annual statistical yearbook. This breaks down the year’s data with some excellent statistical analysis and visual reports, which is great if you’re new to data analytics and want to check your work against the real thing.

10. NYC Taxi Trip Data

Type of data: Transport
Data compiled by: New York City Taxi and Limousine Commission
Access: Free, no registration required
Sample dataset: Take your pick!

This is a weirdly fascinating one…since 2009, the NYC Taxi and Limousine Commission has been accruing transport data from across New York City. Find datasets covering pick-up/drop-off times and locations, trip distances, fares, rate and payment types, passenger counts, and more.

It’s pretty interesting to compare the differences in figures from 2009 to the present day, especially within such a small geographic area. The site also provides some additional tools, including user guides, taxicab zone maps, data dictionaries (for explaining the spreadsheet labels), and annual industry reports. All very intuitive and quite a helpful guide if you’re new to data analytics.

11. FBI Crime Data Explorer

Type of data: Crime and drugs
Data compiled by: Federal Bureau of Investigation
Access: Free, no registration required
Sample dataset: Homicide offense counts in Point Pleasant, 2008-2018

If you’re fascinated by crime, the FBI Crime Data Explorer is the one for you. It provides a broad collection of crime statistics from a variety of organizations (such as universities and local law enforcement) and from government bodies at the local, regional, and state levels. Pull data on hate crimes, officer assaults, homicides, and more.

Like the last couple of entries on our list, it also includes some helpful user guides to support data navigation. Each dataset also has some pretty nice visual breakdowns and analysis, so you can see if it has the features you’re looking for before downloading it.

Next steps

If you’re anything like us, you’ll lose hours simply browsing these vast repositories. From the quirky to the unashamedly geeky, there’s no better evidence of data’s ubiquity in our lives.

So what do you do once you’ve found your dataset and analyzed it? If you want to feature your analysis as a project in your portfolio, there are certain steps you’ll need to follow—you can learn how to build your data analytics portfolio in this guide.

If you’re completely new to data analytics, why not try out a free, 5-day introductory short course? You’ll get a hands-on introduction to the field, complete with access to a workable dataset. And, if you’d like to learn more about what it takes to forge a career in data, check out the following:

  • Am I a Good Fit for a Career as a Data Analyst?
  • The Best Online Data Analytics Courses
  • The 7 Top Data Analysis Software Tools
FAQs

Where can I find free datasets for data analysis? ›

A few free government datasets we recommend:
  • Data.gov.
  • USA.gov Data and Statistics.
  • Federal Reserve Data.
  • U.S. Bureau of Labor Statistics.
  • California Open Data Portal.
  • New York Open Data.
  • NOAA Data Access (mostly via API)
  • NASA Open Data Portal.

Where is the best place to find data? ›

Where Can I Find Data Sets? (source of data sets: web link)
  • DataCamp: https://www.datacamp.com/workspace/datasets
  • Google Dataset Search: https://datasetsearch.research.google.com/
  • Data.gov: https://data.gov/
  • Datahub: https://www.datahub.io/search

How do I find new datasets? ›

General Data Platforms
  1. Kaggle. Kaggle is a prime platform for accessing datasets due to its vast repository covering diverse topics like astronomy, diabetes, and more. ...
  2. AWS Data Exchange. ...
  3. Data. ...
  4. GitHub. ...
  5. Open Data Soft. ...
  6. DataHub. ...
  7. Google Public Data Explorer. ...
  8. Data.gov (US)

Where to find raw data for statistics project? ›

Dataset Sources
  • Academic Torrents. ...
  • Bureau of Labor Statistics: Data. ...
  • Data.gov. ...
  • FRED Economic Data. ...
  • ICPSR: Interuniversity Consortium for Political and Social Research. ...
  • MPC Data Projects. ...
  • National Center for Education Statistics (NCES) ...
  • National Centers for Environmental Information (NCEI)

Is Kaggle owned by Google? ›

Yes. Kaggle is a subsidiary of Google and an online community of data scientists and machine learning engineers. It allows users to find datasets they want to use in building AI models, publish datasets, work with other data scientists and machine learning engineers, and enter competitions to solve data science challenges.

Where can I find datasets other than Kaggle? ›

So, we invite you to explore these projects and take a step further in your data science journey.
  • Kaggle. Kaggle is one of the most popular data science platforms. ...
  • Google Dataset Search. ...
  • GitHub. ...
  • World Bank Open Data. ...
  • Data. ...
  • DataHub. ...
  • Humanitarian Data Exchange. ...
  • FiveThirtyEight.

What websites collect the most data? ›

Google is the most avid big tech data miner currently on the internet because the search engine deals almost exclusively with user data. Google tracks and analyzes everything from your Gmail and calling history (for VoLTE calls) to your Chrome browsing preferences through third-party cookies.

What data is freely available? ›

Here are public sources of big data that are freely available.
  • Amazon Web Services. ...
  • Open Data Network. ...
  • Gallup. ...
  • Pew Research. ...
  • Google Scholar. ...
  • Chartr. ...
  • Data Catalogs. ...
  • UNData.

Where can I find the data set? ›

You can find datasets available on the web by searching Google or Open Data Repositories.

What is an open dataset? ›

Open data is data which is openly accessible to all, including companies, citizens, the media, and consumers. Here are some popular open data definitions: “Open data and content can be freely used, modified, and shared by anyone for any purpose.”

Where does AI get its data sets? ›

Image AI can gather data from various sources, such as publicly available image datasets, social media platforms, online repositories, and even user-contributed data. These datasets are typically labeled with annotations to help train the AI models effectively.

Where can I find research datasets? ›

General sources
  • DataCite. DataCite is an organization that assigns Digital Object Identifiers (DOI) for datasets. ...
  • Google Dataset Search. Google Dataset Search is a search engine for datasets. ...
  • Figshare. Figshare is a repository that allows researchers to share datasets and other research outputs. ...
  • Dataverse.

Where is the best place to find statistical information? ›

Government, agency and organizational websites are a great source of reliable statistical information.
  • U.S. Census Bureau. ...
  • U.S. Department of Commerce. ...
  • U.S. Energy Information Administration. ...
  • U.S. Statistical Abstract. ...
  • USAGov. ...
  • USDA Economic Research Service. ...
  • USDA National Agricultural Statistics Service.

Where can I get free statistics? ›

Principal U.S. Federal Statistical Agencies
  • Bureau of Economic Analysis.
  • Bureau of Justice Statistics.
  • Bureau of Labor Statistics.
  • Bureau of Transportation Statistics.
  • U.S. Census Bureau.
  • Economic Research Service.
  • Energy Information Administration.
  • National Agricultural Statistics Service.

Where can I collect data analysis? ›

Some common data collection methods include surveys, interviews, observations, focus groups, experiments, and secondary data analysis. The data collected through these methods can then be analyzed to support or refute research hypotheses and draw conclusions about the study's subject matter.

Is Python for data analysis free? ›

Enroll in this free online Python for Data Analysis course to advance your career in the in-demand field of data analysis and learn about numerous tools and working methods for the libraries.

How can I get data science for free? ›

Learn Data Science from Free Courses
  1. edX. An American online course provider, edX is offering an introductory course on Data Science for beginners to mainly cover: ...
  2. Google Cloud. ...
  3. Khan Academy. ...
  4. freeCodeCamp. ...
  5. Kaggle. ...
  6. The Open Source Data Science Masters. ...
  7. Intellipaat. ...
  8. KDnuggets.

Is Kaggle free to use? ›

Is Kaggle Free? Yes, it is, and that is one reason Kaggle is so popular among users. Its comprehensive resources, such as public datasets, forums, competitions and the freedom to exchange datasets and code, are all available without any financial commitments.

Author: Rev. Porsche Oberbrunner
