How To: Find Data in the Wild



What do we mean by data?

Today the term 'data' is commonly used as a catch-all for information, statistics, or facts. When searching for data, we want to find a collection of information presented in an organized, easily readable way. Usually this means a table, or something close to a table, that a computer can read (think of an Excel sheet). We can then use these collections of data, also known as datasets, to help us draw conclusions.


Define

Data needs to be specific and quantifiable

  • What are you seeking to evaluate?

  • How is it being measured (numerically)?


Search

Most freely available data can be found with a Google search. An effective search query is built from three primary parts: keywords, data type, and website.

Keywords

Keywords are derived from how you choose to define your topic. Some keywords work better than others, so it is always good to have a list of alternatives on hand. For example, if I am looking for information on "Climate Change", I could also try these related terms: "Global Warming", "Global Heating", "Greenhouse Gases", and "Carbon Emissions". When searching for a keyword phrase, put it in quotes to keep the words together: "Climate Change".

Data Type

  • Use the term datasets instead of data

The word "data" is now so broadly defined that searching for it often returns articles rather than datasets.

  • Use the term “Open Data” instead of “Public Data” or “Free Data”

“Open Data” is freely usable, commonly produced by government entities or organizations, and tends to have a high level of provenance and reliability. “Public Data” or “Free Data” is free, but is often unruly and unreliable, with no way to determine where it came from or what was being measured.

You can also search by file type (for example, .xls, .xlsx, .csv, .html, .xml, or .json) by adding filetype: to your Google search, followed by the format you want to find. For example, filetype:csv finds CSV files.

Website

It is best to search websites that produce trustworthy and reliable information, most commonly government agencies or international organizations. Below are some examples.

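Governments: .gov (U.S. government), U.S. state sites such as ct.gov and ny.gov, and country-code domains such as .it, .fr, .do, .es, and .lk. Country-code domains also host non-government sites, so check who runs each site.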

Organizations & International Agencies: .int, .org, .eu

To search by website in Google use site:, for example, site:.gov to find U.S. government websites. 
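The three parts described above can be combined into a single search string. The following is a minimal sketch, not an official tool: the function name build_query and its parameters are made up for illustration, and it assumes Google's OR operator for joining synonymous phrases. It also drops any leading dot from the file type, on the assumption that filetype:csv is the form Google documents.

```python
from urllib.parse import urlencode


def build_query(keywords, data_term="datasets", filetype=None, site=None):
    """Assemble a search string from keywords, data type, and website.

    Hypothetical helper for illustration only.
    """
    if not keywords:
        raise ValueError("at least one keyword phrase is required")
    # Quote each phrase so its words stay together.
    quoted = [f'"{phrase}"' for phrase in keywords]
    # Join synonymous phrases with OR; a single phrase needs no parentheses.
    if len(quoted) > 1:
        parts = ["(" + " OR ".join(quoted) + ")"]
    else:
        parts = [quoted[0]]
    # Data type: "datasets" rather than "data".
    parts.append(data_term)
    if filetype:
        parts.append("filetype:" + filetype.lstrip("."))
    # Website: restrict results to a domain such as .gov.
    if site:
        parts.append("site:" + site)
    return " ".join(parts)


query = build_query(["Climate Change", "Global Warming"], filetype="csv", site=".gov")
print(query)
# ("Climate Change" OR "Global Warming") datasets filetype:csv site:.gov

# Turn the query into a search URL that can be pasted into a browser.
print("https://www.google.com/search?" + urlencode({"q": query}))
```

Swapping in "Open Data" as the data_term, or a different domain as the site, is a quick way to try several variations of the same search.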


Evaluate

It is important to use credible, reliable information. Ask the following questions:

  • Who collected/made the dataset? – Is it a trusted source of information?

  • How was the data acquired? – Was the information ethically acquired? Is it consistent?

  • How is the data defined? – Does the creator/collector outline the information clearly, via a data dictionary or standard intake procedure?
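Some of these questions, such as whether the data is consistent and measured numerically, can be given a quick first pass with a few lines of code once a file is downloaded. The sketch below is one possible approach, not a substitute for reading the dataset's documentation. The sample data and the names summarize and is_number are made up for illustration.

```python
import csv
import io

# Hypothetical sample standing in for a downloaded CSV file.
SAMPLE = """year,region,co2_tonnes
2019,North,105.2
2020,North,
2021,North,98.7
"""


def is_number(text):
    """Return True if the text can be read as a number."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def summarize(csv_text):
    """For each column, count missing values and check whether the rest are numeric."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    summary = {}
    for column in reader.fieldnames or []:
        # Treat absent or blank cells as missing.
        values = [(row[column] or "").strip() for row in rows]
        present = [value for value in values if value]
        summary[column] = {
            "missing": len(values) - len(present),
            "numeric": bool(present) and all(is_number(v) for v in present),
        }
    return summary


for column, info in summarize(SAMPLE).items():
    print(column, info)
```

A column that should hold measurements but is reported as non-numeric, or that has many missing values, is a sign to check the data dictionary or look for a better source.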


Need further guidance?

If you need further assistance finding data, please schedule a reference consultation with a librarian.

If you need further assistance manipulating and modeling the data, please schedule a consultation with the Digital Learning Center (DLC) by emailing dlc@hws.edu.