Internal Wharton Research Data Services (iWRDS)

Gain unprecedented access to individual-level datasets with real-world business contexts.

How It Works

Who is Eligible?
All Wharton and Penn faculty and students who will use the data for academic purposes – research, capstone and course projects, and independent study – though access eligibility varies by dataset.

Browse Catalog
Available datasets are listed alphabetically below. Click on the dataset title or the “+” icon to see additional information about the data and access eligibility.

Request Data
To request access, please complete this form. If you are interested in accessing multiple datasets, you must submit a form for each dataset.

Available Datasets

CANDOR Conversation Corpus, from BetterUp and Wharton Operations, Information and Decisions

The CANDOR corpus is a large, novel, multimodal corpus of 1,656 recorded conversations in spoken English. This 7+ million-word, 850-hour corpus totals over 1TB of audio, video, and transcripts, with moment-to-moment measures of vocal, facial, and semantic expression, along with an extensive survey of speakers' post-conversation reflections.

Data includes:

  • Over 7 million words
  • 850 hours of videotaped interactions between 1,500 individuals

Who can access: anyone at Penn, for the purpose of academic research

Chicago Policing Data, from Research on Policing Reform and Accountability

Four years of activity data from the Chicago Police Department, accompanied by Chicago census and crime data and shapefiles for Chicago census block groups, police districts, and police beats.

Data includes:

  • Information on over 33,000 police officers, who collectively reported the following between January 2012 and January 2016:
    • Over 3 million shifts
    • Over 1 million stops
    • Over 300,000 arrests
    • Over 9,000 uses of force

Who can access: anyone at Penn, for the purpose of academic research

CoreLogic, from Wharton Real Estate and Wharton Finance

CoreLogic is the trusted source for property intelligence, with deep knowledge of powerful economic, social, and environmental forces that promote healthy housing markets and thriving communities.

Data includes:

  • 10 million observations of property listing data
  • More than 600 variables describing listing details (e.g., listing date and price, listing office and agent, commission rate offered to buyer’s agent)
  • Property characteristics (number of bedrooms and bathrooms, remarks from sellers)
  • Transaction details when a sale occurs (sale price and date, purchasing office and agent)
  • Mortgage loan-level data (including origination characteristics and monthly performance data)

Who can access: Wharton faculty, PhD students supervised by Wharton faculty, and Wharton-affiliated researchers, subject to approval

DataView, from Penn Wharton Budget Model

DataView is a powerful new tool that simplifies collecting, visualizing, and analyzing government and other public data.

  • Search across millions of data series available from dozens of sources
  • Transform and combine data as desired using simple “point and click”
  • Visualize your data with graphs, tables, scatterplots, and animated US maps
  • Test your ideas using integrated regression analysis
  • Create an account to save your work for later as well as share with others

Who can access: anyone at Penn

Experian, from Wharton Real Estate and Wharton Finance

Experian is an American–Irish multinational consumer credit reporting company.

Data includes:

  • De-identified credit bureau records
  • Mortgage, auto, student loan, and credit card balances
  • Credit card limit information
  • Credit scores
  • Credit inquiries
  • Delinquencies, bankruptcies, and judgments

Who can access: Wharton faculty, PhD students supervised by Wharton faculty, and Wharton-affiliated researchers, subject to approval

FTSE Russell, from Wharton Computing

FTSE Russell is a leading global provider of benchmarks, analytics, and data solutions with multi-asset capabilities.

Data includes:

  • London Stock Exchange Holdings data for several of Russell’s indexes, including the Russell 1000 and Russell 2000

Who can access: anyone within Wharton

GDELT 1.0 Events, from Wharton Computing

Supported by Google Jigsaw, the GDELT Project monitors the world’s broadcast, print, and web news from nearly every corner of every country in over 100 languages and identifies the people, locations, organizations, themes, sources, emotions, counts, quotes, images and events driving our global society every second of every day, creating a free open platform for computing on the entire world.

Who can access: Wharton faculty, PhD students, postdocs, and staff. Penn and Wharton undergraduate and MBA students may access these open-source data here.

Historical Tweet Database, from Wharton Computing

In partnership with The Annenberg School, Wharton Computing has compiled a dataset of historical Tweets. Collection began in April 2012 and concluded in November 2022. This data is queryable using SQL. For more info, see here.

Data includes:

  • Over 14 billion tweet objects, representing about 1% of total Twitter volume
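
Because the data is queryable using SQL, a typical first step is an aggregate query such as counting tweets per year. The sketch below only illustrates the shape of such a query: it runs against a small in-memory SQLite table, and the table name `tweets`, its columns (`id`, `created_at`, `lang`, `text`), and the sample rows are all hypothetical. The actual schema, SQL dialect, and connection method for this database are not described here and will differ.

```python
import sqlite3

# Hypothetical stand-in for the real database: the table name, column names,
# and rows below are invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tweets (id INTEGER PRIMARY KEY, created_at TEXT, lang TEXT, text TEXT)"
)
sample_rows = [
    (1, "2012-04-15T10:00:00", "en", "first sample tweet"),
    (2, "2016-11-08T21:30:00", "en", "second sample tweet"),
    (3, "2016-11-09T02:10:00", "es", "tercer tweet de ejemplo"),
    (4, "2022-11-01T08:45:00", "en", "fourth sample tweet"),
]
conn.executemany("INSERT INTO tweets VALUES (?, ?, ?, ?)", sample_rows)


def tweets_per_year(connection, lang):
    """Return (year, count) pairs for tweets in one language, oldest year first."""
    query = """
        SELECT substr(created_at, 1, 4) AS yr, COUNT(*) AS n
        FROM tweets
        WHERE lang = ?
        GROUP BY substr(created_at, 1, 4)
        ORDER BY yr
    """
    # The language is passed as a bound parameter rather than formatted
    # into the SQL string.
    return connection.execute(query, (lang,)).fetchall()


print(tweets_per_year(conn, "en"))
```

Filtering and aggregating inside the query, rather than pulling rows out first, matters at this scale: with billions of tweet objects, only the small summary result should leave the database.
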

Who can access: anyone within Wharton or Annenberg, for the purpose of academic research

Lexis Nexis Corporate Affiliations, from Lippincott Library

International directory of corporate structure information for public and private companies. It reports firm details including location, size, executives and directors, and links to parent or subsidiary firms. Available coverage begins in 1993.

Who can access: anyone at Penn, for the purpose of academic research

National Land Use Restrictiveness Index (NLURI), from Wharton GIS Lab

In partnership with Econsult Solutions, Inc. and the Penn Institute for Urban Research, and with funding from Freddie Mac, the Wharton GIS Lab has compiled a U.S. land use restrictiveness dataset at the level of Census Block Group, county, and state.

Data includes:

  • Observed and estimated maximum allowable housing units per acre for 32 U.S. states

Who can access: anyone within Wharton, upon approval of the research purpose

Nielsen, from Wharton Computing

Nielsen is a global leader in audience measurement, data and analytics, shaping the future of media.

The James M. Kilts Center for Marketing at Chicago Booth and the Nielsen Company have partnered to make two consumer marketing datasets available to US-based academic researchers.

These datasets are available for purchase to interested Penn parties at a substantial discount.

Who can access: tenured and tenure-track faculty, PhD students, and postdoctoral researchers at an accredited academic institution. Each eligible researcher must register with, and be approved by, the Kilts Center before accessing the data.

Police Officer Registry, from Research on Policing Reform and Accountability

This is a dataset of 220,000 sworn law enforcement officers from 98 of the 100 largest policing agencies in the U.S. (ranked by number of sworn officers), representing one-third of all police officers nationwide.

Data includes:

  • Officer names, ranks, and agencies, all obtained from public records
  • Agency-level estimates, using both public and commercial data, on party identification, political participation (turnout), household income, age, race/ethnicity, and gender
  • Aggregate party identification, political participation (turnout), household income, age, race/ethnicity, and gender for officers’ home U.S. Census tracts, as well as for civilians at large in the jurisdiction

Who can access: anyone at Penn, for the purpose of academic research

TRACE, from Wharton Computing

The Trade Reporting and Compliance Engine is the FINRA-developed vehicle that facilitates the mandatory reporting of over-the-counter transactions in eligible fixed income securities.

Who can access: anyone at Penn who is approved by FINRA upon application

Add Your Dataset

If you have a dataset that you would like to add to this catalog, simply complete our Data Intake Form. Please note that iWRDS datasets must have a data user agreement with the provider that allows for data use by Wharton faculty and students for educational purposes. If you have any questions, contact .

Additional Resources

Student Data Portal, from AI & Analytics for Business

10+ years of small to large datasets collected through AIAB corporate partnerships.

Business Databases, from Lippincott Library

120+ business databases provided by the Wharton School’s Lippincott Library.

Data Repository, from WRDS

600+ datasets from more than 50 vendors available for users at all experience levels.