Internal Wharton Research Data Services (iWRDS)

Gain unprecedented access to individual-level datasets with real-world business contexts.

How It Works

Who is Eligible?
All Wharton and Penn faculty and students who will use the data for academic purposes (research, capstone and course projects, and independent study) are eligible, though access eligibility varies by dataset.

Browse Catalog
Available datasets are listed alphabetically below, each with a short description of the data and its access eligibility.

Request Data
To request access, please complete the data request form. If you want access to multiple datasets, you must submit a separate form for each dataset.

Available Datasets

Annalect, from AI at Wharton

A unique and comprehensive dataset from Annalect, the data management division of the Omnicom Group, a leading global advertising and marketing communications services company. The dataset includes exposures to email and online display advertisements from a travel company, as well as conversions on that company's website. Researchers can track exposures, clicks, and conversions for 10,000 individual users (tracked by cookies) over roughly 60 days. Because travel shoppers typically research a trip over several weeks, the data lets researchers explore how customers search for information about a highly considered product and how advertising affects the path to purchase.

Data includes:

  • Details about the exposure, including the type of ad, description, and size of the creative, and the campaign the creative was part of
  • Information about whether the user clicked on the ad, and whether that click eventually led to a conversion
  • The type of conversion the user engaged in, such as exploring products or receiving a purchase confirmation

Who can access: anyone at Penn, for the purpose of academic research

Barnes Foundation, from AI at Wharton

The Barnes Foundation is a world-renowned nonprofit cultural and educational institution committed to transforming lives through art by sharing its unparalleled art collection, exhibitions, classes, and public programs with the widest audience possible.

Data includes:

  • Data on 300,000 customers, including members and non-members
  • Transactions
  • All purchase points, product info, and purchase channel
  • Historic product calendar and financial spreadsheets
  • List of promotions for members and non-members
  • Calendar of print mail campaigns

Who can access: anyone at Penn, for the purpose of academic research

Chicago Policing Data, from Research on Policing Reform and Accountability

Four years of activity data from the Chicago Police Department, accompanied by Chicago census and crime data and shapefiles for Chicago census block groups, police districts, and police beats.

Data includes:

  • Information on over 33,000 police officers, who collectively reported the following between January 2012 and January 2016:
    • Over 3 million shifts
    • Over 1 million stops
    • Over 300,000 arrests
    • Over 9,000 uses of force

Who can access: anyone at Penn, for the purpose of academic research

Clientivity, from AI at Wharton

Clientivity is a hotel booking software platform that empowers users to create, manage, and earn commission from personal, group, and corporate travel. The dataset includes funnel statistics, partner and end-user demographics, and hotel pricing trends.

Data includes:

  • 12,000 active partners
  • 53,000 partnering hotels, including location, star rating, and review count

Who can access: anyone at Penn, for the purpose of academic research

Coqovins, from AI at Wharton

Coqovins is a virtual sommelier that makes personalized wine recommendations through a chatbot at participating wine stores. The dataset includes wine attributes, wine reviews, and wine details.

Data includes:

  • 1,600 individual wine reviews
  • 9,100 wine attributes
  • 26,000 wine label details

Who can access: anyone at Penn, for the purpose of academic research

CoreLogic, from Wharton Real Estate and Wharton Finance

CoreLogic is the trusted source for property intelligence, with deep knowledge of powerful economic, social, and environmental forces that promote healthy housing markets and thriving communities.

Data includes:

  • 10 million observations of property listing data
  • More than 600 variables describing listing details (e.g., listing date and price, listing office and agent, commission rate offered to buyer’s agent)
  • Property characteristics (number of bedrooms and bathrooms, remarks from sellers)
  • Transaction details when a sale occurs (sale price and date, purchasing office and agent)
  • Mortgage loan-level data (including origination characteristics and monthly performance data)

Who can access: Wharton faculty, PhD students supervised by Wharton faculty, and Wharton-affiliated researchers, subject to approval

DataView, from Penn Wharton Budget Model

DataView is a powerful new tool that simplifies collecting, visualizing, and analyzing government and other public data.

  • Search across millions of data series available from dozens of sources
  • Transform and combine data as desired with simple point-and-click tools
  • Visualize your data with graphs, tables, scatterplots, and animated US maps
  • Test your ideas using integrated regression analysis
  • Create an account to save your work for later as well as share with others

Who can access: anyone at Penn

eMAXX Bond Holders, from Lippincott Library

Historical quarterly bond holder information from Q3 1998 through 2022. The subscription covers North American and Pacific bonds in the following market sectors:

  • Asset-Backed Securities (ABS)/Collateralized Debt Obligations (CDO)
  • Corporate
  • Government
  • Mortgage-Backed Securities (MBS)
  • Municipal

Who can access: anyone at Penn, for the purpose of academic research

Expedia, from AI at Wharton

Expedia, the largest online travel company in the world, provided a dataset that details events leading up to conversion (or failure to convert) for approximately 10,000 U.S.-based users searching for hotels in each of four geographic markets (Cancun, NYC, Paris, and Budapest).

Data includes:

  • Information about how the user arrived at Expedia
  • Which promotional pages they viewed
  • Details of their search query, such as dates and number of travelers
  • Which hotels were displayed in search results, which hotels were clicked on and which hotels were purchased

Who can access: anyone at Penn, for the purpose of academic research

Experian, from Wharton Real Estate and Wharton Finance

Experian is an American–Irish multinational consumer credit reporting company.

Data includes:

  • De-identified credit bureau records
  • Mortgage, auto, student loan, and credit card balances
  • Credit card limit information
  • Credit scores
  • Credit inquiries
  • Delinquencies, bankruptcies, and judgments

Please provide a budget code with your request. While costs are low, access to Experian data is billed based on scale of usage.

Who can access: Wharton faculty, PhD students supervised by Wharton faculty, and Wharton-affiliated researchers, subject to approval

Felix, from AI at Wharton

Felix is a chat-based platform for sending money from the U.S. to Mexico. Felix Technologies Inc. is a technology company whose mission is to make cross-border payments to Latin America as easy as sending a message on WhatsApp. The data includes anonymized credit card transactions and associated risk data.

Data includes:

  • Individual customer transactions, including:
    • Customer and card identifiers
    • Transaction amount
    • Date and time
    • Risk scores used to assess whether the transaction (and customer) was fraudulent
    • Whether the transaction was flagged as fraudulent

Who can access: anyone at Penn, for the purpose of academic research

FTSE Russell, from Wharton Computing

FTSE Russell is a leading global provider of benchmarks, analytics, and data solutions with multi-asset capabilities.

Data includes:

  • London Stock Exchange Holdings data for several of Russell’s indexes, including the Russell 1000 and Russell 2000

Who can access: anyone within Wharton

Fuel Cycle/Rent-A-Center, from AI at Wharton

Fuel Cycle is an all-in-one research platform that combines qualitative and quantitative data to power real-time business decisions. Rent-A-Center stores offer name-brand furniture, electronics, appliances, computers, and smartphones through flexible rental-purchase agreements that allow the customer to obtain ownership of the merchandise at the end of an agreed-upon rental period.

Data includes:

  • Product performance data from Rent-A-Center: rental agreement and rent-to-own performance metrics for eight TV models
  • Customer data, including demographics and customer status (new, active, reactivated)
  • Rental agreements, including purchase amounts, discounts, and whether the TV was rented on its own or packaged with other items
  • Transaction-level data associated with rental agreements, including product info, rate/price changes, whether the product was new or used, and sales channel
  • Store information
  • Survey data from Fuel Cycle: results from three separate surveys that collected data on specific TV models

Who can access: anyone at Penn, for the purpose of academic research

GDELT 1.0 Events, from Wharton Computing

Supported by Google Jigsaw, the GDELT Project monitors the world’s broadcast, print, and web news from nearly every corner of every country in over 100 languages and identifies the people, locations, organizations, themes, sources, emotions, counts, quotes, images and events driving our global society every second of every day, creating a free open platform for computing on the entire world.

Who can access: Wharton faculty, PhD students, postdocs, and staff. Penn and Wharton undergraduate and MBA students may access the open-source version of these data directly from the GDELT Project.

Hachette Book Group, from AI at Wharton

Hachette Book Group is a leading trade book publisher based in New York and a division of Hachette Livre (a Lagardère Company).

Data includes:

  • Information for ~2,200 books that generated significant traffic during a 12-month period, including:
    • Sales data, including shipments, weekly aggregated point-of-sale data, and affiliate marketing sales data
    • Social analytics data, including traffic from social media sites to the website
    • Analytics for website pages related to books, including clicks, demographics, and visitor counts
    • Email campaign data
    • Book product metadata, including book information, current price, page count, genre, and ISBN
    • NPD BookScan data (for competitors' sales data)
    • Online ad stats
    • Marketing spend/budgets

Who can access: anyone at Penn, for the purpose of academic research

Hearst, from AI at Wharton

Hearst is an American multinational mass media and business information conglomerate. Hearst owns newspapers, magazines, television channels, and television stations, including the San Francisco Chronicle, the Houston Chronicle, Cosmopolitan, and Esquire. The data includes historical information for anonymous users (non-subscribers) who visited Hearst news sites.

Data includes:

  • Pageviews: Views of specific website pages, including the source of traffic and location
  • Content: Content viewed on the website pages, including the category, photo, and URL
  • Paywall Info: Events where visitors hit a paywall, including the session and timestamp
  • Subscription Info: Information for the anonymous users who became subscribers for the website (or converted), including subscription durations and payment type

Project deliverables include:

  • Model (with code) to identify similarities and differences between subscribers and non-subscribers by:
    • Clustering converters and non-converters with a k-means model, which groups users with similar characteristics based on the distances between them
    • Identifying non-converters who “look like” converters
  • Model (with code) to recommend what to do for each type of user:
    • For non-converters who “look like” converters, show a paywall
    • For non-converters who don’t necessarily “look like” converters, predict the conversion probability using an XGBoost predictive model
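The deliverables above describe a two-stage approach: cluster users, then treat non-converters who land in converter-heavy clusters as "look-alikes." As a minimal sketch, assuming each user has already been reduced to a numeric feature vector plus a converted/not-converted flag, the clustering stage and the look-alike rule could look like the following. The function names, the farthest-point initialization, the toy data, and the 0.5 converter-share threshold are hypothetical choices for illustration, not the project's actual method; the XGBoost scoring of the remaining non-converters is omitted.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def init_centroids(points: Sequence[Point], k: int) -> List[List[float]]:
    """Deterministic farthest-point seeding (an assumption; k-means++ is also common)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Pick the point whose nearest already-chosen centroid is farthest away.
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans(points: Sequence[Point], k: int, max_iters: int = 100) -> List[int]:
    """Lloyd's algorithm: return a cluster label for each point."""
    centroids = init_centroids(points, k)
    labels: List[int] = []
    for _ in range(max_iters):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the clustering has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


def split_non_converters(features, converted, k=2, threshold=0.5):
    """Split non-converters into look-alikes (in converter-heavy clusters) and the rest."""
    labels = kmeans(features, k)
    lookalikes, others = [], []
    for c in range(k):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        if not idx:
            continue
        converter_share = sum(converted[i] for i in idx) / len(idx)
        non_converters = [i for i in idx if not converted[i]]
        # Hypothetical rule: a cluster "looks like" converters if its converter share meets the threshold.
        if converter_share >= threshold:
            lookalikes.extend(non_converters)
        else:
            others.extend(non_converters)
    return sorted(lookalikes), sorted(others)


# Toy data: two well-separated groups of users described by two features each.
features = [(0.0, 0.2), (0.3, 0.1), (0.2, 0.4), (0.1, 0.0),
            (9.8, 10.1), (10.2, 9.9), (10.0, 10.3), (9.9, 9.7)]
converted = [True, True, True, False, False, False, False, False]

lookalikes, others = split_non_converters(features, converted)
print(lookalikes)  # → [3]
print(others)      # → [4, 5, 6, 7]
```

In this toy run, user 3 sits in a cluster where three of four users converted, so it would be shown a paywall; users 4 through 7 would instead be passed to a conversion-probability model.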

Who can access: anyone at Penn, for the purpose of academic research

Hertz, from AI at Wharton

The Hertz Corporation is a world leader in car and equipment rental. This dataset includes employee engagement surveys linked to Hertz locations in the U.S. and Canada, rental car transactions at those locations, and customer satisfaction surveys for those transactions. The data are longitudinal over a two-year window, supporting research from a variety of angles: studies of organizational behavior, customer loyalty and engagement, geographic retail transactions, upselling/add-on behavior, and customer segmentation are all possible with this rich and detailed dataset.

Data includes:

  • Over 68,000 responses to a semi-annual employee engagement survey
  • Over 3,000 rental locations in the U.S. and Canada, each uniquely identified across the data
  • Over 80,000 responses to a post-transaction customer satisfaction survey with detailed transaction data for the corresponding rental

Who can access: anyone at Penn, for the purpose of academic research

Historical Tweet Database, from Wharton Computing

In partnership with the Annenberg School, Wharton Computing has compiled a dataset of historical tweets. Collection began in April 2012 and concluded in November 2022. The data can be queried using SQL.

Data includes:

  • Over 14 billion tweet objects, representing about 1% of total Twitter volume

Who can access: anyone within Wharton or Annenberg, for the purpose of academic research

International Gaming Company, from AI at Wharton

An anonymous major sports video game franchise has provided data covering a three-year period, including annual releases of new versions and purchase incidences of virtual currency during that time.

Data includes:

  • Records on approximately 60,000 players covering up to three years of player behavior
  • Over 1.6 million unique game session records, including player ID, session duration, and game console used
  • Over 46,000 purchase incidences, including player ID, game console used, and timestamp of purchase

Who can access: anyone at Penn, for the purpose of academic research

Lexis Nexis Corporate Affiliations, from Lippincott Library

International directory of corporate structure information for public and private companies. It reports firm details including location, size, executives and directors, and links to parent or subsidiary firms. Available coverage begins in 1993.

Who can access: anyone at Penn, for the purpose of academic research

National Land Use Restrictiveness Index (NLURI), from Wharton GIS Lab

In partnership with Econsult Solutions, Inc. and the Penn Institute for Urban Research, and with funding from Freddie Mac, the Wharton GIS Lab has compiled a U.S. land use restrictiveness dataset at the level of Census Block Group, county, and state.

Data includes:

  • Observed and estimated maximum allowable housing units per acre for 32 U.S. states

Who can access: anyone within Wharton, upon approval of the research purpose

Nielsen, from Wharton Computing

Nielsen is a global leader in audience measurement, data and analytics, shaping the future of media.

The James M. Kilts Center for Marketing at Chicago Booth and the Nielsen Company have partnered to make two consumer marketing datasets available to US-based academic researchers.

Interested Penn researchers can purchase these datasets at a substantial discount.

Who can access: tenured and tenure-track faculty, PhD students, and postdoctoral researchers from an accredited academic institution are eligible to have direct access to data from the Kilts Center. Each eligible researcher accessing the data must register and have approval from the Kilts Center.

Police Officer Registry, from Research on Policing Reform and Accountability

This dataset covers 220,000 sworn law enforcement officers from 98 of the 100 largest U.S. policing agencies (ranked by number of sworn officers), representing one-third of all police officers nationwide.

Data includes:

  • Officer names, ranks, and agencies, all obtained from public records
  • Agency-level estimates, using both public and commercial data, on party identification, political participation (turnout), household income, age, race/ethnicity, and gender
  • Aggregate party identification, political participation (turnout), household income, age, race/ethnicity, and gender for officers’ home U.S. Census tracts, as well as for civilians at large in the jurisdiction

Who can access: anyone at Penn, for the purpose of academic research

Philadelphia Orchestra & Kimmel Center, from Analytics at Wharton

The 2021 union of the Philadelphia Orchestra (PO) and Kimmel Center (KC) brought together two heralded institutions in the Philadelphia performing arts community. With a campus in the heart of the Avenue of the Arts, the partnered organizations play an integral role in developing and showcasing Philadelphia culture. The data includes customer profile data, customer purchase data, and email marketing/journey data from 2017 to the present.

Data includes:

  • Customer profiles, including customer number, age, gender, and subscription status for email opt-in and newsletters
  • Customer purchase data, including customer number, order information, and contribution with order
  • Email journey and email marketing data, including each email's description and the customer numbers of those who were sent the email, opened it, did not open it, clicked through, or unsubscribed

Who can access: anyone at Penn, for the purpose of academic research

Quick Service Restaurant Chain, from AI at Wharton

An anonymous independent purchasing cooperative that supplies a major quick service restaurant chain has provided a unique dataset that includes individual transactions from approximately 2,300 restaurant locations across four geographic regions and all purchases made by 5,000 randomly selected customers over two years. Beyond typical transaction data, it includes detailed information about which products each customer purchased and customer survey results, allowing a comprehensive view of product and service quality for each customer purchase.

Data includes:

  • Franchise point of sale transactions, including details on which menu item(s) were purchased, quantities of each item, payment information, and any discounts/promotions applied to the order
  • Metadata on specific restaurants, including open/close dates and store type (such as street store vs. food court storefront)
  • Survey responses submitted by customers linked to individual restaurants

Who can access: anyone at Penn, for the purpose of academic research

Reed Smith, from AI at Wharton

Reed Smith is a dynamic international law firm, dedicated to helping clients move their businesses forward. The firm has more than 1,700 lawyers in 28 offices throughout the United States, Europe, the Middle East, and Asia.

Data includes:

  • Timecards data over three years, including task descriptions and codes, hours worked, amount billed, and information about the attorney
  • Legal matter data for 8,000-10,000 clients over three years, including types of work, tags, industry, and geography

Who can access: anyone at Penn, for the purpose of academic research

TRACE, from Wharton Computing

The Trade Reporting and Compliance Engine (TRACE) is the FINRA-developed system that facilitates the mandatory reporting of over-the-counter transactions in eligible fixed income securities.

Who can access: anyone at Penn who is approved by FINRA upon application

Vivvix Advertising Data, from Lippincott Library

A database from Kantar Media with advertising expenditure information on brands, product categories, industries, and companies across media types including cable and network TV, broadcast radio networks, magazines, and newspapers. It supports customized data, comparative, and ranking reports in spreadsheet or text format, and includes five years of data.

Who can access: anyone at Penn, for the purpose of academic research

Add Your Dataset

If you have a dataset that you would like to add to this catalog, complete our Data Intake Form. Please note that iWRDS datasets must be covered by a data use agreement with the provider that allows Wharton faculty and students to use the data for educational purposes. If you have any questions, contact .

Additional Resources

Business Databases, from Lippincott Library

120+ business databases provided by the Wharton School’s Lippincott Library.

Data Repository, from WRDS

600+ datasets from more than 50 vendors available for users at all experience levels.

CANDOR Corpus, from BetterUp & OID

A 7+ million-word, 850-hour corpus of audio, video, and transcripts from 1,656 recorded conversations.