Computational Social Science

Uniting computer science, statistics, and social science to solve real-life problems through mass collaboration on path-breaking transparent research in partnership with industry, government, and civil society.


Virtual Lab

Building a state-of-the-art infrastructure for high-throughput experiments run by large teams in an open-science format.

Media and Democracy

Exploring how new media companies are changing the way we receive information and shaping the views we form about the world.

Using cutting-edge statistical techniques to analyze police-civilian interactions, measure racial bias in policing, evaluate policing policy reforms, and improve the performance of policing agencies.

Digital Organization

Using large-scale data on social interactions to predict and understand individual and group behavior, and to design new digital tools for more effective collaboration and collective action across organizational, spatial, and social boundaries.

Projects and Publications

An Automated Solution to Causal Inference in Discrete Settings

The goal of this project is to create a tool to automate causal inference from incomplete or imperfect data. This tool will reach a broad audience of applied researchers across the social and medical sciences by developing an easy-to-use front-end interface and implementing more efficient back-end optimizations. In addition, the project will create a series of data applications to illustrate its ease of use. 

Automated Analysis of Police Body-Worn Camera Footage

Why have body-worn cameras failed to live up to their hype as transformative tools for police accountability? In conversations with police commanders, we consistently hear that agencies lack the capacity to manually review hundreds or even thousands of hours of footage, nearly all of which is deleted before review due to the financial strain of video storage. We are working to develop automated techniques for analyzing massive-scale footage from police body-worn cameras (BWCs). With university seed funding, our team of computer-vision, speech analysis, and social science experts is annotating videos to train artificial intelligence algorithms, evaluating state-of-the-art analytic tools, and communicating closely with police departments to develop practical tools for police supervisors.

Descriptive Representation in Policing

Descriptive representation has largely been studied in electoral contexts, and research tends to focus on a single trait (e.g., race or gender). We draw on a host of community-, agency-, and officer-level data sources for the 100 largest U.S. police agencies to assess whether police resemble the communities they serve in terms of race/ethnicity, gender, age, and political affiliation/participation. We then use micro-level panel data on officers’ shift assignments and behavior to test whether deploying officers who resemble residents on various dimensions leads to differential treatment of civilians, relative to non-representative officers facing similar circumstances. Taken together, our analysis sheds light on the impediments to, and consequences of, descriptive representation in the coercive arm of government.

Do Online Video Recommendation Algorithms Increase Polarization?

Based largely on vivid anecdotes, a media narrative has taken hold ascribing worrisome power to hidden algorithms. Academic researchers have struggled to add evidence to this debate due to both data limitations and serious research design challenges. We undertake a novel experimental approach that seeks to resolve the impasse on this critical question by randomizing the recommendations themselves. Our team has built an interface that serves sequences of videos using the recommendation tree from YouTube’s API. We create our own variations on the existing algorithm by injecting differing proportions of cross-cutting video content (algorithmically matched according to several criteria) into the recommendations shown to randomly assigned groups of subjects. This approach enables an attractive combination of experimental control and external validity without the need to reverse-engineer YouTube’s black-box recommendations. Subjects in our study will choose to watch a sequence of short videos and follow recommendations while the interface tracks watch time and other measures of engagement. After the viewing session, participants will be surveyed on issue opinions, affective polarization, and perceptions of media hostility to their preferred viewpoint. If online video recommendation algorithms drive affective polarization, our design will provide rigorous evidence of the extent and magnitude of the effect.

The Many Paths to Radicalization Within YouTube

YouTube, a giant platform with more than 2 billion monthly active users, has been understudied by the research community. Usage of the platform has grown in the last few years, and the daily share of news consumption on YouTube has grown dramatically as well. As a tool that gives its users a voice in mainstream media, YouTube also has the potential to expose its audience to low-quality, extreme, and conspiratorial content. Claims about the role of the platform itself, via its recommendation algorithms, however, require systematic evidence. Using a unique data set comprising a large representative sample of the US population and their online browsing histories, both on and off the YouTube platform, we study how much radical content is in fact being consumed (vs. produced), how it is changing over time, and how it is being encountered (from recommendations vs. other entry points).

More than Words: How Political Rhetoric Shapes Voters' Affect and Evaluation

How is information communicated in political speech? While a rich literature has studied textual transcriptions of this speech, recent work suggests that important political information is lost when auditory and other channels are discarded. Through a series of observational analyses and experiments, we provide the first direct test of nontextual communication in political speech. Our study collects and analyzes the first corpus of campaign speech recordings, including textual transcriptions, auditory characteristics, and measures of audience response in the 2012 Presidential Election. We begin by examining the textual and auditory determinants of audience reactions. This observational test is then paired with an experimental evaluation in which we hold fixed the text of political speech while manipulating tone of voice. To do so, we first develop a new, efficient computational technique for textual pattern identification at scale. This method is applied to the speech transcriptions, producing a dataset of campaign catchphrases that are spoken repeatedly, often in differing tones of voice. We conduct experiments with these matched phrases on actual voters in the 2012 Presidential Election, showing that voters evaluate political communication differently even when the textual content is held fixed. Next, to isolate the effects of certain components of political communication, we hire voice actors to manually vary their reading of scripts drawn from our corpus of campaign speech. We use these recordings as treatments in an experiment, and find evidence that speech modulation is more important for candidate impression than are sentence-level measures. In sum, this paper lays the methodological foundation for a broad and rigorous research agenda into the nonverbal communication of political information.
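To make the idea of finding repeated catchphrases concrete, here is a minimal sketch that uses plain word n-gram counting across transcripts. This is a stand-in technique chosen for illustration, not the efficient method developed in the project; the function name, parameters, and example sentences are hypothetical.

```python
import re
from collections import defaultdict


def repeated_phrases(speeches: dict[str, str], n: int = 4,
                     min_speeches: int = 2) -> dict[str, list[str]]:
    """Return word n-grams that occur in at least `min_speeches` transcripts."""
    seen: dict[str, set[str]] = defaultdict(set)
    for speech_id, text in speeches.items():
        # Lowercase and keep only word characters so punctuation does not split matches.
        tokens = re.findall(r"[a-z']+", text.lower())
        for i in range(len(tokens) - n + 1):
            phrase = " ".join(tokens[i:i + n])
            seen[phrase].add(speech_id)
    return {p: sorted(ids) for p, ids in seen.items() if len(ids) >= min_speeches}


# Made-up transcripts, not drawn from the campaign corpus.
transcripts = {
    "speech_a": "We will keep our promise to working families.",
    "speech_b": "Our promise to working families is simple.",
}
matches = repeated_phrases(transcripts, n=4)
```

Each matched phrase maps to the transcripts that contain it, which is the kind of index one would need to pull the corresponding audio segments and compare how the same words were delivered.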

Project Ratio

Since the 2016 US presidential election, the deliberate spread of misinformation online, and on social media in particular, has generated extraordinary concern, in large part because of its potential effects on public opinion, political polarization, and ultimately democratic decision making. However, proper understanding of misinformation and its effects requires a much broader view of the problem, encompassing biased and misleading (but not necessarily factually incorrect) information that is routinely produced or amplified by mainstream news organizations. Project Ratio measures the origins, nature, and prevalence of misinformation, broadly construed, as well as its impact on democracy.

Reforming Police Misconduct Investigations

Using unique access to an archive of administrative records on civilian complaints against police, this multi-wave experimental study seeks to assess how Philadelphia residents understand and perceive the current civilian complaint process, and systematically evaluate the impact of transparency initiatives on civic engagement and public trust in police.

Virtual Lab

Experiment-based social science today is moving more slowly and finding less than it could. Although recent advances in digital technologies and crowdsourcing services allow individual experiments to be deployed and run faster than in traditional physical labs, a majority of experiments still focus on one-off results that do not generalize easily to real-world contexts or even to other variations of the same experiment. To achieve replicable, generalizable, scalable, and ultimately useful social science, a fundamental rethinking of the model of virtual-laboratory-style experiments is required. Not only is it possible to design and run experiments that are radically different in scale and scope from what was possible in an era of physical labs; this ability also allows us to ask fundamentally different types of questions than have historically been asked of lab studies. However, taking full advantage of this new and exciting potential will require four major changes to the infrastructure, methodology, and culture of experimental science: (1) significant investments in software design and participant recruitment; (2) innovations in experimental design and analysis of experimental data; (3) adoption of new models of collaboration; and (4) a new understanding of the relationship between theory and experiment. The Virtual Lab project pursues this ambitious path to facilitate a new class of scientific advances in our understanding of social phenomena.

Hofman, Jake M., Watts, Duncan J., Athey, Susan, Garip, Filiz, Griffiths, Thomas L., Kleinberg, Jon, Margetts, Helen, Mullainathan, Sendhil, Salganik, Matthew J., Vazire, Simine, Vespignani, Alessandro, and Yarkoni, Tal. “Integrating explanation and prediction in computational social science.” Nature (June 30 2021).

Jacobs, Abigail Z., and Duncan J. Watts. “A Large-Scale Comparative Study of Informal Social Networks in Firms.” Management Science (April 2021).

Watts, Duncan J., David M. Rothschild, and Markus Mobius. “Measuring the news and its impact on democracy.” Proceedings of the National Academy of Sciences 118, no. 15 (April 2021).

Knox, Dean and Christopher Lucas. “A Dynamic Model of Speech for the Social Sciences.” American Political Science Review (March 2021): 1-18. DOI: 10.1017/S000305542000101X

Almaatouq, Abdullah, Joshua Becker, James P. Houghton, Nicolas Paton, Duncan J. Watts, and Mark E. Whiting. “Empirica: a virtual lab for high-throughput macro-level experiments.” Behavior Research Methods (March 2021): 1-14.

Ba, Bocar A., Dean Knox, Jonathan Mummolo, and Roman Rivera. “The role of officer race and gender in police-civilian interactions in Chicago.” Science 371, no. 6530 (February 2021): 696-702. DOI: 10.1126/science.abd8694

Knox, Dean, and Jonathan Mummolo. “Toward a General Causal Framework for the Study of Racial Bias in Policing.” Journal of Political Institutions and Political Economy 1, no. 3 (August 2020): 341–78.

Lazer, David MJ, Alex Pentland, Duncan J. Watts, et al. “Computational social science: Obstacles and opportunities.” Science 369, no. 6507 (August 2020): 1060-1062.

Heck, Patrick R., Christopher F. Chabris, Duncan J. Watts, and Michelle N. Meyer. “Objecting to experiments even while approving of the policies or treatments they compare.” Proceedings of the National Academy of Sciences 117, no. 32 (July 2020): 18948-18950.

Almaatouq, Abdullah, Joshua Becker, James P. Houghton, Nicolas Paton, Duncan J. Watts, and Mark E. Whiting. “Empirica: a virtual lab for high-throughput macro-level experiments.” arXiv preprint arXiv:2006.11398 (June 2020).

Salganik, Matthew J., Ian Lundberg, Alexander T. Kindel, Caitlin E. Ahearn, et al. “Measuring the predictability of life outcomes with a scientific mass collaboration.” Proceedings of the National Academy of Sciences 117, no. 15 (April 2020): 8398-8403.

Allen, Jennifer, Baird Howland, Markus Mobius, David Rothschild, and Duncan J. Watts. “Evaluating the fake news problem at the scale of the information ecosystem.” Science Advances 6, no. 14 (April 2020): eaay3539.

Knox, Dean, Will Lowe, and Jonathan Mummolo. “Administrative Records Mask Racially Biased Policing.” American Political Science Review 114, no. 3 (January 2020): 619–37.

In the News

How Misinformation Hurts Democracy

David M. Rothschild, a Wharton graduate and economist with Microsoft Research, speaks with Wharton Business Daily on SiriusXM about the impact of misinformation on democracy.

From Small-World Networks to Computational Social Science

Professor Duncan Watts discusses his vision for CSS at Penn.

The Role of Officer Race and Gender in Police-Civilian Interactions in Chicago

The Team

Computational Social Science encompasses two collaborative research teams with shared interests and interrelated research agendas, led by Professors Duncan Watts and Dean Knox.


Professor Duncan Watts - Analytics at Wharton Faculty Fellow

Duncan Watts
Stevens University Professor & twenty-third Penn Integrates Knowledge Professor

Professor Dean Knox - Analytics at Wharton Faculty Fellow

Dean Knox
Assistant Professor
Operations, Information and Decisions

Affiliated Scholars

Abdullah Almaatouq
Assistant Professor, Information Technology
Massachusetts Institute of Technology

Jonathan Mummolo
Assistant Professor, Politics and Public Affairs
Princeton University

David M. Rothschild
Microsoft Research


Valery Yakubovich
Executive Director

Rachel Mariman
Research Project Manager

Homa Hosseinmardi
Research Scientist

Mark Whiting
Postdoctoral Researcher

James Houghton
Postdoctoral Researcher