Women in Data Science @ Penn Conference 2024 Agenda

Thursday, February 8

The Penn Museum | 3260 South St, Philadelphia, PA 19104

3:00–5:00 p.m.

Evolution of Data Storytelling
WiDS x Penn Museum Tour + Workshop

Join us for an exciting kick-off event at the Penn Museum as part of the Women in Data Science (WiDS) @ Penn conference, where the past meets the future in a guided tour and storytelling workshop.

Friday, February 9

Jon M. Huntsman Hall | 3730 Walnut Street, Philadelphia, PA 19104

8:30–9:00 a.m.

Check In + Grab-and-Go Breakfast

9:00–9:15 a.m.

Welcoming Remarks

Mary Purk
Executive Director, AI at Wharton

Susan Davidson
Weiss Professor of Computer and Information Science, University of Pennsylvania

Linda Zhao
Professor of Statistics and Data Science, The Wharton School

9:15–10:05 a.m.

Keynote Address

From Pattern Recognition to Cognition: What AI Means for the Society

The year 2023 will go down in the history of technology as the transformational year of the “AI Awakening,” with AI a topic of conversation everywhere from corporate boardrooms to our own dining rooms. This level of public enthusiasm has been thrilling, even for someone like me who has worked on the gradual progress of AI for the past 19 years.

In this talk, I’ll discuss the gradual scientific progress in the field of AI: where we are coming from (pattern recognition) and where we are headed (cognition). I’ll help demystify what AI means for society as a whole, from consumers to enterprises to our own humanity!

Nasrin Mostafazadeh
Co-Founder
Verneek

10:05–10:35 a.m.

How Do Large Language Models Think?

Large language models (LLMs), powered by the Transformer architecture and by large amounts of data and compute, have shown emergent abilities in language understanding and reasoning that uniquely position them as general-purpose tools. Despite these abilities, the models are still far from perfect and fail unexpectedly when deployed in practice.

In this talk, I will first give an overview of the what, why, and how of LLMs and subsequently describe recent work that attempts to rigorously understand the inner workings of these models towards the goal of improving their reliability.

Surbhi Goel
Magerman Term Assistant Professor, Computer and Information Science, University of Pennsylvania

10:35–10:50 a.m.

Break

10:50–11:20 a.m.

Making the Most of Spatial Biology by Machine Learning

In this talk, I will present how machine learning techniques can be used to maximize the information extracted from spatial omics data, and how that information can facilitate biological discovery and clinical applications.

Mingyao Li
Professor of Biostatistics in Biostatistics and Epidemiology, University of Pennsylvania Perelman School of Medicine

11:20–11:35 a.m.

Transforming Online Education: ChaTA’s LLM-Based Teaching Assistant Solution

Introducing ChaTA, my team’s AI solution that recently won first place and a $20,000 prize at the Carnegie Mellon University Generative AI Hackathon. ChaTA addresses the limited teaching assistant support available in online education, especially in low-cost courses. Built on the Llama 2 large language model and enriched with a database of 10,000 question-and-answer pairs, it handles complex academic queries better than traditional AI approaches.

ChaTA uses a vector database for prompt engineering and reinforcement learning from human feedback (RLHF) for continuous refinement and improved accuracy. Our project not only curates a comprehensive knowledge base from CMU courses but also lets TAs, students, and the AI interact directly, enabling real-time feedback and improvement. This session will highlight ChaTA’s development, its impact on educational accessibility, and its potential to transform teaching assistance through generative AI.

Aashika Vishwanath
Computer Science Student
University of Pennsylvania

11:35–11:50 a.m.

Deciphering Heterogeneity in Alzheimer’s Disease Using Neuroimaging, Epigenetics, and Data Science

While Alzheimer’s disease (AD) is typically considered an amnestic, multi-domain disorder, at least 15% of individuals have atypical presentations. Atypical presentations are often, but not always, characterized by a younger age of onset. Chronological age is therefore strongly correlated with atypicality but does not fully explain it. A biological definition of age may capture additional variance that contributes to neurodegenerative patterns across the atypical AD spectrum.

Recent statistical approaches to study biological age include epigenetic clocks, which are algorithms that compute epigenetic age in individuals based on DNA methylation profiles in blood or tissue. Epigenetic age acceleration (EAA) is a robust measure of biological age defined as the residual of epigenetic age regressed on chronological age. Atypicality in AD could be defined along multiple axes, such as the spatial distribution of tau pathology and neurodegeneration, disease severity, or resistance (avoiding pathology) and resilience (coping despite pathological burden). We operationalize atypicality in AD as the relative neurodegeneration of cortex compared to the medial temporal lobe (MTL).

In this presentation, I will illustrate how data science methods are applied to neuroimaging and epigenetic data to investigate heterogeneous disease mechanisms, resistance, and resilience in AD.

Lasya Sreepada
PhD Candidate, Bioengineering
University of Pennsylvania

11:50 a.m.–12:30 p.m.

Lunch

12:30–1:00 p.m.

Breakout Discussion & Lunch

Human-centered AI Research and Design

In this breakout session, Alyssa Hwang will discuss her experiences in human-centered AI research and design. The discussion will include observational studies of people using voice assistants, qualitative analysis of LLMs, future visions of more inclusive AI assistants, and other topics at the intersection of Natural Language Processing and Human-Computer Interaction. Ms. Hwang is also happy to answer questions about PhD programs, tech internships, and research in general.

Alyssa Hwang
PhD Student, Computer and Information Science
University of Pennsylvania

1:05–1:45 p.m.

Generative AI: Impact on Jobs, Education, Research

Hamsa Bastani
Associate Professor of Operations, Information, and Decisions at the Wharton School, University of Pennsylvania

Lilach Mollick
Director of Pedagogy, Wharton Interactive

Hether Danforth
General Manager, Education Solutions US Education and Academic Medical Centers, Microsoft

Susan Davidson
Weiss Professor of Computer and Information Science, University of Pennsylvania

1:45–2:00 p.m.

Analytics Accelerator – Kimmel Cultural Center

Through the Analytics Accelerator, we provided data analytics solutions to the Kimmel Cultural Center to optimize its email marketing strategy, enhance customer engagement, and ultimately drive value. The project centered on three key questions: identifying indicators of unsubscription, determining the optimal email sending schedule, and segmenting customers for email marketing. To answer these questions, we built a churn model using 70+ features to predict email unsubscription, segmented customers using clustering, performed text analysis, and developed an optimal email sending schedule.

Annie Wang
University of Pennsylvania Student

2:00–2:15 p.m.

Decoding Financial Decisions Through Neuroanalytics

Join me on an exciting journey where neuroscience meets AI to unveil the mysteries of financial decision-making. We will explore how brain activity can reveal individual differences in financial decisions, and how AI approaches inspired by sequential modeling can dissect and predict the complex perceptions of uncertainty behind financial decision-making. By bringing brain science and AI together, this session aims to shed light on the future of individually customized financial investment strategies.

Betty Xu
MSE Electrical Engineering Student, University of Pennsylvania

2:15–2:50 p.m.

The Double-edged Sword of Banning Generative AI on Online Question-and-Answer Communities: Evidence from Stack Exchange

We investigate how banning content produced by generative artificial intelligence (AIGC) affects knowledge sharing in online question-and-answer communities. After the launch of ChatGPT in late November 2022, several communities on Stack Exchange implemented official bans on AIGC. We collect data from all communities on Stack Exchange and analyze it using a difference-in-differences (DID) approach. Results indicate that banning AIGC increases the demand for knowledge, but the overall process of answering questions becomes less efficient. Interestingly, this effect is especially pronounced in non-STEM communities and among experienced users.
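For readers unfamiliar with difference-in-differences, the canonical two-group, two-period version of the estimator works as follows. It takes the before/after change in communities that banned AIGC (treated) and subtracts the change in communities that did not (control). The Python sketch below uses hypothetical outcome values and a hypothetical helper, `did_estimate`. The paper's actual specification may differ; for example, it could be a regression with community and time fixed effects.

```python
from statistics import mean


def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Hypothetical helper: canonical 2x2 difference-in-differences estimate.

    Each argument is a list of outcome values (e.g., weekly question counts)
    for one group in one period."""
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    # Under the parallel-trends assumption, the control group's change
    # stands in for what the treated group would have done without the ban.
    return treated_change - control_change


# Hypothetical numbers, not results from the study
print(did_estimate(treated_pre=[10, 12], treated_post=[15, 17],
                   control_pre=[8, 10], control_post=[9, 11]))
```

With these made-up numbers, the treated group's average rises by 5 and the control group's by 1, so the estimated effect is 4.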

Moreover, we find that the AIGC ban increases knowledge demand for niche questions and other questions that large language models (LLMs) cannot reliably answer. However, the ban also reduces the efficiency of answering popular questions and other questions that LLMs can answer reliably. Overall, this double-edged sword effect shows that banning AIGC encourages users to ask more questions, but the increase in engagement is likely to wane as questions go unanswered.

The paper concludes by discussing the important implications for managers, moderators, and policymakers of online Q&A communities.

Lynn Wu
Associate Professor of Operations, Information and Decisions, University of Pennsylvania

2:50–3:00 p.m.

Closing Remarks

Mary Purk
Executive Director, AI at Wharton

Susan Davidson
Weiss Professor of Computer and Information Science, University of Pennsylvania

Linda Zhao
Professor of Statistics and Data Science, The Wharton School
