Websci '25: Proceedings of the 17th ACM Web Science Conference 2025

Full Citation in the ACM Digital Library

SESSION: Session 1: Digital Identity & Social Systems

Early Detection of DDoS Attacks via Online Social Networks Analysis

Ines Abbes
Maurantonio Caprolu
Roberto Di Pietro

Distributed Denial of Service (DDoS) attacks pose a significant and ever-present threat to the availability and stability of online services. Existing detection techniques can be broadly classified into direct and indirect approaches, based on the data sources considered. Despite extensive research and development of direct DDoS detection methods, indirect approaches, particularly those involving Online Social Networks (OSNs) analysis, remain relatively unexplored.

In this paper, we address the above gap by investigating the feasibility of detecting early stage DDoS attacks from OSN data. With this aim, we first study how users report availability problems in OSNs by identifying the most prevalent terms and expressions consistently used to express the initial indicator of a DDoS attack. Then, we leverage this knowledge to efficiently process OSN messages, filtering out irrelevant content. Finally, we introduce a sophisticated detection model based on Bidirectional Encoder Representations from Transformers (BERT), fine-tuned using a dataset we collected from Twitter and preprocessed via our innovative filtering technique. Notably, our proposed approach is platform-independent and offers a generalizable solution.

We trained and tested our model with over 1.2 million tweets related to 33 major DDoS attacks from 2012 to 2023. The results are compelling: our model achieved a 0.979 F1 score, significantly outperforming existing techniques that do not exceed 0.571. These results demonstrate the potential of user-generated content as a viable support to complement existing direct DDoS detection strategies.

Social Biases in Knowledge Representations of Wikidata separates Global North from Global South

Paramita Das
Sai Keerthana Karnam
Aditya Bharat Soni
Animesh Mukherjee

Knowledge Graphs have become increasingly popular due to their wide usage in various downstream applications, including information retrieval, chatbot development, language model construction, and many others. Link prediction (LP) is a crucial downstream task for knowledge graphs, as it helps to address the problem of the incompleteness of the knowledge graphs. However, previous research has shown that knowledge graphs, often created in a (semi) automatic manner, are not free from social biases. These biases can have harmful effects on downstream applications, especially by leading to unfair behavior toward minority groups. To understand this issue in detail, we develop a framework – AuditLP – deploying fairness metrics to identify biased outcomes in LP, specifically how occupations are classified as either male or female-dominated based on gender as a sensitive attribute. We have experimented with the sensitive attribute of age and observed that occupations are categorized as young-biased, old-biased, and age-neutral. We conduct our experiments on a large number of knowledge triples that belong to 21 different geographies extracted from the open-sourced knowledge graph, Wikidata. Our study shows that the variance in the biased outcomes across geographies neatly mirrors the socio-economic and cultural division of the world, resulting in a transparent partition of the Global North from the Global South.

Unpacking the Dilemma: The Dual Impact of AI Instructors' Social Presence on Learners' Perceived Learning and Satisfaction, Mediated by the Uncanny Valley

Xinhui Chen
Dominic DiFranzo

In online education, a high student-to-teacher ratio makes it hard for teachers to provide every student with personalized and just-in-time instruction synchronously. Recent advances in conversational AI increase the possibility of incorporating AI instructors to address this challenge. However, little is known about how to create an effective AI instructor. Previous work highlights the importance of social presence from human teachers in enhancing students’ perceived learning and satisfaction. However, it is rare that social presence is designed for AI agents. This paper conducts an experimental study with 279 participants where they took a digital literacy lesson on an e-learning platform with AI instructors. We empirically studies the effect of AI instructors’ social presence has on students’ learning. Results suggested that AI instructors’ social presence positively impacts learners’ perceived learning and satisfaction. However, mediating by humanness, increasing AI instructors’ social presence may induce potential risks (e.g. a feeling of eeriness), which may undermine these benefits. Based on the results, we discuss a number of suggestions for designing AI instructors in online education.

VIKI: Systematic Cross-Platform Profile Inference of Tech Users

Ben Treves
Emiliano De Cristofaro
Yue Dong
Michalis Faloutsos

What can we learn about online users by comparing their profiles across different platforms? We use the term profile to represent displayed personality traits, interests, and behavioral patterns (e.g. offensiveness). We also use the term displayed personas to refer to the personas that users manifest on a platform. Though individuals have a single real persona, it is not difficult to imagine that people can behave differently in different "contexts" as it happens in real life (e.g. behavior in office, bar, football game). The vast majority of previous studies have focused on profiling users on a single platform. Here, we propose VIKI, a systematic methodology for extracting and integrating the displayed personas of users across different social platforms. First, we extract multiple types of information, including displayed personality traits, interests, and offensiveness. Second, we evaluate, combine, and introduce methods to summarize and visualize cross-platform profiles. Finally, we evaluate VIKI on a dataset that spans three platforms – GitHub, LinkedIn, and X. Our experiments show that displayed personas change significantly across platforms, with over 78% of users exhibiting a significant change. For instance, we find that neuroticism exhibits the largest absolute change. We also identify significant correlations between offensive behavior and displayed personality traits. Overall, we consider VIKI as an essential building block for systematic and nuanced profiling of users across platforms.

SESSION: Session 2: Content Analysis & User Narratives

Characterizing YouTube's Role in Online Gambling Promotion: A Case Study of Fortune Tiger in Brazil

Jessica Costa
Geovana Oliveira
Guilherme Fonseca
Davi Reis
Giancarlo Oliveira Teixeira
Washington Cunha
Leonardo Rocha
Carlos H. G. Ferreira

The rise of online gambling poses significant social and economic challenges. In Brazil, one example is Fortune Tiger, an online gambling game whose growing popularity is driven by influencer-driven content on social media platforms. While existing research has examined the promotion of gambling on various platforms—focusing on aspects such as content normalization and promotional strategies—YouTube’s role in this creator-driven ecosystem remains underexplored. To address this gap, this paper examines the promotion of Fortune Tiger on YouTube through large-scale and temporal analyses. We approach this problem as a stance detection task in natural language processing, developing a consensus-based classifier based on state-of-the-art models to quantify user approval or disapproval of gambling practices and understand promotion characteristics. Our findings reveal distinct engagement patterns, with favorable comments being more frequent, often repetitive, and suggesting possible automation. They come from both small and highly influential accounts, with active users often using recently created profiles—reinforcing the role of content creators in promoting gambling-related content. This study sheds light on the societal impact of gambling promotion on social media and provides a robust methodology for analyzing similar phenomena on such platforms.

Exploring Stance on Affirmative Action Through Reddit Narratives

Aria Pessianzadeh
Rezvaneh Rezapour

Affirmative Action (AA), is a controversial topic that aims to address historical inequalities in education and employment by considering race, gender, and ethnicity during the selection process. While some view AA as a necessary tool for promoting diversity and correcting systemic discrimination, others criticize it as reverse discrimination, arguing it unfairly favors certain groups. These conflicting views have led to heated debates, legal battles, and polarized public discourse. This study explores narratives of AA on social media, focusing on how people express their positions on this issue using stance analysis. After collecting 3,839 posts from 50 subreddits, we developed fine-grained stance categories to capture the nuances of this controversial discourse on Reddit and used LLM-based classifiers to identify stances in our data. Our results suggest that the majority of users on Reddit oppose AA in its current format, while many express skepticism or raise questions about it. Additionally, our topic modeling results highlight a broad range of themes related to societal, cultural, legal, and political aspects of AA. Finally, moral analysis indicates the prevalence of Fairness and Authority in AA narratives. Our work contributes to a better understanding of public attitudes toward AA and provides insights into people’s perspectives on social media. We also contribute to stance analysis methodologies, highlighting the complexities involved in detecting diverse opinions on highly charged topics.

Warning: This paper includes language and content that may be offensive or triggering.

How Personal Narratives Empower Politically Disinclined Individuals to Engage in Political Discussions

Tejasvi Chebrolu
Ponnurangam Kumaraguru
Ashwin Rajadesingan

Engaging in political discussions is crucial in democratic societies, yet many individuals remain politically disinclined due to various factors such as perceived knowledge gaps, conflict avoidance, or a sense of disconnection from the political system. In this paper, we explore the potential of personal narratives—short, first-person accounts emphasizing personal experiences—as a means to empower these individuals to participate in online political discussions. Using a text classifier that identifies personal narratives, we conducted a large-scale computational analysis to evaluate the relationship between the use of personal narratives and participation in political discussions on Reddit. We find that politically disinclined individuals (PDIs) are more likely to use personal narratives than more politically active users. Personal narratives are more likely to attract and retain politically disinclined individuals in political discussions than other comments. Importantly, personal narratives posted by politically disinclined individuals are received more positively than their other comments in political communities. These results emphasize the value of personal narratives in promoting inclusive political discourse.

Scientific Accountability: Detecting Salient Features of Retracted Articles

Muhammad Usman
Wolf-Tilo Balke

The dissemination of misinformation can have detrimental effects on society by promoting false narratives. Similarly, retracted articles can result in the propagation of inaccuracies in scientific information. Despite the formal retraction of these articles, they continue to be cited in scientific literature and online, thereby distorting public understanding and undermining trust in science. Currently, there is no reliable method for identifying articles that require further re-evaluation or even retraction. Moreover, the existing methods for assessing the impact of retracted articles on subsequent research rely primarily on explicit dependencies stated in citation text. However, these dependencies are often implicitly defined, complicating the assessment of their true impact on subsequent research. Given these limitations, our objective is to address two challenges: (i) to identify salient characteristics of retracted articles that can distinguish them from non-retracted articles, based on textual analysis of sections such as the Abstract, Methodology, Results, and Conclusion, as well as their measurable readability and certainty characteristics, which reflect the degree of confidence with which authors report their findings; and (ii) to identify implicit dependencies of retracted articles on subsequent studies by examining experimental protocols used in both citing and cited (retracted) articles. This study aims to provide a proactive approach for flagging potential retractions and their impact on subsequent studies, thereby enhancing scientific integrity.

SESSION: Session 3: Networked Economics

Decentralization Funded by Centralization: Understanding Crypto Joint-Investment Network

Junyu Zhang
Chenyu Zhou
Wei Cai

In the crypto industry, venture capital investors provide funding to drive the domain towards its decentralized vision, while many individual investors follow online investment news to aid their decision-making. By scraping investment event data from crypto data analytics websites, we construct and analyze the joint investment network of venture capital investors in the crypto industry. Our study reveals the centralized nature of the investor network in this supposedly decentralized domain, by identifying central nodes such as Coinbase Venture and disclosing their persistence of dominance, despite Coinbase’s relatively small market share as an exchange. Based on node features, we divide the network into different communities, hinting at investor clustering. To measure the robustness of this network, we simulate various attack strategies to model bankruptcy and risk propagation, confirming its vulnerability as seen in historical events. Additionally, using graph neural network approaches, we fill in unknown structural information of investors, mitigating information asymmetry in investment disclosures and achieving classification accuracy above 70% for tier ratings and over 65% for investor types. Results are validated using data from another platform, exhibiting consistency. This study sheds light on the dynamics of joint investment networks in this emerging tech domain, offering insights for both potential and existing investors.

Political Leanings in Web3 Betting: Decoding the Interplay of Political and Profitable Motives

Hongzhou Chen
Xiaolin Duan
Abdulmotaleb El Saddik
Wei Cai

Harnessing blockchain’s transparent user behavior data, we construct the Political Betting Leaning Score (PBLS) to measure political leanings in Web3 prediction markets. Focusing on Polymarket, we analyze behaviors across more than 15,000 addresses, 4,500 events, and 8,500 markets, capturing the intensity and direction of their political leanings by the PBLS. We validate PBLS through internal consistency checks and external comparisons and explore its relationships with over 800 behavioral features. A 2022 U.S. Senate election case demonstrates our measurement’s effectiveness while decoding the interaction between political and profitable motives. The insights contribute to understanding decentralized market behavior, demonstrate blockchain’s potential in innovative studies, and advance the broader development of prediction markets.

Trust Dynamics and Bot-Driven Responses: An Approach to Rug Pulls in Solana Meme Coin Markets

Yueyao Li
Nanjun Yao
Yuhui Huo
Wei Cai

This paper examines the dynamics of trust and bot-driven responses within the meme coin ecosystem on the Solana blockchain, with a particular emphasis on the interplay between social media-induced sentiment and on-chain transaction behaviors. Meme coins, which originate from internet culture and are heavily influenced by community sentiment, represent a volatile and distinct category of cryptocurrencies. Employing sentiment propagation networks, on-chain transaction data, and sentiment-transaction integrated models, we quantitatively analyze the relationship between emotional fluctuations and market behaviors. By contrasting Rug Pull scams with sustainable projects, we identify critical differences in the role of sentiment across different phases of project development. Our findings reveal three distinct sentiment-driven user trading behaviors: sentiment followers, makers, and stabilizers. The results indicate that, while sentiment is a primary driver of early-stage trading within Rug Pull projects, its influence diminishes as community distrust intensifies, resulting in more opportunistic and reactive trading patterns. This study contributes to the understanding of the co-evolution of memes, sentiment, and market dynamics, offering new insights into the complexities of decentralized finance ecosystems, with a specific focus on Solana-based meme coin markets.

Understanding Social Capital in Decentralized Social Platform: An Empirical Study on Friend.tech

Junyu Zhang
Huiyu Wang
Wei Cai

This study presents an empirical investigation into social capital formation in decentralized social platforms, focusing on Friend.tech. Following an empirical research framework, we first identify and characterize "3-3 trading behavior," a distinctive reciprocal trading mechanism that has not been observed in other decentralized platforms. This emergent phenomenon arises from the interaction between platform incentives—which structure key-based social and financial interactions—and user-driven strategies that leverage these mechanisms for reciprocal engagement. Building on these observations, we propose a Social Capital Index (SCI) model to quantify user value by integrating social and economic dimensions. Finally, we validate the SCI model using on-chain transaction data and airdrop allocations, demonstrating that it effectively approximates the platform’s internal user valuation despite the opaque nature of its reward mechanisms. Our findings provide empirical insights into the interplay between financial incentives and social capital in decentralized ecosystems. This study contributes to the understanding of incentive design, value attribution, and user engagement dynamics in blockchain-based social networks.

SESSION: Session 4: Media Credibility & Bias

Accuracy and Political Bias of News Source Credibility Ratings by Large Language Models

Kai-Cheng Yang
Filippo Menczer

Search engines increasingly leverage large language models (LLMs) to generate direct answers, and AI chatbots now access the Internet for fresh data. As information curators for billions of users, LLMs must assess the accuracy and reliability of different sources. This paper audits nine widely used LLMs from three leading providers—OpenAI, Google, and Meta—to evaluate their ability to discern credible and high-quality information sources from low-credibility ones. We find that while LLMs can rate most tested news outlets, larger models more frequently refuse to provide ratings due to insufficient information, whereas smaller models are more prone to making errors in their ratings. For sources where ratings are provided, LLMs exhibit a high level of agreement among themselves (average Spearman’s ρ = 0.79), but their ratings align only moderately with human expert evaluations (average ρ = 0.50). Analyzing news sources with different political leanings in the US, we observe a liberal bias in credibility ratings yielded by all LLMs in default configurations. Additionally, assigning partisan roles to LLMs consistently induces strong politically congruent bias in their ratings. These findings have important implications for the use of LLMs in curating news and political information.

DocNet: Semantic Structure in Inductive Bias Detection Models

Jessica Zhu
Michel Cukier
Iain Cruickshank

News will be biased so long as people have opinions. As social media becomes the primary entry point for news and partisan differences increase, it is increasingly important for informed citizens to be able to recognize bias. If people are aware of the biases of the news they consume, they could take action to avoid polarizing echo chambers. In this paper, we explore an often-overlooked aspect of bias detection in media: the semantic structure of news articles. We present DocNet, a novel, inductive, and low-resource document embedding and political bias detection model. We also demonstrate that the semantic structure of news articles from opposing political sides, as represented in document-level graph embeddings, have significant similarities. DocNet bypasses the need for pre-trained language models, reducing resource dependency while achieving comparable performance. It can be used to advance political bias detection in low-resource environments. Our code and data are made available at: https://github.com/jhzsquared/DocNet__CodeData

SemCAFE: When Named Entities make the Difference–Assessing Web Source Reliability through Entity-level Analytics

Gautam Kishore Shahi
Oshani Seneviratne
Marc Spaniol

With the shift from traditional to digital media, the online landscape now hosts not only reliable news articles but also a significant amount of unreliable content. Digital media has faster reachability by significantly influencing public opinion and advancing political agendas. While newspaper readers may be familiar with their preferred outlets’ political leanings or credibility, determining unreliable news articles is much more challenging. The credibility of many online sources is often opaque, with AI-generated content being easily disseminated at minimal cost. Unreliable news articles, particularly those that followed the Russian invasion of Ukraine in 2022, closely mimic the topics and writing styles of credible sources, making them difficult to distinguish. To address this, we introduce SemCAFE (Semantically enriched Content Assessment for Fake news Exposure), a system designed to detect news reliability by incorporating entity-relatedness into its assessment. SemCAFE employs standard Natural Language Processing (NLP) techniques, such as boilerplate removal and tokenization, alongside entity-level semantic analysis using the YAGO knowledge base. By creating a “semantic fingerprint” for each news article, SemCAFE could assess the credibility of 46,020 reliable and 3,407 unreliable articles on the 2022 Russian invasion of Ukraine. Our approach improved the macro F1 score by 12% over state-of-the-art methods. The sample data and code are available on GitHub¹.

Unite or divide? Biased search queries and Google Search results in polarized politics

Chau Tong

Research on search engine algorithmic curation has primarily focused on source diversity in response to general search terms, often overlooking the role of biased user input and its interaction with algorithmic filtering. Given that online users frequently engage in motivated information-seeking, it is crucial to examine how query variation influences search results and potentially reinforces information inequalities. This study employs a user-centered auditing approach through three real-world instances, where self-identified political partisans in the United States conducted Google searches using ideologically slanted queries related to the 2020 U.S. presidential election, abortion, and climate change. Analyzing search results across three topics and timepoints, the study finds that query slant—rather than user ideology—was the primary driver of search result differences. While personalization was observed, search results also exhibited a mainstreaming effect, often surfacing authoritative sources that may counteract polarization. However, slanted queries still led to distinct sets of information, raising concerns about algorithmic amplification of biased search behaviors. These findings contribute to the literature on algorithmic curation and information diversity, highlighting the need for ongoing audits of search engines to assess their role in shaping online political discourse and mitigating information inequalities in an era of heightened polarization.

SESSION: Session 5: AI Systems & Human Behavior

Can LLMs Assist Annotators in Identifying Morality Frames? - Case Study on Vaccination Debate on Social Media

Tunazzina Islam
Dan Goldwasser

Nowadays, social media is pivotal in shaping public discourse, especially on polarizing issues like vaccination, where diverse moral perspectives influence individual opinions. In NLP, data scarcity and complexity of psycholinguistic tasks, such as identifying morality frames, make relying solely on human annotators costly, time-consuming, and prone to inconsistency due to cognitive load. To address these issues, we leverage large language models (LLMs), which are adept at adapting new tasks through few-shot learning, utilizing a handful of in-context examples coupled with explanations that connect examples to task principles. Our research explores LLMs’ potential to assist human annotators in identifying morality frames within vaccination debates on social media. We employ a two-step process: generating concepts and explanations with LLMs, followed by human evaluation using a "think-aloud" tool. Our study shows that integrating LLMs into the annotation process enhances accuracy, reduces task difficulty, lowers cognitive load, suggesting a promising avenue for human-AI collaboration in complex psycholinguistic tasks.

Evaluation of Reliability Criteria for News Publishers with Large Language Models

Manuel Pratelli
John Bianchi
Fabio Pinelli
Marinella Petrocchi

In this study, we investigate the use of a large language model to assist in the evaluation of the reliability of the vast number of existing online news publishers, addressing the impracticality of relying solely on human expert annotators for this task. In the context of the Italian news media market, we first task the model with evaluating expert-designed reliability criteria using a representative sample of news articles. We then compare the model’s answers with those of human experts. The dataset consists of 352 news articles annotated by three human experts and the LLM. Examining 6,081 annotations over six criteria, we observe good agreement between LLM and human annotators in three evaluated criteria, including the critical ability to detect instances where a text negatively targets an entity or individual. For two additional criteria, such as the detection of sensational language and the recognition of bias in news content, LLMs generate fair annotations, albeit with certain trade-offs. Furthermore, we show that the LLM is able to help resolve disagreements among human experts, especially in tasks such as identifying cases of negative targeting.

Master of Deceit: Comparative Analysis of Human and Machine-Generated Deceptive Text

Quang Minh Trinh
Samiha Zarin
Rezvaneh Rezapour

Deception, the intentional act of creating false impressions, has long been studied in human interactions. With the emergence of AI and large language models (LLMs), deception now extends to machine-generated content, raising concerns about distinguishing between human and AI-created content. In this study, we compare deceptive and truthful texts produced by humans and LLMs (GPT-3.5 and GPT-4o) using two datasets; a crowdsourced online Review Dataset and a transcribed interview dataset (MU3D). We replicate the data generation process with LLMs, introducing personas into prompts to examine linguistic differences and potential biases. Using LIWC, we analyze word choice, complexity, and cognitive patterns across human- and LLM-generated deception and truthful texts. Our findings show that LLM-generated deception differs significantly from human deception, exhibiting greater verbosity, formality, and lexical sophistication, while human deception is more socially driven, relying more on social references, interpersonal cues, and natural conversational patterns. Despite improvements in LLMs, context-dependent biases remain embedded in LLM-generated texts, emphasizing the need for stronger bias mitigation strategies and responsible AI deployment. Our study identifies key linguistic markers that differentiate LLM-generated from human deception and highlights the importance of assessing hidden biases and potential risks in AI-generated deceptive text and misinformation.

SESSION: Session 6: Digital Communication and Mobilization

A Discourse Analysis Framework for Legislative and Social Media Debates

Arman Irani
Ju Yeon Park
Kevin Esterling
Michalis Faloutsos

How can we capture the dynamics of deliberation in a debate? In an increasingly divided and misinformed world, understanding the relationship between who is arguing and what they are arguing about is becoming critical for fostering a meaningful exchange of ideas. Given the vast array of available platforms for people to express their viewpoints and deliberate on issues, how can we develop methods to accurately analyze these processes? Luckily, there is an abundance of debate data available, ranging from: (a) formal proceedings, such as committee hearings in legislatures, to (b) online discussion forums, such as Reddit. Here we introduce DALiSM, a data-driven argument-centric framework, to analyze discourse dynamics in diverse and multi-party spaces at scale. We develop methods to harness and extend the state-of-the-art in computational argumentation for: (a) identifying arguments from long-form raw texts, (b) calculating the intensity of deliberation, and (c) modeling the evolution of discourse over time. We deploy our framework as a comprehensive and interactive dashboard for dynamically viewing the outputs of DALiSM to clearly understand the nature of a discourse event. To showcase the importance and utility of DALiSM, we apply our framework to U.S. congressional committee hearings from 2005 to 2023 (109th to 117th Congresses), and to selected Reddit communities from 2008 to 2023. This case study reveals substantive insights into deliberative behavior in online and offline spaces.

The Effects of Enterprise Social Media on Communication Networks

Manoel Horta Ribeiro
Teny Shapiro
Siddharth Suri

Enterprise social media platforms (ESMPs) are web-based platforms with standard social media functionality, e.g., communicating with others, posting links and files, liking content, etc., yet all users are part of the same company. The first contribution of this work is the use of a difference-in-differences analysis of 99 companies to measure the causal impact of ESMPs on companies’ communication networks across the full spectrum of communication technologies used within companies: email, instant messaging, and ESMPs. Adoption caused companies’ communication networks to grow denser and more well-connected by adding new, novel ties that often, but not exclusively, involve communication from one to many employees. Importantly, some new ties also bridge otherwise separate parts of the corporate communication network. The second contribution of this work, utilizing data on Microsoft’s own communication network, is understanding how these communication technologies connect people across the corporate hierarchy. Compared to email and instant messaging, ESMPs excel at connecting nodes distant in the corporate hierarchy both vertically (between leaders and employees) and horizontally (between employees in similar roles but different sectors). Also, influence in ESMPs is more ‘democratic’ than elsewhere, with high-influence nodes well-distributed across the corporate hierarchy. Overall, our results suggest that ESMPs boost information flow within companies and increase employees’ attention to what is happening outside their immediate working group above and beyond email and instant messaging.

The Role of Organizations in Networked Mobilization: Examining the 2011 Chilean Student Movement Through The Logic of Connective Action

Diego Gomez-Zara
Carolina Pérez Arredondo
Denis Parra

This study examines the communication mechanisms that shape the formation of digitally-enabled mobilization networks. Informed by the logic of connective action, we postulate that the emergence of networks enabled by organizations and individuals is differentiated by network and framing mechanisms. From a case comparison within two mobilization networks—one crowd-enabled and one organizationally-enabled—of the 2011 Chilean student movement, we analyze their network structures and users’ communication roles. We found that organizationally-enabled networks are likely to form from hierarchical cascades and crowd-enabled networks are likely to form from triadic closure mechanisms. Moreover, we found that organizations are essential for both kinds of networks: compared to individuals, organizations spread more messages among unconnected users, and organizations’ messages are more likely to be spread. We discuss our findings in light of the network mechanisms and participation of organizations and influential users.

SESSION: Session 7: Online Safety & Policy

Facilitating Gender Diverse Authorship: A Comparative Analysis of Academic Publisher Name Change Policies

Deanna Zarrillo

Post-publication name change policies are vital for safeguarding privacy and equity for authors navigating identity changes, including gender transitions, within academic publishing. Before the introduction of these policies in 2019, trans, non-binary, and gender diverse authors faced significant barriers, often risking privacy violations and disruptions to their academic records. This study employs thematic content analysis to assess the publicly available name change policies of nine academic journal publishers, examining their structure, discoverability, and alignment with inclusivity principles. Key findings reveal a lack of standardization across policies, with notable variation in content and accessibility. While privacy and correction mechanisms are commonly addressed, critical themes such as author engagement and broader industry context remain underdeveloped. The policies’ discoverability on publisher websites also varies widely, potentially limiting their utility to those who need them most. These gaps highlight covert marginalization embedded in policy design and communication. By situating this analysis within an ethic of care and the broader context of digital identity management, this study reveals how publishing policies intersect with web-based systems of scholarly communication. The findings urge academic publishers, technologists, and policymakers to co-create inclusive solutions that align with emerging metadata standards and ethical frameworks. This research lays a foundation for understanding how academic infrastructure can evolve to better serve diverse author communities in a connected and equitable web ecosystem.

Multilingualism, Transnationality, and K-pop in the Online #StopAsianHate Movement

Tessa Masis
Zhangqi Duan
Weiai Wayne Xu
Jane Yeahin Pyo
Ethan Zuckerman
Brendan O'Connor

The #StopAsianHate (SAH) movement is a broad social movement against violence targeting Asians and Asian Americans, beginning in 2021 in response to racial discrimination related to COVID-19 and sparking worldwide conversation about anti-Asian hate. However, research on the online SAH movement has focused on English-speaking participants so the spread of the movement outside of the United States is largely unknown. In addition, there have been no long-term studies of SAH so the extent to which it has been successfully sustained over time is not well understood. We present an analysis of 6.5 million "#StopAsianHate" tweets from 2.2 million users all over the globe and spanning 60 different languages, constituting the first study of the non-English and transnational component of the online SAH movement. Using a combination of topic modeling, user modeling, and hand annotation, we identify and characterize the dominant discussions and users participating in the movement and draw comparisons of English versus non-English topics and users. We discover clear differences in events driving topics, where spikes in English tweets are driven by violent crimes in the US but spikes in non-English tweets are driven by transnational incidents of anti-Asian sentiment towards symbolic representatives of Asian nations. We also find that global K-pop fans were quick to adopt the SAH movement and, in fact, sustained it for longer than any other user group. Our work contributes to understanding the transnationality and evolution of the SAH movement, and more generally to exploring upward scale shift and public attention in large-scale multilingual online activism.

Not Here, Go There: Analyzing Redirection Patterns on the Web

Kritika Garg
Sawood Alam
Dietrich Ayala
Michele C. Weigle
Michael L. Nelson

URI redirections are integral to web management, supporting structural changes, SEO optimization, and security. However, their complexities affect usability, SEO performance, and digital preservation. This study analyzed 11 million unique redirecting URIs, following redirections up to 10 hops per URI, to uncover patterns and implications of redirection practices. Our findings revealed that 50% of the URIs terminated successfully, while 50% resulted in errors, including 0.06% exceeding 10 hops. Canonical redirects, such as HTTP to HTTPS transitions, were prevalent, reflecting adherence to SEO best practices. Non-canonical redirects, often involving domain or path changes, highlighted significant web migrations, rebranding, and security risks. Notable patterns included “sink” URIs, where multiple redirects converged, ranging from traffic consolidation by global websites to deliberate “Rickrolling.” The study also identified 62,000 custom 404 URIs, almost half being soft 404s, which could compromise SEO and user experience. These findings underscore the critical role of URI redirects in shaping the web while exposing challenges such as outdated URIs, server instability, and improper error handling. This research offers a detailed analysis of URI redirection practices, providing insights into their prevalence, types, and outcomes. By examining a large dataset, we highlight inefficiencies in redirection chains and examine patterns such as the use of “sink” URIs and custom error pages. This information can help webmasters, researchers, and digital archivists improve web usability, optimize resource allocation, and safeguard valuable online content.

The Evolving Landscape of Youth Online Safety: Insights from News Media Analysis

Mohammad (Matt) Namvarpour
Elham Aghakhani
Michael D Ekstrand
Rezvaneh Rezapour
Afsaneh Razi

There have been various efforts to understand how youth online safety has been reflected in the news, as news play an important role in shaping public opinions. However, these efforts focused on specific contexts, such as individual countries and specific online risk types. Therefore, there is a need for a holistic view of understanding trends in the news regarding stakeholders involved and various ranges of online risks. In this work, we seek to understand how discussions of online safety for youth has evolved in news publications over the last two decades. We applied quantitative media content analysis and sentiment analysis to 3.9K English-language news articles from 2002–2024, documenting shifts in the portrayal of key stakeholders. Our results showed increased media focus on technology companies and government in youth safety discussions, particularly highlighting cyberbullying as a key risk. We found a generally negative trend in the sentiment toward the perceived safety of youth online, which fluctuates based on societal concerns and policy changes. The significance of this work lies in its analysis of how media discourse has illuminated public perceptions and policy directions concerning youth safety in digital spaces. Content Warning: This paper discusses sensitive topics, such as sex and child harassment, which may be triggering.

SESSION: Session 8: Political Discourse & Platform Moderation

Analyzing Political Discourse on Discord during the 2024 U.S. Presidential Election

Arthur Buzelin
Pedro Robles Dutenhefner
Marcelo Sartori Locatelli
Samira Malaquias
Pedro Bento
Yan Aquino
Lucas Dayrell
Victoria Estanislau
Caio Santana
Pedro Alzamora
Marisa Vasconcelos
Wagner Jr Meira
Virgilio Almeida

Social media networks have amplified the reach of social and political movements, but most research focuses on mainstream platforms such as X, Reddit, and Facebook, overlooking Discord. As a rapidly growing, community-driven platform with optional decentralized moderation, Discord offers unique opportunities to study political discourse. This study analyzes over 30 million messages from political servers on Discord discussing the 2024 U.S. elections. Servers were classified as Republican-aligned, Democratic-aligned, or unaligned based on their descriptions. We tracked changes in political conversation during key campaign events and identified distinct political valence and implicit biases in semantic association through embedding analysis. We observed that Republican servers emphasized economic policies and Democratic servers focusing on equality-related and progressive causes. Furthermore, we detected an increase in toxic language, such as sexism, in Republican-aligned servers after Kamala Harris’s nomination. These findings provide a first look at political behavior on Discord, highlighting its growing role in shaping and understanding online political engagement.

Digital Sabotage and Inter-Community Dynamics: An Empirical Examination of Reddit's 2023 Moderator Strike

Franz Xaver Waltenberger
Angelina Voggenreiter
Juergen Pfeffer

In June 2023, one of the largest recent instances of digital collective action took place on Reddit, with moderators of reportedly more than 8000 subreddits shutting down access to their communities as a reaction to the monetization of Reddit’s API. Another similar strike had previously occurred on the platform in 2015 and yielded considerable results for its moderators. This time however, Reddit’s management refused to meet the moderators’ demands and threatened to remove them from their positions if they did not reopen their communities. This study investigates the strike’s impact on user activity and subreddit dynamics by analyzing 144,686,420 submissions and 2,692,393,593 comments made on Reddit between January 1st to November 30th, 2023. Our findings indicate that while the strike had a short-term impact on overall platform activity, it did not result in long-term negative effects for Reddit as a whole. Instead, the strike had a strong negative effect on participating subreddits, leading to a sustained decrease in user engagement as well as a permanent, long-term shift of activity of previously active users towards subreddits that had not partaken in the strike. These results raise critical questions about the overall effectiveness of this type of protest.

Framing the Fray: Conflict Framing in Indian Election News Coverage

Tejasvi Chebrolu
Rohan Modepalle
N Harsha Vardhan
Ponnurangam Kumaraguru
Ashwin Rajadesingan

In covering elections, journalists often use conflict frames which depict events and issues as adversarial, often highlighting confrontations between opposing parties. Although conflict frames result in more citizen engagement, they may distract from substantive policy discussion. In this work, we analyze the use of conflict frames in online English-language news articles by seven major news outlets in the 2014 and 2019 Indian general elections. We find that the use of conflict frames is not linked to the news outlets’ ideological biases but is associated with TV-based (rather than print-based) media. Further, the majority of news outlets do not exhibit ideological biases in portraying parties as aggressors or targets in articles with conflict frames. Finally, comparing news articles reporting on political speeches to their original speech transcripts, we find that, on average, news outlets tend to consistently report on attacks on the opposition party in the speeches but under-report on more substantive electoral issues covered in the speeches such as farmers’ issues and infrastructure.

SESSION: Session 9: Contemporary Issues in Social Media

A Call to Arms: Automated Methods for Identifying Weapons in Social Media Analysis of Conflict Zones

Afia Abedin
Abdul Bais
Cody Buntain
Laura Courchesne
Brian McQuinn
Matthew E. Taylor
Muhib Ullah

The massive proliferation of social media data represents a transformative opportunity for conflict studies and for tracking the proliferation and use of weaponry, as conflicts are increasingly documented in these online spaces. At the same time, the scale and types of data available are problematic for traditional open-source intelligence. This paper focuses on identifying specific weapon systems and the insignias of the armed groups using them as documented in the Ukraine war, as these tasks are critical to operational intelligence and tracking weapon proliferation, especially given the scale of international military aid given to Ukraine. The large scale of social media makes manual assessment difficult, however, so this paper presents early work that uses computer vision models to support this task. We demonstrate that these models can both identify weapons embedded in images shared in social media and how the resulting collection of military-relevant images and their post times interact with the offline, real-world conflict. Not only can we then track changes in the prevalence of images of tanks, land mines, military trucks, etc., we find correlations among time series data associated with these images and the daily fatalities in this conflict. This work shows substantial opportunity for examining similar online documentation of conflict contexts, and we also point to future avenues where computer vision can be further improved for these open-source intelligence tasks.

Beyond Sentiment: Examining the Role of Moral Foundations in User Engagement with News on Twitter

Jacopo D'Ignazi
Kyriaki Kalimeri
Mariano Gastón Beiró

This study uses sentiment analysis and the Moral Foundations Theory (MFT) to characterise news content in social media and examine its association with user engagement. We employ Natural Language Processing to quantify the moral and affective linguistic markers. At the same time, we automatically define thematic macro areas of news from major U.S. news outlets and their Twitter followers (Jan 2020 - Mar 2021). By applying Non-Negative Matrix Factorisation to the obtained linguistic features we extract clusters of similar moral and affective profiles, and we identify the emotional and moral characteristics that mostly explain user engagement via regression modelling. We observe that Surprise, Trust, and Harm are crucial elements explaining user engagement and discussion length and that Twitter content from news media outlets has more explanatory power than their linked articles. We contribute with actionable findings evidencing the potential impact of employing specific moral and affective nuances in public and journalistic discourse in today’s communication landscape. In particular, our results emphasise the need to balance engagement strategies with potential priming risks in our evolving media landscape.

Mapping News Narratives Using LLMs and Narrative-Structured Text Embeddings

Jan Elfes

Given the profound impact of narratives across various societal levels, from personal identities to international politics, it is crucial to understand their distribution and development over time. This is particularly important in online spaces. On the Web, narratives can spread rapidly and intensify societal divides and conflicts. While many qualitative approaches exist, quantifying narratives remains a significant challenge. Computational narrative analysis lacks frameworks that are both comprehensive and generalizable. To address this gap, we introduce a numerical narrative representation grounded in structuralist linguistic theory. Chiefly, Greimas’ Actantial Model represents a narrative through a constellation of six functional character roles. These so-called actants are genre-agnostic, making the model highly generalizable. We extract the actants using an open-source LLM and integrate them into a Narrative-Structured Text Embedding that captures both the semantics and narrative structure of a text. We demonstrate the analytical insights of the method on the example of 5000 full-text news articles from Al Jazeera and The Washington Post on the Israel-Palestine conflict. Our method successfully distinguishes articles that cover the same topics but differ in narrative structure.

Understanding Narratives of Trauma on Social Media

Mansi Saxena
Vaibhav Garg
Bhaskar Ray
Aura Mishra
Munindar Singh

Background: Victims of domestic and sexual violence often share their narratives on social media. Doing so helps them access validation, solidarity, and support from external sources, which has been shown to enhance resilience and facilitate healing. Problem Statement: We address two aspects of such narratives of trauma: (1) identifying causal relationships between narrative elements and (2) analyzing the effect of such elements on social support received. Method: We retrieved 5561 such narratives from Reddit, a popular online platform. We applied Large Language Models to extract features from these narratives and analyzed them computationally. Findings: Our analysis reveals that prolonged abuse increases self-blame and reduces the intent to seek legal advice; the presence of support increases the likelihood of a victim adopting coping strategies; night-time abuse and intoxication are strongly associated with higher rates of violence; victims experiencing nightmares are more likely to provide detailed descriptions of their abusers; suffering economic and familial abuse increases the support received online. Our research thus corroborates leading psychological theories of narrative, social support, and resilience in online stories and contributes to understanding trauma narratives. In this way, our research can facilitate enhanced social support for victims.

SESSION: Session 10: Platform Governance & User Safety

Can Large Language Models Effectively Mitigate Polarization in Social Media Text?

Lucas Raniére Juvino Santos
Leandro Balby Marinho
Claudio Elizio Calazans Campelo
Filippo Menczer
Alessandro Flammini

Social media are linked to phenomena, such as echo chambers and the spread of misinformation, that contribute to heightened political polarization. Textual content has been identified as a key factor in fueling online polarization, yet effective strategies to mitigate this issue remain underexplored. This study investigates the potential of Large Language Models (LLMs) to reduce textual polarization on social media platforms. We leveraged a large-scale dataset of tweets collected during Brazil’s last presidential election, one of the most polarized in the country’s history. We used an LLM to paraphrase polarized text and make it less polarized. Using a between-subjects experimental design with N=73 participants, we compared human perceptions of paraphrases generated by the LLM and by humans. Both LLM- and human-generated paraphrases significantly reduced perceived polarization in tweets while preserving textual coherence. Furthermore, LLMs performed comparably to humans in depolarizing content. These findings underscore the potential of LLMs as effective and scalable tools for mitigating polarization on social media, contributing to healthier online discourse.

Decentralized Discourse: Interaction Dynamics on Mastodon

Michael Brauweiler
Meher Chaitanya Pindiprolu
Jürgen Pfeffer
Ulrik Brandes

Decentralized online social networks, such as Mastodon, present compelling alternatives to centralized platforms such as X (formerly Twitter). Since its inception in 2016, Mastodon has experienced consistent growth, with a notable surge as users transitioned away from X. Its open-source framework and decentralized design have facilitated swift expansion across the Fediverse.

In this study, we explore the communication dynamics between Mastodon instances and the distribution of user communities, emphasizing their effects on cross-server communication – a critical scalability issue as the Fediverse continues to expand. We developed and analyzed network models at both the instance and user levels, introducing novel metrics to evaluate how communities align with instance structures. Our findings highlight potential scalability issues stemming from imbalances in instance sizes and misaligned community distributions.

Is it safe? Analysis of Live Streams Targeted at Kids on Twitch.tv

Luana Assis Silva
Kênia Carolina Gonçalves
Humberto T. Marques-Neto
Jussara M. Almeida

A number of recent studies have analyzed the exposure of kids to improper content (e.g., violence, sexual themes) on social media platforms. However, such studies are restricted mostly to YouTube. Our present goal is to contribute to such discussion by focusing on another platform very popular among the young audiences, namely Twitch.tv, a live video streaming and social media platform that has massive presence in the gaming world. To that end, we designed a crawler that uses the Twitch´s official API to periodically probe the platform for active streams with content which, based on the associated tags, explicitly target kids as viewers. We monitored the platform for one month, gathering a first of its kind dataset of live Twitch streams targeted at kids. Our analyses of the data revealed a number of streams that, despite explicitly targeting kids, have content (often games) improper to them. Even more, we found that, while Twitch follows one age classification system to categorize game streams as adequate to kids or not, inconsistencies across different classification systems reveal limitations of the platform that potentially put children in different regions of the world at risk of being exposed to content deemed improper.

Too Little, Too Late: Moderation of Misinformation around the Russo-Ukrainian Conflict

Gautam Kishore Shahi
Yelena Mejova

In this study, we examine the role of Twitter as a first line of defense against misinformation by tracking the public engagement with, and the platform’s response to, 500 tweets concerning the Russo-Ukrainian conflict which were identified as misinformation. Using a real-time sample of 543475 of their retweets, we find that users who geolocate themselves in the U.S. both produce and consume the largest portion of misinformation, however accounts claiming to be in Ukraine are the second largest source. At the time of writing, 84% of these tweets were still available on the platform, especially those having an anti-Russia narrative. For those that did receive some sanctions, the retweeting rate has already stabilized, pointing to ineffectiveness of the measures to stem their spread. These findings point to the need for a change in the existing anti-misinformation system ecosystem. We propose several design and research guidelines for its possible improvement.

SESSION: Poster Session

Emails by LLMs: A Comparison of Language in AI-Generated and Human-Written Emails

Weijiang Li
Yinmeng Lai
Sandeep Soni
Koustuv Saha

The growing excitement around generative AI (and LLMs) is fueling a heightened interest in the development of AI-assisted writing tools. One popular context is AI-assisted email writing, and this paper explores how AI-generated emails compare to human-written emails. We obtained human-written emails from the W3C corpus and generated analogous AI-generated emails using GPT-3.5, GPT-4, Llama-2, and Mistral-7B, and compared AI-generated and human-written emails using a suite of natural language analyses across syntactic, semantic, and psycholinguistic dimensions. AI-generated emails are generally consistent across different LLMs but differ significantly from human-written emails. Specifically, AI-generated emails tend to be more formal, verbose, and complex, whereas human-written emails are often more concise and personalized. While AI-generated emails are slightly more polite, both types exhibit a similar level of empathetic tone in language. Further, we qualitatively examined user perceptions of AI and human-written emails by conducting a small survey of 41 participants and interviewing a subset of them. This study highlights preliminary insights into generative AI’s distinct strengths and weaknesses in assisting email communication, and we discuss the theoretical and practical implications of the evolving landscape of AI-generated content.

Simulating the Effects of Multi-platform Use on Misinformation Diffusion and Countermeasures

Isabel Murdock
Kathleen M. Carley
Osman Yagan

The social media ecosystem is highly interconnected. Consequently, efforts to limit the spread of misinformation by individual platforms can have second-order effects on other platforms. This is further complicated by the diversity of moderation mechanisms provided on each platform and the level of multi-platform usage between the platforms. We explore these issues by performing experiments using a previously developed agent-based model of information diffusion between two heterogeneous platforms. By varying the share of users active on both platforms and the types of moderation employed by each platform, we study how multi-platform usage may impact the effectiveness of interventions taken to limit the spread of misinformation in a multi-platform context. We find that interventions taken on platforms with faster rates of content diffusion have a greater impact on the overall diffusion of misinformation, with content removal policies and detection algorithms being more effective than user banning in a multi-platform context.

Beyond the Lens: Quantifying the Impact of Scientific Documentaries through Amazon Reviews

Jill Naiman
Aria Pessianzadeh
Hanyu Zhao
AJ Christensen
Kalina Borkiewicz
Shriya Srikanth
Anushka Gami
Emma Maxwell
Louisa Zhang
Sri Nithya Yeragorla
Rezvaneh Rezapour

Engaging the public with science is critical for a well-informed population. A popular method of scientific communication is documentaries. Once released, it can be difficult to assess the impact of such works on a large scale, due to the overhead required for in-depth audience feedback studies. In what follows, we overview our complementary approach to qualitative studies through quantitative impact and sentiment analysis of Amazon reviews for several scientific documentaries. In addition to developing a novel impact category taxonomy for this analysis, we release a dataset containing 1296 human-annotated sentences from 1043 Amazon reviews for six movies created in whole or part by the Advanced Visualization Lab (AVL). This interdisciplinary team is housed at the National Center for Supercomputing Applications and consists of visualization designers who focus on cinematic presentations of scientific data. Using this data, we train and evaluate several machine learning and large language models, discussing their effectiveness and possible generalizability for documentaries beyond those focused on for this work. Themes are also extracted from our annotated dataset which, along with our large language model analysis, demonstrate a measure of the ability of scientific documentaries to engage with the public.

The Role of Curiosity in Information Dissemination on Telegram: A Case Study from the 2022 Brazilian Elections

Francisco Figueiredo Vasconcelos
Carlos Henrique Gomes Ferreira
Alexandre Magno de Sousa
Jussara Almeida

Social curiosity plays a pivotal role in driving information dissemination on social media platforms, influencing both individual behaviors and group dynamics. While its impact has been explored in platforms like WhatsApp, the mechanisms behind social curiosity remain underexamined, particularly in Telegram, a platform known for its unique user interactions and pivotal role in political mobilizations. Addressing this gap, this study examines the influence of social curiosity during the 2022 Brazilian elections, a period marked by significant political and social unrest. By analyzing almost 5 million messages across 137K users in public Telegram groups, we identify distinct curiosity stimulation profiles, dynamic transitions in user behaviors, and the varying influence of groups in amplifying curiosity. Our results highlight the profound role of social curiosity in shaping information flow and its strong connection to appealing and politically relevant topics. These findings underscore the societal implications of social curiosity, especially in high-stakes contexts where misinformation and political narratives can significantly influence public opinion and behavior.

Public versus Less-Public News Engagement on Facebook: Patterns Across Bias and Reliability

Alireza Mohammadinodooshan
Niklas Carlsson

The rapid growth of social media as a news platform has raised significant concerns about the influence and societal impact of biased and unreliable news on these platforms. While much research has explored user engagement with news on platforms like Facebook, most studies have focused on publicly shared posts. This focus leaves an important question unanswered: how representative is the public sphere of Facebook’s entire ecosystem? Specifically, how much of the interactions occur in less-public spaces, and do public engagement patterns for different news classes (e.g., reliable vs. unreliable) generalize to the broader Facebook ecosystem?

This paper presents the first comprehensive comparison of interaction patterns between Facebook’s more public sphere (referred to as public in paper) and the less public sphere (referred to as private). For the analysis, we first collect two complementary datasets: (1) aggregated interaction data for all Facebook posts (public + private) for 19,050 manually labeled news articles (225.3M user interactions), and (2) a subset containing only interactions with public posts (70.4M interactions). Then, through discussions and iterative feedback from the CrowdTangle team, we develop a robust method for fair comparison between these datasets.

Our analysis reveals that only 31% of news interactions occur in the public sphere, with significant variations across news classes. Engagement patterns in less-public spaces often differ, with users, for example, engaging more deeply in private contexts. These findings highlight the need to examine both public and less-public engagement to fully understand news dissemination on Facebook. The observed differences hold important implications on content moderation, platform governance, and policymaking, contributing to healthier online discourse.

GitHub Repository Complexity Leads to Diminished Web Archive Availability

David Calano
Michael Nelson
Michele Weigle

Software is often developed using versioned controlled software, such as Git, and hosted on centralized Web hosts, such as GitHub and GitLab. These Web hosted software repositories are made available to users in the form of traditional HTML Web pages for each source file and directory, as well as a presentational home page and various descriptive pages. We examined more than 12,000 Web hosted Git repository project home pages, primarily from GitHub, to measure how well their presentational components are preserved in the Internet Archive, as well as the source trees of the collected GitHub repositories to assess the extent to which their source code has been preserved. We found that more than 31% of the archived repository home pages examined exhibited some form of minor page damage and 1.6% exhibited major page damage. We also found that of the source trees analyzed, less than 5% of their source files were archived, on average, with the majority of repositories not having source files saved in the Internet Archive at all. The highest concentration of archived source files available were those linked directly from repositories’ home pages at a rate of 14.89% across all available repositories and sharply dropping off at deeper levels of a repository’s directory tree.

COBIAS: Assessing the Contextual Reliability of Bias Benchmarks for Language Models

Priyanshul Govil
Hemang Jain
Vamshi Bonagiri
Aman Chadha
Ponnurangam Kumaraguru
Manas Gaur
Sanorita Dey

Large Language Models (LLMs) often inherit biases from the web data they are trained on, which contains stereotypes and prejudices. Current methods for evaluating and mitigating these biases rely on bias-benchmark datasets. These benchmarks measure bias by observing an LLM’s behavior on biased statements. However, these statements lack contextual considerations of the situations they try to present. To address this, we introduce a contextual reliability framework, which evaluates model robustness to biased statements by considering the various contexts in which they may appear. We develop the Context-Oriented Bias Indicator and Assessment Score (COBIAS) to measure a biased statement’s reliability in detecting bias, based on the variance in model behavior across different contexts. To evaluate the metric, we augmented 2,291 stereotyped statements from two existing benchmark datasets by adding contextual information. We show that COBIAS aligns with human judgment on the contextual reliability of biased statements (Spearman’s ρ = 0.65, p = 3.4*10^{− 60}) and can be used to create reliable benchmarks, which would assist bias mitigation works. Our data and code are publicly available.¹

Warning: Some examples in this paper may be offensive or upsetting.

Characterizing Network Structure of Anti-Trans Actors on TikTok

Maxyn Rose Leitner
Rebecca Dorn
Fred Morstatter
Kristina Lerman

Content Warning: Trans-antagonistic Rhetoric and Terminology The recent proliferation of short form video social media sites such as TikTok has been effectively utilized for increased visibility, communication, and community connection amongst trans/nonbinary creators online. However, these same platforms have also been exploited by right-wing actors targeting trans/nonbinary people, enabling such anti-trans actors to efficiently spread hate speech and propaganda. Given these divergent groups, what are the differences in network structure between anti-trans and pro-trans communities on TikTok, and to what extent do they amplify the effects of anti-trans content? In this paper, we collect a sample of TikTok videos containing pro and anti-trans content, and develop a taxonomy of trans related sentiment to enable the classification of content on TikTok, and ultimately analyze the reply network structures of pro-trans and anti-trans communities. In order to accomplish this, we worked with hired expert data annotators from the trans/nonbinary community in order to generate a sample of highly accurately labeled data. From this subset, we utilized a novel classification pipeline leveraging Retrieval-Augmented Generation (RAG) with annotated examples and taxonomy definitions to classify content into pro-trans, anti-trans, or neutral categories. We find that incorporating our taxonomy and its logics into our classification engine results in improved ability to differentiate trans related content, and that Results from network analysis indicate many interactions between posters of pro-trans and anti-trans content exist, further demonstrating targeting of trans individuals, and demonstrating the need for better content moderation tools.

SmGNN: Link Prediction in Sparse Layers of Multi-layer Graphs

Huaisheng Zhu
Tianxiang Zhao
Zongyu Wu
Suhang Wang

Link prediction is a crucial task in multi-layer graphs for different applications, where real-world graphs often consist of multiple types of relations represented as different layers. However, these multi-layer graphs often suffer from missing edges, especially in specific layers with a high number of missing edges (sparse layers) due to privacy concerns. In this paper, we tackle the challenge of predicting missing links in such layers to enhance the link prediction performance in multi-layer graphs. Training a GNN directly for link prediction on the sparse layer with limited edges would be challenging for exploring missing links and may lead to sub-optimal performance. To tackle this problem, we propose a novel framework called Sparse Layer Reconstruction Multi-layer Graph Neural Network (SmGNN). SmGNN proposes to leverage information from other relation types (layers) to explore missing links in the sparse layer. By selectively fusing relevant information from other layers, we learn relevant representations that capture the characteristics of the sparse layer. Additionally, we incorporate node similarity information based on the relevant representation to enhance the graph structure of the sparse layer. By augmenting the graph structure, our approach improves the representation learning process and enables a more comprehensive exploration of relational patterns and connections within the sparse layer. Experimental evaluations on three real-world datasets demonstrate the effectiveness of our proposed SmGNN approach.

Cross-Platform Violence Detection on Social Media: A Dataset and Analysis

Celia Chen
Scotty Beland
Ingo Burghardt
Jill Byczek
William J. Conway
Eric Cotugno
Sadaf Davre
Megan Fletcher
Rajesh Kumar Gnanasekaran
Kristin Hamilton
Jordan Heustis
Tanaya Jha
Emily Klein
Hayden Kramer
Alex Leitch
Jessica Perkins
Casi Sherman
Celia Sterrn
Logan Stevens
Rebecca Zarrella
Jennifer Golbeck

Violent threats remain a significant problem across social media platforms. Useful, high-quality data facilitates research into the understanding and detection of malicious content, including violence. In this paper, we introduce a cross-platform dataset of 30,000 posts hand-coded for violent threats and sub-types of violence, including political and sexual violence. To evaluate the signal present in this dataset, we perform a machine learning analysis with an existing dataset of violent comments from YouTube. We find that, despite originating from different platforms and using different coding criteria, we achieve high classification accuracy both by training on one dataset and testing on the other, and in a merged dataset condition. These results have implications for content-classification strategies and for understanding violent content across social media.

Building Bridges between Users and Content across Multiple Platforms during Natural Disasters

Lynnette Hui Xian Ng
Iain J. Cruickshank
David Farr

Social media is a primary medium for information diffusion during natural disasters. The social media ecosystem has been used to identify destruction, analyze opinions and organize aid. While the overall picture and aggregate trends may be important, a crucial part of the picture is the connections on these sites. These bridges are essential to facilitate information flow within the network. In this work, we perform a multi-platform analysis (X, Reddit, YouTube) of Hurricanes Helene and Milton, which occurred in quick session to each other in the US in late 2024. We construct network graphs to understand the properties of effective bridging content and users. We find that bridges tend to exist on X, that bridging content is complex, and that bridging users have relatable affiliations related to gender, race and job. Public organizations can use these characteristics to manage their social media personas during natural disasters more effectively.

GPT-4o for Visual Political Communication: Toward Automated Image Type Analysis

Michael Achmann-Denkler
Mario Haim
Christian Wolff

This study explores the potential of multimodal large language models (LLMs), specifically GPT-4o, for automating visual political communication analysis on social media. Using a hierarchical decision tree, we guided non-expert annotators in categorizing Instagram campaign images, achieving reliable annotations (Krippendorff’s α = 0.66–0.86). The annotated dataset was used to test GPT-4o’s ability to classify images through prompts reflecting either a hierarchical structure or flat descriptions. Overall, classification for dominant categories like Campaign Event and Collage reached high F₁ scores (0.89-0.90), while hierarchies in prompts influenced the outcome minimally. These findings demonstrate that LLMs can effectively assist in classifying selected image types, reducing the workload for human annotators.

Can Generative Agent-Based Modeling Replicate the Friendship Paradox in Social Media Simulations?

Gian Marco Orlando
Valerio La Gatta
Diego Russo
Vincenzo Moscato

Generative Agent-Based Modeling (GABM) is an emerging simulation paradigm that combines the reasoning abilities of Large Language Models with traditional Agent-Based Modeling to replicate complex social behaviors, including interactions on social media. While prior work has focused on localized phenomena such as opinion formation and information spread, its potential to capture global network dynamics remains underexplored. This paper addresses this gap by analyzing GABM-based social media simulations through the lens of the Friendship Paradox (FP), a counterintuitive phenomenon where individuals, on average, have fewer friends than their friends. We propose a GABM framework for social media simulations, featuring generative agents that emulate real users with distinct personalities and interests. Using Twitter datasets on the US 2020 Election and the QAnon conspiracy, we show that the FP emerges naturally in GABM simulations. Consistent with real-world observations, the simulations unveil a hierarchical structure, where agents preferentially connect with others displaying higher activity or influence. Additionally, we find that infrequent connections primarily drive the FP, reflecting patterns in real networks. These findings validate GABM as a robust tool for modeling global social media phenomena and highlight its potential for advancing social science by enabling nuanced analysis of user behavior.

GenAI vs. Human Fact-Checkers: Accurate Ratings, Flawed Rationales

Yuehong Cassandra Tai
Khushi Navin Patni
Nicholas Hemauer
Bruce Desmarais
Yu-Ru Lin

Despite recent advances in understanding the capabilities and limits of generative artificial intelligence (GenAI) models, we are just beginning to understand their capacity to assess and reason about the veracity of content. We evaluate multiple GenAI models across tasks that involve the rating of, and reasoning about, the credibility of information. The information in our experiments comes from content that subnational U.S. politicians post to Facebook. We find that GPT-4o, one of the most used AI models in consumer applications, outperforms other models, but all models exhibit only moderate agreement with human coders. Importantly, even when GenAI models accurately identify low-credibility content, their reasoning relies heavily on linguistic features and “hard” criteria, such as the level of detail, source reliability and language formality, rather than an understanding of veracity. We also assess the effectiveness of summarized versus full content inputs, finding that summarized content holds promise for improving efficiency without sacrificing accuracy. While GenAI has the potential to support human fact-checkers in scaling misinformation detection, our results caution against relying solely on these models.

An Evaluation of the Google Perspective API by Race and Gender

Nitheesha Nakka

Research on American politicians demonstrates that minority politicians often face higher rates of uncivil speech online. While previous studies have focused on federal representatives, local politicians engage more frequently with constituents, increasing the risk of online and offline violence. These imminent threats to state-level officials highlight the need for tools that can measure the toxic discourse faced by local politicians, making algorithmic evaluation of these tools essential for understanding these interactions. To this end, this study evaluates the Google Perspective API, a widely used machine learning algorithm that quantifies toxicity. This study assesses the performance of the API across various demographic identities. Using a dataset of one million tweets directed at state legislators across all 50 states in January 2021, I identify significant gender and racial discrepancies in the algorithm’s performance. Specifically, the API demonstrates better performance in predicting toxicity toward men than toward women. The racial discrepancies are slightly more nuanced with the API performing better for some races and not others. This research underscores the importance of algorithmic validation and has implications for studies of algorithmic performance, online harassment and political communication.

Using Semantically Unrelated and Opposite Terms for In-Context Learning: A Case Study in Identifying Political Aversion in Tweets

Patrick Y. Wu

We investigate how semantic priors embedded in generative large language models (LLMs) interact with concept definitions in prompts, using political aversion detection as a case study. Through systematic variations in the wording of the definition of political aversion—replacing key words with opposite terms, semantically unrelated terms, or deliberately nonsensical strings—we examine how models process and apply these manipulated definitions in classification tasks. Our results show that certain LLMs maintain consistent performance across different prompt configurations, regardless of which terms are used or whether examples are included. Strong classification performances even with nonsensical definitions suggest these models may sometimes rely more on patterns in target content than definitions given in prompts. These findings challenge conventional assumptions about prompt engineering and raise important questions about how LLMs utilize information in prompts for classification decisions, while underscoring the need for careful validation when applying these methods to social and political science measurement tasks.

Internet Censorship through the Lens of Time Series Analysis

Dheeman Saha
Afsah Anwar
Abdullah Mueen

Internet censorship by nation-states to regulate the flow of information is evolving rapidly. Active traffic monitoring via deep packet inspection to track and control information flow is an attractive option. Active monitoring increases the delay in establishing connections and reduces the total packet flow to the censoring regime.

Internet measurement data contain traces of increased delay in establishing the connection, and thus allow us to infer about censorship. Large-scale Internet measurement datasets pose unique challenges for data mining, mainly due to their vastness and complexity. The Internet’s IP addresses are logically grouped into Autonomous Systems and domains, representing its hierarchical and organizational structure. These groupings introduce new challenges in analyzing and managing the interconnections. In addition, enormous amounts of data are generated, requiring robust computational storage, processing, and analysis methods.

This paper analyzes the delays in establishing Transmission Control Protocol (TCP) connections (via a TLS Handshake) from many vantage points in Russia. We have observed a significant increase in delays since the war between Russia and Ukraine. We describe our methodology for analyzing three-year-long trends in connection delays and demonstrate several patterns emerging from the data.