WSDM '25: Proceedings of the Eighteenth ACM International Conference on Web Search and Data Mining

Full Citation in the ACM Digital Library

SESSION: Plenary Session 1: Graph Neural Networks and Inferences

Graph Disentangle Causal Model: Enhancing Causal Inference in Networked Observational Data

Binbin Hu
Zhicheng An
Zhengwei Wu
Ke Tu
Ziqi Liu
Zhiqiang Zhang
Jun Zhou
Yufei Feng
Jiawei Chen

Estimating individual treatment effects (ITE) from observational data is a critical task across various domains. However, many existing works on ITE estimation overlook the influence of hidden confounders, which remain unobserved at the individual unit level. To address this limitation, researchers have utilized graph neural networks to aggregate neighbors' features to capture the hidden confounders and mitigate confounding bias by minimizing the discrepancy of confounder representations between the treated and control groups. Despite the success of these approaches, practical scenarios often treat all features as confounders and involve substantial differences in feature distributions between the treated and control groups. Confusing the adjustment and confounder and enforcing strict balance on the confounder representations could potentially undermine the effectiveness of outcome prediction. To mitigate this issue, we propose a novel framework called the Graph Disentangle Causal model (GDC) to conduct ITE estimation in the network setting. GDC utilizes a causal disentangle module to separate unit features into adjustment and confounder representations. Then we design a graph aggregation module consisting of three distinct graph aggregators to obtain adjustment, confounder, and counterfactual confounder representations. Finally, a causal constraint module is employed to enforce the disentangled representations as true causal factors. The effectiveness of our proposed method is demonstrated by conducting comprehensive experiments on two networked datasets.

SESSION: Poster Session 1

Mitigating Overfitting in Graph Neural Networks via Feature and Hyperplane Perturbation

Yoonhyuk Choi
Jiho Choi
Taewook Ko
Chong-Kwon Kim

Message-passing neural networks are widely employed in various graph mining applications. However, these methods are susceptible to the scarcity of labeled data, which often leads to overfitting. Our observations suggest that sparse initial vectors further exacerbate this issue by failing to fully represent the range of learnable parameters. This sparsity can hinder the optimization of specific dimensions in the initial projection matrix, as the training samples may not adequately span these parameters. To overcome this challenge, we propose a novel perturbation technique that introduces variability to the initial features and the projection hyperplane. Notably, even without employing grid search, we demonstrate that shifting with a small estimated value mitigates this problem more effectively than other perturbation methods. Experimental results on real-world datasets reveal that our technique significantly enhances node classification accuracy in semi-supervised scenarios.

Dynamic Graph Transformer with Correlated Spatial-Temporal Positional Encoding

Zhe Wang
Sheng Zhou
Jiawei Chen
Zhen Zhang
Binbin Hu
Yan Feng
Chun Chen
Can Wang

Learning effective representations for Continuous-Time Dynamic Graphs (CTDGs) has garnered significant research interest, largely due to its powerful capabilities in modeling complex interactions between nodes. A fundamental and crucial requirement for representation learning in CTDGs is the appropriate estimation and preservation of proximity. However, due to the sparse and evolving characteristics of CTDGs, the spatial-temporal properties inherent in high-order proximity remain largely unexplored. Despite its importance, this property presents significant challenges due to the computationally intensive nature of personalized interaction intensity estimation and the dynamic attributes of CTDGs. To this end, we propose a novel Correlated Spatial-Temporal Positional encoding that incorporates a parameter-free personalized interaction intensity estimation under the weak assumption of the Poisson Point Process. Building on this, we introduce the Dynamic Graph Transformer with Correlated Spatial-Temporal Positional Encoding (CorDGT), which efficiently retains the evolving spatial-temporal high-order proximity for effective node representation learning in CTDGs. Extensive experiments on seven small and two large-scale datasets demonstrate the superior performance and scalability of the proposed CorDGT. The code is available at: https://github.com/wangz3066/CorDGT.

S-Diff: An Anisotropic Diffusion Model for Collaborative Filtering in Spectral Domain

Rui Xia
Yanhua Cheng
Yongxiang Tang
Xiaocheng Liu
Xialong Liu
Lisong Wang
Peng Jiang

Recovering potential user preferences from user-item interaction matrices is a key challenge in recommender systems. While diffusion models can sample and reconstruct preferences from latent distributions, they often fail to capture similar users' collective preferences effectively. Additionally, latent variables degrade into pure Gaussian noise during the forward process, lowering the signal-to-noise ratio, which in turn degrades performance. To address this, we propose S-Diff, inspired by graph-based collaborative filtering, better to utilize low-frequency components in the graph spectral domain. S-Diff maps user interaction vectors into the spectral domain and parameterizes diffusion noise to align with graph frequency. As a result, this anisotropic diffusion retains significant low-frequency components, preserving a high signal-to-noise ratio. S-Diff further employs a conditional denoising network to encode user interactions, recovering true preferences from noisy data. This method achieves promising results across multiple datasets.

Maintaining k-MinHash Signatures over Fully-Dynamic Data Streams with Recovery

Andrea Clementi
Luciano Gualà
Luca Pepè Sciarria
Alessandro Straziota

We consider the task of performing Jaccard similarity queries over a large collection of items that are dynamically updated according to a streaming input model. An item here is a subset of a large universe U of elements. A well-studied approach to address this important problem in data mining is to design fast-similarity data sketches. In this paper, we focus on global solutions for this problem, i.e., a single data structure which is able to answer both Similarity Estimation and All-Candidate Pairs queries, while also dynamically managing an arbitrary, online sequence of element insertions and deletions received in input.

We introduce and provide an in-depth analysis of a dynamic, buffered version of the well-known k-MinHash sketch. This buffered version better manages critical update operations thus significantly reducing the number of times the sketch needs to be rebuilt from scratch using expensive recovery queries. We prove that the buffered k-MinHash uses O(k łog |U|) memory words per subset and that its amortized update time per insertion/deletion is O(k łog |U|) with high probability. Moreover, our data structure can return the k-MinHash signature of any subset in O(k) time, and this signature is exactly the same signature that would be computed from scratch (and thus the quality of the signature is the same as the one guaranteed by the static k-MinHash).

Analytical and experimental comparisons with the other, state-of-the-art global solutions for this problem given in [Bury et al.,WSDM'18] show that the buffered k-MinHash turns out to be competitive in a wide and relevant range of the online input parameters.

Hyperdimensional Representation Learning for Node Classification and Link Prediction

Abhishek Dalvi
Vasant Honavar

We introduce Hyperdimensional Graph Learner (HDGL), a novel method for node classification and link prediction in graphs. HDGL maps node features into a very high-dimensional space (hyperdimensional or HD space for short) using the injectivity property of node representations in a family of Graph Neural Networks (GNNs) and then uses HD operators such as bundling and binding to aggregate information from the local neighborhood of each node yielding latent node representations that can support both node classification and link prediction tasks. HDGL, unlike GNNs that rely on computationally expensive iterative optimization and hyperparameter tuning, requires only a single pass through the data set. We report results of experiments using widely used benchmark datasets which demonstrate that, on the node classification task, HDGL achieves accuracy that is competitive with that of the state-of-the-art GNN methods at substantially reduced computational cost; and on the link prediction task, HDGL matches the performance of DeepWalk and related methods, although it falls short of computationally demanding state-of-the-art GNNs.

Prospective Multi-Graph Cohesion for Multivariate Time Series Anomaly Detection

Jiazhen Chen
Mingbin Feng
Tony S. Wirjanto

Anomaly detection in high-dimensional time series data is pivotal for numerous industrial applications. Recent advances in multivariate time series anomaly detection (TSAD) have increasingly leveraged graph structures to model inter-variable relationships, typically employing Graph Neural Networks (GNNs). Despite their promising results, existing methods often rely on a single graph representation, which are insufficient for capturing the complex, diverse relationships inherent in multivariate time series. To address this, we propose the Prospective Multi-Graph Cohesion (PMGC) framework for multivariate TSAD. PMGC exploits spatial correlations by integrating a long-term static graph with a series of short-term instance-wise dynamic graphs, regulated through a graph cohesion loss function. Our theoretical analysis shows that this loss function promotes diversity among dynamic graphs while aligning them with the stable long-term relationships encapsulated by the static graph. Additionally, we introduce a "prospective graphing" strategy to mitigate the limitations of traditional forecasting-based TSAD methods, which often struggle with unpredictable future variations. This strategy allows the model to accurately reflect concurrent inter-series relationships under normal conditions, thereby enhancing anomaly detection efficacy. Empirical evaluations on real-world datasets demonstrate the superior performance of our method compared to existing TSAD techniques.

Large Language Model driven Policy Exploration for Recommender Systems

Jie Wang
Alexandros Karatzoglou
Ioannis Arapakis
Joemon M. Jose

Recent advancements in Recommender Systems (RS) have incorporated Reinforcement Learning (RL), framing the recommendation as a Markov Decision Process (MDP). However, offline RL policies trained on static user data are vulnerable to distribution shift when deployed in dynamic online environments. Additionally, excessive focus on exploiting short-term relevant items can hinder exploration, leading to sub-optimal recommendations and negatively impacting long-term user gains. Online RL-based RS also face challenges in production deployment, due to the risks of exposing users to untrained or unstable policies. Large Language Models (LLMs) offer a promising solution to mimic user objectives and preferences for pre-training policies offline to enhance the initial recommendations in online settings. Effectively managing distribution shift and balancing exploration are crucial for improving RL-based RS, especially when leveraging LLM-based pre-training.

To address these challenges, we propose an Interaction-Augmented Learned Policy (iALP) that utilizes user preferences distilled from an LLM. Our approach involves prompting the LLM with user states to extract item preferences, learning rewards based on feedback, and updating the RL policy using an actor-critic framework. Furthermore, to deploy iALP in an online scenario, we introduce an adaptive variant, A-iALP, that implements a simple fine-tuning strategy (A-iALP_ft), and an adaptive approach (A-iALP_ap) designed to mitigate issues with compromised policies and limited exploration. Experiments across three simulated environments demonstrate that A-iALP introduces substantial performance improvements.

SESSION: Plenary Session 2: Recommendation Systems

A Contrastive Framework with User, Item and Review Alignment for Recommendation

Hoang V. Dong
Yuan Fang
Hady W. Lauw

Learning effective latent representations for users and items is the cornerstone of recommender systems. Traditional approaches rely on user-item interaction data to map users and items into a shared latent space, but the limiting factor is often data sparsity. While reviews could mitigate this sparsity, existing review-aware recommendation models may not have fully exploited their potential. First, they typically rely heavily on reviews as side information or additional features, but reviews are not universal, with many users and items lacking them. Second, such approaches use reviews as supplementary information, allowing potential divergence or inconsistency between the review representations and the user-item space. To overcome these limitations, our work introduces a Review-centric Contrastive Alignment Framework for Recommendation (ReCAFR), which incorporates reviews into the core learning process, ensuring alignment among user, item, and review representations within a unified space. Specifically, we leverage two self-supervised contrastive strategies that not only exploit the review augmentation to alleviate sparsity, but also align the tripartite representations to enhance robustness. Empirical studies on public benchmark datasets demonstrate the effectiveness and robustness of ReCAFR

Facet-Aware Multi-Head Mixture-of-Experts Model for Sequential Recommendation

Mingrui Liu
Sixiao Zhang
Cheng Long

Sequential recommendation (SR) systems excel at capturing users' dynamic preferences by leveraging their interaction histories. Most existing SR systems assign a single embedding vector to each item to represent its features, and various types of models are adopted to combine these item embeddings into a sequence representation vector to capture the user intent. However, we argue that this representation alone is insufficient to capture an item's multi-faceted nature (e.g., movie genres, starring actors). Besides, users often exhibit complex and varied preferences within these facets (e.g., liking both action and musical films in the facet of genre), which are challenging to fully represent. To address the issues above, we propose a novel structure called Facet-Aware Multi-Head Mixture-of-Experts Model for Sequential Recommendation (FAME). We leverage sub-embeddings from each head in the last multi-head attention layer to predict the next item separately. A gating mechanism integrates recommendations from each head and dynamically determines their importance. Furthermore, we introduce a Mixture-of-Experts (MoE) network in each attention head to disentangle various user preferences within each facet. Each expert within the MoE focuses on a specific preference. A learnable router network is adopted to compute the importance weight for each expert and aggregate them. We conduct extensive experiments on four public sequential recommendation datasets and the results demonstrate the effectiveness of our method over existing baseline models.

Review-Based Hyperbolic Cross-Domain Recommendation

Yoonhyuk Choi
Jiho Choi
Taewook Ko
Chong-Kwon Kim

The issue of data sparsity poses a significant challenge to recommender systems. In response to this, algorithms that leverage side information such as review texts have been proposed. Furthermore, Cross-Domain Recommendation (CDR), which captures domain-shareable knowledge and transfers it from a richer domain (source) to a sparser one (target) has emerged recently. Nevertheless, existing methodologies assume an Euclidean embedding space, encountering difficulties in accurately representing richer text information and managing complex user-item interactions. This paper advocates a hyperbolic CDR approach for modeling review-based user-item relationships. We first emphasize that conventional distance-based domain alignment techniques may cause problems because small modifications in hyperbolic geometry result in magnified perturbations, ultimately leading to the collapse of hierarchical structures. To address this challenge, we propose hierarchy-aware embedding and domain alignment schemes that adjust the scale to extract domain-shareable information without disrupting structural forms. Extensive experiments substantiate the efficiency, robustness, and scalability of the proposed model. The source code is given here https://github.com/ChoiYoonHyuk/HEAD.

VARIUM: Variational Autoencoder for Multi-Interest Representation with Inter-User Memory

Nhu-Thuat Tran
Hady W. Lauw

Frameworks for discovering multiple user interest factors based on Variational AutoEncoder (VAE) has demonstrated competitive recommendation performance. However, as VAE only considers one user as input at a time, sharing across like-minded users may not be adequately facilitated. Moreover, interest sharing between users is not always available and thus, poses a challenge for VAE to explicitly model this information. To resolve this, we introduce an inter-user memory-based mechanism to unsupervisedly discover latent interest sharing between users under VAE framework. Concretely, we design a memory including an array of prototypes, each hypothetically representing a group of users sharing a particular interest. These memory prototypes are jointly trained with the backbone VAE-based recommendation model. For each user, we first discover multiple intra-user interest factors behind their item adoptions. Next, intra-user interest factors query to memory to retrieve the inter-user interest clues from like-minded users. This query-retrieve process is performed sequentially via a series of attention-transformation steps. Then, interest clues retrieved from memory are incorporated into interest factor representations of each user to increase their expressiveness. Thorough experiments on real-world datasets verify the strength of our method over an array of baselines. We further conduct qualitative analysis to understand the inner working of our memory-based refinement approach.

SESSION: Poster Session 2

The Initial Screening Order Problem

Jose M. Alvarez
Antonio Mastropietro
Salvatore Ruggieri

We investigate the role of the initial screening order (ISO) in candidate screening. The ISO refers to the order in which the screener searches the candidate pool when selecting k candidates. Today, it is common for the ISO to be the product of an information access system, such as an online platform or a database query. The ISO has been largely overlooked in the literature, despite its impact on the optimality and fairness of the selected k candidates, especially under a human screener. We define two problem formulations describing the search behavior of the screener given an ISO: the best-k, where it selects the top k candidates; and the good-k, where it selects the first good-enough k candidates. To study the impact of the ISO, we introduce a human-like screener and compare it to its algorithmic counterpart, where the human-like screener is conceived to be inconsistent over time. Our analysis, in particular, shows that the ISO, under a human-like screener solving for the good-k problem, hinders individual fairness despite meeting group fairness, and hampers the optimality of the selected k candidates. This is due to position bias, where a candidate's evaluation is affected by its position within the ISO. We report extensive simulated experiments exploring the parameters of the best-k and good-k problems for both screeners. Our simulation framework is flexible enough to account for multiple candidate screening tasks, being an alternative to running real-world procedures.

Cross-Domain Pre-training with Language Models for Transferable Time Series Representations

Mingyue Cheng
Xiaoyu Tao
Qi Liu
Hao Zhang
Yiheng Chen
Defu Lian

Pre-training universal models across multiple domains to enhance downstream tasks is a prevalent learning paradigm. However, there has been minimal progress in pre-training transferable models across domains for time series representation. This dilemma is incurred by two key factors: the limited availability of training set within each domain and the substantial differences in data characteristics between domains. To address these challenges, we present a novel framework, namely CrossTimeNet, designed to perform cross-domain self-supervised pre-training to benefit target tasks. Specifically, to address the issue of data scarcity, we utilize a pre-trained language model as the backbone network to effectively capture the sequence dependencies of the input time series. Meanwhile, we adopt the recovery of corrupted region inputs as a self-supervised optimization objective, taking into account the locality of the time series. To address discrepancies in data characteristics, we introduce a novel tokenization module that converts continuous time series inputs into discrete token sequences using vector quantization techniques. This approach facilitates the learning of transferable time series models across different domains. Extensive experimental results on diverse time series tasks, including classification and forecasting, demonstrate the effectiveness of our approach. Our codes are publicly available at https://github.com/Mingyue-Cheng/CrossTimeNet.

SESSION: Plenary Session 3: Large Language Models

LOGIN: A Large Language Model Consulted Graph Neural Network Training Framework

Yiran Qiao
Xiang Ao
Yang Liu
Jiarong Xu
Xiaoqian Sun
Qing He

Recent prevailing works on graph machine learning typically follow a similar methodology that involves designing advanced variants of graph neural networks (GNNs) to maintain the superior performance of GNNs on different graphs. In this paper, we aim to streamline the GNN design process and leverage the advantages of Large Language Models (LLMs) to improve the performance of GNNs on downstream tasks. We formulate a new paradigm, coined "LLMs-as-Consultants", which integrates LLMs with GNNs in an interactive manner. A framework named LOGIN (LLM cOnsulted GNN traINing) is instantiated, empowering the interactive utilization of LLMs within the GNN training process. First, we attentively craft concise prompts for spotted nodes, carrying comprehensive semantic and topological information, and serving as input to LLMs. Second, we refine GNNs by devising a complementary coping mechanism that utilizes the responses from LLMs, depending on their correctness. We empirically evaluate the effectiveness of Lalebox1 [0.8]O Galebox1 [0.8]IN on node classification tasks across both homophilic and heterophilic graphs. The results illustrate that even basic GNN architectures, when employed within the proposed LLMs-as-Consultants paradigm, can achieve comparable performance to advanced GNNs with intricate designs. Our code is available at https://github.com/QiaoYRan/LOGIN.

Beyond Answers: Transferring Reasoning Capabilities to Smaller LLMs Using Multi-Teacher Knowledge Distillation

Yijun Tian
Yikun Han
Xiusi Chen
Wei Wang
Nitesh V. Chawla

Transferring the reasoning capability from stronger large language models (LLMs) to smaller ones has been quite appealing, as smaller LLMs are more flexible to deploy with less expense. Among the existing solutions, knowledge distillation stands out due to its outstanding efficiency and generalization. However, existing methods suffer from several drawbacks, including limited knowledge diversity and the lack of rich contextual information. To solve the problems and facilitate the learning of compact language models, we propose TinyLLM, a new knowledge distillation paradigm to learn a small student LLM from multiple large teacher LLMs. In particular, we encourage the student LLM to not only generate the correct answers but also understand the rationales behind these answers. Given that different LLMs possess diverse reasoning skills, we guide the student model to assimilate knowledge from various teacher LLMs. We further introduce an in-context example generator and a teacher-forcing Chain-of-Thought strategy to ensure that the rationales are accurate and grounded in contextually appropriate scenarios. Extensive experiments on six datasets across two reasoning tasks demonstrate the superiority of our method. Results show that TinyLLM can outperform large teacher LLMs significantly, despite a considerably smaller model size. The source code is available at: https://github.com/YikunHan42/TinyLLM.

Large Language Model Simulator for Cold-Start Recommendation

Feiran Huang
Yuanchen Bei
Zhenghang Yang
Junyi Jiang
Hao Chen
Qijie Shen
Senzhang Wang
Fakhri Karray
Philip S Yu

Recommending cold items remains a significant challenge in billion-scale online recommendation systems. While warm items benefit from historical user behaviors, cold items rely solely on content features, limiting their recommendation performance and impacting user experience and revenue. Current models generate synthetic behavioral embeddings from content features but fail to address the core issue: the absence of historical behavior data. To tackle this, we introduce the LLM Simulator framework, which leverages large language models to simulate user interactions for cold items, fundamentally addressing the cold-start problem. However, simply using LLM to traverse all users can introduce significant complexity in billion-scale systems. To manage the computational complexity, we propose a coupled funnel ColdLLM framework for online recommendation. ColdLLM efficiently reduces the number of candidate users from billions to hundreds using a trained coupled filter, allowing the LLM to operate efficiently and effectively on the filtered set. Extensive experiments show that ColdLLM significantly surpasses baselines in cold-start recommendations, including Recall and NDCG metrics. A two-week A/B test also validates that ColdLLM can effectively increase the cold-start period GMV.

SESSION: Poster Session 3

SESSION: Plenary Session 4: Sequential and Temporal Data Modeling

Sequential Diversification with Provable Guarantees

Honglian Wang
Sijing Tu
Aristides Gionis

Diversification is a useful tool for exploring large collections of information items. It has been used to reduce redundancy and cover multiple perspectives in information-search settings. Diversification finds applications in many different domains, including presenting search results of information-retrieval systems and selecting suggestions for recommender systems.

Interestingly, existing measures of diversity are defined over sets of items, rather than evaluating sequences of items. This design choice comes in contrast with commonly-used relevance measures, which are distinctly defined over sequences of items, taking into account the ranking of items. The importance of employing sequential measures is that information items are almost always presented in a sequential manner, and during their information-exploration activity users tend to prioritize items with higher ranking.

In this paper, we study the problem of maximizing sequential diversity. This is a new measure of diversity, which accounts for the ranking of the items, and incorporates item relevance and user behavior. The overarching framework can be instantiated with different diversity measures, and here we consider the measures of sum diversity and coverage diversity. The problem was recently proposed by Coppolillo et al. [11], where they introduce empirical methods that work well in practice. Our paper is a theoretical treatment of the problem: we establish the problem hardness and present algorithms with constant approximation guarantees for both diversity measures we consider. Experimentally, we demonstrate that our methods are competitive against strong baselines.

Temporal Linear Item-Item Model for Sequential Recommendation

Seongmin Park
Mincheol Yoon
Minjin Choi
Jongwuk Lee

In sequential recommendation (SR), neural models have been actively explored due to their remarkable performance, but they suffer from inefficiency inherent to their complexity. Linear SR models exhibit high efficiency and achieve competitive or superior accuracy compared to neural models. However, they solely deal with the sequential order of items (i.e., sequential information) and overlook the actual timestamp (i.e., temporal information). It is limited to effectively capturing various user preference drifts over time. To address this issue, we propose a novel linear SR model, named TemporAl LinEar item-item model (TALE), incorporating temporal information while preserving training/inference efficiency. It consists of three key components. (i) Single-target augmentation concentrates on a single target item, enabling us to learn the temporal correlation for the target item. (ii) Time interval-aware weighting utilizes the actual timestamp to discern the item correlation depending on time intervals. (iii) Trend-aware normalization reflects the dynamic shift of item popularity over time. Our empirical studies show that TALE outperforms ten competing SR models by up to 18.71% gains across five benchmark datasets. It also exhibits remarkable effectiveness for evaluating long-tail items by up to 30.45% gains. The source code is available at https://github.com/psm1206/TALE.

Oracle-guided Dynamic User Preference Modeling for Sequential Recommendation

Jiafeng Xia
Dongsheng Li
Hansu Gu
Tun Lu
Peng Zhang
Li Shang
Ning Gu

Sequential recommendation methods can capture dynamic user preferences from user historical interactions to achieve better performance. However, most existing methods only use past information extracted from user historical interactions to train the models, leading to the deviations of user preference modeling. Besides past information, future information is also available during training, which contains the ''oracle'' user preferences in the future and will be beneficial to model dynamic user preferences. Therefore, we propose an oracle-guided dynamic user preference modeling method for sequential recommendation (Oracle4Rec), which leverages future information to guide model training on past information, aiming to learn ''forward-looking'' models. Specifically, Oracle4Rec first extracts past and future information through two separate encoders, then learns a forward-looking model through an oracle-guiding module which minimizes the discrepancy between past and future information. We also tailor a two-phase model training strategy to make the guiding more effective. Extensive experiments demonstrate that Oracle4Rec is superior to state-of-the-art sequential methods. Further experiments show that Oracle4Rec can be leveraged as a generic module in other sequential recommendation methods to improve their performance with a considerable margin.

Neo-TKGC: Enhancing Temporal Knowledge Graph Completion with Integrated Node Weights and Future Information

Zihan Qiu
Xiaoling Zhou
Chunyan An
Qiang Yang
Zhixu Li

Temporal Knowledge Graph Completion (TKGC) involves predicting and filling in missing facts within time series data, a crucial task with wide-ranging applications across various domains. The dynamic evolution of Temporal Knowledge Graphs (TKGs) adds complexity to this task, making it inherently challenging. Existing research predominantly relies on historical data to complete the missing facts. However, these approaches often overlook the potential of future information and the significance of node weights.To address these challenges, we propose Neo-TKGC, a novel temporal knowledge graph completion model that integrates a graph structure encoding module and a temporal encoding module. The graph structure encoding module introduces node weights to enhance the capabilities of graph neural networks (GNNs) for entity and relation representation learning, implemented using CompGCN. This module can be easily extended to any GNN models utilizing node and edge aggregation. The temporal encoding module leverages both future and historical information to capture relevant contexts and temporal dependencies among entities and relations.By combining node weights and future information, Neo-TKGC achieves more accurate entity and relation representations, thereby improving the model's ability to infer unknown entities. Extensive experiments on three real-world TKGC datasets demonstrate the superior performance of our model compared to existing approaches, achieving at least a 1.7% relative improvement in Hits@1 across most metrics.

SESSION: Poster Session 4

Hawkes Point Process-enhanced Dynamic Graph Neural Network

Zhiqiang Wang
Baijing Hu
Kaixuan Yao
Jiye Liang

Dynamic graph representation learning aims to capture the evolution of graph structures and obtain accurate node embeddings, a crucial task in graph machine learning. The Hawkes point process, a mathematical framework effective for modeling the influence of historical events on future occurrences, has been validated as a powerful tool for capturing the dynamics of graph evolution in dynamic graph representation learning. However, existing dynamic graph representation learning methods based on the Hawkes point process primarily model excitation at the individual node level, failing to adequately account for structural influences during graph evolution. This limitation restricts their ability to comprehensively capture network evolution patterns. To address this limitation, we propose a Hawkes Point Process-enhanced Dynamic Graph Neural Network (HP-DGNN) model. This model leverages the Hawkes point process to model both individual node histories and structural histories, capturing their respective influences on future node interactions. By integrating individual and structural influences in computing Hawkes conditional intensity, the model comprehensively captures the impacts of both layers on future node interactions. We evaluate our proposed model on two downstream tasks of dynamic graph representation learning: dynamic link prediction and future node degree prediction. Compared to 12 state-of-the-art methods, our model consistently demonstrates superior performance, underscoring its effectiveness in capturing the complexities of graph evolution.

Optimizing Blockchain Analysis: Tackling Temporality and Scalability with an Incremental Approach with Metropolis-Hastings Random Walks

Junliang Luo
Xue Liu

Blockchain technology, with implications in the financial domain, offers data in the form of large-scale transaction networks. Analyzing transaction networks facilitates fraud detection, market analysis, and supports government regulation. Despite many graph representation learning methods for transaction network analysis, we pinpoint two salient limitations that merit more investigation. Existing methods predominantly focus on the snapshots of transaction networks, sidelining the evolving nature of blockchain transaction networks. Existing methodologies may not sufficiently emphasize efficient, incremental learning capabilities, which are essential for addressing the scalability challenges in ever-expanding large-scale transaction networks. To address these challenges, we employed an incremental approach for random walk-based node representation learning in transaction networks. Further, we proposed a Metropolis-Hastings-based random walk mechanism for improved efficiency. The empirical evaluation conducted on blockchain transaction datasets reveals comparable performance in node classification tasks while reducing computational overhead. Potential applications include transaction network monitoring, the efficient classification of blockchain addresses for fraud detection or the identification of specialized address types within the network.

SCONE: A Novel Stochastic Sampling to Generate Contrastive Views and Hard Negative Samples for Recommendation

Chaejeong Lee
Jeongwhan Choi
Hyowon Wi
Sung-Bae Cho
Noseong Park

Graph-based collaborative filtering (CF) has emerged as a promising approach in recommender systems. Despite its achievements, graph-based CF models face challenges due to data sparsity and negative sampling. In this paper, we propose a novel Stochastic sampling for i) COntrastive views and ii) hard NEgative samples (SCONE) to overcome these issues. SCONE generates dynamic augmented views and diverse hard negative samples via a unified stochastic sampling approach based on score-based generative models. Our extensive experiments on 6 benchmark datasets show that SCONE consistently outperforms state-of-the-art baselines. SCONE shows efficacy in addressing user sparsity and item popularity issues, while enhancing performance for both cold-start users and long-tail items. Furthermore, our approach improves the diversity of the recommendation and the uniformity of the representations. The code is available at https://github.com/jeongwhanchoi/SCONE.

Towards Personalized Federated Multi-Scenario Multi-Task Recommendation

Yue Ding
Yanbiao Ji
Xun Cai
Xin Xin
Yuxiang Lu
Suizhi Huang
Chang Liu
Xiaofeng Gao
Tsuyoshi Murata
Hongtao Lu

In modern recommender systems, especially in e-commerce, predicting multiple targets such as click-through rate (CTR) and post-view conversion rate (CTCVR) is common. Multi-task recommender systems are increasingly popular in both research and practice, as they leverage shared knowledge across diverse business scenarios to enhance performance. However, emerging real-world scenarios and data privacy concerns complicate the development of a unified multi-task recommendation model.

In this paper, we propose PF-MSMTrec, a novel framework for personalized federated multi-scenario multi-task recommendation. In this framework, each scenario is assigned to a dedicated client utilizing the Multi-gate Mixture-of-Experts (MMoE) structure. To address the unique challenges of multiple optimization conflicts, we introduce a bottom-up joint learning mechanism. First, we design a parameter template to decouple the expert network parameters, distinguishing scenario-specific parameters as shared knowledge for federated parameter aggregation. Second, we implement personalized federated learning for each expert network during a federated communication round, using three modules: federated batch normalization, conflict coordination, and personalized aggregation. Finally, we conduct an additional round of personalized federated parameter aggregation on the task tower network to obtain prediction results for multiple tasks. Extensive experiments on two public datasets demonstrate that our proposed method outperforms state-of-the-art approaches. The source code and datasets will be released as open-source for public access.

ESA: Example Sieve Approach for Multi-Positive and Unlabeled Learning

Zhongnian Li
Meng Wei
Peng Ying
Xinzheng Xu

Learning from Multi-Positive and Unlabeled (MPU) data has gradually attracted significant attention from practical applications. Unfortunately, the risk of MPU also suffer from the shift of minimum risk, particularly when the models are very flexible. In this paper, to alleviate the shifting of minimum risk problem, we propose an Example Sieve Approach (ESA) to select examples for training a multi-class classifier. Specifically, we sieve out some examples by utilizing the Certain Loss (CL) value of each example in the training stage and analyze the consistency of the proposed risk estimator. Besides, we show that the estimation error of proposed ESA obtains the optimal parametric convergence rate. Extensive experiments on various real-world datasets show the proposed approach outperforms previous methods.

SESSION: Plenary Session 5: Graph Learning and Adaptation

FedGF: Enhancing Structural Knowledge via Graph Factorization for Federated Graph Learning

Pengyang Zhou
Chaochao Chen
Weiming Liu
Xinting Liao
Fengyuan Yu
Zhihui Fu
Xingyu Lou
Wu Wen
Xiaolin Zheng
Jun Wang

Federated graph learning involves training graph neural networks distributively on local graphs and aggregating model parameters in a central server. However, existing methods fail to effectively capture and leverage the inherent global structures, hindering local structural modeling. To address this, we propose Federated Graph Factorization (FedGF), which enhances structural knowledge via privacy-preserving graph factorization. Specifically, FedGF includes three modules, i.e., global structure reconstruction (GSR), local structure exploration (LSE), and global-local structure alignment (GLSA). Firstly, GSR factorizes client graphs into a series of learnable graph atoms and conducts reconstruction to capture the globally shared structure. Then, LSE explores the local structure, mining potential but unrevealed connections within client subgraphs. GLSA further aligns the global and local structure to alternatively refine the graph atoms and GNN model, enhancing the overall structural modeling. Extensive experiments on six datasets consistently validate the effectiveness of \modelname.

Graph Size-imbalanced Learning with Energy-guided Structural Smoothing

Jiawen Qin
Pengfeng Huang
Qingyun Sun
Cheng Ji
Xingcheng Fu
Jianxin Li

Graph is a prevalent data structure employed to represent the relationships between entities, frequently serving as a tool to depict and simulate numerous systems, such as molecules and social networks. However, real-world graphs usually suffer from the size-imbalanced problem in the multi-graph classification, i.e., a long-tailed distribution with respect to the number of nodes. Recent studies find that off-the-shelf Graph Neural Networks (GNNs) would compromise model performance under the long-tailed settings. We investigate this phenomenon and discover that the long-tailed graph distribution greatly exacerbates the discrepancies in structural features. To alleviate this problem, we propose a novel energy-based size-imbalanced learning framework named SIMBA, which smooths the features between head and tail graphs and re-weights them based on the energy propagation. Specifically, we construct a higher-level graph abstraction named Graphs-to-Graph according to the correlations between graphs to link independent graphs and smooths the structural discrepancies. We further devise an energy-based message-passing belief propagation method for re-weighting lower compatible graphs in the training process and further smooth local feature discrepancies. Extensive experimental results over five public size-imbalanced datasets demonstrate the superior effectiveness of the model for size-imbalanced graph classification tasks.

Inductive Graph Few-shot Class Incremental Learning

Yayong Li
Peyman Moghadam
Can Peng
Nan Ye
Piotr Koniusz

Node classification with Graph Neural Networks (GNN) under a fixed set of labels is well studied, while Graph Few-Shot Class Incremental Learning (GFSCIL), which involves learning a GNN classifier as graph nodes and classes growing over time sporadically, has received much less attention despite its importance. We introduce inductive GFSCIL that continually learns novel classes with newly emerging nodes while maintaining performance on old classes without accessing previous data. This addresses the practical concern of transductive GFSCIL, which requires storing the entire graph with historical data. Compared to the transductive GFSCIL, the inductive setting exacerbates catastrophic forgetting due to inaccessible previous data during incremental training, in addition to the overfitting issue caused by label sparsity. Thus, we propose a novel method, called Topology-based class Augmentation and Prototype calibration (TAP). To be specific, it first performs a topology-based class augmentation method, helping replicate the setting of disjoint subgraphs with nodes of novel classes received in incremental sessions, to enhance backbone versatility. In incremental learning, given the limited number of novel class samples, we propose an iterative prototype calibration to improve the separation of class prototypes. Furthermore, as backbone fine-tuning poses the feature distribution drift, prototypes of old classes start failing over time, we propose the prototype shift method for old classes to compensate for the drift. We showcase the proposed method on four datasets.

Incomplete Multi-view Clustering via Local Reasoning and Correlation Analysis

Xiaocui Li
Guoliang Li
Xinyu Zhang
Yangtao Wang
Qingyu Shi
Wei Liang

In recent years, incomplete multi-view clustering (IMVC) has attracted considerable attention for its ability to acheieve effective clustering results through the integration of key information amidst missing view. However, the existing IMVC methods are still faced with 3 limitations: (1) They exhibit deficiencies in considering the weight distribution within views, (2) they ignore the varying contributions of different views to the common consistent representation, and (3) they struggle to sufficiently extract and recover the vital information within incomplete views. To address these limitations, we incorporates local reasoning and correlation analysis to design an incomplete multi-view clustering method(IMVCLRCA), which introduces a new strategy of feature learning and missing view recovery, fully exploiting local similarity and structural continuity within views and performing precise local reasoning recovery on missing data. By maximizing mutual information between views through contrastive learning, we achieve the consistent representation learning of multiple views. Furthermore, based on semantic consistency, we comprehensively consider the correlation between views, utilized a weight matrix to fuse cross-view data, and constructed a view with a correlation structure, ultimately obtaining a common consistent representation. We conduct extensive experiments on 4 public datasets including Caltech101-20, BBCSport, Scene-15, and LandUse-21. Experimental results demonstrate that IMVCLRCA has higher accuracy and robustness compared to the state-of-the-art IMVC methods. The anonymous code of this project is available on GitHub at https://github.com/ggg2111/2025WSDM-IMVCLRCA.

SESSION: Poster Session 5

Context Embeddings for Efficient Answer Generation in Retrieval-Augmented Generation

David Rau
Shuai Wang
Hervé Déjean
Stéphane Clinchant
Jaap Kamps

Retrieval-Augmented Generation (RAG) allows overcoming the limited knowledge of LLMs by extending the input with external information. As a consequence, the contextual inputs to the model become much longer slowing down decoding time affecting the time a user has to wait for an answer. We address this challenge by presenting COCOM, an effective context compression method, reducing long contexts to only a handful of Context Embeddings, speeding up the generation time by a large margin. Our method allows for different compression rates, trading off decoding time for answer quality. Compared to earlier methods, COCOM allows for handling multiple contexts more effectively, significantly reducing decoding time for long inputs. Our method demonstrates an inference speed-up of up to 5.69 times while achieving higher performance compared to existing efficient context compression methods

An Aspect Performance-aware Hypergraph Neural Network for Review-based Recommendation

Junrui Liu
Tong Li
Di Wu
Zifang Tang
Yuan Fang
Zhen Yang

Online reviews allow consumers to provide detailed feedback on various aspects of items. Existing methods utilize these aspects to model users' fine-grained preferences for specific item features through graph neural networks. We argue that the performance of items on different aspects is important for making precise recommendations, which has not been taken into account by existing approaches, due to lack of data. In this paper, we propose an aspect performance-aware hypergraph neural network (APH) for the review-based recommendation, which learns the performance of items from the conflicting sentiment polarity of user reviews. Specifically, APH comprehensively models the relationships among users, items, aspects, and sentiment polarity by systematically constructing an aspect hypergraph based on user reviews. In addition, APH aggregates aspects representing users and items by employing an aspect performance-aware hypergraph aggregation method. It aggregates the sentiment polarities from multiple users by jointly considering user preferences and the semantics of their sentiments, determining the weights of sentiment polarities to infer the performance of items on various aspects. Such performances are then used as weights to aggregate neighboring aspects. Experiments on six real-world datasets demonstrate that APH improves MSE, Precision@5, and Recall@5 by an average of 2.30%, 4.89%, and 1.60% over the best baseline. The source code and data are available at https://github.com/dianziliu/APH.

Balancing Revenue and Privacy with Signaling Schemes in Online Ad Auctions

Hongtao Liu
Luxi Chen
Yiming Ding
Changcheng Li
Han Li
Peng Jiang
Weiran Shen

In online ad auctions, when an Internet user's certain actions trigger an auction, the auctioneer (the platform) usually sends the information about the user to help the buyers better estimate their valuations. However, by strategically revealing only partial information, we cannot only improve the revenue of the auction, but also help protect the privacy of the user. In this paper, we propose a privacy measure in the online ad auction setting, and seek to maximize a convex combination of revenue and privacy. We formulate the problem as a convex optimization program and derive structural results and properties of the program. We prove that any combination coefficient achieves a certain fraction of the optimal revenue gain and privacy gain, and that we can trade-off between revenue and privacy by simply tuning the combination coefficient. We also show that the gap between the optimal revenue and the revenue achieved by revealing no information can be bounded by a certain valuation discrepancy between the buyers. We also conduct extensive experiments (on both synthetic and real data) to show the effectiveness of our method.

DeMBR: Denoising Model with Memory Pruning and Semantic Guidance for Multi-Behavior Recommendation

Shuai Zhang
Hua Chu
Jianan Li
Yangtao Zhou
Shirong Wang
Qiaofei Sun

Multi-behavior recommendation systems aim to incorporate auxiliary behaviors (e.g., click, cart, etc.) to enhance the understanding of sparse target behaviors (e.g., purchase), thereby capturing user preferences more accurately. Currently, multi-behavior recommendation research focuses on modeling the associations between different user behaviors, but ignores the large amount of noise in user interaction data. This noise may come from accidental touches, curiosity, or ineffective operations during the purchasing process, and can be further categorized into two types: 1) hard noise is significantly deviates from the user's true preferences, and 2) soft noise is closer to the user's true preferences. The presence of noise can interfere with the model's ability to accurately identify the user's true preferences. To overcome the aforementioned issue, we innovatively propose a Denoising Model with Memory Pruning and Semantic Guidance for Multi-Behavior Recommendation (DeMBR). The model eliminates different types of noise at the data level and the representation level, respectively. Specifically, since hard noise significantly deviates from user preferences, we design a pruning-based denoising module that leverages a memory bank, which identifies and removes hard noise interactions from the data. Since soft noise reflects some user preferences, we design a semantic guidance denoising module that leverages behaviors with strong expressive ability (e.g., purchase) to guide those with weaker ability (e.g., click), effectively suppressing noise while preserving true's preferences. Finally, we designed a cross-learning module that allows noise-identifying signals to be exchanged between the two modules, and ultimately learn representations that accurately reflect user's preferences. Extensive experiments conducted on two public datasets demonstrate that our model substantially surpasses the state-of-the-art recommendation models. Our code is publicly available at: https://github.com/DeMBR2024/DeMBR.git

Progressive Tasks Guided Multi-Source Network for Customer Lifetime Value Prediction in Online Advertising

Zheng Pan
Xingyu Lou
Xiao Jin
Chiye Ou
Feng Liu
Tieyong Zeng
Chengwei He
Xiang Liu
Lilong Wei
Jun Wang

Customer lifetime value (LTV) is crucial to companies who are intending to adopt personalized promoting strategies to optimize the profits. However, LTV prediction in the scenario of online App advertising usually suffers from label sparsity issue, towards which existing methods designed complex model structures but ignored the information contained in intermediate user behaviors. Moreover, previous works mainly focus on fitting the overall LTV distribution, overlooking the fact that LTV in online App advertising consists of sources with diverse data distributions and thus resulting in sub-optimal solutions. In this paper, we propose a novel Progressive Tasks guided Multi-Source Network (PTMSN) to tackle the aforementioned problems. Specifically, a Cascaded Sub-task Module (CSM) is introduced to alleviate data sparsity by modeling reliance between explicit interactions and implicit monetization. In addition, as the overall LTV is assembled from multiple sources, we propose a divide-and-conquer scheme named Multi-source Integrating Module (MIM) to disentangle the original single target into several source distributions and model in a fine-grained manner. Extensive offline experiments on real-world industrial datasets compared to state-of-the-art baseline models validate the effectiveness of our approach. PTMSN has been successfully deployed in industrial online advertising system, serving various business scenarios and acquiring 2.97% absolute ROI gains.

Mining Topics towards ChatGPT Using a Disentangled Contextualized-neural Topic Model

Rui Wang
Xing Liu
Yanan Wang
Shuyu Chang
Yuanzhi Yao
Haiping Huang

Mining topics relevant to the advanced AI dialogue system, such as ChatGPT, from short-length posts on social media poses several challenges for existing topic-mining approaches. Firstly, Bag-Of-Words approaches, including probabilistic topic models and their embedding-based variants, may struggle to extract interpretable topics due to insufficient word co-occurrence. Secondly, contextualized based approaches, built on the autoencoding framework, often yield entangled topic spaces, resulting in the mixing of irrelevant words into topics. To address these limitations, we propose a novel Dis entangled Contextualized-neural Topic Model (DisCTM) based on textual representation learning. DisCTM leverages a pre-trained transformer language model to incorporate word sequence information and deal with the sparsity in short text. Additionally, it employs a topic disentangling mechanism to decorrelate dimensions of the latent topic space, effectively separating semantically irrelevant words into different topics. Extensive experiments have been conducted on three publicly available text corpora, and the results demonstrate the effectiveness of DisCTM in extracting high-quality topics, as measured by topic coherence and diversity metrics.

LightGNN: Simple Graph Neural Network for Recommendation

Guoxuan Chen
Lianghao Xia
Chao Huang

Graph neural networks (GNNs) have demonstrated superior performance in collaborative recommendation through their ability to conduct high-order representation smoothing, effectively capturing structural information within users' interaction patterns. However, existing GNN paradigms face significant challenges in scalability and robustness when handling large-scale, noisy real-world datasets. To address these challenges, we present LightGNN, a lightweight and distillation-based GNN pruning framework designed to substantially reduce model complexity while preserving essential collaboration modeling capabilities. Our LightGNN framework introduces a computationally efficient pruning module that adaptively identifies and removes adverse edges and embedding entries for model compression. The framework is guided by a resource-friendly hierarchical knowledge distillation objective, whose intermediate layer augments the observed graph to maintain performance, particularly in high-rate compression scenarios. Extensive experiments on public datasets demonstrate LightGNN's effectiveness, significantly improving both computational efficiency and recommendation accuracy. Notably, LightGNN achieves an 80% reduction in edge count and 90% reduction in embedding entries while maintaining performance comparable to more complex state-of-the-art baselines. The implementation of our LightGNN model is available at the github repository: https://github.com/HKUDS/LightGNN.

SESSION: Plenary Session 6: Fake News and Anomaly Detection

Revisiting Fake News Detection: Towards Temporality-aware Evaluation by Leveraging Engagement Earliness

Junghoon Kim
Junmo Lee
Yeonjun In
Kanghoon Yoon
Chanyoung Park

Social graph-based fake news detection aims to identify news articles containing false information by utilizing social contexts, e.g., user information, tweets and comments. However, conventional methods are evaluated under less realistic scenarios, where the model has access to future knowledge on article-related and context-related data during training. In this work, we newly formalize a more realistic evaluation scheme that mimics real-world scenarios, where the data is temporality-aware and the detection model can only be trained on data collected up to a certain point in time. We show that the discriminative capabilities of conventional methods decrease sharply under this new setting, and further propose DAWN, a method more applicable to such scenarios. Our empirical findings indicate that later engagements (e.g., consuming or reposting news) contribute more to noisy edges that link real news-fake news pairs in the social graph. Motivated by this, we utilize feature representations of engagement earliness to guide an edge weight estimator to suppress the weights of such noisy edges, thereby enhancing the detection performance of DAWN. Through extensive experiments, we demonstrate that DAWN outperforms existing fake news detection methods under real-world environments. The source code is available at https://github.com/LeeJunmo/DAWN.

GAMED: Knowledge Adaptive Multi-Experts Decoupling for Multimodal Fake News Detection

Lingzhi Shen
Yunfei Long
Xiaohao Cai
Imran Razzak
Guanming Chen
Kang Liu
Shoaib Jameel

Multimodal fake news detection often involves modelling heterogeneous data sources, such as vision and language. Existing detection methods typically rely on fusion effectiveness and cross-modal consistency to model the content, complicating understanding how each modality affects prediction accuracy. Additionally, these methods are primarily based on static feature modelling, making it difficult to adapt to the dynamic changes and relationships between different data modalities. This paper develops a significantly novel approach, GAMED, for multimodal modelling, which focuses on generating distinctive and discriminative features through modal decoupling to enhance cross-modal synergies, thereby optimizing overall performance in the detection process. GAMED leverages multiple parallel expert networks to refine features and pre-embed semantic knowledge to improve the experts' ability in information selection and viewpoint sharing. Subsequently, the feature distribution of each modality is adaptively adjusted based on the respective experts' opinions. GAMED also introduces a novel classification technique to dynamically manage contributions from different modalities, while improving the explainability of decisions. Experimental results on the Fakeddit and Yang datasets demonstrate that GAMED performs better than recently developed state-of-the-art models. The source code can be accessed at https://github.com/slz0925/GAMED.

SESSION: Poster Session 6

Enhancing Code Search Intent with Programming Context Exploration

Yanmin Dong
Zhenya Huang
Zheng Zhang
Guanhao Zhao
Likang Wu
Hongke Zhao
Binbin Jin
Qi Liu

An intelligent code search engine tries to find and suggest a code piece given a developer's query quickly from a large-scale program database, which can significantly promote software development efficiency. Existing solutions can search the relevant codes to some extent. However, most of them fail to precisely understand the search intent of developers since they only mine their natural language queries, while ignoring the valuable programming context (e.g., the code written by the developer). In this paper, we study the novel problem of context-aware code search. To promote a step forward, we first provide the CodeSearchNet-C dataset with constructing sufficient programming context from the GitHub website for each query-code instance. The dataset is supplemented on the CodeSearchNet benchmark, ensuring both generality and comparability for relevant research. Then, by analyzing the characteristics of programming context, we propose a novel two-stage Context-aware Code Retrieval (ConCR) framework. In the first stage, we propose a Context Walking algorithm, which simulates the programming habits of different developers. The generated programming context could ensure the diversity of search intent among developers. In the second stage, imitating the reading habits of developers, we introduce a novel Context Hierarchical Encoder, to understand the search intent with contextual information from local to global. Our ConCR framework is general, and we give three implementations on the basis of typical code search models as backbones. Extensive experimental results clearly prove that our ConCR significantly enhances the code search performance, effectively fulfilling developers' needs for efficient code resource searching on the web. These results also verify the necessity of introducing programming context to understand developers' intent.

AMLCDR: An Adaptive Meta-Learning Model for Cross-Domain Recommendation by Aligning Preference Distributions

Fanqi Meng
Zhiyuan Zhang

The issue of data sparsity poses a formidable challenge in the field of recommender systems. Encouragingly, leveraging the interactions among overlapping users in the source domain can enhance item recommendation in the target domain. The transfer of user preferences across domains is a crucial concern in the cross-domain recommendation and represents a hopeful method to address data sparsity. Most existing methods transfer users' preference information by building a preference transfer network. These methods focus on the cross-domain mapping of preference features and ignore the inherent data distribution differences between the source domain and target domain. Consequently, the mapped user embeddings do not align with the item embeddings in the target domain and the recommendation quality decreases. On this basis, we propose a new method called Adaptive Meta-Learning for Cross-Domain Recommendation (AMLCDR). The method includes a meta-learning network for fully extracting user characteristics and generating a transfer network to reduce the user preference loss, as well as a domain adaptation network to align user preference distributions. We perform comprehensive experiments to assess the efficacy of AMLCDR by utilizing a substantial real-world dataset. We validate the effectiveness of data distribution alignment in domain adaptation. For diverse cross-domain recommendation tasks under different start conditions, AMLCDR outperforms state-of-the-art models in multiple evaluation metrics.

HACD: Harnessing Attribute Semantics and Mesoscopic Structure for Community Detection

Anran Zhang
Xingfen Wang
Yuhan Zhao

Community detection plays a pivotal role in uncovering closely connected subgraphs, aiding various real-world applications such as recommendation systems and anomaly detection. With the surge of rich information available for entities in real-world networks, the community detection problem in attributed networks has attracted widespread attention. While previous research has effectively leveraged network topology and attribute information for attributed community detection, these methods overlook two critical issues: (i) the semantic similarity between node attributes within the community, and (ii) the inherent mesoscopic structure, which differs from the pairwise connections of the micro-structure. To address these limitations, we propose HACD, a novel attributed community detection model based on heterogeneous graph attention networks. HACD treats node attributes as another type of node, constructs attributed networks into heterogeneous graph structures and employs attribute-level attention mechanisms to capture semantic similarity. Furthermore, HACD introduces a community membership function to explore mesoscopic community structures, enhancing the robustness of detected communities. Extensive experiments demonstrate the effectiveness and efficiency of HACD, outperforming state-of-the-art methods in attributed community detection tasks. Our code is publicly available at https://github.com/Anniran1/HACD1-wsdm https://github.com/Anniran1/HACD1-wsdm.

IMPO: Interpretable Memory-based Prototypical Pooling

Alessio Ragno
Roberto Capobianco

Graph Neural Networks (GNNs) have proven their effectiveness in various graph-structured data applications. However, one of the significant challenges in the realm of GNNs is representation learning, a critical concept that bridges graph pooling, aimed at creating compressed graph representations, and explainable artificial intelligence, which focuses on building models with transparent reasoning mechanisms. This research paper introduces a novel approach called Interpretable Memory-based Prototypical Pooling (IMPO) to address this challenge. IMPO is a graph pooling layer designed to enhance the interpretability of GNNs while maintaining high performance in graph classification tasks. It builds upon the MemPool algorithm and incorporates prototypical components to cluster nodes around class-aware centroids. This approach allows IMPO to selectively aggregate relevant substructures, paving the way for generating more interpretable graph representations. The experimental results in our study underscore the potential of pooling architectures in constructing inherently explainable GNNs. Notably, IMPO achieves state-of-the-art results in both classification and explanatory capacities across a diverse set of graph classification datasets.

Personalised Outfit Recommendation via History-aware Transformers

Myong Chol Jung
Julien Monteil
Philip Schulz
Volodymyr Vaskovych

We present the history-aware transformer (HAT), a transformer-based model that uses shoppers' purchase history to personalise outfit predictions. The aim of this work is to recommend outfits that are internally coherent while matching an individual shopper's style and taste. To achieve this, we stack two transformer models, one that produces outfit representations and another one that processes the history of purchased outfits for a given shopper. We use these models to score an outfit's compatibility in the context of a shopper's preferences as inferred from their previous purchases. During training, the model learns to discriminate between purchased and random outfits using 3 losses: the focal loss for outfit compatibility typically used in the literature, a contrastive loss to bring closer learned outfit embeddings from a shopper's history, and an adaptive margin loss to facilitate learning from weak negatives. Together, these losses enable the model to make personalised recommendations based on a shopper's purchase history.

Our experiments on the IQON3000 and Polyvore datasets show that HAT outperforms strong baselines on the outfit Compatibility Prediction (CP) and the Fill In The Blank (FITB) tasks. The model improves AUC for the CP hard task by 15.7% (IQON3000) and 19.4% (Polyvore) compared to previous SOTA results. It further improves accuracy on the FITB hard task by 6.5% and 9.7%, respectively. We provide ablation studies on the personalisation, constrastive loss, and adaptive margin loss that highlight the importance of these modelling choices.

DTPN: A Diffusion-based Traffic Purification Network for Tor Website Fingerprinting

Chenchen Yang
Xi Xiao
Guangwu Hu
Zhen Ling
Hao Li
Bin Zhang

Website Fingerprinting attack is a type of method used to classify network traffic generated by users on the Tor (The Onion Router) based on the websites they visit, leading to the leakage of individuals' privacy . For Website Fingerprinting attack, network traffic defense methods involve adding noise to the original network traffic to render the attacker's methods ineffective. Previous attack methods primarily focused on improving classification accuracy by enhancing the attack model, with adversarial training being the most common approach. However, adversarial training requires frequent updates and exhibits poor generalization when dealing with previously unseen network traffic protection methods. In order to address the limitations of adversarial training, a novel method is proposed leveraging a diffusion model for network traffic purification. This paper is the first to use a diffusion model to resist network traffic defense based on adversarial perturbations. The diffusion models are theoretically suited for data purification in the training mode, i.e., removing noises generated by adversarial perturbations from the data. Our method enables existing network traffic classification methods to maintain effective classification of network traffic after protection without requiring retraining, while also achieving good generalization performance with previously unseen network traffic defense methods. The purified network traffic data can effectively improve the robustness of existing website fingerprinting methods. Experiments conducted under various network traffic defense strategies demonstrate that the proposed method increases accuracy by up to 60.8% on DF dataset and 50.3% on CW100 dataset, respectively, compared to adversarial training.

Density-aware and Cluster-based Federated Anomaly Detection on Data Streams

Bin Li
Li Cheng
Zheng Qin
Yunlong Wu

Federated active anomaly detection on data streams becomes a crucial research problem, since it attempts to discover anomalous data with protecting data privacy and avoiding extensive data labeling. Although extensive work has been conducted on anomaly detection, distinguishing similar anomalies of different categories still remains quite a challenging issue. The requirement of privacy protection in federated settings aggravates the difficulties for instance query and scoring in active anomaly detection when solving this issue. To the best of our knowledge, limited work has focused on this research area. Therefore, we propose Density-aware and cluster-based Federated Active anomaly detection on data Streams, called DFAS. We design a novel lightweight federated anomaly detection clusters with density-aware hash cells, which successfully capture evolving data distribution. The federated anomaly detection clusters are incrementally updated with an acceptable theoretical reconstruction error guarantee. In addition, we propose a straightforward but effective metric divergences accompanied by a greedy search algorithm, which takes both global aggregation bias mitigation and efficiency into account. At last, DFAS detects anomalies and queries the instances for manual labels by measuring the density in hash cells of each cluster, effectively distinguishing closely distributed anomaly classes while maintaining data privacy in the federated setting. Comprehensive experiments on several real-world data sets show that DFAS outperforms previous methods, improving F1 scores by up to 26.7%.

SESSION: Plenary Session 7: Bias in Recommendations

How Do Recommendation Models Amplify Popularity Bias? An Analysis from the Spectral Perspective

Siyi Lin
Chongming Gao
Jiawei Chen
Sheng Zhou
Binbin Hu
Yan Feng
Chun Chen
Can Wang

Recommendation Systems (RS) are often plagued by popularity bias. When training a recommendation model on a typically long-tailed dataset, the model tends to not only inherit this bias but often exacerbate it, resulting in over-representation of popular items in the recommendation lists. This study conducts comprehensive empirical and theoretical analyses to expose the root causes of this phenomenon, yielding two core insights: 1) Item popularity is memorized in the principal spectrum of the score matrix predicted by the recommendation model; 2) The dimension reduction phenomenon amplifies the relative prominence of the principal spectrum, thereby intensifying the popularity bias.

Building on these insights, we propose a novel debiasing strategy that leverages a spectral norm regularizer to penalize the magnitude of the principal singular value. We have developed an efficient algorithm to expedite the calculation of the spectral norm by exploiting the spectral property of the score matrix. Extensive experiments across seven real-world datasets and three testing paradigms have been conducted to validate the superiority of the proposed method.

Exploration and Exploitation of Hard Negative Samples for Cross-Domain Sequential Recommendation

Yidan Wang
Xuri Ge
Xin Chen
Ruobing Xie
Su Yan
Xu Zhang
Zhumin Chen
Jun Ma
Xin Xin

Negative sampling plays a crucial role for cross-domain recommendation as it provides contrastive signals to learn user preference. Existing methods usually select items with high predicted scores or popularity as hard negative samples to improve model training. However, such methods suffer from choosing false negative samples since items with high predicted scores or popularity could also indicate potential positive user preference. Although several studies devoted to discovering true negative samples, few of them leverage user cross-domain behaviors to alleviate the false negative issue. How to effectively mine and utilize hard negative samples to improve cross-domain recommendation remains an open question.

In this work, we propose exploration and exploitation of hard negative samples (EXHANS) for cross-domain sequential recommendation. For better exploration, we utilize the user preference from the source domain to guide negative sampling in the target domain. The key idea is that compared with hard negative samples, false negative samples have higher probability to be consistent with the user preference in both domains. Besides, we propose adaptive popularity-based score correction to account for users' different tastes of popular items. The idea is that for users who favor popular items, such items are more likely to be false negatives rather than hard negatives. For better exploitation, we design a replay buffer to cache the obtained negative samples and further propose a curriculum learning framework to balance exploration and exploitation of hard negative samples. Extensive experiments on three real-world datasets show that our method significantly outperforms state-of-the-art negative sampling methods for cross-domain sequential recommendation, which verify the effectiveness of EXHANS.

SESSION: Poster Session 7

Training MLPs on Graphs without Supervision

Zehong Wang
Zheyuan Zhang
Chuxu Zhang
Yanfang Ye

Graph Neural Networks (GNNs) have demonstrated their effectiveness in various graph learning tasks, yet their reliance on neighborhood aggregation during inference poses challenges for deployment in latency-sensitive applications, such as real-time financial fraud detection. To address this limitation, recent studies have proposed distilling knowledge from teacher GNNs into student Multi-Layer Perceptrons (MLPs) trained on node content, aiming to accelerate inference. However, these approaches often inadequately explore structural information when inferring unseen nodes. To this end, we introduce SimMLP, a Self-supervised framework for learning MLPs on graphs, designed to fully integrate rich structural information into MLPs. Notably, SimMLP is the first MLP-learning method that can achieve equivalence to GNNs in the optimal case. The key idea is to employ self-supervised learning to align the representations encoded by graph context-aware GNNs and neighborhood dependency-free MLPs, thereby fully integrating the structural information into MLPs. We provide a comprehensive theoretical analysis, demonstrating the equivalence between SimMLP and GNNs based on mutual information and inductive bias, highlighting SimMLP's advanced structural learning capabilities. Additionally, we conduct extensive experiments on 20 benchmark datasets, covering node classification, link prediction, and graph classification, to showcase SimMLP's superiority over state-of-the-art baselines, particularly in scenarios involving unseen nodes (e.g., inductive and cold-start node classification) where structural insights are crucial. Our codes are available at: https://github.com/Zehong-Wang/SimMLP.

Explainable CTR Prediction via LLM Reasoning

Xiaohan Yu
Li Zhang
Chong Chen

Recommendation Systems have become integral to modern user experiences, but lack transparency in their decision-making processes. Existing explainable recommendation methods are hindered by reliance on a post-hoc paradigm, wherein explanation generators are trained independently of the underlying recommender models. This paradigm necessitates substantial human effort in data construction and raises concerns about explanation reliability. In this paper, we present ExpCTR, a novel framework that integrates large language model based explanation generation directly into the CTR prediction process. Inspired by recent advances in reinforcement learning, we employ two carefully designed reward mechanisms, LC alignment, which ensures explanations reflect user intentions, and IC alignment, which maintains consistency with traditional ID-based CTR models. Our approach incorporates an efficient training paradigm with LoRA and a three-stage iterative process. ExpCTR circumvents the need for extensive explanation datasets while fostering synergy between CTR prediction and explanation generation. Experimental results demonstrate that ExpCTR significantly enhances both recommendation accuracy and interpretability across three real-world datasets.

Adaptive Graph Enhancement for Imbalanced Multi-relation Graph Learning

Yiyue Qian
Tianyi Ma
Chuxu Zhang
Yanfang Ye

Graph Neural Networks (GNNs), as the mainstream graph representation learning method, has demonstrated its effectiveness in learning graph embeddings over benchmark datasets. However, existing GNNs still have limitations in handling real-world graphs in the following aspects: (i) nodes in most real-world graphs are inherently class-imbalanced; (ii) node degrees vary considerably in real-world graphs; (iii) most existing works study these two issues separately but ignore the co-occurrence between class imbalance and topology imbalance in graphs. They overlook the fact that topology imbalance varies significantly across different relation types. Hence, we propose a novel model called AD-GSMOTE (Adaptive Graph SMOTE) to tackle the class and topology issues simultaneously in multi-relation graphs. Specifically, we first design an adaptive topology-aware node generator and an efficient triadic edge generator to enhance the graph structure under each relation type by generating synthetic nodes for all tail nodes in minority classes and creating rich connections among tail nodes and others. Then, the enhanced multi-relation graph is fed into a GNN encoder to get node embeddings. Afterward, a class-aware logit adjustment module is designed to adjust the pre-softmax logit during model training, which enables the model to learn larger margins between minority and majority classes. To evaluate the performance of AD-GSMOTE, we build a new real-world graph (Twitter-Drug) to classify user roles in the drug trafficking community. The excellent performance on three real-world graphs demonstrates the effectiveness and efficiency of AD-GSMOTE compared with state-of-the-art methods. Source code and dataset are available at https://github.com/graphprojects/AD-GSMOTE https://github.com/graphprojects/AD-GSMOTE.

DimeRec: A Unified Framework for Enhanced Sequential Recommendation via Generative Diffusion Models

Wuchao Li
Rui Huang
Haijun Zhao
Chi Liu
Kai Zheng
Qi Liu
Na Mou
Guorui Zhou
Defu Lian
Yang Song
Wentian Bao
Enyun Yu
Wenwu Ou

Sequential Recommendation (SR) plays a pivotal role in recommender systems by tailoring recommendations to user preferences based on their non-stationary historical interactions. Achieving high-quality performance in SR requires attention to both item representation and diversity. However, designing an SR method that simultaneously optimizes these merits remains a long-standing challenge. In this study, we address this issue by integrating recent generative Diffusion Models (DM) into SR. DM has demonstrated utility in representation learning and diverse image generation. Nevertheless, a straightforward combination of SR and DM leads to sub-optimal performance due to discrepancies in learning objectives (recommendation vs. noise reconstruction) and the respective learning spaces (non-stationary vs. stationary). To overcome this, we propose a novel framework called DimeRec (Di ffusion with multi-interest enhanced Rec ommender). DimeRec synergistically combines a guidance extraction module (GEM) and a generative diffusion aggregation module (DAM). The GEM extracts crucial stationary guidance signals from the user's non-stationary interaction history, while the DAM employs a generative diffusion process conditioned on GEM's outputs to reconstruct and generate consistent recommendations. Our numerical experiments demonstrate that DimeRec significantly outperforms established baseline methods across three publicly available datasets. Furthermore, we have successfully deployed DimeRec on a large-scale short video recommendation platform, serving hundreds of millions of users. Live A/B testing confirms that our method improves both users' time spent and result diversification.

An Edge-Based Decomposition Framework for Temporal Networks

Lutz Oettershagen
Athanasios L. Konstantinidis
Giuseppe F. Italiano

A temporal network is a dynamic graph where every edge is assigned an integer time label that indicates at which discrete time step the edge is available. We consider the problem of hierarchically decomposing the network and introduce an edge-based decomposition framework that unifies the core and truss decompositions for temporal networks while allowing us to consider the network's temporal dimension. Based on our new framework, we introduce the (k,∆)-core and (k,∆)-truss decompositions, which are generalizations of the classic k-core and k-truss decompositions for multigraphs. Moreover, we show how (k,∆)-cores and (k,∆)-trusses can be efficiently further decomposed to obtain spatially and temporally connected components. We evaluate the characteristics of our new decompositions and the efficiency of our algorithms. Moreover, we demonstrate how our (k,∆)-decompositions can be applied to analyze malicious content in a Twitter network to obtain insights that state-of-the-art baselines cannot obtain.

Fusion Matters: Learning Fusion in Deep Click-through Rate Prediction Models

Kexin Zhang
Fuyuan Lyu
Xing Tang
Dugang Liu
Chen Ma
Kaize Ding
Xiuqiang He
Xue Liu

The evolution of previous Click-Through Rate (CTR) models has mainly been driven by proposing complex components, whether shallow or deep, that are adept at modeling feature interactions. However, there has been less focus on improving fusion design. Instead, two naive solutions, stacked and parallel fusion, are commonly used. Both solutions rely on pre-determined fusion connections and fixed fusion operations. It has been repetitively observed that changes in fusion design may result in different performances, highlighting the critical role that fusion plays in CTR models. While there have been attempts to refine these basic fusion strategies, these efforts have often been constrained to specific settings or dependent on specific components. Neural architecture search has also been introduced to partially deal with fusion design, but it comes with limitations. The complexity of the search space can lead to inefficient and ineffective results. To bridge this gap, we introduce OptFusion, a method that automates the learning of fusion, encompassing both the connection learning and the operation selection. We have proposed a one-shot learning algorithm tackling these tasks concurrently. Our experiments are conducted over three large-scale datasets. Extensive experiments prove both the effectiveness and efficiency of OptFusion in improving CTR model performance. Our code implementation is available here https://github.com/kexin-kxzhang/OptFusion.

SESSION: Plenary Session 8: Multimodal Data and Time Series Analysis

InstrucTime: Advancing Time Series Classification with Multimodal Language Modeling

Mingyue Cheng
Yiheng Chen
Qi Liu
Zhiding Liu
Yucong Luo
Enhong Chen

For the advancement of time series classification, we can summarize that most existing methods adopt a common learning-to-classify paradigm - a classifier model tries to learn the relation between sequence inputs and target label encoded by one-hot distribution. Although effective, this paradigm conceals two inherent limitations: (1) one-hot distribution fails to reflect the comparability and similarity between labels, and (2) it is difficult to learn transferable representation across domains. In this work, we propose InstructTime, a novel attempt to reshape time series classification as a learning-to-generate paradigm. Relying on the generative capacity of the pre-trained language model, the core idea is to formulate the classification of time series as a multimodal understanding task. Specifically, firstly, a time series discretization module is designed to convert continuous inputs into a sequence of discrete tokens to solve the inconsistency issue across modality data. Secondly, we introduce an alignment projected layer before feeding the transformed token of time series into language models. Thirdly, prior to fine-tuning the language model for the target domain, it is essential to emphasize the necessity of auto-regressive pre-training across various modality inputs. Finally, extensive experimentation are conducted on several prevalent public benchmark datasets, indicating the superior performance of the InstructTime. Our code is at https://github.com/Mingyue-Cheng/InstructTime.

SESSION: Poster Session 8

DLCRec: A Novel Approach for Managing Diversity in LLM-Based Recommender Systems

Jiaju Chen
Chongming Gao
Shuai Yuan
Shuchang Liu
Qingpeng Cai
Peng Jiang

The integration of Large Language Models (LLMs) into recommender systems has led to substantial performance improvements. However, this often comes at the cost of diminished recommendation diversity, which can negatively impact user satisfaction. To address this issue, controllable recommendation has emerged as a promising approach, allowing users to specify their preferences and receive recommendations that meet their diverse needs. Despite its potential, existing controllable recommender systems frequently rely on simplistic mechanisms, such as a single prompt, to regulate diversity-an approach that falls short of capturing the full complexity of user preferences. In response to these limitations, we propose DLCRec, a novel framework designed to enable fine-grained control over diversity in LLM-based recommendations. Unlike traditional methods, DLCRec adopts a well-designed task decomposition strategy, breaking down the recommendation process into three sequential sub-tasks: genre prediction, genre filling, and item prediction. These sub-tasks are trained independently and inferred sequentially according to user-defined control numbers, ensuring more precise control over diversity. Furthermore, the scarcity and uneven distribution of diversity-related user behavior data pose significant challenges for fine-tuning. To overcome these obstacles, we introduce two data augmentation techniques that enhance the model's robustness to noisy and out-of-distribution data. These techniques expose the model to a broader range of patterns, improving its adaptability in generating recommendations with varying levels of diversity. Our extensive empirical evaluation demonstrates that DLCRec not only provides precise control over diversity but also outperforms state-of-the-art baselines across multiple recommendation scenarios.

Reindex-Then-Adapt: Improving Large Language Models for Conversational Recommendation

Zhankui He
Zhouhang Xie
Harald Steck
Dawen Liang
Rahul Jha
Nathan Kallus
Julian McAuley

Large Language Models (LLMs) are revolutionizing conversational recommender systems (CRS) by effectively indexing item content, understanding complex conversational contexts, and generating relevant item titles. However, the autoregressive nature of LLMs, which outputs item titles as a long sequence of subtokens, hinders the ability to efficiently obtain and control recommendations across the entire item set. This challenge in calculating probabilities over all items limits LLMs' potential, such as (1) limiting control over recommendation popularities and (2) preventing the synergy of marrying LLMs and traditional recommender systems (RecSys).

To address this challenge, we propose the Reindex-Then-Adapt (RTA) framework. It consists of two steps: (1) Reindex: a lightweight network learns to condense multi-token item titles into single tokens within the LLM and distills LLM-generated recommendations as ranked lists. This bypasses the autoregressive nature of LLMs while trying to preserve their CRS abilities; (2) Adapt: LLMs after reindexing enable efficient adjustment of probability distributions over single-token titles, further enhanced through RecSys integration. RTA bridges the strengths of LLMs and RecSys, enabling understanding of complex queries as LLMs do, while efficiently controlling recommended item distributions as in traditional RecSys. We show the effectiveness of our RTA over base LLMs across three CRS datasets with negligible additional parameters.

SESSION: Plenary Session 9: Emerging Topics in Data Mining

Predicting Eviction Status Using Airbnb Data in the Absence of Ground-Truth Eviction Records

Maryam Tabar
Anusha Abdulla
J. Andrew Petersen
Dongwon Lee

The eviction of tenants is a pressing problem, which is prevalent among low-income renters in the USA, and has devastating consequences. Despite the presence of various measures to combat evictions, identifying high-need regions and tenant groups is highly challenging in many regions due to a lack of access to eviction records (partly because of some infrastructural/policy constraints). In response to this information gap, this paper proposes a solution driven by Machine Learning (ML) to monitor eviction status at various spatial resolutions using Airbnb data when ground-truth eviction data is inaccessible. In particular, we begin by demonstrating the potential of utilizing Airbnb data to build ML-driven methods for distinguishing different neighborhoods across different spatial resolutions with respect to eviction status. We then proceed to develop an ML model capable of learning eviction status levels from Airbnb data, even in the absence of ground-truth labels. Empirical evidence is presented, showcasing the model's performance on par with several robust fully-supervised ML models that had access to ground-truth labels during training. Finally, we conduct a set of cross-region tests to comprehensively study the generalizability of the achieved performance across various unseen regions in the USA that were not used during model training. The code of this project can be accessed via https://github.com/maryam-tabar/Airbnb-Eviction.

Improving Scientific Document Retrieval with Concept Coverage-based Query Set Generation

SeongKu Kang
Bowen Jin
Wonbin Kweon
Yu Zhang
Dongha Lee
Jiawei Han
Hwanjo Yu

In specialized fields like the scientific domain, constructing large-scale human-annotated datasets poses a significant challenge due to the need for domain expertise. Recent methods have employed large language models to generate synthetic queries, which serve as proxies for actual user queries. However, they lack control over the content generated, often resulting in incomplete coverage of academic concepts in documents. We introduce Concept Coverage-based Query set Generation (CCQGen) framework, designed to generate a set of queries with comprehensive coverage of the document's concepts. A key distinction of CCQGen is that it adaptively adjusts the generation process based on the previously generated queries. We identify concepts not sufficiently covered by previous queries, and leverage them as conditions for subsequent query generation. This approach guides each new query to complement the previous ones, aiding in a thorough understanding of the document. Extensive experiments demonstrate that CCQGen significantly enhances query quality and retrieval performance.

Demystify Epidemic Containment in Directed Networks: Theory and Algorithms

Yinhan He
Chen Chen
Song Wang
Guanghui Min
Jundong Li

Epidemic containment has long been a crucial task in many high-stake application domains, ranging from public health to misinformation dissemination. Existing studies for epidemic containment are primarily focused on undirected networks, assuming that the infection rate is constant throughout the contact network regardless of the strength and direction of contact. However, such an assumption can be unrealistic given the asymmetric nature of the real-world infection process. To tackle the epidemic containment problem in directed networks, simply grafting the methods designed for undirected network can be problematic, as most of the existing methods rely on the orthogonality and Lipschitz continuity in the eigensystem of the underlying contact network, which do not hold for directed networks. In this work, we derive a theoretical analysis on the general epidemic threshold condition for directed networks and show that such threshold condition can be used as an optimization objective to control the spread of the disease. Based on the epidemic threshold, we propose an asymptotically greedy algorithm DINO (DIrected NetwOrk epidemic containment) to identify the most critical nodes for epidemic containment. The proposed algorithm is evaluated on real-world directed networks, and the results validate its effectiveness and efficiency. The code is available at https://github.com/YinhanHe123/DINO/.

SESSION: Poster Session 9

Adaptive Loss-based Curricula for Neural Team Recommendation

Reza Barzegar
Marco Nikola Kurepa
Hossein Fani

Neural team recommendation models have excelled at recommending collaborative teams of experts who, more likely than not, can solve complex tasks. Yet, they suffer from popularity bias due to the disproportionate distribution of popular experts over many teams and the sparse long-tailed distribution of non-popular ones in training datasets, overlooking the difficulty of recommending hard non-popular vs. easy popular experts. To bridge the gap, we propose three curriculum-based learning strategies to empower neural team recommenders sifting through easy popular and hard non-popular experts and to mitigate popularity bias and improve upon them. We propose (1) a parametric curriculum that assigns a learnable parameter to each expert enabling the model to learn an expert's levels of difficulty (or conversely, levels of popularity) during training, (2) a parameter-free (non-parametric) curriculum that presumes the worst-case difficulty for each expert based on the model's loss, and (3) a static curriculum to provide a minimum base for comparison amongst curriculum-based learning strategies and lack thereof. Our experiments on two benchmark datasets with distinct distributions of teams over skills showed that our parameter-free curriculum improved the performance of non-variational models across different domains, outperforming its parametric counterpart, and the static curriculum was the poorest. Moreover, among neural models, variational models obtain little to no gain from our proposed curricula, urging further research on more effective curricula for them. The code to reproduce our experiments is publically available at https://github.com/fani-lab/OpeNTF/tree/cl-wsdm25.

How Does Memorization Impact LLMs' Social Reasoning? An Assessment using Seen and Unseen Queries

Maryam Amirizaniani
Maryna Sivachenko
Adrian Lavergne
Chirag Shah
Afra Mashhadi

As Large Language Models (LLMs) have rapidly advanced in social reasoning tasks, their applications have expanded to domains such as healthcare and psychology. Given the direct interaction of users with these applications, it is essential to evaluate the performance of LLMs, particularly in human-like social reasoning capabilities. While previous studies have explored human-aligned social reasoning in LLMs, they have not adequately assessed whether the generated reasoning answers stem from the LLMs' memorization of training data or their natural language understanding. In this study, we aim to address this gap by assessing the impact of training data memorization on the human-aligned social reasoning capabilities of LLMs. We introduce IR+CoT (Information Retrieval (IR) + Chain of Thought (CoT)), a framework that leverages retrieved information from input questions to fine-tune prompt templates and employs CoT methods. IR+CoT mitigates the effects of memorization and enhances the LLMs' social reasoning performance. Experiments on three LLMs, using seen (present during the training of the LLMs) and unseen (introduced post-training) questions from Reddit and Lemmy, show that IR+CoT enhances social reasoning and reduces memorization effects. This research's novelty lies in using old and new questions to assess memorization's impact on social reasoning.

SESSION: Tutorials

Query Performance Prediction: Theory, Techniques and Applications

Negar Arabzadeh
Chuan Meng
Mohammad Aliannejadi
Ebrahim Bagheri

Query performance prediction (QPP) is a key task in information retrieval (IR), focusing on estimating the retrieval quality of a given query without relying on human-labeled relevance judgments. Over the decades, QPP has gained increasing significance, with a surge in research activity in recent years. It has proven to benefit various aspects of retrieval, such as optimizing retrieval effectiveness by selecting the most appropriate ranking function for each query.

Despite its critical role, there were only a few tutorials that cover the QPP techniques. The topic is even playing a more important role in the new era of pre-trained and large language models (LLMs), and the emerging fields of multi-agent intelligent systems and conversational search (CS ). Moreover, while research in QPP has yielded promising outcomes, studies on its practical application and integration into real-world search engines remain limited.

This tutorial has four main objectives. First, it aims to cover both the fundamentals and the latest advancements in QPP methods. Second, it broadens the scope of QPP beyond ad-hoc search to various search scenarios, e.g., CS and image search. Third, this tutorial provides a comprehensive review of QPP applications across various aspects of IR, providing insights on where and how to apply QPP in practice. Fourth, we equip participants with hands-on materials, enabling them to apply QPP implementation in practice. This tutorial seeks to benefit both researchers and practitioners in IR, encouraging further exploration and innovation in QPP.

Advances in Vector Search

Sebastian Bruch

Whether a text document is freed from the rules of grammar, stripped of word order, and thereby turned into a bag of words, or whether its semantic nuances learnt and condensed into an embedding space, its final representation is the same mathematical object: a vector. In fact, vectors represent much more than just text documents. Any object, be it a document or query, that contains text, images, speech, or a mix of these modalities, is often represented as a vector. Collect a large enough quantity of these vectors and the fundamental question of retrieval from the Information Retrieval (IR) discipline becomes urgently relevant: Finding k vectors that are more similar to a query. This full-day tutorial is concerned with the question above and intends to cover foundational concepts and advanced algorithms for vector retrieval or vector search. The tutorial begins with a focus on foundational concepts, including a brief history from space partitioning, to locality-sensitive hashing, graph-based, and clustering-based methods. As we discuss each class of solutions, we show failure scenarios and explain why they prove insufficient. We conclude the tutorial by turning our attention in the second half to recent developments for maximum inner product search over dense and sparse vectors, as well as open questions that need further research. Through this tutorial, we wish to recap the fascinating topic of retrieval in modern IR for the community, lower barriers of entry into this rich area of research, and inspire interest in conducting research on the underlying theoretical and empirical questions that are specific to IR.

Tutorial on Recommendation with Generative Models (Gen-RecSys)

Yashar Deldjoo
Zhankui He
Julian McAuley
Anton Korikov
Scott Sanner
Arnau Ramisa
René Vidal
Maheswaran Sathiamoorthy
Atoosa Kasirzadeh
Silvia Milano

This intermediate-level tutorial, titled "Gen-RecSys", merges both industrial and academic perspectives on recent advances in Generative AI for recommender systems (beyond LLMs). It aims to highlight the transformative role of generative models in modern recommender systems, which have significantly impacted the AI field-particularly with the rise of large language models (LLMs) like ChatGPT-and have contributed to a rapid convergence of the fields of search, data mining, and recommendation. By providing attendees with a modern perspective on GenAI applications in recommendation, the tutorial will emphasize how generative models can drive recommendation by unlocking and interacting with rich data representations, including behavioral, textual, and multi-modal data-knowledge highly transferable across many applications of interest to the WSDM community. Participants will learn about the categorization of generative models in recommender systems based on underlying data modalities: (i) ID-based collaborative models, (ii) text-driven models such as LLMs, and (iii) multi-modal models. Within each category, various deep generative model paradigms (e.g., AR, GAN, diffusion models) will be introduced, along with insights into their application areas. The tutorial will also cover evaluation aspects, including benchmarks, metrics, and assessments of social and ethical impacts and harms. This tutorial presents a condensed version of the industrial and academic work featured in the forthcoming book at FntIR 2024-25, titled "Recommendation with Generative Models [7]," and a shorter version prepared, and presented by the team, see GenRecSys-Survey [6].

Bridging Historical Subgraph Optimization and Modern Graph Neural Network Approaches in Team Recommendation

Mahdis Saeedi
Christine Wong
Hossein Fani

Team recommendation involves selecting experts with certain skills to form a successful task-oriented team. This tutorial provides a comprehensive study of conventional graph-based and a detailed review of cutting-edge neural network-based methods through unified definitions and formulations, along with insights into future research directions and real-world applications.

Towards Secure and Robust Recommender Systems: A Data-Centric Perspective

Zongwei Wang
Junliang Yu
Tong Chen
Hongzhi Yin
Shazia Sadiq
Min Gao

As recommender systems (RS) continue to evolve, the field has seen a pivotal shift from model-centric to data-centric paradigms, where the quality, integrity, and security of data are increasingly becoming the key drivers of system performance and personalization. This transformation has unlocked new avenues for more precise recommendations, yet it also introduces significant challenges. As reliance on data intensifies, RS face mounting threats that can compromise both their effectiveness and user trust. These challenges include (1) Malicious Data Manipulation, where adversaries corrupt or tamper with datasets, distorting recommendation outcomes and undermining system reliability; (2) Data Privacy Leakage, where adversarial actors exploit system outputs to infer sensitive user information, leading to serious privacy concerns; and (3) Erroneous Data Noise, where inaccuracies, inconsistencies, and redundant data obscure the true user preferences, degrading recommendation quality and user satisfaction. By focusing on these critical data-centric challenges, this tutorial aims to equip participants with the knowledge to build RS that are secure, privacy-preserving, and resilient to data-driven threats, ensuring reliable and trustworthy performance in real-world environments. In addition, attendees will gain hands-on experience with our newly released toolkit for RS-based attacks and defenses, providing them with practical, actionable insights into safeguarding RS against emerging vulnerabilities.

SESSION: Demonstrations

SAGESSE: A System for Argument Generation, Extraction and Structuring of Social Exchanges

Nicolas Almerge
Matteo Santelmo
Ilker Gül
Amin Asadi Sarijalou
Rémi Lebret
Léo Laugier
Karl Aberer

Online debates provide critical insights into public opinion and societal trends, yet the unstructured nature of these discussions presents significant challenges for analysis. In this paper, we present SAGESSE, a novel argumentation parsing pipeline tailored to Reddit debates, leveraging the capabilities of large language models to structure and interpret complex online arguments. SAGESSE generates detailed argument maps that organize debates systematically, offering a clearer understanding of discourse dynamics. We have developed a web application where users can select controversial Reddit topics and visualize the corresponding argument maps generated from user comments. This tool has the potential to aid analysts, policymakers, and researchers in tracking debate progress, gauging public sentiment, and identifying influential arguments. The web application is available at https://modemos.epfl.ch/sagesse/ .

CRS Arena: Crowdsourced Benchmarking of Conversational Recommender Systems

Nolwenn Bernard
Hideaki Joko
Faegheh Hasibi
Krisztian Balog

We introduce CRS Arena, a research platform for scalable benchmarking of Conversational Recommender Systems (CRS) based on human feedback. The platform displays pairwise battles between anonymous conversational recommender systems, where users interact with the systems one after the other before declaring either a winner or a draw. CRS Arena collects conversations and user feedback, providing a foundation for reliable evaluation and ranking of CRSs. We conduct experiments with CRS Arena on both open and closed crowdsourcing platforms, confirming that both setups produce highly correlated rankings of CRSs and conversations with similar characteristics. We release CRSArena-Dial, a dataset of 474 conversations and their corresponding user feedback, along with a preliminary ranking of the systems based on the Elo rating system. The platform is accessible at https://iai-group-crsarena.hf.space/.

A Shopping Agent for Addressing Subjective Product Needs

Preetam Prabhu Srikar Dammu
Omar Alonso
Barbara Poblete

In e-commerce, customers often struggle to find relevant items when their needs involve subjective properties characterized by personal or collective perception, tastes, and opinions, which are typically not captured in catalog data. This challenge is particularly pronounced in event-based scenarios like gifting, where selecting the right product involves complex subjective reasoning. Customer reviews can be a valuable source of subjective information to bridge this gap. Consequently, customers often spend significant amount of time navigating multiple products and reading numerous reviews to find suitable gifts that meet their needs. In order to reduce the effort involved, we propose an agentic approach driven by large language models to streamline this process by autonomously executing various user actions. These include computational tasks like vagueness detection and subjective product needs extraction, conversational interactions to gather missing user information, and web browsing actions that search for product details, reviews, and review images. Additionally, the agent employs generative actions to synthesize gifting ideas and explanations, helping users discover suitable products more efficiently. The proposed approach not only reduces the cognitive burden on users but also facilitates the exploration of a wider range of products. Our solution highlights the potential of autonomous agents to handle subjective queries in e-commerce, enhancing personalization, product exploration, and selection in a user-centric manner.

Cluster Insight: A Weighted Clustering Tool for Large Textual Data Exploration

Amine Ferdjaoui
Séverine Affeldt
Mohamed Nadif

In unsupervised learning, the exploration of large volumes of textual data is a topic of significant interest. In this article, we present our compact and easy-to-use application to explore large volumes of textual data using clustering and generative models. We demonstrate how to adapt the Lasso weighted k-means algorithm to handle textual data. In addition, we present in detail a user-friendly package that shows how to use LLMs effectively to describe document classes.

Evidence Contextualization and Counterfactual Attribution for Conversational QA over Heterogeneous Data with RAG Systems

Rishiraj Saha Roy
Joel Schlotthauer
Chris Hinze
Andreas Foltyn
Luzian Hahn
Fabian Kuech

Retrieval Augmented Generation (RAG) works as a backbone for interacting with an enterprise's own data via Conversational Question Answering (ConvQA). In a RAG system, a retriever fetches passages from a collection in response to a question, which are then included in the prompt of a large language model (LLM) for generating a natural language (NL) answer. However, several RAG systems today suffer from two shortcomings: (i) retrieved passages usually contain their raw text and lack appropriate document context, negatively impacting both retrieval and answering quality; and (ii) attribution strategies that explain answer generation typically rely only on similarity between the answer and the retrieved passages, thereby only generating plausible but not causal explanations. In this work, we demonstrate RAGonite, a RAG system that remedies the above concerns by: (i) contextualizing evidence with source metadata and surrounding text; and (ii) computing counterfactual attribution, a causal explanation approach where the contribution of an evidence to an answer is determined by the similarity of the original response to the answer obtained by removing that evidence. To evaluate our proposals, we release a new benchmark ConfQuestions: it has 300 hand-created conversational questions, each in English and German, coupled with ground truth URLs, completed questions, and answers from 215 public Confluence pages. These documents are typical of enterprise wiki spaces with heterogeneous elements. Experiments with RAGonite on ConfQuestions show the viability of our ideas: contextualization improves RAG performance, and counterfactual explanations outperform standard attribution.

Integrating Knowledge Graphs and Neuro-Symbolic AI: LDM Enables FAIR and Federated Research Data Management

Ahmad Sakor
Mauricio Brunet
Enrique Iglesias
Ariam Rivas
Philipp D. Rohde
Angelina Kraft
Maria-Esther Vidal

Managing research digital objects (RDOs) in compliance with FAIR principles is crucial for ensuring accessibility, interoperability, and reusability across scientific domains. The Leibniz Data Manager (LDM) is a state-of-the-art framework that integrates Knowledge Graphs (KGs) and Neuro-Symbolic AI, combining the reasoning power of Large Language Models (LLMs) with structured metadata. LDM supports the management and enhancement of RDOs through entity linking, connecting datasets to external KGs like Wikidata and the Open Research Knowledge Graph (ORKG). Additionally, LDM offers federated query processing across KGs, enabling users to explore related papers, datasets, and resources through natural language questions. This demo showcases LDM's capabilities to explore RDOs, compare existing datasets, and extend metadata. By blending Neuro-Symbolic AI with FAIR and federated research data management, LDM offers a powerful tool for accelerating data-driven discovery in science. LDM is publicly accessible at https://service.tib.eu/ldmservice/.

Lightning IR: Straightforward Fine-tuning and Inference of Transformer-based Language Models for Information Retrieval

Ferdinand Schlatt
Maik Fröbe
Matthias Hagen

A wide range of transformer-based language models have been proposed for information retrieval tasks. However, including transformer-based models in retrieval pipelines is often complex and requires substantial engineering effort. In this paper, we introduce Lightning~IR, an easy-to-use PyTorch Lightning-based framework for applying transformer-based language models in retrieval scenarios. Lightning IR provides a modular and extensible architecture that supports all stages of a retrieval pipeline: from fine-tuning and indexing to searching and re-ranking. Designed to be scalable and reproducible, Lightning IR is available as open-source: https://github.com/webis-de/lightning-ir.

Ventana a la Verdad (Window to the Truth): A Chatbot Application for Navigating The Colombian Truth Commission's Archives

Anna Sokol
Matthew L. Sisk
Josefina Echavarría Alvarez
Nitesh Chawla

We present Ventana a la Verdad, a chatbot designed to make the Clar- ification Archive and the reports of the Colombian Truth Commis- sion [6] more accessible to a wider audience. These archives contain a wealth of documents, interviews, and testimonies from Colom- bia's armed conflict, but navigating them can be challenging due to their volume and complexity. Using existing large language models (LLMs) and natural language processing techniques, our chatbot allows users to interact with the archives through natural language queries, receiving relevant and contextually appropriate responses. In the sensitive context of peace and reconciliation, where misin- formation or hallucinations can have significant adverse effects, ensuring the accuracy and reliability of information is paramount. This tool aims to facilitate better understanding and engagement with historical content, supporting educational and research efforts. We discuss the development of the chatbot, the challenges encoun- tered, and its potential impact on making the Colombian Truth Commission's archives more accessible. The chatbot is available by link here: http://ventanaverdad.lucyapps.net:1337/

Don't Forget This: Augmenting Results with Event-Aware Search

Hugo Sousa
Austin R. Ward
Omar Alonso

Events like Valentine's Day and Christmas can influence user intent when interacting with search engines. For example, a user searching for gift around Valentine's Day is likely to be looking for Valentine's-themed options, whereas the same query close to Christmas would more likely suggest an interest in Holiday-themed gifts. These shifts in user intent, driven by temporal factors, are often implicit but important to determine the relevance of search results. In this demo, we explore how incorporating temporal awareness can enhance search relevance in an e-commerce setting. We constructed a database of 2K events and, using historical purchase data, developed a temporal model that estimates each event's importance on a specific date. The most relevant events on the date the query was issued are then used to enrich search results with event-specific items. Our demo illustrates how this approach enables a search system to better adapt to temporal nuances, ultimately delivering more contextually relevant products.

LiveFC: A System for Live Fact-Checking of Audio Streams

Venktesh V
Vinay Setty

The advances in the digital era have led to rapid dissemination of information. This has also aggravated the spread of misinformation and disinformation. This has potentially serious consequences, such as civil unrest. While fact-checking aims to combat this, manual fact-checking is cumbersome and not scalable. While automated fact-checking approaches exist, they do not operate in real-time and do not always account for spread of misinformation through different modalities. This is particularly important as proactive fact-checking on live streams in real-time can help people be informed of false narratives and prevent catastrophic consequences that may cause civil unrest. This is particularly relevant with the rapid dissemination of information through video on social media platforms or other streams like political rallies and debates. Hence, in this work we develop a platform named LiveFC, that can aid in fact-checking live audio streams in real-time. LiveFC has a user-friendly interface that displays the claims detected along with their veracity and evidence for live streams with associated speakers for claims from respective segments. The app can be accessed at http://livefc.factiverse.ai and a screen recording of the demo can be found at https://bit.ly/3WVAoIw.

WildlifeLookup: A Chatbot Facilitating Wildlife Management with Accessible Data and Insights

Xiangqi Wang
Tianyu Yang
Jason Rohr
Brett Scheffers
Nitesh Chawla
Xiangliang Zhang

Wildlife management is increasingly reliant on data-driven insights to address the impacts of climate change on species and ecosystems. However, the complexity of accessing and querying large, multimodal datasets often limits the ability of non-technical users, such as wildlife managers and conservationists, to make informed decisions. To address this challenge, we present WildlifeLookup, a public accessible, intelligent chatbot designed to facilitate natural language interaction with a novel knowledge graph (KN-Wildlife) that houses critical wildlife and environmental data. WildlifeLookup simplifies access to species distributions, habitat interactions, and climate-related events by converting user queries into precise graph queries, reducing the technical barriers for end users. The chatbot WildlifeLookup is available at https://oknbot.ngrok.dev/

SESSION: Doctoral Consortium

Mind Over Machine: Evaluating Theory of Mind Reasoning in LLMs and Humans

Maryam Amirizaniani

As Large Language Models (LLMs) become increasingly integrated into applications demanding human-like understanding-such as mental health support, education, and social robotics-their capacity to exhibit Theory of Mind (ToM) reasoning is essential. Although previous research has evaluated LLMs' capability in ToM tasks, a critical gap remains as few studies have systematically investigated how LLMs' ToM reasoning diverges from human reasoning and the extent of these differences. This study introduces a reinforcement learning-based framework designed to bridge this gap. This approach seeks to enhance LLM alignment with human ToM reasoning, effectively narrowing the differences in their reasoning processes. Finally, future research directions to advance this field will be discussed, including strategies for developing LLMs that can better approximate human social cognition. This work lays a foundation for responsible LLM deployment, offering guidelines for applications in sensitive contexts where accurate ToM understanding is crucial.

Edge-Centric Network Analytics

Xueqi Cheng

Network analysis has evolved substantially, with notable advancements in node-centric and graph-centric tasks, yet the exploration of edge-centric analytics has been notably limited. This oversight is significant given the crucial role of edges in elucidating the complex relationships within networks, particularly in fields such as social network analysis, cybersecurity, and bioinformatics, where the dynamics of connections between entities are often pivotal. My doctoral research aims to address this gap by delving into the under-explored domain of edge-centric analytics, providing a foundational background that is crucial for advancing the field and enhancing the application of network theory in real-world scenarios. The significance of this research lies in its potential to open new avenues for inquiry and application across diverse disciplines where understanding the nuances of relational dynamics is essential.

Bearing Power Loss Predictions in Wind Turbine Gearbox: An Approach Based on LLMs

Janice Anta Zebaze
Azanzi Jiomekong
Innocent Souopgui
Germaine Djuidje Kenmoe

A constant and consistent supply in electrical energy in a location is a reflection of a good economy. Developing countries nevertheless don't have access to this quality of energy, which slows down their economy and consequently development. Wind is a clean, sustainable and renewable resource which can be used to meet the energy needs in such countries. However, the intermittent nature of wind yields fluctuations on the amount of energy produced by a wind turbine. Coupled with frictional power losses in the wind turbine gearbox bearings, one can't be sure on the exact amount of energy that will be produced. This leads to the distribution and management issues. To tackle this issue, we propose here the use of Large Language models. These are tools which have been proving their potential in various domains till date and whose potential are still to be seen in the field to our knowledge. Taking advantage of their flexibility and adaptability to any model and dataset, we intend to explore its abilities in the fields of wind energy and tribology. Making use of available data, predictions on the wind energy potential and power losses will be carried out using Large Language models such as BERT. The results of this work intends to promote the use of wind energy by lifting barriers in thee management and knowledge of the resource.

SESSION: Industry Day Talks

Advancing Voice AI for E-commerce: Tracking ASR Model Performance at Scale

Dhruv Agarwal
Nupur Neti
Federica Cerina

Traditionally, automatic speech recognition (ASR) systems rely on human transcriptions to calculate word error rate (WER) by comparing ASR outputs to manual transcriptions. Recently, Amazon's mobile voice shopping platform stopped storing audio from incoming requests to enhance customer privacy, making offline, human-based evaluation unfeasible. This presentation introduces a multitask Speech LLM-based system that processes real-time audio, extracting key features to track ASR performance and detect traffic shifts-all without storing audio or requiring human annotations. Additionally, we demonstrate how combining these features with a synthetic audio generation model (TTS) enables accurate detection of ASR performance degradation, ensuring continuous optimization of the customer voice experience.

Beyond Relevance: A Demand Balancer Model for Rental Platforms with Single-Unit Inventory

Guilherme G. Bonaldo
Pedro F. Nogueira
Tetiana Torovets
Thays F. da Silva

House rental marketplace platforms face unique challenges due to their single-unit inventory nature, where each property is distinct and can only be rented to one tenant. Traditional ranking systems typically optimize for user-item fit [1--5]. For the rental market, this optimization creates demand bottlenecks by continuously directing users to already popular properties. This approach results in high-competition scenarios where multiple users are drawn to the same high-relevance properties, generating excessive competing offers. This can lead to user frustration, as only one individual can ultimately secure the property, leaving others dissatisfied despite their high compatibility with the listing. At the same time, given the nature of property technology businesses being inherently supply-constrained, demand concentration negatively impacts landlords as well, creating challenges for those struggling to rent out their properties, facing longer vacancy periods and increasing the risk of renting elsewhere. To redistribute demand across our house rental marketplace, our solution incorporates the likelihood of successful conversion based on historical user-house interaction signals, including visits, offers, and others. By dynamically adjusting property visibility based on predicted rental probability, we effectively redistribute user attention to low demand properties while minimizing losses in relevance. The implementation of this system in a large-scale rental marketplace through an online controlled experiment resulted in 4% increase in unique houses receiving offers from users and a 3% improvement in contract conversion rates. These results suggest that incorporating availability predictions into ranking systems can lead to more efficient marketplace dynamics while maintaining user satisfaction, without impacting user engagement. Our approach provides a framework for balancing demand in marketplaces with unique inventory constraints.

Compliant Personalization for Recommended Documents in Microsoft 365 with L-Profile as an Exemplary Feature

Matthias Braunhofer
Grzegorz Kukla
Abhishek Arun

Collecting and utilizing user data is essential for effective recommender systems to personalize content. However, privacy and compliance regulations protect personal user data. With strict regulations such as the General Data Protection Regulation (GDPR) or California Privacy Rights Act (CPRA) in effect, one may ask: how can a recommender system be both compliant and effective? This paper aims to answer this question, demonstrating privacy-compliant personalization for the Recommended Documents service within Microsoft 365 (M365), particularly Microsoft Feed. It outlines the development of an exemplary L-Profile personalization feature from conception to productionization, covering offline and online evaluations.

Efficient Creative Selection in Online Advertising using Top-Two Thompson Sampling

Daiki Katsuragawa
Yusuke Kaneko
Kaito Ariu
Kenshi Abe

In online advertising, identifying the optimal creative is critical to maximizing performance. This study examines the application of top-two Thompson sampling (TTTS), an adaptive experimental design method, as an efficient alternative to traditional A/B testing for identifying the optimal creative. Our experiments on an online advertising platform highlight the effectiveness of TTTS in both accurately identifying the optimal creative and minimizing experimental costs, underscoring its potential as a promising approach to creative selection.

Zero-Shot Image Moderation in Google Ads with LLM-Assisted Textual Descriptions and Cross-modal Co-embeddings

Enming Luo
Wei Qiao
Katie Warren
Jingxiang Li
Eric Xiao
Krishna Viswanathan
Yuan Wang
Yintao Liu
Jimin Li
Ariel Fuxman

We present a scalable and agile approach for ads image content moderation at Google, addressing the challenges of moderating massive volumes of ads with diverse content and evolving policies. The proposed method utilizes human-curated textual descriptions and cross-modal text-image co-embeddings to enable zero-shot classification of policy violating ads images, bypassing the need for extensive supervised training data and human labeling. By leveraging large language models (LLMs) and user expertise, the system generates and refines a comprehensive set of textual descriptions representing policy guidelines. During inference, co-embedding similarity between incoming images and the textual descriptions serves as a reliable signal for policy violation detection, enabling efficient and adaptable ads content moderation. Evaluation results demonstrate the efficacy of this framework in significantly boosting the detection of policy violating content.

Navigating the Hype and Embracing the Hope: The Future of Generative AI in Retail Product Discovery

Darshan Nagaraja

Generative AI is poised to revolutionize the retail industry, offering transformative potential to enhance customer experiences and operational efficiencies. Despite immense benefits -- such as personalized search, shopping assistance, product content enrichment, and dynamic recommendations -- the industry faces challenges distinguishing genuine value from overhyped applications. According to McKinsey & Company, Generative AI could unlock $240 billion to $390 billion in economic value for retailers, potentially increasing margins by up to 1.9 percentage points [1]. This paper navigates the delicate balance between the hype and the tangible promise of Generative AI in retail. We explore practical applications that deliver real value, critically examine overhyped use cases, and share strategies for successful AI integration. Readers will gain insights into effectively leveraging Generative AI while avoiding common pitfalls.

Personalization At Doordash: From Conversion Modeling To Multi-objective Long-term Value Optimization

Qilin Qi

Doordash is one of the largest platform in the world to connect millions of local business with customers. We use advanced machine learning technologies to build a personalized customer experience and help customers discover a variant of local businesses they love. In this talk, we will introduce a few technologies we used to build our personalized homepage experience and the lessons learned during the process. Customers use our platform in different ways, they can browse on homepage, search on search bar or respond to a push notification or an email sent to them. There are also different types of actions they can take during their shopping journeys, included but not limited to views, (good) clicks, add-to-cart, and checkout. We will first introduce how we leverage customers various action sequence and transformer to build our user interest model to understand customer interests. Doordash homepage has a very vivid design containing different components and complex layout to serve our customers. The stores are organized with themes into an UI component that we call carousel. The stores, carousels and other UI components are mixed on our homepage to showcase a diverse set of options and deals customers can choose from. The complex homepage design poses challenges for homepage ranking. We build a heterogeneous ranking system to rank different type of components in a 2-D layout. Traditionally, our ranking model is optimized for conversion. However, as our business grows, we have multiple business objectives to care about. In the meanwhile, we also want to optimize for customers long term satisfaction so we can sustain and grow our platform. We will describe how do we model customers long term value and build a multi-objective ranking and optimization system to optimize and balance multiple business objectives.

SpecialtyScribe: Enhancing SOAP note Scribing for Medical Specialties using LLMs

Eti Rastogi
Sagar Goyal
Fen Zhao
Dong Yuan

The healthcare industry has accumulated vast amounts of unstructured clinical data, including medical records, patient communications, and visit notes. Clinician-patient conversations are central to medical records, with the clinician's final summary (the medical note) serving as the key reference for future interactions and treatments. Creating a concise and accurate medical SOAP note is crucial for quality patient care and is especially challenging for specialty care, which requires added focus on relevance to the specialty, clarity, absence of hallucinations, and adherence to doctor preferences. This makes it very challenging for a general-purpose LLM to create satisfactory notes. Some recent LLMs, like GPT-4, have shown promise in medical note generation; however, the high cost, size, latency, and privacy concerns associated with closed models make them impractical for many healthcare facilities. In this talk, we will present our method ''SpecialtyScribe'', which is a modular pipeline for generating specialty-specific medical notes. It consists of three main components: an Information Extractor module that captures relevant specialty data, a Context Retriever module that retrieves and verifies the relevant context from the transcript, and a Note Writer module that generates medically acceptable notes based on the extracted information. Our framework outperforms any naively prompt-engineered model by more than 32% on expert scoring, and our in-house models surpass similarly sized open-source models by more than 100% on ROUGE based metrics. The in-house models also match the overall performance of the best closed-source LLMs while being less than 1% the estimated size of them.

We'll showcase multiple ablations across our pipeline, mitigation of hallucinations, the role of retrievers, and the importance of scalable pipelines for multiple specialties. We'll also discuss the design of our human-expert scoring mechanism for various language model use cases.

Fact-checking Multilingual Podcasts

Vinay Setty

Long-form content, such as YouTube videos and podcasts, has become widely popular, particularly among younger audiences. However, the extended format of this content presents unique challenges for fact-checking them. This talk explores the feasibility of an end-to-end solution for transcribing and fact-checking long-form content in a multilingual context. I will conclude by presenting the selected technologies and models for end-to-end fact-checking in this domain.

SESSION: WSDM Day Talks

Using Photon-Counting CT Images for Lung Nodule Classification

Leonie Basso
Zahra Ahmadi
Steffen Oeltze-Jafra
Eike Petersen
Hoen-oh Shin
Andrea Schenk

An automatic classification of the malignancy of lung nodules in computed tomography (CT) scans can support early detection of lung cancer, which is crucial for the treatment success. The novel photon-counting CT (PCCT) technology enables high image quality with a low radiation dose and provides additional spectral information. This research focuses on whether PCCT scans offer a benefit in the automatic classification of lung nodules. Establishing a dataset of PCCT images poses several challenges, such as the extraction of annotations or the data imbalance.

HyKG-CF: A Hybrid Approach for Counterfactual Prediction using Domain Knowledge

Hao Huang
Maria-Esther Vidal

Predictive models are gaining attention as powerful tools for aiding clinicians in diagnosis, prognosis, and treatment recommendations. However, their reliance on associative patterns may raise concerns about reliability of decision support, as association does not necessarily imply causation. To address this limit, we propose HyKG-CF, a hybrid approach to counterfactual prediction that leverages data and domain knowledge encoded in knowledge graph (KG). HyKG-CF integrates symbolic reasoning (on knowledge) with numerical learning (on data) using large language models (LLMs) and statistical models to learn causal Bayesian networks (CBNs) for accurate counterfactual prediction. Using data and knowledge, HyKG-CF improves the accuracy of causal discovery and counterfactual prediction. We evaluate HyKG-CF on a non-small cell lung cancer (NSCLC) KG, demonstrating that it outperforms other baselines. The results highlight the promise of combining domain knowledge with causal models to improve counterfactual prediction.

Enhancing Medical Knowledge Discovery: A Neuro-symbolic System for Inductive Learning over Medical KGs

Disha Purohit
Yashrajsinh Chudasama
Maria-Esther Vidal

Medical knowledge graphs (KGs) excel at integrating heterogeneous healthcare data with domain knowledge, but face challenges due to incompleteness. While Knowledge Graph Embedding (KGE) models show promise in link prediction, they often fail to incorporate crucial semantic constraints from medical ontologies and clinical guidelines. We propose a neuro-symbolic system that enhances medical knowledge discovery by combining symbolic learning from medical ontologies, inductive learning through KGE, and semantic constraint validation. Applied to lung cancer care, our system demonstrates enhanced performance in predicting novel medical relationships while maintaining semantic consistency with medical knowledge. Experimental results show our approach enhances the KGE model's performance while ensuring clinical validity and the implementation is publicly accessible on GitHub https://github.com/SDM-TIB/KOSMOS.

BioLinkerAI: Leveraging LLMs to Improve Biomedical Entity Linking and Knowledge Capture

Ahmad Sakor
Kuldeep Singh
Maria-Esther Vidal

We introduce BioLinkerAI, a neuro-symbolic framework for biomedical entity linking that integrates symbolic (domain-specific and linguistic rules) and sub-symbolic (large language models) components. Unlike traditional approaches requiring extensive labeled training data, BioLinkerAI harnesses a knowledge base and rules for candidate generation, while a pre-trained LLM handles final disambiguation. This combination ensures adaptability to diverse biomedical knowledge bases and complex entity mentions. Empirical evaluations show that BioLinkerAI surpasses state-of-the-art benchmarks, notably increasing unseen data accuracy from 65.4 to 78.5 without relying on extensive labeled datasets.

A Systematic Evaluation of Single-Cell Foundation Models on Cell-Type Classification Task

Nicolas Steiner
Ziteng Li
Omid Vosoughi
Johanna Schrader
Soumyadeep Roy
Wolfgang Nejdl
Ming Tang

This study presents a comprehensive benchmarking of three state-of-the-art single-cell foundation models scGPT, Geneformer, and scFoundation, on cell-type classification tasks. We evaluate the models on three datasets: myeloid, human pancreas, and multiple sclerosis, examining both standard fine-tuning and few-shot learning scenarios. Our work reveals that scFoundation consistently achieves the best performance while Geneformer performs poorly, yielding results sometimes even worse than those of the baseline models. Additionally, we demonstrate that a good foundation model can generalize well even when fine-tuned with out-of-distribution data, a capability that the baseline models lack. Our work highlights the potential of foundation models for addressing challenging biomedical questions, particularly in contexts where models are trained on one population but deployed on another.

SESSION: Workshops

1st Workshop on Detecting Trust, Authority, Sense and Knowledge in Online News Media Production

Giovanni Fulantelli
Davide Taibi
Sergio Splendore
Marco Fisichella

This workshop explores the intersection of media trust, journalistic authority and the role of Artificial Intelligence (AI) within platform-based media ecosystems. As social media becomes a primary source of news for global audiences media uses are changing public discourses and trust in journalism. The workshop includes papers that use innovative methods to analyze media trust and intertwining role of platforms (e.g. Facebook, Instagram, TikTok, and X). Key topics include AI-powered news curation, personalized news feeds and the impact of algorithm-driven content. Advanced techniques such as natural language processing, sentiment analysis and semantically enriched entity models are central to understanding and visualizing interactions between users, news providers and content. By integrating these approaches, the workshop aims to foster interdisciplinary discussions, propose new analytical frameworks and contribute to the development of balanced, transparent and fair news ecosystems in the digital age.

IWILDS'25: The 5th International Workshop on Investigating Learning During Web Search

Anett Hoppe
Ran Yu
Jiqun Liu
Nilavra Bhattacharya

Web-based learning is evolving rapidly as traditional search engines are complemented by Large Language Models (LLMs) and other AI technologies. This evolution offers new opportunities, such as automated information synthesis and personalized learning experiences. However, this also presents new challenges, including the need for learners to be aware of potential biases and misinformation in AI-generated content, and to maintain focus and depth in their learning journeys.

The Search as Learning (SAL) field investigates how individuals learn through web interactions, examining the interplay between search behaviors, information synthesis, and learning outcomes. SAL research aims to measure, predict, and support effective learning strategies in modern web environments. IWILDS'25 provides an interdisciplinary platform to explore these dynamics, bringing together researchers and practitioners from information retrieval, education, psychology, and related fields. This full-day workshop will feature keynotes, paper presentations, and discussions on advancing SAL research and practice.

Proposal for Workshop on Trust and Responsibility in Recommendation Systems at WSDM 2025

Xinghai Hu
Robin Burke
Xu Liu

This workshop aims to empower participants to design and audit recommendation systems that prioritize user trust and safety. Atten- dees will explore best practices and innovations in fairness, explain- ability, content safety, algorithm transparency, and the societal im- pacts of recommendation systems. The workshop will address tech- nical and ethical challenges, offering practical insights into risk mit- igation, social concerns, and technical innovations. Participants will leave equipped with actionable frameworks, methodologies, and tools to build responsible systems aligned with societal values and social good. Key activities include keynotes, paper presentations, panel discussions, and interactive sessions, covering topics such as transparency, security, fairness, robustness, and participatory AI. The website of the conference is https://sites.google.com/view/t-rrs.

LLM4Eval@WSDM 2025: Large Language Model for Evaluation in Information Retrieval

Hossein A. Rahmani
Clemencia Siro
Mohammad Aliannejadi
Nick Craswell
Charles L.A. Clarke
Guglielmo Faggioli
Bhaskar Mitra
Paul Thomas
Emine Yilmaz

Large language models (LLMs) have demonstrated increasing task-solving abilities not present in smaller models. Utilizing the capabilities and responsibilities of LLMs for automated evaluation (LLM4Eval) has recently attracted considerable attention in multiple research communities. For instance, LLM4Eval models have been studied in the context of automated judgments, natural language generation, and retrieval augmented generation systems. We believe that the information retrieval community can significantly contribute to this growing research area by designing, implementing, analyzing, and evaluating various aspects of LLMs with applications to LLM4Eval tasks. The main goal of LLM4Eval workshop is to bring together researchers from industry and academia to discuss various aspects of LLMs for evaluation in information retrieval, including automated judgments, retrieval-augmented generation pipeline evaluation, altering human evaluation, robustness, and trustworthiness of LLMs for evaluation in addition to their impact on real-world applications. We also plan to run an automated judgment challenge prior to the workshop, where participants will be asked to generate labels for a given dataset while maximising correlation with human judgments. The format of the workshop is interactive, including roundtable and keynote sessions and tends to avoid the one-sided dialogue of a mini-conference. This is the second iteration of the workshop. The first version was held in conjunction with SIGIR 2024, attracting over 50 participants.

Disinformation and Misinformation in the Age of Generative AI

Koustav Rudra
Niloy Ganguly
Jeanne Mifsud Bonnici
Eric Müller-Budack
Ritumbra Manuvie

The rapid rise of generative AI (GenAI) technologies has revolutionized the way content is created and disseminated. As a result, highly convincing human-like malicious content including disinformation, misinformation, and propaganda can now be easily produced and distributed across the web. The diversity of generation models combined with various manipulation strategies applied to different modalities presents significant challenges for fact-checking systems and content moderation. To address this issue, we organize a workshop that focuses on harmful content that has been created intentionally (disinformation) and unintentionally (misinformation) in the era of generative AI. The workshop features specialized tracks on multimodal solutions, investigating narratives, trustworthy AI systems, and policy interventions. By bringing together experts from computer science and law, the workshop offers a comprehensive framework for combating fake content online.