Emotions shape every aspect of social life, yet their role in digital communication has been under-explored. Research on social media has largely focused on how people share and consume information online, while comparatively less attention has been paid to how emotions organize attention, shape identity, and drive collective behavior. This talk advocates viewing social media platforms as emotional ecosystems, where affect is not merely expressed but also spreads between people, interacts with beliefs and psychological states, and transforms the social dynamics of entire populations.
Advances in natural language processing have given us tools to detect discrete emotions and moral sentiments from online text. These tools helped reveal that online platforms do more than transmit emotion: they enable emotional contagion, whereby exposure to others' emotions shapes users' own emotional expressions, beliefs, sense of identity, and feelings of trust and belonging.
The talk shows how these emotional dynamics underlie a range of emergent social phenomena. In the political domain, emotional dynamics contribute to affective polarization, characterized by in-group favoritism and out-group animosity. It shows that interactions with ideological out-groups contain more anger, disgust, and toxic language, while in-group interactions express more joy and shared fear, reinforcing group cohesion and a sense of safety. These emotional asymmetries help explain why echo chambers feel psychologically protective while simultaneously deepening ideological divides and eroding trust.
Beyond politics, emotional dynamics also shape mental health outcomes. The talk examines communities organized around harmful identities and behaviors, such as pro-eating-disorder spaces, where emotional validation and peer support coexist with the normalization of self-harm and psychopathologies. In these settings, emotional contagion and group dynamics draw vulnerable individuals into feedback loops that entrench maladaptive beliefs and impede recovery. These dynamics are similar to those of online radicalization, highlighting common emotional pathways across seemingly disparate domains.
The talk concludes by examining an emergent phenomenon in digital emotional life: emotionally intelligent AI. Modern AI chatbots trained on large corpora of human conversation possess remarkable emotional intelligence and an ability to mirror the emotions of their conversation partners. This capacity enables them to emulate core mechanisms of human intimacy formation, thereby accelerating emotional bonding and fostering illusions of intimacy, particularly among vulnerable users. This presents novel risks of emotional manipulation and psychological distress.
Through large-scale empirical studies and dynamical systems modeling, the talk demonstrates how emotional dynamics---rather than factual disagreement or disinformation alone---drive societal division and mental health challenges. Understanding the social psychology of digital emotions is not just a scientific opportunity but a necessary step toward creating healthier, more resilient online spaces.
Temporal information is crucial for information retrieval, yet most dense retrieval systems focus exclusively on semantic similarity while neglecting temporal alignment between queries and documents. We propose TempRetriever, a lightweight framework that explicitly incorporates temporal information into dense passage retrieval through learned fusion techniques. Unlike existing approaches requiring extensive architectural modifications or specialized pre-training, TempRetriever enhances standard dense retrievers by combining semantic embeddings with temporal representations using four fusion strategies: Feature Stacking, Vector Summation, Relative Embeddings, and Element-Wise Interaction. Our approach introduces a learned temporal encoder and time-based negative sampling strategy to address temporal misalignment during training. We evaluate TempRetriever on three temporal question answering datasets (ArchivalQA, ChroniclingAmericaQA, NobelPrize), which together span the years 1800 to 2022. TempRetriever achieves substantial improvements over standard DPR: 6.86% on ArchivalQA (Recall@1) and 4.40% on ChroniclingAmericaQA (Recall@1). Our method also outperforms state-of-the-art temporal retrieval systems, obtaining 9.62% improvement over BiTimeBERT and 5.16% over TS-Retriever. Notably, TempRetriever's fusion techniques can enhance existing temporal methods, improving BiTimeBERT by 5.12% and TS-Retriever by 6.17%, demonstrating modularity and practical value. Zero-shot evaluation confirms strong generalization across domains, and integration with retrieval-augmented generation shows consistent end-to-end improvements.
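The four fusion strategies can be sketched as simple operations on a semantic embedding and a temporal embedding. The concrete forms below (the learned projection in Feature Stacking, the offset-based "relative" form) are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8                                 # embedding dimension (illustrative)
sem = rng.normal(size=d)              # semantic passage embedding
tem = rng.normal(size=d)              # learned temporal embedding

# Feature Stacking: concatenate, then project back to d dims
# (W stands in for a learned projection matrix)
W = rng.normal(size=(d, 2 * d))
stacked = W @ np.concatenate([sem, tem])

# Vector Summation: element-wise sum of the two embeddings
summed = sem + tem

# Relative Embeddings: fuse the semantic vector with the temporal
# offset (a guess at the intended form)
relative = W @ np.concatenate([sem, tem - sem])

# Element-Wise Interaction: Hadamard product
interaction = sem * tem

for name, v in [("stacking", stacked), ("summation", summed),
                ("relative", relative), ("interaction", interaction)]:
    print(name, v.shape)
```

Each variant produces a fused d-dimensional vector that can replace the plain semantic embedding in a standard dense-retriever scoring function.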
Recommendation systems are an integral part of daily life, influencing how people interact with and access information. The content recommended to users shapes their perceptions, making it crucial to eliminate biases that could negatively impact those perceptions. One such bias is popularity bias, which causes the long-tail item recommendation problem, where systems tend to favor popular items while overlooking less popular yet relevant ones.
To address this issue and improve recommendations for long-tail items, we propose SAGERec which learns user representations by leveraging both user features and the representations of the most informative items they have interacted with. Since identifying these informative items in latent space is challenging, we employ a trainable sampler that learns to select them effectively. Furthermore, we incorporate two expert models that sample items and aggregate them differently and fuse their outputs through a tail-aware gating network. Extensive experiments demonstrate that SAGERec significantly outperforms state-of-the-art baselines across three publicly available datasets, highlighting its effectiveness in addressing the long-tail item recommendation problem. The source code is available at: https://github.com/alshabae/SAGERec.
Graphs model latent variable relationships in many real-world systems, and Message Passing Neural Networks (MPNNs) are widely used to learn such structures for downstream tasks. While edge-based MPNNs effectively capture local interactions, their expressive power is theoretically bounded, limiting the discovery of higher-order relationships. We introduce the Higher-Order Graph Attention (HoGA) module, which constructs a k-order attention matrix by sampling subgraphs to maximize diversity among feature vectors. Unlike existing higher-order attention methods that greedily resample similar k-order relationships, HoGA targets diverse modalities in higher-order topology, reducing redundancy and expanding the range of captured substructures. Applied to two single-hop attention models, HoGA achieves at least a 5% accuracy gain on all benchmark node classification datasets and outperforms recent baselines on six of eight datasets. Code is available at https://github.com/TB862/Higher_Order.
User cold-start remains a fundamental challenge in app recommendation. Existing approaches primarily focus on efficiently leveraging limited data or transferring knowledge from active users to alleviate the user cold-start problem, yet they often overlook the influence of feature interactions on user cold-start. We group features according to their semantic types and identify an interesting phenomenon: user attribute features and behavioral sequence features interfere with each other, thereby constraining the model's ability to represent cold-start users effectively. We attribute this issue to differences in the latent space distributions and learning complexities of the two feature types, which hinder the model from accurately capturing cold-start users' interests. To address this challenge, we propose the AFIM architecture, which decouples the learning of user attribute and behavior sequential features. AFIM leverages a lightweight attention module to explicitly capture user interests from behavioral sequences, thereby reducing the learning burden on downstream recommendation networks. Additionally, it incorporates feature decoupling and dynamic fusion modules to mitigate learning bias arising from heterogeneous feature spaces. Extensive experiments on two public datasets and two industrial datasets demonstrate that AFIM consistently outperforms SOTA baselines, highlighting its effectiveness in user cold-start scenarios.
Efficiently retrieving a concise set of candidates from a large document corpus remains a pivotal challenge in Information Retrieval (IR). Neural retrieval models, particularly dense retrieval models built with transformers and pretrained language models, have been popular due to their superior performance. However, criticisms have also been raised about their lack of explainability and vulnerability to adversarial attacks. In response to these challenges, we propose to improve the robustness of dense retrieval models by enhancing their sensitivity to fine-grained relevance signals. A model achieving sensitivity in this context should exhibit high variance when the key passages determining a document's relevance to a query have been modified, while maintaining low variance for changes in irrelevant passages. This sensitivity allows a dense retrieval model to produce robust results against attacks that try to promote documents without actually increasing their relevance. It also makes it possible to analyze which part of a document is actually relevant to a query, thereby improving the explainability of the retrieval model. Motivated by causality and counterfactual analysis, we propose a series of counterfactual regularization methods based on game theory and unsupervised learning with counterfactual passages. Specifically, we first introduce a cooperative game theory-based counterfactual passage extraction method that identifies the key passages influencing relevance. We then propose several unsupervised learning tasks, based on these counterfactual passages, that serve to regularize the model's learning process and improve robustness and sensitivity. Experiments show that our method can extract key passages without relying on passage-level relevance annotations. Moreover, the regularized dense retrieval models exhibit heightened robustness against adversarial attacks, surpassing the state-of-the-art anti-attack methods.
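The cooperative-game view of passage extraction can be illustrated with exact Shapley values over a toy relevance function. Here `PASSAGE_SCORES` and the synergy bonus are fabricated stand-ins for a real retriever's query-document score; the abstract does not specify the actual value function:

```python
import itertools, math

# Toy relevance game: document relevance = sum of per-passage scores,
# with a bonus when the two key passages (1 and 3) co-occur.
PASSAGE_SCORES = {0: 0.1, 1: 0.6, 2: 0.05, 3: 0.5}

def relevance(coalition):
    s = sum(PASSAGE_SCORES[p] for p in coalition)
    if 1 in coalition and 3 in coalition:
        s += 0.2                        # hypothetical synergy term
    return s

def shapley_values(players):
    """Exact Shapley value of each passage w.r.t. the relevance game."""
    n = len(players)
    phi = {p: 0.0 for p in players}
    for p in players:
        others = [q for q in players if q != p]
        for r in range(n):
            for coal in itertools.combinations(others, r):
                # Classic permutation weight |S|! (n-|S|-1)! / n!
                w = math.factorial(r) * math.factorial(n - r - 1) / math.factorial(n)
                phi[p] += w * (relevance(coal + (p,)) - relevance(coal))
    return phi

phi = shapley_values([0, 1, 2, 3])
key_passages = sorted(phi, key=phi.get, reverse=True)[:2]
print(key_passages)
```

The passages with the highest attributed value are the counterfactual candidates: removing them changes the document's relevance the most. A practical system would approximate these values by sampling rather than exact enumeration.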
Multi-personality generation for LLMs, enabling simultaneous embodiment of multiple personalization attributes, is a fundamental challenge. Existing retraining-based approaches are costly and poorly scalable, while decoding-time methods often rely on external models or heuristics, limiting flexibility and robustness. In this paper, we propose a novel Multi-Personality Generation (MPG) framework under the decoding-time combination paradigm. It flexibly controls multi-personality without relying on scarce multi-dimensional models or extra training, leveraging implicit density ratios in single-dimensional models as a ''free lunch'' to reformulate the task as sampling from a target strategy aggregating these ratios. To implement MPG efficiently, we design Speculative Chunk-level based Rejection sampling (SCR), which generates responses in chunks and validates them in parallel via estimated thresholds within a sliding window. This significantly reduces computational overhead while maintaining high-quality generation. Experiments on MBTI personality and Role-Playing demonstrate the effectiveness of MPG, showing improvements up to 16%–18%. Code and data are available at https://github.com/Libra117/MPG.
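The density-ratio "free lunch" can be sketched on toy categorical next-token distributions: each single-personality model implicitly defines a ratio against the base model, and aggregating those ratios over the base yields a combined target distribution. The weights `lam_a`, `lam_b` and the distributions themselves are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
V = 5                                   # toy vocabulary size

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

base = softmax(rng.normal(size=V))      # base model next-token dist
pers_a = softmax(rng.normal(size=V))    # single-personality model A
pers_b = softmax(rng.normal(size=V))    # single-personality model B

# Aggregate the implicit log density ratios log(p_i / p_base)
# on top of the base model's log-probabilities.
lam_a, lam_b = 1.0, 1.0
logits = (np.log(base)
          + lam_a * (np.log(pers_a) - np.log(base))
          + lam_b * (np.log(pers_b) - np.log(base)))
target = softmax(logits)
print(target.round(3))
```

Sampling exactly from `target` token-by-token is what chunk-level rejection sampling then approximates at lower cost, by proposing whole chunks and accepting or rejecting them against estimated thresholds.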
Self-training for Named Entity Recognition (NER) identifies named entities and their types in text, using self-training to make full use of limited labeled data and a large amount of unlabeled data. The major challenge in self-training is confirmation bias, where incorrect pseudo-labels increase errors. Many efforts have been made to address this challenge, but the scarcity of labeled data limits their performance. In this paper, we introduce a Large Language Model (LLM) into self-training to select high-quality pseudo-labels, leveraging its rich knowledge and few-shot learning capability. Specifically, we design a comprehensive prompt to improve the judgment performance of the LLM, where the prompt incorporates task rules mined by the LLM itself to fully leverage labeled data. In addition, to reduce the impact of the LLM's hallucinations, we adopt collaborative pseudo-label selection based on combined confidence and calibration-guided probability smoothing. Our empirical study conducted on several NER datasets shows that our method outperforms state-of-the-art approaches. The code is available at https://github.com/cheniison/llm-judged-ST.
Evaluating the coding capabilities of models through algorithmic code generation is challenging, as it requires deep problem understanding and complex algorithm design. Current benchmarks suffer from a narrow focus on final execution results (such as pass@k), neglecting the crucial reasoning and problem-solving processes inherent in code generation. To address this limitation, we introduce a multi-phase algorithmic code generation benchmark, MUPA, structured around human computational thinking. MUPA dissects the evaluation into four distinct phases: example understanding, algorithm selection, solution description, and code generation. This framework facilitates a comprehensive assessment by providing insights into the model's intermediate problem-solving steps, rather than just the final code. We manually curated 197 high-quality competitive programming problems from Codeforces. Utilizing an LLM-as-a-judge paradigm with specialized prompts, our rigorous evaluation of several existing code generation LLMs reveals significant across-the-board challenges. Notably, we establish a positive correlation between performance across phases: proficiency in an earlier phase directly impacts performance in subsequent phases, underscoring the interdependency of these algorithmic skills. The benchmark is publicly available at https://github.com/cheniison/MUPA.
Dynamic recommendation systems must efficiently retrieve relevant items for users while adapting to evolving interactions. Traditional methods often struggle with scalability and computational costs when modeling item representations. A key challenge is capturing short-term co-occurrence patterns that reflect dynamic user preferences. In this work, we introduce RPE4Rec, a lightweight and efficient plugin module that enhances dynamic node retrieval by leveraging Relative Position Encoding (RPE). Our method employs a hashtable-based neighborhood memory to capture short-term co-occurrence signals while maintaining low space and time complexity. Extensive experiments on multiple datasets demonstrate that incorporating RPE4Rec into mainstream recommendation models consistently and substantially improves item retrieval performance with minimal additional cost.
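A hashtable-based neighborhood memory of the kind described might look like the following sketch. The bounded FIFO eviction and pairwise per-session updates are assumptions, not the paper's exact design, but they show how short-term co-occurrence can be tracked in O(1) amortized time per update:

```python
from collections import defaultdict, deque

class NeighborhoodMemory:
    """Hashtable of per-item recent co-occurrences with bounded FIFO
    eviction, keeping only the freshest short-term signals."""
    def __init__(self, capacity=3):
        self.capacity = capacity
        self.table = defaultdict(deque)

    def observe(self, session_items):
        # Record pairwise co-occurrences within one interaction session.
        for i in session_items:
            for j in session_items:
                if i != j:
                    q = self.table[i]
                    q.append(j)
                    if len(q) > self.capacity:
                        q.popleft()     # evict the oldest signal

    def neighbors(self, item):
        return list(self.table[item])

mem = NeighborhoodMemory(capacity=3)
mem.observe(["a", "b", "c"])
mem.observe(["a", "d"])
print(mem.neighbors("a"))
```

A relative position encoding could then be derived from each neighbor's recency rank in the deque, feeding the host recommendation model as an auxiliary feature.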
Time series forecasting (TSF) traditionally relies on fast-thinking paradigms that map historical observations directly to future sequences of continuous values. While effective, such approaches often frame forecasting as a pattern-matching problem and tend to overlook explicit reasoning over temporal dynamics and contextual factors, which are critical for modeling long-range dependencies and non-stationary behaviors in real-world scenarios. Recent slow-thinking large language models (LLMs), such as OpenAI o1 and DeepSeek-R1, demonstrate strong inference-time multi-step reasoning abilities. This raises a fundamental question: can slow-thinking LLMs reason over temporal dynamics to support accurate TSF, even without task-specific training? To investigate this question, we present TimeReasoner, a systematic empirical study that reformulates TSF as a conditional reasoning process performed entirely at inference time. TimeReasoner integrates hybrid instructions consisting of task directives, timestamps, sequential values, and optional contextual features, and induces multi-step temporal reasoning in pretrained slow-thinking LLMs through chain-of-thought prompting and rollout-based reasoning strategies. Extensive experiments across diverse TSF benchmarks show that slow-thinking LLMs consistently outperform prior baselines or achieve competitive training-free forecasting performance. Beyond accuracy, we analyze how different inference-time reasoning strategies influence forecasting behaviors, highlighting both the potential and limitations of slow-thinking paradigms for TSF.
Taxonomies offer a powerful data structure for organizing information hierarchically and support practical web applications. However, the rapid emergence of new entities renders manual taxonomy curation both time-consuming and costly. Research efforts have focused on automatically integrating new entities into an appropriate hypernym-hyponym pair in the existing taxonomy. Most recent approaches formulate taxonomy completion as a plausibility-scoring task over candidate query–position pairs, optimized via contrastive learning (CL). However, these methods typically rely on static or randomly sampled negatives that are semantically trivial.
To address this, we propose TaxoDiff, a diffusion-guided dynamic negative sampling approach that reformulates negative sampling as a dynamic generative process in the latent space and enables flexible control over the hardness of negative examples during training. Specifically, TaxoDiff leverages a conditional denoising diffusion model to synthesize negative examples with adjustable semantic hardness, generating both easy and hard negatives at different timesteps. Altogether, a dynamic mixture of easy and hard negatives enables the model to produce robust feature representations needed for accurate taxonomy completion. Experimental results on three benchmark datasets show that TaxoDiff achieves state-of-the-art taxonomy completion performance.
Recently, joint advertising has gained significant attention as an effective approach to enhancing the efficiency and revenue of advertising slot allocation. Unlike traditional advertising, which allocates advertising slots exclusively to a single advertiser, joint advertising displays advertisements from brands and stores that have established a joint selling relationship within the same advertising slot. However, existing approaches often struggle to accommodate both joint and traditional advertising frameworks, thereby limiting the revenue potential and generalizability of joint advertising. Furthermore, these methods are constrained by two critical limitations: they generally neglect the influence of global externalities, and they fail to address the bidding variability stemming from multi-party advertiser participation. Collectively, these limitations present substantial challenges to the design of joint auction mechanisms. To address these challenges, we propose a Joint Auction Framework incorporating Externalities and Adaptation, and leverage the automated mechanism design (AMD) method through our proposed JEANet to compute joint auction mechanisms that satisfy the conditions of individual rationality (IR) and approximate dominant strategy incentive compatibility (DSIC). As the first AMD method to integrate global externalities into joint auctions, JEANet dynamically adapts to the bidding characteristics of multi-party advertisers and enables unified auctions that integrate both joint and traditional advertising. Extensive experimental results demonstrate that JEANet outperforms state-of-the-art baselines in multi-slot joint auctions.
Network tomography is a crucial problem in network monitoring, where observable path performance metric values are used to infer unobserved ones, making it essential for tasks such as route selection, fault diagnosis, and traffic control. However, most existing methods either assume complete knowledge of network topology and metric formulas—an unrealistic expectation in many real-world scenarios with limited observability—or rely entirely on black-box end-to-end models. To tackle this, we argue that good network tomography requires synergizing knowledge from data with appropriate inductive bias from (partial) prior knowledge. To this end, we propose Deep Network Tomography (DeepNT), a novel framework that leverages a path-centric graph neural network to predict path performance metrics without relying on predefined hand-crafted metrics, assumptions, or the real network topology. The path-centric graph neural network learns a path embedding by inferring and aggregating the embeddings of the sequence of nodes composing the path. Training path-centric graph neural networks requires learning the neural network parameters and the network topology under discrete constraints induced by the observed path performance metrics, which motivates us to design a learning objective that imposes connectivity and sparsity constraints on the topology and triangle-inequality constraints on path performance. Extensive experiments on real-world and synthetic datasets demonstrate the superiority of DeepNT in predicting performance metrics and inferring graph topology compared to state-of-the-art methods.
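The path-centric idea of building a path embedding from the embeddings of its constituent nodes can be sketched in a few lines. Mean pooling and the linear metric read-out below are placeholder assumptions; the abstract does not specify the aggregator or prediction head:

```python
import numpy as np

rng = np.random.default_rng(2)
d = 4
# Toy node embeddings, standing in for a GNN's learned representations
node_emb = {n: rng.normal(size=d) for n in "ABCD"}

def path_embedding(path, node_emb):
    """Aggregate the embeddings of the sequence of nodes composing a
    path (mean pooling here, as an illustrative choice)."""
    return np.mean([node_emb[n] for n in path], axis=0)

emb = path_embedding(["A", "B", "C"], node_emb)
# Hypothetical linear read-out mapping the path embedding to a metric
# value such as delay or loss rate
metric = float(emb @ rng.normal(size=d))
print(emb.shape, metric)
```

In the full framework, both the node embeddings and the (partially unknown) adjacency feeding them would be learned jointly under the connectivity, sparsity, and triangle-inequality constraints described above.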
Recent advancements in diffusion models have shown promising results in sequential recommendation (SR). Existing approaches predominantly rely on implicit conditional diffusion models, which compress user behaviors into a single representation during the forward diffusion process. While effective to some extent, this oversimplification often leads to the loss of sequential and contextual information, which is critical for understanding user behavior. Moreover, explicit information, such as user-item interactions or sequential patterns, remains underutilized, despite its potential to directly guide the recommendation process and improve precision. However, combining implicit and explicit information is non-trivial, as it requires dynamically integrating these complementary signals while avoiding noise and irrelevant patterns within user behaviors. To address these challenges, we propose Dual Conditional Diffusion Models for Sequential Recommendation (DCRec), which effectively integrates implicit and explicit information by embedding dual conditions into both the forward and reverse diffusion processes. This allows the model to retain valuable sequential and contextual information while leveraging explicit user-item interactions to guide the recommendation process. Specifically, we introduce the Dual Conditional Diffusion Transformer (DCDT), which dynamically integrates both implicit and explicit signals throughout the diffusion stages, ensuring contextual understanding and minimizing the influence of irrelevant patterns. Extensive experiments on public benchmark datasets demonstrate that DCRec significantly outperforms state-of-the-art methods.
Recommender systems (RecSys) are widely used across various modern digital platforms and have garnered significant attention. Traditional recommender systems usually focus only on fixed and simple recommendation scenarios, making it difficult to generalize to new and unseen recommendation tasks in an interactive paradigm. Recently, the advancement of large language models (LLMs) has revolutionized the foundational architecture of RecSys, driving their evolution into more intelligent and interactive personalized recommendation assistants. However, most existing studies rely on fixed task-specific prompt templates to generate recommendations and evaluate the performance of personalized assistants, which limits the comprehensive assessment of their capabilities. This is because commonly used datasets lack high-quality textual user queries that reflect real-world recommendation scenarios, making them unsuitable for evaluating LLM-based personalized recommendation assistants. To address this gap, we introduce RecBench+, a new dataset benchmark designed to assess LLMs' ability to handle intricate user recommendation needs in the era of LLMs. RecBench+ encompasses a diverse set of queries that span both hard conditions and soft preferences, with varying difficulty levels. We evaluated commonly used LLMs on RecBench+ and uncovered the following findings: 1) LLMs demonstrate preliminary abilities to act as recommendation assistants; 2) LLMs are better at handling queries with explicitly stated conditions, while facing challenges with queries that require reasoning or contain misleading information. Our dataset has been released at https://github.com/jiani-huang/RecBenchPlus.
Decision tree optimization for regression tasks faces challenges in capturing complex feature interactions due to restrictive binary splits, leading to suboptimal predictive performance. Conventional methods, such as greedy algorithms or genetic programming, struggle with scalability and semantic coherence in large search spaces. We propose GraphBeam-LLM, a novel framework that integrates large language models (LLMs) with beam search to optimize directed acyclic graphs (DAGs), augmented by a dynamic prompting mechanism. By replacing binary trees with multi-path graphs, our approach enhances expressiveness, while LLM-guided extensions ensure semantically meaningful conditions. Dynamic prompting adapts search goals across iterations, balancing diversity and performance. Experiments on regression datasets, including abalone, wine, and cholesterol, demonstrate that GraphBeam-LLM achieves state-of-the-art mean squared error (MSE) reductions of 6.7% to 14.5% compared to baselines like CART and LLEGO, with improved interpretability and robustness, validating its efficacy in addressing complex regression tasks.
As more content generated by large language models (LLMs) floods into the Internet, information retrieval (IR) systems now face the challenge of distinguishing and handling a blend of human-authored and machine-generated texts. Recent studies suggest that neural retrievers may exhibit a preferential inclination toward LLM-generated content, while classic term-based retrievers like BM25 tend to favor human-written documents. This paper investigates the influence of LLM-generated content on term-based retrieval models, which are valued for their efficiency and robust generalization across domains. Our linguistic analysis reveals that LLM-generated texts exhibit smoother high-frequency and steeper low-frequency Zipf slopes, higher term specificity, and greater document-level diversity. These traits are aligned with LLMs being trained to optimize reader experience through diverse and precise expressions. Our study further explores whether term-based retrieval models demonstrate source bias, concluding that these models prioritize documents whose term distributions closely correspond to those of the queries, rather than displaying an inherent source bias. This work provides a foundation for understanding and addressing potential biases in term-based IR systems managing mixed-source content. Code and supplementary material are available at https://github.com/Trustworthy-Information-Access/LLM-Impact-Term-Retrieval.
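The Zipf-slope part of the linguistic analysis can be reproduced in a few lines: sort term frequencies by rank and fit log(frequency) against log(rank) by least squares. The toy corpus below is fabricated to follow roughly a 1/rank law; real measurements would use large human- and LLM-written document collections:

```python
import numpy as np
from collections import Counter

def zipf_slope(tokens, top_k=None):
    """Least-squares slope of log(frequency) vs log(rank);
    more negative means steeper Zipfian decay."""
    freqs = np.array(sorted(Counter(tokens).values(), reverse=True),
                     dtype=float)
    if top_k:
        freqs = freqs[:top_k]
    ranks = np.arange(1, len(freqs) + 1, dtype=float)
    slope, _ = np.polyfit(np.log(ranks), np.log(freqs), 1)
    return slope

# Fabricated corpus with frequencies roughly proportional to 1/rank
tokens = (["the"] * 60 + ["of"] * 30 + ["model"] * 20
          + ["query"] * 15 + ["rare"] * 12)
print(round(zipf_slope(tokens), 2))
```

Comparing slopes fitted on the high-frequency head versus the low-frequency tail (via `top_k` and its complement) would surface the smoother-head/steeper-tail pattern the paper reports for LLM-generated text.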
Given a user's risk preference and historical data, how can we generate diverse and high-quality portfolios for risk-aware portfolio optimization? Portfolio optimization is a core financial problem that balances returns and risks by determining asset allocations. While deterministic deep learning approaches can be directly optimized for a single solution, they offer limited flexibility to handle users' risk preferences. In contrast, stochastic methods typically rely on multi-stage processes, which complicate training and do not strictly align with the ultimate goal of generating optimal portfolios. In this paper, we propose Diffolio (Diffusion Models for Risk-Aware Portfolio Optimization), a novel diffusion-based framework that directly learns a pseudo-optimal portfolio distribution, addressing the drawbacks of (i) deterministic models lacking flexibility and (ii) stochastic models that are complex and misaligned with the true portfolio optimization objective. Instead of forecasting entire future time series, Diffolio directly samples portfolios, immediately adapting to user-specified risk levels through a dedicated risk guidance mechanism embedded in the denoising diffusion process. Empirical results on multiple real-world market datasets show that Diffolio significantly outperforms existing baselines in terms of return, risk control, and overall reliability. In particular, Diffolio achieves up to 12.1%p higher Annualized Rate of Return, demonstrating its strong potential as a risk-aware and objective-oriented solution to portfolio optimization.
Table reasoning requires models to jointly perform comprehensive semantic understanding and precise numerical operations. Although recent large language model (LLM)-based methods have achieved promising results, most of them still rely on a single-turn reasoning paradigm that processes flattened tables in a single forward pass. This paradigm suffers from inherent limitations, including context overflow on large tables, weak sensitivity to continuous numerical values, and the absence of explicit tool-use and reflection. In this paper, we propose TableMind, a tuning-based autonomous programmatic table agent that simulates the human-like cognitive schema of multi-turn interaction within a lightweight LLM. Instead of adopting a training-free workflow design, TableMind learns to internalize planning, action, and reflection through a principled two-stage training strategy. To bootstrap structured table reasoning capabilities, we construct and filter high-quality reasoning data for the supervised fine-tuning (SFT) stage. To enable precise code generation, we introduce a multi-perspective reward scheme and a novel optimization objective in the reinforcement learning (RL) stage. Extensive experiments on diverse benchmarks demonstrate that TableMind consistently outperforms previous baselines, validating the effectiveness of training autonomous agents to improve overall performance.
Homophily, the tendency of nodes from the same class to connect, is a fundamental property of real-world graphs, underpinning structural and semantic patterns in domains such as citation networks and social networks. Existing methods exploit homophily through designing homophily-aware GNN architectures or graph structure learning strategies, yet they primarily focus on GNN learning with training graphs. However, in real-world scenarios, test graphs often suffer from data quality issues and distribution shifts, such as domain shifts across users from different regions in social networks and temporal evolution shifts in citation network graphs collected over varying time periods. These factors significantly compromise the pre-trained model's robustness, resulting in degraded test-time performance. With empirical observations and theoretical analysis, we reveal that transforming the test graph structure by increasing homophily in homophilic graphs or decreasing it in heterophilic graphs can significantly improve the robustness and performance of pre-trained GNNs on node classification, without requiring model training or updates. Motivated by these insights, we propose GrapHoST, a novel test-time graph structural transformation method grounded in homophily. Specifically, a homophily predictor is developed to discriminate test edges, facilitating adaptive test-time graph structural transformation based on the confidence of predicted homophily scores. Extensive experiments on nine benchmark datasets under a range of test-time data quality issues demonstrate that GrapHoST consistently achieves state-of-the-art performance, with improvements of up to 10.92%. Our code has been released at https://github.com/YanJiangJerry/GrapHoST.
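The test-time transformation can be sketched as thresholded edge filtering driven by a homophily predictor's scores: keep confidently homophilic edges on homophilic graphs, keep the rest on heterophilic graphs. The scores and threshold below are hypothetical placeholders for a trained predictor:

```python
def transform_edges(edges, homophily_scores, threshold=0.5,
                    homophilic=True):
    """Filter test edges by predicted homophily: on a homophilic graph
    keep edges scoring above the threshold (raising homophily); on a
    heterophilic graph keep those below it (lowering homophily)."""
    keep = []
    for e, s in zip(edges, homophily_scores):
        if (s >= threshold) == homophilic:
            keep.append(e)
    return keep

edges = [(0, 1), (0, 2), (1, 3), (2, 3)]
scores = [0.9, 0.2, 0.8, 0.4]   # hypothetical predictor outputs
print(transform_edges(edges, scores, homophilic=True))
```

Because only the test graph's edge set changes, the pre-trained GNN itself needs no retraining or parameter update, which is the key property the abstract highlights.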
Graph Neural Networks (GNNs) have achieved notable success in a wide range of applications. However, their vulnerability to adversarial attacks, particularly graph injection attacks (GIAs), raises serious concerns for their deployment in security-sensitive domains. Existing GIA methods, despite their demonstrated effectiveness, face several inherent limitations. They typically require training surrogate models to approximate the victim model's behavior, which may lead to performance degradation when the surrogate mismatches the target model. Furthermore, the discrete nature of graph data poses challenges for generating effective adversarial features, often resulting in suboptimal solutions. Most critically, these methods show markedly reduced effectiveness when deployed against defended GNN models, limiting real-world applicability. To address these challenges, we introduce SoftHist, a novel gradient-free reinforcement learning framework for black-box graph injection attacks. Our approach incorporates a softened embedding mechanism to avoid suboptimal feature generation, ensuring stable and stealthy node injection. Moreover, we design a topology-aware edge sampler and a defense-aware policy learner with adaptive history reuse optimized for misclassification maximization. These innovations collectively balance attack effectiveness, stealthiness, and robustness against defensive measures. Extensive experiments on eight benchmark datasets demonstrate SoftHist's significant advantages in key scenarios: (1) On discrete-feature datasets like AMComputer, the misclassification rate is 10.34%–38.09% higher than baseline methods; (2) Against defensive models such as RGCN, it maintains a 98.23% success rate, surpassing state-of-the-art methods by 12.36%.
Conversational recommender systems (CRSs) aim to provide personalized item recommendations along with explanations based on conversations with users. While advancements in language models (LMs) have facilitated CRSs, limitations remain when LMs lack sufficient knowledge about item features that are essential for accurate recommendations and appropriate explanations. To alleviate this issue, retrieval-augmented language models (RALMs) have been introduced; however, they introduce a new challenge: the inclusion of less-relevant knowledge in retrieved passages. To address this limitation, we propose a novel CRS framework, MOCHA, which enhances RALMs through multi-stage item/feature selection with Chain-of-Thought (CoT) reasoning. Specifically, MOCHA systematically identifies relevant knowledge by first selecting the item to recommend and then selecting its features to explain; each selection is performed via CoT reasoning. Experimental results on two public CRS datasets demonstrate that MOCHA significantly improves recommendation accuracy and provides informative and factually correct explanations for the recommended items.
Generative recommendation plays a crucial role in personalized systems, predicting users' future interactions from their historical behavior sequences. A critical yet underexplored factor in training these models is data augmentation, the process of constructing training data from user interaction histories. By shaping the training distribution, data augmentation directly and often substantially affects model generalization and performance. Nevertheless, in much of the existing work, this process is simplified, applied inconsistently, or treated as a minor design choice, without a systematic and principled understanding of its effects.
Motivated by our empirical finding that different augmentation strategies can yield large performance disparities, we conduct an in-depth analysis of how they reshape training distributions and influence alignment with future targets and generalization to unseen inputs. To systematize this design space, we propose GenPAS, a generalized and principled framework that models augmentation as a stochastic sampling process over input–target pairs with three bias-controlled steps: sequence sampling, target sampling, and input sampling. This formulation unifies widely used strategies as special cases and enables flexible control of the resulting training distribution. Our extensive experiments on benchmark and industrial datasets demonstrate that GenPAS yields superior accuracy, data efficiency, and parameter efficiency compared to existing strategies, providing practical guidance for principled training data construction in generative recommendation. Our code is available at https://github.com/snap-research/GenPAS.
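The three-step sampling view of augmentation can be illustrated with a toy uniform sampler: given a user's interaction sequence, draw a target position, then draw a contiguous input window ending just before it. The function and its defaults are hypothetical; real strategies would bias each draw rather than sample uniformly:

```python
import random

def sample_pair(sequence, rng, min_input_len=1):
    """One draw of a three-step augmentation sketch:
    (1) the caller picks a sequence, (2) sample a target position t,
    (3) sample a truncated input window ending just before t.
    Uniform draws stand in for the bias-controlled steps."""
    t = rng.randrange(min_input_len, len(sequence))   # target position
    start = rng.randrange(0, t)                        # input window start
    return sequence[start:t], sequence[t]
```

Fixing the target to the last element and the window to the full prefix recovers the common "predict-next from full history" strategy as one special case of this sampler.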
Social media widely circulates harmful and conflict-laden narratives, and internet memes are a key multimodal vehicle for such content. We present RoMQD, a multimodal dataset purpose-built for distractor-aware meme interpretation, and MerFT (Meme Exploration via Multimodal Retrieval-Augmented Fine-tuning), a training framework that integrates images, captions, and associated documents within RAG pipelines. MerFT couples citation-aware chain-of-thought with a document-aligned loss to ground answers in oracle evidence while discounting semantically similar but misleading distractors. We evaluate MerFT under multiple input configurations (Base, Caption, Both) while systematically varying distractor frequency. The model shows graceful degradation as noise increases, with Both (image+caption) inputs yielding the most reliable behavior. On RoMQD, MerFT improves over RAG baselines (e.g., +7.7 F1 with Qwen2.5-VL) and delivers larger gains on categories requiring nuanced cultural grounding, such as satire/irony and image–text integration. A clustering-based strategy for constructing challenging distractor pools further enhances robustness, and MerFT remains complementary to modern rerankers. These results demonstrate the feasibility of retrieval-robust multimodal reasoning for meme-based socio-cultural conflict analysis and provide practical guidance for building dependable content analysis systems for policy, communication, and socio-political monitoring. Our code is available at https://github.com/dlwlsrnjs/MerFT/, and we release our dataset at https://drive.google.com/drive/folders/1L_QfQuH11sDGde9weFXiSCL_AM5mV2d7.
Understanding and continuously refining multimodal molecular knowledge is crucial for advancing biomedicine, chemistry, and materials science. Molecule language models (MoLMs) have become powerful tools in these domains, integrating structural representations (e.g., SMILES strings, molecular graphs) with rich contextual descriptions (e.g., physicochemical properties, biomedical applications). However, MoLMs can encode and propagate inaccuracies due to outdated web-mined training corpora or malicious manipulation, jeopardizing downstream discovery pipelines. While knowledge editing has been explored for general-domain AI, its application to MoLMs remains uncharted, presenting unique challenges due to the multifaceted and interdependent nature of molecular knowledge. In this paper, we take the first step toward MoLM editing for two critical tasks: molecule-to-caption generation and caption-to-molecule generation. To address molecule-specific challenges, we propose MolEdit, a powerful framework that enables targeted modifications while preserving unrelated molecular knowledge. MolEdit combines a Multi-Expert Knowledge Adapter that routes edits to specialized experts for different molecular facets with an Expertise-Aware Editing Switcher that activates the adapters only when input closely matches the stored edits across all expertise, minimizing interference with unrelated knowledge. To systematically evaluate editing performance, we introduce MEBench, a comprehensive benchmark assessing multiple dimensions, including Reliability (accuracy of the editing), Locality (preservation of irrelevant knowledge), and Generality (robustness to reformed queries). Across extensive experiments on two popular MoLM backbones, MolEdit delivers up to 18.8% higher Reliability and 12.0% better Locality than state-of-the-art editing baselines while maintaining efficiency. Our findings chart a clear path toward safer, continuously updatable scientific foundation models. The code is available at: https://github.com/LzyFischer/MolEdit.
One of many impediments to applying graph neural networks (GNNs) to large-volume real-world graph-structured data is that privacy concerns preclude a centralized training scheme, which would require gathering data belonging to different organizations. As a distributed data processing scheme, federated graph learning (FGL) enables learning GNN models collaboratively without sharing participants' private data. Though theoretically feasible, a core challenge in FGL systems is the variation of local training data distributions among clients, also known as the data heterogeneity problem. Most existing solutions suffer from two problems: (1) The typical optimizer based on empirical risk minimization tends to cause local models to fall into sharp valleys and weakens their generalization to out-of-distribution graph data. (2) The prevalent dimensional collapse in the learned representations of local graph data has an adverse impact on the classification capacity of the GNN model. To this end, we formulate a novel optimization objective that is aware of the sharpness (i.e., the curvature of the loss surface) of local GNN models. By minimizing the loss function and its sharpness simultaneously, we seek out model parameters in a flat region with uniformly low loss values, thus improving the generalization over heterogeneous data. By introducing a regularizer based on the correlation matrix of local representations, we relax the correlations of representations generated by individual local graph samples, so as to alleviate the dimensional collapse of the learned model. The proposed Sharpness-aware fEderated grAph Learning (SEAL) algorithm can enhance the classification accuracy and generalization ability of local GNN models in federated graph learning. Experimental studies on several graph classification benchmarks show that SEAL consistently outperforms SOTA FGL baselines and provides gains for more participants.
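Minimizing the loss and its sharpness simultaneously typically follows the standard two-step sharpness-aware (SAM-style) update: perturb the weights toward the locally worst point within a small radius, then descend using the gradient taken there. A scalar sketch, with names and defaults that are ours rather than the paper's:

```python
def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One sharpness-aware update on a scalar weight.

    grad_fn -- gradient of the loss at a given weight
    rho     -- perturbation radius (hypothetical default)
    Real FGL models apply the same two-step rule per parameter tensor,
    normalizing the ascent step by the gradient norm."""
    g = grad_fn(w)
    eps = rho if g >= 0 else -rho        # ascent direction (scalar case)
    g_adv = grad_fn(w + eps)             # gradient at the perturbed point
    return w - lr * g_adv
```

On a sharp curvature region, `g_adv` exceeds the plain gradient, so the update is pushed harder away from sharp valleys than vanilla gradient descent would be.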
Knowledge-aware recommendation leverages rich item-related factual information in Knowledge Graphs (KGs) to enhance recommendation systems. However, most existing methods focus on developing complex models to extract information from a given KG. They essentially follow a model-centric paradigm, overlooking data quality problems. In practice, KG data exhibits two principal quality problems, namely the noisy knowledge problem and the incomplete knowledge problem, which severely impair the performance of downstream models. To address these problems, we adopt a data-centric paradigm to improve the quality of KG data. Inspired by diffusion models' superior denoising and generation ability by fitting true data distributions, we propose a novel spectral heterogeneous diffusion framework for knowledge-aware recommendation. This framework tailors a diffusion model to capture the recommendation-oriented heterogeneous distribution in the original KG and then converts the fitted distribution into a high-quality KG. Specifically, we design a spectral heterogeneous diffusion model that integrates recommendation prior knowledge to capture task-relevant distribution and aligns its diffusion process with the features of heterogeneous graphs to model heterogeneity. Furthermore, we propose a continuous-discrete mode adapter that transforms the learned continuous distribution into a high-quality discrete KG. The resulting KG is denoised and enriched with task-relevant triples, mitigating noisy and incomplete knowledge problems. Experiments show that our plug-and-play framework can be integrated with any knowledge-aware recommendation model and boost their performance by improving KG quality. The code and theoretical analyses are available at https://github.com/xiangmli/SHGD.
The blossoming of large language models (LLMs) has greatly shifted the paradigm of Sequential Recommender Systems (SRSs). Numerous studies have attempted to integrate ID-based collaborative signals and text information for effectively capturing both ID semantics and text semantics to enhance LLM-based recommendation. However, existing fusion methods suffer from challenges like fusion noise and the semantic gap. To address these issues, we propose Hybrid Dual-Semantics Modeling for enhancing LLM-based Recommendation (HDRec), an effective hybrid fusion method based on a design of dual low-rank adaptation (LoRA). HDRec employs two LoRA processes on a shared LLM decoder, with each process handling information from one of the two semantics. We further implement a dedicated fusion mechanism exclusively at the inference stage, allowing the robust textual representation to serve as the primary signal, which is adaptively enhanced by unique collaborative signals from ID semantics, ensuring stable and accurate final predictions. To mitigate gradient conflicts caused by the dual LoRA processes, we introduce an alternating training strategy for the dual low-rank adaptation. This method effectively resolves gradient conflicts and enables successful optimization of HDRec. Extensive experiments show that HDRec outperforms existing non-LLM-based and LLM-based state-of-the-art methods. The implementation of HDRec is anonymously available at https://github.com/KDEGroup/HDRec.
In the era of information explosion, Recommender Systems (RS) are essential for alleviating information overload and providing personalized user experiences. Recent advances in diffusion-based generative recommenders have shown promise in capturing the dynamic nature of user preferences. These approaches explore a broader range of user interests by progressively perturbing the distribution of user-item interactions and recovering potential preferences from noise, enabling nuanced behavioral understanding. However, existing diffusion-based approaches predominantly operate in continuous space through encoded graph-based historical interactions, which may incur information loss and suffer from computational inefficiency. As such, we propose CDRec, a novel Continuous-time Discrete-space Diffusion Recommendation framework, which models user behavior patterns through discrete diffusion on historical interactions over continuous time. The discrete diffusion algorithm operates via discrete element operations (e.g., masking) while incorporating domain knowledge through transition matrices, producing more meaningful diffusion trajectories. Furthermore, the continuous-time formulation enables flexible adaptive sampling. To better adapt discrete diffusion models to recommendations, CDRec introduces: (1) a novel popularity-aware noise schedule that generates semantically meaningful diffusion trajectories, and (2) an efficient training framework combining consistency parameterization for fast sampling and a contrastive learning objective guided by multi-hop collaborative signals for personalized recommendation. Extensive experiments on real-world datasets demonstrate CDRec's superior performance in both recommendation accuracy and computational efficiency.
Traditional conversion rate prediction models suffer from suboptimal performance due to sharing the same network parameters for all instances, failing to capture heterogeneous underlying distributions across instances. Recent parameter personalized network based models address this by grouping instances and adjusting parameters for each group. However, existing parameter personalization methods face challenges: (1) taking prior information features as the grouping condition for model parameter personalization leads to suboptimal performance due to humans' limited understanding of the data distribution, or (2) using all the features for both the parameter generation module and the deep neural network (DNN) of conversion prediction tasks causes gradient conflicts during backpropagation. A better approach is to automatically select features as the grouping condition based on the data distribution through iterative learning. Therefore, we propose the Automatic Feature Partitioning-Based Parameter Personalized Network (APPNet), which consists of two components: Automatic Feature Partitioning (AFP) and a Parameter Personalized Network (PPNet). The AFP module automatically partitions all the features into two parts: one part for the DNN of conversion prediction tasks, and the other part for the PPNet module to generate weights that adjust the DNN parameters of conversion prediction tasks. Specifically, we implemented two versions of AFP: feature-wise AFP and bit-wise AFP. The feature-wise AFP partitions features at the feature field granularity, while the bit-wise AFP partitions each bit of the feature embeddings. The PPNet module adjusts model parameters of the conversion prediction task for each group of instances by applying element-wise multiplication to the DNN parameters of conversion tasks. Extensive offline experiments demonstrate that APPNet outperforms previous parameter personalized models. Furthermore, online A/B testing in a production system achieved a 1.09% improvement in conversion rate, validating its practical effectiveness.
Generative retrieval (GR) has gained significant attention as an effective paradigm that integrates the capabilities of large language models (LLMs). It generally consists of two stages: constructing discrete semantic identifiers (IDs) for documents and retrieving documents by autoregressively generating ID tokens. The core challenge in GR is how to construct document IDs (DocIDs) with strong representational power. Good IDs should exhibit two key properties: similar documents should have more similar IDs, and each document should maintain a distinct and unique ID. However, most existing methods ignore native category information, which is common and critical in E-commerce. Therefore, we propose a novel ID learning method, CAtegory-Tree Integrated Document IDentifier (CAT-ID2), incorporating prior category information into the semantic IDs. CAT-ID2 includes three key modules: a Hierarchical Class Constraint Loss to integrate category information layer by layer during quantization, a Cluster Scale Constraint Loss for uniform ID token distribution, and a Dispersion Loss to improve the distinction of reconstructed documents. These components enable CAT-ID2 to generate IDs that make similar documents more alike while preserving the uniqueness of different documents' representations. Extensive offline and online experiments confirm the effectiveness of our method, with online A/B tests showing a 0.33% increase in average orders per thousand users for ambiguous intent queries and 0.24% for long-tail queries. The source code is available at https://github.com/lxbdtt/CAT-ID2.
The increasing use of deep neural networks (DNNs) in high-stakes domains such as hiring, healthcare, and finance has heightened concerns about algorithmic fairness. Because training data can encode historical and societal biases, learned models may exhibit disparate outcomes for underrepresented groups. Prior work is largely model-centric, improving fairness via specialized loss functions or architectural modifications, which can introduce additional training overhead and hinder deployment in modular or rapidly evolving pipelines. We instead study a data-centric alternative: constructing a fair training dataset that promotes equitable behavior without changing the model architecture. We propose FairData, which synthesizes a fair dataset by optimizing a gradient-matching objective that aligns the training dynamics of a randomly initialized model on the synthetic data with those on the original data, while explicitly regularizing for group fairness. The resulting dataset is model-agnostic, lightweight, and remains in the original input space, enabling straightforward reuse across downstream models. Experiments on four benchmark datasets show that FairData consistently reduces group disparities across diverse architectures while maintaining competitive predictive performance, suggesting fairness-aware dataset optimization as a practical complement to model-specific fairness techniques.
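The gradient-matching objective behind approaches like FairData can be illustrated with its distance term; one common choice in dataset-distillation-style methods is one minus the cosine similarity between the flattened gradient on the synthetic data and the gradient on the original data. A minimal sketch under that assumption (the exact distance used by FairData is not specified in the abstract):

```python
import math

def gradient_match_loss(g_syn, g_real):
    """Gradient-matching distance sketch: 1 - cosine similarity between
    the model's gradient on the synthetic dataset and on the original
    dataset, both flattened to plain vectors. Minimizing this aligns the
    training dynamics of the two datasets."""
    dot = sum(a * b for a, b in zip(g_syn, g_real))
    norm_syn = math.sqrt(sum(a * a for a in g_syn))
    norm_real = math.sqrt(sum(b * b for b in g_real))
    return 1.0 - dot / (norm_syn * norm_real)
```

In the full objective this distance would be combined with a group-fairness regularizer and minimized with respect to the synthetic examples themselves, not the model weights.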
Incorporating item-side information, such as category and brand, into sequential recommendation is a well-established and effective approach for improving performance. However, despite significant advancements, current models are generally limited by three key challenges: they often overlook the fine-grained temporal dynamics inherent in timestamps, exhibit vulnerability to noise in user interaction sequences, and rely on computationally expensive fusion architectures. To systematically address these challenges, we propose the Time-Aware Adaptive Side Information Fusion (TASIF) framework. TASIF integrates three synergistic components: (1) a simple, plug-and-play time span partitioning mechanism to capture global temporal patterns; (2) an adaptive frequency filter that leverages a learnable gate to denoise feature sequences adaptively, thereby providing higher-quality inputs for subsequent fusion modules; and (3) an efficient adaptive side information fusion layer that employs a ''guide-not-mix'' architecture, where attributes guide the attention mechanism without being mixed into the content-representing item embeddings, enabling deep interaction while preserving computational efficiency. Extensive experiments on four public datasets demonstrate that TASIF significantly outperforms state-of-the-art baselines while maintaining excellent training efficiency. Our source code is available at https://github.com/jluo00/TASIF.
In recent years, large language models (LLMs) have emerged as powerful tools for link prediction in knowledge graphs (KGs) due to their strong capabilities in understanding and generation. However, many LLM-based methods still heavily rely on textual descriptions of KGs, limiting their ability to capture structural information and to model complex relational patterns. Although some methods integrate structural embeddings into LLMs, their ability to harness the complementary strengths of both modalities and dynamically prioritize candidate entities based on query context remains limited. In this paper, we propose ST-KGLP, a novel framework that improves link prediction by aligning structural knowledge with textual knowledge and employing query-aware adaptive weighting for candidate selection. Specifically, our proposed ST-KGLP employs a knowledge aligner to bridge the information gap between structural and textual knowledge, and then utilizes a query-aware adaptive weighting strategy that dynamically computes attention weights between query representations and candidate entities, enabling contextually relevant candidate re-ranking for more accurate prediction. Extensive experiments on various datasets show that our ST-KGLP outperforms state-of-the-art approaches, achieving average improvements of 3.81%, 11.52%, 2.22%, and 1.55% across four evaluation metrics. Our code and datasets are available at https://github.com/shijielaw/ST-KGLP.
Contrastive learning has emerged as a promising paradigm by inherently generating self-supervised signals and uncovering latent patterns from interaction data to enhance recommendation performance. However, most current graph contrastive learning-based recommendation methods rely on random augmentation strategies, which may disrupt graph structural information and compromise model robustness. In addition, long-tail items suffer from insufficient exposure, making it difficult to learn high-quality feature representations, ultimately degrading recommendation effectiveness. To overcome these limitations, this paper presents DDGCL, a dual diffusion-based graph contrastive learning method. A contrastive view optimization module is designed, which employs singular value decomposition to perform low-rank approximation on the interaction graph, efficiently extracting global structural features while accelerating the diffusion process. The diffusion model then performs noise addition and denoising on this basis to generate contrastive views that preserve graph structural information. In addition, a method for embedding augmentation designed for long-tail items is proposed. This module utilizes a conditional diffusion model, where global graph information serves as conditional constraints to guide the denoising process of long-tail items, thereby improving their representation learning. A comprehensive evaluation on multiple public benchmark datasets demonstrates that DDGCL significantly outperforms various baseline models, validating the effectiveness of the proposed approach.
Learning transferable representations from unlabeled time series is crucial for improving performance in data-scarce classification. Existing self-supervised methods often operate at the point level and rely on unidirectional encoding, leading to low semantic density and a mismatch between pre-training and downstream optimization. In this paper, we propose TimeMAE, a self-supervised framework that reformulates masked modeling for time series via semantic unit elevation and decoupled representation learning. Instead of modeling individual time steps, TimeMAE segments time series into non-overlapping sub-series to form semantically enriched units, enabling more informative masked reconstruction while reducing computational cost. To address the representation discrepancy introduced by masking, we design a decoupled masked autoencoder that separately encodes visible and masked regions, avoiding artificial masked tokens in the main encoder. To guide pre-training, we introduce two complementary objectives: masked codeword classification, which discretizes sub-series semantics via a learned tokenizer, and masked representation regression, which aligns continuous representations through a momentum-updated target encoder. Extensive experiments on five datasets demonstrate that TimeMAE outperforms competitive baselines, particularly in label-scarce scenarios and transfer learning scenarios. Our codes are publicly available at https://github.com/Mingyue-Cheng/TimeMAE.
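At the data level, the semantic-unit elevation and decoupled encoding described above reduce to segmenting the series into non-overlapping windows and partitioning those units into visible and masked sets before encoding. A toy sketch with hypothetical names:

```python
def segment(series, window):
    """Split a series into non-overlapping sub-series (semantic units).
    A trailing remainder shorter than `window` is dropped."""
    return [series[i:i + window] for i in range(0, len(series) - window + 1, window)]

def split_visible_masked(units, masked_positions):
    """Decoupled partition sketch: visible units feed the main encoder,
    masked units a separate path, so no artificial mask tokens enter the
    main encoder."""
    visible = [u for i, u in enumerate(units) if i not in masked_positions]
    masked = [u for i, u in enumerate(units) if i in masked_positions]
    return visible, masked
```

Each sub-series then carries far more signal per token than a single time step, which is what makes masked reconstruction over these units more informative and cheaper.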
Recent advances in large language models (LLMs) promise more effective information extraction for review-based recommender systems, yet current methods still (i) mine free-form reviews without scope control, producing redundant and noisy representations, (ii) lack principled metrics that link LLM hallucination to downstream effectiveness, and (iii) leave the cost–quality trade-off across model scales largely unexplored. We address these gaps with the Hyper-Adaptive Dual-Stage Semantic Framework (HADSF), a two-stage approach that first induces a compact, corpus-level aspect vocabulary via adaptive selection and then performs vocabulary-guided, explicitly constrained extraction of structured aspect-opinion triples. To assess the fidelity of the resulting representations, we introduce Aspect Drift Rate (ADR) and Opinion Fidelity Rate (OFR) and empirically uncover a nonmonotonic relationship between hallucination severity and rating prediction error. Experiments on approximately 3 million reviews across LLMs spanning 1.5B-70B parameters show that, when integrated into standard rating predictors, HADSF yields consistent reductions in prediction error and enables smaller models to achieve competitive performance in representative deployment scenarios. We release code, data pipelines, and metric implementations to support reproducible research on hallucination-aware, LLM-enhanced explainable recommendation. Code is available at https://github.com/niez233/HADSF.
Social and information networks may become polarized, leading to echo chambers and political gridlock. Accurately measuring this phenomenon is a critical challenge. Existing measures often conflate genuine structural division with random topological features, yielding misleadingly high polarization scores on random networks, and failing to distinguish real-world networks from randomized null models. We introduce DSP, a Diffusion-based Structural Polarization measure designed from first principles to correct for such biases. DSP removes the arbitrary concept of 'influencers' used by the popular Random Walk Controversy (RWC) score, instead treating every node as a potential origin for a random walk. To validate our approach, we introduce a set of desirable properties for polarization measures, expressed through reference topologies with known structural properties. We show that DSP satisfies these desiderata, being near-zero for non-polarized structures such as cliques and random networks, while correctly capturing the expected polarization of reference topologies such as monochromatic-splittable networks. Our method applied to U.S. Congress datasets uncovers trends of increasing polarization in recent years. By integrating a null model into its core definition, DSP provides a reliable and interpretable diagnostic tool, highlighting the necessity of statistically grounded metrics to analyze societal fragmentation.
In neural network-based recommendation models, numerous studies have adopted inverse propensity scoring (IPS) as a principal approach, namely, minimizing unbiased risk. IPS is commonly employed to address selection bias inherent in user-item interactions and to improve model generalization performance. In this study, we critically examine whether minimizing the unbiased risk obtained via IPS genuinely contributes to improvements in recommendation performance. To this end, we first analyze the mechanism through which IPS influences model behavior, particularly focusing on its role beyond bias correction. Under the data generation assumptions commonly adopted in prior work, we demonstrate that the effectiveness of IPS is more closely associated with enriching embedding space by multi-task learning, rather than the unbiasedness of the risk function. This observation suggests that IPS may serve as a regularization mechanism that indirectly enhances model expressiveness. Furthermore, our experimental results reveal that the reweighting process of IPS can, in certain scenarios, lead to a degradation in recommendation performance. These findings highlight the limitations of IPS when used solely for unbiased risk minimization. In light of this, we show that multi-task learning proves beneficial in effectively managing this complexity, thereby offering a more robust approach to improving recommendation quality.
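The unbiased risk interrogated above is the standard IPS estimator: observed losses reweighted by inverse propensities so that, under the assumed observation model, the estimate matches the full-information risk in expectation. A minimal sketch assuming Bernoulli observation indicators and known propensities:

```python
def ips_risk(losses, observed, propensities):
    """Inverse-propensity-scored empirical risk.

    losses       -- per-(user, item) loss values
    observed     -- 1 if the interaction was observed, else 0
    propensities -- probability each interaction is observed (assumed known)

    Reweighting each observed loss by 1/propensity makes the estimator
    unbiased under a missing-not-at-random observation model."""
    n = len(losses)
    return sum(o * l / p for l, o, p in zip(losses, observed, propensities)) / n
```

The paper's point is precisely that minimizing this quantity is not the whole story: the reweighting can act more like a regularizer enriching the embedding space than like pure bias correction, and can even hurt in some scenarios.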
Retrieval-augmented generation (RAG) enhances LLMs with external knowledge, yet generation remains vulnerable to retrieval-induced noise and uncertain placement of relevant chunks, often causing hallucinations. We present Ext2Gen, an extract-then-generate framework that strengthens LLMs via joint evidence selection and answer generation, dynamically identifying query-relevant content while suppressing noise, thereby removing the need for any independent pre-generation compression module. Optimized through preference alignment with well-curated pairwise feedback, Ext2Gen produces accurate and faithful answers even under noisy or imprecise retrieval. Experiments demonstrate that it substantially enhances the robustness of the generation backbone and yields greater performance gains than methods relying on independent compression models (e.g., Recomp, CompAct, EXIT). It further benefits from improved retrieval techniques such as query rewriting, underscoring that generation-side enhancements address limitations that retrieval alone cannot overcome.
The value and copyright of training data are crucial in the artificial intelligence industry. Service platforms should protect data providers' legitimate rights and fairly reward them for their contributions. The Shapley value, a potent tool for evaluating contributions, outperforms other methods in theory, but its computational overhead escalates exponentially with the number of data providers. Recent studies on Shapley values have proposed various approximation algorithms to address the computational complexity issues inherent in exact calculations. However, they require retraining for each test sample, leading to intolerable costs. We propose Fast-DataShapley, a one-pass training framework that leverages the weighted least squares characterization of the Shapley value to train a reusable explainer model with real-time reasoning speed. Given new test samples, no retraining is required to calculate the Shapley values of the training data. Additionally, we propose three methods with theoretical guarantees to reduce training overhead from two aspects: the approximate calculation of the utility function and the reduction of the sample space complexity. We analyze time complexity to show the efficiency of our methods. Experimental evaluations on various image datasets demonstrate superior performance and efficiency compared to baselines. Specifically, performance improves by more than 2×, and the explainer's training speed increases by two orders of magnitude.
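For context, the quantity that the weighted-least-squares training above approximates is the exact Shapley value, which is tractable only for tiny player sets because it enumerates every coalition. A brute-force reference implementation of the classic formula:

```python
from itertools import combinations
from math import factorial

def shapley_values(players, utility):
    """Exact Shapley values by coalition enumeration (exponential in n).
    utility maps a set of players to a real payoff. Approximation schemes
    such as the weighted least squares characterization recover these
    same values without full enumeration."""
    n = len(players)
    values = {}
    for i in players:
        others = [p for p in players if p != i]
        phi = 0.0
        for k in range(n):
            for coalition in combinations(others, k):
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                phi += weight * (utility(set(coalition) | {i}) - utility(set(coalition)))
        values[i] = phi
    return values
```

With utility equal to coalition size, every player's value is 1 and the values sum to the grand-coalition utility (the efficiency axiom), which is a quick sanity check for any approximation.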
With the advancement of deep learning, DNNs have been widely deployed across diverse domains. Model-as-a-Service (MaaS) platforms allow enterprises to commercialize well-trained models, which are built through extensive data collection and substantial computational investment. Consequently, protecting these models from unauthorized use and intellectual property (IP) theft has become critical. While watermarking has emerged as a prominent IP protection technique, most existing approaches target centralized settings, leaving federated learning (FL) scenarios largely underexplored. To bridge this gap, we propose Federated Watermarking with Distributed Verification (FWDV), a novel framework tailored for FL. FWDV enables each client to independently verify watermark ownership and jointly defend the model against erasure attempts. To our knowledge, this is the first work to achieve both distributed verification and robustness against a broad spectrum of attacks. Extensive experiments demonstrate that FWDV embeds watermarks with minimal impact on model utility and resists removal through fine-tuning, pruning, and distillation.
User cold-start remains a long-standing challenge for recommender systems due to the scarce interactions of new users. By enabling user-adaptive model customization, meta-augmented cold-start recommenders have shown significant capacity for fast adaptation to few-shot interaction records. Despite demonstrating promising generalization for new users, the non-stationarity of user preference dynamics necessitates continuous model adaptation of cold-start recommenders under dynamic streaming environments. Our work focuses on the User-Incremental Learning (User-IL) paradigm, specifically investigating how to effectively assimilate novel preferences of new users arriving in continual periods while preserving common preference patterns extracted for cold-start recommendation. Through a comparison with visual class-incremental learning, we highlight the necessity of forward compatibility in user-incremental learning. To this end, we propose an adaptive incremental learning approach for meta-augmented cold-start recommenders, namely IncMCR, and improve meta-knowledge evolution with dual-view forward compatibility adaptation. To combat meta-knowledge erosion, we design complementary-oriented meta-task replay with prototype-aware exemplar selection. Then, to strengthen the sensitivity of preference assimilation at each incremental retraining period, we design a novelty-sensitive meta-updater by modifying the vanilla meta-learning with adaptive meta-weighting. Extensive experiments conducted on three real-world datasets demonstrate the effectiveness of IncMCR and also show its robustness to both short-term and long-term user increments.
Link-analysis algorithms, such as PageRank, are instrumental in understanding the structural dynamics of networks by evaluating the importance of individual vertices based on their connectivity. Recently, with the rising importance of responsible AI, the question of fairness in link-analysis algorithms has gained traction.
In this paper, we present a new approach for incorporating group fairness into the PageRank algorithm by reweighting the transition probabilities in the underlying transition matrix. We formulate the problem of achieving fair PageRank by seeking to minimize the fairness loss, which is the difference between the original group-wise PageRank distribution and a target PageRank distribution. We further define a group-adapted fairness notion, which accounts for group homophily by considering random walks with group-biased restart for each group. Since the fairness loss is non-convex, we propose an efficient projected gradient-descent method for computing locally-optimal edge weights. Unlike earlier approaches, we neither add new edges to the network nor adjust the restart vector. Instead, we keep the topology of the underlying network unchanged and only modify the relative importance of existing edges. We empirically compare our approach with state-of-the-art baselines and demonstrate the efficacy of our method, where very small changes in the transition matrix lead to significant improvement in the fairness of the PageRank algorithm.
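The pieces this formulation works with can be made concrete with a short sketch: PageRank computed by power iteration from a nonnegative edge-weight matrix whose rows are renormalized into transition probabilities, and a fairness loss comparing group-wise PageRank mass to a target distribution. The graph, group split, and target below are toy assumptions; the paper's projected gradient descent would then adjust only the weights of existing edges to reduce this loss.

```python
import numpy as np

def pagerank(W, alpha=0.85, iters=200):
    """Power iteration on the row-normalized transition matrix of a
    nonnegative edge-weight matrix W (restart is uniform)."""
    n = W.shape[0]
    P = W / W.sum(axis=1, keepdims=True)   # reweighted transition matrix
    r = np.full(n, 1.0 / n)
    for _ in range(iters):
        r = (1 - alpha) / n + alpha * (r @ P)
    return r

# Toy graph: nodes {0,1} in group A, nodes {2,3} in group B.
W = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
r = pagerank(W)
group_mass = np.array([r[:2].sum(), r[2:].sum()])
target = np.array([0.5, 0.5])              # e.g., an equal-mass target
fairness_loss = ((group_mass - target) ** 2).sum()
```

Only the nonzero entries of `W` would be treated as free variables, which is what keeps the network topology unchanged.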
Faithful and interpretable opinion summarization aims to generate a summary that captures the diverse opinions expressed in a document set while providing explanations for the divergences between these opinions. In this paper, we propose an evidence-guided framework to enhance opinion coverage and provide divergence explanations. It first generates the majority opinion as an initial summary and partitions the source documents into multiple evidence sets based on their relevance to the majority opinion. Then, a summary extension strategy is employed to expand the initial summary by incorporating different opinions from these sets. The framework also employs a submodular optimization algorithm to select evidence from different evidence sets in order to reflect the divergences between opinions. Experiments on two benchmark datasets demonstrate that our method outperforms multiple baselines in terms of both the lexical and semantic consistency with reference summaries, while having low computational overhead. Ablation studies confirm that both the document partition and summary extension mechanisms contribute to the model performance. The LLM-based and human evaluation results also show that our method can identify more comprehensive evidence that better captures opinion divergences.
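Submodular evidence selection of the kind described is usually optimized greedily, since greedy selection carries the classic (1 - 1/e) approximation guarantee for monotone submodular objectives. The sketch below uses a word-coverage objective over toy documents; the objective and data are illustrative assumptions, not the paper's exact formulation.

```python
def greedy_select(docs, k):
    """Greedy maximization of a word-coverage objective -- monotone
    submodular, so greedy attains a (1 - 1/e) approximation."""
    covered, chosen = set(), []
    for _ in range(k):
        gains = [(i, len(set(d.split()) - covered))
                 for i, d in enumerate(docs) if i not in chosen]
        best, gain = max(gains, key=lambda t: t[1])
        if gain == 0:           # nothing new to cover
            break
        chosen.append(best)
        covered |= set(docs[best].split())
    return chosen

docs = ["battery life is great", "battery drains fast overnight",
        "screen is great", "battery life is great"]
picked = greedy_select(docs, 2)
```

Note how the diminishing-returns property does the work: the near-duplicate fourth document adds no marginal coverage once the first is chosen, so the greedy pass naturally favors divergent evidence.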
Dynamic knowledge graphs exhibit non-stationary growth characterized by three key features: heterogeneous structural connectivity, variable growth frequency and magnitude, and scale-semantic imbalance. Existing lifelong knowledge graph embedding methods, based on the assumption of stationary growth, hardly consider all these features, resulting in unsatisfactory embedding performance. Specifically, heterogeneous connectivity leads to inefficient knowledge transfer and embedding initialization bias; frequent, large growths cause catastrophic forgetting and embedding space drift; fixed strategies fail to prioritize high-value knowledge, thereby degrading long-term retention. To tackle these issues, we propose DyGM (Dynamic Global Memory Framework), a novel framework integrating three core components. The connectivity-aware structuring module leverages topological metrics to enable reliable structural ordering and knowledge transfer. The cross-snapshot relay module preserves long-term knowledge through instance-level replay and feature-level distillation. The dynamic weight balancing mechanism adaptively adjusts the importance of historical and new knowledge to prioritize core information during training and ensure stable retention of core structural-semantic dependencies. We construct two non-stationary KG datasets and conduct extensive experiments on both stationary and non-stationary datasets. Experimental results demonstrate that DyGM achieves competitive performance on stationary datasets and significantly outperforms existing methods on non-stationary datasets.
Major promotional events such as Black Friday and 618 Shopping Day cause drastic, heterogeneous shifts in user and advertiser behavior, posing persistent challenges for conversion rate (CVR) models trained on daily data. Existing methods often lack the flexibility to capture this periodic variability, resulting in poor modeling of diverse behavioral patterns. To address these challenges, we propose TemporalExpertNet (TEN), a cross-temporal transfer learning framework for industrial-scale CVR prediction during promotion cycles. TEN decomposes the model into a stable representation encoder and a promotion-sensitive expert, enabling reusable temporal knowledge transfer. Specifically, we propose BridgeNet to address the mismatch between historical knowledge and current features through temporal representation alignment. We further introduce TemporalExpertGate (TEG) to perform sample-aware expert fusion, enabling fine-grained prediction adjustment and adaptive knowledge reuse across promotion periods. By using a two-stage training strategy, TEN achieves stable alignment and adaptive expert fusion for robust prediction under shifting promotional distributions. TEN was deployed on a large-scale short-video ads platform during the 618 preheating phase, improving conversion rate by 7.52% and platform RPM by 4.27% with only 0.23% model-size overhead and 1.8% latency overhead. It was therefore fully launched to all traffic on 618 Shopping Day, bringing substantial commercial gains.
Local life service is a vital scenario in Kuaishou App, in which we recommend videos with stores' location information. Thus, recommendation in our scenario is challenging because we must take into account users' interests and real-time locations at the same time. In complex scenarios, end-to-end generative recommendation has emerged as a new paradigm, such as OneRec in the short video scenario, OneSug in the search scenario, and EGA in the advertising scenario. However, in local life service, an end-to-end generative recommendation model has not yet been developed as there are some key challenges to be solved. The first challenge is how to make full use of geographic information. The second challenge is how to balance multiple objectives, including user interests, the distance between users and stores, and some other business objectives. To address the challenges, we propose OneLoc. Specifically, we leverage geographic information from different perspectives: (1) geo-aware semantic ID incorporates both video and geographic information for tokenization, (2) geo-aware attention injects video location similarity and user's real-time location in the encoder, and (3) neighbor-aware prompt captures rich context information surrounding users for generation. To balance multiple objectives, we use reinforcement learning and propose a geographic reward and a GMV reward. With the above design, OneLoc achieves outstanding offline and online performance. In fact, OneLoc has been deployed in local life service of Kuaishou App and achieved 21.016% and 17.891% improvements in terms of gross merchandise value (GMV) and order numbers.
Ensuring consistent persona in interactive AI systems presents a significant challenge, especially in diverse application scenarios ranging from virtual assistants to customer service bots. Such capability is often constrained by the system's understanding of direct and explicit persona conflicts. Traditional approaches primarily focus on detecting discrepancies between machine responses and the predefined profile, or on the contextual inconsistencies between responses at the semantic level rather than the persona level. Due to the lack of a comprehensive persona-specific Commonsense Knowledge Graph, some indirect and implicit persona inconsistencies between machine responses can hardly be identified. In this paper, we build the first persona commonsense knowledge graph (PersonaKG), based on which we then construct a large-scale persona consistency dialogue dataset (PersonaCOM) containing both explicit and implicit persona conflicts between machine responses. With the guidance of the persona commonsense knowledge, we propose a Recognize-Rewrite framework (R2) which first recognizes the responses that are inconsistent in persona with the previous responses, and then rewrites them into consistent ones. The empirical study demonstrates that utilizing the R2 method on PersonaCOM with PersonaKG results in a significant improvement of 12.20% in automatic metrics and 10.09% in manual evaluation compared to not using the R2 method and PersonaKG.
On-device recommendation is critical for a number of real-world applications, especially in scenarios with strict requirements on execution latency, user privacy, and robust functionality when internet connectivity is unstable or unavailable. While large language models (LLMs) can now provide exceptional capabilities that model user behavior for sequential recommendation tasks, their substantial memory footprint and computational overhead make deployment on resource-constrained devices a high-risk proposition. In this paper, we propose OD-LLM, the first task-adaptive compression framework explicitly designed to provide efficient and accurate on-device deployment of LLMs for sequential recommendation tasks. OD-LLM uniquely integrates two complementary compression strategies: a low-rank structural compression algorithm which uses Singular Value Decomposition (SVD) to significantly reduce parameter redundancy in the model, and a novel tokenization normalization technique that better complements the low-rank decomposition process being used. Additionally, to minimize any potential performance degradation when using higher compression ratios, a novel progressive alignment algorithm iteratively refines the target model's parameters layer by layer. Empirical evaluations conducted on sequential recommendation benchmarks show that OD-LLM exhibits no loss in effectiveness compared to the original recommendation model even when the deployed model size is halved. These promising results demonstrate the efficacy and scalability of OD-LLM, making this novel solution a practical alternative for real-time, on-device solutions wishing to replace expensive, remotely executed LLMs.
Hypergraph-based distillation methods have been proposed to mitigate the high computational cost of Hypergraph Neural Networks (HGNNs) in modeling high-order relationships. However, most existing methods use static and uniform distillation strategies for all nodes and hyperedges, ignoring their individual characteristics. In addition, they neglect the student model's capability to independently extract useful internal features. As a result, they are not effective in transferring higher-order structural knowledge from the teacher. To overcome these limitations, we propose ARCHER, an Adaptive and Reinforcement-Guided Contrastive HypER graph Distillation framework that enables a lightweight MLP student model to outperform its HGNN teacher model. First, we design an adaptive strategy that leverages node- and hyperedge-level confidence to mediate error guidance from the teacher model. Second, we introduce a contrastive learning module that guides the student to learn from both the teacher's outputs and its own internal representations, producing more expressive embeddings. Finally, we propose a multi-armed bandit-based reinforcement learning module that dynamically balances multiple loss objectives during training. Experiments on six benchmark datasets demonstrate that our method outperforms existing hypergraph distillation methods.
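A multi-armed bandit that dynamically balances several loss objectives, as the reinforcement module above does, can be illustrated with the standard UCB1 algorithm: each "arm" is a loss objective, and the reward is some training-progress signal. The reward function, arm count, and horizon below are synthetic assumptions, not ARCHER's actual design.

```python
import numpy as np

def ucb1(reward_fn, n_arms, horizon, seed=0):
    """UCB1: pull the arm with the highest empirical mean plus an
    exploration bonus that shrinks as an arm is sampled more often."""
    rng = np.random.default_rng(seed)
    counts = np.zeros(n_arms)
    sums = np.zeros(n_arms)
    for t in range(horizon):
        if t < n_arms:
            arm = t                                   # initialize every arm once
        else:
            bonus = np.sqrt(2.0 * np.log(t) / counts)
            arm = int(np.argmax(sums / counts + bonus))
        counts[arm] += 1
        sums[arm] += reward_fn(arm, rng)
    return counts

# Arm 1 (say, emphasizing the contrastive loss) yields the highest average
# reward, so the bandit concentrates its pulls on it over time.
pulls = ucb1(lambda a, rng: rng.normal(0.8 if a == 1 else 0.2, 0.1),
             n_arms=3, horizon=500)
```

In a training loop, pulling an arm would mean upweighting that loss for the next step and observing, e.g., the resulting validation improvement as the reward.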
Recommender systems traditionally represent items using unique identifiers (ItemIDs), but this approach struggles with large, dynamic item corpora and sparse long-tail data, limiting scalability and generalization. Semantic IDs, derived from multimodal content such as text and images, offer a promising alternative by mapping items into a shared semantic space, enabling knowledge transfer and improving recommendations for new or rare items. However, existing methods face two key challenges: (1) balancing cross-modal synergy with modality-specific uniqueness, and (2) bridging the semantic-behavioral gap, where semantic representations may misalign with actual user preferences. To address these challenges, we propose Multimodal Mixture-of-Quantization (MMQ), a two-stage framework that trains a novel multimodal tokenizer. First, a shared-specific tokenizer leverages a multi-expert architecture with modality-specific and modality-shared experts, using orthogonal regularization to capture comprehensive multimodal information. Second, behavior-aware fine-tuning dynamically adapts semantic IDs to downstream recommendation objectives while preserving modality information through a multimodal reconstruction loss. Extensive offline experiments and online A/B tests demonstrate that MMQ effectively unifies multimodal synergy, specificity, and behavioral adaptation, providing a scalable and versatile solution for both generative retrieval and discriminative ranking tasks.
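The orthogonal regularization mentioned for the shared-specific tokenizer is typically a penalty on the cross-correlation between two representation matrices, pushing modality-shared and modality-specific experts to encode different information. The sketch below shows one common form of such a penalty on synthetic matrices; it is a generic illustration, not MMQ's exact loss.

```python
import numpy as np

def orth_penalty(H_shared, H_spec):
    """Squared Frobenius norm of the cross-correlation between two
    representation matrices; zero when their column spaces are orthogonal."""
    return float(np.linalg.norm(H_shared.T @ H_spec) ** 2)

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((16, 8)))   # orthonormal columns
pen_orth = orth_penalty(Q[:, :4], Q[:, 4:])          # orthogonal -> ~0
H = rng.standard_normal((16, 4))
pen_same = orth_penalty(H, H)                        # fully redundant -> large
```

Adding `pen` terms like this to the training objective discourages the shared and specific experts from collapsing onto the same subspace.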
Multi-scenario multi-task recommendation (MSMTR) systems must address recommendation demands across diverse scenarios while simultaneously optimizing multiple objectives, such as click-through rate and conversion rate. Existing MSMTR models typically consist of four information units: scenario-shared, scenario-specific, task-shared, and task-specific networks. These units interact to generate four types of relationship information flows, directed from scenario-shared or scenario-specific networks to task-shared or task-specific networks. However, these models face two main limitations: 1) They often rely on complex architectures, such as mixture-of-experts (MoE) networks, which increase the complexity of information fusion, model size, and training cost. 2) They extract all available information flows without filtering out irrelevant or even harmful content, introducing potential noise. To address these challenges, we propose a lightweight Automated Information Flow Selection (AutoIFS) framework for MSMTR. To tackle the first issue, AutoIFS incorporates low-rank adaptation (LoRA) to decouple the four information units, enabling more flexible and efficient information fusion with minimal parameter overhead. To address the second issue, AutoIFS introduces an information flow selection network that automatically filters out invalid scenario-task information flows based on model performance feedback. It employs a simple yet effective pruning function to eliminate useless information flows, thereby enhancing the impact of key relationships and improving model performance. Finally, we evaluate AutoIFS and confirm its effectiveness through extensive experiments on two public benchmark datasets and an online A/B test.
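The LoRA mechanism credited above with minimal parameter overhead has a compact core: a frozen base weight plus a trainable low-rank update B @ A, where B is zero-initialized so the update starts as a no-op. The dimensions and rank below are illustrative assumptions, not AutoIFS's configuration.

```python
import numpy as np

# d-dimensional layer with a rank-r adapter; only A and B would train,
# while W0 stays frozen and can be shared across information units.
d, r = 32, 4
rng = np.random.default_rng(0)
W0 = rng.standard_normal((d, d))        # frozen shared weight
A = rng.standard_normal((r, d)) * 0.01  # trainable down-projection
B = np.zeros((d, r))                    # trainable up-projection, zero-init
x = rng.standard_normal(d)

y = (W0 + B @ A) @ x                    # adapted forward pass
extra_params = A.size + B.size          # 2*d*r per decoupled unit
```

With one such adapter per information flow, pruning a flow amounts to dropping its 2·d·r adapter parameters rather than an entire expert network, which is what keeps the framework lightweight.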
Online A/B testing, the gold standard for evaluating new advertising policies, consumes substantial engineering resources and risks significant revenue loss from deploying underperforming variations. This motivates the use of Off-Policy Evaluation (OPE) for rapid, offline assessment. However, applying OPE to ad auctions is fundamentally more challenging than in domains like recommender systems, where stochastic policies are common. In online ad auctions, it is common for the highest-bidding ad to win the impression, resulting in a deterministic, winner-takes-all setting. This results in zero probability of exposure for non-winning ads, rendering standard OPE estimators inapplicable. We introduce the first principled framework for OPE in deterministic auctions by repurposing the bid landscape model to approximate the propensity score. This model allows us to derive robust approximate propensity scores, enabling the use of stable estimators like Self-Normalized Inverse Propensity Scoring (SNIPS) for counterfactual evaluation. We validate our approach on the AuctionNet simulation benchmark and against a two-week online A/B test from a large-scale industrial platform. Our method shows remarkable alignment with online results, achieving a 92% Mean Directional Accuracy (MDA) in CTR prediction, significantly outperforming the parametric baseline. MDA is the most critical metric for guiding deployment decisions, as it reflects the ability to correctly predict whether a new model will improve or harm performance. This work contributes the first practical and validated framework for reliable OPE in deterministic auction environments, offering an efficient alternative to costly and risky online experiments.
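The SNIPS estimator named above has a standard closed form: importance weights w_i = π_new(a_i|x_i) / π_log(a_i|x_i) reweight logged rewards, and self-normalization divides by the sum of the weights rather than the sample count, which trades a small bias for much lower variance. The logged rewards and propensities below are synthetic; in the paper's setting the logging propensities would come from the bid landscape model rather than a stochastic logger.

```python
import numpy as np

def snips(rewards, p_new, p_log):
    """Self-Normalized Inverse Propensity Scoring estimate of the
    target policy's expected reward from logged data."""
    w = p_new / p_log                   # (approximate) propensity ratios
    return np.sum(w * rewards) / np.sum(w)

rewards = np.array([1.0, 0.0, 1.0, 0.0, 1.0])   # e.g., logged clicks
p_log = np.array([0.50, 0.50, 0.25, 0.25, 0.50])  # logging propensities
p_new = np.array([0.25, 0.50, 0.50, 0.25, 0.50])  # target-policy propensities
est = snips(rewards, p_new, p_log)
```

Note that the estimate stays inside the observed reward range by construction, which is the stability property that makes SNIPS attractive when the propensities are themselves approximations.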
While political biases in Large Language Models (LLMs) have been consistently reported, the underlying question of whether such biases stem from systematic distributions in pretraining data remains insufficiently examined. Existing work has largely focused on post-hoc bias mitigation, with limited content-level analysis in large-scale web corpora. This study presents the first comprehensive statistical analysis of political bias in C4, a widely used large-scale web corpus that underpins major LLMs such as T5, PaLM, and LaMDA. We analyze 15 politically sensitive topics using a multi-perspective, persona-based LLM annotation framework, combined with rigorous statistical validation through equivalence testing and multiple testing correction. Our findings reveal clear patterns of left-leaning and supportive bias in C4 across two dimensions: political orientation and topic-specific stance. In particular, strong progressive leanings are observed in social and cultural domains (e.g., gender equality, LGBTQ rights, abortion rights), while economic topics exhibit relatively balanced distributions. To explore data-to-model bias transfer, we conduct experiments examining correlations between corpus-level bias and LLM responses, and evaluate the effect of fine-tuning on model behavior. The results provide evidence that political bias in web corpora can propagate into model outputs. These findings highlight the importance of dataset-level bias analysis in understanding and mitigating political bias in LLMs. We argue that responsible AI development must incorporate systematic data curation practices. All source code and scripts used in this study are publicly available at: https://anonymous.4open.science/r/C4_analysis-D7C2.
The attribution technique enhances the credibility of LLMs by adding citations to the generated sentences, enabling users to trace back to the original sources and verify the reliability of the output. However, existing instruction-tuned attributed LLMs often fail to properly interpret the contextual semantics of citation symbols (e.g., [i]) during text generation. This shortcoming arises from their insufficient awareness of the context information surrounding citation markers, which in turn leads to disjointed references and poor integration of retrieved knowledge into the generated content. To address this issue, we propose a novel Contextual-aware Citation generation framework (C²-Cite) that explicitly integrates the semantic relationships between citation markers and their referenced content. Specifically, a contextual citation alignment mechanism is adopted: it first encodes the retrieved document contexts into the symbol representation of citations, then aligns the marker numbers by decoding information from a citation router function. This mechanism enables the transformation of citation markers from generic placeholders into active knowledge pointers that link to the referenced source information. Experimental results on the ALCE benchmark across three datasets validate our framework C²-Cite: it outperforms the SOTA baseline by an average of 5.8% in citation quality and 17.4% in response correctness. The implementation is publicly available at https://github.com/BAI-LAB/c2cite
Graph Transformers (GTs) have emerged as a promising graph learning tool, leveraging their all-pair connected property to effectively capture global information. Global attention was originally introduced to address the over-smoothing problem in deep GNNs, eliminating the need to stack deep GNN layers. However, through empirical and theoretical analysis, we verify that the introduced global attention exhibits severe over-smoothing, causing node representations to become indistinguishable due to its inherent low-pass filtering. This effect is even stronger than that observed in GNNs. To mitigate this, we propose PageRank Transformer (ParaFormer), which features a PageRank-enhanced attention module designed to mimic the behavior of deep Transformers. We theoretically and empirically demonstrate that ParaFormer mitigates over-smoothing by functioning as an adaptive-pass filter. Experiments show that ParaFormer achieves consistent performance improvements across both node classification and graph classification tasks on 11 datasets ranging from thousands to millions of nodes, validating its efficacy. The supplementary material, including code and appendix, can be found at https://github.com/chaohaoyuan/ParaFormer.
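One standard way to give an attention matrix PageRank-like behavior, in the spirit of the module described above, is the APPNP-style iteration Z ← (1-α)·Attn·Z + α·X, which converges to a personalized-PageRank smoothing of the features while the restart term α·X keeps each node anchored to its own representation (counteracting the pure low-pass collapse). This is a generic illustration of the idea, not ParaFormer's exact module; the attention matrix and features are toys.

```python
import numpy as np

def ppr_propagate(attn, X, alpha=0.15, k=50):
    """Iterate Z <- (1-alpha)*attn@Z + alpha*X, converging to the
    personalized-PageRank smoothing alpha*(I - (1-alpha)*attn)^(-1) @ X."""
    Z = X.copy()
    for _ in range(k):
        Z = (1 - alpha) * attn @ Z + alpha * X
    return Z

n = 4
attn = np.full((n, n), 1.0 / n)          # row-stochastic toy attention
X = np.eye(n)                            # one-hot node features
Z = ppr_propagate(attn, X, k=200)
closed = 0.15 * np.linalg.inv(np.eye(n) - 0.85 * attn) @ X
```

Even under this maximally smoothing (uniform) attention, the restart term leaves each node's representation distinguishable, which pure repeated attention would not.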
Effective design of traffic network layouts is crucial for large urban areas to manage traffic congestion, emergency vehicle routing, and balanced economic development. However, the current decision-making procedure for traffic network layout design heavily relies on past experiences of domain experts, which is highly likely to be sub-optimal due to the large search space and heterogeneity across different cities. Thanks to the recent advances in high-fidelity traffic simulators, one can test various layouts in a simulated environment instead of the physical world and select the best-performing layout among them. However, large-scale traffic simulation is time-consuming, and the selected layout may not be directly deployable due to various regulations and real-world constraints. While several approaches have been proposed to solve the traffic network layout optimization problem via mathematical programming or metaheuristics such as genetic algorithms, these methods rely on unrealistic assumptions and struggle to find effective layouts in high-dimensional spaces. To this end, we propose a novel framework that leverages diffusion models to design effective traffic network layouts given a traffic pattern. Specifically, our framework consists of four stages: (1) Collect a dataset of randomly generated layouts, (2) Train a discrete diffusion model and a prediction model, (3) Sample from the diffusion model with guidance from the prediction model, (4) Evaluate the sampled layouts and update the dataset. By repeating this process, we accelerate traffic network layout design. We conduct extensive experiments on synthetic grid layouts and two real-world traffic scenarios, Manhattan and Monaco City, to validate the superiority of our method compared to other baselines. Our code is publicly available here.
The recently introduced task of Conversational Entity Retrieval from a Knowledge Graph (CER-KG) presents unique research challenges due to the complexity of the queries along with the necessity to consider KG structure and the context of an information-seeking dialog. This paper proposes a novel approach to CER-KG that first constructs a sub-graph around each candidate response entity, which includes its neighboring KG components, such as other entities, literals, categories and predicates, and then scores and ranks each candidate answer entity with Diverse Relevance signal Aggregation via Graph cONvolution (DRAGON), a novel learning-to-rank neural architecture for CER-KG. Unlike previous approaches to CER-KG, DRAGON directly takes a large number of fine-grained relevance signals as input and learns to effectively aggregate and transform those signals into the ranking scores of candidate response entities. In particular, a set of sparse and structured vectors of relevance features used as input to DRAGON measure lexical and semantic similarity between a query in the current turn or responses from the past turns of an information-seeking dialog and each node in the candidate response entity's sub-graph. DRAGON then propagates the relevance signals in feature vectors around the sub-graph using graph convolution layers and aggregates those signals into the candidate response entity ranking score with multi-head attention and fully-connected layers. This design enables DRAGON to attenuate noisy relevance signals from the local KG neighborhood during propagation and attend to the signals from the most important nodes in the candidate entity sub-graph. Our results demonstrate that DRAGON yields significant gains in retrieval accuracy over the previously proposed approach for CER-KG and performs comparably to a much larger fine-tuned cross-encoder architecture.
Machine unlearning offers a practical technical means for fulfilling users' requests to remove personally identifiable information (PII) under ''right to be forgotten'' regulations such as GDPR and COPPA. Traditionally, unlearning is performed with the removal of entire data samples (sample unlearning) or whole features across the dataset (feature unlearning). However, when the removal request targets only certain parts of the PII, such as specific objects within a sample, these traditional unlearning approaches fall short of meeting such finer-grained unlearning requirements. To address this gap, we propose a scene graph-based object unlearning framework. This framework utilizes scene graphs, which are rich in semantic representation, to transparently translate unlearning requests into actionable steps. The result is the preservation of the overall semantic integrity of the generated image, bar the unlearned object. Furthermore, we develop three distinct approaches for object unlearning, grounded in the mainstream unlearning techniques of fine-tuning and model redaction. For validation, we evaluate the unlearned object's fidelity in outputs under the tasks of image reconstruction and image synthesis. Our proposed framework demonstrates improved object unlearning outcomes, with the preservation of unrequested samples, in contrast to sample and feature unlearning methods. This work addresses critical privacy issues by increasing the granularity of targeted machine unlearning through forgetting specific object-level details without sacrificing the utility of the whole data sample or dataset feature.
With the rapid advancement of e-commerce, exploring general representations rather than task-specific ones has attracted increasing research attention. For product understanding, although existing discriminative dual-flow architectures drive progress in this field, they inherently struggle to model the many-to-one alignment between multiple images and texts of products. Therefore, we argue that generative Multimodal Large Language Models (MLLMs) hold significant potential for improving product representation learning. Nevertheless, achieving this goal still remains non-trivial due to several key challenges: the lack of multimodal and aspect-aware modeling modules in typical LLMs; the common presence of background noise in product images; and the absence of a standard benchmark for evaluation. To address these issues, we propose the first generative MLLM-based model named MOON for product representation learning. Our method (1) employs a guided Mixture-of-Experts (MoE) module for targeted modeling of multimodal and aspect-specific product content; (2) effectively detects core semantic regions in product images to mitigate the distraction and interference caused by background noise; and (3) introduces a specialized negative sampling strategy to increase the difficulty and diversity of negative samples. In addition, we release a large-scale multimodal benchmark MBE for various product understanding tasks. Experimentally, our model demonstrates competitive zero-shot performance on both our benchmark and the public dataset, showcasing strong generalization across various downstream tasks, including cross-modal retrieval, product classification, and attribute prediction. Furthermore, the case study and visualization illustrate the effectiveness of MOON for product understanding.
Graph unlearning methods aim to efficiently remove the impact of sensitive data from trained GNNs without full retraining, assuming that deleted information cannot be recovered. In this work, we challenge this assumption by introducing the graph unlearning inversion attack: given only black-box access to an unlearned GNN and partial graph knowledge, can an adversary reconstruct the removed edges? We identify two key challenges: varying probability-similarity thresholds for unlearned versus retained edges, and the difficulty of locating unlearned edge endpoints, and address them with TrendAttack. First, we derive and exploit the confidence pitfall, a theoretical and empirical pattern showing that nodes adjacent to unlearned edges exhibit a large drop in model confidence. Second, we design an adaptive prediction mechanism that applies different similarity thresholds to unlearned and other membership edges. Our framework flexibly integrates existing membership inference techniques and extends them with trend features. Experiments on four real-world datasets demonstrate that TrendAttack significantly outperforms state-of-the-art GNN membership inference baselines, exposing a critical privacy vulnerability in current graph unlearning methods. For additional implementation details and technical proofs, please refer to our supplementary materials.
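The "confidence pitfall" trend feature described above reduces, at its simplest, to comparing each node's predictive confidence before and after unlearning and flagging nodes with large drops as likely endpoints of removed edges. The confidences and threshold below are entirely synthetic, fabricated only to illustrate the shape of the signal, not TrendAttack's actual computation.

```python
import numpy as np

# Synthetic per-node confidences queried from the model before and after
# unlearning; here nodes 1 and 3 are the endpoints of a removed edge and
# show the characteristic large confidence drop.
conf_before = np.array([0.95, 0.90, 0.92, 0.88])
conf_after  = np.array([0.94, 0.61, 0.91, 0.55])
drop = conf_before - conf_after
suspects = np.flatnonzero(drop > 0.2)   # candidate unlearned-edge endpoints
```

In the full attack, such trend features would be combined with similarity-based membership-inference scores, with different thresholds applied to suspected-unlearned versus other candidate edges.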
Large language models (LLMs) are increasingly deployed as intelligent agents capable of executing complex real-world tasks through external tool interactions, but effective tool selection remains challenging due to the inherent limitations of real-world training data. These datasets suffer from severe tool imbalance following long-tail distributions, data scarcity for specialized tools, logic conflicts between user queries and available tools, rapidly evolving toolsets, and the presence of subpar samples including partially correct and dirty examples. Existing supervised fine-tuning (SFT) approaches struggle with these multifaceted challenges as they require abundant high-quality data, treat all labeled examples as ground truth regardless of quality, and lack the flexibility to generalize beyond specific query-tool pairings seen during training. While reinforcement learning (RL) offers a promising alternative through outcome-based learning, vanilla approaches like Group Relative Policy Optimization (GRPO) suffer from training instability due to conflicting reward signals and inefficient learning from weak signals. To address these issues, we propose TOOL-CURE, a novel method with two key improvements to GRPO: Proficiency-Scaled Curriculum Learning (PSCL), which organizes training into a two-stage curriculum that builds foundational skills on easier samples before progressing to harder ones, and Online Policy Guarding via Sample Screening (OPGSS), which continuously assesses rollout quality and masks dirty samples to prevent noisy gradients from destabilizing policy updates. Our approach enables stable and efficient learning from heterogeneous real-world data, resulting in a robust tool-selection agent that demonstrates significant improvements in accuracy and generalization capability. Our code is available at: https://github.com/einnullnull/TOOL-CURE.git.
When fact-checking methods based on large language models (LLMs) use external evidence to validate claims, knowledge conflicts often arise. These conflicts typically stem from inconsistencies between the external evidence and LLMs' internal pre-existing knowledge. Such inconsistencies can lead LLMs to produce incorrect answers when validating claims, especially when they are overly confident in their internal incorrect knowledge. Previous works on LLM-based fact-checking have overlooked this issue. This paper, for the first time, proposes a framework (namely KnowFC) to navigate this issue. Our key insight is dividing and adaptively utilizing the knowledge that LLMs know and do not know, thereby avoiding conflicts while enhancing the correctness and efficiency of fact-checking. Specifically, in KnowFC, we propose an adaptive retrieval method, where we train an LLM using a reinforcement learning algorithm coupled with a reward mechanism inspired by the Dunning-Kruger effect to identify its knowledge boundaries through confidence calibration, thereby realizing adaptive evidence retrieval. In addition, we propose a reliable and debiased fact verification method, where we organize and construct reasoning graphs using retrieved evidence to verify claims, followed by a causal intervention method using causal mediation analysis to mitigate internal knowledge interference. Experimental results on both the FEVEROUS and AVeriTeC datasets show that our method outperforms baseline methods in terms of accuracy and F1 score, while also improving fact-checking efficiency.
Recent research aims to improve cross-lingual transfer learning in low-resource scenarios by optimizing the internal representations of multilingual models. However, previous methods typically rely on large-scale parallel corpora and overlook the subtle semantic differences between languages, leading to excessive clustering or collapse of local semantic units in the embedding space. This weakens the model's generalization ability on unseen data and its robustness under noisy conditions. To address this, we propose XLingLearn, a collaborative optimization framework that enhances cross-lingual transfer by expanding robust embedding regions and refining multilingual embedding space. Specifically, we design a distance-dispersion constraint strategy to push overly clustered semantic units apart to prevent semantic collapse, and introduce a direction-consistency constraint to prevent semantic bias. Additionally, we introduce an attention consistency module to stabilize the robust region. Finally, to mitigate the embedding space and context space mismatch caused by data augmentation, we introduce a debiasing-optimization regularization term, which enhances transfer efficiency and stability. Experimental results show that XLingLearn improves cross-lingual transfer performance across 17 target languages in the XNLI and PAWS-X tasks, enhancing generalization and robustness under low-resource and non-parallel conditions.
Code vulnerability detection aims to identify potential risks in code to prevent malicious attacks. Previous sequence-based models mostly focus on capturing local syntactic patterns, but they fail to model the non-linear structural dependencies and intricate context found in real-world code. Recent graph-based methods integrate structural information such as abstract syntax trees (AST) and control flow graphs (CFG). However, they often rely on shallow features of code and struggle to capture rich contextual semantic relations, which limits the effectiveness of detection. To address the issues above, we propose a novel framework, named Vul-GRT. We first design an unsupervised joint feature extraction method, which integrates Graph Attention Networks (GAT) and Restricted Boltzmann Machines (RBM) with joint training to capture structural information and further mine latent features from code graphs. Then, we leverage code language models to incorporate node-level semantic information, which is fused with structural features to obtain comprehensive node representations. Moreover, we introduce a hybrid loss function that integrates both vulnerability detection and path tracking objectives, which enables the model to focus on path context during vulnerability localization, enhancing the traceability of identified vulnerabilities. In the training step, we utilize a Graph Transformer to model long-range dependencies and further capture critical relationships across the entire code graph. Finally, we conduct extensive experiments on two benchmark datasets, NVD and SARD, and experimental results demonstrate the effectiveness and robustness of our proposed framework compared with existing state-of-the-art methods.
Graph Neural Networks (GNNs) have shown remarkable capability in modeling complex relational data, yet their effectiveness is constrained by subtle but critical limitations in the message-passing paradigm. We identify a fundamental issue -- termed aggregation collapse -- arising from the aggregation step that combines a central node's representation with those of its neighbors, whose number varies across nodes. This bottleneck manifests in two forms of inherent information loss: (i) feature information loss, occurring when value distributions in learned representations become concentrated and indistinguishable after aggregation, and (ii) structure information loss, where variable node degrees are obscured by aggregation functions. To address these challenges, we propose complementary solutions: (i) a nonlinear feature mapping and distribution re-scaler, which diversify input feature distributions prior to aggregation to preserve valuable feature-level details, and (ii) a post-aggregation structural encoding, which retains essential structure information. Our methods are theoretically grounded, easy to implement, and broadly applicable to existing GNN models. Empirical evaluations demonstrate that our techniques successfully alleviate aggregation collapse, leading to improved performance and scalability in GNNs.
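The structure-loss half of this argument can be made concrete with a minimal sketch (toy features and a degree-appending encoding invented here for illustration, not the paper's method): mean aggregation maps nodes with different degrees to identical vectors, and a post-aggregation structural encoding restores the distinction.

```python
# Toy illustration (not the paper's implementation): mean aggregation
# discards node degree, one form of the "structure information loss"
# described in the abstract.

def mean_aggregate(neighbor_feats):
    """Average neighbor feature vectors (a common GNN aggregator)."""
    dim = len(neighbor_feats[0])
    n = len(neighbor_feats)
    return [sum(f[i] for f in neighbor_feats) / n for i in range(dim)]

# Node A has 2 neighbors, node B has 4 -- different local structure...
feats_a = [[1.0, 0.0], [0.0, 1.0]]
feats_b = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]

agg_a = mean_aggregate(feats_a)
agg_b = mean_aggregate(feats_b)
assert agg_a == agg_b  # ...yet the aggregated features are identical.

# A post-aggregation structural encoding (here, simply appending the
# degree) restores the distinction, in the spirit of the proposed fix.
enc_a = agg_a + [len(feats_a)]
enc_b = agg_b + [len(feats_b)]
assert enc_a != enc_b
```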
The swift proliferation of false information on social media presents significant risks to public confidence and the stability of democracy, which motivates the development of machine learning and deep learning-based fake news detectors. While these systems can effectively analyze news content and user interactions, they remain vulnerable to adversarial attacks. Prior research has focused mainly on modifying article text or retrieving generic user comments, leaving comment-based attack strategies underexplored. Existing comment-based attacks often rely on generating synthetic text that can be unrealistic, or on retrieving existing comments without strategic guidance, ignoring feature importance. In this work, we introduce a novel attack surface that combines model interpretability with generative language models. Our approach uses SHAP (SHapley Additive exPlanations) to identify influential tokens driving fake or real classifications and prompts a large language model (LLM) to generate contextually credible, human-like comments that utilize those influential tokens. The generated comments are then appended to the article, evaluated against multiple state-of-the-art detectors (dEFEND, TextCNN, and RoBERTa), and compared against existing comment-based attacks such as MALCOM, CopyCat, and retrieval-based methods. Our XAI-guided, LLM-based approach is competitive with existing generative and retrieval-based attack methods, achieving higher attack success rates while maintaining naturalness and contextual relevance.
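The first stage of such an attack (finding the tokens that drive the detector's decision) can be sketched as follows. Everything here is an invented toy: the hand-built scorer stands in for a trained detector, and single-token occlusion stands in for SHAP, which computes proper Shapley values over the real model.

```python
import math

# Toy sketch of XAI-guided token selection. WEIGHTS and the articles are
# illustrative assumptions; SHAP over a trained detector is approximated
# by crude single-token occlusion on a hand-built scorer.

WEIGHTS = {"shocking": 0.9, "secret": 0.8, "cure": 0.7,
           "reuters": -0.8, "official": -0.6, "study": -0.4}

def fake_prob(tokens):
    """Pretend detector: probability-like score that the article is fake."""
    return math.tanh(sum(WEIGHTS.get(t, 0.0) for t in tokens))

def token_importance(tokens):
    """Occlusion attribution: drop each token and measure the score change."""
    base = fake_prob(tokens)
    return {t: base - fake_prob([x for x in tokens if x != t])
            for t in set(tokens)}

article = "shocking secret cure doctors hate".split()
imp = token_importance(article)

# The most influential tokens would then seed the LLM prompt that
# generates credible comments reusing them.
top = sorted(imp, key=imp.get, reverse=True)[:2]
assert top == ["shocking", "secret"]
```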
We study active correlation clustering where pairwise similarities are not provided upfront and must be queried in a cost-efficient manner through active learning. Specifically, we focus on the cold-start scenario, where no true initial pairwise similarities are available for active learning. To address this challenge, we propose a coverage-aware method that encourages diversity early in the process. We demonstrate the effectiveness of our approach through several synthetic and real-world experiments.
Modeling user purchase behavior is a critical challenge in display advertising systems, necessary for real-time bidding. The difficulty arises from the sparsity of positive user events and the stochasticity of user actions, leading to severe class imbalance and irregular event timing. Predictive systems usually rely on hand-crafted "counter" features, overlooking the fine-grained temporal evolution of user intent. Meanwhile, current sequential models extract direct sequential signals, missing useful event-counting statistics. We enhance deep sequential models with self-supervised pretraining strategies for display advertising. In particular, we introduce Abacus, a novel approach that predicts the empirical frequency distribution of user events. We further propose a hybrid objective unifying Abacus with sequential learning objectives, combining the stability of aggregated statistics with the sensitivity of sequence modeling. Experiments on two real-world datasets show that Abacus pretraining outperforms existing methods while accelerating downstream task convergence, and that the hybrid approach yields up to +6.1% AUC compared to the baselines.
Katz centrality is an important metric for measuring the relative importance of nodes in a graph, commonly used in social networks. The measure is prohibitively expensive to compute exactly, as exact computation requires up to cubic time. Although several approximation methods exist, most fail to exploit parallelism, limiting their scalability to large graphs. To address this limitation, we propose two highly parallelizable Monte Carlo algorithms based on random walks, named MC-Katz and MC-KatzP, which approximate the truncated Katz centrality and its original form, respectively. Empirical results on nine real-world datasets show that they achieve low mean relative error and high accuracy in ranking the most influential nodes in the networks, while delivering two to three orders of magnitude speedups over the baselines.
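A textbook-style random-walk estimator for truncated Katz centrality can illustrate the general idea (the graph, parameters, and estimator here are generic assumptions, not the authors' MC-Katz or MC-KatzP designs): a walk started at node i continues from node v with probability alpha times deg(v), and the expected capped walk length equals the truncated Katz score of i.

```python
import random

# Illustrative sketch only: a generic Monte Carlo estimator for truncated
# Katz centrality via random walks. Requires alpha * max_degree < 1.

def katz_exact(adj, alpha, kmax):
    """Truncated Katz: c_i = sum_{k=1..kmax} alpha^k * (#length-k walks from i)."""
    n = len(adj)
    walks = [1.0] * n            # (A^0 * 1): one empty walk per node
    scores = [0.0] * n
    a = 1.0
    for _ in range(kmax):
        a *= alpha
        walks = [sum(walks[v] for v in adj[u]) for u in range(n)]  # A^k * 1
        scores = [s + a * w for s, w in zip(scores, walks)]
    return scores

def katz_mc(adj, alpha, kmax, runs, rng):
    """Continue a walk at v with prob alpha*deg(v), stepping to a uniform
    neighbor; the mean capped walk length is an unbiased Katz estimate."""
    est = []
    for i in range(len(adj)):
        total = 0
        for _ in range(runs):
            cur = i
            for _ in range(kmax):
                if rng.random() >= alpha * len(adj[cur]):
                    break
                cur = rng.choice(adj[cur])
                total += 1
        est.append(total / runs)
    return est

# Star graph: node 0 is the hub, so it should rank first under both methods.
adj = [[1, 2, 3, 4], [0], [0], [0], [0]]
exact = katz_exact(adj, alpha=0.2, kmax=20)
approx = katz_mc(adj, alpha=0.2, kmax=20, runs=3000, rng=random.Random(7))
assert max(range(5), key=exact.__getitem__) == 0
assert max(range(5), key=approx.__getitem__) == 0
```

The per-node walks are independent, which is what makes this family of estimators easy to parallelize.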
LLM-based relevance judgment generation has become a crucial approach in advancing evaluation methodologies in Information Retrieval (IR). It has progressed significantly, often showing high correlation with human judgments as reflected in LLMJudge leaderboards. However, existing methods for relevance judgment rely heavily on sensitive prompting strategies and lack standardized workflows for generating reliable labels. To fill this gap, we reintroduce our method, Task-aware Rubric-based Evaluation (TRUE), for relevance judgment generation. Originally developed for usefulness evaluation in search sessions, TRUE is extended here to relevance judgment, given its demonstrated effectiveness and reproducible workflow. This framework leverages iterative data sampling and reasoning to evaluate relevance judgments across multiple factors, including intent, coverage, specificity, accuracy, and usefulness. In this paper, we evaluate TRUE on the TREC DL 2019, 2020 and LLMJudge datasets, and our results show that TRUE achieves strong performance on the system-ranking LLM leaderboards. The primary focus of this work is to provide a reproducible framework for LLM-based relevance judgments, and we further analyze the effectiveness of TRUE across multiple dimensions.
Large Language Models (LLMs) have empowered AI agents with advanced capabilities for understanding, reasoning, and interacting across diverse tasks. The addition of memory further enhances them by enabling continuity across interactions, learning from past experiences, and improving the relevance of actions and responses over time, termed memory-enhanced personalization. Although such personalization through memory offers clear benefits, it also introduces risks of bias. While several previous studies have highlighted bias in ML and LLMs, bias due to memory-enhanced personalized agents is largely unexplored. Using recruitment as an example use case, we simulate the behavior of a memory-enhanced personalized agent, and study whether and how bias is introduced and amplified in and across various stages of operation. Our experiments on agents using safety-trained LLMs reveal that bias is systematically introduced and reinforced through personalization, emphasizing the need for additional protective measures or agent guardrails in memory-enhanced LLM-based AI agents.
Recent advances in large language models (LLMs), such as GPT and Llama, have driven significant progress in natural language processing and diverse AI applications. In this paper, we explore how LLMs can enhance the construction of heterogeneous citation networks by integrating rich contextual information derived from LLMs. We propose a metadata-driven augmentation that generates concise factual descriptions for sparse fields in citation metadata, including keywords, venues, and author affiliations. These contexts are encoded with DeBERTa and integrated as node features in a knowledge-enriched heterogeneous network. Additionally, to mitigate LLM hallucinations, we employ Chain-of-Thought (CoT)-based prompting and evaluate the quality of the generated context. Experimental results demonstrate that our LLM-powered context augmentation improves author classification by 2.0%-4.5% and author clustering by 8.9%-18.1%, outperforming traditional feature engineering methods. The dataset and source code are available at https://github.com/inthwan/Metadata-Meets-LLMs.
In cross-domain recommendation, the cold-start recommendation problem often arises in scenarios where users have interacted with items in a source domain but not in a target domain. A key challenge in this cross-domain recommendation setting is how to effectively transfer user preferences from the source domain to the target domain. Most existing transfer learning models address this challenge but typically require extensive computations and incremental operations, which limit their scalability and efficiency. To overcome these limitations, we propose a novel similarity-based framework, called Similarity-based Transfer Graph Convolution Network (SimTranGCN), designed specifically for cold-start users. Our approach combines item-KNN, deep learning, and graph convolutional models such as LightGCN. SimTranGCN first constructs a similarity matrix across domains, and then uses this matrix to infer user preferences in the target domain based on their interactions in the source domain. Empirical experiments demonstrate that SimTranGCN is highly competitive against existing methods, achieving state-of-the-art performance on two paired domain transfer tasks.
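The similarity-based transfer idea above can be sketched in a few lines (the item names, embeddings, and cosine-plus-weighted-sum scoring below are all illustrative assumptions, not the SimTranGCN implementation): build a cross-domain similarity matrix between source and target items, then score target items for a cold-start user as a similarity-weighted sum over their source-domain history.

```python
import math

# Hypothetical toy sketch of similarity-based cross-domain transfer.
# Embeddings here are invented; in practice they might come from a
# graph convolutional model such as LightGCN.

def cosine(u, v):
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return num / den if den else 0.0

source_items = {"book_a": [1.0, 0.2], "book_b": [0.1, 1.0]}
target_items = {"movie_x": [0.9, 0.1], "movie_y": [0.0, 1.0]}

# Cross-domain similarity matrix (source item x target item).
sim = {(s, t): cosine(es, et)
       for s, es in source_items.items()
       for t, et in target_items.items()}

# A cold-start user who has interacted only in the source domain.
user_hist = {"book_a": 1.0}  # implicit feedback

# Infer target-domain preferences: similarity-weighted sum over history.
scores = {t: sum(r * sim[(s, t)] for s, r in user_hist.items())
          for t in target_items}
best = max(scores, key=scores.get)
assert best == "movie_x"  # the target item most similar to the user's history
```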
Although generative recommenders demonstrate improved performance with longer sequences, their real-time deployment is hindered by substantial computational costs. To address this challenge, we propose a simple yet effective method for compressing long-term user histories by leveraging inherent item categorical features, thereby preserving user interests while enhancing efficiency. Experiments on two large-scale datasets demonstrate that, compared to the influential HSTU model, our approach achieves up to a 6× reduction in computational cost and up to 39% higher accuracy at comparable cost (i.e., similar sequence length). The source code will be available at https://github.com/Genemmender/CAUSE.
Large Language Models (LLMs) have been increasingly studied as neural knowledge bases for supporting knowledge-intensive applications. However, the structural organization of their knowledge remains largely unexplored. Inspired by cognitive neuroscience, such as semantic clustering and priming, where knowing one fact increases the likelihood of recalling related facts, we investigate an analogous knowledge homophily pattern in LLMs. To this end, we map LLM knowledge into a graph representation through knowledge checking at triplet and entity levels. We then analyze the knowledgeability relationship between an entity and its neighbors, discovering that LLMs tend to possess a similar level of knowledge about relevant entities positioned closer in the graph. Motivated by this homophily principle, we propose a Graph Neural Network (GNN) regression model to estimate entity-level knowledgeability scores for triplets by leveraging their neighborhood scores. The predicted knowledgeability enables us to prioritize checking less well-known triplets, thereby maximizing knowledge coverage under the same labeling budget. This not only improves the efficiency of active labeling for fine-tuning to inject knowledge into LLMs but also enhances multi-hop path retrieval in reasoning-intensive question answering. Our code and supplementary materials are available at https://github.com/utkarshxsahu/kgc.
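The homophily-driven prioritization can be illustrated with a toy example (entities, edges, and scores invented here; the paper fits a GNN regressor, which this sketch replaces with a one-hop neighbor-mean predictor):

```python
# Toy sketch of homophily-based checking prioritization.

graph = {
    "Paris": ["France", "Eiffel Tower"],
    "Smalltown Bridge": ["Smalltown", "Quiet River"],
}
# Knowledgeability of already-checked entities: fraction of their
# triplets a (hypothetical) LLM answered correctly.
checked = {"France": 0.9, "Eiffel Tower": 0.8,
           "Smalltown": 0.1, "Quiet River": 0.2}

def predict(entity):
    """Homophily assumption: an entity's score ~ mean of neighbor scores."""
    scores = [checked[n] for n in graph[entity] if n in checked]
    return sum(scores) / len(scores)

# Check the entities predicted least well-known first, to maximize
# knowledge coverage per labeling budget.
priority = sorted(graph, key=predict)
assert priority[0] == "Smalltown Bridge"
```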
Many critical web applications, from e-commerce price prediction to user engagement forecasting, rely on regression models trained on tabular data. These models often face a dual challenge: the inherent imbalance in continuous target values and, more critically, the unpredictable distribution shifts that occur when the model is deployed online. While data imbalance in classification is well-studied, its intersection with regression tasks in dynamic, real-world settings is underexplored. Existing methods for imbalanced regression often assume that the test data distribution is known and stable, an assumption that rarely holds true for live web systems and can lead to significant performance degradation. To address this gap, we propose a novel framework featuring two key innovations: (i) a Region-Aware Mixture of Experts that leverages a Gaussian Mixture Model to identify distinct data sub-populations. This allows us to synthesize targeted training data and train specialized experts, each tailored to a specific data region. (ii) a Test-Time Self-Supervised Expert Aggregation mechanism. This is the core of our adaptation strategy, dynamically adjusting the weights of each expert based on the features of incoming test instances. This enables our model to adapt on-the-fly to varying test distributions without costly retraining. We evaluated our method on four real-world tabular regression datasets covering house pricing, bike sharing, and age prediction. These tasks are representative of real-world scenarios that inherently involve both target imbalance and dynamic distribution shifts (e.g., temporal or market-driven changes). The results demonstrate that our approach significantly outperforms existing imbalanced regression methods, especially under these shifts, achieving an average MAE improvement of 7.1%.
The integration of Large Language Models (LLMs) has led to substantial advancements in recommender systems (RS) by leveraging their vast knowledge and reasoning abilities. However, the semantic gap between the linguistic knowledge of LLMs and the collaborative patterns in RS hinders their effective fusion. This issue results in a fundamental limitation where models, despite achieving high prediction accuracy, are unable to provide coherent rationales justifying their recommendations. In this paper, we propose ARROW (Adaptive Reasoning for LLM-based RecommendatiOn With explainability), a novel framework that effectively elicits the intrinsic reasoning capabilities of LLMs to bridge this semantic gap. ARROW is carefully designed to guide the model in generating an explicit reasoning process for its recommendation decisions using chain-of-thought prompting. Furthermore, we introduce the Adaptive Reasoning Modulator, which quantifies the uncertainty of the reasoning process and adaptively adjusts its weight to maximize the model's reasoning efficacy. Our extensive experiments demonstrate that ARROW achieves significant performance improvements over strong baseline models while providing human-interpretable explanations. Our code is available at https://github.com/yunwooseong/ARROW.
Causality provides the foundation for understanding cause–effect relationships and supporting actionable decision making across domains such as healthcare, economics, and industrial production. However, most causal frameworks are designed for flat, fully observed, and independent data, whereas knowledge graphs (KGs) represent heterogeneous entities and multi-relational links enriched with ontologies and logical constraints under the Open World Assumption. These characteristics introduce unique challenges for causal analysis—semantic unawareness leading to inaccurate estimation, data incompleteness violating causal assumptions, and relational dependencies causing interference between entities. This thesis addresses these challenges by proposing a series of semantics-aware frameworks that integrate causal reasoning with knowledge representation. Leveraging ontological semantics, logical entailment, and relational structure, the proposed approach enables causal discovery, causal inference, and counterfactual prediction directly over KGs. Through the development of CauseKG, SemMatch, HyKG-CF, and BaLu, we establish an end-to-end pipeline that unifies causal knowledge learning, formal modeling, and query-based reasoning—paving the way for trustworthy and interpretable causal intelligence on knowledge graphs.
Recommender systems are expected to provide information based on the user's preferences. In practice, however, recommendations may mainly represent a user's main interests and crowd out their lesser interests, because recommender systems are mostly optimized for accuracy. To address this issue, calibration is introduced to recommender systems, ensuring that recommendations reflect the corresponding proportions of a user's interests. Although previous research has extended calibration beyond its original formulation, most of it assumes static recommender system settings, where user preference is fixed and the recommender system is one-shot, without systematic dynamics such as the feedback loop effect.
My thesis aims to further extend our understanding of calibration in a user-centered dynamic recommender system by 1) proposing adaptive calibration aware of dynamic user preferences with two dynamic types, 2) conducting synthetic analyses to investigate calibration's benefits and limitations from a long-term perspective, and 3) eliciting user-centered perceptions of calibration through a user study. This research aims to provide a more realistic and thorough view of calibration in dynamic recommender systems.
When outcomes hinge on human actions or interpretations, achieving effective and trustworthy AI performance requires systems that can reason about human behavior, communication, and social context. Building such human-aligned intelligence demands AI that not only processes data but also comprehends the richness of human expression and the structure of human interactions. To achieve this, this research develops methods for augmenting and structuring data representations across modalities to capture what humans express through language, visuals, and behavior, and across graphs to represent how humans and information are connected within social and semantic networks. In addition, this research introduces alignment evaluation frameworks that assess whether models can reason consistently across cultural, linguistic, and professional contexts.
Misinformation on social media undermines informed discourse, yet most detection systems overlook how people actually judge credibility. This research bridges human perception and linguistic construction through two complementary analyses: large-scale measurement of user believability judgments in real social-media news discussions, and content-level modeling of linguistic ambivalence, the co-occurrence of conflicting emotional, cognitive, and temporal cues. Results show that perceived believability varies systematically with textual (verbs, proper nouns, interrogatives) and user features (personality proxies, writing complexity, emotional tone). Incorporating these explicit believability signals as edge features in graph neural networks substantially improves detection over content-only baselines. Complementary analyses reveal that domain-specific ambivalence patterns—emotion–cognition tension in politics, sentiment mixing in entertainment, and present–future framing in health—serve as linguistic markers of deception. Together, these findings show that mitigating misinformation requires modeling both human credibility judgments and the linguistic conflicts that exploit them.
Conversational information-seeking systems increasingly rely on neural architectures and large language models (LLMs), yet they remain limited in their ability to retrieve structured, contextually relevant knowledge from large-scale knowledge graphs (KGs). My doctoral research addresses this gap through the novel task of Conversational Entity Retrieval from a Knowledge Graph (CER-KG)—the problem of identifying the correct KG entity in response to a context-dependent query within a multi-turn dialog. To support reproducible research, I introduced QBLink-KG, the first benchmark for CER-KG, adapted from conversational reading comprehension data. Building on this foundation, I proposed two neural ranking architectures: NACER, which aggregates lexical and semantic relevance signals from the local KG neighborhood, and DRAGON, which employs graph convolution and self-attention to model fine-grained, dialog-aware relationships across KG components. DRAGON achieves significant performance gains demonstrating the effectiveness of integrating graph-structured reasoning with conversational context modeling. Collectively, this research advances context-sensitive, structure-aware retrieval, bridging symbolic reasoning from KGs with neural representation learning for conversational information access.
Large language models (LLMs) excel at open-domain reasoning but often generate inconsistent or unverifiable answers. Retrieval-augmented generation (RAG) improves factual grounding, yet current KG-RAG systems rely on heuristic retrieval and lack interpretability. This dissertation proposes a planner–executor framework that formalizes retrieval as a structured planning problem. The planner analyzes a question, identifies relational constraints, and infers the underlying KG topology to generate an optimized retrieval plan. The executor follows this plan on Wikidata with bounded exploration, early stopping, and re-planning when constraints fail. This design enables controlled, auditable reasoning that balances completeness and efficiency. Evaluation will focus on retrieval faithfulness, reasoning accuracy, and computational cost using Wikidata-based QA benchmarks. Additional studies will examine constraint prioritization, query topology, and cross-model plan transfer between large and small LLMs. By integrating explicit planning into KG-RAG, we aim to develop scalable and interpretable reasoning systems that combine the structure of symbolic search with the adaptability of neural generation.
The rapid growth of online services has heightened concerns about user protection from cyber threats, particularly phishing, which poses significant risks to cyber-social security. To this end, we propose a novel tool for phishing detection called U-Proof. Our tool uses both state-of-the-art LLMs and traditional ML models to detect phishing websites. In particular, we evaluate the phishing detection capabilities of different LLMs and compare them with several ML models to analyze the impact of different model architectures on the identification of phishing websites. For a comprehensive experimental evaluation, we use a combination of public and custom datasets. These include active phishing websites from September 2024, as well as URLs from banks and postal services. Furthermore, the tool includes explanations to enhance user awareness of phishing tactics, supporting broader educational efforts to reduce risks.
Auditing large-scale recommender systems like YouTube remains a methodological challenge due to the trade-off between behavioral realism, scalability, and reproducibility. We present TRACE, an open-source framework that integrates Large Language Models to simulate context-aware, persona-driven user journeys. TRACE combines containerized browser automation, database-backed traceability, and asynchronous data enrichment to enable reproducible large-scale audits of YouTube's recommendation ecosystem. By decoupling experimental contexts from personas and supporting multiple behavioral modes, it allows researchers to model diverse user identities and explore how recommendation dynamics evolve over time. The framework's modular architecture, reproducible design, and web-based control interface facilitate transparent, comparative studies of algorithmic personalization and potential filter bubble formation. TRACE establishes a scalable foundation for empirically grounded, extensible, and ethically sound auditing of recommender systems.
Local railway committees need timely situational awareness after highway–rail grade crossing incidents, yet official Federal Railroad Administration (FRA) investigations can take days to weeks. We present a demo system that populates Highway–Rail Grade Crossing Incident Data (Form 57) from news in real time. Our approach addresses two core challenges: the form is visually irregular and semantically dense, and news is noisy. To solve these problems, we design a pipeline that first converts Form 57 into a JSON schema using a vision language model with sample aggregation, and then performs grouped question answering following the intent of the form layout to reduce ambiguity. In addition, we build an evaluation dataset by aligning scraped news articles with official FRA records and annotating retrievable information. We then assess our system against various alternatives in terms of information retrieval accuracy and coverage.
Advancements in conversational artificial intelligence (AI) enable scalable, web-based training environments beyond traditional classroom settings to support learning. In emergency communication centers such as 9-1-1, trainees have limited opportunities to practice diverse call-handling scenarios, as training relies on role-play instruction and constrained instructor availability. We introduce SHIELD (Strengthening Human Intervention in Emergencies through Learning with Data and AI), a framework and online conversational AI system designed to support scenario-based training through interactive simulations, adaptive feedback, and learning analytics. SHIELD integrates AI-generated call simulations with data science techniques to capture fine-grained trainee interaction data, including decision-making behavior, response timing, and corrective actions. These logged interactions are analyzed to provide real-time metacognitive nudges during simulated calls and post-performance feedback through AI-assisted analytics. The framework is informed by principles of self-regulated learning to structure training across planning, execution, and reflection phases. This demo showcases SHIELD as a web-based prototype deployed for emergency call-taker training and discusses design insights from initial testing sessions conducted in collaboration with a public safety communications agency. The system also highlights potential applicability to other high-stakes operational training settings.
Food Sharing Initiatives (FSIs) are a vital but often hidden part of urban life. The EU CULTIVATE project has developed the SHARECITY 200 database and an interactive graphical exploration application to make these practices visible across 200 cities. In this demonstration, we present the CULTIVATE system, which automates the discovery, classification, and updating of FSIs from online sources, while our interactive tool enables audiences to engage directly with the results through an interactive geo-spatial map. Our demonstration combines multilingual query construction with LLM-based rewriting, web searching, automated FSI classification with final expert verification, scheduled re-crawls to sustain accuracy, and user navigation of the database through our graphical Food Sharing Map. The methods used in this system can easily be adapted for the exploration of online information in other domains.
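The multilingual query-construction step might look like the sketch below, which crosses a city with per-language food-sharing terms to seed web search; the term lists and city are illustrative assumptions, not the project's actual vocabularies.

```python
# Hypothetical per-language seed terms for food-sharing initiatives.
FOOD_SHARING_TERMS = {
    "en": ["community fridge", "food sharing", "surplus food"],
    "de": ["Foodsharing", "Lebensmittel retten"],
    "es": ["banco de alimentos", "nevera solidaria"],
}

def build_queries(city, languages):
    """Cross a city with each language's seed terms to produce web-search
    queries; an LLM-based rewriting step could then diversify these."""
    return [f'"{term}" {city}'
            for lang in languages
            for term in FOOD_SHARING_TERMS.get(lang, [])]

queries = build_queries("Berlin", ["en", "de"])
```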
The emergence of Mobility as a Service (MaaS) highlights the need for personalized, exploration-oriented recommendations that account for users' subjective preferences and multisensory experiences. However, existing approaches lack the real-world sensory and contextual data needed to support such personalization. To address this gap, we developed FreePOST, a location-based information collection system that enables users to submit photos together with multisensory and location impressions tied to specific places. In a field study spanning more than three months in Kyoto, 69 participants contributed more than 8,000 posts. The collected data were visualized in real time via a web application with an interactive map interface. Because the sensory tags assigned to locations may depend not only on the characteristics of the place itself but also on the personality traits of the participants, we also collected personality and behavioral data using the Big Five Inventory and the Brief Sensation Seeking Scale. Our results demonstrate that the system effectively captures Kyoto's sensory landscapes and context-aware experiential features, providing a valuable foundation for integrating subjective and multisensory data into future personalized search and recommendation in MaaS contexts. This demo will showcase FreePOST's core functions of multisensory data submission and interactive map visualization.
Enterprises face a dual challenge when deploying large language models (LLMs) in customer-facing environments. The first challenge is ensuring knowledge-retrieval accuracy and alignment with legal compliance. The second is ensuring alignment with their brand identity while optimizing engagement outcomes. Generic LLMs are trained on public data and world knowledge but do not represent enterprise-specific behavior. When LLMs are supported with context-aware retrieval methods, they can address the first challenge. The second challenge, however, requires careful fine-tuning. For an enterprise, the fine-tuning process is similar to hiring someone who has general knowledge about the business but is not yet trained to act as an experienced employee. This talk presents a hybrid fine-tuning and context-orchestration framework, an important step towards constructing an enterprise world model that unifies structured, unstructured, and conversational data for compliant, brand-aligned LLMs.
Industrial recommendation systems increasingly operate across heterogeneous products, user journeys, and feedback loops, yet most systems still optimize each stage (data curation, model training, and inference) largely in isolation. We present RankGraph-Context, a graph-centric context framework (distinct from a graph neural network model) that unifies these stages by (i) capturing implicit relational signals during data construction, (ii) conditioning training on structured relational context, and (iii) adapting at inference time through post-training or test-time learning. Across several production-scale scenarios, RankGraph-Context delivers consistent improvements in cold-start retrieval, long-tail coverage, and cross-surface data curation, while enabling safe online adaptation through test-time updates. We detail the framework, instantiate it on multiple surfaces and use cases, and report offline and online results, showing that RankGraph-Context can empower different recommendation-system stages with affordable engineering overhead.
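To make "conditioning on structured relational context" concrete, a minimal sketch (our illustration, not RankGraph-Context's actual design) is to augment each item's embedding with an aggregate over its graph neighbors, which also shows why cold-start items degrade gracefully: with no neighbors, the relational half is simply zero.

```python
import numpy as np

def relational_context(node_emb, adjacency, node):
    """Concatenate a node's own embedding with the mean of its
    neighbors' embeddings, yielding a graph-conditioned feature."""
    neighbors = adjacency.get(node, [])
    if neighbors:
        ctx = np.mean([node_emb[n] for n in neighbors], axis=0)
    else:
        # Cold-start: no relational signal available yet.
        ctx = np.zeros_like(node_emb[node])
    return np.concatenate([node_emb[node], ctx])

emb = {"a": np.array([1.0, 0.0]),
       "b": np.array([0.0, 1.0]),
       "c": np.array([1.0, 1.0])}
adj = {"a": ["b", "c"]}
vec = relational_context(emb, adj, "a")
```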
The explosive proliferation of Short-Form Video (SFV) has created a high-velocity ecosystem that defies traditional analysis. Recommendation and discovery systems now face two critical adversarial behaviors at scale: Modality Misalignment, where creators insert distinctive text, images, or concealed audio/video clips to covertly manipulate algorithmic distribution, and Content Theft, where viral content is slightly tweaked and reposted as original to capture monetization. While monolithic Multi-Modal Large Language Models (MLLMs) offer strong zero-shot capabilities, they are increasingly insufficient for this production reality: their high computational cost, tendency toward Object and Cross-Modal Hallucination, and lack of external context make them ill-suited for detecting fine-grained manipulation.
This work details a foundational Multi-Agent System (MAS) architecture designed to solve these problems by decomposing video understanding into three specialized roles: a Perceiver Agent for granular signal acquisition and representation; a Retriever Agent that utilizes hybrid, adaptive retrieval strategies; and a Reviewer Agent, a high-reasoning orchestration model that iteratively adjudicates originality. We present an industrial blueprint for scaling this architecture, utilizing metadata-driven filtering, Semantic Caching, and Token Frugality to prune candidate pools from billions to millions, achieving massive compute savings. Finally, we propose a holistic evaluation framework to quantify operational reality, measuring cost per query, reasoning quality, and hallucination rates to ensure robust enterprise deployment.
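The Semantic Caching idea can be sketched as follows, assuming (our assumption, not the paper's stated mechanism) that caching works by quantizing a video's embedding into a coarse key, so near-identical uploads reuse a prior verdict instead of re-invoking the expensive Reviewer Agent.

```python
import numpy as np

class SemanticCache:
    """Coarse-grained semantic cache: embeddings that round to the same
    key are treated as the same content, so a prior originality verdict
    can be reused without another Reviewer pass."""

    def __init__(self, precision=1):
        self.precision = precision  # decimals kept when quantizing
        self.store = {}

    def _key(self, emb):
        return tuple(np.round(emb, self.precision))

    def get(self, emb):
        return self.store.get(self._key(emb))

    def put(self, emb, verdict):
        self.store[self._key(emb)] = verdict

cache = SemanticCache(precision=1)
cache.put(np.array([0.11, 0.52]), "reposted")
```

A production system would more likely use locality-sensitive hashing or ANN lookup with a similarity threshold; the rounding trick here just makes the cost/recall trade-off visible in a few lines.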
Yupp is a consumer product that provides users with responses from multiple AI models in side-by-side comparisons, allowing them to seek diverse perspectives from many models at once. Through various incentive mechanisms, we encourage users to provide feedback on the response they prefer, which is then aggregated into a leaderboard that ranks models along many dimensions. While much progress in AI has been driven by benchmark datasets, gathering "vibes" from diverse populations for real-world use cases is also highly valuable. Looking further ahead, we envision a future where AIs collaborate and compete in multi-party dialogues. To this end, we discuss attempts to translate innovations from the research lab into an engaging consumer product.
Despite the demonstrated success of Pre-trained Language Models (PLMs) in enhancing various recommendation tasks, their performance in friend recommendation systems, which are heavily influenced by complex social network dynamics, remains unproven. On the other hand, for social networks like Facebook, graph-based machine learning approaches to friend suggestion, such as Graph Neural Networks (GNNs), struggle with the computational demands posed by social graphs. In this paper we investigate the application of PLMs and GNNs in the context of People You May Know (PYMK) features within Facebook's social graph. We propose a representation learning scheme that captures both the local structure and the semantic content within a second-degree connection space. Our approach leverages pre-trained transformer models with a feature aggregation scheme that enables efficient node representation learning in large social networks. We detail our model's architecture and discuss our training methods, accompanied by detailed experiments. In online experiments, our model increased friending on Facebook, facilitating new connections for users each day.
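A toy sketch of scoring within the second-degree connection space (the function, weights, and blending are our illustrative assumptions, not the paper's model) blends direct semantic similarity with an aggregate over mutual friends, which captures "local structure plus semantic content" in miniature.

```python
import numpy as np

def pymk_score(user_emb, candidate_emb, mutual_embs):
    """Blend direct user-candidate similarity (semantic content) with
    similarity to the mean of mutual-friend embeddings (local structure)."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

    direct = cos(user_emb, candidate_emb)
    if mutual_embs:
        agg = np.mean(mutual_embs, axis=0)  # simple feature aggregation
        structural = cos(candidate_emb, agg)
    else:
        structural = 0.0  # no shared friends: fall back to semantics only
    return 0.5 * direct + 0.5 * structural

u = np.array([1.0, 0.0])
c = np.array([1.0, 0.0])
s = pymk_score(u, c, [np.array([1.0, 0.0])])
```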
Recent advances in causal machine learning have introduced a plethora of new causal discovery and causal inference models. Yet these models exhibit different performance when trained on different data or on different hardware/software platforms, making it challenging for users to select the setup appropriate to their specific problem instance. The situation is complicated by the fact that, until recently, the field lacked a unified, publicly available, and configurable benchmark supporting major causal inference tasks. We argue that the causal learning community can make similar progress to other benchmark-driven fields by meticulously surveying this vibrant emerging field, systematically categorizing existing benchmarking efforts into technically meaningful groups, and identifying the areas where further effort is urgently needed. A concerted benchmarking effort can be extremely valuable not only for causal learning algorithm design but also for the comparison and benchmarking of available solutions. This workshop aims to boost the advancement of research in causal learning by facilitating scientific collaboration on novel algorithms, datasets, and metrics, and by promoting scientific objectivity, reproducibility, fairness, and awareness of bias in causal learning research. Thus, CausalBench calls for papers on benchmarking data, algorithms, models, and metrics for causal learning, addressing the needs of a broad range of scientific and engineering disciplines, including the Web.
Streaming media has become a popular medium for consumers of all ages, with people spending several hours a day streaming videos, games, music, audiobooks, or podcasts across devices. Most global streaming services have introduced Generative Artificial Intelligence (GenAI) into their operations to personalize the consumer experience, improve content, and further enhance the value proposition of streaming services. Despite this rapid growth, there is a need to bridge the gap between academic research and industry requirements and to build connections between researchers and practitioners in the field. This workshop aims to provide a unique forum for practitioners and researchers interested in GenAI to get together, exchange ideas, and take the pulse of the state of the art in research and of burning issues in the industry.
Building personalized recommender systems and search experiences is a cornerstone of the modern data mining and applied machine learning (ML) community. Modern online platforms have a confluence of data, including user-item interaction graphs, user- and item-associated semantics (text, visual content, etc.), and metadata. Recent advancements in generative models and semantic encoders, including large language models (LLMs) and visual and audio encoders, have significantly impacted research in relevant domains, enabling new directions in knowledge discovery and improving models' ability to incorporate semantic context. These techniques are quickly advancing in the academic sphere, and adoption in industrial environments is growing. These advances raise large questions about the future of search, recommendation, and personalized experiences. This workshop bridges the research gap between generative models and recommendation for personalized systems. We will focus on topics spanning the interplay between such models and conventional personalized systems. Building upon the momentum of previous successful forums, we seek to engage a diverse audience from academia and industry, fostering a dialogue that incorporates fresh insights; we anticipate over 100 attendees, including key stakeholders in the field.