[
  {
    "path": ".github/ISSUE_TEMPLATE.md",
    "content": "---\ntitle: Latest 15 Papers - April 20, 2026\nlabels: documentation\n---\n**Please check the [Github](https://github.com/zezhishao/MTS_Daily_ArXiv) page for a better reading experience and more papers.**\n\n## Time Series\n| **Title** | **Date** | **Comment** |\n| --- | --- | --- |\n| **[On the robustness of Mann-Kendall tests used to forecast critical transitions](https://arxiv.org/abs/2604.15230v1)** | 2026-04-16 | <details><summary>26 pa...</summary><p>26 pages including appendices, 10 figures, 2 tables</p></details> |\n| **[Parameter estimation for land-surface models using Neural Physics](https://arxiv.org/abs/2505.02979v3)** | 2026-04-16 | <details><summary>18 pa...</summary><p>18 pages, 5 figures, 3 tables</p></details> |\n| **[TempusBench: An Evaluation Framework for Time-Series Forecasting](https://arxiv.org/abs/2604.11529v2)** | 2026-04-16 |  |\n| **[Time-RA: Towards Time Series Reasoning for Anomaly Diagnosis with LLM Feedback](https://arxiv.org/abs/2507.15066v5)** | 2026-04-16 | <details><summary>ACL 2...</summary><p>ACL 2026 Findings. 27 pages, 11 figures, 15 tables. Code and dataset are publicly available</p></details> |\n| **[MambaSL: Exploring Single-Layer Mamba for Time Series Classification](https://arxiv.org/abs/2604.15174v1)** | 2026-04-16 | <details><summary>accep...</summary><p>accepted at ICLR 2026</p></details> |\n| **[Assessing the Potential of Masked Autoencoder Foundation Models in Predicting Downhole Metrics from Surface Drilling Data](https://arxiv.org/abs/2604.15169v1)** | 2026-04-16 |  |\n| **[Universal hidden monotonic trend estimation with contrastive learning](https://arxiv.org/abs/2210.09817v3)** | 2026-04-16 |  |\n| **[Logo-LLM: Local and Global Modeling with Large Language Models for Time Series Forecasting](https://arxiv.org/abs/2505.11017v2)** | 2026-04-16 |  |\n| **[Assessing the Performance-Efficiency Trade-off of Foundation Models in Probabilistic Electricity Price Forecasting](https://arxiv.org/abs/2604.14739v1)** | 2026-04-16 | <details><summary>Submi...</summary><p>Submitted to the 7th International Workshop on Energy Data and Analytics (EDA), held in conjunction with ACM e-Energy 2026</p></details> |\n| **[From Time Series to State: Situation-Aware Modeling for Air Traffic Flow Prediction](https://arxiv.org/abs/2604.11198v3)** | 2026-04-16 | <details><summary>There...</summary><p>There are issues with the authors of the paper I submitted, as well as problems with the content of the article, so it needs to be withdrawn. Thank you for your understanding</p></details> |\n| **[CSRA: Controlled Spectral Residual Augmentation for Robust Sepsis Prediction](https://arxiv.org/abs/2604.14532v1)** | 2026-04-16 |  |\n| **[AIBuildAI: An AI Agent for Automatically Building AI Models](https://arxiv.org/abs/2604.14455v1)** | 2026-04-15 |  |\n| **[Frame forecasting in cine MRI using the PCA respiratory motion model: comparing recurrent neural networks trained online and transformers](https://arxiv.org/abs/2410.05882v3)** | 2026-04-15 | <details><summary>43 pa...</summary><p>43 pages, 19 figures. Revised version with minor corrections and improved figures and language. 
Accepted for publication in Computerized Medical Imaging and Graphics</p></details> |\n| **[Unsupervised Anomaly Detection in Process-Complex Industrial Time Series: A Real-World Case Study](https://arxiv.org/abs/2604.13928v1)** | 2026-04-15 |  |\n| **[ASTER: Latent Pseudo-Anomaly Generation for Unsupervised Time-Series Anomaly Detection](https://arxiv.org/abs/2604.13924v1)** | 2026-04-15 |  |\n\n## Trajectory\n| **Title** | **Date** | **Comment** |\n| --- | --- | --- |\n| **[LeapAlign: Post-Training Flow Matching Models at Any Generation Step by Building Two-Step Trajectories](https://arxiv.org/abs/2604.15311v1)** | 2026-04-16 | <details><summary>Accep...</summary><p>Accepted by CVPR 2026. Project page: https://rockeycoss.github.io/leapalign/</p></details> |\n| **[OpenMobile: Building Open Mobile Agents with Task and Trajectory Synthesis](https://arxiv.org/abs/2604.15093v1)** | 2026-04-16 | Work in progress |\n| **[Trajectory Planning for a Multi-UAV Rigid-Payload Cascaded Transportation System Based on Enhanced Tube-RRT*](https://arxiv.org/abs/2604.15074v1)** | 2026-04-16 | <details><summary>15 pa...</summary><p>15 pages, 7 figures. Under review at IEEE Transactions on Aerospace and Electronic Systems (TAES). This work has been submitted to the IEEE for possible publication</p></details> |\n| **[Predicting Power-System Dynamic Trajectories with Foundation Models](https://arxiv.org/abs/2604.14991v1)** | 2026-04-16 | 10 pages |\n| **[Momentum-constrained Hybrid Heuristic Trajectory Optimization Framework with Residual-enhanced DRL for Visually Impaired Scenarios](https://arxiv.org/abs/2604.14986v1)** | 2026-04-16 | <details><summary>24 pa...</summary><p>24 pages, 14 figures. arXiv admin note: text overlap with arXiv:2509.15582</p></details> |\n| **[Reward-Aware Trajectory Shaping for Few-step Visual Generation](https://arxiv.org/abs/2604.14910v1)** | 2026-04-16 |  |\n| **[Benchmarks for Trajectory Safety Evaluation and Diagnosis in OpenClaw and Codex: ATBench-Claw and ATBench-CodeX](https://arxiv.org/abs/2604.14858v1)** | 2026-04-16 | 18 pages, 3 figures |\n| **[Trajectory-based actuator identification via differentiable simulation](https://arxiv.org/abs/2604.10351v2)** | 2026-04-16 |  |\n| **[Tora3: Trajectory-Guided Audio-Video Generation with Physical Coherence](https://arxiv.org/abs/2604.09057v2)** | 2026-04-16 | <details><summary>12 pa...</summary><p>12 pages, 5 tables, 5 figures</p></details> |\n| **[FinTrace: Holistic Trajectory-Level Evaluation of LLM Tool Calling for Long-Horizon Financial Tasks](https://arxiv.org/abs/2604.10015v2)** | 2026-04-15 |  |\n| **[Staying on Track: Efficient Trajectory Discovery with Adaptive Batch Sampling](https://arxiv.org/abs/2510.18099v2)** | 2026-04-15 |  |\n| **[HINTBench: Horizon-agent Intrinsic Non-attack Trajectory Benchmark](https://arxiv.org/abs/2604.13954v1)** | 2026-04-15 |  |\n| **[Bayesian Joint Modelling of Longitudinal Creatinine Trajectories in Children with Auto-Immune Disorders to Predict Paediatric Kidney Disease Risk in a Single Centre Study](https://arxiv.org/abs/2604.12740v1)** | 2026-04-14 |  |\n| **[FeaXDrive: Feasibility-aware Trajectory-Centric Diffusion Planning for End-to-End Autonomous Driving](https://arxiv.org/abs/2604.12656v1)** | 2026-04-14 | 21 pages, 6 figures |\n| **[SparseWorld-TC: Trajectory-Conditioned Sparse Occupancy World Model](https://arxiv.org/abs/2511.22039v3)** | 2026-04-14 | <details><summary>Accep...</summary><p>Accepted by CVPR2026 as an oral</p></details> |\n\n## Graph Neural Networks\n| **Title** | 
**Date** | **Comment** |\n| --- | --- | --- |\n| **[How Embeddings Shape Graph Neural Networks: Classical vs Quantum-Oriented Node Representations](https://arxiv.org/abs/2604.15273v1)** | 2026-04-16 | <details><summary>6 pag...</summary><p>6 pages. Accepted at IJCNN 2026</p></details> |\n| **[Sampling Transferable Graph Neural Networks with Limited Graph Information](https://arxiv.org/abs/2410.16593v6)** | 2026-04-16 | <details><summary>Submi...</summary><p>Submitted to IEEE TSP</p></details> |\n| **[Beyond the Laplacian: Doubly Stochastic Matrices for Graph Neural Networks](https://arxiv.org/abs/2604.15069v1)** | 2026-04-16 |  |\n| **[Adaptive Canonicalization with Application to Invariant Anisotropic Geometric Networks](https://arxiv.org/abs/2509.24886v3)** | 2026-04-16 |  |\n| **[Dense Neural Networks are not Universal Approximators](https://arxiv.org/abs/2602.07618v3)** | 2026-04-16 |  |\n| **[Using deep learning to construct stochastic local search SAT solvers with performance bounds](https://arxiv.org/abs/2309.11452v3)** | 2026-04-16 | <details><summary>24 pa...</summary><p>24 pages, significantly updated version with new datasets and experiments. Code available at https://github.com/porscheofficial/sls_sat_solving_with_deep_learning. Accepted for publication in Machine Learning: Science and Technology 7 (2026) 025057</p></details> |\n| **[Leveraging graph neural networks and mobility data for COVID-19 forecasting](https://arxiv.org/abs/2501.11711v2)** | 2026-04-16 |  |\n| **[PUFFIN: Protein Unit Discovery with Functional Supervision](https://arxiv.org/abs/2604.14796v1)** | 2026-04-16 | <details><summary>21 pa...</summary><p>21 pages, 9 figures, to appear in ISMB 2026 proceedings</p></details> |\n| **[Graph-Based Alternatives to LLMs for Human Simulation](https://arxiv.org/abs/2511.02135v2)** | 2026-04-16 | <details><summary>Confe...</summary><p>Conference: ACL 2026 Long Main Code: https://github.com/schang-lab/gems</p></details> |\n| **[CPGRec+: A Balance-oriented Framework for Personalized Video Game Recommendations](https://arxiv.org/abs/2604.14586v1)** | 2026-04-16 | <details><summary>Publi...</summary><p>Published in ACM Transactions on Information Systems (TOIS). 43 pages, 9 figures</p></details> |\n| **[Behavior-Aware Dual-Channel Preference Learning for Heterogeneous Sequential Recommendation](https://arxiv.org/abs/2604.14581v1)** | 2026-04-16 |  |\n| **[Unsupervised Learning of Local Updates for Maximum Independent Set in Dynamic Graphs](https://arxiv.org/abs/2505.13754v3)** | 2026-04-15 | <details><summary>10 pa...</summary><p>10 pages, 3 figures, 1 table, 3 algorithms. To appear at IJCNN 2026</p></details> |\n| **[ID and Graph View Contrastive Learning with Multi-View Attention Fusion for Sequential Recommendation](https://arxiv.org/abs/2604.14114v1)** | 2026-04-15 |  |\n| **[Log-based vs Graph-based Approaches to Fault Diagnosis](https://arxiv.org/abs/2604.14019v1)** | 2026-04-15 | <details><summary>8 pag...</summary><p>8 pages, 7 figures, student project</p></details> |\n| **[Leveraging LLM-GNN Integration for Open-World Question Answering over Knowledge Graphs](https://arxiv.org/abs/2604.13979v1)** | 2026-04-15 | <details><summary>18 pa...</summary><p>18 pages,6 figures,10 tables. https://aclanthology.org/2026.eacl-long.26/</p></details> |\n\n"
  },
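The collapsed cells in the tables above follow a fixed convention: a `<details>` block whose `<summary>` is the first five characters of the comment plus an ellipsis (e.g. "26 pa...", "Submi..."), with short comments such as "10 pages" or "Work in progress" left inline, and README.md passing a fixed "Show" summary for its abstract column. A minimal sketch of a cell formatter that reproduces this output; the function name, the five-character cutoff, and the roughly 20-character inline threshold are inferred from the rendered tables, not taken from the generator's source:

```python
from typing import Optional


def details_cell(text: str, summary: Optional[str] = None) -> str:
    """Render a Markdown table cell in the <details> style seen above.

    Inferred behavior: comments short enough to display inline (roughly
    20 characters, judging by the tables) are returned as-is; longer ones
    are collapsed behind a summary made of their first five characters
    plus '...'. README.md's abstract column passes summary="Show".
    """
    text = " ".join(text.split())        # raw newlines would break the row
    text = text.replace("|", "\\|")      # so would unescaped pipes
    if summary is None:
        if len(text) <= 20:
            return text
        summary = text[:5] + "..."
    return f"<details><summary>{summary}</summary><p>{text}</p></details>"


# details_cell("26 pages including appendices, 10 figures, 2 tables")
# -> '<details><summary>26 pa...</summary><p>26 pages including appendices,
#     10 figures, 2 tables</p></details>'
```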
  {
    "path": ".github/workflows/update.yaml",
    "content": "name: Update\n\non:\n  label:\n    types:\n      - created # for test\n  schedule:\n      - cron: '30 16 * * 0-4' # 00:30 Beijing time every Monday to Friday\n\npermissions:\n  contents: write\n  issues: write \n\njobs:\n  update_daily_papers:\n    runs-on: ubuntu-latest\n    steps:\n    - name: Checkout repository\n      uses: actions/checkout@v3\n\n    - name: Set up Python\n      uses: actions/setup-python@v4\n      with:\n        python-version: '3.9'\n\n    - name: Install dependencies\n      run: pip install -r requirements.txt\n\n    - name: Update papers\n      run: |\n        python main.py\n\n    - name: Commit and push changes\n      uses: github-actions-x/commit@v2.9\n      with:\n        github-token: ${{ secrets.GITHUB_TOKEN }}\n        push-branch: 'master'\n        commit-message: '✏️ Update papers automatically.'\n        force-add: 'true'\n        files: README.md .github/ISSUE_TEMPLATE.md\n        name: Zezhishao\n        email: 864453277@qq.com\n\n    - name: Create an issue to notify\n      uses: JasonEtco/create-an-issue@v2\n      env:\n        GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}\n"
  },
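The workflow above delegates the actual update to `python main.py`, which is not included in this snapshot. For orientation, here is a minimal sketch of the kind of fetch-and-render step it presumably performs, querying the public arXiv Atom API and emitting rows in the table layout used by README.md. The keyword list, helper names, and row format are assumptions inferred from the generated files; per README.md, the real script also retains up to 100 of the most recent papers per keyword, a merge step this sketch omits:

```python
# Illustrative sketch only; not the repository's actual main.py.
import urllib.parse

import feedparser  # third-party: pip install feedparser

ARXIV_API = "http://export.arxiv.org/api/query"


def fetch_latest(keyword: str, max_results: int = 15) -> list:
    """Query the public arXiv Atom API for the newest papers matching a keyword."""
    params = urllib.parse.urlencode({
        "search_query": f'all:"{keyword}"',
        "start": 0,
        "max_results": max_results,
        "sortBy": "submittedDate",   # newest submissions first
        "sortOrder": "descending",
    })
    return feedparser.parse(f"{ARXIV_API}?{params}").entries


def to_rows(entries) -> str:
    """Render feed entries as rows of the Markdown tables used in README.md."""
    rows = []
    for e in entries:
        title = " ".join(e.title.split())  # titles can arrive soft-wrapped
        rows.append(f"| **[{title}]({e.link})** | {e.published[:10]} |  |")
    return "\n".join(rows)


if __name__ == "__main__":
    for kw in ("time series", "trajectory", "graph neural networks"):
        print(f"## {kw.title()}")
        print(to_rows(fetch_latest(kw)))
```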
  {
    "path": ".gitignore",
    "content": "# dir\n__pycache__/\n.vscode/\n\n# file\n*.npz\n*.npy\n*.csv\n*.pkl\n*.h5\n*.pt\ncore*\n*.p\n*.pickle\n*.pyc\n*.txt\n\n*.py[cod]\n*$py.class\n\n# C extensions\n# *.so\n\n# Distribution / packaging\n.Python\nbuild/\ndevelop-eggs/\ndist/\ndownloads/\neggs/\n.eggs/\nlib/\nlib64/\nparts/\nsdist/\nvar/\nwheels/\nshare/python-wheels/\n*.egg-info/\n.installed.cfg\n*.egg\nMANIFEST\n\n# PyInstaller\n#  Usually these files are written by a python script from a template\n#  before PyInstaller builds the exe, so as to inject date/other infos into it.\n*.manifest\n*.spec\n\n# Installer logs\npip-log.txt\npip-delete-this-directory.txt\n\n# Unit test / coverage reports\nhtmlcov/\n.tox/\n.nox/\n.coverage\n.coverage.*\n.cache\nnosetests.xml\ncoverage.xml\n*.cover\n*.py,cover\n.hypothesis/\n.pytest_cache/\ncover/\n\n# Translations\n*.mo\n*.pot\n\n# Django stuff:\n*.log\nlocal_settings.py\ndb.sqlite3\ndb.sqlite3-journal\n\n# Flask stuff:\ninstance/\n.webassets-cache\n\n# Scrapy stuff:\n.scrapy\n\n# Sphinx documentation\ndocs/_build/\n\n# PyBuilder\n.pybuilder/\ntarget/\n\n# Jupyter Notebook\n.ipynb_checkpoints\n\n# IPython\nprofile_default/\nipython_config.py\n\n# pyenv\n#   For a library or package, you might want to ignore these files since the code is\n#   intended to run in multiple environments; otherwise, check them in:\n# .python-version\n\n# pipenv\n#   According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.\n#   However, in case of collaboration, if having platform-specific dependencies or dependencies\n#   having no cross-platform support, pipenv may install dependencies that don't work, or not\n#   install all needed dependencies.\n#Pipfile.lock\n\n# poetry\n#   Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.\n#   This is especially recommended for binary packages to ensure reproducibility, and is more\n#   commonly ignored for libraries.\n#   https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control\n#poetry.lock\n\n# pdm\n#   Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.\n#pdm.lock\n#   pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it\n#   in version control.\n#   https://pdm.fming.dev/#use-with-ide\n.pdm.toml\n\n# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm\n__pypackages__/\n\n# Celery stuff\ncelerybeat-schedule\ncelerybeat.pid\n\n# SageMath parsed files\n*.sage.py\n\n# Environments\n.env\n.venv\nenv/\nvenv/\nENV/\nenv.bak/\nvenv.bak/\n\n# Spyder project settings\n.spyderproject\n.spyproject\n\n# Rope project settings\n.ropeproject\n\n# mkdocs documentation\n/site\n\n# mypy\n.mypy_cache/\n.dmypy.json\ndmypy.json\n\n# Pyre type checker\n.pyre/\n\n# pytype static type analyzer\n.pytype/\n\n# Cython debug symbols\ncython_debug/\n\n# PyCharm\n#  JetBrains specific template is maintained in a separate JetBrains.gitignore that can\n#  be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore\n#  and can be added to the global gitignore or merged into this file.  For a more nuclear\n#  option (not recommended) you can uncomment the following to ignore the entire idea folder."
  },
  {
    "path": "README.md",
    "content": "# Daily Papers\nThe project automatically fetches the latest papers from arXiv based on keywords.\n\nThe subheadings in the README file represent the search keywords.\n\nOnly the most recent articles for each keyword are retained, up to a maximum of 100 papers.\n\nYou can click the 'Watch' button to receive daily email notifications.\n\nLast update: 2026-04-20\n\n## Time Series\n| **Title** | **Date** | **Abstract** | **Comment** |\n| --- | --- | --- | --- |\n| **[On the robustness of Mann-Kendall tests used to forecast critical transitions](https://arxiv.org/abs/2604.15230v1)** | 2026-04-16 | <details><summary>Show</summary><p>Non-parametric approaches to test for trends in time series make use of the Mann-Kendall statistic. Based on asymptotic arguments, these tests assume that its distribution follows a Gaussian distribution, even for autocorrelated time series. Recent results on the lack of validity of this assumption urge a robustness analysis of these approaches. While the issue is relevant across a wide range of applications, we illustrate it here in the context of detecting early warning signals (EWS) of critical transitions, which are used across a variety of research domains, and where commonly applied methods generate autocorrelation. We present a broad analysis, covering all types of critical transitions commonly investigated in EWS studies. We compare empirical distributions of the Mann-Kendall statistic computed from classical EWS indicators preceding critical transitions to the theoretical distributions hypothesized by Mann-Kendall tests. We detect mismatches leading to inflated type I error rates, which would routinely lead to announcing a critical transition while it is not occurring. In contrast to a recent recommendation, we conclude that the use of Mann-Kendall tests for trend detection in the context of forecasting critical transitions should be avoided. We point out several alternative methods available instead.</p></details> | <details><summary>26 pa...</summary><p>26 pages including appendices, 10 figures, 2 tables</p></details> |\n| **[Parameter estimation for land-surface models using Neural Physics](https://arxiv.org/abs/2505.02979v3)** | 2026-04-16 | <details><summary>Show</summary><p>We propose a novel inverse-modelling approach which estimates the parameters of a simple land-surface model (LSM) by assimilating data into a differentiable physics-based forward model. The governing equations are expressed within a machine-learning framework using the Neural Physics approach, allowing direct gradient-based optimisation of time-dependent parameters without the need to derive and maintain adjoint formulations. The model parameters are updated by minimising the mismatch between model predictions and synthetic or observational data. Although differentiability is achieved through functionality in machine-learning libraries, the forward model itself remains entirely physics-based and no part of either the forward model or the parameter estimation involves training. In order to test the approach, a synthetic dataset is generated by running the forward model with known parameter values to create a time series of soil temperature that serves as observations for an inverse problem in which the parameters are assumed unknown and subsequently estimated. We show that it is not possible to obtain a reliable estimate of the parameters using a time series of soil temperature observed at a single depth. 
Using measurements at two depths, reliable parameter estimates can be obtained although it is not possible to differentiate between latent and sensible heat fluxes. We also apply the approach to urban flux tower data from Phoenix, United States, and show that the thermal conductivity, volumetric heat capacity and the combined sensible-latent heat transfer coefficient can be reliably estimated whilst using an observed value for the effective surface albedo.</p></details> | <details><summary>18 pa...</summary><p>18 pages, 5 figures, 3 tables</p></details> |\n| **[TempusBench: An Evaluation Framework for Time-Series Forecasting](https://arxiv.org/abs/2604.11529v2)** | 2026-04-16 | <details><summary>Show</summary><p>Foundation models have transformed natural language processing and computer vision, and a rapidly growing literature on time-series foundation models (TSFMs) seeks to replicate this success in forecasting. While recent open-source models demonstrate the promise of TSFMs, the field lacks a comprehensive and community-accepted model evaluation framework. We see at least four major issues impeding progress on the development of such a framework. First, existing evaluation frameworks comprise benchmark forecasting tasks derived from often outdated datasets (e.g., M3), many of which lack clear metadata and overlap with the corpora used to pre-train TSFMs. Second, these frameworks evaluate models along a narrowly defined set of benchmark forecasting tasks, such as forecast horizon length or domain, but overlook core statistical properties such as non-stationarity and seasonality. Third, domain-specific models (e.g., XGBoost) are often compared unfairly, as existing frameworks do not enforce a systematic and consistent hyperparameter tuning convention for all models. Fourth, visualization tools for interpreting comparative performance are lacking. To address these issues, we introduce TempusBench, an open-source evaluation framework for TSFMs. TempusBench consists of 1) new datasets which are not included in existing TSFM pretraining corpora, 2) a set of novel benchmark tasks that go beyond existing ones, 3) a model evaluation pipeline with a standardized hyperparameter tuning protocol, and 4) a tensorboard-based visualization interface. We provide access to our code on GitHub: https://github.com/Smlcrm/TempusBench and maintain a live leaderboard at https://benchmark.smlcrm.com/.</p></details> |  |\n| **[Time-RA: Towards Time Series Reasoning for Anomaly Diagnosis with LLM Feedback](https://arxiv.org/abs/2507.15066v5)** | 2026-04-16 | <details><summary>Show</summary><p>Time series anomaly detection (TSAD) has traditionally focused on binary classification and often lacks the fine-grained categorization and explanatory reasoning required for transparent decision-making. To address these limitations, we propose Time-series Reasoning for Anomaly (Time-RA), a novel task that reformulates TSAD from a discriminative into a generative, reasoning-intensive paradigm. To facilitate this, we introduce RATs40K, the first real-world large-scale multimodal benchmark with ~40,000 samples across 10 domains, integrating raw time series, textual context, and visual plots with structured reasoning annotations. Extensive benchmarking shows that while supervised fine-tuning and visual representations boost diagnostic accuracy and reasoning consistency, performance varies across complex scenarios. 
Notably, fine-tuned models demonstrate strong \"plug-and-play\" transferability, outperforming traditional baselines on unseen real-world datasets. Our work establishes a foundation for interpretable, multimodal time series analysis. All code (https://github.com/yyysjz1997/Time-RA) and the RATs40K dataset (https://huggingface.co/datasets/Time-RA/RATs40K) are fully open-sourced to facilitate future research.</p></details> | <details><summary>ACL 2...</summary><p>ACL 2026 Findings. 27 pages, 11 figures, 15 tables. Code and dataset are publicly available</p></details> |\n| **[MambaSL: Exploring Single-Layer Mamba for Time Series Classification](https://arxiv.org/abs/2604.15174v1)** | 2026-04-16 | <details><summary>Show</summary><p>Despite recent advances in state space models (SSMs) such as Mamba across various sequence domains, research on their standalone capacity for time series classification (TSC) has remained limited. We propose MambaSL, a framework that minimally redesigns the selective SSM and projection layers of a single-layer Mamba, guided by four TSC-specific hypotheses. To address benchmarking limitations -- restricted configurations, partial University of East Anglia (UEA) dataset coverage, and insufficiently reproducible setups -- we re-evaluate 20 strong baselines across all 30 UEA datasets under a unified protocol. As a result, MambaSL achieves state-of-the-art performance with statistically significant average improvements, while ensuring reproducibility via public checkpoints for all evaluated models. Together with visualizations, these results demonstrate the potential of Mamba-based architectures as a TSC backbone.</p></details> | <details><summary>accep...</summary><p>accepted at ICLR 2026</p></details> |\n| **[Assessing the Potential of Masked Autoencoder Foundation Models in Predicting Downhole Metrics from Surface Drilling Data](https://arxiv.org/abs/2604.15169v1)** | 2026-04-16 | <details><summary>Show</summary><p>Oil and gas drilling operations generate extensive time-series data from surface sensors, yet accurate real-time prediction of critical downhole metrics remains challenging due to the scarcity of labelled downhole measurements. This systematic mapping study reviews thirteen papers published between 2015 and 2025 to assess the potential of Masked Autoencoder Foundation Models (MAEFMs) for predicting downhole metrics from surface drilling data. The review identifies eight commonly collected surface metrics and seven target downhole metrics. Current approaches predominantly employ neural network architectures such as artificial neural networks (ANNs) and long short-term memory (LSTM) networks, yet no studies have explored MAEFMs despite their demonstrated effectiveness in time-series modeling. MAEFMs offer distinct advantages through self-supervised pre-training on abundant unlabeled data, enabling multi-task prediction and improved generalization across wells. This research establishes that MAEFMs represent a technically feasible but unexplored opportunity for drilling analytics, recommending future empirical validation of their performance against existing models and exploration of their broader applicability in oil and gas operations.</p></details> |  |\n| **[Universal hidden monotonic trend estimation with contrastive learning](https://arxiv.org/abs/2210.09817v3)** | 2026-04-16 | <details><summary>Show</summary><p>In this paper, we describe a universal method for extracting the underlying monotonic trend factor from time series data. 
We propose an approach related to the Mann-Kendall test, a standard monotonic trend detection method and call it contrastive trend estimation (CTE). We show that the CTE method identifies any hidden trend underlying temporal data while avoiding the standard assumptions used for monotonic trend identification. In particular, CTE can take any type of temporal data (vector, images, graphs, time series, etc.) as input. We finally illustrate the interest of our CTE method through several experiments on different types of data and problems.</p></details> |  |\n| **[Logo-LLM: Local and Global Modeling with Large Language Models for Time Series Forecasting](https://arxiv.org/abs/2505.11017v2)** | 2026-04-16 | <details><summary>Show</summary><p>Time series forecasting is critical across multiple domains, where time series data exhibit both local patterns and global dependencies. While Transformer-based methods effectively capture global dependencies, they often overlook short-term local variations in time series. Recent methods that adapt large language models (LLMs) into time series forecasting inherit this limitation by treating LLMs as black-box encoders, relying solely on the final-layer output and underutilizing hierarchical representations. To address this limitation, we propose Logo-LLM, a novel LLM-based framework that explicitly extracts and models multi-scale temporal features from different layers of a pre-trained LLM. Through empirical analysis, we show that shallow layers of LLMs capture local dynamics in time series, while deeper layers encode global trends. Moreover, Logo-LLM introduces lightweight Local-Mixer and Global-Mixer modules to align and integrate features with the temporal input across layers. Extensive experiments demonstrate that Logo-LLM achieves superior performance across diverse benchmarks, with strong generalization in few-shot and zero-shot settings while maintaining low computational overhead.</p></details> |  |\n| **[Assessing the Performance-Efficiency Trade-off of Foundation Models in Probabilistic Electricity Price Forecasting](https://arxiv.org/abs/2604.14739v1)** | 2026-04-16 | <details><summary>Show</summary><p>Large-scale renewable energy deployment introduces pronounced volatility into the electricity system, turning grid operation into a complex stochastic optimization problem. Accurate electricity price forecasting (EPF) is essential not only to support operational decisions, such as optimal bidding strategies and balancing power preparation, but also to reduce economic risk and improve market efficiency. Probabilistic forecasts are particularly valuable because they quantify uncertainty stemming from renewable intermittency, market coupling, and regulatory changes, enabling market participants to make informed decisions that minimize losses and optimize expected revenues. However, it remains an open question which models to employ to produce accurate forecasts. Should these be task-specific machine learning (ML) models or Time Series Foundation Models (TSFMs)? In this work, we compare four models for day-ahead probabilistic EPF (PEPF) in European bidding zones: a deterministic NHITS backbone with Quantile-Regression Averaging (NHITS+QRA) and a conditional Normalizing-Flow forecaster (NF) are compared with two TSFMs, namely Moirai and ChronosX. On the one hand, we find that TSFMs outperform task-specific deep learning models trained from scratch in terms of CRPS, Energy Score, and predictive interval calibration across market conditions. 
On the other hand, we find that well-configured task-specific models, particularly NHITS combined with QRA, achieve performance very close to TSFMs, and in some scenarios, such as when supplied with additional informative feature groups or adapted via few-shot learning from other European markets, they can even surpass TSFMs. Overall, our findings show that while TSFMs offer expressive modeling capabilities, conventional models remain highly competitive, emphasizing the need to weigh computational expense against marginal performance improvements in PEPF.</p></details> | <details><summary>Submi...</summary><p>Submitted to the 7th International Workshop on Energy Data and Analytics (EDA), held in conjunction with ACM e-Energy 2026</p></details> |\n| **[From Time Series to State: Situation-Aware Modeling for Air Traffic Flow Prediction](https://arxiv.org/abs/2604.11198v3)** | 2026-04-16 | <details><summary>Show</summary><p>Accurate air traffic prediction in the terminal airspace (TA) is pivotal for proactive air traffic management (ATM). However, existing data-driven approaches predominantly rely on time series-based forecasting paradigms, which inherently overlook critical aircraft state information, such as real-time kinematics and proximity to airspace boundaries. To address this limitation, we propose \\textit{AeroSense}, a direct state-to-flow modeling framework for air traffic prediction. Unlike classical time series-based methods that first aggregate aircraft trajectories into macroscopic flow sequences before modeling, AeroSense explicitly represents the real-time airspace situation as \\textit{a dynamic set of aircraft states}, enabling the direct processing of a variable number of aircraft instead of time series as inputs. Specifically, we introduce a situation-aware state representation that enables AeroSense to sense the instantaneous terminal airspace situation directly from microscopic aircraft states. Furthermore, we design a model architecture that incorporates masked self-attention to capture inter-aircraft interactions, together with two decoupled prediction heads to model heterogeneous flow dynamics across two key functional areas of the TA. Extensive experiments on a large-scale real-world airport dataset demonstrate that AeroSense consistently achieves state-of-the-art performance, validating that direct modeling of microscopic aircraft states yields substantially higher predictive fidelity than time series-based baselines. Moreover, the proposed framework exhibits superior robustness during peak traffic periods, achieves Pareto-optimal performance under dayparting multi-object evaluation, and provides meaningful interpretability through attention-based visualizations.</p></details> | <details><summary>There...</summary><p>There are issues with the authors of the paper I submitted, as well as problems with the content of the article, so it needs to be withdrawn. Thank you for your understanding</p></details> |\n| **[CSRA: Controlled Spectral Residual Augmentation for Robust Sepsis Prediction](https://arxiv.org/abs/2604.14532v1)** | 2026-04-16 | <details><summary>Show</summary><p>Accurate prediction of future risk and disease progression in sepsis is clinically important for early warning and timely intervention in intensive care. However, short-window sepsis prediction remains challenging, because shorter observation windows provide limited historical evidence, whereas longer prediction horizons reduce the number of patient trajectories with valid future supervision. 
To address this problem, we propose CSRA, a Controlled Spectral Residual Augmentation framework for short-window multi-system ICU time series. CSRA first groups variables by clinical systems and extracts system-level and global representations. It then performs input-adaptive residual perturbation in the spectral domain to generate structured and clinically plausible trajectory variations. To improve augmentation stability and controllability, CSRA is trained end-to-end with the downstream predictor under a unified objective, together with anchor consistency loss and controller regularization. Experiments on a MIMIC-IV sepsis cohort across multiple downstream models show that CSRA is consistently competitive and often superior, reducing regression error by 10.2\\% in MSE and 3.7\\% in MAE over the non-augmentation baseline, while also yielding consistent gains on classification. CSRA further maintains more favorable performance under shorter observation windows, longer prediction horizons, and smaller training data scales, while also remaining effective on an external clinical dataset~(ZiGongICUinfection), indicating stronger robustness and generalizability in clinically constrained settings.</p></details> |  |\n| **[AIBuildAI: An AI Agent for Automatically Building AI Models](https://arxiv.org/abs/2604.14455v1)** | 2026-04-15 | <details><summary>Show</summary><p>AI models underpin modern intelligent systems, driving advances across science, medicine, finance, and technology. Yet developing high-performing AI models remains a labor-intensive process that requires expert practitioners to iteratively design architectures, engineer representations, implement training pipelines and refine approaches through empirical evaluation. Existing AutoML methods partially alleviate this burden but remain limited to narrow aspects such as hyperparameter optimization and model selection within predefined search spaces, leaving the full development lifecycle largely dependent on human expertise. To address this gap, we introduce AIBuildAI, an AI agent that automatically builds AI models from a task description and training data. AIBuildAI adopts a hierarchical agent architecture in which a manager agent coordinates three specialized sub-agents: a designer for modeling strategy, a coder for implementation and debugging, and a tuner for training and performance optimization. Each sub-agent is itself a large language model (LLM) based agent capable of multi-step reasoning and tool use, enabling end-to-end automation of the AI model development process that goes beyond the scope of existing AutoML approaches. We evaluate AIBuildAI on MLE-Bench, a benchmark of realistic Kaggle-style AI development tasks spanning visual, textual, time-series and tabular modalities. AIBuildAI ranks first on MLE-Bench with a medal rate of 63.1%, outperforming all existing baseline methods and matching the capability of highly experienced AI engineers. 
These results demonstrate that hierarchical agent systems can automate the full AI model development process from task specification to deployable model, suggesting a pathway toward broadly accessible AI development with minimal human intervention.</p></details> |  |\n| **[Frame forecasting in cine MRI using the PCA respiratory motion model: comparing recurrent neural networks trained online and transformers](https://arxiv.org/abs/2410.05882v3)** | 2026-04-15 | <details><summary>Show</summary><p>Respiratory motion complicates accurate irradiation of thoraco-abdominal tumors during radiotherapy, as treatment-system latency entails target-location uncertainties. This work addresses frame forecasting in chest and liver cine MRI to compensate for such delays. We investigate RNNs trained with online learning algorithms, enabling adaptation to changing respiratory patterns via on-the-fly parameter updates, and transformers, increasingly common in time-series forecasting for their ability to capture long-term dependencies. Experiments used 12 sagittal thoracic and upper-abdominal cine-MRI sequences from ETH Zürich and OvGU; the OvGU data exhibited higher motion variability, noise, and lower contrast. PCA decomposes the Lucas-Kanade optical-flow field into static deformation modes and low-dimensional, time-dependent weights. We compare various methods for forecasting these weights: linear filters, population and sequence-specific transformer encoders, and RNNs trained with real-time recurrent learning (RTRL), unbiased online recurrent optimization, decoupled neural interfaces, and sparse one-step approximation (SnAp-1). Predicted displacements were used to warp the reference frame and generate future images. Prediction accuracy decreased with the horizon h. Linear regression performed best at short horizons (1.3mm geometrical error at h=0.32s, ETH Zürich dataset), while RTRL and SnAp-1 outperformed the other algorithms at medium-to-long horizons, with geometrical errors below 1.4mm and 2.8mm on the sequences from ETH Zürich and OvGU, respectively. The sequence-specific transformer was competitive for low-to-medium horizons, but transformers remained overall limited by data scarcity and domain shift between datasets. Predicted frames visually resembled the ground truth, with notable errors occurring near the diaphragm at end-inspiration and regions affected by out-of-plane motion.</p></details> | <details><summary>43 pa...</summary><p>43 pages, 19 figures. Revised version with minor corrections and improved figures and language. Accepted for publication in Computerized Medical Imaging and Graphics</p></details> |\n| **[Unsupervised Anomaly Detection in Process-Complex Industrial Time Series: A Real-World Case Study](https://arxiv.org/abs/2604.13928v1)** | 2026-04-15 | <details><summary>Show</summary><p>Industrial time-series data from real production environments exhibits substantially higher complexity than commonly used benchmark datasets, primarily due to heterogeneous, multi-stage operational processes. As a result, anomaly detection methods validated under simplified conditions often fail to generalize to industrial settings. This work presents an empirical study on a unique dataset collected from fully operational industrial machinery, explicitly capturing pronounced process-induced variability. We evaluate which model classes are capable of capturing this complexity, starting with a classical Isolation Forest baseline and extending to multiple autoencoder architectures. 
Experimental results show that Isolation Forest is insufficient for modeling the non-periodic, multi-scale dynamics present in the data, whereas autoencoders consistently perform better. Among them, temporal convolutional autoencoders achieve the most robust performance, while recurrent and variational variants require more careful tuning.</p></details> |  |\n| **[ASTER: Latent Pseudo-Anomaly Generation for Unsupervised Time-Series Anomaly Detection](https://arxiv.org/abs/2604.13924v1)** | 2026-04-15 | <details><summary>Show</summary><p>Time-series anomaly detection (TSAD) is critical in domains such as industrial monitoring, healthcare, and cybersecurity, but it remains challenging due to rare and heterogeneous anomalies and the scarcity of labelled data. This scarcity makes unsupervised approaches predominant, yet existing methods often rely on reconstruction or forecasting, which struggle with complex data, or on embedding-based approaches that require domain-specific anomaly synthesis and fixed distance metrics. We propose ASTER, a framework that generates pseudo-anomalies directly in the latent space, avoiding handcrafted anomaly injections and the need for domain expertise. A latent-space decoder produces tailored pseudo-anomalies to train a Transformer-based anomaly classifier, while a pre-trained LLM enriches the temporal and contextual representations of this space. Experiments on three benchmark datasets show that ASTER achieves state-of-the-art performance and sets a new standard for LLM-based TSAD.</p></details> |  |\n| **[Forecasting Multivariate Time Series under Predictive Heterogeneity: A Validation-Driven Clustering Framework](https://arxiv.org/abs/2604.13748v1)** | 2026-04-15 | <details><summary>Show</summary><p>We study adaptive pooling under predictive heterogeneity in high-dimensional multivariate time series forecasting, where global models improve statistical efficiency but may fail to capture heterogeneous predictive structure, while naive specialization can induce negative transfer. We formulate adaptive pooling as a statistical decision problem and propose a validation-driven framework that determines when and how specialization should be applied. Rather than grouping series based on representation similarity, we define partitions through out-of-sample predictive performance, thereby aligning data organization with predictive risk, defined as expected out-of-sample loss and approximated via validation error. Cluster assignments are iteratively updated using validation losses for both point (Huber) and probabilistic (pinball) forecasting, improving robustness to heavy-tailed errors and local anomalies. To ensure reliability, we introduce a leakage-free fallback mechanism that reverts to a global model whenever specialization fails to improve validation performance, providing a safeguard against performance degradation under a strict training-validation-test protocol. Experiments on large-scale traffic datasets demonstrate consistent improvements over strong baselines while avoiding degradation when heterogeneity is weak. 
Overall, the proposed framework provides a principled and practically reliable approach to adaptive pooling in high-dimensional forecasting problems.</p></details> |  |\n| **[Precision spectral estimation at sub-Hz frequencies: closed-form posteriors and Bayesian noise projection](https://arxiv.org/abs/2507.20846v2)** | 2026-04-15 | <details><summary>Show</summary><p>We consider the problem of estimating cross-spectral quantities in the low-frequency regime, where long observation times limit averaging over large ensembles of periodograms, thereby preventing the use of approximate Gaussian statistics. This case is relevant for precision low-frequency gravitational experiments such as LISA and LISA Pathfinder. We present a Bayesian method for estimating spectral quantities in multivariate Gaussian time series. The approach, based on periodograms and Wishart statistics, yields closed-form expressions at any given frequency for the marginal posterior distributions of the individual power spectral densities, the pairwise coherence, and the multiple coherence, as well as for the joint posterior distribution of the full cross-spectral density matrix. In the context of noise projection -- where one series is modeled as a linear combination of filtered versions of the others, plus a background component -- the method also provides closed-form posteriors for both the susceptibilities, i.e., the filter transfer functions, and the power spectral density of the background. We apply the method to data from the LISA Pathfinder mission, showing effective decorrelation of temperature-induced acceleration noise and reliable estimation of its coupling coefficient.</p></details> | <details><summary>This ...</summary><p>This work has been submitted for possible publication</p></details> |\n| **[Fractional lower-order covariance-based measures for cyclostationary time series with heavy-tailed distributions: application to dependence testing and model order identification](https://arxiv.org/abs/2604.13689v1)** | 2026-04-15 | <details><summary>Show</summary><p>This article introduces new methods for the analysis of cyclostationary time series with infinite variance. Traditional cyclostationary analysis, based on periodically correlated (PC) processes, relies on the autocovariance function (ACVF). However, the ACVF is not suitable for data exhibiting a heavy-tailed distribution, particularly with infinite variance. Thus, we propose a novel framework for the analysis of cyclostationary time series with heavy-tailed distribution, utilizing the fractional lower-order covariance (FLOC) as an alternative to covariance. This leads to the introduction of two new autodependence measures: the periodic fractional lower-order autocorrelation function (peFLOACF) and the periodic fractional lower-order partial autocorrelation function (peFLOPACF). These measures generalize the classical periodic autocorrelation function (peACF) and periodic partial autocorrelation function (pePACF), offering robust tools for analyzing infinite-variance processes. Two practical applications of the proposed measures are explored: a portmanteau test for testing dependence in cyclostationary series and a method for order identification in periodic autoregressive (PAR) and periodic moving average (PMA) models with infinite variance. Both applications demonstrate the potential of new tools, with simulations validating their efficiency. 
The methodology is further illustrated through the analysis of real-world air pollution data, which showcases its practical utility. The results indicate that the proposed measures based on FLOC provide reliable and efficient techniques for analyzing cyclostationary processes with heavy-tailed distributions.</p></details> | 26 pages, 17 figures |\n| **[Irregularly Sampled Time Series Interpolation for Binary Evolution Simulations Using Dynamic Time Warping](https://arxiv.org/abs/2604.13604v1)** | 2026-04-15 | <details><summary>Show</summary><p>Binary stellar evolution simulations are computationally expensive. Stellar population synthesis relies on these detailed evolution models at a fundamental level. Producing thousands of such models requires hundreds of CPU hours, but stellar track interpolation provides one approach to significantly reduce this computational cost. Although single-star track interpolation is straightforward, stellar interactions in binary systems introduce significant complexity to binary evolution, making traditional single-track interpolation methods inapplicable. Binary tracks present fundamentally different challenges compared to single stars, which possess relatively straightforward evolutionary phases identifiable through distinct physical properties. Binary systems are complicated by mutual interactions that can dramatically alter evolutionary trajectories and introduce discontinuities difficult to capture through standard interpolation. In this work, we introduce a novel approach for track alignment and iterative track averaging based on Dynamic Time Warping to address misalignments between neighboring tracks. Our method computes a single shared warping path across all physical parameters simultaneously, placing them on a consistent temporal grid that preserves the causal relationships between parameters. We demonstrate that this joint-alignment strategy maintains key physical relationships such as the Stefan-Boltzmann law in the interpolated tracks. Our comprehensive evaluation across multiple binary configurations demonstrates that proper temporal alignment is crucial for track interpolation methods. The proposed method consistently outperforms existing approaches and enables the efficient generation of more accurate binary population samples for astrophysical studies.</p></details> | <details><summary>25 pa...</summary><p>25 pages, 11 figures. Submitted to ApJ</p></details> |\n| **[Minimax Optimality and Spectral Routing for Majority-Vote Ensembles under Markov Dependence](https://arxiv.org/abs/2604.13414v1)** | 2026-04-15 | <details><summary>Show</summary><p>Majority-vote ensembles achieve variance reduction by averaging over diverse, approximately independent base learners. When training data exhibits Markov dependence, as in time-series forecasting, reinforcement learning (RL) replay buffers, and spatial grids, this classical guarantee degrades in ways that existing theory does not fully quantify. We provide a minimax characterization of this phenomenon for discrete classification in a fixed-dimensional Markov setting, together with an adaptive algorithm that matches the rate on a graph-regular subclass. We first establish an information-theoretic lower bound for stationary, reversible, geometrically ergodic chains in fixed ambient dimension, showing that no measurable estimator can achieve excess classification risk better than $Ω(\\sqrt{\\Tmix/n})$. 
We then prove that, on the AR(1) witness subclass underlying the lower-bound construction, dependence-agnostic uniform bagging is provably suboptimal with excess risk bounded below by $Ω(\\Tmix/\\sqrt{n})$, exhibiting a $\\sqrt{\\Tmix}$ algorithmic gap. Finally, we propose \\emph{adaptive spectral routing}, which partitions the training data via the empirical Fiedler eigenvector of a dependency graph and achieves the minimax rate $\\mathcal{O}(\\sqrt{\\Tmix/n})$ up to a lower-order geometric cut term on a graph-regular subclass, without knowledge of $\\Tmix$. Experiments on synthetic Markov chains, 2D spatial grids, the 128-dataset UCR archive, and Atari DQN ensembles validate the theoretical predictions. Consequences for deep RL target variance, scalability via Nyström approximation, and bounded non-stationarity are developed as supporting material in the appendix.</p></details> |  |\n| **[Bias-Corrected Adaptive Conformal Inference for Multi-Horizon Time Series Forecasting](https://arxiv.org/abs/2604.13253v1)** | 2026-04-14 | <details><summary>Show</summary><p>Adaptive Conformal Inference (ACI) provides distribution-free prediction intervals with asymptotic coverage guarantees for time series under distribution shift. However, ACI only adapts the quantile threshold -- it cannot shift the interval center. When a base forecaster develops persistent bias after a regime change, ACI compensates by widening intervals symmetrically, producing unnecessarily conservative bands. We propose Bias-Corrected ACI (BC-ACI), which augments standard ACI with an online exponentially weighted moving average (EWM) estimate of forecast bias. BC-ACI corrects nonconformity scores before quantile computation and re-centers prediction intervals, addressing the root cause of miscalibration rather than its symptom. An adaptive dead-zone threshold suppresses corrections when estimated bias is indistinguishable from noise, ensuring no degradation on well-calibrated data. In controlled experiments across 688 runs spanning two base models, four synthetic regimes, and three real datasets, BC-ACI reduces Winkler interval scores by 13--17% under mean and compound distribution shifts (Wilcoxon p < 0.001) while maintaining equivalent performance on stationary data (ratio 1.002x). We provide finite-sample analysis showing that coverage guarantees degrade gracefully with bias estimation error.</p></details> | <details><summary>14 pa...</summary><p>14 pages, 3 figures, 2 tables. Preprint</p></details> |\n| **[Predicting Time Pressure of Powered Two-Wheeler Riders for Proactive Safety Interventions](https://arxiv.org/abs/2601.03173v2)** | 2026-04-14 | <details><summary>Show</summary><p>Time pressure critically influences risky maneuvers and crash proneness among powered two-wheeler riders, yet its prediction remains underexplored in intelligent transportation systems. We present a large-scale dataset of 129,000+ labeled multivariate time-series sequences from 153 rides by 51 participants under No, Low, and High Time Pressure conditions. Each sequence captures 63 features spanning vehicle kinematics, control inputs, behavioral violations, and environmental context. Our empirical analysis shows High Time Pressure induces 48% higher speeds, 36.4% greater speed variability, 58% more risky turns at intersections, 36% more sudden braking, and 50% higher rear brake forces versus No Time Pressure. 
To benchmark this dataset, we propose MotoTimePressure, a deep learning model combining convolutional preprocessing, dual-stage temporal attention, and Squeeze-and-Excitation feature recalibration, achieving 91.53% accuracy and 98.93% ROC AUC, outperforming eight baselines, with only 172K parameters, 2.16 MB model size, and 0.04 ms inference on CPU. Since time pressure cannot be directly measured in real time, we demonstrate its utility in collision prediction and threshold determination. Using MTPS-predicted time pressure as a feature improves collision risk accuracy for both Informer (91.25% to 93.51%) and TimesNet (92.10% to 93.90%), approaching oracle performance (93.72% and 94.06%, respectively). Thresholded time pressure states capture rider cognitive stress and enable proactive ITS interventions, including adaptive alerts, haptic feedback, V2I signaling, and speed guidance, supporting safer two-wheeler mobility under the Safe System Approach.</p></details> | 13 pages |\n| **[Invariant Features for Global Crop Type Classification](https://arxiv.org/abs/2509.03497v3)** | 2026-04-14 | <details><summary>Show</summary><p>Accurate global crop type mapping supports agricultural monitoring and food security, yet remains limited by the scarcity of labeled data in many regions. A key challenge is enabling models trained in one geography to generalize reliably to others despite shifts in climate, phenology, and spectral characteristics. In this work, we show that geographic transfer in crop classification is primarily governed by the ability to learn invariant structure in multispectral time series. To systematically study this, we introduce CropGlobe, a globally distributed benchmark dataset of 300,000 samples spanning eight countries and five continents, and define progressively harder transfer settings from cross-country to cross-hemisphere. Across all settings, we find that simple spectral-temporal representations outperform both handcrafted features and modern geospatial foundation model embeddings. We propose CropNet, a lightweight convolutional architecture that jointly convolves across spectral and temporal dimensions to learn invariant crop signatures. Despite its simplicity, CropNet consistently outperforms larger transformer-based and foundation-model approaches under geographic domain shift. To further improve robustness to geographic variation, we introduce augmentations that simulate shifts in crop phenology and reflectance. Combined with CropNet, this yields substantial gains under large domain shifts. Our results demonstrate that inductive bias toward joint spectral-temporal structure is more critical for transfer than model scale or pretraining, pointing toward a scalable and data-efficient paradigm for worldwide agricultural mapping. Data and code are available at https://github.com/x-ytong/CropNet/.</p></details> |  |\n| **[A Shiny micromapST App](https://arxiv.org/abs/2604.12773v1)** | 2026-04-14 | <details><summary>Show</summary><p>The linked micromaps approach was originally developed as an improvement to choropleth maps for displaying statistical summaries connected with spatial areal units, such as countries, states, and counties. Two R packages to create linked micromaps were published in 2015. These are the micromap and micromapST packages. The latter was originally for data indexed to the 50 US states and DC, but the latest version accommodates arbitrary geographies. 
The micromapST package handles the formatting needed for linked micromaps and offers several options for statistical displays (scatterplots, boxplots, time series plots, and more). The micromapST package is very useful and takes care of most details of the layouts, but it can be problematic specifying the data frames needed to create the desired graphic. Furthermore, exploring data through visualization is easier, faster, and more intuitive using a graphical user interface. This is the motivation behind the R Shiny micromapST app. This paper will serve as a brief tutorial and introduction to micromapST and the Shiny app using real-world data and applications. In this paper, we provide background information on visualizing geographically indexed data and linked micromaps in Section 1. Section 2 discusses the data sets used in two illustrative examples. Sections 3 and 4 describe the application interface and show how it can create linked micromaps. The paper concludes with comments and future work.</p></details> |  |\n| **[Automated Batch Distillation Process Simulation for a Large Hybrid Dataset for Deep Anomaly Detection](https://arxiv.org/abs/2604.09166v2)** | 2026-04-14 | <details><summary>Show</summary><p>Anomaly detection (AD) in chemical processes based on deep learning offers significant opportunities but requires large, diverse, and well-annotated training datasets that are rarely available from industrial operations. In a recent work, we introduced a large, fully annotated experimental dataset for batch distillation under normal and anomalous operating conditions. In the present study, we augment this dataset with a corresponding simulation dataset, creating a novel hybrid dataset. The simulation data is generated in an automated workflow with a novel Python-based process simulator that employs a tailored index-reduction strategy for the underlying differential-algebraic equations. Leveraging the rich metadata and structured anomaly annotations of the experimental database, experimental records are automatically translated into simulation scenarios. After calibration to a single reference experiment, the dynamics of the other experiments are well predicted. This enabled the fully automated, consistent generation of time-series data for a large number of experimental runs, covering both normal operation and a wide range of actuator- and control-related anomalies. The resulting hybrid dataset is released openly. From a process simulation perspective, this work demonstrates the automated, consistent simulation of large-scale experimental campaigns, using batch distillation as an example. From a data-driven AD perspective, the hybrid dataset provides a unique basis for simulation-to-experiment style transfer, the generation of pseudo-experimental data, and future research on deep AD methods in chemical process monitoring.</p></details> |  |\n| **[Do VLMs Truly \"Read\" Candlesticks? A Multi-Scale Benchmark for Visual Stock Price Forecasting](https://arxiv.org/abs/2604.12659v1)** | 2026-04-14 | <details><summary>Show</summary><p>Vision-language models(VLMs) are increasingly applied to visual stock price forecasting, yet existing benchmarks inadequately evaluate their understanding of stock price in candlestick charts. First, prior studies fail to isolate VLMs' comprehension of visual inputs genuinely improves predictive performance and whether VLMs truly comprehend candlestick patterns. 
Further, most existing datasets and evaluation setups are designed around single-period or tabular inputs. However, human analysts strongly rely on multi-scale candlestick charts, where longer-term horizons capture trend direction and shorter-term horizons provide cues for inflection points, making it difficult to systematically assess VLMs' ability to integrate short-term and long-term visual market dynamics. To bridge this gap, we construct a multi-scale candlestick chart dataset and a standardized evaluation framework to assess VLMs' ability to utilize multi-scale visual market signals. Evaluation combines confusion-matrix-based diagnostics with information coefficient (IC) time series metrics and includes XGBoost as a feature-based temporal baseline. Using this dataset, we benchmark representative VLMs and analyze their ability to leverage multi-scale stock price data. Experimental results show that most VLMs perform well only under persistent uptrend or downtrend conditions, while exhibiting weak predictive capability in more common market scenarios. We also identify significant prediction biases and limited sensitivity to explicitly specified forecast horizons in prompts, indicating inherent limitations in precise temporal reasoning.</p></details> | <details><summary>We ev...</summary><p>We evaluate whether VLMs can comprehend multi-scale visual stock price data like human analysts with a proposed benchmark, identifying current VLMs' weak predictive power, significant biases, and limited sensitivity to forecast horizons and prompts</p></details> |\n| **[TimeSAF: Towards LLM-Guided Semantic Asynchronous Fusion for Time Series Forecasting](https://arxiv.org/abs/2604.12648v1)** | 2026-04-14 | <details><summary>Show</summary><p>Despite the recent success of large language models (LLMs) in time-series forecasting, most existing methods still adopt a Deep Synchronous Fusion strategy, where dense interactions between textual and temporal features are enforced at every layer of the network. This design overlooks the inherent granularity mismatch between modalities and leads to what we term semantic perceptual dissonance: high-level abstract semantics provided by the LLM become inappropriately entangled with the low-level, fine-grained numerical dynamics of time series, making it difficult for semantic priors to effectively guide forecasting. To address this issue, we propose TimeSAF, a new framework based on hierarchical asynchronous fusion. Unlike synchronous approaches, TimeSAF explicitly decouples unimodal feature learning from cross-modal interaction. It introduces an independent cross-modal semantic fusion trunk, which uses learnable queries to aggregate global semantics from the temporal and prompt backbones in a bottom-up manner, and a stage-wise semantic refinement decoder that asynchronously injects these high-level signals back into the temporal backbone. This mechanism provides stable and efficient semantic guidance while avoiding interference with low-level temporal dynamics. 
Extensive experiments on standard long-term forecasting benchmarks show that TimeSAF significantly outperforms state-of-the-art baselines, and further exhibits strong generalization in both few-shot and zero-shot transfer settings.</p></details> |  |\n| **[Fun-TSG: A Function-Driven Multivariate Time Series Generator with Variable-Level Anomaly Labeling](https://arxiv.org/abs/2604.14221v1)** | 2026-04-14 | <details><summary>Show</summary><p>Reliable evaluation of anomaly detection methods in multivariate time series remains an open challenge, largely due to the limitations of existing benchmark datasets. Current resources often lack fine-grained anomaly annotations, do not provide explicit intervariable and temporal dependencies, and offer little insight into the underlying generative mechanisms. These shortcomings hinder the development and rigorous comparison of detection models, especially those targeting interpretable and variable-specific outputs. To address this gap, we introduce Fun-TSG, a fully customizable time series generator designed to support high-quality evaluation of anomaly detection systems. Our tool enables both fully automated generation, based on randomly sampled dependency structures and anomaly types, and manual generation through user-defined equations and anomaly configurations. In both cases, it provides full transparency over the data generation process, including access to ground-truth anomaly labels at the variable and timestamp levels. Fun-TSG supports the creation of diverse, interpretable, and reproducible benchmarking scenarios, enabling fine-grained performance analysis for both classical and modern anomaly detection models.</p></details> |  |\n| **[GTCN-G: A Residual Graph-Temporal Fusion Network for Imbalanced Intrusion Detection](https://arxiv.org/abs/2510.07285v3)** | 2026-04-14 | <details><summary>Show</summary><p>The escalating complexity of network threats and the inherent class imbalance in traffic data present formidable challenges for modern Intrusion Detection Systems (IDS). While Graph Neural Networks (GNNs) excel in modeling topological structures and Temporal Convolutional Networks (TCNs) are proficient in capturing time-series dependencies, a framework that synergistically integrates both while explicitly addressing data imbalance remains an open challenge. This paper introduces a novel deep learning framework, named Gated Temporal Convolutional Network and Graph (GTCN-G), engineered to overcome these limitations. Our model uniquely fuses a Gated TCN (G-TCN) for extracting hierarchical temporal features from network flows with a Graph Convolutional Network (GCN) designed to learn from the underlying graph structure. The core innovation lies in the integration of a residual learning mechanism, implemented via a Graph Attention Network (GAT). This mechanism preserves original feature information through residual connections, which is critical for mitigating the class imbalance problem and enhancing detection sensitivity for rare malicious activities (minority classes). We conducted extensive experiments on two public benchmark datasets, UNSW-NB15 and ToN-IoT, to validate our approach. The empirical results demonstrate that the proposed GTCN-G model achieves state-of-the-art performance, significantly outperforming existing baseline models in both binary and multi-class classification tasks.</p></details> | <details><summary>This ...</summary><p>This preprint was submitted to IEEE TrustCom 2025. 
The accepted version will be published under copyright 2025 IEEE</p></details> |\n| **[Self-Normalization for CUSUM-based Change Detection in Locally Stationary Time Series](https://arxiv.org/abs/2509.07112v3)** | 2026-04-14 | <details><summary>Show</summary><p>A new bivariate partial sum process for locally stationary time series is introduced and its weak convergence to a Brownian sheet is established. This construction enables the development of a novel self-normalized CUSUM test statistic for detecting changes in the mean of a locally stationary time series. For stationary data, self-normalization relies on the factorization of a constant long-run variance and a stochastic factor. In this case, the CUSUM statistic can be divided by another statistic proportional to the long-run variance, so that the latter cancels, avoiding estimation of the long-run variance. Under local stationarity, the partial sum process converges to $\\int_0^t \\sigma(x)\\,dB_x$ and no such factorization is possible. To overcome this obstacle, a bivariate partial-sum process is introduced, allowing the construction of self-normalized test statistics under local stationarity. Weak convergence of the process is proven, and it is shown that the resulting self-normalized tests attain asymptotic level $\\alpha$ under the null hypothesis of no change, while being consistent against abrupt, gradual, and multiple changes under mild assumptions. Simulation studies show that the proposed tests have accurate size and substantially improved finite-sample power relative to existing approaches. Two data examples illustrate practical performance.</p></details> | <details><summary>Keywo...</summary><p>Keywords: Change point analysis, gradual changes, local stationarity, self-normalization, CUSUM test</p></details> |\n| **[Predicting Energy Demand with Tensor Factor Models](https://arxiv.org/abs/2502.06213v2)** | 2026-04-14 | <details><summary>Show</summary><p>Hourly consumption from multiple providers displays pronounced intra-day, intra-week, and annual seasonalities, as well as strong cross-sectional correlations. We introduce a novel approach for forecasting high-dimensional U.S. electricity demand data by accounting for multiple seasonal patterns via tensor factor models. To this end, we restructure the hourly electricity demand data into a sequence of weekly tensors. Each weekly tensor is a three-mode array whose dimensions correspond to the hours of the day, the days of the week, and the number of providers. This multi-dimensional representation enables a factor decomposition that distinguishes among the various seasonal patterns along each mode: factor loadings over the hour dimension highlight intra-day cycles, factor loadings over the day dimension capture differences across weekdays and weekends, and factor loadings over the provider dimension reveal commonalities and shared dynamics among the different entities. We rigorously compare the predictive performance of our tensor factor model against several benchmarks, including traditional vector factor models and cutting-edge functional time series methods. The results consistently demonstrate that the tensor-based approach delivers superior forecasting accuracy at different horizons and provides interpretable factors that align with domain knowledge. Beyond its empirical advantages, our framework offers a systematic way to gain insight into the underlying processes that shape electricity demand patterns. 
In doing so, it paves the way for more nuanced, data-driven decision-making and can be adapted to address similar challenges in other high-dimensional time series applications.</p></details> |  |\n| **[Mortality Forecasting as a Flow Field in Tucker Decomposition Space](https://arxiv.org/abs/2603.24299v2)** | 2026-04-13 | <details><summary>Show</summary><p>Mortality forecasting methods in the Lee-Carter tradition extrapolate temporal components via time-series models, often producing forecasts that systematically underpredict life expectancy at long horizons. This bias is consequential for planning pension funding, healthcare capacity, and social security solvency. The dominant alternative - the Bayesian double-logistic model underlying the UN World Population Prospects - forecasts scalar life expectancy and requires a separate model life table system to recover age-specific rates. We reframe forecasting as integrating a flow field through the low-dimensional score space of a Tucker tensor decomposition of the Human Mortality Database. PCA reduction reveals that the mortality transition is essentially a one-dimensional flow: a scalar speed function advances the level, trajectory functions supply the structural scores, and the Tucker reconstruction produces complete sex-specific, single-year-of-age mortality schedules at each horizon. In leave-country-out cross-validation (9,507 test points, 50-year horizon), the flow-field approach achieves a bias of +1.058 years - substantially smaller than Lee-Carter (-3.2), Hyndman-Ullah (-3.5), and pyBayesLife (+3.3) - because it navigates a score space parameterised by mortality level rather than extrapolating temporal trends into unobserved territory. On 1.66 million sex-age-specific test points, it achieves 2.7x lower error than our de novo Python reimplementation of the UN pipeline trained on the same data - with lower error at every age, every forecast horizon, and for both sexes.</p></details> |  |\n| **[A unified data format for managing diabetes time-series data: DIAbetes eXchange (DIAX)](https://arxiv.org/abs/2604.11944v1)** | 2026-04-13 | <details><summary>Show</summary><p>Diabetes devices, including Continuous Glucose Monitoring (CGM), Smart Insulin Pens, and Automated Insulin Delivery systems, generate rich time-series data widely used in research and machine learning. However, inconsistent data formats across sources hinder sharing, integration, and analysis. We present DIAX (DIAbetes eXchange), a standardized JSON-based format for unifying diabetes time-series data, including CGM, insulin, and meal signals. DIAX promotes interoperability, reproducibility, and extensibility, particularly for machine learning applications. An open-source repository provides tools for dataset conversion, cross-format compatibility, visualization, and community contributions. DIAX is a translational resource, not a data host, ensuring flexibility without imposing data-sharing constraints. Currently, DIAX is compatible with other standardization efforts and supports major datasets (DCLP3, DCLP5, IOBP2, PEDAP, T1Dexi, Loop), totaling over 10 million patient-hours of data. https://github.com/Center-for-Diabetes-Technology/DIAX</p></details> | 7 pages, 2 figures |\n| **[INTARG: Informed Real-Time Adversarial Attack Generation for Time-Series Regression](https://arxiv.org/abs/2604.11928v1)** | 2026-04-13 | <details><summary>Show</summary><p>Time-series forecasting aims to predict future values by modeling temporal dependencies in historical observations. 
It is a critical component of many real-world systems, where accurate forecasts improve operational efficiency and help mitigate uncertainty and risk. More recently, machine learning (ML) models, and especially deep learning (DL)-based models, have gained widespread adoption for time-series forecasting, but they remain vulnerable to adversarial attacks. However, many state-of-the-art attack methods are not directly applicable in time-series settings, where storing complete historical data or performing attacks at every time step is often impractical. This paper proposes an adversarial attack framework for time-series forecasting under an online bounded-buffer setting, leveraging an informed and selective attack strategy. By selectively targeting time steps where the model exhibits high confidence and the expected prediction error is maximal, our framework produces fewer but substantially more effective attacks. Experiments show that our framework can increase the prediction error by up to 2.42x, while performing attacks in fewer than 10% of time steps.</p></details> |  |\n| **[SmellNet: A Large-scale Dataset for Real-world Smell Recognition](https://arxiv.org/abs/2506.00239v5)** | 2026-04-13 | <details><summary>Show</summary><p>The ability of AI to sense and identify various substances based on their smell alone can have profound impacts on allergen detection (e.g. smelling gluten or peanuts in a cake), monitoring the manufacturing process, and sensing hormones that indicate emotional states, stress levels, and diseases. Despite these broad impacts, there are few standardized datasets, and therefore little progress, for training and evaluating AI systems' ability to \"smell\" in the real world. In this paper, we use small gas and chemical sensors to create SmellNet, a comparatively large dataset for sensor-based machine olfaction that digitizes a diverse range of smells in the natural world. SmellNet contains about 828,000 time-series data points across 50 substances, spanning nuts, spices, herbs, fruits, and vegetables, and 43 mixtures among them with fixed ingredient volumetric ratios, with 68 hours of data collected. Using SmellNet, we developed ScentFormer, a Transformer-based architecture combining temporal differencing and sliding-window augmentation for smell data. For the SmellNet-Base classification tasks, ScentFormer achieves 63.3% Top-1 accuracy with GC-MS supervision, and for the SmellNet-Mixture distribution prediction tasks, ScentFormer achieves 50.2% Top-1@0.1 on the test-seen split. ScentFormer's ability to generalize across conditions and capture transient chemical dynamics demonstrates the promise of temporal modeling in sensor-based olfactory AI. SmellNet and ScentFormer lay the groundwork for sensor-based olfactory applications across healthcare, food and beverage, environmental monitoring, manufacturing, and entertainment.</p></details> | <details><summary>Accep...</summary><p>Accepted to ICLR 2026; published as a conference paper at ICLR 2026. 32 pages; 21 figures</p></details> |\n| **[MSTN: A Lightweight and Fast Model for General TimeSeries Analysis](https://arxiv.org/abs/2511.20577v3)** | 2026-04-13 | <details><summary>Show</summary><p>Real-world time series often exhibit strong non-stationarity, complex nonlinear dynamics, and behavior expressed across multiple temporal scales, from rapid local fluctuations to slow-evolving long-range trends. 
However, many contemporary architectures impose rigid, fixed-scale structural priors -- such as patch-based tokenization, predefined receptive fields, or frozen backbone encoders -- which can over-regularize temporal dynamics and limit adaptability to abrupt high-magnitude events. To address this, we introduce the Multi-scale Temporal Network (MSTN), a hybrid neural architecture grounded in an Early Temporal Aggregation principle. MSTN integrates three complementary components: (i) a multi-scale convolutional encoder that captures fine-grained local structure; (ii) a sequence modeling module that learns long-range dependencies through either recurrent or attention-based mechanisms; and (iii) a self-gated fusion stage incorporating squeeze-excitation and a single dense layer to dynamically reweight and fuse multi-scale representations. This design enables MSTN to flexibly model temporal patterns spanning milliseconds to extended horizons, while avoiding the computational burden typically associated with long-context models. Across extensive benchmarks covering imputation, long-term forecasting, short-term forecasting, classification, and cross-dataset generalization, MSTN achieves state-of-the-art performance, establishing new best results on 33 of 40 datasets, while remaining lightweight ($\\sim$278,520 params for MSTN-BiLSTM and $\\sim$950,776 $\\approx$ 1M for MSTN-Transformer) and suitable for low-latency inference ($<$1 sec, often in milliseconds) and resource-constrained deployment.</p></details> | 34 pages |\n| **[SLALOM: Simulation Lifecycle Analysis via Longitudinal Observation Metrics for Social Simulation](https://arxiv.org/abs/2604.11466v1)** | 2026-04-13 | <details><summary>Show</summary><p>Large Language Model (LLM) agents offer a potentially transformative path forward for generative social science but face a critical crisis of validity. Current simulation evaluation methodologies suffer from the \"stopped clock\" problem: they confirm that a simulation reached the correct final outcome while ignoring whether the trajectory leading to it was sociologically plausible. Because the internal reasoning of LLMs is opaque, verifying the \"black box\" of social mechanisms remains a persistent challenge. In this paper, we introduce SLALOM (Simulation Lifecycle Analysis via Longitudinal Observation Metrics), a framework that shifts validation from outcome verification to process fidelity. Drawing on Pattern-Oriented Modeling (POM), SLALOM treats social phenomena as multivariate time series that must traverse specific SLALOM gates, or intermediate waypoint constraints representing distinct phases. By utilizing Dynamic Time Warping (DTW) to align simulated trajectories with empirical ground truth, SLALOM offers a quantitative metric to assess structural realism, helping to differentiate plausible social dynamics from stochastic noise and contributing to more robust policy simulation standards.</p></details> | <details><summary>CHI 2...</summary><p>CHI 2026 PoliSim@CHI 2026: LLM Agent Simulation for Policy Workshop</p></details> |\n| **[Detection and Mode-Identification of Multiple Change Points in Tensor Factor Models](https://arxiv.org/abs/2604.11300v1)** | 2026-04-13 | <details><summary>Show</summary><p>We study the problems arising from modeling high-dimensional tensor-valued time series under a Tucker decomposition-based factor model with multiple structural change points. 
First, we propose an algorithm for detecting the multiple change points, which utilizes the low-rank structure of the data for statistical and computational efficiency. Also, the multi-dimensional array setting poses unique challenges, as some changes are associated with a subset of the modes, and the changes in different modes may interact with one another. Recognizing these, we investigate the problem of identifying each change with the tensor modes post-segmentation. To this end, we formalize the mode-identifiability of each change and propose an algorithm for detecting the modes at which the data are undergoing a mode-identifiable shift. We establish the consistency of both change point detection and mode-identification methods under a weak moment condition, and demonstrate their good performance on simulated datasets where, in particular, it is shown that the mode-identification step can improve the post-segmentation estimation of the mode-wise loading space. Additionally, we analyze datasets on New York City taxi usage and Fama--French portfolio returns using the proposed suite of methods.</p></details> | 165 pages |\n| **[Detection of Anomalous Network Nodes via Hierarchical Prediction and Extreme Value Theory](https://arxiv.org/abs/2304.13941v2)** | 2026-04-13 | <details><summary>Show</summary><p>Continuously evolving cyber-attacks against industrial networks reduce the effectiveness of signature-based detection methods. Once malware has infiltrated a network (for example, entering via an unsecured device), it can infect further network nodes and carry out malicious activity. Infected nodes can exhibit unusual behaviour in their use of Address Resolution Protocol (ARP) calls within the network. In order to detect such anomalous nodes, we propose a two-stage method: (i) modelling of ARP call behaviour via hierarchical time series prediction methods, and (ii) exploiting Extreme Value Theory (EVT) to robustly detect whether deviations from expected behaviour are anomalous. EVT is able to handle heavy-tailed distributions, which are exhibited by internet traffic. Empirical evaluations on a real-life dataset containing over 10M ARP calls from 362 nodes show that the proposed method results in a considerably reduced number of false positives, addressing the problem of alert fatigue commonly reported by security professionals.</p></details> |  |\n| **[TESSERA: Temporal Embeddings of Surface Spectra for Earth Representation and Analysis](https://arxiv.org/abs/2506.20380v7)** | 2026-04-12 | <details><summary>Show</summary><p>Satellite Earth-observation (EO) time series in the optical and microwave ranges of the electromagnetic spectrum are often irregular due to orbital patterns and cloud obstruction. Compositing addresses these issues but loses information with respect to vegetation phenology, which is critical for many downstream tasks. Instead, we present TESSERA, a pixel-wise foundation model for multi-modal (Sentinel-1/2) EO time series that learns robust, label-efficient embeddings. During model training, TESSERA uses Barlow Twins and sparse random temporal sampling to enforce invariance to the selection of valid observations. We employ two key regularizers: global shuffling to decorrelate spatial neighborhoods and mix-based regulation to improve invariance under extreme sparsity. 
We find that for diverse classification, segmentation, and regression tasks, TESSERA embeddings deliver state-of-the-art accuracy with high label efficiency, often requiring only a small task head and minimal computation. To democratize access, adhere to FAIR principles, and simplify use, we release global, annual, 10m, pixel-wise int8 embeddings together with open weights/code and lightweight adaptation heads, thus providing practical tooling for large-scale retrieval and inference at planetary scale. All code and data are available at: https://github.com/ucam-eo/tessera.</p></details> |  |\n| **[TS-Haystack: A Multi-Scale Retrieval Benchmark for Time Series Language Models](https://arxiv.org/abs/2602.14200v4)** | 2026-04-12 | <details><summary>Show</summary><p>Time Series Language Models (TSLMs) are emerging as unified models for reasoning over continuous signals in natural language. However, long-context retrieval remains a major limitation: existing models are typically trained and evaluated on short sequences, while real-world time-series sensor streams can span millions of datapoints. This mismatch requires precise temporal localization under strict computational constraints, a regime that is not captured by current benchmarks. We introduce TS-Haystack, a long-context temporal retrieval benchmark comprising ten task types across four categories: direct retrieval, temporal reasoning, multi-step reasoning, and contextual anomaly. The benchmark uses controlled needle insertion by embedding short activity bouts into longer longitudinal accelerometer recordings, enabling systematic evaluation across context lengths ranging from seconds to 2 hours per sample. We hypothesize that existing TSLM time series encoders overlook temporal granularity as context length increases, creating a task-dependent effect: compression aids classification but impairs retrieval of localized events. Across multiple model and encoding strategies, we observe a consistent divergence between classification and retrieval behavior. Learned latent compression preserves or improves classification accuracy at compression ratios up to 176$\\times$, but retrieval performance degrades with context length, incurring the loss of temporally localized information. These results highlight the importance of architectural designs that decouple sequence length from computational complexity while preserving temporal fidelity.</p></details> | <details><summary>ICLR ...</summary><p>ICLR TSALM 2026. Benchmark generation code and datasets: https://github.com/AI-X-Labs/TS-Haystack</p></details> |\n| **[DBGL: Decay-aware Bipartite Graph Learning for Irregular Medical Time Series Classification](https://arxiv.org/abs/2604.11842v1)** | 2026-04-12 | <details><summary>Show</summary><p>Irregular Medical Time Series play a critical role in the clinical domain in better understanding the patient's condition. However, inherent irregularity arising from heterogeneous sampling rates, asynchronous observations, and variable gaps poses key challenges for reliable modeling. Existing methods often distort temporal sampling irregularity and missingness patterns while failing to capture variable decay irregularity, resulting in suboptimal representations. To address these limitations, we introduce DBGL, Decay-Aware Bipartite Graph Learning for Irregular Medical Time Series. 
DBGL first introduces a patient-variable bipartite graph that captures irregular sampling patterns without artificial alignment and adaptively models variable relationships, thereby addressing temporal sampling irregularity and enhancing representation learning. To model variable decay irregularity, DBGL designs a novel node-specific temporal decay encoding mechanism that captures each variable's decay rates based on sampling intervals, yielding a more accurate and faithful representation of irregular temporal dynamics. We evaluate the performance of DBGL on four publicly available datasets, and the results show that DBGL outperforms all baselines.</p></details> |  |\n| **[Non-stationary Diffusion For Probabilistic Time Series Forecasting](https://arxiv.org/abs/2505.04278v3)** | 2026-04-12 | <details><summary>Show</summary><p>Due to the dynamics of underlying physics and external influences, the uncertainty of time series often varies over time. However, existing Denoising Diffusion Probabilistic Models (DDPMs) often fail to capture this non-stationary nature, constrained by their constant variance assumption from the additive noise model (ANM). In this paper, we innovatively utilize the Location-Scale Noise Model (LSNM) to relax the fixed uncertainty assumption of the ANM. A diffusion-based probabilistic forecasting framework, termed Non-stationary Diffusion (NsDiff), is designed based on the LSNM and is capable of modeling the changing pattern of uncertainty. Specifically, NsDiff combines a denoising diffusion-based conditional generative model with a pre-trained conditional mean and variance estimator, enabling adaptive endpoint distribution modeling. Furthermore, we propose an uncertainty-aware noise schedule, which dynamically adjusts the noise levels to accurately reflect the data uncertainty at each step and integrates the time-varying variances into the diffusion process. Extensive experiments conducted on nine real-world and synthetic datasets demonstrate the superior performance of NsDiff compared to existing approaches. Code is available at https://github.com/wwy155/NsDiff.</p></details> | <details><summary>Accep...</summary><p>Accepted as spotlight poster at ICML</p></details> |\n| **[WaveMoE: A Wavelet-Enhanced Mixture-of-Experts Foundation Model for Time Series Forecasting](https://arxiv.org/abs/2604.10544v1)** | 2026-04-12 | <details><summary>Show</summary><p>Time series foundation models (TSFMs) have recently achieved remarkable success in universal forecasting by leveraging large-scale pretraining on diverse time series data. Complementing this progress, incorporating frequency-domain information yields promising performance in enhancing the modeling of complex temporal patterns, such as periodicity and localized high-frequency dynamics, which are prevalent in real-world time series. To advance this direction, we propose a new perspective that integrates explicit frequency-domain representations into scalable foundation models, and introduce WaveMoE, a wavelet-enhanced mixture-of-experts foundation model for time series forecasting. WaveMoE adopts a dual-path architecture that jointly processes time series tokens and wavelet tokens aligned along a unified temporal axis, and coordinates them through a shared expert routing mechanism that enables consistent expert specialization while efficiently scaling model capacity. 
Preliminary experimental results on 16 diverse benchmark datasets indicate that WaveMoE has the potential to further improve forecasting performance by incorporating wavelet-domain corpora.</p></details> | <details><summary>Prese...</summary><p>Presented at ICLR 2026 TSALM Workshop (1st Workshop on Time Series in the Age of Large Models)</p></details> |\n| **[Semantic-Enhanced Time-Series Forecasting via Large Language Models](https://arxiv.org/abs/2508.07697v7)** | 2026-04-12 | <details><summary>Show</summary><p>Time series forecasting plays a significant role in finance, energy, meteorology, and IoT applications. Recent studies have leveraged the generalization capabilities of large language models (LLMs) to adapt to time series forecasting, achieving promising performance. However, existing studies focus on token-level modal alignment, instead of bridging the intrinsic modality gap between linguistic knowledge structures and time series data patterns, greatly limiting the semantic representation. To address this issue, we propose a novel Semantic-Enhanced LLM (SE-LLM) that embeds the inherent periodicity and anomalous characteristics of time series into the semantic space to enhance the token embeddings. This process enhances the interpretability of tokens for LLMs, thereby activating the potential of LLMs for temporal sequence analysis. Moreover, existing Transformer-based LLMs excel at capturing long-range dependencies but are weak at modeling short-term anomalies in time-series data. Hence, we propose a plugin module embedded within self-attention that models long-term and short-term dependencies to effectively adapt LLMs to time-series analysis. Our approach freezes the LLM and reduces the sequence dimensionality of tokens, greatly reducing computational consumption. Experiments demonstrate the superior performance of our SE-LLM against state-of-the-art (SOTA) methods.</p></details> | 23 pages, 6 figures |\n| **[Transformers for dynamical systems learn transfer operators in-context](https://arxiv.org/abs/2602.18679v2)** | 2026-04-12 | <details><summary>Show</summary><p>Large-scale foundation models for scientific machine learning adapt to physical settings unseen during training, such as zero-shot transfer between turbulent scales. This phenomenon, in-context learning, challenges conventional understanding of learning and adaptation in physical systems. Here, we study in-context learning of dynamical systems in a minimal setting: we train a small two-layer, single-head transformer to forecast one dynamical system, and then evaluate its ability to forecast a different dynamical system without retraining. We discover an early tradeoff in training between in-distribution and out-of-distribution performance, which manifests as a secondary double descent phenomenon. We discover that attention-based models apply a transfer-operator forecasting strategy in-context. They (1) lift low-dimensional time series using delay embedding, to detect the system's higher-dimensional dynamical manifold, and (2) identify and forecast long-lived invariant sets that characterize the global flow on this manifold. 
Our results clarify the mechanism enabling large pretrained models to forecast unseen physical systems at test time without retraining, and they illustrate the unique ability of attention-based models to leverage global attractor information in service of short-term forecasts.</p></details> | 5 pages, 4 figures |\n| **[Fourier-KAN-Mamba: A Novel State-Space Equation Approach for Time-Series Anomaly Detection](https://arxiv.org/abs/2511.15083v2)** | 2026-04-12 | <details><summary>Show</summary><p>Time-series anomaly detection plays a critical role in numerous real-world applications, including industrial monitoring and fault diagnosis. Recently, Mamba-based state-space models have shown remarkable efficiency in long-sequence modeling. However, directly applying Mamba to anomaly detection tasks still faces challenges in capturing complex temporal patterns and nonlinear dynamics. In this paper, we propose Fourier-KAN-Mamba, a novel hybrid architecture that integrates Fourier layer, Kolmogorov-Arnold Networks (KAN), and Mamba selective state-space model. The Fourier layer extracts multi-scale frequency features, KAN enhances nonlinear representation capability, and a temporal gating control mechanism further improves the model's ability to distinguish normal and anomalous patterns. Extensive experiments on MSL, SMAP, and SWaT datasets demonstrate that our method significantly outperforms existing state-of-the-art approaches. Keywords: time-series anomaly detection, state-space model, Mamba, Fourier transform, Kolmogorov-Arnold Network</p></details> | <details><summary>We re...</summary><p>We request withdrawal because we identified a flaw in the theoretical analysis of the anomaly-score identification mechanism. This part was supported mainly by metric observations without sufficient visual or empirical verification, which may affect the reliability of the related conclusions</p></details> |\n| **[ZARA: Training-Free Motion Time-Series Reasoning via Evidence-Grounded LLM Agents](https://arxiv.org/abs/2508.04038v2)** | 2026-04-12 | <details><summary>Show</summary><p>Motion sensor time-series are central to Human Activity Recognition (HAR), yet conventional approaches are constrained to fixed activity sets and typically require costly parameter retraining to adapt to new behaviors. While Large Language Models (LLMs) offer promising open-set reasoning capabilities, applying them directly to numerical time-series often leads to hallucinations and weak grounding. To address this challenge, we propose ZARA (Zero-training Activity Reasoning Agents), a knowledge- and retrieval-augmented agentic framework for motion time-series reasoning in a training-free inference setting. Rather than relying on black-box projections, ZARA distills reference data into a statistically grounded textual knowledge base that transforms implicit signal patterns into verifiable natural-language priors. Guided by retrieved evidence, ZARA iteratively selects discriminative cues and performs grounded reasoning over candidate activities. Extensive experiments on eight benchmarks show that ZARA generalizes robustly to unseen subjects and across datasets, demonstrating strong transferability across heterogeneous sensor domains. These results mark a step toward trustworthy, plug-and-play motion understanding beyond dataset-specific artifacts. 
Our code is available at https://github.com/zechenli03/ZARA.</p></details> | <details><summary>Accep...</summary><p>Accepted by ACL 2026 Main Conference</p></details> |\n| **[On the Structure of Risk Contribution: A Leave-One-Out Decomposition into Inherent and Correlation Risk](https://arxiv.org/abs/2604.10375v1)** | 2026-04-11 | <details><summary>Show</summary><p>This paper develops a decomposition of standard Risk Contribution (RC) into two economically interpretable components: inherent risk and correlation risk. Using a leave-one-out representation, each position's RC separates into a term reflecting its own volatility contribution independent of the portfolio and a term capturing its covariance with the remainder of the portfolio. The inherent component is always positive, arising from the intrinsic volatility of the position, while the correlation component may amplify or mitigate total portfolio risk depending on how the position moves relative to other holdings. Because the decomposition operates within standard RC, it preserves the property of strict additivity. This separation provides diagnostic insight not visible from aggregate risk contributions alone. It distinguishes whether a position contributes risk because it is volatile in isolation or because it is highly correlated with the rest of the portfolio, and it clarifies when a negatively correlated position functions as an effective hedge. Two approaches to time-series analysis are presented to track how inherent and correlation risk evolve across market regimes, revealing whether changes in portfolio risk during stress periods are driven by volatility shocks, correlation shifts, or both. Empirical illustrations suggest that the decomposition provides stable, transparent, and easily implementable risk diagnostics that can support portfolio risk reporting, stress testing, and performance attribution.</p></details> | <details><summary>Code:...</summary><p>Code: https://github.com/nolanalexander/inherent-correlation-decomposition</p></details> |\n| **[Structural Gating and Effect-aligned Lag-resolved Temporal Causal Discovery Framework with Application to Heat-Pollution Extremes](https://arxiv.org/abs/2604.10371v1)** | 2026-04-11 | <details><summary>Show</summary><p>This study proposes Structural Gating and Effect-aligned Discovery for Temporal Causal Discovery (SGED-TCD), a novel and general framework for lag-resolved causal discovery in complex multivariate time series. SGED-TCD combines explicit structural gating, stability-oriented learning, perturbation-effect alignment, and unified graph extraction to improve the interpretability, robustness, and functional consistency of inferred causal graphs. To evaluate its effectiveness in a representative real-world setting, we apply SGED-TCD to teleconnection-driven compound heatwave--air-pollution extremes in eastern and northern China. Using large-scale climate indices, regional circulation and boundary-layer variables, and compound extreme indicators, the framework reconstructs weighted causal networks with explicit dominant lags and relative causal importance. The inferred networks reveal clear regional and seasonal heterogeneity: warm-season extremes in Eastern China are mainly linked to low-latitude oceanic variability through circulation, radiation, and ventilation pathways, whereas cold-season extremes in Northern China are more strongly governed by high-latitude circulation variability associated with boundary-layer suppression and persistent stagnation. 
These results show that SGED-TCD can recover physically interpretable, hierarchical, and lag-resolved causal pathways in a challenging climate--environment system. More broadly, the proposed framework is not restricted to the present application and provides a general basis for temporal causal discovery in other complex domains.</p></details> |  |\n| **[On testing for independence between generalized error models of several time series](https://arxiv.org/abs/2410.24003v5)** | 2026-04-11 | <details><summary>Show</summary><p>We define generalized innovations associated with generalized error models having arbitrary distributions, that is, distributions that can be mixtures of continuous and discrete distributions. These models include stochastic volatility models and regime-switching models. We also propose statistics for testing independence between the generalized errors of these models, extending previous results of Duchesne, Ghoudi and Remillard (2012) obtained for stochastic volatility models. We define families of empirical processes constructed from lagged generalized errors, and we show that their joint asymptotic distributions are Gaussian and independent of the estimated parameters of the individual time series. Moebius transformations of the empirical processes are used to obtain tractable covariances. Several test statistics are then proposed, based on Cramer-von Mises statistics and dependence measures, as well as graphical methods to visualize the dependence. In addition, numerical experiments are performed to assess the power of the proposed tests. Finally, to show the usefulness of our methodologies, examples of applications for financial data and crime data are given to cover both discrete and continuous cases. All developed methodologies are implemented in the CRAN package IndGenErrors.</p></details> |  |\n| **[Detecting Invariant Manifolds in ReLU-Based RNNs](https://arxiv.org/abs/2510.03814v4)** | 2026-04-11 | <details><summary>Show</summary><p>Recurrent Neural Networks (RNNs) have found widespread applications in machine learning for time series prediction and dynamical systems reconstruction, and experienced a recent renaissance with improved training algorithms and architectural designs. Understanding why and how trained RNNs produce their behavior is important for scientific and medical applications, and explainable AI more generally. An RNN's dynamical repertoire depends on the topological and geometrical properties of its state space. Stable and unstable manifolds of periodic points play a particularly important role: They dissect a dynamical system's state space into different basins of attraction, and their intersections lead to chaotic dynamics with fractal geometry. Here we introduce a novel algorithm for detecting these manifolds, with a focus on piecewise-linear RNNs (PLRNNs) employing rectified linear units (ReLUs) as their activation function. We demonstrate how the algorithm can be used to trace the boundaries between different basins of attraction, and hence to characterize multistability, a computationally important property. We further show its utility in finding so-called homoclinic points, the intersections between stable and unstable manifolds, and thus establish the existence of chaos in PLRNNs. 
Finally, we show, for an empirical example (electrophysiological recordings from a cortical neuron), how insights into the underlying dynamics could be gained through our method.</p></details> |  |\n| **[TimeSeriesExamAgent: Creating Time Series Reasoning Benchmarks at Scale](https://arxiv.org/abs/2604.10291v1)** | 2026-04-11 | <details><summary>Show</summary><p>Large Language Models (LLMs) have shown promising performance in time series modeling tasks, but do they truly understand time series data? While multiple benchmarks have been proposed to answer this fundamental question, most are manually curated and focus on narrow domains or specific skill sets. To address this limitation, we propose scalable methods for creating comprehensive time series reasoning benchmarks that combine the flexibility of templates with the creativity of LLM agents. We first develop TimeSeriesExam, a multiple-choice benchmark using synthetic time series to evaluate LLMs across five core reasoning categories: pattern recognition, noise understanding, similarity analysis, anomaly detection, and causality. Then, with TimeSeriesExamAgent, we scale our approach by automatically generating benchmarks from real-world datasets spanning healthcare, finance, and weather domains. Through multi-dimensional quality evaluation, we demonstrate that our automatically generated benchmarks achieve diversity comparable to manually curated alternatives. However, our experiments reveal that LLM performance remains limited in both abstract time series reasoning and domain-specific applications, highlighting ongoing challenges in enabling effective time series understanding in these models. TimeSeriesExamAgent is available at https://github.com/magwiazda/TimeSeriesExamAgent.</p></details> |  |\n| **[A Heavy-Load-Enhanced and Changeable-Periodicity-Perceived Workload Prediction Network](https://arxiv.org/abs/2308.01917v3)** | 2026-04-11 | <details><summary>Show</summary><p>Cloud providers can greatly benefit from accurate workload prediction. However, the workload of cloud servers is highly variable, with occasional workload bursts, which makes workload prediction challenging. Time series forecasting methods relying on periodicity information often assume a fixed and known periodicity length, which does not align with the periodicity-changeable nature of cloud service workloads. Although many state-of-the-art time-series forecasting methods do not rely on periodicity information and achieve high overall accuracy, they are vulnerable to data imbalance between heavy workloads and regular workloads. As a result, their prediction accuracy on rare heavy workloads is limited. Unfortunately, heavy-load prediction accuracy is more important than overall accuracy, as errors in heavy-load prediction are more likely to cause Service Level Agreement violations than errors in normal-load prediction. Thus, we propose a changeable-periodicity-perceived workload prediction network (PePNet) to fuse periodic information adaptively for periodicity-changeable time series and improve rare heavy workload prediction accuracy. It has two distinctive characteristics: (i) a Periodicity-Perceived Mechanism that detects the periodicity length automatically and fuses periodic information adaptively, making it suitable for periodicity-changeable time series, and (ii) an Achilles' Heel Loss Function that iteratively optimizes the most under-fitting part of the predicted sequence at each step, markedly improving the prediction accuracy on heavy loads. 
Extensive experiments conducted on real-world datasets demonstrate that PePNet improves prediction accuracy for overall workload by 11.8% on average, compared with state-of-the-art methods. In particular, PePNet improves prediction accuracy for heavy workload by 21.0% on average.</p></details> | <details><summary>Submi...</summary><p>Submitted to TII 2026</p></details> |\n| **[DistDF: Time-Series Forecasting Needs Joint-Distribution Wasserstein Alignment](https://arxiv.org/abs/2510.24574v2)** | 2026-04-11 | <details><summary>Show</summary><p>Training time-series forecasting models requires aligning the conditional distribution of model forecasts with that of the label sequence. The standard direct forecast (DF) approach resorts to minimizing the conditional negative log-likelihood, typically estimated by the mean squared error. However, this estimation proves biased when the label sequence exhibits autocorrelation. In this paper, we propose DistDF, which achieves alignment by minimizing a distributional discrepancy between the conditional distributions of forecast and label sequences. Since such conditional discrepancies are difficult to estimate from finite time-series observations, we introduce a joint-distribution Wasserstein discrepancy for time-series forecasting, which provably upper bounds the conditional discrepancy of interest. The proposed discrepancy is tractable, differentiable, and readily compatible with gradient-based optimization. Extensive experiments show that DistDF improves diverse forecasting models and achieves leading performance. Code is available at https://anonymous.4open.science/r/DistDF-F66B.</p></details> |  |\n| **[Predicting Associations between Solar Flares and Coronal Mass Ejections Using SDO/HMI Magnetograms and a Hybrid Neural Network](https://arxiv.org/abs/2604.10016v1)** | 2026-04-11 | <details><summary>Show</summary><p>Solar eruptions, including flares and coronal mass ejections (CMEs), have a significant impact on Earth. Some flares are associated with CMEs, and some flares are not. The association between flares and CMEs is not always obvious. In this study, we propose a new deep learning method, specifically a hybrid neural network (HNN) that combines a vision transformer with long short-term memory, to predict associations between flares and CMEs. HNN finds spatio-temporal patterns in the time series of line-of-sight magnetograms of solar active regions (ARs) collected by the Helioseismic and Magnetic Imager on board the Solar Dynamics Observatory and uses the patterns to predict whether a flare projected to occur within the next 24 hours will be eruptive (i.e., CME-associated) or confined (i.e., not CME-associated). Our experimental results demonstrate the good performance of the HNN method. Furthermore, the results show that magnetic flux cancellation in polarity inversion line regions may well play a role in triggering flare-associated CMEs, a finding consistent with the literature.</p></details> | 14 pages, 8 figures |\n| **[A Weak Penalty Neural ODE for Learning Chaotic Dynamics from Noisy Time Series](https://arxiv.org/abs/2511.06609v3)** | 2026-04-10 | <details><summary>Show</summary><p>The accurate forecasting of complex, high-dimensional dynamical systems from observational data is a fundamental task across numerous scientific and engineering disciplines. A significant challenge arises from noise-corrupted measurements, which severely degrade the performance of data-driven models. 
In chaotic dynamical systems, where small initial errors amplify exponentially, it is particularly difficult to develop a model from noisy data that achieves short-term accuracy while preserving long-term invariant properties. To overcome this, we consider the weak formulation as a complementary approach to the classical $L^2$-loss function for training models of dynamical systems. We empirically verify that the weak formulation, with a proper choice of test function and integration domain, effectively filters noisy data. This insight explains why a weak-form loss function is analogous to fitting a model to filtered data and provides a practical way to parameterize the weak form. Subsequently, we demonstrate how this approach overcomes the instability and inaccuracy of standard Neural ODE (NODE) in modeling chaotic systems. Through numerical examples, we show that our proposed training strategy, the Weak Penalty NODE, is computationally efficient, solver-agnostic, and yields accurate and robust forecasts across benchmark chaotic systems and a real-world climate dataset.</p></details> |  |\n| **[LLM4Delay: Flight Delay Prediction via Cross-Modality Adaptation of Large Language Models and Aircraft Trajectory Representation](https://arxiv.org/abs/2510.23636v3)** | 2026-04-10 | <details><summary>Show</summary><p>Flight delay prediction has become a key focus in air traffic management (ATM), as delays reflect inefficiencies in the system. This paper proposes LLM4Delay, a large language model (LLM)-based framework for predicting flight delays from the perspective of air traffic controllers monitoring aircraft after they enter the terminal maneuvering area (TMA). LLM4Delay is designed to integrate textual aeronautical information, including flight data, weather reports, and aerodrome notices, together with multiple trajectories that model airspace conditions, forming a comprehensive delay-relevant context. By jointly leveraging comprehensive textual and trajectory contexts via instance-level projection, an effective cross-modality adaptation strategy that maps multiple instance-level trajectory representations into the language modality, the framework improves delay prediction accuracy. LLM4Delay demonstrates superior performance compared to existing ATM frameworks and prior time-series-to-language adaptation methods. This highlights the complementary roles of textual and trajectory data while leveraging knowledge from both the pretrained trajectory encoder and the pretrained LLM. The proposed framework enables continuous updates to predictions as new information becomes available, indicating potential operational relevance.</p></details> | <details><summary>Prepr...</summary><p>Preprint submitted to IEEE Transactions on Intelligent Transportation Systems (T-ITS) for possible publication</p></details> |\n| **[A scalable estimator of higher-order information in complex dynamical systems](https://arxiv.org/abs/2506.18498v2)** | 2026-04-10 | <details><summary>Show</summary><p>Our understanding of complex systems rests on our ability to characterise how they perform distributed computation and integrate information. Advances in information theory have introduced several quantities to describe complex information structures, where collective patterns of coordination emerge from higher-order (i.e. beyond-pairwise) interdependencies. Unfortunately, the use of these approaches to study large complex systems is severely hindered by the poor scalability of existing techniques. 
Moreover, there are relatively few measures specifically designed for multivariate time series data. Here we introduce a novel measure of information about macroscopic structures, termed M-information, which quantifies the higher-order integration of information in complex dynamical systems. We show that M-information can be calculated via a convex optimisation problem, and we derive a robust and efficient algorithm that scales gracefully with system size. Our analyses show that M-information is resilient to noise, indexes critical behaviour in artificial neuronal populations, and reflects states of consciousness and task performance in real-world macaque and mouse neuroimaging data. Furthermore, M-information can be incorporated into existing information decomposition frameworks to reveal a comprehensive taxonomy of information dynamics. Taken together, these results help us unravel collective computation in large complex systems.</p></details> | <details><summary>12 pa...</summary><p>12 pages, 5 figures + appendix</p></details> |\n| **[Drift-Aware Online Dynamic Learning for Nonstationary Multivariate Time Series: Application to Sintering Quality Prediction](https://arxiv.org/abs/2604.09358v1)** | 2026-04-10 | <details><summary>Show</summary><p>Accurate prediction of nonstationary multivariate time series remains a critical challenge in complex industrial systems such as iron ore sintering. In practice, pronounced concept drift compounded by significant label verification latency rapidly degrades the performance of offline-trained models. Existing methods based on static architectures or passive update strategies struggle to simultaneously extract multi-scale spatiotemporal features and overcome the stability-plasticity dilemma without immediate supervision. To address these limitations, a Drift-Aware Multi-Scale Dynamic Learning (DA-MSDL) framework is proposed to maintain robust multi-output predictive performance via online adaptive mechanisms on nonstationary data streams. The framework employs a multi-scale bi-branch convolutional network as its backbone to disentangle local fluctuations from long-term trends, thereby enhancing representational capacity for complex dynamic patterns. To circumvent the label latency bottleneck, DA-MSDL leverages Maximum Mean Discrepancy (MMD) for unsupervised drift detection. By quantifying online statistical deviations in feature distributions, DA-MSDL proactively triggers model adaptation prior to inference. Furthermore, a drift-severity-guided hierarchical fine-tuning strategy is developed. Supported by prioritized experience replay from a dynamic memory queue, this approach achieves rapid distribution alignment while effectively mitigating catastrophic forgetting. Long-horizon experiments on real-world industrial sintering data and a public benchmark dataset demonstrate that DA-MSDL consistently outperforms representative baselines under severe concept drift. Exhibiting strong cross-domain generalization and predictive stability, the proposed framework provides an effective online dynamic learning paradigm for quality monitoring in nonstationary environments.</p></details> |  |\n| **[QARIMA: A Quantum Approach To Classical Time Series Analysis](https://arxiv.org/abs/2604.08277v2)** | 2026-04-10 | <details><summary>Show</summary><p>We present a quantum-inspired ARIMA methodology that integrates quantum-assisted lag discovery with fixed-configuration variational quantum circuits (VQCs) for parameter estimation and weak-lag refinement. 
Differencing and candidate lags are identified via swap-test-driven quantum autocorrelation (QACF) and quantum partial autocorrelation (QPACF), with a delayed-matrix construction that aligns quantum projections to time-domain regressors, followed by standard information-criterion parsimony. Given the screened orders (p,d,q), we retain a fixed VQC ansatz, optimizer, and training budget, preventing hyperparameter leakage, and deploy the circuit in two estimation roles: VQC-AR for autoregressive coefficients and VQC-MA for moving-average coefficients. Between screening and estimation, a lightweight VQC weak-lag refinement re-weights or prunes screened AR lags without altering (p,d,q). Across environmental and industrial datasets, we perform rolling-origin evaluations against automated classical ARIMA, reporting out-of-sample mean squared error (MSE), mean absolute percentage error (MAPE), and Diebold-Mariano tests on MSE and MAE. Empirically, the seven quantum contributions, namely (1) differencing selection, (2) QACF, (3) QPACF, (4) swap-test primitives with delayed-matrix construction, (5) VQC-AR, (6) VQC weak-lag refinement, and (7) VQC-MA, collectively reduce meta-optimization overhead and make explicit where quantum effects enter order discovery, lag refinement, and AR/MA parameter estimation.</p></details> | <details><summary>17 Al...</summary><p>17 Algorithms, 19 Figures, 26 Tables</p></details> |\n| **[BEDTime: A Unified Benchmark for Automatically Describing Time Series](https://arxiv.org/abs/2509.05215v3)** | 2026-04-10 | <details><summary>Show</summary><p>Recent works propose complex multi-modal models that handle both time series and language, ultimately claiming high performance on complex tasks like time series reasoning and cross-modal question answering. However, they skip foundational evaluations that such complex models should have mastered. So we ask a simple question: \\textit{How well can recent models describe structural properties of time series?} To answer this, we propose that successful models should be able to \\textit{recognize}, \\textit{differentiate}, and \\textit{generate} descriptions of univariate time series. We then create \\textbf{BEDTime}, a benchmark to assess these novel tasks, which comprises \\textbf{five datasets} reformatted across \\textbf{three modalities}. In evaluating \\textbf{17 state-of-the-art models}, we find that (1) surprisingly, dedicated time series-language models fall short, despite being designed for similar tasks, (2) vision-language models are quite capable, (3) language-only methods perform worst, despite many lauding their potential, and (4) all approaches are clearly fragile to a range of real-world robustness tests, indicating directions for future work. Together, our findings critique prior works' claims and provide avenues for advancing multi-modal time series modeling.</p></details> |  |\n| **[Batch Distillation Data for Developing Machine Learning Anomaly Detection Methods](https://arxiv.org/abs/2510.18075v2)** | 2026-04-10 | <details><summary>Show</summary><p>Machine learning (ML) holds great potential to advance anomaly detection (AD) in chemical processes. However, the development of ML-based methods is hindered by the lack of openly available experimental data.
To address this gap, we have set up a laboratory-scale batch distillation plant and operated it to generate an extensive experimental database, covering fault-free experiments and experiments in which anomalies were intentionally induced, for training advanced ML-based AD methods. In total, 119 experiments were conducted across a wide range of operating conditions and mixtures. Most experiments containing anomalies were paired with a corresponding fault-free one. The database that we provide here includes time-series data from numerous sensors and actuators, along with estimates of measurement uncertainty. In addition, unconventional data sources -- such as concentration profiles obtained via online benchtop NMR spectroscopy and video and audio recordings -- are provided. Extensive metadata and expert annotations of all experiments are included. The anomaly annotations are based on an ontology developed in this work. The data are organized in a structured database and made freely available via doi.org/10.5281/zenodo.17395543. This new database paves the way for the development of advanced ML-based AD methods. As it includes information on the causes of anomalies, it further enables the development of interpretable and explainable ML approaches, as well as methods for anomaly mitigation.</p></details> |  |\n| **[AR-KAN: Autoregressive-Weight-Enhanced Kolmogorov-Arnold Network for Time Series Forecasting](https://arxiv.org/abs/2509.02967v3)** | 2026-04-10 | <details><summary>Show</summary><p>Traditional neural networks struggle to capture the spectral structure of complex signals. Fourier neural networks (FNNs) attempt to address this by embedding Fourier series components, yet many real-world signals are almost-periodic with non-commensurate frequencies, posing additional challenges. Building on prior work showing that ARIMA outperforms large language models (LLMs) for time series forecasting, we extend the comparison to neural predictors and find that ARIMA still maintains a clear advantage. Inspired by this finding, we propose the Autoregressive-Weight-Enhanced Kolmogorov-Arnold Network (AR-KAN). Based on the Universal Myopic Mapping Theorem, it integrates a pre-trained AR module for temporal memory with a KAN for nonlinear representation. We prove that the AR module preserves essential temporal features while reducing redundancy, and that the upper bound of the approximation error for AR-KAN is smaller than that for KAN in a probabilistic sense. Experimental results also demonstrate that AR-KAN delivers exceptional performance compared to existing models, both on synthetic almost-periodic functions and real-world datasets. These results highlight AR-KAN as a robust and effective framework for time series forecasting. Our code is available at https://github.com/ChenZeng001/AR-KAN.</p></details> |  |\n| **[Temporal Patch Shuffle (TPS): Leveraging Patch-Level Shuffling to Boost Generalization and Robustness in Time Series Forecasting](https://arxiv.org/abs/2604.09067v1)** | 2026-04-10 | <details><summary>Show</summary><p>Data augmentation is a crucial technique for improving model generalization and robustness, particularly in deep learning models where training data is limited. Although many augmentation methods have been developed for time series classification, most are not directly applicable to time series forecasting due to the need to preserve temporal coherence.
In this work, we propose Temporal Patch Shuffle (TPS), a simple and model-agnostic data augmentation method for forecasting that extracts overlapping temporal patches, selectively shuffles a subset of patches using variance-based ordering as a conservative heuristic, and reconstructs the sequence by averaging overlapping regions. This design increases sample diversity while preserving forecast-consistent local temporal structure. We extensively evaluate TPS across nine long-term forecasting datasets using five recent model families (TSMixer, DLinear, PatchTST, TiDE, and LightTS), and across four short-term forecasting datasets using PatchTST, observing consistent performance improvements. Comprehensive ablation studies further demonstrate the effectiveness, robustness, and design rationale of the proposed method.</p></details> | <details><summary>25 pa...</summary><p>25 pages, 7 figures, 17 tables</p></details> |\n| **[TS-Reasoner: Domain-Oriented Time Series Inference Agents for Reasoning and Automated Analysis](https://arxiv.org/abs/2410.04047v6)** | 2026-04-10 | <details><summary>Show</summary><p>Time series analysis is crucial in real-world applications, yet traditional methods focus on isolated tasks only, and recent studies on time series reasoning remain limited to either single-step inference or are constrained to natural language answers. In this work, we introduce TS-Reasoner, a domain-specialized agent designed for multi-step time series inference. By integrating large language model (LLM) reasoning with domain-specific computational tools and an error feedback loop, TS-Reasoner enables domain-informed, constraint-aware analytical workflows that combine symbolic reasoning with precise numerical analysis. We assess the system's capabilities along two axes: (1) fundamental time series understanding assessed by TimeSeriesExam and (2) complex, multi-step inference evaluated by a newly proposed dataset designed to test both compositional reasoning and computational precision in time series analysis. Experiments show that our approach outperforms standalone general-purpose LLMs in both basic time series concept understanding as well as the multi-step time series inference task, highlighting the promise of domain-specialized agents for automating real-world time series reasoning and analysis.</p></details> |  |\n| **[Lightweight and Generalizable Multi-Sensor Human Activity Recognition via Cascaded Fusion and Style-Augmented Decomposition](https://arxiv.org/abs/2604.08910v1)** | 2026-04-10 | <details><summary>Show</summary><p>Wearable Human Activity Recognition (WHAR) is a prominent research area within ubiquitous computing, whose core lies in effectively modeling intra- and inter-sensor spatio-temporal relationships from multi-modal time series data. Existing methods either suffer from high computational complexity due to attention-based fusion or lack robustness to data variations during feature extraction. To address these issues, we propose a lightweight and generalizable framework that retains the core \"decomposition-extraction-fusion\" paradigm while introducing two key innovations. First, we replace the computationally expensive Attention and Cross-Variable Fusion (CVF) modules with a Cascaded Fusion Block (CFB), which achieves efficient feature interaction without explicit attention weights through the operational process of \"compression-recursion-concatenation-fusion\". 
Second, we integrate a MixStyle-based data augmentation module before the Local Temporal Feature Extraction (LTFE) and Global Temporal Aggregation (GTA) stages. By mixing the mean and variance of different samples within a batch and introducing random coefficients to perturb the data distribution, the model's generalization ability is enhanced without altering the core information of the data. The proposed framework maintains sensor-level, variable-level, and channel-level independence during the decomposition phase, and achieves efficient feature fusion and robust feature extraction in subsequent processes. Experiments on two benchmark datasets (Realdisp, Skoda) demonstrate that our model outperforms state-of-the-art methods in both accuracy and macro-F1 score, while reducing computational overhead by more than 30\\% compared to attention-based baselines. This work provides a practical solution for WHAR applications on resource-constrained wearable devices.</p></details> | <details><summary>8 pag...</summary><p>8 pages. arXiv admin note: text overlap with arXiv:2501.10917 by other authors</p></details> |\n| **[AlphaCast: A Human Wisdom-LLM Intelligence Co-Reasoning Framework for Interactive Time Series Forecasting](https://arxiv.org/abs/2511.08947v5)** | 2026-04-10 | <details><summary>Show</summary><p>Time series forecasting plays a crucial role in decision-making across many real-world applications. Despite substantial progress, most existing methods still treat forecasting as a static, single-pass regression problem. In contrast, human experts form predictions through iterative reasoning that integrates temporal features, domain knowledge, case-based references, and supplementary context, with continuous refinement. In this work, we propose AlphaCast, an interaction-driven agentic reasoning framework that enables accurate time series forecasting with training-free large language models. AlphaCast reformulates forecasting as an expert-like process and organizes it into a multi-stage workflow involving context preparation, reasoning-based generation, and reflective evaluation, transforming forecasting from a single-pass output into a multi-turn, autonomous interaction process. To support diverse perspectives commonly considered by human experts, we develop a lightweight toolkit comprising a feature set, a knowledge base, a case library, and a contextual pool that provides external support for LLM-based reasoning. Extensive experiments across multiple benchmarks show that AlphaCast generally outperforms representative baselines. Code is available at this repository: https://github.com/echo01-ai/AlphaCast.</p></details> |  |\n| **[Forecasting the Evolving Composition of Inbound Tourism Demand: A Bayesian Compositional Time Series Approach Using Platform Booking Data](https://arxiv.org/abs/2602.18358v3)** | 2026-04-09 | <details><summary>Show</summary><p>Understanding how the composition of guest origin markets evolves over time is critical for destination marketing organizations, hospitality businesses, and tourism planners. We develop and apply Bayesian Dirichlet autoregressive moving average (BDARMA) models to forecast the compositional dynamics of guest origin market shares using proprietary Airbnb booking data spanning 2017--2025 across four major destination regions. Our analysis reveals substantial pandemic-induced structural breaks in origin composition, with heterogeneous recovery patterns across markets.
In our analysis, the BDARMA framework achieves the lowest forecast error for EMEA and competitive performance across destination regions, outperforming standard benchmarks including naïve forecasts, exponential smoothing, and SARIMA on log-ratio transformed data in compositionally complex markets. For EMEA destinations, BDARMA achieves 27% lower forecast error than naïve methods ($p < 0.001$), with the greatest gains where multiple origin markets compete in the 5-25% share range. By modeling compositions directly on the simplex with a Dirichlet likelihood and incorporating seasonal variation in both mean and precision parameters, our approach produces coherent forecasts that respect the unit-sum constraint while capturing complex temporal dependencies. The methodology provides destination stakeholders with probabilistic forecasts of source market shares, enabling more informed strategic planning for marketing resource allocation, infrastructure investment, and crisis response.</p></details> |  |\n| **[StationarityToolkit: Comprehensive Time Series Stationarity Analysis in Python](https://arxiv.org/abs/2604.08676v1)** | 2026-04-09 | <details><summary>Show</summary><p>Time-series stationarity is the property that statistical characteristics such as trend, variance, and seasonality remain constant over time. It is considered fundamental to many forecasting and analysis methods. Different tests detect different types of non-stationarity: structural breaks or deterministic trends, clustered or time-dependent variance, stochastic or deterministic seasonality. A series might pass one test while failing another; single-test approaches seldom distinguish between conceptually different types of non-stationarity that require different types of tests and transformations. `StationarityToolkit` addresses this by providing a comprehensive Python library that runs 10 statistical tests across three categories: trend (4 tests), variance (4 tests), and seasonality (2 tests). Rather than a binary stationary/non-stationary verdict, users receive detailed diagnostics with actionable notes for each detection. The toolkit automatically infers the frequency of the data provided (requires datetime index), provides clear interpretations with test statistics and p-values, and supports an iterative test-transform-retest workflow essential for real-world data sets.</p></details> | <details><summary>Submi...</summary><p>Submitted to Journal of Open Source Software</p></details> |\n| **[Zero-shot Multivariate Time Series Forecasting Using Tabular Prior Fitted Networks](https://arxiv.org/abs/2604.08400v1)** | 2026-04-09 | <details><summary>Show</summary><p>Tabular foundation models, particularly Prior-data Fitted Networks like TabPFN, have emerged as the leading contenders in a myriad of tasks ranging from data imputation to label prediction on tabular data, surpassing the historical successes of tree-based models. This has led to investigations of their applicability to forecasting time series data, which can be formulated as a tabular problem. While recent work to this end has displayed positive results, most works have limited their treatment of multivariate time series problems to several independent univariate time series forecasting subproblems, thus ignoring any inter-channel interactions. Overcoming this limitation, we introduce a generally applicable framework for multivariate time series forecasting using tabular foundation models.
We achieve this by recasting the multivariate time series forecasting problem as a series of scalar regression problems, which can then be solved zero-shot by any tabular foundation model with regression capabilities. We present results of our method using the TabPFN-TS backbone and compare performance with current state-of-the-art tabular methods.</p></details> |  |\n| **[ADAPTive Input Training for Many-to-One Pre-Training on Time-Series Classification](https://arxiv.org/abs/2604.08398v1)** | 2026-04-09 | <details><summary>Show</summary><p>Recent work on time-series models has leveraged self-supervised training to learn meaningful features and patterns in order to improve performance on downstream tasks and generalize to unseen modalities. While these pretraining methods have shown great promise in one-to-many scenarios, where a model is pre-trained on one dataset and fine-tuned on a downstream dataset, they have struggled to generalize to new datasets when more datasets are added during pre-training. This is a fundamental challenge in building foundation models for time-series data, as it limits the ability to develop models that can learn from the large variety of diverse datasets available. To address this challenge, we present a new pre-training paradigm for time-series data called ADAPT, which can efficiently align the physical properties of data in the time-series domain, enabling mixed-batch pre-training despite the extreme discrepancies in the input sizes and channel dimensions of pre-training data. We trained on 162 time-series classification datasets and set new state-of-the-art performance for classification benchmarks. We successfully train a model within the time-series domain on a wide range of datasets simultaneously, which is a major building block for building generalist foundation models in time-series domains.</p></details> |  |\n\n## Trajectory\n| **Title** | **Date** | **Abstract** | **Comment** |\n| --- | --- | --- | --- |\n| **[LeapAlign: Post-Training Flow Matching Models at Any Generation Step by Building Two-Step Trajectories](https://arxiv.org/abs/2604.15311v1)** | 2026-04-16 | <details><summary>Show</summary><p>This paper focuses on the alignment of flow matching models with human preferences. A promising approach is to fine-tune by directly backpropagating reward gradients through the differentiable generation process of flow matching. However, backpropagating through long trajectories results in prohibitive memory costs and gradient explosion. Therefore, direct-gradient methods struggle to update early generation steps, which are crucial for determining the global structure of the final image. To address this issue, we introduce LeapAlign, a fine-tuning method that reduces computational cost and enables direct gradient propagation from reward to early generation steps. Specifically, we shorten the long trajectory into only two steps by designing two consecutive leaps, each skipping multiple ODE sampling steps and predicting future latents in a single step. By randomizing the start and end timesteps of the leaps, LeapAlign leads to efficient and stable model updates at any generation step. To better use such shortened trajectories, we assign higher training weights to those that are more consistent with the long generation path. To further enhance gradient stability, we reduce the weights of gradient terms with large magnitude, instead of completely removing them as done in previous works.
When fine-tuning the Flux model, LeapAlign consistently outperforms state-of-the-art GRPO-based and direct-gradient methods across various metrics, achieving superior image quality and image-text alignment.</p></details> | <details><summary>Accep...</summary><p>Accepted by CVPR 2026. Project page: https://rockeycoss.github.io/leapalign/</p></details> |\n| **[OpenMobile: Building Open Mobile Agents with Task and Trajectory Synthesis](https://arxiv.org/abs/2604.15093v1)** | 2026-04-16 | <details><summary>Show</summary><p>Mobile agents powered by vision-language models have demonstrated impressive capabilities in automating mobile tasks, with recent leading models achieving a marked performance leap, e.g., nearly 70% success on AndroidWorld. However, these systems keep their training data closed and remain opaque about their task and trajectory synthesis recipes. We present OpenMobile, an open-source framework that synthesizes high-quality task instructions and agent trajectories, with two key components: (1) a scalable task synthesis pipeline that constructs a global environment memory from exploration and then leverages it to generate diverse and grounded instructions, and (2) a policy-switching strategy for trajectory rollout. By alternating between learner and expert models, it captures essential error-recovery data often missing in standard imitation learning. Agents trained on our data achieve competitive results across three dynamic mobile agent benchmarks: notably, our fine-tuned Qwen2.5-VL and Qwen3-VL reach 51.7% and 64.7% on AndroidWorld, far surpassing existing open-data approaches. Furthermore, we conduct transparent analyses on the overlap between our synthetic instructions and benchmark test sets, and verify that performance gains stem from broad functionality coverage rather than benchmark overfitting. We release data and code at https://njucckevin.github.io/openmobile/ to bridge the data gap and facilitate broader mobile agent research.</p></details> | Work in progress |\n| **[Trajectory Planning for a Multi-UAV Rigid-Payload Cascaded Transportation System Based on Enhanced Tube-RRT*](https://arxiv.org/abs/2604.15074v1)** | 2026-04-16 | <details><summary>Show</summary><p>This paper presents a two-stage trajectory planning framework for a multi-UAV rigid-payload cascaded transportation system, aiming to address planning challenges in densely cluttered environments. In Stage I, an Enhanced Tube-RRT* algorithm is developed by integrating active hybrid sampling and an adaptive expansion strategy, enabling rapid generation of a safe and feasible virtual tube in environments with dense obstacles. Moreover, a trajectory smoothness cost is explicitly incorporated into the edge cost to reduce excessive turns and thereby mitigate cable-induced oscillations. Simulation results demonstrate that the proposed Enhanced Tube-RRT* achieves a higher success rate and effective sampling rate than mixed-sampling Tube-RRT* (STube-RRT*) and adaptive-extension Tube-RRT* (AETube-RRT*), while producing a shorter optimal path with a smaller cumulative turning angle. In Stage II, a convex quadratic program is formulated by considering payload translational and rotational dynamics, cable tension constraints, and collision-safety constraints, yielding a smooth, collision-free desired payload trajectory.
Finally, a centralized geometric control scheme is applied to the cascaded system to validate the effectiveness and feasibility of the proposed planning framework, offering a practical solution for payload attitude maneuvering in densely cluttered environments.</p></details> | <details><summary>15 pa...</summary><p>15 pages, 7 figures. Under review at IEEE Transactions on Aerospace and Electronic Systems (TAES). This work has been submitted to the IEEE for possible publication</p></details> |\n| **[Predicting Power-System Dynamic Trajectories with Foundation Models](https://arxiv.org/abs/2604.14991v1)** | 2026-04-16 | <details><summary>Show</summary><p>As power systems transition toward renewable-rich and inverter-dominated operations, accurate time-domain dynamic analysis becomes increasingly critical. Such analysis supports key operational tasks, including transient stability assessment, dynamic security analysis, contingency screening, and post-fault trajectory evaluation. In practice, these tasks may operate under several challenges, including unknown and time-varying system parameters, privacy constraints on data sharing, and the need for fast online inference. Existing learning-based approaches are typically trained for individual systems and therefore lack generalization across operating conditions and physical parameters. Hence, this paper proposes LArge Scale Small ODE (LASS)-ODE-Power, a learning framework for general-purpose time-domain prediction. The proposed approach leverages large-scale pretraining on more than 40 GB of differential-algebraic equation (DAE) or ordinary differential equation (ODE) trajectories to learn transferable representations. The resulting model supports trajectory prediction from short measurement prefixes across diverse dynamic regimes, including electromechanical and inverter-driven systems. As a result, the model can be used directly without data sharing in a zero-shot setting. In addition, the proposed architecture incorporates parallel and linearized computation to achieve fast inference. Moreover, to enhance task-specific performance in power systems, a specialized fine-tuning strategy is developed based on approximately 1 GB of heterogeneous power-system dynamic data. Extensive experiments over diverse power-system simulation scenarios demonstrate that LASS-ODE-Power consistently outperforms existing learning-based models in trajectory prediction accuracy with efficient inference.</p></details> | 10 pages |\n| **[Momentum-constrained Hybrid Heuristic Trajectory Optimization Framework with Residual-enhanced DRL for Visually Impaired Scenarios](https://arxiv.org/abs/2604.14986v1)** | 2026-04-16 | <details><summary>Show</summary><p>Safe and efficient assistive planning for visually impaired scenarios remains challenging, since existing methods struggle with multi-objective optimization, generalization, and interpretability. In response, this paper proposes a Momentum-Constrained Hybrid Heuristic Trajectory Optimization Framework (MHHTOF). To balance multiple objectives of comfort and safety, the framework designs a Heuristic Trajectory Sampling Cluster (HTSC) with a Momentum-Constrained Trajectory Optimization (MTO), which suppresses abrupt velocity and acceleration changes. In addition, a novel residual-enhanced deep reinforcement learning (DRL) module refines candidate trajectories, advancing temporal modeling and policy generalization.
Finally, a dual-stage cost modeling mechanism (DCMM) is introduced to regulate optimization, where costs in the Frenet space ensure consistency, and reward-driven adaptive weights in the Cartesian space integrate user preferences for interpretability and user-centric decision-making. Experimental results show that the proposed framework converges in nearly half the iterations of baselines and achieves lower and more stable costs. In complex dynamic scenarios, MHHTOF further demonstrates stable velocity and acceleration curves with reduced risk, confirming its advantages in robustness, safety, and efficiency.</p></details> | <details><summary>24 pa...</summary><p>24 pages, 14 figures. arXiv admin note: text overlap with arXiv:2509.15582</p></details> |\n| **[Reward-Aware Trajectory Shaping for Few-step Visual Generation](https://arxiv.org/abs/2604.14910v1)** | 2026-04-16 | <details><summary>Show</summary><p>Achieving high-fidelity generation in extremely few sampling steps has long been a central goal of generative modeling. Existing approaches largely rely on distillation-based frameworks to compress the original multi-step denoising process into a few-step generator. However, such methods inherently constrain the student to imitate a stronger multi-step teacher, imposing the teacher as an upper bound on student performance. We argue that introducing \\textbf{preference alignment awareness} enables the student to optimize toward reward-preferred generation quality, potentially surpassing the teacher instead of being restricted to rigid teacher imitation. To this end, we propose \\textbf{Reward-Aware Trajectory Shaping (RATS)}, a lightweight framework for preference-aligned few-step generation. Specifically, teacher and student latent trajectories are aligned at key denoising stages through horizon matching, while a \\textbf{reward-aware gate} is introduced to adaptively regulate teacher guidance based on their relative reward performance. Trajectory shaping is strengthened when the teacher achieves higher rewards, and relaxed when the student matches or surpasses the teacher, thereby enabling continued reward-driven improvement. By seamlessly integrating trajectory distillation, reward-aware gating, and preference alignment, RATS effectively transfers preference-relevant knowledge from high-step generators without incurring additional test-time computational overhead. Experimental results demonstrate that RATS substantially improves the efficiency--quality trade-off in few-step visual generation, significantly narrowing the gap between few-step students and stronger multi-step generators.</p></details> |  |\n| **[Benchmarks for Trajectory Safety Evaluation and Diagnosis in OpenClaw and Codex: ATBench-Claw and ATBench-CodeX](https://arxiv.org/abs/2604.14858v1)** | 2026-04-16 | <details><summary>Show</summary><p>As agent systems move into increasingly diverse execution settings, trajectory-level safety evaluation and diagnosis require benchmarks that evolve with them. ATBench is a diverse and realistic agent trajectory benchmark for safety evaluation and diagnosis. This report presents ATBench-Claw and ATBench-CodeX, two domain-customized extensions that carry ATBench into the OpenClaw and OpenAI Codex / Codex-runtime settings. 
The key adaptation mechanism is to analyze each new setting, customize the three-dimensional Safety Taxonomy over risk source, failure mode, and real-world harm, and then use that customized taxonomy to define the benchmark specification consumed by the shared ATBench construction pipeline. This extensibility matters because agent frameworks remain relatively stable at the architectural level even as their concrete execution settings, tool ecosystems, and product capabilities evolve quickly. Concretely, ATBench-Claw targets OpenClaw-sensitive execution chains over tools, skills, sessions, and external actions, while ATBench-CodeX targets trajectories in the OpenAI Codex / Codex-runtime setting over repositories, shells, patches, dependencies, approvals, and runtime policy boundaries. Our emphasis therefore falls on taxonomy customization, domain-specific risk coverage, and benchmark design under a shared ATBench generation framework.</p></details> | 18 pages, 3 figures |\n| **[Trajectory-based actuator identification via differentiable simulation](https://arxiv.org/abs/2604.10351v2)** | 2026-04-16 | <details><summary>Show</summary><p>Accurate actuation models are critical for bridging the gap between simulation and real robot behavior, yet obtaining high-fidelity actuator dynamics typically requires dedicated test stands and torque sensing. We present a trajectory-based actuator identification method that uses differentiable simulation to fit system-level actuator models from encoder motion alone. Identification is posed as a trajectory-matching problem: given commanded joint positions and measured joint angles and velocities, we optimize actuator and simulator parameters by backpropagating through the simulator, without torque sensors, current/voltage measurements, or access to embedded motor-control internals. The framework supports multiple model classes, ranging from compact structured parameterizations to neural actuator mappings, within a unified optimization pipeline. On held-out real-robot trajectories for a high-gear-ratio actuator with an embedded PD controller, the proposed torque-sensor-free identification achieves much tighter trajectory alignment than a supervised stand-trained baseline dominated by steady-state data, reducing mean absolute position error from 14.20 mrad to as low as 7.54 mrad (a 1.88-fold reduction). Finally, we demonstrate downstream impact for the same actuator class in a real-robot locomotion study: training policies with the refined actuator model increases travel distance by 46% and reduces rotational deviation by 75% relative to the baseline.</p></details> |  |\n| **[Tora3: Trajectory-Guided Audio-Video Generation with Physical Coherence](https://arxiv.org/abs/2604.09057v2)** | 2026-04-16 | <details><summary>Show</summary><p>Audio-video (AV) generation has recently made strong progress in perceptual quality and multimodal coherence, yet generating content with plausible motion-sound relations remains challenging. Existing methods often produce object motions that are visually unstable and sounds that are only loosely aligned with salient motion or contact events, largely because they lack an explicit motion-aware structure shared by video and audio generation. We present Tora3, a trajectory-guided AV generation framework that improves physical coherence by using object trajectories as a shared kinematic prior. Rather than treating trajectories as a video-only control signal, Tora3 uses them to jointly guide visual motion and acoustic events.
Specifically, we design a trajectory-aligned motion representation for video, a kinematic-audio alignment module driven by trajectory-derived second-order kinematic states, and a hybrid flow matching scheme that preserves trajectory fidelity in trajectory-conditioned regions while maintaining local coherence elsewhere. We further curate PAV, a large-scale AV dataset emphasizing motion-relevant patterns with automatically extracted motion annotations. Extensive experiments show that Tora3 improves motion realism, motion-sound synchronization, and overall AV generation quality over strong open-source baselines.</p></details> | <details><summary>12 pa...</summary><p>12 pages, 5 tables, 5 figures</p></details> |\n| **[FinTrace: Holistic Trajectory-Level Evaluation of LLM Tool Calling for Long-Horizon Financial Tasks](https://arxiv.org/abs/2604.10015v2)** | 2026-04-15 | <details><summary>Show</summary><p>Recent studies demonstrate that tool-calling capability enables large language models (LLMs) to interact with external environments for long-horizon financial tasks. While existing benchmarks have begun evaluating financial tool calling, they focus on limited scenarios and rely on call-level metrics that fail to capture trajectory-level reasoning quality. To address this gap, we introduce FinTrace, a benchmark comprising 800 expert-annotated trajectories spanning 34 real-world financial task categories across multiple difficulty levels. FinTrace employs a rubric-based evaluation protocol with nine metrics organized along four axes -- action correctness, execution efficiency, process quality, and output quality -- enabling fine-grained assessment of LLM tool-calling behavior. Our evaluation of 13 LLMs reveals that while frontier models achieve strong tool selection, all models struggle with information utilization and final answer quality, exposing a critical gap between invoking the right tools and reasoning effectively over their outputs. To move beyond diagnosis, we construct FinTrace-Training, the first trajectory-level preference dataset for financial tool-calling, containing 8,196 curated trajectories with tool-augmented contexts and preference pairs. We fine-tune Qwen-3.5-9B using supervised fine-tuning followed by direct preference optimization (DPO) and show that training on FinTrace-Training consistently improves intermediate reasoning metrics, with DPO more effectively suppressing failure modes. However, end-to-end answer quality remains a bottleneck, indicating that trajectory-level improvements do not yet fully propagate to final output quality.</p></details> |  |\n| **[Staying on Track: Efficient Trajectory Discovery with Adaptive Batch Sampling](https://arxiv.org/abs/2510.18099v2)** | 2026-04-15 | <details><summary>Show</summary><p>Bayesian optimization (BO) is a powerful framework for estimating parameters of expensive simulation models, particularly in settings where the likelihood is intractable and evaluations are costly. In stochastic models every simulation is run with a specific parameter set and an implicit or explicit random seed, where each parameter set and random seed combination generates an individual realization, or trajectory, sampled from an underlying random process. Existing BO approaches typically rely on summary statistics over the realizations, such as means, medians, or quantiles, potentially limiting their effectiveness when trajectory-level information is desired. 
We propose a trajectory-oriented BO method that incorporates a Gaussian process surrogate using both input parameters and random seeds as inputs, enabling direct inference at the trajectory level. Using a common random number approach, we define a surrogate-based likelihood over trajectories and introduce an adaptive Thompson Sampling algorithm that refines a fixed-size input grid through likelihood-based filtering and Metropolis-Hastings-based densification. This approach concentrates computation on statistically promising regions of the input space while balancing exploration and exploitation. We apply the method to stochastic epidemic models, a simple compartmental and a more computationally demanding agent-based model, demonstrating improved sampling efficiency and faster identification of data-consistent trajectories relative to parameter-only inference.</p></details> |  |\n| **[HINTBench: Horizon-agent Intrinsic Non-attack Trajectory Benchmark](https://arxiv.org/abs/2604.13954v1)** | 2026-04-15 | <details><summary>Show</summary><p>Existing agent-safety evaluation has focused mainly on externally induced risks. Yet agents may still enter unsafe trajectories under benign conditions. We study this complementary but underexplored setting through the lens of \\emph{intrinsic} risk, where intrinsic failures remain latent, propagate across long-horizon execution, and eventually lead to high-consequence outcomes. To evaluate this setting, we introduce \\emph{non-attack intrinsic risk auditing} and present \\textbf{HINTBench}, a benchmark of 629 agent trajectories (523 risky, 106 safe; 33 steps on average) supporting three tasks: risk detection, risk-step localization, and intrinsic failure-type identification. Its annotations are organized under a unified five-constraint taxonomy. Experiments reveal a substantial capability gap: strong LLMs perform well on trajectory-level risk detection, but their performance drops to below 35 Strict-F1 on risk-step localization, while fine-grained failure diagnosis proves even harder. Existing guard models transfer poorly to this setting. These findings establish intrinsic risk auditing as an open challenge for agent safety.</p></details> |  |\n| **[Bayesian Joint Modelling of Longitudinal Creatinine Trajectories in Children with Auto-Immune Disorders to Predict Paediatric Kidney Disease Risk in a Single Centre Study](https://arxiv.org/abs/2604.12740v1)** | 2026-04-14 | <details><summary>Show</summary><p>This study investigates the relationship between longitudinal serum creatinine measurements and the risk of adverse kidney outcomes in paediatric patients with auto-immune disorders at Great Ormond Street Hospital for Children NHS Foundation Trust, London. To jointly analyse repeated biomarker measurements and time-to-event outcomes, we employed a joint modelling framework that combines the creatinine trajectories with the time to death or diagnosis of acute kidney injury or chronic kidney disease. Covariates considered in analysis included demographic and clinical characteristics. The results demonstrate a strong association between evolving creatinine profiles and the risk of the composite event. Specifically, treatment with corticosteroids and calcium channel blockers was associated with an increased event risk, whereas immunosuppressive therapy was associated with a reduced risk. The longitudinal component showed that creatinine trajectories were significantly influenced by age and BMI z-score. 
To demonstrate the practical utility of the proposed framework, dynamic risk predictions were generated using patients' observed creatinine trajectories. Model performance was compared using model selection criteria, alongside area under the curve and Brier score to evaluate the accuracy of dynamic risk predictions. These predictions illustrate the potential of joint models to support personalised medicine and clinical decision making in paediatric nephrology through real-time risk assessment.</p></details> |  |\n| **[FeaXDrive: Feasibility-aware Trajectory-Centric Diffusion Planning for End-to-End Autonomous Driving](https://arxiv.org/abs/2604.12656v1)** | 2026-04-14 | <details><summary>Show</summary><p>End-to-end diffusion planning has shown strong potential for autonomous driving, but the physical feasibility of generated trajectories remains insufficiently addressed. In particular, generated trajectories may exhibit local geometric irregularities, violate trajectory-level kinematic constraints, or deviate from the drivable area, indicating that the commonly used noise-centric formulation in diffusion planning is not yet well aligned with the trajectory space where feasibility is more naturally characterized. To address this issue, we propose FeaXDrive, a feasibility-aware trajectory-centric diffusion planning method for end-to-end autonomous driving. The core idea is to treat the clean trajectory as the unified object for feasibility-aware modeling throughout the diffusion process. Built on this trajectory-centric formulation, FeaXDrive integrates adaptive curvature-constrained training to improve intrinsic geometric and kinematic feasibility, drivable-area guidance within reverse diffusion sampling to enhance consistency with the drivable area, and feasibility-aware GRPO post-training to further improve planning performance while balancing trajectory-space feasibility. Experiments on the NAVSIM benchmark show that FeaXDrive achieves strong closed-loop planning performance while substantially improving trajectory-space feasibility. These findings highlight the importance of explicitly modeling trajectory-space feasibility in end-to-end diffusion planning and provide a step toward more reliable and physically grounded autonomous driving planners.</p></details> | 21 pages, 6 figures |\n| **[SparseWorld-TC: Trajectory-Conditioned Sparse Occupancy World Model](https://arxiv.org/abs/2511.22039v3)** | 2026-04-14 | <details><summary>Show</summary><p>This paper introduces a novel architecture for trajectory-conditioned forecasting of future 3D scene occupancy. In contrast to methods that rely on variational autoencoders (VAEs) to generate discrete occupancy tokens, which inherently limit representational capacity, our approach predicts multi-frame future occupancy in an end-to-end manner directly from raw image features. Inspired by the success of attention-based transformer architectures in foundational vision and language models such as GPT and VGGT, we employ a sparse occupancy representation that bypasses the intermediate bird's eye view (BEV) projection and its explicit geometric priors. This design allows the transformer to capture spatiotemporal dependencies more effectively. By avoiding both the finite-capacity constraint of discrete tokenization and the structural limitations of BEV representations, our method achieves state-of-the-art performance on the nuScenes benchmark for 1-3 second occupancy forecasting, outperforming existing approaches by a significant margin. 
Furthermore, it demonstrates robust scene dynamics understanding, consistently delivering high accuracy under arbitrary future trajectory conditioning.</p></details> | <details><summary>Accep...</summary><p>Accepted by CVPR 2026 as an oral</p></details> |\n| **[Scalable Trajectory Generation for Whole-Body Mobile Manipulation](https://arxiv.org/abs/2604.12565v1)** | 2026-04-14 | <details><summary>Show</summary><p>Robots deployed in unstructured environments must coordinate whole-body motion -- simultaneously moving a mobile base and arm -- to interact with the physical world. This coupled mobility and dexterity yields a state space that grows combinatorially with scene and object diversity, demanding datasets far larger than those sufficient for fixed-base manipulation. Yet existing acquisition methods, including teleoperation and planning, are either labor-intensive or computationally prohibitive at scale. The core bottleneck is the lack of a scalable pipeline for generating large-scale, physically valid, coordinated trajectory data across diverse embodiments and environments. Here we introduce AutoMoMa, a GPU-accelerated framework that unifies AKR modeling, which consolidates base, arm, and object kinematics into a single chain, with parallelized trajectory optimization. AutoMoMa achieves 5,000 episodes per GPU-hour (over $80\\times$ faster than CPU-based baselines), producing a dataset of over 500k physically valid trajectories spanning 330 scenes, diverse articulated objects, and multiple robot embodiments. Prior datasets were forced to compromise on scale, diversity, or kinematic fidelity; AutoMoMa addresses all three simultaneously. Training downstream IL policies further reveals that even a single articulated-object task requires tens of thousands of demonstrations for SOTA methods to reach $\\approx 80\\%$ success, confirming that data scarcity -- not algorithmic limitations -- has been the binding constraint. AutoMoMa thus bridges high-performance planning and reliable IL-based control, providing the infrastructure previously missing for coordinated mobile manipulation research. By making large-scale, kinematically valid training data practical, AutoMoMa showcases generalizable whole-body robot policies capable of operating in the diverse, unstructured settings of the real world.</p></details> |  |\n| **[Forecasting the Past: Gradient-Based Distribution Shift Detection in Trajectory Prediction](https://arxiv.org/abs/2604.12425v1)** | 2026-04-14 | <details><summary>Show</summary><p>Trajectory prediction models often fail in real-world automated driving due to distributional shifts between training and test conditions. Such distributional shifts, whether behavioural or environmental, pose a critical risk by causing the model to make incorrect forecasts in unfamiliar situations. We propose a self-supervised method that trains a decoder in a post-hoc fashion on the self-supervised task of forecasting the second half of observed trajectories from the first half. The L2 norm of the gradient of this forecasting loss with respect to the decoder's final layer defines a score to identify distribution shifts. Our approach, first, does not affect the trajectory prediction model, ensuring no interference with the original prediction performance, and, second, demonstrates substantial improvements on distribution shift detection for trajectory prediction on the Shifts and Argoverse datasets.
Moreover, we show that this method can also be used for early detection of collisions of a deep Q-Network motion planner in the Highway simulator. Source code is available at https://github.com/Michedev/forecasting-the-past.</p></details> | <details><summary>Accep...</summary><p>Accepted at CVPRW SAIAD 2026</p></details> |\n| **[Resident fitness computation in linear time and other algorithmic aspects of interacting trajectories](https://arxiv.org/abs/2502.11561v3)** | 2026-04-14 | <details><summary>Show</summary><p>Systems of interacting trajectories were recently studied in~\\cite{HGSTW24}. Such a system of $[0,1]$-valued piecewise linear trajectories arises as a scaling limit of the system of logarithmic subpopulation sizes in a population-genetic model (more precisely, a Moran model) with mutation and selection. By definition, the resident fitness is initially 0 and afterwards it increases by the ultimate slope of each trajectory that reaches height 1. We show that although the interaction of $n$ trajectories may yield $\\Omega(n^2)$ slope changes in total, the resident fitness function can be computed algorithmically in $O(n)$ time. Our algorithm uses the so-called continued lines representation of the system of interacting trajectories. In the special case of Poissonian interacting trajectories where the birth times of the trajectories form a Poisson process and the initial slopes are random and i.i.d., we provide a linear bound on the expected total number of slope changes.</p></details> | 17 pages, 4 figures |\n| **[PILOT: Planning via Internalized Latent Optimization Trajectories for Large Language Models](https://arxiv.org/abs/2601.19917v2)** | 2026-04-14 | <details><summary>Show</summary><p>Strategic planning is critical for multi-step reasoning, yet compact Large Language Models (LLMs) often lack the capacity to formulate global strategies, leading to error propagation in long-horizon tasks. Our analysis reveals that LLMs possess latent reasoning capabilities that can be unlocked when conditioned on explicit plans from a teacher model; however, runtime reliance on external guidance is often impractical due to latency and availability constraints. To bridge this gap, we propose PILOT (Planning via Internalized Latent Optimization Trajectories), a non-invasive framework designed to internalize the strategic oversight of large models into intrinsic Latent Guidance. Instead of altering backbone weights, PILOT employs a lightweight Hyper-Network to synthesize a query-conditioned Latent Guidance vector. This vector acts as an internal steering mechanism, guiding the model's representations toward optimal reasoning paths. Extensive experiments on mathematical and coding benchmarks demonstrate that PILOT effectively stabilizes reasoning trajectories, consistently outperforming strong baselines (e.g., +8.9% on MATH500) with negligible inference latency.</p></details> |  |\n| **[Improved particle swarm optimization algorithm: multi-target trajectory optimization for swarm drones](https://arxiv.org/abs/2507.13647v2)** | 2026-04-14 | <details><summary>Show</summary><p>Real-time trajectory planning for unmanned aerial vehicles (UAVs) in dynamic environments remains a key challenge due to high computational demands and the need for fast, adaptive responses. Traditional Particle Swarm Optimization (PSO) methods, while effective for offline planning, often struggle with premature convergence and latency in real-time scenarios.
To overcome these limitations, we propose PE-PSO, an enhanced PSO-based online trajectory planner. The method introduces a persistent exploration mechanism to preserve swarm diversity and an entropy-based parameter adjustment strategy to dynamically adapt optimization behavior. UAV trajectories are modeled using B-spline curves, which ensure path smoothness while reducing optimization complexity. To extend this capability to UAV swarms, we develop a multi-agent framework that combines genetic algorithm (GA)-based task allocation with distributed PE-PSO, supporting scalable and coordinated trajectory generation. The distributed architecture allows for parallel computation and decentralized control, enabling effective cooperation among agents while maintaining real-time performance. Comprehensive simulations demonstrate that the proposed framework outperforms conventional PSO and other swarm-based planners across several metrics, including trajectory quality, energy efficiency, obstacle avoidance, and computation time. These results confirm the effectiveness and applicability of PE-PSO in real-time multi-UAV operations under complex environmental conditions.</p></details> | <details><summary>New e...</summary><p>New experiments have revealed systematic errors in the original data</p></details> |\n| **[Joint Trajectory and Resource Optimization for Aerial RIS-assisted Integrated TNT Networks](https://arxiv.org/abs/2604.12283v1)** | 2026-04-14 | <details><summary>Show</summary><p>Integrated terrestrial and non-terrestrial networks (ITNTNs) are regarded as a key architectural paradigm for sixth-generation (6G) wireless systems. This paper investigates a dual-aerial reconfigurable intelligent surface (RIS)-assisted ITNTN, where a terrestrial base station (TBS) and a satellite (SAT) jointly serve terrestrial and satellite users with the aid of an unmanned aerial vehicle (UAV)-mounted RIS and a high-altitude platform (HAP)-mounted RIS. We formulate an average sum-rate maximization problem by jointly optimizing the TBS and SAT precoders, the RIS phase shift matrices, and the three-dimensional trajectories of the UAV and the HAP, subject to transmit power, unit-modulus, and mobility constraints. The resulting optimization problem is highly non-convex due to the strong coupling among the transmit precoders, RIS phase shifts, and aerial platform mobility. To efficiently address this challenge, we propose a block coordinate descent (BCD) framework that integrates weighted minimum mean square error (WMMSE) optimization for precoder design, a manifold-based Riemannian conjugate gradient (RCG) method for RIS phase-shift optimization, and successive convex approximation (SCA) for trajectory optimization. The proposed algorithm is shown to converge to a stationary point. 
The simulation results show that the proposed joint design achieves an approximately $7.05\\%$ higher average sum-rate compared to the random RIS scheme, highlighting the effectiveness of dual-aerial RIS deployment and joint communication-mobility optimization in ITNTNs.</p></details> | <details><summary>This ...</summary><p>This work has been submitted to IEEE Transactions on Wireless Communications for possible publication</p></details> |\n| **[Joint Trajectory and Resource Optimization for Dual-aerial ARIS-assisted NOMA-TNT Networks](https://arxiv.org/abs/2604.12266v1)** | 2026-04-14 | <details><summary>Show</summary><p>Integrated terrestrial and non-terrestrial networks (ITNTNs) are envisioned as a key paradigm for sixth-generation (6G) wireless systems, enabling seamless global connectivity. In this paper, we investigate a dual-aerial active reconfigurable intelligent surface (ARIS)-assisted non-orthogonal multiple access (NOMA)-based ITNTN, where a terrestrial base station (TBS) and a satellite (SAT) simultaneously serve terrestrial and satellite users with the aid of a UAV-mounted ARIS and a HAP-mounted ARIS. Users are multiplexed via power-domain NOMA with a predefined successive interference cancellation (SIC) decoding order. We formulate an average sum-rate maximization problem by jointly optimizing transmit beamforming, ARIS coefficients, and the 3D trajectories of the UAV and HAP, subject to power, unit-modulus, ARIS power, and mobility constraints. The problem is highly non-convex due to coupled variables, nonlinear SINR expressions, ARIS amplification, and trajectory-dependent channels. To address this, a block coordinate descent (BCD)-based framework is proposed. Specifically, beamforming is optimized via WMMSE, ARIS phase shifts via a manifold-based RCG method, amplification factors via SCA, and trajectories via first-order approximations. The proposed algorithm is guaranteed to converge to a stationary point. Simulation results demonstrate that the proposed design achieves significant performance gains over benchmark schemes. In particular, it provides an average sum-rate improvement of approximately $8.44\\%$ over passive RIS under given power constraints, highlighting the benefits of dual-aerial ARIS and joint communication-mobility optimization.</p></details> | <details><summary>This ...</summary><p>This work has been submitted to IEEE Transactions on Communications for possible publication</p></details> |\n| **[Characterizing Human Semantic Navigation in Concept Production as Trajectories in Embedding Space](https://arxiv.org/abs/2602.05971v2)** | 2026-04-14 | <details><summary>Show</summary><p>Semantic representations can be framed as a structured, dynamic knowledge space through which humans navigate to retrieve and manipulate meaning. To investigate how humans traverse this geometry, we introduce a framework that represents concept production as navigation through embedding space. Using different transformer text embedding models, we construct participant-specific semantic trajectories based on cumulative embeddings and extract geometric and dynamical metrics, including distance to next, distance to centroid, entropy, velocity, and acceleration. These measures capture both scalar and directional aspects of semantic navigation, providing a computationally grounded view of semantic representation search as movement in a geometric space.
We evaluate the framework on four datasets across different languages, spanning different property generation tasks: neurodegenerative verbal fluency, swear-word verbal fluency, and property listing tasks in Italian and in German. Across these contexts, our approach distinguishes between clinical groups and concept types, offering a mathematical framework that requires minimal human intervention compared to typical labor-intensive linguistic pre-processing methods. Comparison with a non-cumulative approach reveals that cumulative embeddings work best for longer trajectories, whereas shorter ones may provide too little context, favoring the non-cumulative alternative. Critically, different embedding models yielded similar results, highlighting similarities between learned representations despite distinct training pipelines. By framing semantic navigation as a structured trajectory through embedding space, this work bridges cognitive modeling with learned representations, establishing a pipeline for quantifying semantic representation dynamics with applications in clinical research, cross-linguistic analysis, and the assessment of artificial cognition.</p></details> | <details><summary>10 pa...</summary><p>10 pages, 6 figures (excluding refs/appendix). Accepted to ICLR 2026</p></details> |\n| **[AGMA: Adaptive Gaussian Mixture Anchors for Prior-Guided Multimodal Human Trajectory Forecasting](https://arxiv.org/abs/2602.04204v2)** | 2026-04-14 | <details><summary>Show</summary><p>Human trajectory forecasting requires capturing the multimodal nature of pedestrian behavior. However, existing approaches suffer from prior misalignment. Their learned or fixed priors often fail to capture the full distribution of plausible futures, limiting both prediction accuracy and diversity. We theoretically establish that prediction error is lower-bounded by prior quality, making prior modeling a key performance bottleneck. Guided by this insight, we propose AGMA (Adaptive Gaussian Mixture Anchors), which constructs expressive priors through two stages: extracting diverse behavioral patterns from training data and distilling them into a scene-adaptive global prior for inference. Extensive experiments on ETH-UCY, Stanford Drone, and JRDB datasets demonstrate that AGMA achieves state-of-the-art performance, confirming the critical role of high-quality priors in trajectory forecasting.</p></details> | <details><summary>Withd...</summary><p>Withdrawn for substantial revision and will be re-uploaded as a new manuscript</p></details> |\n| **[Uncertainty Guided Exploratory Trajectory Optimization for Sampling-Based Model Predictive Control](https://arxiv.org/abs/2604.12149v1)** | 2026-04-13 | <details><summary>Show</summary><p>Trajectory optimization depends heavily on initialization. In particular, sampling-based approaches are highly sensitive to initial solutions, and limited exploration frequently leads them to converge to local minima in complex environments. We present Uncertainty Guided Exploratory Trajectory Optimization (UGE-TO), a trajectory optimization algorithm that generates well-separated samples to achieve better coverage of the configuration space. UGE-TO represents trajectories as probability distributions induced by uncertainty ellipsoids. Unlike sampling-based approaches that explore only in the action space, this representation captures the effects of both system dynamics and action selection. 
By incorporating the impact of dynamics, in addition to the action space, into our distributions, our method enhances trajectory diversity by enforcing distributional separation via the Hellinger distance between these distributions. It enables a systematic exploration of the configuration space and improves robustness against local minima. Further, we present UGE-MPC, which integrates UGE-TO into sampling-based model predictive control methods. Experiments demonstrate that UGE-MPC achieves higher exploration and faster convergence in trajectory optimization compared to baselines under the same sampling budget, achieving 72.1% faster convergence in obstacle-free environments and 66% faster convergence with a 6.7% higher success rate in the cluttered environment compared to the best-performing baseline. Additionally, we validate the approach through a range of simulation scenarios and real-world experiments. Our results indicate that UGE-MPC has higher success rates and faster convergence, especially in environments that demand significant deviations from nominal trajectories to avoid failures. The project and code are available at https://ogpoyrazoglu.github.io/cuniform_sampling/.</p></details> | <details><summary>This ...</summary><p>This paper has been accepted for presentation at the IEEE International Conference on Robotics and Automation (ICRA) 2026</p></details> |\n| **[Temporal Flattening in LLM-Generated Text: Comparing Human and LLM Writing Trajectories](https://arxiv.org/abs/2604.12097v1)** | 2026-04-13 | <details><summary>Show</summary><p>Large language models (LLMs) are increasingly used in daily applications, from content generation to code writing, where each interaction treats the model as stateless, generating responses independently without memory. Yet human writing is inherently longitudinal: authors' styles and cognitive states evolve across months and years. This raises a central question: can LLMs reproduce such temporal structure across extended time periods? We construct and publicly release a longitudinal dataset of 412 human authors and 6,086 documents spanning 2012--2024 across three domains (academic abstracts, blogs, news) and compare them to trajectories generated by three representative LLMs under standard and history-conditioned generation settings. Using drift and variance-based metrics over semantic, lexical, and cognitive-emotional representations, we find temporal flattening in LLM-generated text. LLMs produce greater lexical diversity but exhibit substantially reduced semantic and cognitive-emotional drift relative to humans. These differences are highly predictive: temporal variability patterns alone achieve 94% accuracy and 98% ROC-AUC in distinguishing human from LLM trajectories. Our results demonstrate that temporal flattening persists regardless of whether LLMs generate independently or with access to incremental history, revealing a fundamental property of current deployment paradigms. This gap has direct implications for applications requiring authentic temporal structure, such as synthetic training data and longitudinal text modeling.</p></details> | <details><summary>25 pa...</summary><p>25 pages, 6 figures. 
To appear in Findings of ACL 2026</p></details> |\n| **[VISTA: Validation-Informed Trajectory Adaptation via Self-Distillation](https://arxiv.org/abs/2604.12044v1)** | 2026-04-13 | <details><summary>Show</summary><p>Deep learning models may converge to suboptimal solutions despite strong validation accuracy, masking an optimization failure we term Trajectory Deviation. This is because as training proceeds, models can abandon high generalization states for specific data sub-populations, thus discarding previously learned latent features without triggering classical overfitting signals. To address this problem, we introduce VISTA, an online self-distillation framework that enforces consistency along the optimization trajectory. Using a validation-informed Marginal Coverage score, VISTA identifies expert anchors, which are earlier model states that retain specialized competence over distinct data regions. A coverage-weighted ensemble of these anchors is integrated online during training, regularizing the loss landscape and preserving mastered knowledge. When evaluated across multiple benchmarks, VISTA demonstrates improved robustness and generalization over standard training and prior self-distillation methods, while a lightweight implementation reduces storage overhead by 90% without performance loss.</p></details> |  |\n| **[Low-rank Optimization Trajectories Modeling for LLM RLVR Acceleration](https://arxiv.org/abs/2604.11446v1)** | 2026-04-13 | <details><summary>Show</summary><p>Recently, scaling reinforcement learning with verifiable rewards (RLVR) for large language models (LLMs) has emerged as an effective training paradigm for significantly improving model capabilities, which requires guiding the model to perform extensive exploration and learning, leading to substantial computational overhead that has become a key challenge. To reduce the number of training steps, prior work performs linear extrapolation of model parameters. However, the dynamics of model parameter updates during RLVR training remain insufficiently understood. To further investigate the evolution of LLMs during RLVR training, we conduct empirical experiments and find that the rank-1 subspace of the model does not evolve linearly, and its dominance over the original parameters is further amplified during LoRA training. Based on the above insights, we propose the \\textbf{N}onlinear \\textbf{Ext}rapolation of low-rank trajectories (\\textbf{NExt}), a novel framework that models and extrapolates low-rank parameter trajectories in a nonlinear manner. Concretely, we first train the model using LoRA and extract the rank-1 subspace of parameter differences at multiple training steps, which is then used for the subsequent nonlinear extrapolation. Afterward, we utilize the extracted rank-1 subspace to train a predictor that models the trajectory of parameter updates during RLVR, and then perform the predict-extend process to extrapolate model parameters, accelerating RLVR. To further study and understand NExt, we conduct comprehensive experiments that demonstrate the effectiveness and robustness of the method. Our method reduces computational overhead by approximately 37.5\\% while remaining compatible with a wide range of RLVR algorithms and tasks. 
We release our code at https://github.com/RUCAIBox/NExt.</p></details> | Work in progress |\n| **[dTRPO: Trajectory Reduction in Policy Optimization of Diffusion Large Language Models](https://arxiv.org/abs/2603.18806v2)** | 2026-04-13 | <details><summary>Show</summary><p>Diffusion Large Language Models (dLLMs) introduce a new paradigm for language generation, which in turn presents new challenges for aligning them with human preferences. In this work, we aim to improve the policy optimization for dLLMs by reducing the cost of the trajectory probability calculation, thereby enabling scaled-up offline policy training. We prove that: (i) under reference policy regularization, the probability ratio of the newly unmasked tokens is an unbiased estimate of that of intermediate diffusion states, and (ii) the probability of the full trajectory can be effectively estimated with a single forward pass of a re-masked final state. By integrating these two trajectory reduction strategies into a policy optimization objective, we propose Trajectory Reduction Policy Optimization (dTRPO). We evaluate dTRPO on 7B dLLMs across instruction-following and reasoning benchmarks. Results show that it substantially improves the core performance of state-of-the-art dLLMs, achieving gains of up to 9.6% on STEM tasks, up to 4.3% on coding tasks, and up to 3.0% on instruction-following tasks. Moreover, dTRPO exhibits strong training efficiency due to its offline, single-forward nature, and achieves improved generation efficiency through high-quality outputs.</p></details> |  |\n| **[Learning from Contrasts: Synthesizing Reasoning Paths from Diverse Search Trajectories](https://arxiv.org/abs/2604.11365v1)** | 2026-04-13 | <details><summary>Show</summary><p>Monte Carlo Tree Search (MCTS) has been widely used for automated reasoning data exploration, but current supervision extraction methods remain inefficient. Standard approaches retain only the single highest-reward trajectory, discarding the comparative signals present in the many explored paths. Here we introduce \\textbf{Contrastive Reasoning Path Synthesis (CRPS)}, a framework that transforms supervision extraction from a filtering process into a synthesis procedure. CRPS uses a structured reflective process to analyze the differences between high- and low-quality search trajectories, extracting explicit information about strategic pivots and local failure modes. These insights guide the synthesis of reasoning chains that incorporate success patterns while avoiding identified pitfalls. We show empirically that models fine-tuned on just 60K CRPS-synthesized examples match or exceed the performance of baselines trained on 590K examples derived from standard rejection sampling, a 20$\\times$ reduction in dataset size. Furthermore, CRPS improves generalization on out-of-domain benchmarks, demonstrating that learning from the contrast between success and failure produces more transferable reasoning capabilities than learning from success alone.</p></details> |  |\n| **[Reliable and Real-Time Highway Trajectory Planning via Hybrid Learning-Optimization Frameworks](https://arxiv.org/abs/2508.04436v2)** | 2026-04-13 | <details><summary>Show</summary><p>Autonomous highway driving involves high-speed safety risks due to limited reaction time, where rare but dangerous events may lead to severe consequences. This places stringent requirements on trajectory planning in terms of both reliability and computational efficiency. 
This paper proposes a hybrid highway trajectory planning (H-HTP) framework that integrates learning-based adaptability with optimization-based formal safety guarantees. The key design principle is a deliberate division of labor: a learning module generates a traffic-adaptive velocity profile, while all safety-critical decisions, including collision avoidance and kinematic feasibility, are delegated to a Mixed-Integer Quadratic Program (MIQP). This design ensures that formal safety constraints are always enforced, regardless of the complexity of multi-vehicle interactions. A linearization strategy for the vehicle geometry substantially reduces the number of integer variables, enabling real-time optimization without sacrificing formal safety guarantees. Experiments on the HighD dataset demonstrate that H-HTP achieves a scenario success rate above 97% with an average planning-cycle time of approximately 54 ms, reliably producing smooth, kinematically feasible, and collision-free trajectories in safety-critical highway scenarios.</p></details> |  |\n| **[Mobile GUI Agent Privacy Personalization with Trajectory Induced Preference Optimization](https://arxiv.org/abs/2604.11259v1)** | 2026-04-13 | <details><summary>Show</summary><p>Mobile GUI agents powered by Multimodal Large Language Models (MLLMs) can execute complex tasks on mobile devices. Despite this progress, most existing systems still optimize task success or efficiency, neglecting users' privacy personalization. In this paper, we study the often-overlooked problem of agent personalization. We observe that personalization can induce systematic structural heterogeneity in execution trajectories. For example, privacy-first users often prefer protective actions, e.g., refusing permissions, logging out, and minimizing exposure, leading to logically different execution trajectories from utility-first users. Such variable-length and structurally different trajectories make standard preference optimization unstable and less informative. To address this issue, we propose Trajectory Induced Preference Optimization (TIPO), which uses preference-intensity weighting to emphasize key privacy-related steps and padding gating to suppress alignment noise. Results on our Privacy Preference Dataset show that TIPO improves persona alignment and distinction while preserving strong task executability, achieving 65.60% SR, 46.22 Compliance, and 66.67% PD, outperforming existing optimization methods across various GUI tasks. The code and dataset will be publicly released at https://github.com/Zhixin-L/TIPO.</p></details> | <details><summary>10 pa...</summary><p>10 pages, 6 figures, 3 tables</p></details> |\n| **[MapATM: Enhancing HD Map Construction through Actor Trajectory Modeling](https://arxiv.org/abs/2604.11081v1)** | 2026-04-13 | <details><summary>Show</summary><p>High-definition (HD) mapping tasks, which perform lane detection and prediction, are extremely challenging due to non-ideal conditions such as view occlusions, distant lane visibility, and adverse weather conditions. Those conditions often result in compromised lane detection accuracy and reduced reliability within autonomous driving systems. To address these challenges, we introduce MapATM, a novel deep neural network that effectively leverages historical actor trajectory information to improve lane detection accuracy, where actors refer to moving vehicles. 
By utilizing actor trajectories as structural priors for road geometry, MapATM achieves substantial performance enhancements, notably increasing AP by 4.6 for lane dividers and mAP by 2.6 on the challenging NuScenes dataset, representing relative improvements of 10.1% and 6.1%, respectively, compared to strong baseline methods. Extensive qualitative evaluations further demonstrate MapATM's capability to consistently maintain stable and robust map reconstruction across diverse and complex driving scenarios, underscoring its practical value for autonomous driving applications.</p></details> | <details><summary>6 pag...</summary><p>6 pages, 4 figures, 5 tables</p></details> |\n| **[VERTIGO: Visual Preference Optimization for Cinematic Camera Trajectory Generation](https://arxiv.org/abs/2604.02467v2)** | 2026-04-13 | <details><summary>Show</summary><p>Cinematic camera control relies on a tight feedback loop between director and cinematographer, where camera motion and framing are continuously reviewed and refined. Recent generative camera systems can produce diverse, text-conditioned trajectories, but they lack this \"director in the loop\" and have no explicit supervision of whether a shot is visually desirable. This results in in-distribution camera motion but poor framing, off-screen characters, and undesirable visual aesthetics. In this paper, we introduce VERTIGO, the first framework for visual preference optimization of camera trajectory generators. Our framework leverages a real-time graphics engine (Unity) to render 2D visual previews from generated camera motion. A cinematically fine-tuned vision-language model then scores these previews using our proposed cyclic semantic similarity mechanism, which aligns renders with text prompts. This process provides the visual preference signals for Direct Preference Optimization (DPO) post-training. Both quantitative evaluations and user studies on Unity renders and diffusion-based Camera-to-Video pipelines show consistent gains in condition adherence, framing quality, and perceptual realism. Notably, VERTIGO reduces the character off-screen rate from 38% to nearly 0% while preserving the geometric fidelity of camera motion. User study participants further prefer VERTIGO over baselines across composition, consistency, prompt adherence, and aesthetic quality, confirming the perceptual benefits of our visual preference post-training.</p></details> | <details><summary>28 pa...</summary><p>28 pages, 10 figures, ECCV 2026</p></details> |\n| **[From Topology to Trajectory: LLM-Driven World Models For Supply Chain Resilience](https://arxiv.org/abs/2604.11041v1)** | 2026-04-13 | <details><summary>Show</summary><p>Semiconductor supply chains face unprecedented resilience challenges amidst global geopolitical turbulence. Conventional Large Language Model (LLM) planners, when confronting such non-stationary \"Policy Black Swan\" events, frequently suffer from Decision Paralysis or a severe Grounding Gap due to the absence of physical environmental modeling. This paper introduces ReflectiChain, a cognitive agentic framework tailored for resilient macroeconomic supply chain planning. The core innovation lies in the integration of Latent Trajectory Rehearsal powered by a generative world model, which couples reflection-in-action (System 2 deliberation) with delayed reflection-on-action. Furthermore, we leverage a Retrospective Agentic RL mechanism to enable autonomous policy evolution during the deployment phase (test-time). 
Evaluations conducted on our high-fidelity benchmark, Semi-Sim, demonstrate that under extreme scenarios such as export bans and material shortages, ReflectiChain achieves a 250% improvement in average step rewards over the strongest LLM baselines. It successfully restores the Operability Ratio (OR) from a deficient 13.3% to over 88.5% while ensuring robust gradient convergence. Ablation studies further underscore that the synergy between physical grounding constraints and double-loop learning is fundamental to bridging the gap between semantic reasoning and physical reality for long-horizon strategic planning.</p></details> |  |\n| **[Online Covariance Estimation in Averaged SGD: Improved Batch-Mean Rates and Minimax Optimality via Trajectory Regression](https://arxiv.org/abs/2604.10814v1)** | 2026-04-12 | <details><summary>Show</summary><p>We study online covariance matrix estimation for Polyak--Ruppert averaged stochastic gradient descent (SGD). The online batch-means estimator of Zhu, Chen and Wu (2023) achieves an operator-norm convergence rate of $O(n^{-(1-α)/4})$, which yields $O(n^{-1/8})$ at the optimal learning-rate exponent $α\\rightarrow 1/2^+$. A rigorous per-block bias analysis reveals that re-tuning the block-growth parameter improves the batch-means rate to $O(n^{-(1-α)/3})$, achieving $O(n^{-1/6})$. The modified estimator requires no Hessian access and preserves $O(d^2)$ memory. We provide a complete error decomposition into variance, stationarity bias, and nonlinearity bias components. A weighted-averaging variant that avoids hard truncation is also discussed. We establish the minimax rate $Θ(n^{-(1-α)/2})$ for Hessian-free covariance estimation from the SGD trajectory: a Le Cam lower bound gives $Ω(n^{-(1-α)/2})$, and a trajectory-regression estimator--which estimates the Hessian by regressing SGD increments on iterates--achieves $O(n^{-(1-α)/2})$, matching the lower bound. The construction reveals that the bottleneck is the sublinear accumulation of information about the Hessian from the SGD drift.</p></details> |  |\n| **[Adaptive H-EFT-VA: A Provably Safe Trajectory Through the Trainability-Expressibility Landscape of Variational Quantum Algorithms](https://arxiv.org/abs/2604.10607v1)** | 2026-04-12 | <details><summary>Show</summary><p>H-EFT-VA established a physics-informed solution to the Barren Plateau (BP) problem via a hierarchical EFT UV-cutoff, guaranteeing gradient variance in $\\Omega(1/\\mathrm{poly}(N))$. However, localization restricts the ansatz to a polynomial subspace, creating a reference-state gap for states distant from $|0\\rangle^N$. We introduce Adaptive H-EFT-VA (A-H-EFT) to navigate the trainability-expressibility tradeoff by expanding the reachable Hilbert space along a safe trajectory. Gradient variance is maintained in $\\Omega(1/\\mathrm{poly}(N))$ if $\\sigma(t) \\le 0.5/\\sqrt{LN}$ (Theorem 1). A Safe Expansion Corollary and Monotone Growth Lemma confirm expansion without discontinuous jumps. Benchmarking across 16 experiments (up to $N=14$) shows A-H-EFT achieves fidelity $F=0.54$, doubling static H-EFT-VA ($F=0.27$) and outperforming HEA ($F \\approx 0.01$), with gradient variance $\\ge 0.5$ throughout. For Heisenberg XXZ ($\\Delta_{\\mathrm{ref}}=1$), A-H-EFT identifies the negative ground state while static methods fail. Results are statistically significant ($p < 10^{-37}$). Robustness over three decades of hyperparameters enables deployment without search. 
This is the first rigorously bounded trajectory through the VQA landscape.</p></details> | 17 figures |\n| **[Early Decisions Matter: Proximity Bias and Initial Trajectory Shaping in Non-Autoregressive Diffusion Language Models](https://arxiv.org/abs/2604.10567v1)** | 2026-04-12 | <details><summary>Show</summary><p>Diffusion-based language models (dLLMs) have emerged as a promising alternative to autoregressive language models, offering the potential for parallel token generation and bidirectional context modeling. However, harnessing this flexibility for fully non-autoregressive decoding remains an open question, particularly for reasoning and planning tasks. In this work, we investigate non-autoregressive decoding in dLLMs by systematically analyzing its inference dynamics along the temporal axis. Specifically, we uncover an inherent failure mode in confidence-based non-autoregressive generation stemming from a strong proximity bias: the tendency for the denoising order to concentrate on spatially adjacent tokens. This local dependency leads to spatial error propagation, rendering the entire trajectory critically contingent on the initial unmasking position. Leveraging this insight, we present a minimal-intervention approach that guides early token selection, employing a lightweight planner and end-of-sequence temperature annealing. We thoroughly evaluate our method on various reasoning and planning tasks and observe substantial overall improvement over existing heuristic baselines without significant computational overhead.</p></details> |  |\n| **[Tracing Prompt-Level Trajectories to Understand Student Learning with AI in Programming Education](https://arxiv.org/abs/2604.10400v1)** | 2026-04-12 | <details><summary>Show</summary><p>As AI tools such as ChatGPT enter programming classrooms, students encounter differing rules across courses and instructors, which shape how they use AI and leave them with unequal capabilities for leveraging it. We investigate how students engaged with AI in an introductory Python assignment, analyzing student-LLM chat histories and final code submissions from 163 students. We examined prompt-level strategies, traced trajectories of interaction, and compared AI-generated code with student submissions. We identified trajectories ranging from full delegation to iterative refinement, with hybrid forms in between. Although most students directly copied AI-generated code in their submission, many students scaffolded the code generation through iterative refinement. We also contrasted interaction patterns with assignment outcomes and course performance. Our findings show that prompting trajectories serve as promising windows into students' self-regulation and learning orientation. We draw design implications for educational AI systems that promote personalized and productive student-AI collaborative learning.</p></details> |  |\n| **[Learning to Unscramble: Simplifying Symbolic Expressions via Self-Supervised Oracle Trajectories](https://arxiv.org/abs/2603.11164v2)** | 2026-04-11 | <details><summary>Show</summary><p>We present a new self-supervised machine learning approach for symbolic simplification of complex mathematical expressions. Training data is generated by scrambling simple expressions and recording the inverse operations, creating oracle trajectories that provide both goal states and explicit paths to reach them. 
A permutation-equivariant, transformer-based policy network is then trained on this data step-wise to predict the oracle action given the input expression. We demonstrate this approach on two problems in high-energy physics: dilogarithm reduction and spinor-helicity scattering amplitude simplification. In both cases, our trained policy network achieves near perfect solve rates across a wide range of difficulty levels, substantially outperforming prior approaches based on reinforcement learning and end-to-end regression. When combined with contrastive grouping and beam search, our model achieves a 100\\% full simplification rate on a representative selection of 5-point gluon tree-level amplitudes in Yang-Mills theory, including expressions with over 200 initial terms.</p></details> | <details><summary>14 pa...</summary><p>14 pages, 6 figures, 2 tables; work done in collaboration with Claude Code; v2: refs added</p></details> |\n| **[A Coordinate-Invariant Local Representation of Motion and Force Trajectories for Identification and Generalization Across Coordinate Systems](https://arxiv.org/abs/2604.10241v1)** | 2026-04-11 | <details><summary>Show</summary><p>Identifying the trajectories of rigid bodies and of interaction forces is essential for a wide range of tasks in robotics, biomechanics, and related domains. These tasks include trajectory segmentation, recognition, and prediction. For these tasks, a key challenge lies in achieving consistent results when the trajectory is expressed in different coordinate systems. A way to address this challenge is to utilize trajectory models that can generalize across coordinate systems. The focus of this paper is on such trajectory models obtained by transforming the trajectory into a coordinate-invariant representation. However, coordinate-invariant representations often suffer from sensitivity to measurement noise and the manifestation of singularities in the representation, where the representation is not uniquely defined. This paper aims to address this limitation by introducing the novel Dual-Upper-Triangular Invariant Representation (DUTIR), with improved robustness to singularities, along with its computational algorithm. The proposed representation is formulated at a level of abstraction that makes it applicable to both rigid-body trajectories and interaction-force trajectories, hence making it a versatile tool for robotics, biomechanics, and related domains.</p></details> | <details><summary>This ...</summary><p>This preprint has been accepted for presentation at the 17th World Symposium on the Algorithmic Foundations of Robotics (WAFR 2026). The preprint corresponds to the version submitted for peer review</p></details> |\n| **[MAVEN-T: Multi-Agent enVironment-aware Enhanced Neural Trajectory predictor with Reinforcement Learning](https://arxiv.org/abs/2604.10169v1)** | 2026-04-11 | <details><summary>Show</summary><p>Trajectory prediction remains a critical yet challenging component in autonomous driving systems, requiring sophisticated reasoning capabilities while meeting strict real-time deployment constraints. While knowledge distillation has demonstrated effectiveness in model compression, existing approaches often fail to preserve complex decision-making capabilities, particularly in dynamic multi-agent scenarios. This paper introduces MAVEN-T, a teacher-student framework that achieves state-of-the-art trajectory prediction through complementary architectural co-design and progressive distillation. 
The teacher employs hybrid attention mechanisms for maximum representational capacity, while the student uses efficient architectures optimized for deployment. Knowledge transfer is performed via multi-granular distillation with adaptive curriculum learning that dynamically adjusts complexity based on performance. Importantly, the framework incorporates reinforcement learning to overcome the imitation ceiling of traditional distillation, enabling the student to verify, refine, and optimize teacher knowledge through dynamic environmental interaction, potentially achieving more robust decision-making than the teacher itself. Extensive experiments on NGSIM and highD datasets demonstrate 6.2x parameter compression and 3.7x inference speedup while maintaining state-of-the-art accuracy, establishing a new paradigm for deploying sophisticated reasoning models under resource constraints.</p></details> |  |\n| **[Beyond Fluency: Toward Reliable Trajectories in Agentic IR](https://arxiv.org/abs/2604.04269v2)** | 2026-04-11 | <details><summary>Show</summary><p>Information Retrieval is shifting from passive document ranking toward autonomous agentic workflows that operate in multi-step Reason-Act-Observe loops. In such long-horizon trajectories, minor early errors can cascade, leading to functional misalignment between internal reasoning and external tool execution despite continued linguistic fluency. This position paper synthesizes failure modes observed in industrial agentic systems, categorizing errors across planning, retrieval, reasoning, and execution. We argue that safe deployment requires moving beyond endpoint accuracy toward trajectory integrity and causal attribution. To address compounding error and deceptive fluency, we propose verification gates at each interaction unit and advocate systematic abstention under calibrated uncertainty. Reliable Agentic IR systems must prioritize process correctness and grounded execution over plausible but unverified completion.</p></details> |  |\n| **[Ontological Trajectory Forecasting via Finite Semigroup Iteration and Lie Algebra Approximation in Geopolitical Knowledge Graphs](https://arxiv.org/abs/2604.10087v1)** | 2026-04-11 | <details><summary>Show</summary><p>We present EL-DRUIN, an ontological reasoning system for geopolitical intelligence analysis that combines formal ontology, finite semigroup algebra, and Lie algebra approximation to forecast long-run relationship trajectories. Current LLM-based political analysis systems operate as summarisation engines, producing outputs bounded by textual pattern matching. EL-DRUIN departs from this paradigm by modelling geopolitical relationships as states in a finite set of named Dynamic Patterns, composing patterns via a semigroup operation whose structure constants are defined by an explicit composition table, and embedding each pattern as a vector in an 8-dimensional semantic Lie algebra space. Forward simulation iterates this semigroup operation, yielding reachable pattern sets at each discrete timestep; convergence to idempotent absorbing states (fixed points of the composition) constitutes the predicted long-run attractor. Bayesian posterior weights combine ontology-derived confidence priors with a Lie similarity term measuring the cosine similarity between the vector sum of composing patterns and the target pattern vector, providing interpretable, calibrated probabilities that are not self-reported by a language model. 
Bifurcation points -- steps at which two candidate attractors have near-equal posterior mass -- are detected and exposed to downstream analysis. We demonstrate the framework on six geopolitical scenarios, including US-China technology decoupling and the Taiwan Strait military coercion trajectory. The architecture is publicly available as an open-source system with a Streamlit frontend exposing full computation traces, Bayesian posterior breakdowns, and 8D ontological state vectors.</p></details> | <details><summary>18 pa...</summary><p>18 pages. Code and system available at https://github.com/wuqihang-brave/El-druin</p></details> |\n| **[AccidentSim: Generating Vehicle Collision Videos with Physically Realistic Collision Trajectories from Real-World Accident Reports](https://arxiv.org/abs/2503.20654v4)** | 2026-04-11 | <details><summary>Show</summary><p>Collecting real-world vehicle accident videos for autonomous driving research is challenging due to their rarity and complexity. While existing driving video generation methods may produce visually realistic videos, they often fail to deliver physically realistic simulations because they lack the capability to generate accurate post-collision trajectories. In this paper, we introduce AccidentSim, a novel framework that generates physically realistic vehicle collision videos by extracting and utilizing the physical clues and contextual information available in real-world vehicle accident reports. Specifically, AccidentSim leverages a reliable physical simulator to replicate post-collision vehicle trajectories from the physical and contextual information in the accident reports and to build a vehicle collision trajectory dataset. This dataset is then used to fine-tune a language model, enabling it to respond to user prompts and predict physically consistent post-collision trajectories across various driving scenarios based on user descriptions. Finally, we employ Neural Radiance Fields (NeRF) to render high-quality backgrounds, merging them with the foreground vehicles that exhibit physically realistic trajectories to generate vehicle collision videos. Experimental results demonstrate that the videos produced by AccidentSim excel in both visual and physical authenticity.</p></details> |  |\n| **[YUV20K: A Complexity-Driven Benchmark and Trajectory-Aware Alignment Model for Video Camouflaged Object Detection](https://arxiv.org/abs/2604.09985v1)** | 2026-04-11 | <details><summary>Show</summary><p>Video Camouflaged Object Detection (VCOD) is currently constrained by the scarcity of challenging benchmarks and the limited robustness of models against erratic motion dynamics. Existing methods often struggle with Motion-Induced Appearance Instability and Temporal Feature Misalignment caused by complex motion scenarios. To address the data bottleneck, we present YUV20K, a pixel-level annotated, complexity-driven VCOD benchmark. Comprising 24,295 annotated frames across 91 scenes and 47 species, it specifically targets challenging scenarios such as large-displacement motion, camera motion, and four other scenario types. On the methodological front, we propose a novel framework featuring two key modules: Motion Feature Stabilization (MFS) and Trajectory-Aware Alignment (TAA). The MFS module utilizes frame-agnostic Semantic Basis Primitives to stabilize features, while the TAA module leverages trajectory-guided deformable sampling to ensure precise temporal alignment. 
Extensive experiments demonstrate that our method significantly outperforms state-of-the-art competitors on existing datasets and establishes a new baseline on the challenging YUV20K. Notably, our framework exhibits superior cross-domain generalization and robustness when confronting complex spatiotemporal scenarios. Our code and dataset will be available at https://github.com/K1NSA/YUV20K</p></details> |  |\n| **[Neuro-Oracle: A Trajectory-Aware Agentic RAG Framework for Interpretable Epilepsy Surgical Prognosis](https://arxiv.org/abs/2604.14216v1)** | 2026-04-10 | <details><summary>Show</summary><p>Predicting post-surgical seizure outcomes in pharmacoresistant epilepsy is a clinical challenge. Conventional deep-learning approaches operate on static, single-timepoint pre-operative scans, omitting longitudinal morphological changes. We propose \\emph{Neuro-Oracle}, a three-stage framework that: (i) distils pre-to-post-operative MRI changes into a compact 512-dimensional trajectory vector using a 3D Siamese contrastive encoder; (ii) retrieves historically similar surgical trajectories from a population archive via nearest-neighbour search; and (iii) synthesises a natural-language prognosis grounded in the retrieved evidence using a quantized Llama-3-8B reasoning agent. Evaluations are conducted on the public EPISURG dataset ($N{=}268$ longitudinally paired cases) using five-fold stratified cross-validation. Since ground-truth seizure-freedom scores are unavailable, we utilize a clinical proxy label based on the resection type. We acknowledge that the network representations may potentially learn the anatomical features of the resection cavities (i.e., temporal versus non-temporal locations) rather than true prognostic morphometry. Our current evaluation thus serves mainly as a proof-of-concept for the trajectory-aware retrieval architecture. Trajectory-based classifiers achieve AUC values between 0.834 and 0.905, compared with 0.793 for a single-timepoint ResNet-50 baseline. The Neuro-Oracle agent (M5) matches the AUC of purely discriminative trajectory classifiers (0.867) while producing structured justifications with zero observed hallucinations under our audit protocol. A Siamese Diversity Ensemble (M6) of trajectory-space classifiers attains an AUC of 0.905 without language-model overhead.</p></details> |  |\n| **[LLM4Delay: Flight Delay Prediction via Cross-Modality Adaptation of Large Language Models and Aircraft Trajectory Representation](https://arxiv.org/abs/2510.23636v3)** | 2026-04-10 | <details><summary>Show</summary><p>Flight delay prediction has become a key focus in air traffic management (ATM), as delays reflect inefficiencies in the system. This paper proposes LLM4Delay, a large language model (LLM)-based framework for predicting flight delays from the perspective of air traffic controllers monitoring aircraft after they enter the terminal maneuvering area (TMA). LLM4Delay is designed to integrate textual aeronautical information, including flight data, weather reports, and aerodrome notices, together with multiple trajectories that model airspace conditions, forming a comprehensive delay-relevant context. By jointly leveraging comprehensive textual and trajectory contexts via instance-level projection, an effective cross-modality adaptation strategy that maps multiple instance-level trajectory representations into the language modality, the framework improves delay prediction accuracy. 
LLM4Delay demonstrates superior performance compared to existing ATM frameworks and prior time-series-to-language adaptation methods. This highlights the complementary roles of textual and trajectory data while leveraging knowledge from both the pretrained trajectory encoder and the pretrained LLM. The proposed framework enables continuous updates to predictions as new information becomes available, indicating potential operational relevance.</p></details> | <details><summary>Prepr...</summary><p>Preprint submitted to IEEE Transactions on Intelligent Transportation Systems (T-ITS) for possible publication</p></details> |\n| **[Rays as Pixels: Learning A Joint Distribution of Videos and Camera Trajectories](https://arxiv.org/abs/2604.09429v1)** | 2026-04-10 | <details><summary>Show</summary><p>Recovering camera parameters from images and rendering scenes from novel viewpoints have long been treated as separate tasks in computer vision and graphics. This separation breaks down when image coverage is sparse or poses are ambiguous, since each task needs what the other produces. We propose Rays as Pixels, a Video Diffusion Model (VDM) that learns a joint distribution over videos and camera trajectories. We represent each camera as dense ray pixels (raxels) and denoise them jointly with video frames through a Decoupled Self-Cross Attention mechanism. A single trained model handles three tasks: predicting camera trajectories from video, jointly generating video and camera trajectory from input images, and generating video from input images along a target camera trajectory. Because the model can both predict trajectories from a video and generate views conditioned on its own predictions, we evaluate it through a closed-loop self-consistency test, demonstrating that its forward and inverse predictions agree. Notably, trajectory prediction requires far fewer denoising steps than video generation; even a few denoising steps suffice for self-consistency. We report results on pose estimation and camera-controlled video generation.</p></details> | <details><summary>9 pag...</summary><p>9 pages, 6 figures, 4 tables. Project page: https://wbjang.github.io/raysaspixels/</p></details> |\n| **[TEC: A Collection of Human Trial-and-error Trajectories for Problem Solving](https://arxiv.org/abs/2604.06734v3)** | 2026-04-10 | <details><summary>Show</summary><p>Trial-and-error is a fundamental strategy for humans to solve complex problems and a necessary capability for Artificial Intelligence (AI) systems operating in real-world environments. Although several trial-and-error AI techniques have recently been proposed, most of them rely on simple heuristics designed by researchers and achieve limited performance gains. The core issue is the absence of appropriate data: current models cannot learn from detailed records of how humans actually conduct trial-and-error in practice. To address this gap, we introduce a data annotation platform and a corresponding dataset, termed Trial-and-Error Collection (TEC). The platform records users' complete trajectories across multiple trials and collects their reflections after receiving error feedback. Using this platform, we record the problem-solving processes of 46 participants on 58 tasks, resulting in 5,370 trial trajectories along with error reflections across 41,229 webpages. With this dataset, we observe that humans achieve substantially higher accuracy than LLMs, demonstrating that humans are more effective at trial-and-error. 
We believe that the TEC platform and dataset provide a valuable foundation for understanding human trial-and-error behavior and for developing more capable AI systems. The platform and dataset are publicly available.</p></details> |  |\n| **[Traj2Action: A Co-Denoising Framework for Trajectory-Guided Human-to-Robot Skill Transfer](https://arxiv.org/abs/2510.00491v3)** | 2026-04-10 | <details><summary>Show</summary><p>Learning diverse manipulation skills for real-world robots is severely bottlenecked by the reliance on costly and hard-to-scale teleoperated demonstrations. While human videos offer a scalable alternative, effectively transferring manipulation knowledge is fundamentally hindered by the significant morphological gap between human and robotic embodiments. To address this challenge and facilitate skill transfer from human to robot, we introduce Traj2Action, a novel framework that bridges this embodiment gap by using the 3D trajectory of the operational endpoint as a unified intermediate representation, and then transfers the manipulation knowledge embedded in this trajectory to the robot's actions. Our policy first learns to generate a coarse trajectory, which forms a high-level motion plan by leveraging both human and robot data. This plan then conditions the synthesis of precise, robot-specific actions (e.g., orientation and gripper state) within a co-denoising framework. Our work centers on two core objectives: first, the systematic verification of the Traj2Action framework's effectiveness, spanning architectural design, cross-task generalization, and data efficiency; and second, the revelation of key laws that govern robot policy learning during the integration of human hand demonstration data. This research focus enables us to provide a scalable paradigm tailored to address human-to-robot skill transfer across morphological gaps. Extensive real-world experiments on a Franka robot demonstrate that Traj2Action boosts performance by up to 27% and 22.25% over the $π_0$ baseline on short- and long-horizon real-world tasks, and achieves significant gains as human data scales in robot policy learning.</p></details> |  |\n| **[Balancing Functionality and GDPR-Driven Privacy in ISAC Trajectory Sharing](https://arxiv.org/abs/2604.08743v1)** | 2026-04-09 | <details><summary>Show</summary><p>Integrated Sensing and Communications (ISAC) enables trajectory sharing that enhances beamforming, resource allocation, and cooperative perception, yet raises fundamental privacy concerns under the General Data Protection Regulation (GDPR) data minimisation principle. This paper proposes a Fisher Information Density (FID)-constrained trajectory sharing framework that enforces a local lower bound on estimation uncertainty, providing hard, quantifiable privacy guarantees by construction. Unlike fixed-noise approaches, the proposed method bounds the Privacy Leak Ratio (PLR) regardless of sensing power or adversarial post-processing, ensuring that no trajectory segment can be reconstructed beyond a prescribed accuracy threshold. Simulations on the OpenTraj dataset demonstrate that the framework keeps the average PLR below 20-25% and the maximum leakage segment duration under 2-2.5 s, while preserving data utility for downstream tasks such as movement prediction. 
The resulting criterion is interpretable, model-agnostic, and compatible with GDPR-compliant ISAC system design.</p></details> | <details><summary>Submi...</summary><p>Submitted to IEEE VTC2026-Fall</p></details> |\n| **[State and Trajectory Estimation of Tensegrity Robots via Factor Graphs and Chebyshev Polynomials](https://arxiv.org/abs/2604.08185v1)** | 2026-04-09 | <details><summary>Show</summary><p>Tensegrity robots offer compliance and adaptability, but their nonlinear and underconstrained dynamics make state estimation challenging. Reliable continuous-time estimation of all rigid links is crucial for closed-loop control, system identification, and machine learning; however, conventional methods often fall short. This paper proposes a two-stage approach for robust state or trajectory estimation (i.e., filtering or smoothing) of a cable-driven tensegrity robot. For online state estimation, this work introduces a factor-graph-based method, which fuses measurements from an RGB-D camera with on-board cable length sensors. To the best of the authors' knowledge, this is the first application of factor graphs in this domain. Factor graphs are a natural choice, as they exploit the robot's structural properties and provide effective sensor fusion solutions capable of handling nonlinearities in practice. Both the Mahalanobis distance-based clustering algorithm, used to handle noise, and the Chebyshev polynomial method, used to estimate the most probable velocities and intermediate states, are shown to perform well on simulated and real-world data, compared to an ICP-based algorithm. Results show that the approach provides high-fidelity, continuous-time state and trajectory estimates for complex tensegrity robot motions.</p></details> | <details><summary>Accep...</summary><p>Accepted at Robotsoft 2026</p></details> |\n| **[Aligning Agents via Planning: A Benchmark for Trajectory-Level Reward Modeling](https://arxiv.org/abs/2604.08178v1)** | 2026-04-09 | <details><summary>Show</summary><p>In classical Reinforcement Learning from Human Feedback (RLHF), Reward Models (RMs) serve as the fundamental signal provider for model alignment. As Large Language Models evolve into agentic systems capable of autonomous tool invocation and complex reasoning, the paradigm of reward modeling faces unprecedented challenges--most notably, the lack of benchmarks specifically designed to assess RM capabilities within tool-integrated environments. To address this gap, we present Plan-RewardBench, a trajectory-level preference benchmark designed to evaluate how well judges distinguish preferred versus distractor agent trajectories in complex tool-using scenarios. Plan-RewardBench covers four representative task families -- (i) Safety Refusal, (ii) Tool-Irrelevance / Unavailability, (iii) Complex Planning, and (iv) Robust Error Recovery -- comprising validated positive trajectories and confusable hard negatives constructed via multi-model natural rollouts, rule-based perturbations, and minimal-edit LLM perturbations. We benchmark representative RMs (generative, discriminative, and LLM-as-Judge) under a unified pairwise protocol, reporting accuracy trends across varying trajectory lengths and task categories. Furthermore, we provide diagnostic analyses of prevalent failure modes. 
Our results reveal that all three evaluator families face substantial challenges, with performance degrading sharply on long-horizon trajectories, underscoring the necessity for specialized training in agentic, trajectory-level reward modeling. Ultimately, Plan-RewardBench aims to serve as both a practical evaluation suite and a reusable blueprint for constructing agentic planning preference data.</p></details> | <details><summary>13 pa...</summary><p>13 pages, 5 figures, accepted to ACL 2026 main conference</p></details> |\n| **[Force-Aware Residual DAgger via Trajectory Editing for Precision Insertion with Impedance Control](https://arxiv.org/abs/2603.04038v2)** | 2026-04-09 | <details><summary>Show</summary><p>Imitation learning (IL) has shown strong potential for contact-rich precision insertion tasks. However, its practical deployment is often hindered by covariate shift and the need for continuous expert monitoring to recover from failures during execution. In this paper, we propose Trajectory Editing Residual Dataset Aggregation (TER-DAgger), a scalable and force-aware human-in-the-loop imitation learning framework. First, TER-DAgger mitigates covariate shift by learning residual policies through optimization-based trajectory editing; this approach smoothly fuses policy rollouts with human corrective trajectories, providing consistent and stable supervision. Second, we introduce a force-aware failure anticipation mechanism that triggers human intervention only when discrepancies arise between predicted and measured end-effector forces, significantly reducing the requirement for continuous expert monitoring. Third, all learned policies are executed within a Cartesian impedance control framework, ensuring compliant and safe behavior during contact-rich interactions. Extensive experiments in both simulation and real-world precision insertion tasks show that TER-DAgger improves the average success rate by over 37\\% compared to behavior cloning, human-guided correction, retraining, and fine-tuning baselines, demonstrating its effectiveness in mitigating covariate shift and enabling scalable deployment in contact-rich manipulation.</p></details> |  |\n| **[WorldMAP: Bootstrapping Vision-Language Navigation Trajectory Prediction with Generative World Models](https://arxiv.org/abs/2604.07957v1)** | 2026-04-09 | <details><summary>Show</summary><p>Vision-language models (VLMs) and generative world models are opening new opportunities for embodied navigation. VLMs are increasingly used as direct planners or trajectory predictors, while world models support look-ahead reasoning by imagining future views. Yet predicting a reliable trajectory from a single egocentric observation remains challenging. Current VLMs often generate unstable trajectories, and world models, though able to synthesize plausible futures, do not directly provide the grounded signals needed for navigation learning. This raises a central question: how can generated futures be turned into supervision for grounded trajectory prediction? We present WorldMAP, a teacher--student framework that converts world-model-generated futures into persistent semantic-spatial structure and planning-derived supervision. Its world-model-driven teacher builds semantic-spatial memory from generated videos, grounds task-relevant targets and obstacles, and produces trajectory pseudo-labels through explicit planning. 
A lightweight student with a multi-hypothesis trajectory head is then trained to predict navigation trajectories directly from vision-language inputs. On Target-Bench, WorldMAP achieves the best ADE and FDE among compared methods, reducing ADE by 18.0% and FDE by 42.1% relative to the best competing baseline, while lifting a small open-source VLM to DTW performance competitive with proprietary models. More broadly, the results suggest that, in embodied navigation, the value of world models may lie less in supplying action-ready imagined evidence than in synthesizing structured supervision for navigation learning.</p></details> |  |\n| **[TrajGuard: Streaming Hidden-state Trajectory Detection for Decoding-time Jailbreak Defense](https://arxiv.org/abs/2604.07727v1)** | 2026-04-09 | <details><summary>Show</summary><p>Existing jailbreak defense paradigms primarily rely on static detection of prompts, outputs, or internal states, often neglecting the dynamic evolution of risk during decoding. This oversight leaves risk signals embedded in decoding trajectories underutilized, constituting a critical blind spot in current defense systems. In this work, we empirically demonstrate that hidden states in critical layers during the decoding phase carry stronger and more stable risk signals than input jailbreak prompts. Specifically, the hidden representations of tokens generated during jailbreak attempts progressively approach high-risk regions in the latent space. Based on this observation, we propose TrajGuard, a training-free, decoding-time defense framework. TrajGuard aggregates hidden-state trajectories via a sliding window to quantify risk in real time, triggering a lightweight semantic adjudication only when risk within a local window persistently exceeds a threshold. This mechanism enables the immediate interruption or constraint of subsequent decoding. Extensive experiments across 12 jailbreak attacks and various open-source LLMs show that TrajGuard achieves an average defense rate of 95%. Furthermore, it reduces detection latency to 5.2 ms/token while maintaining a false positive rate below 1.5%. These results confirm that hidden-state trajectories during decoding can effectively support real-time jailbreak detection, highlighting a promising direction for defenses without model modification.</p></details> | <details><summary>Accep...</summary><p>Accepted to Findings of ACL 2026</p></details> |\n| **[Joint Task Offloading, Inference Optimization and UAV Trajectory Planning for Generative AI Empowered Intelligent Transportation Digital Twin](https://arxiv.org/abs/2604.07687v1)** | 2026-04-09 | <details><summary>Show</summary><p>To implement the intelligent transportation digital twin (ITDT), unmanned aerial vehicles (UAVs) are scheduled to process the sensing data from the roadside sensors. In this process, generative artificial intelligence (GAI) technologies such as diffusion models are deployed on the UAVs to transform the raw sensing data into high-quality, valuable data. Therefore, we propose the GAI-empowered ITDT. The dynamic processing of a set of diffusion model inference (DMI) tasks on the UAVs with dynamic mobility simultaneously influences the DT updating fidelity and delay. In this paper, we formulate the joint optimization of DMI task offloading, inference optimization, and UAV trajectory planning as a system utility maximization (SUM) problem to address the fidelity-delay tradeoff for the GAI-empowered ITDT. 
To seek a solution to the problem under the network dynamics, we model the SUM problem as the heterogeneous-agent Markov decision process, and propose the sequential update-based heterogeneous-agent twin delayed deep deterministic policy gradient (SU-HATD3) algorithm, which can quickly learn a near-optimal solution. Numerical results demonstrate that compared with several baseline algorithms, the proposed algorithm has great advantages in improving the system utility and convergence rate.</p></details> |  |\n| **[SANDO: Safe Autonomous Trajectory Planning for Dynamic Unknown Environments](https://arxiv.org/abs/2604.07599v1)** | 2026-04-08 | <details><summary>Show</summary><p>SANDO is a safe trajectory planner for 3D dynamic unknown environments, where obstacle locations and motions are unknown a priori and a collision-free plan can become unsafe at any moment, requiring fast replanning. Existing soft-constraint planners are fast but cannot guarantee collision-free paths, while hard-constraint methods ensure safety at the cost of longer computation. SANDO addresses this trade-off through three contributions. First, a heat map-based A* global planner steers paths away from high-risk regions using soft costs, and a spatiotemporal safe flight corridor (STSFC) generator produces time-layered polytopes that inflate obstacles only by their worst-case reachable set at each time layer, rather than by the worst case over the entire horizon. Second, trajectory optimization is formulated as a Mixed-Integer Quadratic Program (MIQP) with hard collision-avoidance constraints, and a variable elimination technique reduces the number of decision variables, enabling fast computation. Third, a formal safety analysis establishes collision-free guarantees under explicit velocity-bound and estimation-error assumptions. Ablation studies show that variable elimination yields up to 7.4x speedup in optimization time, and that STSFCs are critical for feasibility in dense dynamic environments. Benchmark simulations against state-of-the-art methods across standardized static benchmarks, obstacle-rich static forests, and dynamic environments show that SANDO consistently achieves the highest success rate with no constraint violations across all difficulty levels; perception-only experiments without ground truth obstacle information confirm robust performance under realistic sensing. Hardware experiments on a UAV with fully onboard planning, perception, and localization demonstrate six safe flights in static environments and ten safe flights among dynamic obstacles.</p></details> | 20 pages, 17 figures |\n| **[Active Reward Machine Inference From Raw State Trajectories](https://arxiv.org/abs/2604.07480v1)** | 2026-04-08 | <details><summary>Show</summary><p>Reward machines are automaton-like structures that capture the memory required to accomplish a multi-stage task. When combined with reinforcement learning or optimal control methods, they can be used to synthesize robot policies to achieve such tasks. However, specifying a reward machine by hand, including a labeling function capturing high-level features that the decisions are based on, can be a daunting task. This paper deals with the problem of learning reward machines directly from raw state and policy information. As opposed to existing works, we assume no access to observations of rewards, labels, or machine nodes, and show what trajectory data is sufficient for learning the reward machine in this information-scarce regime. 
We then extend the result to an active learning setting where we incrementally query trajectory extensions to improve data (and indirectly computational) efficiency. Results are demonstrated with several grid world examples.</p></details> |  |\n| **[Beyond Loss Values: Robust Dynamic Pruning via Loss Trajectory Alignment](https://arxiv.org/abs/2604.07306v1)** | 2026-04-08 | <details><summary>Show</summary><p>Existing dynamic data pruning methods often fail under noisy-label settings, as they typically rely on per-sample loss as the ranking criterion. This could mistakenly lead to preserving noisy samples due to their high loss values, resulting in significant performance drop. To address this, we propose AlignPrune, a noise-robust module designed to enhance the reliability of dynamic pruning under label noise. Specifically, AlignPrune introduces the Dynamic Alignment Score (DAS), which is a loss-trajectory-based criterion that enables more accurate identification of noisy samples, thereby improving pruning effectiveness. As a simple yet effective plug-and-play module, AlignPrune can be seamlessly integrated into state-of-the-art dynamic pruning frameworks, consistently outperforming them without modifying either the model architecture or the training pipeline. Extensive experiments on five widely-used benchmarks across various noise types and pruning ratios demonstrate the effectiveness of AlignPrune, boosting accuracy by up to 6.3% over state-of-the-art baselines. Our results offer a generalizable solution for pruning under noisy data, encouraging further exploration of learning in real-world scenarios. Code is available at: https://github.com/leonqin430/AlignPrune.</p></details> | <details><summary>Publi...</summary><p>Published in CVPR 2026 Findings</p></details> |\n| **[TraceSafe: A Systematic Assessment of LLM Guardrails on Multi-Step Tool-Calling Trajectories](https://arxiv.org/abs/2604.07223v1)** | 2026-04-08 | <details><summary>Show</summary><p>As large language models (LLMs) evolve from static chatbots into autonomous agents, the primary vulnerability surface shifts from final outputs to intermediate execution traces. While safety guardrails are well-benchmarked for natural language responses, their efficacy remains largely unexplored within multi-step tool-use trajectories. To address this gap, we introduce TraceSafe-Bench, the first comprehensive benchmark specifically designed to assess mid-trajectory safety. It encompasses 12 risk categories, ranging from security threats (e.g., prompt injection, privacy leaks) to operational failures (e.g., hallucinations, interface inconsistencies), featuring over 1,000 unique execution instances. Our evaluation of 13 LLM-as-a-guard models and 7 specialized guardrails yields three critical findings: 1) Structural Bottleneck: Guardrail efficacy is driven more by structural data competence (e.g., JSON parsing) than semantic safety alignment. Performance correlates strongly with structured-to-text benchmarks (ρ = 0.79) but shows near-zero correlation with standard jailbreak robustness. 2) Architecture over Scale: Model architecture influences risk detection performance more significantly than model size, with general-purpose LLMs consistently outperforming specialized safety guardrails in trajectory analysis. 3) Temporal Stability: Accuracy remains resilient across extended trajectories.
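The AlignPrune entry above does not spell out the Dynamic Alignment Score, but one plausible loss-trajectory criterion is the similarity between a sample's per-epoch loss curve and the mean curve of its labeled class, under which noisy labels tend to score low. A hedged sketch of that reading, not the paper's exact definition:

```python
import numpy as np

def trajectory_alignment_scores(loss_traj, labels):
    """Score each sample by the cosine similarity between its per-epoch
    loss trajectory and the mean trajectory of its labeled class.
    Low scores flag likely label noise. (Illustrative criterion only;
    the paper's Dynamic Alignment Score may be defined differently.)"""
    loss_traj = np.asarray(loss_traj, dtype=float)   # (n_samples, n_epochs)
    labels = np.asarray(labels)
    scores = np.empty(len(loss_traj))
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        mean_traj = loss_traj[idx].mean(axis=0)
        denom = (np.linalg.norm(loss_traj[idx], axis=1)
                 * np.linalg.norm(mean_traj) + 1e-12)
        scores[idx] = loss_traj[idx] @ mean_traj / denom
    return scores  # prune the lowest-scoring fraction, keep the rest
```

Ranking by trajectory shape rather than raw loss value is what separates genuinely hard clean samples from mislabeled ones, which is the failure mode of loss-only pruning the abstract describes.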
Increased execution steps allow models to pivot from static tool definitions to dynamic execution behaviors, actually improving risk detection performance in later stages. Our findings suggest that securing agentic workflows requires jointly optimizing for structural reasoning and safety alignment to effectively mitigate mid-trajectory risks.</p></details> |  |\n| **[VHOI: Controllable Video Generation of Human-Object Interactions from Sparse Trajectories via Motion Densification](https://arxiv.org/abs/2512.09646v2)** | 2026-04-08 | <details><summary>Show</summary><p>Synthesizing realistic human-object interactions (HOI) in video is challenging due to the complex, instance-specific interaction dynamics of both humans and objects. Incorporating controllability in video generation further adds to the complexity. Existing controllable video generation approaches face a trade-off: sparse controls like keypoint trajectories are easy to specify but lack instance-awareness, while dense signals such as optical flow, depths or 3D meshes are informative but costly to obtain. We propose VHOI, a two-stage framework that first densifies sparse trajectories into HOI mask sequences, and then fine-tunes a video diffusion model conditioned on these dense masks. We introduce a novel HOI-aware motion representation that uses color encodings to distinguish not only human and object motion, but also body-part-specific dynamics. This design incorporates a human prior into the conditioning signal and strengthens the model's ability to understand and generate realistic HOI dynamics. Experiments demonstrate state-of-the-art results in controllable HOI video generation. VHOI is not limited to interaction-only scenarios and can also generate full human navigation leading up to object interactions in an end-to-end manner. Project page: https://vcai.mpi-inf.mpg.de/projects/vhoi/.</p></details> |  |\n| **[Self-Discovered Intention-aware Transformer for Multi-modal Vehicle Trajectory Prediction](https://arxiv.org/abs/2604.07126v1)** | 2026-04-08 | <details><summary>Show</summary><p>Predicting vehicle trajectories plays an important role in autonomous driving and ITS applications. Although multiple deep learning algorithms have been devised to predict vehicle trajectories, their reliance on specific graph structures (e.g., Graph Neural Networks) or explicit intention labeling limits their flexibility. In this study, we propose a pure Transformer-based network that predicts multiple trajectory modes while considering neighboring vehicles. Two separate tracks are employed: one track focuses on predicting the trajectories, while the other predicts the likelihood of each intention given neighboring vehicles. We find that the two-track design increases performance by separating the spatial module from the trajectory-generating module. We also find that the model can learn an ordered group of trajectories by predicting residual offsets among K trajectories.</p></details> | 5 pages, 2 figures |\n| **[Differentiable Environment-Trajectory Co-Optimization for Safe Multi-Agent Navigation](https://arxiv.org/abs/2604.06972v1)** | 2026-04-08 | <details><summary>Show</summary><p>The environment plays a critical role in multi-agent navigation by imposing spatial constraints, rules, and limitations that agents must navigate around. Traditional approaches treat the environment as fixed, without exploring its impact on agents' performance.
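The two-track design and residual-offset parameterization in the Self-Discovered Intention-aware Transformer entry above can be sketched directly. An illustrative PyTorch module, with all sizes hypothetical and the offset parameterization only one plausible reading of the abstract, not the authors' exact architecture:

```python
import torch
import torch.nn as nn

class TwoTrackPredictor(nn.Module):
    """Illustrative two-track head: one track decodes K trajectory modes
    as a shared base plus per-mode residual offsets; the other scores
    intention likelihoods from the same encoded scene."""
    def __init__(self, d_model=64, horizon=30, k_modes=6, n_intents=3):
        super().__init__()
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead=4,
                                               batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=2)
        self.base_head = nn.Linear(d_model, horizon * 2)
        self.offset_head = nn.Linear(d_model, k_modes * horizon * 2)
        self.intent_head = nn.Linear(d_model, n_intents)
        self.horizon, self.k = horizon, k_modes

    def forward(self, agents):                     # (B, N_agents, d_model)
        h = self.encoder(agents).mean(dim=1)       # pool over agents
        base = self.base_head(h).view(-1, 1, self.horizon, 2)
        off = self.offset_head(h).view(-1, self.k, self.horizon, 2)
        trajs = base + off                         # K ordered modes
        intent_logits = self.intent_head(h)        # separate intention track
        return trajs, intent_logits

# Example: trajs, intents = TwoTrackPredictor()(torch.randn(2, 5, 64))
# -> trajs: (2, 6, 30, 2) trajectory modes, intents: (2, 3) logits
```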
This work considers environment configurations as decision variables, alongside agent actions, to jointly achieve safe navigation. We formulate a bi-level problem, where the lower-level sub-problem optimizes agent trajectories that minimize navigation cost and the upper-level sub-problem optimizes environment configurations that maximize navigation safety. We develop a differentiable optimization method that iteratively solves the lower-level sub-problem with interior point methods and the upper-level sub-problem with gradient ascent. A key challenge lies in analytically coupling these two levels. We address this by leveraging KKT conditions and the Implicit Function Theorem to compute gradients of agent trajectories w.r.t. environment parameters, enabling differentiation throughout the bi-level structure. Moreover, we propose a novel metric that quantifies navigation safety as a criterion for the upper-level environment optimization, and prove its validity through measure theory. Our experiments validate the effectiveness of the proposed framework in a variety of safety-critical navigation scenarios, ranging from warehouse logistics to urban transportation. The results demonstrate that optimized environments provide navigation guidance, improving both agents' safety and efficiency.</p></details> |  |\n| **[STERN: Simultaneous Trajectory Estimation and Relative Navigation for Autonomous Underwater Proximity Operations](https://arxiv.org/abs/2309.08780v2)** | 2026-04-08 | <details><summary>Show</summary><p>Due to the challenges regarding the limits of their endurance and autonomous capabilities, underwater docking for autonomous underwater vehicles (AUVs) has become a topic of interest for many academic and commercial applications. Herein, we take on the problem of relative navigation for the generalized version of the docking operation, which we address as proximity operations. Proximity operations typically involve only two actors, a chaser and a target. We leverage the similarities to proximity operations (prox-ops) from spacecraft robotic missions to frame the diverse docking scenarios with a set of phases the chaser undergoes on the way to its target. We emphasize the versatility of factor graphs as a generalized representation to model the underlying simultaneous trajectory estimation and relative navigation (STERN) problem that arises with any prox-ops scenario, regardless of the sensor suite or the agents' dynamic constraints. To emphasize the flexibility of factor graphs as the modeling foundation for arbitrary underwater prox-ops, we compile a list of state-of-the-art research in the field and represent the different scenarios using the same factor graph representation. We detail the procedure required to model, design, and implement factor graph-based estimators by addressing a long-distance acoustic homing scenario of an AUV to a moving mothership using datasets from simulated and real-world deployments; an analysis of these results is provided to shed light on the flexibility and limitations of the dynamic assumptions of the moving target. A description of our front- and back-end is also presented together with a timing breakdown of all processes to show its potential deployment on a real-time system.</p></details> | <details><summary>v2 up...</summary><p>v2 updated after revision. Article contains 24 pages and 18 figures.
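The KKT/Implicit-Function-Theorem coupling in the Differentiable Environment-Trajectory Co-Optimization entry above is easiest to see on a toy quadratic lower level, where the hypergradient has a closed form. A minimal sketch; the matrices and the upper-level cost are invented for illustration, and the paper ascends a safety metric rather than descending this toy cost:

```python
import numpy as np

# Lower level: x*(theta) = argmin_x 0.5 x^T A x - theta^T x, so A x* = theta.
# Upper level: F(theta) = f(x*(theta)). The implicit function theorem gives
# dx*/dtheta = A^{-1}, hence grad_theta F = A^{-1} grad_x f(x*) for symmetric A.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])                       # SPD lower-level Hessian (toy)
target = np.array([1.0, -1.0])
f_grad = lambda x: 2.0 * (x - target)            # gradient of the toy upper cost

def hypergradient(theta):
    x_star = np.linalg.solve(A, theta)           # solve the lower level exactly
    return np.linalg.solve(A, f_grad(x_star))    # IFT: A^{-1} grad_x f

theta = np.zeros(2)
for _ in range(500):                             # gradient steps on the upper level
    theta -= 0.1 * hypergradient(theta)
print(np.linalg.solve(A, theta))                 # -> approx. [1, -1]
```

The same chain rule through the lower-level optimality conditions is what lets the paper differentiate agent trajectories with respect to environment parameters, just with KKT systems in place of this toy linear solve.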
Published in the IEEE Journal of Oceanic Engineering, available at: https://doi.org/10.1109/JOE.2025.3624470</p></details> |\n| **[SCT-MOT: Enhancing Air-to-Air Multiple UAVs Tracking with Swarm-Coupled Motion and Trajectory Guidance](https://arxiv.org/abs/2604.06883v1)** | 2026-04-08 | <details><summary>Show</summary><p>Air-to-air tracking of swarm UAVs presents significant challenges due to the complex nonlinear group motion and weak visual cues for small objects, which often cause detection failures, trajectory fragmentation, and identity switches. Although existing methods have attempted to improve performance by incorporating trajectory prediction, they model each object independently, neglecting the swarm-level motion dependencies. Their limited integration between motion prediction and appearance representation also weakens the spatio-temporal consistency required for tracking in visually ambiguous and cluttered environments, making it difficult to maintain coherent trajectories and reliable associations. To address these challenges, we propose SCT-MOT, a tracking framework that integrates Swarm-Coupled motion modeling and Trajectory-guided feature fusion. First, we develop a Swarm Motion-Aware Trajectory Prediction (SMTP) module that jointly models historical trajectories and posture-aware appearance features from a swarm-level perspective, enabling more accurate forecasting of the nonlinear, coupled group trajectories. Second, we design a Trajectory-Guided Spatio-Temporal Feature Fusion (TG-STFF) module that aligns predicted positions with historical visual cues and deeply integrates them with current frame features, enhancing temporal consistency and spatial discriminability for weak objects. Extensive experiments on three public air-to-air swarm UAV tracking datasets, including AIRMOT, MOT-FLY, and UAVSwarm, demonstrate that SMTP achieves more accurate trajectory forecasts and yields a 1.21% IDF1 improvement over the state-of-the-art trajectory prediction module EqMotion when integrated into the same MOT framework. Overall, our SCT-MOT consistently achieves superior accuracy and robustness compared to state-of-the-art trackers across multiple metrics under complex swarm scenarios.</p></details> | <details><summary>17 pa...</summary><p>17 pages, 7 figures. Under review at IEEE Transactions on Aerospace and Electronic Systems (TAES). This work has been submitted to the IEEE for possible publication</p></details> |\n| **[Inference-Time Scaling of Diffusion Language Models via Trajectory Refinement](https://arxiv.org/abs/2507.08390v4)** | 2026-04-08 | <details><summary>Show</summary><p>Discrete diffusion models have recently emerged as strong alternatives to autoregressive language models, matching their performance through large-scale training. However, inference-time control remains relatively underexplored. In this work, we study how to steer generation toward desired rewards without retraining the models. Prior methods typically resample or filter within a single denoising trajectory, optimizing rewards step-by-step without trajectory-level refinement. We introduce particle Gibbs sampling for diffusion language models (PG-DLM), an inference-time algorithm enabling trajectory-level refinement. PG-DLM constructs a Markov chain over full denoising trajectories and applies a conditional sequential Monte Carlo kernel to resample them. By doing so, PG-DLM introduces a new scaling axis, the number of refinement iterations, which is unavailable to prior methods.
Increasing iterations remains effective even as gains from adding more parallel samples saturate. Furthermore, PG-DLM enables adaptive compute allocation by performing additional iterations only when needed, leading to further efficiency gains. We derive theoretical guarantees for convergence and variance bounds, and analyze trade-offs across different scaling axes. Empirically, PG-DLM outperforms prior methods across compute budgets on reward-guided generation tasks. On GSM8K, it achieves 90.07% accuracy with 2.9 particles on average and 94.47% accuracy with 16 particles.</p></details> |  |\n| **[ATBench: A Diverse and Realistic Agent Trajectory Benchmark for Safety Evaluation and Diagnosis](https://arxiv.org/abs/2604.02022v2)** | 2026-04-08 | <details><summary>Show</summary><p>Evaluating the safety of LLM-based agents is increasingly important because risks in realistic deployments often emerge over multi-step interactions rather than isolated prompts or final responses. Existing trajectory-level benchmarks remain limited by insufficient interaction diversity, coarse observability of safety failures, and weak long-horizon realism. We introduce ATBench, a trajectory-level benchmark for structured, diverse, and realistic evaluation of agent safety. ATBench organizes agentic risk along three dimensions: risk source, failure mode, and real-world harm. Based on this taxonomy, we construct trajectories with heterogeneous tool pools and a long-context delayed-trigger protocol that captures realistic risk emergence across multiple stages. The benchmark contains 1,000 trajectories (503 safe and 497 unsafe), averaging 9.01 turns and 3.95k tokens, with 1,954 invoked tools drawn from pools spanning 2,084 available tools. Data quality is supported by rule-based and LLM-based filtering plus full human audit. Experiments on frontier LLMs, open-source models, and specialized guard systems show that ATBench is challenging even for strong evaluators, while enabling taxonomy-stratified analysis, cross-benchmark comparison, and diagnosis of long-horizon failure patterns.</p></details> |  |\n| **[TrajectoryMover: Generative Movement of Object Trajectories in Videos](https://arxiv.org/abs/2603.29092v2)** | 2026-04-07 | <details><summary>Show</summary><p>Generative video editing has enabled several intuitive editing operations for short video clips that would previously have been difficult to achieve, especially for non-expert editors. Existing methods focus on prescribing an object's 3D or 2D motion trajectory in a video, or on altering the appearance of an object or a scene, while preserving both the video's plausibility and identity. Yet a method to move an object's 3D motion trajectory in a video, i.e., moving an object while preserving its relative 3D motion, is currently still missing. The main challenge lies in obtaining paired video data for this scenario. Previous methods typically rely on clever data generation approaches to construct plausible paired data from unpaired videos, but this approach fails if one of the videos in a pair can not easily be constructed from the other. Instead, we introduce TrajectoryAtlas, a new data generation pipeline for large-scale synthetic paired video data and a video generator TrajectoryMover fine-tuned with this data. We show that this successfully enables generative movement of object trajectories. Project page: https://chhatrekiran.github.io/trajectorymover</p></details> | <details><summary>24 pa...</summary><p>24 pages, 8 figures. 
Project page: https://chhatrekiran.github.io/trajectorymover</p></details> |\n| **[AgentHER: Hindsight Experience Replay for LLM Agent Trajectory Relabeling](https://arxiv.org/abs/2603.21357v2)** | 2026-04-07 | <details><summary>Show</summary><p>LLM agents fail on the majority of real-world tasks -- GPT-4o succeeds on fewer than 15% of WebArena navigation tasks and below 55% pass@1 on ToolBench (Zhou et al., 2024; Qin et al., 2024) -- yet every failed trajectory is routinely discarded, wasting the dominant source of collected experience. We introduce AgentHER, a framework that recovers this lost training signal by adapting the Hindsight Experience Replay (HER; Andrychowicz et al., 2017) principle to natural-language agent trajectories for offline data augmentation. The key insight is simple: a trajectory that fails goal A is often a correct demonstration for some achievable alternative goal B. AgentHER realises this idea through a four-stage pipeline -- failure classification, outcome extraction, LLM-guided prompt relabeling with confidence gating, and data packaging -- that converts discarded failures into high-quality SFT, DPO, and ShareGPT training data, with both zero-cost rule-based and LLM-judge implementations. On WebArena (Zhou et al., 2024) and ToolBench (Qin et al., 2024), AgentHER improves over success-only SFT by +7.1-11.7 pp across four model families (GPT-4o, Qwen2.5-72B/7B, LLaMA-3.1-8B), while achieving 2x data efficiency -- matching baseline performance with only 50% of successful demonstrations. Gains are consistent from 1.5B to 72B parameters (+5.8-9.2 pp) and compound under iterative redeployment (+2.1 pp over additional rounds). Human evaluation confirms 97.7% relabeling precision under multi-judge verification.</p></details> |  |\n| **[Modeling Patient Care Trajectories with Transformer Hawkes Processes](https://arxiv.org/abs/2604.05844v1)** | 2026-04-07 | <details><summary>Show</summary><p>Patient healthcare utilization consists of irregularly time-stamped events, such as outpatient visits, inpatient admissions, and emergency encounters, forming individualized care trajectories. Modeling these trajectories is crucial for understanding utilization patterns and predicting future care needs, but is challenging due to temporal irregularity and severe class imbalance. In this work, we build on the Transformer Hawkes Process framework to model patient trajectories in continuous time. By combining Transformer-based history encoding with Hawkes process dynamics, the model captures event dependencies and jointly predicts event type and time-to-event. To address extreme imbalance, we introduce an imbalance-aware training strategy using inverse square-root class weighting. This improves sensitivity to rare but clinically important events without altering the data distribution. Experiments on real-world data demonstrate improved performance and provide clinically meaningful insights for identifying high-risk patient populations.</p></details> |  |\n| **[LLM Reasoning as Trajectories: Step-Specific Representation Geometry and Correctness Signals](https://arxiv.org/abs/2604.05655v1)** | 2026-04-07 | <details><summary>Show</summary><p>This work characterizes large language models' chain-of-thought generation as a structured trajectory through representation space. We show that mathematical reasoning traverses functionally ordered, step-specific subspaces that become increasingly separable with layer depth. 
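The imbalance-aware strategy named in the Modeling Patient Care Trajectories entry above, inverse square-root class weighting, is concrete enough to write down. A minimal sketch, assuming integer event-type labels; normalizing the weights to mean one is a choice the abstract does not specify:

```python
import numpy as np
import torch

def inverse_sqrt_class_weights(labels, n_classes):
    """Inverse square-root class weights: rare event types are up-weighted,
    but less aggressively than with plain inverse frequency, and the data
    distribution itself is left untouched."""
    counts = np.bincount(np.asarray(labels), minlength=n_classes).astype(float)
    weights = 1.0 / np.sqrt(np.maximum(counts, 1.0))   # guard empty classes
    weights *= n_classes / weights.sum()               # mean-1 normalization
    return torch.tensor(weights, dtype=torch.float32)

# e.g. for the event-type term of a Transformer-Hawkes-style loss:
# type_loss = torch.nn.CrossEntropyLoss(weight=inverse_sqrt_class_weights(y, C))
```

The square root is the key design choice: it boosts sensitivity to rare but clinically important events without the extreme variance that full inverse-frequency weighting would introduce.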
This structure already exists in base models, while reasoning training primarily accelerates convergence toward termination-related subspaces rather than introducing new representational organization. While early reasoning steps follow similar trajectories, correct and incorrect solutions diverge systematically at late stages. This late-stage divergence enables mid-reasoning prediction of final-answer correctness with ROC-AUC up to 0.87. Furthermore, we introduce trajectory-based steering, an inference-time intervention framework that enables reasoning correction and length control based on derived ideal trajectories. Together, these results establish reasoning trajectories as a geometric lens for interpreting, predicting, and controlling LLM reasoning behavior.</p></details> | ACL 2026 (Main) |\n| **[Goal-Oriented Reactive Simulation for Closed-Loop Trajectory Prediction](https://arxiv.org/abs/2603.24155v2)** | 2026-04-07 | <details><summary>Show</summary><p>Current trajectory prediction models are primarily trained in an open-loop manner, which often leads to covariate shift and compounding errors when deployed in real-world, closed-loop settings. Furthermore, relying on static datasets or non-reactive log-replay simulators severs the interactive loop, preventing the ego agent from learning to actively negotiate surrounding traffic. In this work, we propose an on-policy closed-loop training paradigm optimized for high-frequency, receding horizon ego prediction. To ground the ego prediction in a realistic representation of traffic interactions and to achieve reactive consistency, we introduce a goal-oriented, transformer-based scene decoder, resulting in an inherently reactive training simulation. By exposing the ego agent to a mixture of open-loop data and simulated, self-induced states, the model learns recovery behaviors to correct its own execution errors. Extensive evaluation demonstrates that closed-loop training significantly enhances collision avoidance capabilities at high replanning frequencies, yielding relative collision rate reductions of up to 27.0% on nuScenes and 79.5% in dense DeepScenario intersections compared to open-loop baselines. Additionally, we show that a hybrid simulation combining reactive with non-reactive surrounding agents achieves optimal balance between immediate interactivity and long-term behavioral stability.</p></details> |  |\n| **[A Synthetic Eye Movement Dataset for Script Reading Detection: Real Trajectory Replay on a 3D Simulator](https://arxiv.org/abs/2604.05475v1)** | 2026-04-07 | <details><summary>Show</summary><p>Large vision-language models have achieved remarkable capabilities by training on massive internet-scale data, yet a fundamental asymmetry persists: while LLMs can leverage self-supervised pretraining on abundant text and image data, the same is not true for many behavioral modalities. Video-based behavioral data -- gestures, eye movements, social signals -- remains scarce, expensive to annotate, and privacy-sensitive. A promising alternative is simulation: replace real data collection with controlled synthetic generation to produce automatically labeled data at scale. We introduce infrastructure for this paradigm applied to eye movement, a behavioral signal with applications across vision-language modeling, virtual reality, robotics, accessibility systems, and cognitive science. 
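The mid-reasoning correctness signal reported in the LLM Reasoning as Trajectories entry above amounts to fitting a probe on step representations and scoring it with ROC-AUC. A self-contained sketch with synthetic features standing in for hidden states; the planted-signal data is invented, and only the probe-and-AUC recipe carries over:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in: each reasoning trace is summarized by one step
# representation; a planted linear signal plays the role of the late-stage
# divergence between correct and incorrect solutions.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 128))                 # step representations
w = rng.normal(size=128)
y = (X @ w + rng.normal(scale=2.0, size=2000) > 0).astype(int)  # correctness

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
auc = roc_auc_score(y_te, probe.predict_proba(X_te)[:, 1])
print(f"mid-reasoning correctness probe ROC-AUC: {auc:.2f}")
```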
We present a pipeline for generating synthetic labeled eye movement video by extracting real human iris trajectories from reference videos and replaying them on a 3D eye movement simulator via headless browser automation. Applying this to the task of script-reading detection during video interviews, we release final_dataset_v1: 144 sessions (72 reading, 72 conversation) totaling 12 hours of synthetic eye movement video at 25fps. Evaluation shows that generated trajectories preserve the temporal dynamics of the source data (KS D < 0.14 across all metrics). A matched frame-by-frame comparison reveals that the 3D simulator exhibits bounded sensitivity at reading-scale movements, attributable to the absence of coupled head movement -- a finding that informs future simulator design. The pipeline, dataset, and evaluation tools are released to support downstream behavioral classifier development at the intersection of behavioral modeling and vision-language systems.</p></details> | <details><summary>Synth...</summary><p>Synthetic eye movement dataset generation via 3D eye simulator; iris trajectory replay; script reading detection; behavioral data augmentation</p></details> |\n| **[PRISM-MCTS: Learning from Reasoning Trajectories with Metacognitive Reflection](https://arxiv.org/abs/2604.05424v1)** | 2026-04-07 | <details><summary>Show</summary><p>The emergence of reasoning models, exemplified by OpenAI o1, signifies a transition from intuitive to deliberative cognition, effectively reorienting the scaling laws from pre-training paradigms toward test-time computation. While Monte Carlo Tree Search (MCTS) has shown promise in this domain, existing approaches typically treat each rollout as an isolated trajectory. This lack of information sharing leads to severe inefficiency and substantial computational redundancy, as the search process fails to leverage insights from prior explorations. To address these limitations, we propose PRISM-MCTS, a novel reasoning framework that draws inspiration from human parallel thinking and reflective processes. PRISM-MCTS integrates a Process Reward Model (PRM) with a dynamic shared memory, capturing both \"Heuristics\" and \"Fallacies\". By reinforcing successful strategies and pruning error-prone branches, PRISM-MCTS effectively achieves refinement. Furthermore, we develop a data-efficient training strategy for the PRM, achieving high-fidelity evaluation under a few-shot regime. Empirical evaluations across diverse reasoning benchmarks substantiate the efficacy of PRISM-MCTS. Notably, it halves the trajectory requirements on GPQA while surpassing MCTS-RAG and Search-o1, demonstrating that it scales inference by reasoning judiciously rather than exhaustively.</p></details> |  |\n| **[CC-VPSTO: Chance-Constrained Via-Point-Based Stochastic Trajectory Optimisation for Online Robot Motion Planning under Uncertainty](https://arxiv.org/abs/2402.01370v4)** | 2026-04-06 | <details><summary>Show</summary><p>Reliable robot autonomy hinges on decision-making systems that account for uncertainty without imposing overly conservative restrictions on the robot's action space.
We introduce Chance-Constrained Via-Point-Based Stochastic Trajectory Optimisation (CC-VPSTO), a real-time capable framework for generating task-efficient robot trajectories that satisfy constraints with high probability by formulating stochastic control as a chance-constrained optimisation problem. Since such problems are generally intractable, we propose a deterministic surrogate formulation based on Monte Carlo sampling, solved efficiently with gradient-free optimisation. To address bias in naïve sampling approaches, we quantify approximation error and introduce padding strategies to improve reliability. We focus on three challenges: (i) sample-efficient constraint approximation, (ii) conditions for surrogate solution validity, and (iii) online optimisation. Integrated into a receding-horizon MPC framework, CC-VPSTO enables reactive, task-efficient control under uncertainty, balancing constraint satisfaction and performance in a principled manner. The strengths of our approach lie in its generality, i.e. no assumptions on the underlying uncertainty distribution, system dynamics, cost function, or the form of inequality constraints; and its applicability to online robot motion planning. We demonstrate the validity and efficiency of our approach in both simulation and on a Franka Emika robot.</p></details> | <details><summary>23 pa...</summary><p>23 pages, 12 figures, submitted to International Journal of Robotics Research</p></details> |\n| **[Multimodal Classification Network Guided Trajectory Planning for Four-Wheel Independent Steering Autonomous Parking Considering Obstacle Attributes](https://arxiv.org/abs/2512.18836v3)** | 2026-04-06 | <details><summary>Show</summary><p>Four-wheel Independent Steering (4WIS) vehicles have attracted increasing attention for their superior maneuverability. Human drivers typically choose to cross or drive over the low-profile obstacles (e.g., plastic bags) to efficiently navigate through narrow spaces, while existing planners neglect obstacle attributes, leading to suboptimal efficiency or planning failures. To address this issue, we propose a novel multimodal trajectory planning framework that employs a neural network for scene perception, combines 4WIS hybrid A* search to generate a warm start, and utilizes an optimal control problem (OCP) for trajectory optimization. Specifically, a multimodal perception network fusing visual information and vehicle states is employed to capture semantic and contextual scene understanding, enabling the planner to adapt the strategy according to scene complexity (hard or easy task). For hard tasks, guided points are introduced to decompose complex tasks into local subtasks, improving the search efficiency. The multiple steering modes of 4WIS vehicles, Ackermann, diagonal, and zero-turn, are also incorporated as kinematically feasible motion primitives. Moreover, a hierarchical obstacle handling strategy, which categorizes obstacles as \"non-traversable\", \"crossable\", and \"drive-over\", is incorporated into the node expansion process, explicitly linking obstacle attributes to planning actions to enable efficient decisions. Furthermore, to address dynamic obstacles with motion uncertainty, we introduce a probabilistic risk field model, constructing risk-aware driving corridors that serve as linear collision constraints in OCP. 
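The Monte Carlo surrogate with padding from the CC-VPSTO entry above can be illustrated with one concrete padding choice, a Hoeffding confidence term; the paper analyzes its own padding strategies, so this is only a standard instance of the idea:

```python
import numpy as np

def chance_constraint_ok(sample_violations, eps=0.05, delta=0.01):
    """Deterministic surrogate for P(violation) <= eps from Monte Carlo
    samples, padded with a Hoeffding confidence term so that a finite
    sample does not understate the true violation probability.
    (One concrete padding choice, not the paper's.)"""
    v = np.asarray(sample_violations, dtype=float)   # 1 = constraint violated
    n = len(v)
    pad = np.sqrt(np.log(1.0 / delta) / (2.0 * n))   # Hoeffding bound
    return bool(v.mean() + pad <= eps)

# e.g. accept a candidate trajectory only if, across n rollouts of the
# uncertainty, the padded empirical violation rate stays below eps.
```

The padding term shrinks as the sample count grows, which is exactly the sample-efficiency versus reliability trade-off the abstract's first challenge refers to.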
Experimental results demonstrate the proposed framework's effectiveness in generating safe, efficient, and smooth trajectories for 4WIS vehicles, especially in constrained environments.</p></details> | <details><summary>The m...</summary><p>The manuscript in this current form requires substantial revision. For this reason, I request the withdrawal of the submission to allow for comprehensive improvement before resubmission</p></details> |\n| **[Uncertainty-Guided Latent Diagnostic Trajectory Learning for Sequential Clinical Diagnosis](https://arxiv.org/abs/2604.05116v1)** | 2026-04-06 | <details><summary>Show</summary><p>Clinical diagnosis requires sequential evidence acquisition under uncertainty. However, most Large Language Model (LLM) based diagnostic systems assume fully observed patient information and therefore do not explicitly model how clinical evidence should be sequentially acquired over time. Even when diagnosis is formulated as a sequential decision process, it is still challenging to learn effective diagnostic trajectories. This is because the space of possible evidence-acquisition paths is relatively large, while clinical datasets rarely provide explicit supervision information for desirable diagnostic paths. To this end, we formulate sequential diagnosis as a Latent Diagnostic Trajectory Learning (LDTL) framework based on a planning LLM agent and a diagnostic LLM agent. For the diagnostic LLM agent, diagnostic action sequences are treated as latent paths and we introduce a posterior distribution that prioritizes trajectories providing more diagnostic information. The planning LLM agent is then trained to follow this distribution, encouraging coherent diagnostic trajectories that progressively reduce uncertainty. Experiments on the MIMIC-CDM benchmark demonstrate that our proposed LDTL framework outperforms existing baselines in diagnostic accuracy under a sequential clinical diagnosis setting, while requiring fewer diagnostic tests. Furthermore, ablation studies highlight the critical role of trajectory-level posterior alignment in achieving these improvements.</p></details> |  |\n| **[SAIL: Scene-aware Adaptive Iterative Learning for Long-Tail Trajectory Prediction in Autonomous Vehicles](https://arxiv.org/abs/2604.04573v1)** | 2026-04-06 | <details><summary>Show</summary><p>Autonomous vehicles (AVs) rely on accurate trajectory prediction for safe navigation in diverse traffic environments, yet existing models struggle with long-tail scenarios-rare but safety-critical events characterized by abrupt maneuvers, high collision risks, and complex interactions. These challenges stem from data imbalance, inadequate definitions of long-tail trajectories, and suboptimal learning strategies that prioritize common behaviors over infrequent ones. To address this, we propose SAIL, a novel framework that systematically tackles the long-tail problem by first defining and modeling trajectories across three key attribute dimensions: prediction error, collision risk, and state complexity. Our approach then synergizes an attribute-guided augmentation and feature extraction process with a highly adaptive contrastive learning strategy. This strategy employs a continuous cosine momentum schedule, similarity-weighted hard-negative mining, and a dynamic pseudo-labeling mechanism based on evolving feature clustering. Furthermore, it incorporates a focusing mechanism to intensify learning on hard-positive samples within each identified class. 
This comprehensive design enables SAIL to excel at identifying and forecasting diverse and challenging long-tail events. Extensive evaluations on the nuScenes and ETH/UCY datasets demonstrate SAIL's superior performance, achieving up to 28.8% reduction in prediction error on the hardest 1% of long-tail samples compared to state-of-the-art baselines, while maintaining competitive accuracy across all scenarios. This framework advances reliable AV trajectory prediction in real-world, mixed-autonomy settings.</p></details> |  |\n| **[TIGFlow-GRPO: Trajectory Forecasting via Interaction-Aware Flow Matching and Reward-Guided Optimization](https://arxiv.org/abs/2603.24936v2)** | 2026-04-06 | <details><summary>Show</summary><p>Human trajectory forecasting is important for intelligent multimedia systems operating in visually complex environments, such as autonomous driving and crowd surveillance. Although Conditional Flow Matching (CFM) has shown strong ability in modeling trajectory distributions from spatio-temporal observations, existing approaches still focus primarily on supervised fitting, which may leave social norms and scene constraints insufficiently reflected in generated trajectories. To address this issue, we propose TIGFlow-GRPO, a two-stage generative approach that aligns flow-based trajectory generation with behavioral rules. In the first stage, we build a CFM-based predictor with a Trajectory-Interaction-Graph (TIG) module to model fine-grained visual-spatial interactions and strengthen context encoding. This stage captures both agent-agent and agent-scene relations more effectively, providing more informative conditional features for subsequent alignment. In the second stage, we perform Flow-GRPO post-training, where deterministic flow rollout is reformulated as stochastic ODE-to-SDE sampling to enable trajectory exploration, and a composite reward combines view-aware social compliance with map-aware physical feasibility. By evaluating trajectories explored through SDE rollout, GRPO progressively steers multimodal predictions toward behaviorally plausible futures. Experiments on the ETH/UCY and SDD datasets show that TIGFlow-GRPO improves forecasting accuracy and long-horizon stability while generating trajectories that are more socially compliant and physically feasible. These results suggest that the proposed approach provides an effective way to connect flow-based trajectory modeling with behavior-aware alignment in dynamic multimedia environments.</p></details> |  |\n| **[Search-Based Multi-Trajectory Refinement for Safe C-to-Rust Translation with Large Language Models](https://arxiv.org/abs/2505.15858v3)** | 2026-04-06 | <details><summary>Show</summary><p>The C programming language has been foundational in building system-level software. However, its manual memory management model frequently leads to memory safety issues. In response, Rust has emerged as a memory-safe alternative. Moreover, automating C-to-Rust translation with the rapidly advancing generative capabilities of LLMs is attracting growing interest for large volumes of legacy C code. Leveraging LLMs for C-to-Rust translation introduces distinct challenges, unlike the math or commonsense QA domains where LLMs have been predominantly applied. First, the scarcity of parallel C-to-Rust datasets hinders the retrieval of suitable code translation exemplars for in-context learning. Second, unlike math or commonsense QA problems, the intermediate steps required for C-to-Rust are not well-defined.
Third, it remains unclear how to organize and cascade these intermediate steps to construct a correct translation trajectory. While existing LLM-based approaches have achieved some success, they have relied on iterative code refinement along a single search trajectory on a C-to-Rust problem space and have not explored the use of systematic search mechanisms to navigate the space of possible refinement trajectories. To address these challenges in the C-to-Rust translation, we propose the MCTS-Guided LLM refinement technique for automated C-to-safe-Rust translation (LAC2R). LAC2R uses MCTS to systematically explore multiple refinement trajectories and organize the LLM-induced intermediate steps for correct translation. We experimentally demonstrated that LAC2R effectively conducts C-to-Rust translation on large-scale, real-world benchmarks. On small-scale benchmarks, LAC2R is the only method that simultaneously attains the highest safety ratio, perfect project-level correctness, and the fewest linter warnings among the compared methods.</p></details> |  |\n| **[Accelerated Gradient Methods for Nonconvex Optimization: Escape Trajectories From Strict Saddle Points and Convergence to Local Minima](https://arxiv.org/abs/2307.07030v3)** | 2026-04-06 | <details><summary>Show</summary><p>This paper considers the problem of understanding the behavior of a general class of accelerated gradient methods on smooth nonconvex functions. Motivated by some recent works that have proposed effective algorithms, based on Polyak's heavy ball method and the Nesterov accelerated gradient method, to achieve convergence to a local minimum of nonconvex functions, this work proposes a broad class of Nesterov-type accelerated methods and puts forth a rigorous study of these methods encompassing the escape from saddle points and convergence to local minima through both an asymptotic and a non-asymptotic analysis. In the asymptotic regime, this paper answers an open question of whether Nesterov's accelerated gradient method (NAG) with variable momentum parameter avoids strict saddle points almost surely. This work also develops two metrics of asymptotic rates of convergence and divergence, and evaluates these two metrics for several popular standard accelerated methods such as the NAG and Nesterov's accelerated gradient with constant momentum (NCM) near strict saddle points. In the non-asymptotic regime, this work provides an analysis that leads to the \"linear\" exit time estimates from strict saddle neighborhoods for trajectories of these accelerated methods as well the necessary conditions for the existence of such trajectories. 
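The search mechanism in the LAC2R entry above is MCTS over refinement trajectories rather than a single chain of edits. A generic UCT sketch, where `propose` (an LLM rewrite call) and `score` (compiler/test feedback in [0, 1]) are assumed stubs and the exploration constant is a common default, not the paper's setting:

```python
import math
import random

class Node:
    def __init__(self, code, parent=None):
        self.code, self.parent = code, parent
        self.children, self.visits, self.value = [], 0, 0.0

def uct_select(node, c=1.4):
    # Standard UCT rule: exploit high-reward branches, explore rarely
    # visited ones; c = 1.4 is a common default.
    return max(node.children,
               key=lambda ch: ch.value / (ch.visits + 1e-9)
               + c * math.sqrt(math.log(node.visits + 1) / (ch.visits + 1e-9)))

def mcts_refine(root_code, propose, score, iters=50, k=3):
    """Search over multiple refinement trajectories instead of one chain.
    `propose(code)` stands in for an LLM rewrite call and `score(code)`
    for compiler/test feedback; both are assumed stubs."""
    root = Node(root_code)
    for _ in range(iters):
        node = root
        while node.children:                          # selection
            node = uct_select(node)
        node.children = [Node(propose(node.code), parent=node)
                         for _ in range(k)]           # expansion
        child = random.choice(node.children)
        reward = score(child.code)                    # evaluation
        while child is not None:                      # backpropagation
            child.visits += 1
            child.value += reward
            child = child.parent
    return max(root.children, key=lambda ch: ch.visits).code
```

Backpropagating compiler and test rewards up the tree is what lets the search abandon refinement branches that a single-trajectory loop would keep polishing.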
Finally, this work studies a sub-class of accelerated methods that can converge in convex neighborhoods of nonconvex functions with a near optimal rate to a local minimum and at the same time this sub-class offers superior saddle-escape behavior compared to that of NAG.</p></details> | <details><summary>123 p...</summary><p>123 pages, 20 figures; adds a short clarification to the proof of Theorem 7.7 and incorporates a proof-stage typo fix; published in Foundations of Computational Mathematics, April 2026</p></details> |\n\n## Graph Neural Networks\n| **Title** | **Date** | **Abstract** | **Comment** |\n| --- | --- | --- | --- |\n| **[How Embeddings Shape Graph Neural Networks: Classical vs Quantum-Oriented Node Representations](https://arxiv.org/abs/2604.15273v1)** | 2026-04-16 | <details><summary>Show</summary><p>Node embeddings act as the information interface for graph neural networks, yet their empirical impact is often reported under mismatched backbones, splits, and training budgets. This paper provides a controlled benchmark of embedding choices for graph classification, comparing classical baselines with quantum-oriented node representations under a unified pipeline. We evaluate two classical baselines alongside quantum-oriented alternatives, including a circuit-defined variational embedding and quantum-inspired embeddings computed via graph operators and linear-algebraic constructions. All variants are trained and tested with the same backbone, stratified splits, identical optimization and early stopping, and consistent metrics. Experiments on five different TU datasets and on QM9 converted to classification via target binning show clear dataset dependence: quantum-oriented embeddings yield the most consistent gains on structure-driven benchmarks, while social graphs with limited node attributes remain well served by classical baselines. The study highlights practical trade-offs between inductive bias, trainability, and stability under a fixed training budget, and offers a reproducible reference point for selecting quantum-oriented embeddings in graph learning.</p></details> | <details><summary>6 pag...</summary><p>6 pages. Accepted at IJCNN 2026</p></details> |\n| **[Sampling Transferable Graph Neural Networks with Limited Graph Information](https://arxiv.org/abs/2410.16593v6)** | 2026-04-16 | <details><summary>Show</summary><p>Graph neural networks (GNNs) achieve strong performance on graph learning tasks, but training on large-scale networks remains computationally challenging. Transferability results show that GNNs with fixed weights can generalize from smaller graphs to larger ones drawn from the same family, motivating the use of sampled subgraphs to boost training efficiency. Yet most existing sampling strategies rely on reliable access to the target graph structure, which in practice may be noisy, incomplete, or unavailable prior to training. In lieu of precise connectivity information, we study feature-driven subgraph sampling for transferable GNNs, with the goal of preserving spectral properties of graph operators that control GNN expressivity. We adopt an alignment-based perspective linking node feature statistics to graph spectral structure and develop two complementary notions of feature-graph alignment. For coarse alignment, we formalize feature homophily through a Laplacian-based measure quantifying the alignment of feature principal components with graph eigenvectors, and establish a lower bound on the Laplacian trace in terms of feature statistics. 
This motivates a simple, non-sequential sampling algorithm that operates on the feature matrix and preserves a trace-based proxy for operator rank. For fine alignment, we assume a stationary model where the feature covariance and Laplacian share an eigenbasis, and establish that diagonal covariance entries reflect node-degree ordering under monotone filters. We empirically validate that filter monotonicity dictates the relationship between feature variance and spectral energy. On real-world benchmarks, selecting the retention rule that maximizes the Laplacian trace consistently yields GNNs with superior transferability and reduced generalization gaps.</p></details> | <details><summary>Submi...</summary><p>Submitted to IEEE TSP</p></details> |\n| **[Beyond the Laplacian: Doubly Stochastic Matrices for Graph Neural Networks](https://arxiv.org/abs/2604.15069v1)** | 2026-04-16 | <details><summary>Show</summary><p>Graph Neural Networks (GNNs) conventionally rely on standard Laplacian or adjacency matrices for structural message passing. In this work, we substitute the traditional Laplacian with a Doubly Stochastic graph Matrix (DSM), derived from the inverse of the modified Laplacian, to naturally encode continuous multi-hop proximity and strict local centrality. To overcome the intractable $O(n^3)$ complexity of exact matrix inversion, we first utilize a truncated Neumann series to scalably approximate the DSM, which serves as the foundation for our proposed DsmNet. Furthermore, because algebraic truncation inherently causes probability mass leakage, we introduce DsmNet-compensate. This variant features a mathematically rigorous Residual Mass Compensation mechanism that analytically re-injects the truncated tail mass into self-loops, strictly restoring row-stochasticity and structural dominance. Extensive theoretical and empirical analyses demonstrate that our decoupled architectures operate efficiently in $O(K|E|)$ time and effectively mitigate over-smoothing by bounding Dirichlet energy decay, providing robust empirical validation on homophilic benchmarks. Finally, we establish the theoretical boundaries of the DSM on heterophilic topologies and demonstrate its versatility as a continuous structural encoding for Graph Transformers.</p></details> |  |\n| **[Adaptive Canonicalization with Application to Invariant Anisotropic Geometric Networks](https://arxiv.org/abs/2509.24886v3)** | 2026-04-16 | <details><summary>Show</summary><p>Canonicalization is a widely used strategy in equivariant machine learning, enforcing symmetry in neural networks by mapping each input to a standard form. Yet, it often introduces discontinuities that can affect stability during training, limit generalization, and complicate universal approximation theorems. In this paper, we address this by introducing adaptive canonicalization, a general framework in which the canonicalization depends both on the input and the network. Specifically, we present the adaptive canonicalization based on prior maximization, where the standard form of the input is chosen to maximize the predictive confidence of the network. We prove that this construction yields continuous and symmetry-respecting models that admit universal approximation properties. We propose two applications of our setting: (i) resolving eigenbasis ambiguities in spectral graph neural networks, and (ii) handling rotational symmetries in point clouds. 
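The truncation-plus-compensation idea in the Beyond the Laplacian entry above can be shown on a simpler operator than the paper's inverse modified Laplacian: truncate a damped geometric series of a row-stochastic matrix, then re-inject the lost tail mass into self-loops so rows sum to one again. A sketch under those stand-in assumptions:

```python
import numpy as np

def truncated_diffusion_with_compensation(P, beta=0.5, K=4):
    """Truncated series T_K = (1-beta) * sum_{k=0..K} beta^k P^k for a
    row-stochastic P; the tail mass beta^(K+1) is re-injected into
    self-loops so rows sum to exactly 1 again. Illustrates the
    residual-mass-compensation mechanism, not DsmNet's exact DSM."""
    n = P.shape[0]
    term = np.eye(n)                   # accumulates (beta * P)^k
    T = (1.0 - beta) * term
    for _ in range(K):
        term = beta * (term @ P)
        T += (1.0 - beta) * term
    T += beta ** (K + 1) * np.eye(n)   # residual mass compensation
    assert np.allclose(T.sum(axis=1), 1.0)
    return T

# Example with a normalized adjacency (rows sum to 1):
# P = A / A.sum(axis=1, keepdims=True)
# T = truncated_diffusion_with_compensation(P, beta=0.5, K=4)
```

The geometric series converges because the damping keeps the spectral radius below one, and adding the known tail mass to the diagonal is an analytic fix, which mirrors how the paper restores row-stochasticity after truncating its Neumann series.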
We empirically validate our methods on molecular and protein classification, as well as point cloud classification tasks. Our adaptive canonicalization outperforms the three other common solutions to equivariant machine learning: data augmentation, standard canonicalization, and equivariant architectures.</p></details> |  |\n| **[Dense Neural Networks are not Universal Approximators](https://arxiv.org/abs/2602.07618v3)** | 2026-04-16 | <details><summary>Show</summary><p>We investigate the approximation capabilities of dense neural networks. While universal approximation theorems establish that sufficiently large architectures can approximate arbitrary continuous functions if there are no restrictions on the weight values, we show that dense neural networks do not possess this universality. Our argument is based on a model compression approach, combining the weak regularity lemma with an interpretation of feedforward networks as message passing graph neural networks. We consider ReLU neural networks subject to natural constraints on weights and input and output dimensions, which model a notion of dense connectivity. Within this setting, we demonstrate the existence of Lipschitz continuous functions that cannot be approximated by such networks. This highlights intrinsic limitations of neural networks with dense layers and motivates the use of sparse connectivity as a necessary ingredient for achieving true universality.</p></details> |  |\n| **[Using deep learning to construct stochastic local search SAT solvers with performance bounds](https://arxiv.org/abs/2309.11452v3)** | 2026-04-16 | <details><summary>Show</summary><p>The Boolean Satisfiability problem (SAT), as the prototypical $\\mathsf{NP}$-complete problem, is crucial in both theoretical computer science and practical applications. To address this problem, stochastic local search (SLS) algorithms, which iteratively and randomly update candidate assignments, present an important and theoretically well-studied class of solvers. Recent theoretical advancements have identified conditions under which SLS solvers efficiently solve SAT instances, provided they have access to suitable ``oracles'', i.e., instance-specific distribution samples. We propose leveraging machine learning models, particularly graph neural networks (GNN), as oracles to enhance the performance of SLS solvers. Our approach, evaluated on random and pseudo-industrial SAT instances, demonstrates a significant performance improvement regarding step counts and solved instances. Our work bridges theoretical results and practical applications, highlighting the potential of purpose-trained SAT solvers with performance guarantees.</p></details> | <details><summary>24 pa...</summary><p>24 pages, significantly updated version with new datasets and experiments. Code available at https://github.com/porscheofficial/sls_sat_solving_with_deep_learning. Accepted for publication in Machine Learning: Science and Technology 7 (2026) 025057</p></details> |\n| **[Leveraging graph neural networks and mobility data for COVID-19 forecasting](https://arxiv.org/abs/2501.11711v2)** | 2026-04-16 | <details><summary>Show</summary><p>The COVID-19 pandemic has claimed millions of lives, spurring the development of diverse forecasting models. In this context, the true utility of complex spatio-temporal architectures versus simpler temporal baselines remains a subject of debate. 
Here, we show that structural sparsification of the input graph and temporal granularity are determining factors for the effectiveness of Graph Neural Networks (GNNs). By leveraging human mobility networks in Brazil and China, we address a conflicting scenario in the literature: while standard LSTMs suffice for smooth, monotonic cumulative trends, GNNs significantly outperform baselines when forecasting volatile daily case counts. We show that backbone extraction substantially enhances predictive stability and reduces predictive error by removing negligible connections. Our results indicate that incorporating spatial dependencies is essential for modeling complex dynamics. Specifically, GNN architectures such as GCRN and GCLSTM outperform the LSTM baseline (Nemenyi test, p < 0.05) on datasets from Brazil and China for daily case predictions. Lastly, we frame the problem as a binary classification task to better analyze the dependency between context sizes and prediction horizons.</p></details> |  |\n| **[PUFFIN: Protein Unit Discovery with Functional Supervision](https://arxiv.org/abs/2604.14796v1)** | 2026-04-16 | <details><summary>Show</summary><p>Proteins carry out biological functions through the coordinated action of groups of residues organized into structural arrangements. These arrangements, which we refer to as protein units, exist at an intermediate scale, being larger than individual residues yet smaller than entire proteins. A deeper understanding of protein function can be achieved by identifying these units and their associations with function. However, existing approaches either focus on residue-level signals, rely on curated annotations, or segment protein structures without incorporating functional information, thereby limiting interpretable analysis of structure-function relationships. We introduce PUFFIN, a data-driven framework for discovering protein units by jointly learning structural partitioning and functional supervision. PUFFIN represents proteins as residue-level structure graphs and applies a graph neural network with a structure-aware pooling mechanism that partitions each protein into multi-residue units, with functional supervision that shapes the partition. We show that the learned units are structurally coherent, exhibit organized associations with molecular function, and show meaningful correspondence with curated InterPro annotations. Together, these results demonstrate that PUFFIN provides an interpretable framework for analyzing structure-function relationships using learned protein units and their statistical function associations. We made our source code available at https://github.com/boun-tabi-lifelu/puffin.</p></details> | <details><summary>21 pa...</summary><p>21 pages, 9 figures, to appear in ISMB 2026 proceedings</p></details> |\n| **[Graph-Based Alternatives to LLMs for Human Simulation](https://arxiv.org/abs/2511.02135v2)** | 2026-04-16 | <details><summary>Show</summary><p>Large language models (LLMs) have become a popular approach for simulating human behaviors, yet it remains unclear if LLMs are necessary for all simulation tasks. We study a broad family of close-ended simulation tasks, with applications from survey prediction to test-taking, and show that a graph neural network can match or surpass strong LLM-based methods. We introduce Graph-basEd Models for Human Simulation (GEMS) which formulates close-ended simulation as link prediction on a heterogeneous graph of individuals and choices. 
Across three datasets and three evaluation settings, GEMS matches or outperforms the strongest LLM-based methods while using three orders of magnitude fewer parameters. These results suggest that graph-based modeling can complement LLMs as an efficient and transparent approach to simulating human behaviors. Code is available at https://github.com/schang-lab/gems.</p></details> | <details><summary>Confe...</summary><p>Conference: ACL 2026 Long Main Code: https://github.com/schang-lab/gems</p></details> |\n| **[CPGRec+: A Balance-oriented Framework for Personalized Video Game Recommendations](https://arxiv.org/abs/2604.14586v1)** | 2026-04-16 | <details><summary>Show</summary><p>The rapid expansion of the gaming industry requires advanced recommender systems tailored to its dynamic landscape. Existing Graph Neural Network (GNN)-based methods primarily prioritize accuracy over diversity, overlooking their inherent trade-off. To address this, we previously proposed CPGRec, a balance-oriented gaming recommender system. However, CPGRec fails to account for critical disparities in player-game interactions, which carry varying significance in reflecting players' personal preferences and may exacerbate over-smoothness issues inherent in GNN-based models. Moreover, existing approaches underutilize the reasoning capabilities and extensive knowledge of large language models (LLMs) in addressing these limitations. To bridge this gap, we propose two new modules. First, the Preference-informed Edge Reweighting (PER) module assigns signed edge weights to qualitatively distinguish significant player interests and disinterests, and then quantitatively measures preference strength to mitigate over-smoothing in graph convolutions. Second, the Preference-informed Representation Generation (PRG) module leverages LLMs to generate contextualized descriptions of games and players by reasoning personal preferences from comparing global and personal interests, thereby refining representations of players and games. Experiments on two Steam datasets demonstrate CPGRec+'s superior accuracy and diversity over state-of-the-art models. The code is accessible at https://github.com/HsipingLi/CPGRec-Plus.</p></details> | <details><summary>Publi...</summary><p>Published in ACM Transactions on Information Systems (TOIS). 43 pages, 9 figures</p></details> |\n| **[Behavior-Aware Dual-Channel Preference Learning for Heterogeneous Sequential Recommendation](https://arxiv.org/abs/2604.14581v1)** | 2026-04-16 | <details><summary>Show</summary><p>Heterogeneous sequential recommendation (HSR) aims to learn dynamic behavior dependencies from the diverse behaviors of user-item interactions to facilitate precise sequential recommendation. Despite many efforts yielding promising achievements, there are still challenges in modeling heterogeneous behavior data. One significant issue is the inherent sparsity of real-world data, which can weaken the recommendation performance. Although auxiliary behaviors (e.g., clicks) partially address this problem, they inevitably introduce some noise, and the sparsity of the target behavior (e.g., purchases) remains unresolved. Additionally, contrastive learning-based augmentation in existing methods often focuses on a single behavior type, overlooking fine-grained user preferences and losing valuable information. To address these challenges, we have meticulously designed a behavior-aware dual-channel preference learning framework (BDPL). 
This framework begins with the construction of customized behavior-aware subgraphs to capture personalized behavior transition relationships, followed by a novel cascade-structured graph neural network to aggregate node context information. We then model and enhance user representations through a preference-level contrastive learning paradigm, considering both long-term and short-term preferences. Finally, we fuse the overall preference information using an adaptive gating mechanism to predict the next item the user will interact with under the target behavior. Extensive experiments on three real-world datasets demonstrate the superiority of our BDPL over the state-of-the-art models.</p></details> |  |\n| **[Unsupervised Learning of Local Updates for Maximum Independent Set in Dynamic Graphs](https://arxiv.org/abs/2505.13754v3)** | 2026-04-15 | <details><summary>Show</summary><p>We present the first unsupervised learning model for Maximum-Independent-Set (MaxIS) in dynamic graphs where edges change over time. Our method combines structural learning from graph neural networks (GNNs) with a learned distributed update mechanism that, given an edge addition or deletion event, modifies nodes' internal memories and infers their MaxIS membership in a single, parallel step. We evaluate our model against a mixed integer programming solver and a breadth of unsupervised and supervised learning models for combinatorial optimization on static graphs. Across dynamic graphs of 200-1,000 nodes, our model achieves approximation ratios that are competitive with the state-of-the-art models while running 1.91-6.70x faster. When generalizing to graphs with 100x more nodes than those used for training, our model produces MaxIS solutions 1.00-1.18x larger than all other unsupervised models, but is outperformed by the state-of-the-art supervised model. These results demonstrate that this novel, unsupervised, update-based learning approach to dynamic combinatorial optimization is a viable alternative to the naïve reapplication of analogous models for static graphs, leveraging temporal information to improve neural methods for combinatorial optimization.</p></details> | <details><summary>10 pa...</summary><p>10 pages, 3 figures, 1 table, 3 algorithms. To appear at IJCNN 2026</p></details> |\n| **[ID and Graph View Contrastive Learning with Multi-View Attention Fusion for Sequential Recommendation](https://arxiv.org/abs/2604.14114v1)** | 2026-04-15 | <details><summary>Show</summary><p>Sequential recommendation has become increasingly prominent in both academia and industry, particularly in e-commerce. The primary goal is to extract user preferences from historical interaction sequences and predict items a user is likely to engage with next. Recent advances have leveraged contrastive learning and graph neural networks to learn more expressive representations from interaction histories -- graphs capture relational structure between nodes, while ID-based representations encode item-specific information. However, few studies have explored multi-view contrastive learning between ID and graph perspectives to jointly improve user and item representations, especially in settings where only interaction data is available without auxiliary information. To address this gap, we propose Multi-View Contrastive learning for sequential recommendation (MVCrec), a framework that integrates complementary signals from both sequential (ID-based) and graph-based views. 
MVCrec incorporates three contrastive objectives: within the sequential view, within the graph view, and across views. To effectively fuse the learned representations, we introduce a multi-view attention fusion module that combines global and local attention mechanisms to estimate the likelihood of a target user purchasing a target item. Comprehensive experiments on five real-world benchmark datasets demonstrate that MVCrec consistently outperforms 11 state-of-the-art baselines, achieving improvements of up to 14.44% in NDCG@10 and 9.22% in HitRatio@10 over the strongest baseline. Our code and datasets are available at https://github.com/sword-Lz/MMCrec.</p></details> |  |\n| **[Log-based vs Graph-based Approaches to Fault Diagnosis](https://arxiv.org/abs/2604.14019v1)** | 2026-04-15 | <details><summary>Show</summary><p>Modern distributed systems generate large volumes of logs that can be analyzed to support essential AIOps tasks such as fault diagnosis, which plays a crucial role in maintaining system reliability. Most existing approaches rely on log-based models that treat logs as linear sequences of events. However, such representations discard the structural context between events that is often present in execution logs, such as parent-child dependencies, fan-out (branching), or temporal features. To better capture these relationships, recent works on Graph Neural Networks (GNNs) suggest that representing logs as graphs offers a promising alternative. Building on these observations, this paper conducts a comparative study of log-based encoder architectures (e.g., BERT) and graph-based models (e.g., GNNs) for automated fault diagnosis. We evaluate our models on TraceBench, a trace-oriented log dataset, and on BGL, a more traditional system log dataset, covering both anomaly detection and fault type classification. Our results show that graph-only models fail to outperform encoder baselines. However, integrating learned representations from log encoders into graph-based models achieves the strongest overall performance. These findings highlight conditions under which graph-augmented architectures can outperform traditional log-based approaches.</p></details> | <details><summary>8 pag...</summary><p>8 pages, 7 figures, student project</p></details> |\n| **[Leveraging LLM-GNN Integration for Open-World Question Answering over Knowledge Graphs](https://arxiv.org/abs/2604.13979v1)** | 2026-04-15 | <details><summary>Show</summary><p>Open-world Question Answering (OW-QA) over knowledge graphs (KGs) aims to answer questions over incomplete or evolving KGs. Traditional KBQA assumes a closed world where answers must exist in the KG, limiting real-world applicability. In contrast, open-world QA requires inferring missing knowledge based on graph structure and context. Large language models (LLMs) excel at language understanding but lack structured reasoning. Graph neural networks (GNNs) model graph topology but struggle with semantic interpretation. Existing systems integrate LLMs with GNNs or graph retrievers. Some support open-world QA but rely on structural embeddings without semantic grounding. Most assume observed paths or complete graphs, making them unreliable under missing links or multi-hop reasoning. We present GLOW, a hybrid system that combines a pre-trained GNN and an LLM for open-world KGQA. The GNN predicts top-k candidate answers from the graph structure. 
These, along with relevant KG facts, are serialized into a structured prompt (e.g., triples and candidates) to guide the LLM's reasoning. This enables joint reasoning over symbolic and semantic signals, without relying on retrieval or fine-tuning. To evaluate generalization, we introduce GLOW-BENCH, a 1,000-question benchmark over incomplete KGs across diverse domains. GLOW outperforms existing LLM-GNN systems on standard benchmarks and GLOW-BENCH, achieving improvements of up to 53.3% (38% on average). GitHub code and data are available.</p></details> | <details><summary>18 pa...</summary><p>18 pages, 6 figures, 10 tables. https://aclanthology.org/2026.eacl-long.26/</p></details> |\n| **[Maximum entropy temporal networks](https://arxiv.org/abs/2509.02098v6)** | 2026-04-15 | <details><summary>Show</summary><p>Temporal networks consist of timestamped directed interactions that may appear continuously in time, yet few studies have directly tackled the continuous-time modeling of networks. Here, we introduce a maximum-entropy approach to temporal networks; with basic assumptions on the constraints, the corresponding network ensembles admit a modular and interpretable representation: a set of global time processes and a static maximum-entropy edge (i.e., node-pair) probability. This time-edge-label factorization yields closed-form log-likelihoods and degree, clustering, and motif expectations, and gives rise to a whole class of effective generative models. We provide the maximum-entropy derivation for the non-homogeneous Poisson Process (NHPP) intensities governing the probability of directed edges in temporal networks via functional optimization over path entropy, connecting NHPP modeling to maximum-entropy network ensembles. NHPPs consistently improve log-likelihood over generic Poisson processes, while the maximum-entropy edge labels recover strength constraints and reproduce expected unique-degree curves. We discuss the limitations of this framework and how it can be integrated with multivariate Hawkes calibration procedures, renewal theory, and neural kernel estimation in graph neural networks.</p></details> | 21 pages, 32 figures |\n| **[Finetuning-Free Diffusion Model with Adaptive Constraint Guidance for Inorganic Crystal Structure Generation](https://arxiv.org/abs/2604.13354v1)** | 2026-04-14 | <details><summary>Show</summary><p>The discovery of inorganic crystal structures with targeted properties is a significant challenge in materials science. Generative models, especially state-of-the-art diffusion models, offer the promise of modeling complex data distributions and proposing novel, realistic samples. However, current generative AI models still struggle to produce diverse, original, and reliable structures of experimentally achievable materials suitable for high-stakes applications. In this work, we propose a generative machine learning framework based on diffusion models with adaptive constraint guidance, which enables the incorporation of user-defined physical and chemical constraints during the generation process. This approach is designed to be practical and interpretable for human experts, allowing transparent decision-making and expert-driven exploration. To ensure the robustness and validity of the generated candidates, we introduce a multi-step validation pipeline that combines graph neural network estimators trained to achieve DFT-level accuracy and convex hull analysis for assessing thermodynamic stability. 
We tested and validated our approach on several classical families of inorganic compounds as case studies. These preliminary results demonstrate our framework's ability to generate thermodynamically plausible crystal structures that satisfy targeted geometric constraints across diverse inorganic chemical systems.</p></details> | <details><summary>Full ...</summary><p>Full article including supplementary information, 55 pages, 9 figures</p></details> |\n| **[Graph-Based Fraud Detection with Dual-Path Graph Filtering](https://arxiv.org/abs/2604.14235v1)** | 2026-04-14 | <details><summary>Show</summary><p>Fraud detection on graph data can be viewed as a demanding task that requires distinguishing between different types of nodes. Because graph neural networks (GNNs) are naturally suited for processing information encoded in graph form through their message-passing operations, methods based on GNN models have increasingly attracted attention in the fraud detection domain. However, fraud graphs inherently exhibit relation camouflage, high heterophily, and class imbalance, causing most GNNs to underperform in fraud detection tasks. To address these challenges, this paper proposes a Graph-Based Fraud Detection Model with Dual-Path Graph Filtering (DPF-GFD). DPF-GFD first applies a beta wavelet-based operator to the original graph to capture key structural patterns. It then constructs a similarity graph from distance-based node representations and applies an improved low-pass filter. The embeddings from the original and similarity graphs are fused through supervised representation learning to obtain node features, which are finally used by an ensemble tree model to assess the fraud risk of unlabeled nodes. Unlike existing single-graph smoothing approaches, DPF-GFD introduces a frequency-complementary dual-path filtering paradigm tailored for fraud detection, explicitly decoupling structural anomaly modeling and feature similarity modeling. This design enables more discriminative and stable node representations in highly heterophilous and imbalanced fraud graphs. Comprehensive experiments on four real-world financial fraud detection datasets demonstrate the effectiveness of our proposed method.</p></details> | Neural Networks |\n| **[Explainable Graph Neural Networks for Interbank Contagion Surveillance: A Regulatory-Aligned Framework for the U.S. Banking Sector](https://arxiv.org/abs/2604.14232v1)** | 2026-04-14 | <details><summary>Show</summary><p>The Spatial-Temporal Graph Attention Network (ST-GAT) framework was created to serve as an explainable GNN-based solution for detecting early warning signs of bank distress and for conducting macro-prudential surveillance of the interbank system in the United States. The ST-GAT framework models 8,103 FDIC-insured institutions across 58 quarterly snapshots (2010Q1-2024Q2). Bilateral exposures were reconstructed from publicly available FDIC Call Reports using maximum entropy estimation to produce a dynamic directed weighted graph. The framework achieves the highest AUPRC among all GNN architectures (0.939 +/- 0.010), trailing only XGBoost (0.944). Ablation analysis confirms the BiLSTM temporal component contributes +0.020 AUPRC; temporal attention weights exhibit a monotonically decreasing pattern consistent with long-run structural vulnerability weighting. Permutation importance identifies ROA (0.309) and NPL Ratio (0.252) as dominant predictors, consistent with post-mortem analyses of the 2023 regional banking crisis. 
All data are publicly available FDIC Call Reports and FRED series; all code and results are released.</p></details> | <details><summary>28 pa...</summary><p>28 pages, submitted to Research in International Business and Finance (RIBAF)</p></details> |\n| **[Poisoning the Inner Prediction Logic of Graph Neural Networks for Clean-Label Backdoor Attacks](https://arxiv.org/abs/2603.05004v2)** | 2026-04-14 | <details><summary>Show</summary><p>Graph Neural Networks (GNNs) have achieved remarkable results in various tasks. Recent studies reveal that graph backdoor attacks can poison the GNN model to predict test nodes with triggers attached as the target class. However, apart from injecting triggers into training nodes, these graph backdoor attacks generally require altering the labels of trigger-attached training nodes to the target class, which is impractical in real-world scenarios. In this work, we focus on the clean-label graph backdoor attack, a realistic but understudied topic where training labels are not modifiable. According to our preliminary analysis, existing graph backdoor attacks generally fail under the clean-label setting. Our further analysis identifies that the core failure of existing methods lies in their inability to poison the prediction logic of GNN models, leading to the triggers being deemed unimportant for prediction. Therefore, we study a novel problem of effective clean-label graph backdoor attacks by poisoning the inner prediction logic of GNN models. We propose BA-Logic to solve the problem by coordinating a poisoned node selector and a logic-poisoning trigger generator. Extensive experiments on real-world datasets demonstrate that our method effectively enhances the attack success rate and surpasses state-of-the-art graph backdoor attack competitors under clean-label settings. Our code is available at https://anonymous.4open.science/r/BA-Logic</p></details> | <details><summary>Under...</summary><p>Under review as a TMLR regular paper</p></details> |\n| **[Topology-Aware Reasoning over Incomplete Knowledge Graph with Graph-Based Soft Prompting](https://arxiv.org/abs/2604.12503v1)** | 2026-04-14 | <details><summary>Show</summary><p>Large Language Models (LLMs) have shown remarkable capabilities across various tasks but remain prone to hallucinations in knowledge-intensive scenarios. Knowledge Base Question Answering (KBQA) mitigates this by grounding generation in Knowledge Graphs (KGs). However, most multi-hop KBQA methods rely on explicit edge traversal, making them fragile to KG incompleteness. In this paper, we propose a novel graph-based soft prompting framework that shifts the reasoning paradigm from node-level path traversal to subgraph-level reasoning. Specifically, we employ a Graph Neural Network (GNN) to encode extracted structural subgraphs into soft prompts, enabling the LLM to reason over richer structural context and identify relevant entities beyond immediate graph neighbors, thereby reducing sensitivity to missing edges. Furthermore, we introduce a two-stage paradigm that reduces computational cost while preserving good performance: a lightweight LLM first leverages the soft prompts to identify question-relevant entities and relations, followed by a more powerful LLM for evidence-aware answer generation. Experiments on four multi-hop KBQA benchmarks show that our approach achieves state-of-the-art performance on three of them, demonstrating its effectiveness. 
Code is available at the repository: https://github.com/Wangshuaiia/GraSP.</p></details> | 12 pages, 2 figures |\n| **[Scale-aware Message Passing For Graph Node Classification](https://arxiv.org/abs/2411.19392v3)** | 2026-04-14 | <details><summary>Show</summary><p>Most Graph Neural Networks (GNNs) operate at the first-order scale, even though multi-scale representations are known to be crucial in domains such as image classification. In this work, we investigate whether GNNs can similarly benefit from multi-scale learning, rather than being limited to a fixed depth of $k$-hop aggregation. We begin by formalizing scale invariance in graph learning, providing theoretical guarantees and empirical evidence for its effectiveness. Building on this principle, we introduce ScaleNet, a scale-aware message-passing architecture that combines directed multi-scale feature aggregation with an adaptive self-loop mechanism. ScaleNet achieves state-of-the-art performance on six benchmark datasets, covering both homophilic and heterophilic graphs. To handle scalability, we further propose LargeScaleNet, which extends multi-scale learning to large graphs and sets new state-of-the-art results on three large-scale benchmarks. We also show that FaberNet's strength largely arises from multi-scale feature integration. Together with these state-of-the-art results, our findings suggest that scale invariance may serve as a valuable principle for improving the performance of single-order GNNs. The code for all experiments is available at https://github.com/Qin87/ScaleNet/tree/iclr_scale_aware/.</p></details> | <details><summary>add t...</summary><p>add theoretical proof, LargeScaleNet for large graphs. arXiv admin note: text overlap with arXiv:2411.08758</p></details> |\n| **[GTCN-G: A Residual Graph-Temporal Fusion Network for Imbalanced Intrusion Detection](https://arxiv.org/abs/2510.07285v3)** | 2026-04-14 | <details><summary>Show</summary><p>The escalating complexity of network threats and the inherent class imbalance in traffic data present formidable challenges for modern Intrusion Detection Systems (IDS). While Graph Neural Networks (GNNs) excel in modeling topological structures and Temporal Convolutional Networks (TCNs) are proficient in capturing time-series dependencies, a framework that synergistically integrates both while explicitly addressing data imbalance remains an open challenge. This paper introduces a novel deep learning framework, named Gated Temporal Convolutional Network and Graph (GTCN-G), engineered to overcome these limitations. Our model uniquely fuses a Gated TCN (G-TCN) for extracting hierarchical temporal features from network flows with a Graph Convolutional Network (GCN) designed to learn from the underlying graph structure. The core innovation lies in the integration of a residual learning mechanism, implemented via a Graph Attention Network (GAT). This mechanism preserves original feature information through residual connections, which is critical for mitigating the class imbalance problem and enhancing detection sensitivity for rare malicious activities (minority classes). We conducted extensive experiments on two public benchmark datasets, UNSW-NB15 and ToN-IoT, to validate our approach. 
The empirical results demonstrate that the proposed GTCN-G model achieves state-of-the-art performance, significantly outperforming existing baseline models in both binary and multi-class classification tasks.</p></details> | <details><summary>This ...</summary><p>This preprint was submitted to IEEE TrustCom 2025. The accepted version will be published under copyright 2025 IEEE</p></details> |\n| **[Learning to accelerate distributed ADMM using graph neural networks](https://arxiv.org/abs/2509.05288v2)** | 2026-04-14 | <details><summary>Show</summary><p>Distributed optimization is fundamental to large-scale machine learning and control applications. Among existing methods, the alternating direction method of multipliers (ADMM) has gained popularity due to its strong convergence guarantees and suitability for decentralized computation. However, ADMM can suffer from slow convergence and high sensitivity to hyperparameter choices. In this work, we show that distributed ADMM iterations can be naturally expressed within the message-passing framework of graph neural networks (GNNs). Building on this connection, we propose learning adaptive step sizes and communication weights through a GNN that predicts these hyperparameters based on the current iterates. By unrolling ADMM for a fixed number of iterations, we train the network end-to-end to minimize the solution distance after these iterations for a given problem class, while preserving the algorithm's convergence properties. Numerical experiments demonstrate that our learned variant consistently improves convergence speed and solution quality compared to standard ADMM, both within the trained computational budget and beyond. The code is available at https://github.com/paulhausner/learning-distributed-admm.</p></details> | <details><summary>Learn...</summary><p>Learning for Dynamics and Control Conference (L4DC), the first two authors contributed equally</p></details> |\n| **[Uncertainty Quantification on Graph Learning: A Survey](https://arxiv.org/abs/2404.14642v4)** | 2026-04-14 | <details><summary>Show</summary><p>Graphical models have demonstrated their exceptional capabilities across numerous applications. However, their performance, confidence, and trustworthiness are often limited by the inherent randomness in data generation and the lack of knowledge to accurately model real-world complexities. There has been increased interest in developing uncertainty quantification (UQ) techniques tailored to graphical models. In this survey, we systematically examine existing works on UQ for graphical models. This survey distinguishes itself from most existing UQ surveys by specifically concentrating on graphical models, including graph neural networks and graph foundation models. We organize the literature along two complementary dimensions: uncertainty representation and uncertainty handling. By synthesizing both established methodologies and emerging trends, we aim to bridge gaps in understanding key challenges and opportunities in UQ for graphical models, inspiring researchers on graphical models or uncertainty quantification to make further advancements at the intersection of the two fields.</p></details> |  |\n| **[Latent patterns of urban mixing in mobility analysis across five global cities](https://arxiv.org/abs/2604.12202v1)** | 2026-04-14 | <details><summary>Show</summary><p>This study leverages large-scale travel surveys for over 200,000 residents across Boston, Chicago, Hong Kong, London, and Sao Paulo. 
With rich individual-level data, we make systematic comparisons and reveal patterns in social mixing, which cannot be identified by analyzing high-resolution mobility data alone. Using the same data, inferring socioeconomic status from residential neighborhoods yields social mixing levels 16% lower than those derived from self-reported survey data. In addition, individuals over the age of 66 experience greater social mixing than those in late working life (aged 55 to 65), lending data-driven support to the \"second youth\" hypothesis. Teenagers and women with caregiving responsibilities exhibit lower social mixing levels. Across the five cities, proximity to major transit stations reduces the influence of individual socioeconomic status on social mixing. Finally, we construct detailed spatio-temporal place networks for each city using a graph neural network. Inputs of home-space, activity-space, and demographic attributes are embedded and fed into a supervised autoencoder to predict individual exposure vectors. Results show that the structure of individual activity space, i.e., where people travel to, explains most of the variations in place exposure, suggesting that mobility shapes experienced social mixing more than sociodemographic characteristics, home environment, and transit proximity. The ablation tests further show that, while different income groups may experience similar levels of social mixing, their activity spaces remain stratified by income, resulting in structurally different social mixing experiences.</p></details> | <details><summary>Fan, ...</summary><p>Fan, Z., Loo, B.P.Y., Duarte, F., Ratti, C., & Moro, E. (2026). Latent patterns of urban mixing in mobility analysis across five global cities. Nature Cities, accepted</p></details> |\n| **[XANE(3): An E(3)-Equivariant Graph Neural Network for Accurate Prediction of XANES Spectra from Atomic Structures](https://arxiv.org/abs/2604.12140v1)** | 2026-04-13 | <details><summary>Show</summary><p>We present XANE(3), a physics-based E(3)-equivariant graph neural network for predicting X-ray absorption near-edge structure (XANES) spectra directly from atomic structures. The model combines tensor-product message passing with spherical harmonic edge features, absorber-query attention pooling, custom equivariant layer normalization, adaptive gated residual connections, and a spectral readout based on a multi-scale Gaussian basis with an optional sigmoidal background term. To improve line-shape fidelity, training is performed with a composite objective that includes pointwise spectral reconstruction together with first- and second-derivative matching terms. We evaluate the model on a dataset of 5,941 FDMNES simulations of iron oxide surface facets and obtain a spectrum mean squared error of $1.0 \\times 10^{-3}$ on the test set. The model accurately reproduces the main edge structure, relative peak intensities, pre-edge features, and post-edge oscillations. Ablation studies show that the derivative-aware objective, custom equivariant normalization, absorber-conditioned attention pooling, adaptive gated residual mixing, and global background term each improve performance. Interestingly, a capacity-matched scalar-only variant achieves comparable pointwise reconstruction error but reduced derivative-level fidelity, indicating that explicit tensorial channels are not strictly required for low intensity error on this dataset, although they remain beneficial for capturing finer spectral structure. 
These results establish XANE(3) as an accurate and efficient surrogate for XANES simulation and offer a promising route toward accelerated spectral prediction, ML-assisted spectroscopy, and data-driven materials discovery.</p></details> |  |\n| **[Exploring Concept Subspace for Self-explainable Text-Attributed Graph Learning](https://arxiv.org/abs/2604.11986v1)** | 2026-04-13 | <details><summary>Show</summary><p>We introduce Graph Concept Bottleneck (GCB) as a new paradigm for self-explainable text-attributed graph learning. GCB maps graphs into a subspace, concept bottleneck, where each concept is a meaningful phrase, and predictions are made based on the activation of these concepts. Unlike existing interpretable graph learning methods that primarily rely on subgraphs as explanations, the concept bottleneck provides a new form of interpretation. To refine the concept space, we apply the information bottleneck principle to focus on the most relevant concepts. This not only yields more concise and faithful explanations but also explicitly guides the model to \"think\" toward the correct decision. We empirically show that GCB achieves intrinsic interpretability with accuracy on par with black-box Graph Neural Networks. Moreover, it delivers better performance under distribution shifts and data perturbations, showing improved robustness and generalizability, benefitting from concept-guided prediction.</p></details> |  |\n| **[DeepFleet: Multi-Agent Foundation Models for Mobile Robots](https://arxiv.org/abs/2508.08574v3)** | 2026-04-13 | <details><summary>Show</summary><p>We introduce DeepFleet, a suite of foundation models designed to support coordination and planning for large-scale mobile robot fleets. These models are trained on fleet movement data, including robot positions, goals, and interactions, from hundreds of thousands of robots in Amazon warehouses worldwide. DeepFleet consists of four architectures that each embody a distinct inductive bias and collectively explore key points in the design space for multi-agent foundation models: the robot-centric (RC) model is an autoregressive decision transformer operating on neighborhoods of individual robots; the robot-floor (RF) model uses a transformer with cross-attention between robots and the warehouse floor; the image-floor (IF) model applies convolutional encoding to a multi-channel image representation of the full fleet; and the graph-floor (GF) model combines temporal attention with graph neural networks for spatial relationships. In this paper, we describe these models and present our evaluation of the impact of these design choices on prediction task performance. We find that the robot-centric and graph-floor models, which both use asynchronous robot state updates and incorporate the localized structure of robot interactions, show the most promise. 
We also present experiments that show that these two models can make effective use of larger warehouse operation datasets as the models are scaled up.</p></details> | <details><summary>27 pa...</summary><p>27 pages, 10 figures, 2 tables</p></details> |\n| **[Learning How Much to Think: Difficulty-Aware Dynamic MoEs for Graph Node Classification](https://arxiv.org/abs/2604.11473v1)** | 2026-04-13 | <details><summary>Show</summary><p>Mixture-of-Experts (MoE) architectures offer a scalable path for Graph Neural Networks (GNNs) in node classification tasks but typically rely on static and rigid routing strategies that enforce a uniform expert budget or coarse-grained expert toggles on all nodes. This limitation overlooks the varying discriminative difficulty of nodes and leads to under-fitting for hard nodes and redundant computation for easy ones. To resolve this issue, we propose D2MoE, a novel framework that shifts the focus from static expert selection to node-wise expert resource allocation. By using predictive entropy as a real-time proxy for difficulty, D2MoE employs a difficulty-driven top-p routing mechanism to adaptively concentrate expert resources on hard nodes while reducing overhead for easy ones, achieving continuous and fine-grained expert budget scaling for node classification. Experiments on 13 benchmarks demonstrate that D2MoE achieves consistent state-of-the-art performance, surpassing leading baselines by up to 7.92% in accuracy on heterophilous graphs. Notably, on large-scale graphs, it reduces memory consumption by up to 73.07% and training time by 46.53% compared to the best-performing Graph MoE, thereby validating its superior efficiency.</p></details> |  |\n| **[Adversarial Robustness of Graph Transformers](https://arxiv.org/abs/2407.11764v2)** | 2026-04-13 | <details><summary>Show</summary><p>Existing studies have shown that Message-Passing Graph Neural Networks (MPNNs) are highly susceptible to adversarial attacks. In contrast, despite the increasing importance of Graph Transformers (GTs), their robustness properties are unexplored. We close this gap and design the first adaptive attacks for GTs. In particular, we provide general design principles for strong gradient-based attacks on GTs w.r.t. structure perturbations and instantiate our attack framework for five representative and popular GT architectures. Specifically, we study GTs with specialized attention mechanisms and Positional Encodings (PEs) based on pairwise shortest paths, random walks, and the Laplacian spectrum. We evaluate our attacks on multiple tasks and perturbation models, including structure perturbations for node and graph classification, and node injection for graph classification. Our results reveal that GTs can be catastrophically fragile in many cases. Addressing this vulnerability, we show how our adaptive attacks can be effectively used for adversarial training, substantially improving robustness.</p></details> | <details><summary>TMLR ...</summary><p>TMLR 2025 (J2C-Certification: Presented @ ICLR 2026). A preliminary version appeared at the Differentiable Almost Everything Workshop at ICML 2024. 
Code available at https://github.com/isefos/gt_robustness</p></details> |\n| **[Sheaf Diffusion with Adaptive Local Structure for Spatio-Temporal Forecasting](https://arxiv.org/abs/2604.11275v1)** | 2026-04-13 | <details><summary>Show</summary><p>Spatio-temporal systems often exhibit highly heterogeneous and non-intuitive responses to localized disruptions, limiting the effectiveness of conventional message passing approaches in modeling higher-order interactions under local heterogeneity. This paper reformulates spatio-temporal forecasting as the problem of learning information flow over locally structured spaces, rather than propagating globally aligned node representations. We introduce a spatio-temporal sheaf diffusion graph neural network (ST-Sheaf GNN) that embeds graph topology into sheaf-theoretic vector spaces connected by learned linear restriction maps. Unlike prior work that relies on static or globally shared transformations, our model learns dynamic restriction maps that evolve over time and adapt to local spatio-temporal patterns to enable substantially more expressive interactions. By explicitly modeling latent local structure, the proposed framework efficiently mitigates the oversmoothing phenomenon in deep GNN architectures. We evaluate our framework on a diverse set of real-world spatio-temporal forecasting benchmarks spanning multiple domains. Experimental results demonstrate state-of-the-art performance, highlighting the effectiveness of sheaf-theoretic topological representations as a powerful foundation for spatio-temporal graph learning. The code is available at: https://anonymous.4open.science/r/ST-SheafGNN-6523/.</p></details> |  |\n| **[CapBench: A Multi-PDK Dataset for Machine-Learning-Based Post-Layout Capacitance Extraction](https://arxiv.org/abs/2604.11202v1)** | 2026-04-13 | <details><summary>Show</summary><p>We present CapBench, a fully reproducible, multi-PDK dataset for capacitance extraction. The dataset is derived from open-source designs, including single-core CPUs, systems-on-chip, and media accelerators. All designs are fully placed and routed using 14 independent OpenROAD flow runs spanning three technology nodes: ASAP7, NanGate45, and Sky130HD. From these layouts, we extract 61,855 3D windows across three size tiers to enable transfer learning and scalability studies. High-fidelity capacitance labels are generated using RWCap, a state-of-the-art random-walk solver, and validated against the industry-standard Raphael, achieving a mean absolute error of 0.64% for total capacitance. Each window is pre-processed into density maps, graph representations, and point clouds. We evaluate 10 machine learning architectures that illustrate dataset usage and serve as baselines, including convolutional neural networks (CNNs), point cloud transformers, and graph neural networks (GNNs). CNNs demonstrate the lowest errors (1.75%), while GNNs are up to 41.4x faster but exhibit larger errors (10.2%), illustrating a clear accuracy-speed trade-off. Code and dataset are available at https://github.com/THU-numbda/CapBench.</p></details> | <details><summary>Accep...</summary><p>Accepted at the 63rd ACM/IEEE Design Automation Conference (DAC '26). 
7 pages, 5 figures</p></details> |\n| **[DIB-OD: Preserving the Invariant Core for Robust Heterogeneous Graph Adaptation via Decoupled Information Bottleneck and Online Distillation](https://arxiv.org/abs/2604.10882v1)** | 2026-04-13 | <details><summary>Show</summary><p>Graph Neural Network pretraining is pivotal for leveraging unlabeled graph data. However, generalizing across heterogeneous domains remains a major challenge due to severe distribution shifts. Existing methods primarily focus on intra-domain patterns, failing to disentangle task-relevant invariant knowledge from domain-specific redundant noise, leading to negative transfer and catastrophic forgetting. To this end, we propose DIB-OD, a novel framework designed to preserve the invariant core for robust heterogeneous graph adaptation through Decoupled Information Bottleneck and Online Distillation. Our core innovation is the explicit decomposition of representations into orthogonal invariant and redundant subspaces. By utilizing an Information Bottleneck teacher-student distillation mechanism and the Hilbert-Schmidt Independence Criterion, we isolate a stable invariant core that transcends domain boundaries. Furthermore, a self-adaptive semantic regularizer is introduced to protect this core from corruption during target-domain adaptation by dynamically gating label influence based on predictive confidence. Extensive experiments across chemical, biological, and social network domains demonstrate that DIB-OD significantly outperforms state-of-the-art methods, particularly in challenging inter-type domain transfers, showcasing superior generalization and anti-forgetting performance.</p></details> |  |\n| **[FedRio: Personalized Federated Social Bot Detection via Cooperative Reinforced Contrastive Adversarial Distillation](https://arxiv.org/abs/2604.10678v1)** | 2026-04-12 | <details><summary>Show</summary><p>Social bot detection is critical to the stability and security of online social platforms. However, current state-of-the-art bot detection models are largely developed in isolation, overlooking the benefits of leveraging shared detection patterns across platforms to improve performance and promptly identify emerging bot variants. The heterogeneity of data distributions and model architectures further complicates the design of an effective cross-platform and cross-model detection framework. To address these challenges, we propose FedRio, a Personalized Federated Social Bot Detection framework with Cooperative Reinforced Contrastive Adversarial Distillation. We first introduce an adaptive message-passing module as the graph neural network backbone for each client. To facilitate efficient knowledge sharing of global data distributions, we design a federated knowledge extraction mechanism based on generative adversarial networks. Additionally, we employ a multi-stage adversarial contrastive learning strategy to enforce feature space consistency among clients and reduce divergence between local and global models. Finally, we adopt adaptive server-side parameter aggregation and reinforcement learning-based client-side parameter control to better accommodate data heterogeneity in heterogeneous federated settings. 
Extensive experiments on two real-world social bot detection benchmarks demonstrate that FedRio consistently outperforms state-of-the-art federated learning baselines in detection accuracy, communication efficiency, and feature space consistency, while remaining competitive with published centralized results under substantially stronger privacy constraints.</p></details> | 17 pages, 6 figures |\n| **[SIGMA: An Efficient Heterophilous Graph Neural Network with Fast Global Aggregation](https://arxiv.org/abs/2305.09958v5)** | 2026-04-12 | <details><summary>Show</summary><p>Graph neural networks (GNNs) achieve great success in graph learning but suffer from performance loss under heterophily, i.e., when neighboring nodes are dissimilar, due to their local and uniform aggregation. Existing heterophilous GNNs incorporate long-range or global aggregations to distinguish nodes in the graph. However, these aggregations usually require iteratively maintaining and updating full-graph information, which limits their efficiency when applied to large-scale graphs. In this paper, we propose SIGMA, an efficient global heterophilous GNN aggregation integrating the structural similarity measurement SimRank. Our theoretical analysis illustrates that SIGMA inherently captures distant global similarity even under heterophily, which conventional approaches can only achieve after iterative aggregations. Furthermore, it enjoys efficient one-time computation with complexity linear in the node set size, $\\mathcal{O}(n)$. Comprehensive evaluation demonstrates that SIGMA achieves state-of-the-art performance with superior aggregation and overall efficiency. Notably, it obtains $5\\times$ acceleration on the large-scale heterophily dataset pokec with over 30 million edges compared to the best baseline aggregation.</p></details> | ICDE 2025 |\n| **[Topology-Aware PAC-Bayesian Generalization Analysis for Graph Neural Networks](https://arxiv.org/abs/2604.10553v1)** | 2026-04-12 | <details><summary>Show</summary><p>Graph neural networks have demonstrated excellent applicability to a wide range of domains, including social networks, biological systems, recommendation systems, and wireless communications. Yet a principled theoretical understanding of their generalization behavior remains limited, particularly for graph classification tasks where complex interactions between model parameters and graph structure play a crucial role. Among existing theoretical tools, PAC-Bayesian norm-based generalization bounds provide a flexible and data-dependent framework; however, current results for GNNs often restrict the exploitation of graph structures. In this work, we propose a topology-aware PAC-Bayesian norm-based generalization framework for graph convolutional networks (GCNs) that extends a previously developed framework to graph-structured models. Our approach reformulates the derivation of generalization bounds as a stochastic optimization problem and introduces sensitivity matrices that measure the response of classification outputs with respect to structured weight perturbations. By imposing different structures on sensitivity matrices from both spatial and spectral perspectives, we derive a family of generalization error bounds with graph structures explicitly embedded. These bounds recover existing results as special cases while being tighter than state-of-the-art PAC-Bayesian bounds for GNNs. 
Notably, the proposed framework explicitly integrates graph structural properties into the generalization analysis, enabling a unified inspection of GNN generalization behavior from both spatial aggregation and spectral filtering viewpoints.</p></details> |  |\n| **[A Diffusion-Contrastive Graph Neural Network with Virtual Nodes for Wind Nowcasting in Unobserved Regions](https://arxiv.org/abs/2604.10328v1)** | 2026-04-11 | <details><summary>Show</summary><p>Accurate weather nowcasting remains one of the central challenges in atmospheric science, with critical implications for climate resilience, energy security, and disaster preparedness. Since it is not feasible to deploy observation stations everywhere, some regions lack dense observational networks, resulting in unreliable short-term wind predictions across those unobserved areas. Here we present a deep graph self-supervised framework that extends nowcasting capability into such unobserved regions without requiring new sensors. Our approach introduces \"virtual nodes\" into a diffusion and contrastive-based graph neural network, enabling the model to learn wind conditions (i.e., speed, direction, and gusts) in places with no direct measurements. Using high-temporal resolution weather station data across the Netherlands, we demonstrate that this approach reduces nowcast mean absolute error (MAE) of wind speed, gusts, and direction in unobserved regions by 30% to 46% compared with interpolation and regression methods. By enabling localized nowcasts where no measurements exist, this method opens new pathways for renewable energy integration, agricultural planning, and early-warning systems in data-sparse regions.</p></details> | 25 pages, 7 figures |\n| **[AdvSynGNN: Structure-Adaptive Graph Neural Nets via Adversarial Synthesis and Self-Corrective Propagation](https://arxiv.org/abs/2602.17071v2)** | 2026-04-11 | <details><summary>Show</summary><p>Graph neural networks frequently encounter significant performance degradation when confronted with structural noise or non-homophilous topologies. To address these systemic vulnerabilities, we present AdvSynGNN, a comprehensive architecture designed for resilient node-level representation learning. The proposed framework orchestrates multi-resolution structural synthesis alongside contrastive objectives to establish geometry-sensitive initializations. We develop a transformer backbone that adaptively accommodates heterophily by modulating attention mechanisms through learned topological signals. Central to our contribution is an integrated adversarial propagation engine, where a generative component identifies potential connectivity alterations while a discriminator enforces global coherence. Furthermore, label refinement is achieved through a residual correction scheme guided by per-node confidence metrics, which facilitates precise control over iterative stability. Empirical evaluations demonstrate that this synergistic approach effectively optimizes predictive accuracy across diverse graph distributions while maintaining computational efficiency. 
The study concludes with practical implementation protocols to ensure the robust deployment of the AdvSynGNN system in large-scale environments.</p></details> | 32 pages, 8 figures |\n| **[Virtual Smart Metering in District Heating Networks via Heterogeneous Spatial-Temporal Graph Neural Networks](https://arxiv.org/abs/2604.10166v1)** | 2026-04-11 | <details><summary>Show</summary><p>Intelligent operation of thermal energy networks aims to improve energy efficiency, reliability, and operational flexibility through data-driven control, predictive optimization, and early fault detection. Achieving these goals relies on sufficient observability, requiring continuous and well-distributed monitoring of thermal and hydraulic states. However, district heating systems are typically sparsely instrumented and frequently affected by sensor faults, limiting monitoring coverage. Virtual sensing offers a cost-effective means to enhance observability, yet its development and validation remain limited in practice. Existing data-driven methods generally assume dense synchronized data, while analytical models rely on simplified hydraulic and thermal assumptions that may not adequately capture the behavior of heterogeneous network topologies. Consequently, modeling the coupled nonlinear dependencies between pressure, flow, and temperature under realistic operating conditions remains challenging. In addition, the lack of publicly available benchmark datasets hinders systematic comparison of virtual sensing approaches. To address these challenges, we propose a heterogeneous spatial-temporal graph neural network (HSTGNN) for constructing virtual smart heat meters. The model incorporates the functional relationships inherent in district heating networks and employs dedicated branches to learn graph structures and temporal dynamics for flow, temperature, and pressure measurements, thereby enabling the joint modeling of cross-variable and spatial correlations. To support further research, we introduce a controlled laboratory dataset collected at the Aalborg Smart Water Infrastructure Laboratory, providing synchronized high-resolution measurements representative of real operating conditions. Extensive experiments demonstrate that the proposed approach significantly outperforms existing baselines.</p></details> |  |\n| **[K-STEMIT: Knowledge-Informed Spatio-Temporal Efficient Multi-Branch Graph Neural Network for Subsurface Stratigraphy Thickness Estimation from Radar Data](https://arxiv.org/abs/2604.09922v1)** | 2026-04-10 | <details><summary>Show</summary><p>Subsurface stratigraphy contains important spatio-temporal information about accumulation, deformation, and layer formation in polar ice sheets. In particular, variations in internal ice layer thickness provide valuable constraints for snow mass balance estimation and projections of ice sheet change. Although radar sensors can capture these layered structures as depth-resolved radargrams, convolutional neural networks applied directly to radar images are often sensitive to speckle noise and acquisition artifacts. In addition, purely data-driven methods may underuse physical knowledge, leading to unrealistic thickness estimates under spatial or temporal extrapolation. 
To address these challenges, we develop K-STEMIT, a novel knowledge-informed, efficient, multi-branch spatio-temporal graph neural network that combines a geometric framework for spatial learning with temporal convolution to capture temporal dynamics, and incorporates physical data synchronized from the Model Atmospheric Regional physical weather model. An adaptive feature fusion strategy is employed to dynamically combine features learned from different branches. Extensive experiments have been conducted to compare K-STEMIT against current state-of-the-art methods in both knowledge-informed and non-knowledge-informed settings, as well as other existing methods. Results show that K-STEMIT consistently achieves the highest accuracy while maintaining near-optimal efficiency. Most notably, incorporating adaptive feature fusion and physical priors reduces the root mean-squared error by 21.01% with negligible additional cost compared to its conventional multi-branch variants. Additionally, our proposed K-STEMIT achieves consistently lower per-year relative MAE, enabling reliable, continuous spatiotemporal assessment of snow accumulation variability across large spatial regions.</p></details> |  |\n| **[Transferable FB-GNN-MBE Framework for Potential Energy Surfaces: Data-Adaptive Transfer Learning in Deep Learned Many-Body Expansion Theory](https://arxiv.org/abs/2604.09320v1)** | 2026-04-10 | <details><summary>Show</summary><p>Mechanistic understanding and rational design of complex chemical systems depend on fast and accurate predictions of electronic structures beyond individual building blocks. However, if the system exceeds hundreds of atoms, first-principles quantum mechanical (QM) modeling becomes impractical. In this study, we developed FB-GNN-MBE by integrating a fragment-based graph neural network (FB-GNN) into the many-body expansion (MBE) theory and demonstrated its capacity to reproduce first-principles potential energy surfaces (PES) for hierarchically structured systems with manageable accuracy, complexity, and interpretability. Specifically, we divided the entire system into basic building blocks (fragments), evaluated their one-fragment energies using a QM model, and addressed many-fragment interactions using the structure-property relationships trained by FB-GNNs. Our investigation shows that FB-GNN-MBE achieves chemical accuracy in predicting two-body (2B) and three-body (3B) energies across water, phenol, and mixture benchmarks, as well as the one-dimensional dissociation curves of water and phenol dimers. To transfer the success of FB-GNN-MBE across various systems with minimal computational costs and data demands, we developed and validated a teacher-student learning protocol. A heavy-weight FB-GNN trained on a mixed-density water cluster ensemble (teacher) distills its learned knowledge and passes it to a light-weight GNN (student), which is later fine-tuned on a uniform-density (H2O)21 cluster ensemble. This transfer learning strategy resulted in efficient and accurate prediction of 2B and 3B energies for variously sized water clusters without retraining. Our transferable FB-GNN-MBE framework outperformed conventional non-FB-GNN-based models and showed high practicality for large-scale molecular simulations.</p></details> | <details><summary>Under...</summary><p>Under review with The Journal of Chemical Physics. Main text: 23 pages, 11 figures, and 1 table. 
Supplementary Materials: 28 pages, 6 figures, 15 tables, 4 pseudo-algorithms</p></details> |\n| **[SatQNet: Satellite-assisted Quantum Network Entanglement Routing Using Directed Line Graph Neural Networks](https://arxiv.org/abs/2604.09306v1)** | 2026-04-10 | <details><summary>Show</summary><p>Quantum networks are expected to become a key enabler for interconnecting quantum devices. In contrast to classical communication networks, however, information transfer in quantum networks is usually restricted to short distances due to physical constraints of entanglement distribution. Satellites can extend entanglement distribution over long distances, but routing in such networks is challenging because satellite motion and stochastic link generation create a highly dynamic quantum topology. Existing routing methods often rely on global topology information that quickly becomes outdated due to delays in the classical control plane, while decentralized methods typically act on incomplete local information. We propose SatQNet, a reinforcement learning approach for entanglement routing in satellite-assisted quantum networks that can be decentralized at runtime. Its key innovation is an edge-centric directed line graph neural network that performs local message passing on directed edge embeddings, enabling it to better capture link properties in high-degree and time-varying topologies. By exchanging messages with neighboring repeaters, SatQNet learns a local graph representation at runtime that supports agents in establishing high-fidelity end-to-end entanglements. Trained on random graphs, SatQNet outperforms heuristic and learning-based approaches across diverse settings, including a real-world European backbone topology, and generalizes to unseen topologies without retraining.</p></details> |  |\n| **[DiffHLS: Differential Learning for High-Level Synthesis QoR Prediction with GNNs and LLM Code Embeddings](https://arxiv.org/abs/2604.09240v1)** | 2026-04-10 | <details><summary>Show</summary><p>High-Level Synthesis (HLS) compiles C/C++ into RTL, but exploring pragma-driven optimization choices remains expensive because each design point requires time-consuming synthesis. We propose **DiffHLS**, a differential learning framework for HLS Quality-of-Result (QoR) prediction that learns from kernel-design pairs: a kernel baseline and a pragma-inserted design variant. DiffHLS encodes kernel and design intermediate-representation graphs with dedicated graph neural network (GNN) branches, and augments the delta pathway with code embeddings from a pretrained code large language model (LLM). Instead of regressing absolute targets directly, we jointly predict the kernel baseline and the design-induced delta, and compose them to obtain the design prediction. On PolyBench, DiffHLS attains lower average MAPE than GNN baselines under four GNN backbones, and LLM code embeddings consistently improve over a GNN-only ablation. We further validate scalability on the ForgeHLS dataset.</p></details> |  |\n| **[On the Role of DAG topology in Energy-Aware Cloud Scheduling : A GNN-Based Deep Reinforcement Learning Approach](https://arxiv.org/abs/2604.09202v1)** | 2026-04-10 | <details><summary>Show</summary><p>Cloud providers must assign heterogeneous compute resources to workflow DAGs while balancing competing objectives such as completion time, cost, and energy consumption. 
In this work, we study a single-workflow, queue-free scheduling setting and consider a graph neural network (GNN)-based deep reinforcement learning scheduler designed to minimize workflow completion time and energy usage. We identify specific out-of-distribution (OOD) conditions under which GNN-based deep reinforcement learning schedulers fail and provide a principled explanation of why these failures occur. Through controlled OOD evaluations, we demonstrate that performance degradation stems from structural mismatches between training and deployment environments, which disrupt message passing and undermine policy generalization. Our analysis exposes fundamental limitations of current GNN-based schedulers and highlights the need for more robust representations to ensure reliable scheduling performance under distribution shifts.</p></details> |  |\n| **[FIT-GNN: Faster Inference Time for GNNs that 'FIT' in Memory Using Coarsening](https://arxiv.org/abs/2410.15001v5)** | 2026-04-10 | <details><summary>Show</summary><p>Scalability of Graph Neural Networks (GNNs) remains a significant challenge. To tackle this, methods like coarsening, condensation, and computation trees are used to train on a smaller graph, resulting in faster computation. Nonetheless, prior research has not adequately addressed the computational costs during the inference phase. This paper presents a novel approach to improve the scalability of GNNs by reducing computational burden during the inference phase using graph coarsening. We demonstrate two different methods: Extra Nodes and Cluster Nodes. Our study extends the application of graph coarsening for graph-level tasks, including graph classification and graph regression. We conduct extensive experiments on multiple benchmark datasets to evaluate the performance of our approach. Our results show that the proposed method achieves orders of magnitude improvements in single-node inference time compared to traditional approaches. Furthermore, it significantly reduces memory consumption for node and graph classification and regression tasks, enabling efficient training and inference on low-resource devices where conventional methods are impractical. Notably, these computational advantages are achieved while maintaining competitive performance relative to baseline models.</p></details> | <details><summary>Publi...</summary><p>Published in Transactions on Machine Learning Research (TMLR), 2026. Available at https://openreview.net/forum?id=g7r7y2I7Sz</p></details> |\n| **[Graph Defense Diffusion Model](https://arxiv.org/abs/2501.11568v2)** | 2026-04-10 | <details><summary>Show</summary><p>Graph Neural Networks (GNNs) are highly vulnerable to adversarial attacks, which can greatly degrade their performance. Existing graph purification methods attempt to address this issue by filtering attacked graphs. However, they struggle to defend effectively against multiple types of adversarial attacks (e.g., targeted attacks and non-targeted attacks) simultaneously due to limited flexibility. Additionally, these methods lack comprehensive modeling of graph data, relying heavily on heuristic prior knowledge. To overcome these challenges, we introduce the Graph Defense Diffusion Model (GDDM), a flexible purification method that leverages the denoising and modeling capabilities of diffusion models. The iterative nature of diffusion models aligns well with the stepwise process of adversarial attacks, making them particularly suitable for defense. 
By iteratively adding and removing noise (edges), GDDM effectively purifies attacked graphs, restoring their original structures and features. Our GDDM consists of two key components: (1) the Graph Structure-Driven Refiner, which preserves the basic fidelity of the graph during the denoising process and ensures that the generated graph remains consistent with the original scope; and (2) the Node Feature-Constrained Regularizer, which removes residual impurities from the denoised graph, further enhancing the purification effect. By designing tailored denoising strategies to handle different types of adversarial attacks, we improve GDDM's adaptability to various attack scenarios. Furthermore, GDDM demonstrates strong scalability, leveraging its structural properties to seamlessly transfer across similar datasets without retraining. Extensive experiments on three real-world datasets demonstrate that GDDM outperforms state-of-the-art methods in defending against various adversarial attacks, showcasing its robustness and effectiveness.</p></details> | <details><summary>Accep...</summary><p>Accepted by The 32nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.1 (KDD 2026)</p></details> |\n| **[EquiformerV3: Scaling Efficient, Expressive, and General SE(3)-Equivariant Graph Attention Transformers](https://arxiv.org/abs/2604.09130v1)** | 2026-04-10 | <details><summary>Show</summary><p>As $SE(3)$-equivariant graph neural networks mature as a core tool for 3D atomistic modeling, improving their efficiency, expressivity, and physical consistency has become a central challenge for large-scale applications. In this work, we introduce EquiformerV3, the third generation of the $SE(3)$-equivariant graph attention Transformer, designed to advance all three dimensions: efficiency, expressivity, and generality. Building on EquiformerV2, we make the following three key advances. First, we optimize the software implementation, achieving a $1.75\\times$ speedup. Second, we introduce simple and effective modifications to EquiformerV2, including equivariant merged layer normalization, improved feedforward network hyper-parameters, and attention with a smooth radius cutoff. Third, we propose SwiGLU-$S^2$ activations to incorporate many-body interactions for better theoretical expressivity and to preserve strict equivariance while reducing the complexity of sampling $S^2$ grids. Together, SwiGLU-$S^2$ activations and smooth-cutoff attention enable accurate modeling of smoothly varying potential energy surfaces (PES), generalizing EquiformerV3 to tasks requiring energy-conserving simulations and higher-order derivatives of PES. With these improvements, EquiformerV3 trained with the auxiliary task of denoising non-equilibrium structures (DeNS) achieves state-of-the-art results on OC20, OMat24, and Matbench Discovery.</p></details> |  |\n| **[Dual Mamba for Node-Specific Representation Learning: Tackling Over-Smoothing with Selective State Space Modeling](https://arxiv.org/abs/2511.06756v3)** | 2026-04-10 | <details><summary>Show</summary><p>Over-smoothing remains a fundamental challenge in deep Graph Neural Networks (GNNs), where repeated message passing causes node representations to become indistinguishable. While existing solutions, such as residual connections and skip layers, alleviate this issue to some extent, they fail to explicitly model how node representations evolve in a node-specific and progressive manner across layers. 
Moreover, these methods do not take global information into account, which is also crucial for mitigating the over-smoothing problem. To address the aforementioned issues, in this work we propose the Dual Mamba-enhanced Graph Convolutional Network (DMbaGCN), a novel framework that integrates Mamba into GNNs to address over-smoothing from both local and global perspectives. DMbaGCN consists of two modules: the Local State-Evolution Mamba (LSEMba), which performs local neighborhood aggregation and utilizes Mamba's selective state space modeling to capture node-specific representation dynamics across layers, and the Global Context-Aware Mamba (GCAMba), which leverages Mamba's global attention capabilities to incorporate global context for each node. By combining these components, DMbaGCN enhances node discriminability in deep GNNs, thereby mitigating over-smoothing. Extensive experiments on multiple benchmarks demonstrate the effectiveness and efficiency of our method.</p></details> | <details><summary>Accep...</summary><p>Accepted by The 40th Annual AAAI Conference on Artificial Intelligence (AAAI 2026)</p></details> |\n| **[Mamba-Based Graph Convolutional Networks: Tackling Over-smoothing with Selective State Space](https://arxiv.org/abs/2501.15461v4)** | 2026-04-10 | <details><summary>Show</summary><p>Graph Neural Networks (GNNs) have shown great success in various graph-based learning tasks. However, they often face the issue of over-smoothing as model depth increases, which causes all node representations to converge to a single value and become indistinguishable. This issue stems from the inherent limitations of GNNs, which struggle to distinguish the importance of information from different neighborhoods. In this paper, we introduce MbaGCN, a novel graph convolutional architecture that draws inspiration from the Mamba paradigm, originally designed for sequence modeling. MbaGCN presents a new backbone for GNNs, consisting of three key components: the Message Aggregation Layer, the Selective State Space Transition Layer, and the Node State Prediction Layer. These components work in tandem to adaptively aggregate neighborhood information, providing greater flexibility and scalability for deep GNN models. While MbaGCN may not consistently outperform all existing methods on each dataset, it provides a foundational framework that demonstrates the effective integration of the Mamba paradigm into graph representation learning. Through extensive experiments on benchmark datasets, we demonstrate that MbaGCN paves the way for future advancements in graph neural network research.</p></details> | <details><summary>Accep...</summary><p>Accepted by The Thirty-Fourth International Joint Conference on Artificial Intelligence (IJCAI 2025)</p></details> |\n| **[BLEG: LLM Functions as Powerful fMRI Graph-Enhancer for Brain Network Analysis](https://arxiv.org/abs/2604.07361v2)** | 2026-04-10 | <details><summary>Show</summary><p>Graph Neural Networks (GNNs) have been widely used in diverse brain network analysis tasks based on preprocessed functional magnetic resonance imaging (fMRI) data. However, their performance is constrained by high feature sparsity and the inherent limitations of domain knowledge within uni-modal neurographs. Meanwhile, large language models (LLMs) have demonstrated powerful representation capabilities. Combining LLMs with GNNs presents a promising direction for brain network analysis. 
While LLMs and multimodal LLMs (MLLMs) have emerged in neuroscience, the integration of LLMs with graph-based data remains unexplored. In this work, we address these issues by leveraging the powerful representation and generalization capabilities of LLMs. Given the great cost of directly tuning LLMs, we instead use the LLM as an enhancer to boost the GNN's performance on downstream tasks. Our method, BLEG, proceeds in three stages. We first prompt the LLM to produce augmented texts for the fMRI graph data; we then design an LLM-LM instruction tuning method to obtain enhanced textual representations at relatively low cost, with the GNN trained jointly for coarsened alignment. Finally, we fine-tune an adapter after the GNN for the given downstream tasks. An alignment loss between the LM and GNN logits is designed to further enhance the GNN's representations. Extensive experiments on different datasets confirm BLEG's superiority. Code is available at https://github.com/KamonRiderDR/BLEG.</p></details> |  |\n| **[Neighbourhood Transformer: Switchable Attention for Monophily-Aware Graph Learning](https://arxiv.org/abs/2604.08980v1)** | 2026-04-10 | <details><summary>Show</summary><p>Graph neural networks (GNNs) have been widely adopted in engineering applications such as social network analysis, chemical research and computer vision. However, their efficacy is severely compromised by the inherent homophily assumption, which fails to hold for heterophilic graphs where dissimilar nodes are frequently connected. To address this fundamental limitation in graph learning, we first draw inspiration from the recently discovered monophily property of real-world graphs, and propose Neighbourhood Transformers (NT), a novel paradigm that applies self-attention within every local neighbourhood instead of aggregating messages to the central node as in conventional message-passing GNNs. This design makes NT inherently monophily-aware and theoretically guarantees its expressiveness is no weaker than traditional message-passing frameworks. For practical engineering deployment, we further develop a neighbourhood partitioning strategy equipped with switchable attentions, which reduces the space consumption of NT by over 95% and time consumption by up to 92.67%, significantly expanding its applicability to larger graphs. Extensive experiments on 10 real-world datasets (5 heterophilic and 5 homophilic graphs) show that NT outperforms all current state-of-the-art methods on node classification tasks, demonstrating its superior performance and cross-domain adaptability. The full implementation code of this work is publicly available at https://github.com/cf020031308/MoNT to facilitate reproducibility and industrial adoption.</p></details> |  |\n| **[R2G: A Multi-View Circuit Graph Benchmark Suite from RTL to GDSII](https://arxiv.org/abs/2604.08810v1)** | 2026-04-09 | <details><summary>Show</summary><p>Graph neural networks (GNNs) are increasingly applied to physical design tasks such as congestion prediction and wirelength estimation, yet progress is hindered by inconsistent circuit representations and the absence of controlled evaluation protocols. We present R2G (RTL-to-GDSII), a multi-view circuit-graph benchmark suite that standardizes five stage-aware views with information parity (every view encodes the same attribute set, differing only in where features attach) over 30 open-source IP cores (up to $10^6$ nodes/edges). 
R2G provides an end-to-end DEF-to-graph pipeline spanning synthesis, placement, and routing stages, together with loaders, unified splits, domain metrics, and reproducible baselines. By decoupling representation choice from model choice, R2G isolates a confound that prior EDA and graph-ML benchmarks leave uncontrolled. In systematic studies with GINE, GAT, and ResGatedGCN, we find: (i) view choice dominates model choice, with test $R^2$ varying by more than 0.3 across representations for a fixed GNN; (ii) node-centric views generalize best across both placement and routing; and (iii) decoder-head depth (3-4 layers) is the primary accuracy driver, turning divergent training into near-perfect predictions ($R^2 > 0.99$). Code and datasets are available at https://github.com/ShenShan123/R2G.</p></details> | <details><summary>Accep...</summary><p>Accepted as a poster by CVPR2026</p></details> |\n| **[An Algorithm for Fast Assembling Large-Scale Defect-Free Atom Arrays](https://arxiv.org/abs/2604.08669v1)** | 2026-04-09 | <details><summary>Show</summary><p>It is widely believed that tens of thousands of physical qubits are needed to build a practically useful quantum computer. Atom arrays formed by optical tweezers are among the most promising platforms for achieving this goal, owing to the excellent scalability and mobility of atomic qubits. However, assembling a defect-free atom array with ~ 10^4 qubits remains algorithmically challenging, alongside other hardware limitations. This is due to the computationally hard path-planning problems and the time-consuming generation of sufficiently smooth trajectories for optical tweezer potentials by spatial light modulators (SLMs). Here, we present a unified framework comprising two innovative components to fully address these algorithmic challenges: (1) a path-planning module that employs a supervised learning approach using a graph neural network combined with a modified auction decoder, and (2) a potential-generation module called the phase and profile-aware Weighted Gerchberg-Saxton algorithm. The inference time for the first module is a nearly size-independent constant overhead of ~ 5 ms, and the second module generates a potential frame in about 0.5 ms, a timescale shorter than the current commercial SLM refresh time. Altogether, our algorithm enables the assembly of an atom array with 10^4 qubits on a timescale much shorter than the typical vacuum lifetime of the trapped atoms.</p></details> |  |\n| **[Persistence-Augmented Neural Networks](https://arxiv.org/abs/2604.08469v1)** | 2026-04-09 | <details><summary>Show</summary><p>Topological Data Analysis (TDA) provides tools to describe the shape of data, but integrating topological features into deep learning pipelines remains challenging, especially when preserving local geometric structure rather than summarizing it globally. We propose a persistence-based data augmentation framework that encodes local gradient flow regions and their hierarchical evolution using the Morse-Smale complex. This representation, compatible with both convolutional and graph neural networks, retains spatially localized topological information across multiple scales. Importantly, the augmentation procedure itself is efficient, with computational complexity $O(n \\log n)$, making it practical for large datasets. We evaluate our method on histopathology image classification and 3D porous material regression, where it consistently outperforms baselines and global TDA descriptors such as persistence images and landscapes. 
We also show that pruning the base level of the hierarchy reduces memory usage while maintaining competitive performance. These results highlight the potential of local, structured topological augmentation for scalable and interpretable learning across data modalities.</p></details> |  |\n| **[U-CECE: A Universal Multi-Resolution Framework for Conceptual Counterfactual Explanations](https://arxiv.org/abs/2604.08295v1)** | 2026-04-09 | <details><summary>Show</summary><p>As AI models grow more complex, explainability is essential for building trust, yet concept-based counterfactual methods still face a trade-off between expressivity and efficiency. Representing underlying concepts as atomic sets is fast but misses relational context, whereas full graph representations are more faithful but require solving the NP-hard Graph Edit Distance (GED) problem. We propose U-CECE, a unified, model-agnostic multi-resolution framework for conceptual counterfactual explanations that adapts to data regime and compute budget. U-CECE spans three levels of expressivity: atomic concepts for broad explanations, relational sets-of-sets for simple interactions, and structural graphs for full semantic structure. At the structural level, both a precision-oriented transductive mode based on supervised Graph Neural Networks (GNNs) and a scalable inductive mode based on unsupervised graph autoencoders (GAEs) are supported. Experiments on the structurally divergent CUB and Visual Genome datasets characterize the efficiency-expressivity trade-off across levels, while human surveys and LVLM-based evaluation show that the retrieved structural counterfactuals are semantically equivalent to, and often preferred over, exact GED-based ground-truth explanations.</p></details> |  |\n| **[OpenGLT: A Comprehensive Benchmark of Graph Neural Networks for Graph-Level Tasks](https://arxiv.org/abs/2501.00773v3)** | 2026-04-09 | <details><summary>Show</summary><p>Graphs are fundamental data structures for modeling complex interactions in domains such as social networks, molecular structures, and biological systems. Graph-level tasks, which involve predicting properties or labels for entire graphs, are crucial for applications like molecular property prediction and subgraph counting. While Graph Neural Networks (GNNs) have shown significant promise for these tasks, their evaluations are often limited by narrow datasets, insufficient architecture coverage, restricted task scope and scenarios, and inconsistent experimental setups, making it difficult to draw reliable conclusions across domains. In this paper, we present a comprehensive experimental study of GNNs on graph-level tasks, systematically categorizing them into five types: node-based, hierarchical pooling-based, subgraph-based, graph learning-based, and self-supervised learning-based GNNs. We propose a unified evaluation framework, OpenGLT, which standardizes evaluation across four domains (social networks, biology, chemistry, and motif counting), two task types (classification and regression), and four real-world scenarios (clean, noisy, imbalanced, and few-shot graphs). 
Extensive experiments on 20 models across 26 classification and regression datasets reveal that: (i) no single architecture dominates both effectiveness and efficiency universally, i.e., subgraph-based GNNs excel in expressiveness, graph learning-based and SSL-based methods in robustness, and node-based and pooling-based models in efficiency; and (ii) specific graph topological features such as density and centrality can partially guide the selection of suitable GNN architectures for different graph characteristics.</p></details> |  |\n| **[Graph Neural Networks for Misinformation Detection: Performance-Efficiency Trade-offs](https://arxiv.org/abs/2604.08131v1)** | 2026-04-09 | <details><summary>Show</summary><p>The rapid spread of online misinformation has led to increasingly complex detection models, including large language models and hybrid architectures. However, their computational cost and deployment limitations raise concerns about practical applicability. In this work, we benchmark graph neural networks (GNNs) against non-graph-based machine learning methods under controlled and comparable conditions. We evaluate lightweight GNN architectures (GCN, GraphSAGE, GAT, ChebNet) against Logistic Regression, Support Vector Machines, and Multilayer Perceptrons across seven public datasets in English, Indonesian, and Polish. All models use identical TF-IDF features to isolate the impact of relational structure. Performance is measured using F1 score, with inference time reported to assess efficiency. GNNs consistently outperform non-graph baselines across all datasets. For example, GraphSAGE achieves 96.8% F1 on Kaggle and 91.9% on WELFake, compared to 73.2% and 66.8% for MLP, respectively. On COVID-19, GraphSAGE reaches 90.5% F1 vs. 74.9%, while ChebNet attains 79.1% vs. 66.4% on FakeNewsNet. These gains are achieved with comparable or lower inference times. Overall, the results show that classic GNNs remain effective and efficient, challenging the need for increasingly complex architectures in misinformation detection.</p></details> | <details><summary>Accep...</summary><p>Accepted at Computational Modeling and Artificial Intelligence for Social Systems Track in ICCS 2026</p></details> |\n| **[AFGNN: API Misuse Detection using Graph Neural Networks and Clustering](https://arxiv.org/abs/2604.07891v1)** | 2026-04-09 | <details><summary>Show</summary><p>Application Programming Interfaces (APIs) are crucial to software development, enabling integration of existing systems with new applications by reusing tried and tested code, saving development time and increasing software safety. In particular, the Java standard library APIs, along with numerous third-party APIs, are extensively utilized in the development of enterprise application software. However, their misuse remains a significant source of bugs and vulnerabilities. Furthermore, due to the limited examples in the official API documentation, developers often rely on online portals and generative AI models to learn unfamiliar APIs, but using such examples may introduce unintentional errors in the software. In this paper, we present AFGNN, a novel Graph Neural Network (GNN)-based framework for efficiently detecting API misuses in Java code. AFGNN uses a novel API Flow Graph (AFG) representation that captures the API execution sequence, data, and control flow information present in the code to model the API usage patterns. 
AFGNN uses self-supervised pre-training with the AFG representation to effectively compute the embeddings for unknown API usage examples and cluster them to identify different usage patterns. Experiments on popular API usage datasets show that AFGNN significantly outperforms state-of-the-art small language models and API misuse detectors.</p></details> |  |\n| **[Toward Generalizable Graph Learning for 3D Engineering AI: Explainable Workflows for CAE Mode Shape Classification and CFD Field Prediction](https://arxiv.org/abs/2604.07781v1)** | 2026-04-09 | <details><summary>Show</summary><p>Automotive engineering development increasingly relies on heterogeneous 3D data, including finite element (FE) models, body-in-white (BiW) representations, CAD geometry, and CFD meshes. At the same time, engineering teams face growing pressure to shorten development cycles, improve performance and accelerate innovation. Although artificial intelligence (AI) is increasingly explored in this domain, many current methods remain task-specific, difficult to interpret, and hard to reuse across development stages. This paper presents a practical graph learning framework for 3D engineering AI, in which heterogeneous engineering assets are converted into physics-aware graph representations and processed by Graph Neural Networks (GNNs). The framework is designed to support both classification and prediction tasks and is validated on two automotive applications: CAE vibration mode shape classification and CFD aerodynamic field prediction. For CAE vibration mode classification, a region-aware BiW graph supports explainable mode classification across vehicle and FE variants under label scarcity. For CFD aerodynamic field prediction, a physics-informed surrogate predicts pressure and wall shear stress (WSS) across aerodynamic body shape variants, while symmetry-preserving downsampling retains accuracy with lower computational cost. The framework also outlines data generation guidance that can help engineers identify which additional simulations or labels are valuable to collect next. These results demonstrate a practical and reusable engineering AI workflow for more trustworthy CAE and CFD decision support.</p></details> |  |\n| **[Graph Neural ODE Digital Twins for Control-Oriented Reactor Thermal-Hydraulic Forecasting Under Partial Observability](https://arxiv.org/abs/2604.07292v1)** | 2026-04-08 | <details><summary>Show</summary><p>Real-time supervisory control of advanced reactors requires accurate forecasting of plant-wide thermal-hydraulic states, including locations where physical sensors are unavailable. Meeting this need calls for surrogate models that combine predictive fidelity, millisecond-scale inference, and robustness to partial observability. In this work, we present a physics-informed message-passing Graph Neural Network coupled with a Neural Ordinary Differential Equation (GNN-ODE) to address all three requirements simultaneously. We represent the whole system as a directed sensor graph whose edges encode hydraulic connectivity through flow/heat transfer-aware message passing, and we advance the latent dynamics in continuous time via a controlled Neural ODE. A topology-guided missing-node initializer reconstructs uninstrumented states at rollout start; prediction then proceeds fully autoregressively. The GNN-ODE surrogate achieves satisfactory results for system dynamics prediction. 
On held-out simulation transients, the surrogate achieves an average MAE of 0.91 K at 60 s and 2.18 K at 300 s for uninstrumented nodes, with $R^2$ up to 0.995 for missing-node state reconstruction. Inference runs approximately 105 times faster than simulated time on a single GPU, enabling 64-member ensemble rollouts for uncertainty quantification. To assess sim-to-real transfer, we adapt the pretrained surrogate to experimental facility data using layerwise discriminative fine-tuning with only 30 training sequences. The learned flow-dependent heat-transfer scaling recovers a Reynolds-number exponent consistent with established correlations, indicating constitutive learning beyond trajectory fitting. The model tracks a steep power change transient and produces accurate trajectories at uninstrumented locations.</p></details> |  |\n| **[BadImplant: Injection-based Multi-Targeted Graph Backdoor Attack](https://arxiv.org/abs/2601.15474v2)** | 2026-04-08 | <details><summary>Show</summary><p>Graph neural networks (GNNs) have demonstrated exceptional performance in solving critical problems across diverse domains yet remain susceptible to backdoor attacks. Existing studies on backdoor attacks for graph classification are limited to single-target attacks using a subgraph-replacement-based mechanism, where the attacker implants only one trigger into the GNN model. In this paper, we introduce the first multi-targeted backdoor attack for the graph classification task, where multiple triggers simultaneously redirect predictions to different target labels. Instead of subgraph replacement, we propose subgraph injection, which preserves the structure of the original graphs while poisoning the clean graphs. Extensive experiments demonstrate the efficacy of our approach, where our attack achieves high attack success rates for all target labels with minimal impact on the clean accuracy. Experimental results on five datasets demonstrate the superior performance of our attack framework compared to the conventional subgraph replacement-based attack. Our analysis on four GNN models confirms the generalization capability of our attack, which is effective regardless of GNN model architecture and training parameter settings. We further investigate the impact of the attack design parameters, including injection methods, number of connections, trigger sizes, trigger edge density, and poisoning ratios. Additionally, our evaluation against state-of-the-art defenses (randomized smoothing and fine-pruning) demonstrates the robustness of our proposed multi-target attacks. This work highlights GNN vulnerability to multi-targeted backdoor attacks in the graph classification task. Our source code will be available at https://github.com/SiSL-URI/Multi-Targeted-Graph-Backdoor-Attack.</p></details> |  |\n| **[Self-Discovered Intention-aware Transformer for Multi-modal Vehicle Trajectory Prediction](https://arxiv.org/abs/2604.07126v1)** | 2026-04-08 | <details><summary>Show</summary><p>Predicting vehicle trajectories plays an important role in autonomous driving and ITS applications. Although multiple deep learning algorithms have been devised to predict vehicle trajectories, their reliance on specific graph structures (e.g., Graph Neural Networks) or explicit intention labeling limits their flexibility. In this study, we propose a pure Transformer-based network with multiple modes that accounts for neighboring vehicles. Two separate tracks are employed. 
One track focuses on predicting the trajectories, while the other focuses on predicting the likelihood of each intention considering neighboring vehicles. The study finds that the two-track design increases performance by separating the spatial module from the trajectory-generating module. We also find that the model can learn an ordered group of trajectories by predicting residual offsets among the K trajectories.</p></details> | 5 pages, 2 figures |\n| **[A Texture-Generalizable Deep Material Network via Orientation-Aware Interaction Learning for Polycrystal Modeling and Texture Evolution](https://arxiv.org/abs/2512.06779v2)** | 2026-04-08 | <details><summary>Show</summary><p>Machine learning surrogate models have emerged as a promising approach for accelerating multiscale materials simulations while preserving predictive fidelity. Among them, the Orientation-aware Interaction-based Deep Material Network (ODMN) provides a hierarchical homogenization framework in which material nodes encode crystallographic texture and interaction nodes enforce stress equilibrium under the Hill-Mandel condition. Trained solely on linear-elastic stiffness data, ODMN captures intrinsic microstructure-mechanics relationships, enabling accurate prediction of nonlinear mechanical responses and texture evolution. However, its applicability remains fundamentally limited by the absence of a parametric mapping from arbitrary microstructures to the ODMN parameter space. This limitation necessitates retraining for each new microstructure. To address this challenge, we reformulate ODMN generalization as a microstructure-to-parameter inference problem and propose the TACS-GNN-ODMN framework. The proposed framework combines a Texture-Adaptive Clustering and Sampling (TACS) scheme for texture representation with a Graph Neural Network (GNN) for inferring micromechanical equilibrium parameters. This strategy enables the construction of fully parameterized ODMNs for previously unseen microstructures without retraining. Numerical results demonstrate that the proposed framework accurately predicts nonlinear mechanical responses and texture evolution across diverse texture distributions. The predicted responses show close agreement with direct numerical simulations (DNS), highlighting the framework as a generalizable and physically interpretable surrogate model for microstructure-informed multiscale materials simulations.</p></details> |  |\n| **[Equivariant Multi-agent Reinforcement Learning for Multimodal Vehicle-to-Infrastructure Systems](https://arxiv.org/abs/2604.06914v1)** | 2026-04-08 | <details><summary>Show</summary><p>In this paper, we study a vehicle-to-infrastructure (V2I) system where distributed base stations (BSs) acting as road-side units (RSUs) collect multimodal (wireless and visual) data from moving vehicles. We consider a decentralized rate maximization problem, where each RSU relies on its local observations to optimize its resources, while all RSUs must collaborate to guarantee favorable network performance. We recast this problem as a distributed multi-agent reinforcement learning (MARL) problem, by incorporating rotation symmetries in terms of vehicles' locations. To exploit these symmetries, we propose a novel self-supervised learning framework where each BS agent aligns the latent features of its multimodal observation to extract the positions of the vehicles in its local region. 
Equipped with this sensing data at each RSU, we train an equivariant policy network using a graph neural network (GNN) with message passing layers, such that each agent computes its policy locally, while all agents coordinate their policies via a signaling scheme that overcomes partial observability and guarantees the equivariance of the global policy. We present numerical results carried out in a simulation environment, where ray-tracing and computer graphics are used to collect wireless and visual data. Results show the generalizability of our self-supervised and multimodal sensing approach, achieving more than two-fold accuracy gains over baselines, and the efficiency of our equivariant MARL training, attaining more than 50% performance gains over standard approaches.</p></details> |  |\n| **[FlowAdam: Implicit Regularization via Geometry-Aware Soft Momentum Injection](https://arxiv.org/abs/2604.06652v1)** | 2026-04-08 | <details><summary>Show</summary><p>Adaptive moment methods such as Adam use a diagonal, coordinate-wise preconditioner based on exponential moving averages of squared gradients. This diagonal scaling is coordinate-system dependent and can struggle with dense or rotated parameter couplings, including those in matrix factorization, tensor decomposition, and graph neural networks, because it treats each parameter independently. We introduce FlowAdam, a hybrid optimizer that augments Adam with continuous gradient-flow integration via an ordinary differential equation (ODE). When EMA-based statistics detect landscape difficulty, FlowAdam switches to clipped ODE integration. Our central contribution is Soft Momentum Injection, which blends ODE velocity with Adam's momentum during mode transitions. This prevents the training collapse observed with naive hybrid approaches. Across coupled optimization benchmarks, the ODE integration provides implicit regularization, reducing held-out error by 10-22% on low-rank matrix/tensor recovery and 6% on Jester (real-world collaborative filtering), also surpassing tuned Lion and AdaBelief, while matching Adam on well-conditioned workloads (CIFAR-10). MovieLens-100K confirms benefits arise specifically from coupled parameter interactions rather than bias estimation. Ablation studies show that soft injection is essential, as hard replacement reduces accuracy from 100% to 82.5%.</p></details> | <details><summary>Accep...</summary><p>Accepted at IJCNN 2026 (IEEE WCCI). 8 pages, 4 figures</p></details> |\n| **[Predicting Alzheimer's disease progression using rs-fMRI and a history-aware graph neural network](https://arxiv.org/abs/2604.06469v1)** | 2026-04-07 | <details><summary>Show</summary><p>Alzheimer's disease (AD) is a neurodegenerative disorder that affects more than seven million people in the United States alone. AD currently has no cure, but there are ways to potentially slow its progression if caught early enough. In this study, we propose a graph neural network (GNN)-based model for predicting whether a subject will transition to a more severe stage of cognitive impairment at their next clinical visit. We consider three stages of cognitive impairment in order of severity: cognitively normal (CN), mild cognitive impairment (MCI), and AD. We use functional connectivity graphs derived from resting-state functional magnetic resonance imaging (rs-fMRI) scans of 303 subjects, each with a different number of visits. 
Our GNN-based model incorporates a recurrent neural network (RNN) block, enabling it to process data from the subject's entire visit history. It can also work with irregular time gaps between visits by incorporating visit distance information into the input features. Our model demonstrates robust predictive performance, even with missing visits in the subjects' visit histories. It achieves an accuracy of 82.9%, with an especially impressive accuracy of 68.8% on CN to MCI conversions, a task that poses a substantial challenge in the field. Our results highlight the effectiveness of rs-fMRI in predicting the onset of MCI or AD; in conjunction with other modalities, it could offer a viable method for enabling timely interventions to slow the progression of cognitive impairment.</p></details> | <details><summary>Proc....</summary><p>Proc. SPIE 13926, Medical Imaging 2026: Computer-Aided Diagnosis, 1392604</p></details> |\n| **[k-Maximum Inner Product Attention for Graph Transformers and the Expressive Power of GraphGPS](https://arxiv.org/abs/2604.03815v2)** | 2026-04-07 | <details><summary>Show</summary><p>Graph transformers have shown promise in overcoming limitations of traditional graph neural networks, such as oversquashing and difficulties in modeling long-range dependencies. However, their application to large-scale graphs is hindered by the quadratic memory and computational complexity of the all-to-all attention mechanism. Although alternatives such as linearized attention and restricted attention patterns have been proposed, these often degrade performance or limit expressive power. To better balance efficiency and effectiveness, we introduce k-Maximum Inner Product (k-MIP) attention for graph transformers. k-MIP attention selects the most relevant key nodes per query via a top-k operation, yielding a sparse yet flexible attention pattern. Combined with an attention score computation based on symbolic matrices, this results in linear memory complexity and practical speedups of up to an order of magnitude compared to all-to-all attention, enabling the processing of graphs with over 500k nodes on a single A100 GPU. We provide a theoretical analysis of expressive power, showing that k-MIP attention does not compromise the expressiveness of graph transformers: specifically, we prove that k-MIP transformers can approximate any full-attention transformer to arbitrary precision. In addition, we analyze the expressive power of the GraphGPS framework, in which we integrate our attention mechanism, and establish an upper bound on its graph distinguishing capability in terms of the S-SEG-WL test. Finally, we validate our approach on the Long Range Graph Benchmark, the City-Networks benchmark, and two custom large-scale inductive point cloud datasets, consistently ranking among the top-performing scalable graph transformers.</p></details> | <details><summary>Accep...</summary><p>Accepted at the ICLR 2026 GRaM Workshop. 9 pages, 9 figures, 16 tables; 30 pages of supplementary material</p></details> |\n| **[Toward a universal foundation model for graph-structured data](https://arxiv.org/abs/2604.06391v1)** | 2026-04-07 | <details><summary>Show</summary><p>Graphs are a central representation in biomedical research, capturing molecular interaction networks, gene regulatory circuits, cell-cell communication maps, and knowledge graphs. 
Despite their importance, there is currently no broadly reusable foundation model for graph analysis comparable to those that have transformed language and vision. Existing graph neural networks are typically trained on a single dataset and learn representations specific only to that graph's node features, topology, and label space, limiting their ability to transfer across domains. This lack of generalization is particularly problematic in biology and medicine, where networks vary substantially across cohorts, assays, and institutions. Here we introduce a graph foundation model designed to learn transferable structural representations that are not tied to particular node identities or feature schemes. Our approach leverages feature-agnostic graph properties, including degree statistics, centrality measures, community structure indicators, and diffusion-based signatures, and encodes them as structural prompts. These prompts are integrated with a message-passing backbone to embed diverse graphs into a shared representation space. The model is pretrained once on heterogeneous graphs and subsequently reused on unseen datasets with minimal adaptation. Across multiple benchmarks, our pretrained model matches or exceeds strong supervised baselines while demonstrating superior zero-shot and few-shot generalization on held-out graphs. On the SagePPI benchmark, supervised fine-tuning of the pretrained backbone achieves a mean ROC-AUC of 95.5%, a gain of 21.8% over the best supervised message-passing baseline. The proposed technique thus provides a unique approach toward reusable, foundation-scale models for graph-structured data in biomedical and network science applications.</p></details> | <details><summary>19 pa...</summary><p>19 pages, 5 figures, 12 supplementary figures</p></details> |\n| **[BiScale-GTR: Fragment-Aware Graph Transformers for Multi-Scale Molecular Representation Learning](https://arxiv.org/abs/2604.06336v1)** | 2026-04-07 | <details><summary>Show</summary><p>Graph Transformers have recently attracted attention for molecular property prediction by combining the inductive biases of graph neural networks (GNNs) with the global receptive field of Transformers. However, many existing hybrid architectures remain GNN-dominated, causing the resulting representations to remain heavily shaped by local message passing. Moreover, most existing methods operate at only a single structural granularity, limiting their ability to capture molecular patterns that span multiple molecular scales. We introduce BiScale-GTR, a unified framework for self-supervised molecular representation learning that combines chemically grounded fragment tokenization with adaptive multi-scale reasoning. Our method improves graph Byte Pair Encoding (BPE) tokenization to produce consistent, chemically valid, and high-coverage fragment tokens, which are used as fragment-level inputs to a parallel GNN-Transformer architecture. Architecturally, atom-level representations learned by a GNN are pooled into fragment-level embeddings and fused with fragment token embeddings before Transformer reasoning, enabling the model to jointly capture local chemical environments, substructure-level motifs, and long-range molecular dependencies. Experiments on MoleculeNet, PharmaBench, and the Long Range Graph Benchmark (LRGB) demonstrate state-of-the-art performance across both classification and regression tasks. 
Attribution analysis further shows that BiScale-GTR highlights chemically meaningful functional motifs, providing interpretable links between molecular structure and predicted properties. Code will be released upon acceptance.</p></details> |  |\n| **[Interpreting Temporal Graph Neural Networks with Koopman Theory](https://arxiv.org/abs/2410.13469v2)** | 2026-04-07 | <details><summary>Show</summary><p>Spatiotemporal graph neural networks (STGNNs) have shown promising results in many domains, from forecasting to epidemiology. However, understanding the dynamics learned by these models and explaining their behaviour is significantly more difficult than for models that deal with static data. Inspired by Koopman theory, which allows a simple description of intricate, nonlinear dynamical systems, we introduce new explainability approaches for temporal graphs. Specifically, we present two methods to interpret the STGNN's decision process and identify the most relevant spatial and temporal patterns in the input for the task at hand. The first relies on dynamic mode decomposition (DMD), a Koopman-inspired dimensionality reduction method. The second relies on sparse identification of nonlinear dynamics (SINDy), a popular method for discovering governing equations of dynamical systems, which we use for the first time as a general tool for explainability. On semi-synthetic dissemination datasets, our methods correctly identify interpretable features such as the times at which infections occur and the infected nodes. We also validate the methods qualitatively on a real-world human motion dataset, where the explanations highlight the body parts most relevant for action recognition.</p></details> |  |\n| **[Graph-PiT: Enhancing Structural Coherence in Part-Based Image Synthesis via Graph Priors](https://arxiv.org/abs/2604.06074v1)** | 2026-04-07 | <details><summary>Show</summary><p>Achieving fine-grained and structurally sound controllability is a cornerstone of advanced visual generation. Existing part-based frameworks treat user-provided parts as an unordered set and therefore ignore their intrinsic spatial and semantic relationships, which often results in compositions that lack structural integrity. To bridge this gap, we propose Graph-PiT, a framework that explicitly models the structural dependencies of visual components using a graph prior. Specifically, we represent visual parts as nodes and their spatial-semantic relationships as edges. At the heart of our method is a Hierarchical Graph Neural Network (HGNN) module that performs bidirectional message passing between coarse-grained part-level super-nodes and fine-grained IP+ token sub-nodes, refining part embeddings before they enter the generative pipeline. We also introduce a graph Laplacian smoothness loss and an edge-reconstruction loss so that adjacent parts acquire compatible, relation-aware embeddings. Quantitative experiments on controlled synthetic domains (character, product, indoor layout, and jigsaw), together with qualitative transfer to real web images, show that Graph-PiT improves structural coherence over vanilla PiT while remaining compatible with the original IP-Prior pipeline. Ablation experiments confirm that explicit relational reasoning is crucial for enforcing user-specified adjacency constraints. Our approach not only enhances the plausibility of generated concepts but also offers a scalable and interpretable mechanism for complex, multi-part image synthesis. 
The code is available at https://github.com/wolf-bailang/Graph-PiT.</p></details> | <details><summary>11 pa...</summary><p>11 pages, 5 figures, Accepted by ICME 2026</p></details> |\n| **[Incident-Guided Spatiotemporal Traffic Forecasting](https://arxiv.org/abs/2602.02528v2)** | 2026-04-07 | <details><summary>Show</summary><p>Recent years have witnessed the rapid development of deep-learning- and graph-neural-network-based forecasting methods for modern intelligent transportation systems. However, most existing work focuses exclusively on capturing spatio-temporal dependencies from historical traffic data, while overlooking the fact that suddenly occurring transportation incidents, such as traffic accidents and adverse weather, serve as external disturbances that can substantially alter temporal patterns. We argue that this issue has become a major obstacle to modeling the dynamics of traffic systems and improving prediction accuracy, but the unpredictability of incidents makes it difficult to observe patterns from historical sequences. To address these challenges, this paper proposes a novel framework named the Incident-Guided Spatiotemporal Graph Neural Network (IGSTGNN). IGSTGNN explicitly models the incident's impact through two core components: an Incident-Context Spatial Fusion (ICSF) module to capture the initial heterogeneous spatial influence, and a Temporal Incident Impact Decay (TIID) module to model the subsequent dynamic dissipation. To facilitate research on the spatio-temporal impact of incidents on traffic flow, a large-scale dataset is constructed and released, featuring incident records that are time-aligned with traffic time series. On this new benchmark, the proposed IGSTGNN framework is demonstrated to achieve state-of-the-art performance. Furthermore, the generalizability of the ICSF and TIID modules is validated by integrating them into various existing models.</p></details> |  |\n| **[Energy-Balanced Hyperspherical Graph Representation Learning via Structural Binding and Entropic Dispersion](https://arxiv.org/abs/2512.24062v2)** | 2026-04-07 | <details><summary>Show</summary><p>Graph Representation Learning (GRL) can be fundamentally modeled as a physical process of seeking an energy equilibrium state for a node system on a latent manifold. However, existing Graph Neural Networks (GNNs) often suffer from uncontrolled energy dissipation during message passing, driving the system towards a state of Thermal Death, manifested as feature collapse or over-smoothing, due to the absence of explicit thermodynamic constraints. To address this, we propose HyperGRL, a thermodynamics-driven framework that embeds nodes on a unit hypersphere by minimizing a Helmholtz free energy objective composed of two competing potentials. First, we introduce Structural Binding Energy (via Neighbor-Mean Alignment), which functions as a local binding force to strengthen structural cohesion, encouraging structurally related nodes to form compact local clusters. Second, to counteract representation collapse, we impose a Mean-Field Repulsive Potential (via Sampling-Free Uniformity), which acts as a global entropic force to maximize representation dispersion without the need for negative sampling. Crucially, to govern the trade-off between local alignment and global uniformity, we devise an Adaptive Thermostat. 
This entropy-guided strategy dynamically regulates the system's \"temperature\" during training, guiding the representation towards a robust metastable state that balances local cohesion with global discriminability. Extensive experiments on node classification, node clustering, and link prediction show that HyperGRL consistently achieves strong performance across diverse benchmark datasets, yielding more discriminative and robust representations while alleviating over-smoothing.</p></details> | <details><summary>Submi...</summary><p>Submitted to Knowledge-Based Systems</p></details> |\n| **[CROSS-Net: Region-Agnostic Taxi-Demand Prediction Using Feature Disentanglement](https://arxiv.org/abs/2310.18215v2)** | 2026-04-07 | <details><summary>Show</summary><p>The growing demand for ride-hailing services has led to an increasing need for accurate taxi demand prediction. Existing systems are limited to specific regions, lacking generalizability to unseen areas. This paper presents a novel taxi demand prediction system, harnessing the strengths of multiview graph neural networks to capture spatial-temporal dependencies and patterns in urban environments. Additionally, the proposed system, CROSS-Net, employs a spatially transferable approach, enabling it to train a model that can be deployed to previously unseen regions. To achieve this, the framework incorporates the power of a Variational Autoencoder to disentangle the input features into region-specific and region-agnostic components. The region-agnostic features facilitate cross-region taxi demand predictions, allowing the model to generalize well across different urban areas. Experimental results demonstrate the effectiveness of CROSS-Net in accurately forecasting taxi demand, even in previously unobserved regions, thus showcasing its potential for optimizing taxi services and improving transportation efficiency on a broader scale.</p></details> | <details><summary>An ac...</summary><p>An accepted journal article in IEEE Transactions on Intelligent Transportation Systems</p></details> |\n| **[Ollivier-Ricci Curvature of Riemannian Manifolds and Directed Graphs with Applications to Graph Neural Networks](https://arxiv.org/abs/2604.14211v1)** | 2026-04-06 | <details><summary>Show</summary><p>This thesis is an exposition of Ollivier-Ricci Curvature of metric spaces as introduced by Yann Ollivier, which is based upon the 1-Wasserstein Distance and optimal transport theory. We present some of the major results and proofs that connect Ollivier-Ricci curvature with classical Ricci curvature of Riemannian manifolds, including extensions of various theoretical bounds and theorems such as Bonnet-Myers and Levy-Gromov. Then we shift to results introduced by Lin-Lu-Yau on an extension of Ollivier-Ricci curvature on graphs, as well as the work of Jost-Liu on proving various combinatorial bounds for graph Ollivier-Ricci curvature. At the end of this thesis we present novel ideas and proofs regarding extensions of these results to directed graphs, and finally applications of graph-based Ollivier-Ricci curvature to various algorithms in network science and graph machine learning.</p></details> |  |\n| **[Deterministic and probabilistic neural surrogates of global hybrid-Vlasov simulations](https://arxiv.org/abs/2601.12614v3)** | 2026-04-06 | <details><summary>Show</summary><p>Hybrid-Vlasov simulations resolve ion-kinetic effects in the solar wind-magnetosphere interaction, but even 5D (2D + 3V) configurations are computationally expensive. 
We show that graph-based machine learning emulators can learn the spatiotemporal evolution of electromagnetic fields and lower order moments of ion velocity distribution in the near-Earth space environment from four 5D Vlasiator runs performed with identical steady solar wind conditions. The initial ion number density is systematically varied, while the grid spacing is held constant, to scan the ratio of the characteristic ion skin depth to the numerical grid size. Using a graph neural network (GNN) operating on the 2D spatial simulation grid comprising 670k cells, we demonstrate that both a deterministic forecasting model (Graph-FM) and a probabilistic ensemble forecasting model (Graph-EFM) based on a latent variable formulation are capable of producing accurate predictions of future plasma states. A divergence penalty is incorporated to encourage divergence-freeness in the magnetic fields. For the probabilistic model, a continuous ranked probability score objective is added to improve the calibration of the ensemble forecasts. The trained emulators achieve over two orders of magnitude speedup per time step on a single GPU compared to 100 CPU Vlasiator simulations. Most forecasted fields have Pearson correlations above 0.95 at 50 seconds lead time. However, we find that fields that exhibit near-zero degenerate distributions in the 5D setting are more challenging for the emulator to maintain high correlations for. Overall, these results demonstrate that GNNs provide a viable framework for rapid ensemble generation in hybrid-Vlasov modeling and highlight promising directions for future work.</p></details> |  |\n| **[Graph Signal Diffusion Models for Wireless Resource Allocation](https://arxiv.org/abs/2604.05175v1)** | 2026-04-06 | <details><summary>Show</summary><p>We consider constrained ergodic resource optimization in wireless networks with graph-structured interference. We train a diffusion model policy to match expert conditional distributions over resource allocations. By leveraging a primal-dual (expert) algorithm, we generate primal iterates that serve as draws from the corresponding expert conditionals for each training network instance. We view the allocations as stochastic graph signals supported on known channel state graphs. We implement the diffusion model architecture as a U-Net hierarchy of graph neural network (GNN) blocks, conditioned on the channel states and additional node states. At inference, the learned generative model amortizes the iterative expert policy by directly sampling allocation vectors from the near-optimal conditional distributions. In a power-control case study, we show that time-sharing the generated power allocations achieves near-optimal ergodic sum-rate utility and near-feasible ergodic minimum-rates, with strong generalization and transferability across network states.</p></details> | <details><summary>Under...</summary><p>Under review for SPAWC'26</p></details> |\n| **[General Geospatial Inference with a Population Dynamics Foundation Model](https://arxiv.org/abs/2411.07207v6)** | 2026-04-06 | <details><summary>Show</summary><p>Supporting the health and well-being of dynamic populations around the world requires governmental agencies, organizations and researchers to understand and reason over complex relationships between human behavior and local contexts in order to identify high-risk groups and strategically allocate limited resources. 
Traditional approaches to these classes of problems often entail developing manually curated, task-specific features and models to represent human behavior and the natural and built environment, which can be challenging to adapt to new, or even, related tasks. To address this, we introduce a Population Dynamics Foundation Model (PDFM) that aims to capture the relationships between diverse data modalities and is applicable to a broad range of geospatial tasks. We first construct a geo-indexed dataset for postal codes and counties across the United States, capturing rich aggregated information on human behavior from maps, busyness, and aggregated search trends, and environmental factors such as weather and air quality. We then model this data and the complex relationships between locations using a graph neural network, producing embeddings that can be adapted to a wide range of downstream tasks using relatively simple models. We evaluate the effectiveness of our approach by benchmarking it on 27 downstream tasks spanning three distinct domains: health indicators, socioeconomic factors, and environmental measurements. The approach achieves state-of-the-art performance on all 27 geospatial interpolation tasks, and on 25 out of the 27 extrapolation and super-resolution tasks. We combined the PDFM with a state-of-the-art forecasting foundation model, TimesFM, to predict unemployment and poverty, achieving performance that surpasses fully supervised forecasting. The full set of embeddings and sample code are publicly available for researchers.</p></details> | <details><summary>updat...</summary><p>updated access information</p></details> |\n| **[Empowering Power Outage Prediction with Spatially Aware Hybrid Graph Neural Networks and Contrastive Learning](https://arxiv.org/abs/2604.04916v1)** | 2026-04-06 | <details><summary>Show</summary><p>Extreme weather events, such as severe storms, hurricanes, snowstorms, and ice storms, which are exacerbated by climate change, frequently cause widespread power outages. These outages halt industrial operations, impact communities, damage critical infrastructure, profoundly disrupt economies, and have far-reaching effects across various sectors. To mitigate these effects, the University of Connecticut and Eversource Energy Center have developed an outage prediction modeling (OPM) system to provide pre-emptive forecasts for electric distribution networks before such weather events occur. However, existing predictive models in the system do not incorporate the spatial effect of extreme weather events. To this end, we develop Spatially Aware Hybrid Graph Neural Networks (SA-HGNN) with contrastive learning to enhance the OPM predictions for extreme weather-induced power outages. Specifically, we first encode spatial relationships of both static features (e.g., land cover, infrastructure) and event-specific dynamic features (e.g., wind speed, precipitation) via Spatially Aware Hybrid Graph Neural Networks (SA-HGNN). Next, we leverage contrastive learning to handle the imbalance problem associated with different types of extreme weather events and generate location-specific embeddings by minimizing intra-event distances between similar locations while maximizing inter-event distances across all locations. 
Thorough empirical studies in four utility service territories, i.e., Connecticut, Western Massachusetts, Eastern Massachusetts, and New Hampshire, demonstrate that SA-HGNN can achieve state-of-the-art performance for power outage prediction.</p></details> |  |\n| **[Pixels or Positions? Benchmarking Modalities in Group Activity Recognition](https://arxiv.org/abs/2511.12606v2)** | 2026-04-06 | <details><summary>Show</summary><p>Group Activity Recognition (GAR) is well studied on the video modality for surveillance and indoor team sports (e.g., volleyball, basketball). Yet, other modalities such as agent positions and trajectories over time, i.e. tracking, remain comparatively under-explored despite being compact, agent-centric signals that explicitly encode spatial interactions. Understanding whether pixel (video) or position (tracking) modalities leads to better group activity recognition is therefore important to drive further research on the topic. However, no standardized benchmark currently exists that aligns broadcast video and tracking data for the same group activities, leading to a lack of apples-to-apples comparison between these modalities for GAR. In this work, we introduce SoccerNet-GAR, a multimodal dataset built from the $64$ matches of the football World Cup 2022. Specifically, the broadcast videos and player tracking modalities for $87{,}939$ group activities are synchronized and annotated with $10$ categories. Furthermore, we define a unified evaluation protocol to benchmark two strong unimodal approaches: (i) competitive video-based classifiers and (ii) tracking-based classifiers leveraging graph neural networks. In particular, our novel role-aware graph architecture for tracking-based GAR directly encodes tactical structure through positional edges connecting players by their on-pitch roles. Our tracking model achieves $77.8\\%$ balanced accuracy compared to $60.9\\%$ for the best video baseline, while training with $7 \\times$ less GPU hours and $479 \\times$ fewer parameters ($180K$ vs. $86.3M$). This study provides new insights into the relative strengths of pixels and positions for group activity recognition in sports.</p></details> |  |\n| **[MAVEN: A Mesh-Aware Volumetric Encoding Network for Simulating 3D Flexible Deformation](https://arxiv.org/abs/2604.04474v1)** | 2026-04-06 | <details><summary>Show</summary><p>Deep learning-based approaches, particularly graph neural networks (GNNs), have gained prominence in simulating flexible deformations and contacts of solids, due to their ability to handle unstructured physical fields and nonlinear regression on graph structures. However, existing GNNs commonly represent meshes with graphs built solely from vertices and edges. These approaches tend to overlook higher-dimensional spatial features, e.g., 2D facets and 3D cells, from the original geometry. As a result, it is challenging to accurately capture boundary representations and volumetric characteristics, though this information is critically important for modeling contact interactions and internal physical quantity propagation, particularly under sparse mesh discretization. In this paper, we introduce MAVEN, a mesh-aware volumetric encoding network for simulating 3D flexible deformation, which explicitly models geometric mesh elements of higher dimension to achieve a more accurate and natural physical simulation. MAVEN establishes learnable mappings among 3D cells, 2D facets, and vertices, enabling flexible mutual transformations. 
Explicit geometric features are incorporated into the model to alleviate the burden of implicitly learning geometric patterns. Experimental results show that MAVEN consistently achieves state-of-the-art performance across established datasets and a novel metal stretch-bending task featuring large deformations and prolonged contacts.</p></details> |  |\n| **[Thermodynamic-Inspired Explainable GeoAI: Uncovering Regime-Dependent Mechanisms in Heterogeneous Spatial Systems](https://arxiv.org/abs/2604.04339v1)** | 2026-04-06 | <details><summary>Show</summary><p>Modeling spatial heterogeneity and associated critical transitions remains a fundamental challenge in geography and environmental science. While conventional Geographically Weighted Regression (GWR) and deep learning models have improved predictive skill, they often fail to elucidate state-dependent nonlinearities where the functional roles of drivers represent opposing effects across heterogeneous domains. We introduce a thermodynamics-inspired explainable geospatial AI framework that integrates statistical mechanics with graph neural networks. By conceptualizing spatial variability as a thermodynamic competition between system Burden (E) and Capacity (S), our model disentangles the latent mechanisms driving spatial processes. Using three simulation datasets and three real-world datasets across distinct domains (housing markets, mental health prevalence, and wildfire-induced PM2.5 anomalies), we show that the new framework successfully identifies regime-dependent role reversals of predictors that standard baselines miss. Notably, the framework explicitly diagnoses the phase transition into a Burden-dominated regime during the 2023 Canadian wildfire event, distinguishing physical mechanism shifts from statistical outliers. These findings demonstrate that thermodynamic constraints can improve the interpretability of GeoAI while preserving strong predictive performance in complex spatial systems.</p></details> |  |\n| **[DSBD: Dual-Aligned Structural Basis Distillation for Graph Domain Adaptation](https://arxiv.org/abs/2604.03154v1)** | 2026-04-03 | <details><summary>Show</summary><p>Graph domain adaptation (GDA) aims to transfer knowledge from a labeled source graph to an unlabeled target graph under distribution shifts. However, existing methods are largely feature-centric and overlook structural discrepancies, which become particularly detrimental under significant topology shifts. Such discrepancies alter both geometric relationships and spectral properties, leading to unreliable transfer of graph neural networks (GNNs). To address this limitation, we propose Dual-Aligned Structural Basis Distillation (DSBD) for GDA, a novel framework that explicitly models and adapts cross-domain structural variation. DSBD constructs a differentiable structural basis by synthesizing continuous probabilistic prototype graphs, enabling gradient-based optimization over graph topology. The basis is learned under source-domain supervision to preserve semantic discriminability, while being explicitly aligned to the target domain through a dual-alignment objective. Specifically, geometric consistency is enforced via permutation-invariant topological moment matching, and spectral consistency is achieved through Dirichlet energy calibration, jointly capturing structural characteristics across domains. Furthermore, we introduce a decoupled inference paradigm that mitigates source-specific structural bias by training a new GNN on the distilled structural basis. 
Extensive experiments on graph and image benchmarks demonstrate that DSBD consistently outperforms state-of-the-art methods.</p></details> |  |\n\n"
  },
  {
    "path": "main.py",
    "content": "import sys\nimport time\nimport pytz\nfrom datetime import datetime\n\nfrom utils import get_daily_papers_by_keyword_with_retries, generate_table, back_up_files,\\\n    restore_files, remove_backups, get_daily_date\n\n\nbeijing_timezone = pytz.timezone('Asia/Shanghai')\n\n# NOTE: arXiv API seems to sometimes return an unexpected empty list.\n\n# get current beijing time date in the format of \"2021-08-01\"\ncurrent_date = datetime.now(beijing_timezone).strftime(\"%Y-%m-%d\")\n# get last update date from README.md\nwith open(\"README.md\", \"r\") as f:\n    while True:\n        line = f.readline()\n        if \"Last update:\" in line: break\n    last_update_date = line.split(\": \")[1].strip()\n    # if last_update_date == current_date:\n        # sys.exit(\"Already updated today!\")\n\nkeywords = [\"Time Series\", \"Trajectory\", \"Graph Neural Networks\"] # TODO add more keywords\n\nmax_result = 100 # maximum query results from arXiv API for each keyword\nissues_result = 15 # maximum papers to be included in the issue\n\n# all columns: Title, Authors, Abstract, Link, Tags, Comment, Date\n# fixed_columns = [\"Title\", \"Link\", \"Date\"]\n\ncolumn_names = [\"Title\", \"Link\", \"Abstract\", \"Date\", \"Comment\"]\n\nback_up_files() # back up README.md and ISSUE_TEMPLATE.md\n\n# write to README.md\nf_rm = open(\"README.md\", \"w\") # file for README.md\nf_rm.write(\"# Daily Papers\\n\")\nf_rm.write(\"The project automatically fetches the latest papers from arXiv based on keywords.\\n\\nThe subheadings in the README file represent the search keywords.\\n\\nOnly the most recent articles for each keyword are retained, up to a maximum of 100 papers.\\n\\nYou can click the 'Watch' button to receive daily email notifications.\\n\\nLast update: {0}\\n\\n\".format(current_date))\n\n# write to ISSUE_TEMPLATE.md\nf_is = open(\".github/ISSUE_TEMPLATE.md\", \"w\") # file for ISSUE_TEMPLATE.md\nf_is.write(\"---\\n\")\nf_is.write(\"title: Latest {0} Papers - {1}\\n\".format(issues_result, get_daily_date()))\nf_is.write(\"labels: documentation\\n\")\nf_is.write(\"---\\n\")\nf_is.write(\"**Please check the [Github](https://github.com/zezhishao/MTS_Daily_ArXiv) page for a better reading experience and more papers.**\\n\\n\")\n\nfor keyword in keywords:\n    f_rm.write(\"## {0}\\n\".format(keyword))\n    f_is.write(\"## {0}\\n\".format(keyword))\n    if len(keyword.split()) == 1: link = \"AND\" # for keyword with only one word, We search for papers containing this keyword in both the title and abstract.\n    else: link = \"OR\"\n    papers = get_daily_papers_by_keyword_with_retries(keyword, column_names, max_result, link)\n    if papers is None: # failed to get papers\n        print(\"Failed to get papers!\")\n        f_rm.close()\n        f_is.close()\n        restore_files()\n        sys.exit(\"Failed to get papers!\")\n    rm_table = generate_table(papers)\n    is_table = generate_table(papers[:issues_result], ignore_keys=[\"Abstract\"])\n    f_rm.write(rm_table)\n    f_rm.write(\"\\n\\n\")\n    f_is.write(is_table)\n    f_is.write(\"\\n\\n\")\n    time.sleep(5) # avoid being blocked by arXiv API\n\nf_rm.close()\nf_is.close()\nremove_backups()\n"
  },
  {
    "path": "utils.py",
    "content": "import os\nimport time\nimport pytz\nimport shutil\nimport datetime\nfrom typing import List, Dict\nimport urllib, urllib.request\n\nimport feedparser\nfrom easydict import EasyDict\n\n\ndef remove_duplicated_spaces(text: str) -> str:\n    return \" \".join(text.split())\n\ndef request_paper_with_arXiv_api(keyword: str, max_results: int, link: str = \"OR\") -> List[Dict[str, str]]:\n    # keyword = keyword.replace(\" \", \"+\")\n    assert link in [\"OR\", \"AND\"], \"link should be 'OR' or 'AND'\"\n    keyword = \"\\\"\" + keyword + \"\\\"\"\n    url = \"http://export.arxiv.org/api/query?search_query=ti:{0}+{2}+abs:{0}&max_results={1}&sortBy=lastUpdatedDate\".format(keyword, max_results, link)\n    url = urllib.parse.quote(url, safe=\"%/:=&?~#+!$,;'@()*[]\")\n    response = urllib.request.urlopen(url).read().decode('utf-8')\n    feed = feedparser.parse(response)\n\n    # NOTE default columns: Title, Authors, Abstract, Link, Tags, Comment, Date\n    papers = []\n    for entry in feed.entries:\n        entry = EasyDict(entry)\n        paper = EasyDict()\n\n        # title\n        paper.Title = remove_duplicated_spaces(entry.title.replace(\"\\n\", \" \"))\n        # abstract\n        paper.Abstract = remove_duplicated_spaces(entry.summary.replace(\"\\n\", \" \"))\n        # authors\n        paper.Authors = [remove_duplicated_spaces(_[\"name\"].replace(\"\\n\", \" \")) for _ in entry.authors]\n        # link\n        paper.Link = remove_duplicated_spaces(entry.link.replace(\"\\n\", \" \"))\n        # tags\n        paper.Tags = [remove_duplicated_spaces(_[\"term\"].replace(\"\\n\", \" \")) for _ in entry.tags]\n        # comment\n        paper.Comment = remove_duplicated_spaces(entry.get(\"arxiv_comment\", \"\").replace(\"\\n\", \" \"))\n        # date\n        paper.Date = entry.updated\n\n        papers.append(paper)\n    return papers\n\ndef filter_tags(papers: List[Dict[str, str]], target_fileds: List[str]=[\"cs\", \"stat\"]) -> List[Dict[str, str]]:\n    # filtering tags: only keep the papers in target_fileds\n    results = []\n    for paper in papers:\n        tags = paper.Tags\n        for tag in tags:\n            if tag.split(\".\")[0] in target_fileds:\n                results.append(paper)\n                break\n    return results\n\ndef get_daily_papers_by_keyword_with_retries(keyword: str, column_names: List[str], max_result: int, link: str = \"OR\", retries: int = 6) -> List[Dict[str, str]]:\n    for _ in range(retries):\n        papers = get_daily_papers_by_keyword(keyword, column_names, max_result, link)\n        if len(papers) > 0: return papers\n        else:\n            print(\"Unexpected empty list, retrying...\")\n            time.sleep(60 * 30) # wait for 30 minutes\n    # failed\n    return None\n\ndef get_daily_papers_by_keyword(keyword: str, column_names: List[str], max_result: int, link: str = \"OR\") -> List[Dict[str, str]]:\n    # get papers\n    papers = request_paper_with_arXiv_api(keyword, max_result, link) # NOTE default columns: Title, Authors, Abstract, Link, Tags, Comment, Date\n    # NOTE filtering tags: only keep the papers in cs field\n    # TODO filtering more\n    papers = filter_tags(papers)\n    # select columns for display\n    papers = [{column_name: paper[column_name] for column_name in column_names} for paper in papers]\n    return papers\n\ndef generate_table(papers: List[Dict[str, str]], ignore_keys: List[str] = []) -> str:\n    formatted_papers = []\n    keys = papers[0].keys()\n    for paper in papers:\n        # process 
fixed columns\n        formatted_paper = EasyDict()\n        ## Title and Link\n        formatted_paper.Title = \"**\" + \"[{0}]({1})\".format(paper[\"Title\"], paper[\"Link\"]) + \"**\"\n        ## Process Date (format: 2021-08-01T00:00:00Z -> 2021-08-01)\n        formatted_paper.Date = paper[\"Date\"].split(\"T\")[0]\n        \n        # process other columns\n        for key in keys:\n            if key in [\"Title\", \"Link\", \"Date\"] or key in ignore_keys:\n                continue\n            elif key == \"Abstract\":\n                # add show/hide button for abstract\n                formatted_paper[key] = \"<details><summary>Show</summary><p>{0}</p></details>\".format(paper[key])\n            elif key == \"Authors\":\n                # NOTE only use the first author\n                formatted_paper[key] = paper[key][0] + \" et al.\"\n            elif key == \"Tags\":\n                tags = \", \".join(paper[key])\n                if len(tags) > 10:\n                    formatted_paper[key] = \"<details><summary>{0}...</summary><p>{1}</p></details>\".format(tags[:5], tags)\n                else:\n                    formatted_paper[key] = tags\n            elif key == \"Comment\":\n                if paper[key] == \"\":\n                    formatted_paper[key] = \"\"\n                elif len(paper[key]) > 20:\n                    formatted_paper[key] = \"<details><summary>{0}...</summary><p>{1}</p></details>\".format(paper[key][:5], paper[key])\n                else:\n                    formatted_paper[key] = paper[key]\n        formatted_papers.append(formatted_paper)\n\n    # generate header\n    columns = formatted_papers[0].keys()\n    # highlight headers\n    columns = [\"**\" + column + \"**\" for column in columns]\n    header = \"| \" + \" | \".join(columns) + \" |\"\n    header = header + \"\\n\" + \"| \" + \" | \".join([\"---\"] * len(formatted_papers[0].keys())) + \" |\"\n    # generate the body\n    body = \"\"\n    for paper in formatted_papers:\n        body += \"\\n| \" + \" | \".join(paper.values()) + \" |\"\n    return header + body\n\ndef back_up_files():\n    # back up README.md and ISSUE_TEMPLATE.md\n    shutil.move(\"README.md\", \"README.md.bk\")\n    shutil.move(\".github/ISSUE_TEMPLATE.md\", \".github/ISSUE_TEMPLATE.md.bk\")\n\ndef restore_files():\n    # restore README.md and ISSUE_TEMPLATE.md\n    shutil.move(\"README.md.bk\", \"README.md\")\n    shutil.move(\".github/ISSUE_TEMPLATE.md.bk\", \".github/ISSUE_TEMPLATE.md\")\n\ndef remove_backups():\n    # remove README.md and ISSUE_TEMPLATE.md\n    os.remove(\"README.md.bk\")\n    os.remove(\".github/ISSUE_TEMPLATE.md.bk\")\n\ndef get_daily_date():\n    # get beijing time in the format of \"March 1, 2021\"\n    beijing_timezone = pytz.timezone('Asia/Shanghai')\n    today = datetime.datetime.now(beijing_timezone)\n    return today.strftime(\"%B %d, %Y\")\n"
  }
]