The study developed and evaluated a Vision Transformer-based deep learning model (using the USFM framework) for automatic segmentation of pancreatic tumors on endoscopic ultrasound (EUS) images, trained on over 17,000 images from two public datasets.
In internal 5-fold cross-validation, the model achieved moderate-to-good segmentation performance (mean DSC ≈ 0.65, IoU ≈ 0.58) with high specificity (~99%) and accuracy (~98%), indicating reliable identification of non-tumor regions.
On an independent external test set, the model maintained similar performance (DSC ≈ 0.66, IoU ≈ 0.61, sensitivity ~72%, specificity ~98%), but about 9.7% of cases showed spurious multiple-tumor predictions, underscoring the need for more standardized data and further prospective validation.
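The headline metrics above (DSC, IoU, sensitivity, specificity) all follow from the per-pixel confusion counts. A minimal sketch in plain Python — the function and the toy masks are illustrative, not taken from the paper's code:

```python
def segmentation_metrics(pred, gt):
    """DSC, IoU, sensitivity, and specificity from flat binary masks."""
    pairs = list(zip(pred, gt))
    tp = sum(1 for p, g in pairs if p and g)
    fp = sum(1 for p, g in pairs if p and not g)
    fn = sum(1 for p, g in pairs if not p and g)
    tn = sum(1 for p, g in pairs if not p and not g)
    dsc  = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 1.0
    iou  = tp / (tp + fp + fn) if tp + fp + fn else 1.0
    sens = tp / (tp + fn) if tp + fn else 1.0
    spec = tn / (tn + fp) if tn + fp else 1.0
    return dsc, iou, sens, spec

# Toy masks: the prediction overlaps half of the ground-truth tumor.
gt   = [0, 1, 1, 0, 0, 1, 1, 0]
pred = [0, 0, 1, 1, 0, 0, 1, 1]
dsc, iou, sens, spec = segmentation_metrics(pred, gt)
# dsc = 0.5, iou ≈ 0.33, sens = 0.5, spec = 0.5
```

Note how a prediction can score near-perfect specificity (most pixels are background) while DSC and sensitivity remain moderate, which matches the pattern in the reported results.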
Identifies catastrophic forgetting as a key problem when adapting large language models into small language models for low-resource languages in multilingual settings.
Proposes a continual learning approach that combines part-of-speech-based code-switching with a replay adapter mechanism to preserve previously learned linguistic knowledge while learning new languages.
Demonstrates that the proposed method improves performance and reduces forgetting on both vision–language tasks (like visual question answering) and standard language modeling tasks for low-resource languages.
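Part-of-speech-guided code-switching can be pictured as an augmenter that swaps only tokens of selected POS classes into the target language. The lexicons, the noun-only policy, and the Tagalog translations below are hypothetical stand-ins for the paper's tagger and bilingual dictionary:

```python
# Hypothetical toy lexicons; a real pipeline would use a POS tagger and a
# bilingual dictionary for the low-resource target language.
POS  = {"the": "DET", "cat": "NOUN", "sat": "VERB", "on": "ADP", "mat": "NOUN"}
XLAT = {"cat": "pusa", "mat": "banig"}  # English -> Tagalog (illustrative)

def pos_code_switch(tokens, switch_pos=("NOUN",)):
    """Swap tokens whose POS tag is in switch_pos into the target language,
    leaving the rest of the sentence in the source language."""
    return [XLAT.get(t, t) if POS.get(t) in switch_pos else t for t in tokens]

mixed = pos_code_switch(["the", "cat", "sat", "on", "the", "mat"])
# mixed == ["the", "pusa", "sat", "on", "the", "banig"]
```

Restricting the swap to a POS class keeps the sentence frame in the known language, which is the intuition behind using code-switched data to bridge into a new one.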
Introduces the new task of Open-Vocabulary 3D Instruction Ambiguity Detection, where a model must decide whether a natural-language command has a single clear meaning within a specific 3D scene.
Builds Ambi3D, a large benchmark dataset with over 700 diverse 3D scenes and about 22,000 instructions, and shows that current state-of-the-art 3D large language models often fail to reliably detect ambiguous commands.
Proposes AmbiVer, a two-stage framework that gathers explicit visual evidence from multiple viewpoints and feeds it to a vision-language model to judge instruction ambiguity, achieving strong performance and setting a baseline for safer embodied AI.
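The underlying decision can be caricatured as a referent-counting test: an instruction is unambiguous in a scene only if exactly one object satisfies its description. The set-based matching below is a toy illustration of that criterion, not AmbiVer's multi-view pipeline:

```python
def is_ambiguous(scene_objects, referent):
    """True if the referent matches zero objects or more than one object,
    i.e. the instruction has no single clear meaning in this scene."""
    matches = [obj for obj in scene_objects if referent <= obj]  # attribute subset
    return len(matches) != 1

# Toy scene: each object is a set of category/attribute labels.
scene = [{"cup", "red"}, {"cup", "blue"}, {"book", "red"}]
ambiguous_cup = is_ambiguous(scene, {"cup"})          # True: two cups match
ambiguous_red_cup = is_ambiguous(scene, {"cup", "red"})  # False: unique referent
```

The hard part the benchmark targets is exactly what this sketch assumes away: deciding, open-vocabulary and from 3D observations, which objects actually match the language.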
Introduces VideoAR, a large-scale visual autoregressive framework that combines intra-frame autoregressive modeling with causal next-frame prediction, enabled by a 3D multi-scale tokenizer to efficiently capture spatio-temporal dynamics.
Proposes several techniques—Multi-scale Temporal RoPE, Cross-Frame Error Correction, and Random Frame Mask—to reduce error accumulation over time and improve long-term temporal coherence in generated videos.
Demonstrates state-of-the-art performance among autoregressive video models, significantly improving FVD on UCF-101 while requiring over 10× fewer inference steps, and achieving VBench scores competitive with much larger diffusion-based models via a multi-stage spatial–temporal pretraining pipeline.
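For context, Multi-scale Temporal RoPE presumably builds on standard rotary position embeddings, which encode an index (here, a frame index) by rotating consecutive feature pairs; how the paper varies frequencies across scales is not shown here. A minimal rotary embedding in plain Python:

```python
import math

def rope(vec, pos, base=10000.0):
    """Rotary position embedding: rotate each feature pair (vec[2i], vec[2i+1])
    by an angle that grows with position and shrinks with pair index."""
    d, out = len(vec), list(vec)
    for i in range(0, d, 2):
        theta = pos / (base ** (i / d))
        c, s = math.cos(theta), math.sin(theta)
        out[i]     = vec[i] * c - vec[i + 1] * s
        out[i + 1] = vec[i] * s + vec[i + 1] * c
    return out

frame_feature = [1.0, 0.0, 0.0, 1.0]
rope(frame_feature, pos=0)   # position 0 leaves the vector unchanged
```

Because the encoding is a pure rotation, it preserves feature norms and makes attention scores depend on relative rather than absolute frame positions.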
Introduces Goal Force, a framework where users specify goals for video world models using explicit force vectors and intermediate dynamics instead of ambiguous text prompts or hard-to-specify target images.
Trains a video generation model on a curated set of simple synthetic physics scenarios (e.g., collisions, falling dominos) so it learns to propagate forces through time and space like an implicit neural physics simulator.
Demonstrates strong zero-shot generalization from these simple training setups to complex, real-world tasks such as tool use and multi-object causal chains, enabling precise, physics-aware planning without external physics engines.
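The "implicit neural physics simulator" framing is easiest to see against the explicit counterpart it replaces: propagating a force through time is repeated integration of Newton's second law. A one-dimensional semi-implicit Euler sketch, with constants chosen arbitrarily rather than taken from the paper:

```python
def propagate_force(pos, vel, force, mass=1.0, dt=0.1, steps=5):
    """Semi-implicit Euler: integrate a constant force into velocity,
    then velocity into position, step by step."""
    for _ in range(steps):
        vel += (force / mass) * dt
        pos += vel * dt
    return pos, vel

pos, vel = propagate_force(pos=0.0, vel=0.0, force=1.0)
# after five steps: pos ≈ 0.15, vel ≈ 0.5
```

Goal Force's bet is that a video model trained on simple scenarios can learn this propagation implicitly, including contact and multi-object interactions that make explicit simulators hard to set up.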
The paper examines how large language model–based tools might help address a core ‘trilemma’ in democracy: balancing broad participation, meaningful deliberation, and political equality at scale.
Using an existing LLM-driven common-ground-finding system as a case study, it analyzes ways AI mediation could enhance participation, support fairer and more equal voice, and improve the quality of discussion by surfacing trustworthy information.
It identifies key risks and open challenges—such as bias, misinformation, and design choices—and argues that substantial empirical, technical, and theoretical work is still needed to safely and effectively integrate AI mediation into democratic deliberation.
Introduces Cedalion, an open-source Python framework that unifies the full analysis pipeline for fNIRS and DOT—covering forward modeling, optode co-registration, signal processing, GLM analysis, and DOT image reconstruction—within a single, standardized architecture.
Ensures reproducible, scalable workflows by adhering to SNIRF and BIDS standards, providing cloud-executable Jupyter notebooks, containerized pipelines, automated documentation linked to source publications, and continuous-integration testing.
Seamlessly connects optical neuroimaging with modern machine learning and multimodal workflows by integrating with tools like scikit-learn and PyTorch, supporting multimodal fusion (e.g., EEG, MEG, physiology), and offering validated algorithms plus simulation and data-augmentation modules; the tutorial supplies seven fully executable example notebooks.
The paper formalizes the task of Data-centric Solution Preference and builds a large dataset of 18,438 pairwise comparisons to study how to choose better machine learning solutions without fully executing them.
It shows that large language models, when given a Verified Data Analysis Report, can predict which solution will perform better with 61.5% accuracy and well-calibrated confidence, effectively acting as a fast surrogate for expensive runtime checks.
The authors implement this idea in an agent called FOREAGENT that uses a Predict-then-Verify loop, achieving 6× faster convergence and a 6% performance gain over traditional execution-based agent baselines.
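A Predict-then-Verify loop amounts to putting a cheap, calibrated predictor in front of expensive execution as a gate. The function names and threshold below are hypothetical, sketching the control flow rather than FOREAGENT itself:

```python
def predict_then_verify(candidates, predict, execute, conf_threshold=0.8):
    """Keep a running best solution; accept the surrogate's verdict when it
    is confident, and fall back to costly execution only when it is not."""
    best = candidates[0]
    for challenger in candidates[1:]:
        winner, confidence = predict(best, challenger)
        if confidence >= conf_threshold:
            best = winner                              # fast surrogate path
        else:
            best = max(best, challenger, key=execute)  # slow verified path
    return best

# Toy demo: "solutions" are numbers, and a solution's true score is itself.
surrogate = lambda a, b: (max(a, b), 0.9)  # always confident, always right
best = predict_then_verify([3, 7, 2, 9, 5], surrogate, execute=lambda x: x)
# best == 9
```

The speedup comes from how rarely the slow branch fires, which is why the surrogate's calibration (not just its 61.5% accuracy) matters.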
Identifies and analyzes the problem of exploration collapse in RL-based training of LLM reasoning, showing why traditional entropy-based exploration either leads to reward hacking (verbose but unhelpful outputs) or fails to overcome pre-training biases.
Introduces IIB-LPO, a latent policy optimization method that branches reasoning at high-uncertainty (high-entropy) states and uses the Information Bottleneck both to filter trajectories and as a self-reward, encouraging diverse yet concise and informative reasoning paths.
Demonstrates state-of-the-art performance on four mathematical reasoning benchmarks, improving accuracy by up to 5.3% and reasoning diversity by up to 7.4% compared to previous RLVR approaches.
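Branching at high-entropy states can be illustrated with a simple detector over per-step next-token distributions; the threshold and distributions below are invented for illustration, and the Information Bottleneck filtering and self-reward are not modeled here:

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def branch_points(step_dists, threshold=1.0):
    """Indices of reasoning steps whose predictive entropy exceeds the
    threshold -- the uncertain states where alternative paths would branch."""
    return [i for i, dist in enumerate(step_dists) if entropy(dist) > threshold]

steps = [
    [0.97, 0.01, 0.01, 0.01],   # confident step
    [0.25, 0.25, 0.25, 0.25],   # maximally uncertain: branch here
    [0.90, 0.05, 0.03, 0.02],   # confident again
]
branch_points(steps)   # -> [1]
```

Branching only where the policy is genuinely uncertain is what lets the method diversify reasoning without the verbosity that naive entropy bonuses reward.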