The study developed and evaluated a Vision Transformer-based deep learning model (using the USFM framework) for automatic segmentation of pancreatic tumors on endoscopic ultrasound (EUS) images, trained on over 17,000 images from two public datasets.
In internal 5-fold cross-validation, the model achieved moderate-to-good segmentation performance (mean DSC ≈ 0.65, IoU ≈ 0.58) with high specificity (~99%) and accuracy (~98%), indicating reliable identification of non-tumor regions.
On an independent external test set, the model maintained comparable performance (DSC ≈ 0.66, IoU ≈ 0.61, sensitivity ~72%, specificity ~98%), but in about 9.7% of cases it incorrectly predicted multiple tumors, underscoring the need for more standardized data and further prospective validation.
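The reported metrics (DSC, IoU, sensitivity, specificity) can all be computed from a pair of binary masks; a minimal NumPy sketch, with illustrative toy masks:

```python
import numpy as np

def segmentation_metrics(pred: np.ndarray, truth: np.ndarray) -> dict:
    """Compute DSC, IoU, sensitivity, and specificity for binary masks."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    tp = np.logical_and(pred, truth).sum()
    fp = np.logical_and(pred, ~truth).sum()
    fn = np.logical_and(~pred, truth).sum()
    tn = np.logical_and(~pred, ~truth).sum()
    return {
        "dsc": 2 * tp / (2 * tp + fp + fn) if (tp + fp + fn) else 1.0,
        "iou": tp / (tp + fp + fn) if (tp + fp + fn) else 1.0,
        "sensitivity": tp / (tp + fn) if (tp + fn) else 1.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 1.0,
    }

# Toy 4x4 masks: prediction overlaps the truth on 2 of its 3 pixels.
pred  = np.array([[1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]])
truth = np.array([[1, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]])
m = segmentation_metrics(pred, truth)
```

The high specificity alongside a moderate DSC in the study is typical of this metric family: background pixels dominate, so specificity stays near 1 even when tumor overlap is imperfect.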
The paper formalizes the task of Data-centric Solution Preference and builds a large dataset of 18,438 pairwise comparisons to study how to choose better machine learning solutions without fully executing them.
It shows that large language models, when given a Verified Data Analysis Report, can predict which solution will perform better with 61.5% accuracy and well-calibrated confidence, effectively acting as a fast surrogate for expensive runtime checks.
The authors implement this idea in an agent called FOREAGENT that uses a Predict-then-Verify loop, achieving 6× faster convergence and a 6% performance gain over traditional execution-based agent baselines.
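The Predict-then-Verify idea can be sketched abstractly: a cheap predictor ranks candidate solutions pairwise, and the expensive execution budget is spent only on the top-ranked ones. This is not FOREAGENT's actual implementation; the length-based `predict_better` heuristic stands in for an LLM judge reading Verified Data Analysis Reports:

```python
def predict_better(report_a: str, report_b: str) -> str:
    """Stand-in for an LLM judge comparing two analysis reports.
    Toy heuristic: prefer the longer report."""
    return "a" if len(report_a) >= len(report_b) else "b"

def predict_then_verify(candidates, execute, budget=2):
    """Rank candidates via cheap pairwise predictions, then actually
    execute (verify) only the top `budget` of them."""
    wins = {name: 0 for name, _ in candidates}
    for i, (na, ra) in enumerate(candidates):
        for nb, rb in candidates[i + 1:]:
            winner = na if predict_better(ra, rb) == "a" else nb
            wins[winner] += 1
    ranked = sorted(candidates, key=lambda c: wins[c[0]], reverse=True)
    scores = {name: execute(name) for name, _ in ranked[:budget]}
    return max(scores, key=scores.get)

# Hypothetical candidates and ground-truth execution scores.
candidates = [
    ("s1", "short"),
    ("s2", "a much longer report text"),
    ("s3", "medium len"),
]
true_scores = {"s1": 0.9, "s2": 0.5, "s3": 0.7}
best = predict_then_verify(candidates, execute=true_scores.get, budget=2)
```

Note that in this toy run the predictor's ranking misses the truly best candidate `s1`, illustrating why verification on the shortlist (rather than blind trust in the predictor) matters.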
Introduces Cedalion, an open-source Python framework that unifies the full analysis pipeline for fNIRS and DOT—covering forward modeling, optode co-registration, signal processing, GLM analysis, and DOT image reconstruction—within a single, standardized architecture.
Ensures reproducible, scalable workflows by adhering to SNIRF and BIDS standards, providing cloud-executable Jupyter notebooks, containerized pipelines, automated documentation linked to source publications, and continuous-integration testing.
Seamlessly connects optical neuroimaging with modern machine learning and multimodal workflows by integrating with tools like scikit-learn and PyTorch, supporting multimodal fusion (e.g., EEG, MEG, physiology), and offering validated algorithms plus simulation and data-augmentation modules; the tutorial supplies seven fully executable example notebooks.

Introduces a formal framework for auditing group fairness when model owners can strategically and adaptively update their models, characterizing which updates are allowed as long as the audited property (e.g., fairness) is preserved.
Proposes a general PAC auditing procedure based on an Empirical Property Optimization (EPO) oracle, enabling efficient estimation of fairness properties using a minimal number of labeled samples even under arbitrary admissible updates.
Defines the SP dimension, a new combinatorial complexity measure that governs distribution-free sample complexity for auditing statistical parity, and shows that the same framework extends to other objectives such as prediction error and robust risk.
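The paper's EPO oracle and SP-dimension analysis go well beyond a short sketch, but the quantity being audited, the empirical statistical-parity gap, is simple to state; an illustrative NumPy sketch with a hypothetical threshold-based model:

```python
import numpy as np

def statistical_parity_gap(preds, groups):
    """Empirical statistical-parity gap: difference in positive-prediction
    rates between two demographic groups (coded 0/1)."""
    preds, groups = np.asarray(preds), np.asarray(groups)
    rate0 = preds[groups == 0].mean()
    rate1 = preds[groups == 1].mean()
    return abs(rate0 - rate1)

def audit(model, sample_x, sample_groups, tolerance=0.1):
    """PAC-style spot check on a fresh sample: flag the model if the
    empirical gap exceeds the tolerance."""
    preds = np.array([model(x) for x in sample_x])
    gap = statistical_parity_gap(preds, sample_groups)
    return gap <= tolerance, gap

# Toy model that always favors group 0 (positive inputs) -- maximal gap.
model = lambda x: 1 if x > 0 else 0
ok, gap = audit(model, [1, 2, 3, -1, -2, -3], [0, 0, 0, 1, 1, 1])
```

The framework's contribution is showing how few such labeled samples suffice for this estimate to remain valid even when the model owner adaptively swaps in admissible updated models between queries.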
The paper shows that standard point-wise confidence measures like self-consistency can be misleading, because answers that appear perfectly confident can quickly fail when the question is placed in slightly different or distracting contexts.
It introduces Neighbor-Consistency Belief (NCB), a structural metric that checks how consistently a model answers related, contextually perturbed versions of a question, and demonstrates that high-NCB items are more robust under a new cognitive stress-testing protocol.
The authors propose Structure-Aware Training (SAT), a training strategy that explicitly encourages context-invariant belief structures and empirically reduces brittle, long-tail knowledge errors by about 30%.
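The core idea behind a neighbor-consistency score can be sketched as the agreement rate of a model's answers over contextually perturbed variants of a question. The toy model and perturbations below are hypothetical, and this majority-agreement proxy is only a minimal interpretation of the paper's NCB metric:

```python
from collections import Counter

def neighbor_consistency(model, question, perturbations):
    """Fraction of answers (original + perturbed variants) that agree
    with the model's majority answer -- a proxy for neighbor consistency."""
    answers = [model(p(question)) for p in perturbations]
    answers.append(model(question))
    _, count = Counter(answers).most_common(1)[0]
    return count / len(answers)

# Toy "model": brittle substring matching, so a paraphrase breaks it.
def toy_model(q):
    return "Paris" if "capital of france" in q.lower() else "unsure"

perturbs = [
    lambda q: q + " Answer in one word.",
    lambda q: "Note: Berlin is in Germany. " + q,
    lambda q: q.replace("France", "the Republic of France"),
]
ncb = neighbor_consistency(toy_model, "What is the capital of France?", perturbs)
```

A point-wise confidence measure would score the unperturbed question as fully reliable here; the sub-1.0 consistency score exposes the brittleness that the stress-testing protocol targets.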
The paper systematically studies how different preference-tuning alignment objectives generalize when models are used in new domains, focusing on helpfulness in summarization and question-answering tasks.
It compares five popular alignment objectives together with multiple adaptation strategies, including target-domain supervised fine-tuning and pseudo-labeling, to understand their impact on both performance and response diversity under domain shift.
The authors find that alignment objectives differ markedly in how well they transfer to new domains, and show that adaptation methods based on pseudo-labeling can substantially reduce the performance degradation caused by domain shift.
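A pseudo-labeling adaptation step of the kind studied can be sketched generically: sample several responses per unlabeled target-domain prompt and keep the one a scoring model prefers as a pseudo-label for further tuning. The `generate` and `score` callables below are hypothetical stand-ins, not the paper's components:

```python
import itertools

def build_pseudo_labels(prompts, generate, score, k=4):
    """For each unlabeled target-domain prompt, sample k candidate
    responses and keep the scorer's favorite as a pseudo-label."""
    dataset = []
    for prompt in prompts:
        candidates = [generate(prompt) for _ in range(k)]
        best = max(candidates, key=lambda resp: score(prompt, resp))
        dataset.append((prompt, best))
    return dataset

# Deterministic toy generator/scorer for illustration only.
_ids = itertools.count()
generate = lambda p: f"{p}-cand{next(_ids) % 4}"
score = lambda p, r: int(r[-1])  # toy scorer: prefer later candidates
pseudo = build_pseudo_labels(["q1"], generate, score, k=4)
```

The resulting (prompt, best-response) pairs can then drive target-domain supervised fine-tuning without any human labels in the new domain.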
Identifies and analyzes the problem of exploration collapse in RL-based training of LLM reasoning, showing why traditional entropy-based exploration either leads to reward hacking (verbose but unhelpful outputs) or fails to overcome pre-training biases.
Introduces IIB-LPO, a latent policy optimization method that branches reasoning at high-uncertainty (high-entropy) states and uses the Information Bottleneck both to filter trajectories and as a self-reward, encouraging diverse yet concise and informative reasoning paths.
Demonstrates state-of-the-art performance on four mathematical reasoning benchmarks, improving accuracy by up to 5.3% and reasoning diversity by up to 7.4% compared to previous RLVR approaches.
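The branching criterion, forking reasoning at high-uncertainty states, can be illustrated by flagging steps whose next-token entropy exceeds a threshold. This sketch covers only branch-point selection; IIB-LPO's latent-space branching and Information Bottleneck filtering/self-reward are substantially more involved:

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def branch_points(step_distributions, threshold=1.0):
    """Indices of reasoning steps whose next-token entropy exceeds the
    threshold -- candidate states at which to branch alternative paths."""
    return [i for i, probs in enumerate(step_distributions)
            if entropy(probs) > threshold]

# Illustrative per-step distributions along one reasoning trajectory.
dists = [
    [1.0],                     # fully determined step
    [0.5, 0.5],                # entropy ln 2 ~= 0.69
    [0.25, 0.25, 0.25, 0.25],  # entropy ln 4 ~= 1.39 -> branch here
    [0.9, 0.1],                # low entropy
]
points = branch_points(dists, threshold=1.0)
```

Branching only where the policy is genuinely uncertain is what lets the method diversify reasoning without the indiscriminate verbosity that plain entropy bonuses tend to reward.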
The paper studies how the order of training examples (a curriculum) affects preference optimization for machine translation, a factor that prior work largely ignored.
It introduces CLewR, a curriculum learning strategy with restarts that repeatedly goes from easy to hard examples to reduce catastrophic forgetting of easier cases during training.
The authors show that CLewR yields consistent translation quality improvements across multiple large language model families (Gemma2, Qwen2.5, Llama3.1) and various preference optimization methods, and they release their code publicly.
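The scheduling idea, repeated easy-to-hard passes so that easier cases are revisited rather than forgotten, can be sketched in a few lines. The difficulty measure and restart count here are placeholders; the paper's actual CLewR schedule may differ in both:

```python
def curriculum_with_restarts(examples, difficulty, restarts=3):
    """Order examples easy-to-hard by a difficulty score, then repeat the
    full easy-to-hard pass `restarts` times so early (easier) examples
    are revisited instead of being catastrophically forgotten."""
    ordered = sorted(examples, key=difficulty)
    schedule = []
    for _ in range(restarts):
        schedule.extend(ordered)
    return schedule

# Toy difficulty proxy: longer source string = harder example.
schedule = curriculum_with_restarts(["hard!", "ez", "mid."],
                                    difficulty=len, restarts=2)
```

Feeding preference-optimization updates in this order, rather than a single shuffled pass, is the mechanism the paper credits for retaining quality on easier translation cases.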