Discover, Discuss, and Read arXiv papers

Discover new, recommended papers

21 May 2025
reinforcement-learning · vision-language-models · robotic-control

Researchers from Renmin University and Huawei Noah's Ark Lab develop GUI-G1, a framework that improves visual grounding in GUI agents through targeted reinforcement learning techniques, achieving state-of-the-art accuracy on ScreenSpot benchmarks while requiring only 17K training samples and generating fewer tokens than existing approaches.
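
The summary doesn't spell out GUI-G1's reward design, so the following is only a minimal sketch of reward-driven grounding: a toy policy proposes a click point, earns a binary reward for landing inside the target element's bounding box, and is updated REINFORCE-style. All names, the reward shape, and the Gaussian click head are assumptions, not the paper's method.

```python
# Toy sketch of RL-based GUI grounding (NOT GUI-G1's actual objective).
# Assumption: the agent outputs a click point; reward is 1 if the click
# lands inside the target element's bounding box, else 0.
import torch

torch.manual_seed(0)

# Tiny policy: maps a stand-in "screenshot" feature vector to a
# Gaussian over normalized (x, y) click coordinates.
policy = torch.nn.Linear(16, 2)
opt = torch.optim.Adam(policy.parameters(), lr=1e-2)

def reward(click, box):
    """Binary hit reward: 1.0 if the click falls inside box (x0, y0, x1, y1)."""
    x, y = click
    x0, y0, x1, y1 = box
    return 1.0 if (x0 <= x <= x1 and y0 <= y <= y1) else 0.0

target_box = (0.4, 0.4, 0.6, 0.6)             # hypothetical UI element
for step in range(500):
    obs = torch.randn(16)                     # stand-in for visual features
    mean = torch.sigmoid(policy(obs))         # predicted click location
    dist = torch.distributions.Normal(mean, 0.1)
    click = dist.sample()
    r = reward(click.tolist(), target_box)
    # REINFORCE: raise the log-prob of sampled clicks in proportion to reward.
    loss = -dist.log_prob(click).sum() * r
    opt.zero_grad(); loss.backward(); opt.step()
```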

20 May 2025
astro-ph.HE

We report the first high-purity identification of cosmic-ray (CR) protons and a precise measurement of their energy spectrum from 0.15 to 12 PeV using the Large High Altitude Air Shower Observatory (LHAASO). Abundant event statistics, combined with the simultaneous detection of electrons/photons, muons, and Cherenkov light in air showers, enable spectroscopic measurements with statistical and systematic accuracy comparable to satellite data at lower energies. The proton spectrum shows significant hardening relative to low-energy extrapolations, culminating at 3 PeV, followed by sharp softening. This distinct spectral structure - closely aligned with the knee in the all-particle spectrum - points to the emergence of a new CR component at PeV energies, likely linked to the dozens of PeVatrons recently discovered by LHAASO, and offers crucial clues to the origin of Galactic cosmic rays.
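
The described shape (hardening that culminates at 3 PeV, then sharp softening) is the kind of feature commonly modeled with a smoothly broken power law; the sketch below evaluates one over the reported 0.15-12 PeV range. The functional form is a standard convention, and every parameter value is a placeholder, not LHAASO's fit.

```python
# Illustrative smoothly broken power law of the kind commonly fit to
# cosmic-ray spectra; parameter values are placeholders, NOT the LHAASO result.
import numpy as np

def smoothly_broken_power_law(E, phi0, Eb, g1, g2, s):
    """Flux ~ E^-g1 below the break Eb, steepening to E^-g2 above it.

    E  : energy (PeV)
    Eb : break energy (PeV), e.g. near the ~3 PeV 'knee'
    g1 : spectral index below the break
    g2 : spectral index above the break (g2 > g1 means softening)
    s  : smoothness of the transition
    """
    return phi0 * E**(-g1) * (1.0 + (E / Eb)**s)**((g1 - g2) / s)

E = np.logspace(np.log10(0.15), np.log10(12.0), 50)   # 0.15-12 PeV range
flux = smoothly_broken_power_law(E, phi0=1.0, Eb=3.0, g1=2.6, g2=3.1, s=5.0)
```
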
19 May 2025
reinforcement-learning · reasoning · chain-of-thought

Researchers from the National University of Singapore introduce Thinkless, a framework that enables LLMs to adaptively choose between short and long-form reasoning based on task complexity, reducing unnecessary chain-of-thought reasoning by 50-90% across mathematical benchmarks while maintaining accuracy through a novel decoupled reinforcement learning algorithm.
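
The decoupled RL algorithm itself isn't detailed in this summary; the toy below only illustrates the routing idea, with a per-difficulty value estimate choosing between hypothetical <short> and <think> control tokens under a reward that trades accuracy against token cost. Everything here is assumed for illustration.

```python
# Toy sketch of adaptive short-vs-long reasoning routing (hypothetical
# tokens and reward; NOT the paper's decoupled RL algorithm).
import random

random.seed(0)
MODES = ["<short>", "<think>"]                         # assumed control tokens
pref = {(b, m): 0.0 for b in range(4) for m in MODES}  # value per difficulty bucket

def reward(difficulty, mode):
    """Stand-in outcome model: long reasoning is accurate everywhere but pays
    a larger token cost; short answers fail more often on hard tasks."""
    p_correct = 0.95 if mode == "<think>" else max(0.05, 1.0 - difficulty)
    correct = random.random() < p_correct
    return (1.0 if correct else 0.0) - (0.3 if mode == "<think>" else 0.05)

for _ in range(5000):
    d = random.random()
    b = min(3, int(d * 4))                    # coarse task-complexity signal
    mode = (random.choice(MODES) if random.random() < 0.1
            else max(MODES, key=lambda m: pref[(b, m)]))
    pref[(b, mode)] += 0.05 * (reward(d, mode) - pref[(b, mode)])

# Easy buckets converge to <short>, hard buckets to <think>.
```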

21 May 2025
generative-models · multi-modal-learning · chain-of-thought

Princeton, Peking University, and ByteDance researchers introduce MMaDA, a unified multimodal diffusion framework that handles text reasoning, image generation, and multimodal understanding through a shared architecture and novel post-training methodology, achieving competitive performance against specialized models while enabling cross-task generalization without additional fine-tuning.
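
MMaDA's specific formulation isn't given in this blurb; as a generic illustration of token-level diffusion over one shared sequence, the sketch below runs a masked discrete-diffusion sampling loop that iteratively unmasks the most confident positions. The tiny denoiser and all hyperparameters are stand-ins, not MMaDA's architecture.

```python
# Minimal masked discrete-diffusion sampling loop over a shared token
# sequence (generic mechanism; all names are illustrative, not MMaDA's API).
import torch

torch.manual_seed(0)
VOCAB, MASK, LEN, STEPS = 100, 0, 16, 8

class ToyDenoiser(torch.nn.Module):
    """Predicts vocabulary logits for every position of the shared sequence."""
    def __init__(self):
        super().__init__()
        self.emb = torch.nn.Embedding(VOCAB, 32)
        self.head = torch.nn.Linear(32, VOCAB)
    def forward(self, tokens):
        return self.head(self.emb(tokens))

model = ToyDenoiser()
tokens = torch.full((LEN,), MASK)                  # start fully masked
for step in range(STEPS):
    with torch.no_grad():
        probs = model(tokens).softmax(-1)
    conf, pred = probs.max(-1)
    masked = tokens == MASK
    if not masked.any():
        break
    # Unmask the most confident still-masked positions this step.
    k = max(1, int(masked.sum()) // (STEPS - step))
    idx = torch.where(masked)[0]
    keep = idx[conf[idx].topk(min(k, len(idx))).indices]
    tokens[keep] = pred[keep]
```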

14 May 2025
transformers · hardware-aware-algorithms · model-compression

DeepSeek-AI researchers present insights from developing DeepSeek-V3, documenting specific hardware constraints and architectural solutions that enable efficient large language model training through innovations in mixed-precision computation, network topology optimization, and memory management while achieving competitive performance with significantly reduced hardware requirements.
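
As a generic illustration of the mixed-precision idea mentioned above (not DeepSeek-V3's actual FP8 pipeline), the sketch below quantizes each matrix to int8 with per-row scales, multiplies on the integer grid, and accumulates and rescales in FP32.

```python
# Generic per-row-scaled low-precision matmul sketch (illustrative only;
# not DeepSeek-V3's actual FP8 recipe).
import numpy as np

rng = np.random.default_rng(0)

def quantize_rows(x, bits=8):
    """Symmetric per-row quantization: int grid plus one FP32 scale per row."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(x).max(axis=1, keepdims=True) / qmax
    q = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int8)
    return q, scale.astype(np.float32)

A = rng.standard_normal((4, 64)).astype(np.float32)
B = rng.standard_normal((64, 8)).astype(np.float32)

qa, sa = quantize_rows(A)
qb, sb = quantize_rows(B.T)                 # per-column scales for B
# Integer matmul with FP32 accumulation, then dequantize with the scales.
acc = qa.astype(np.int32) @ qb.T.astype(np.int32)
C = acc.astype(np.float32) * sa * sb.T

print(np.max(np.abs(C - A @ B)))            # small quantization error
```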

16 May 2025
reinforcement-learning · robotics-perception · visual-reasoning

Researchers from Cambridge and UCL introduce Visual Planning, a framework enabling large vision models to reason directly through image sequences without textual mediation, demonstrating superior performance on navigation tasks compared to language-based approaches while reducing the modality gap in multimodal reasoning.
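
The model and task setup aren't specified in this summary; the toy below only conveys the "plan in state space, not in text" framing, searching over grid states that stand in for image frames. The grid, obstacles, and BFS planner are all illustrative assumptions, not the paper's method.

```python
# Toy "plan over states, not text" sketch: breadth-first search over grid
# states standing in for image frames (illustrative, not the paper's model).
from collections import deque

GRID, START, GOAL = 5, (0, 0), (4, 4)
WALLS = {(1, 1), (2, 1), (3, 1), (1, 3), (2, 3), (3, 3)}   # hypothetical obstacles

def neighbors(state):
    x, y = state
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < GRID and 0 <= ny < GRID and (nx, ny) not in WALLS:
            yield (nx, ny)

def plan(start, goal):
    """Returns the sequence of intermediate states (the 'image rollout')."""
    frontier, parent = deque([start]), {start: None}
    while frontier:
        s = frontier.popleft()
        if s == goal:
            path = []
            while s is not None:
                path.append(s); s = parent[s]
            return path[::-1]
        for n in neighbors(s):
            if n not in parent:
                parent[n] = s
                frontier.append(n)
    return None

print(plan(START, GOAL))
```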

21 May 2025
agent-based-systems · multi-agent-learning · time-series-analysis

Microsoft Research Asia and collaborating institutions introduce R&D-Agent(Q), a data-centric multi-agent framework that automates quantitative investment strategy development through coordinated factor-model optimization, achieving 2x higher annualized returns than classical factor libraries while using 70% fewer factors and operating at minimal cost.
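
The agents' actual coordination protocol isn't described here; the toy loop below sketches one plausible reading of factor-model co-optimization, where a proposal step generates candidate factors and an evaluation gate keeps only those that raise rank IC on synthetic data. All names and data are hypothetical.

```python
# Toy factor-search loop: propose random candidate factors and keep each one
# only if it raises out-of-sample rank IC (synthetic data; illustrative only).
import numpy as np

rng = np.random.default_rng(0)
N = 2000
signal = rng.standard_normal(N)
returns = 0.1 * signal + rng.standard_normal(N)      # synthetic next-period returns

def rank_ic(factor, rets):
    """Spearman-style information coefficient via rank correlation."""
    rf = factor.argsort().argsort().astype(float)
    rr = rets.argsort().argsort().astype(float)
    return np.corrcoef(rf, rr)[0, 1]

kept, best = [], -np.inf
for trial in range(50):
    # 'Proposal agent' stand-in: noisy transforms of available data.
    candidate = signal * rng.uniform(0.1, 1.0) + rng.standard_normal(N)
    combined = candidate if not kept else np.mean(kept + [candidate], axis=0)
    score = rank_ic(combined, returns)
    if score > best:                                  # 'evaluation agent' gate
        best = score
        kept.append(candidate)

print(f"kept {len(kept)} factors, IC={best:.3f}")
```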

20 May 2025
generative-models · multi-modal-learning · transformers

ByteDance researchers introduce BAGEL, an open-source multimodal AI model that combines understanding and generation capabilities through carefully structured interleaved data and a Mixture-of-Transformer-Experts architecture, achieving competitive performance with proprietary systems while demonstrating emergent abilities in complex visual manipulation and world navigation.
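
Mixture-of-Transformer-Experts is named but not specified in this summary; the sketch below shows one common structural reading, routing text and image tokens through separate FFN experts inside an otherwise shared block. This illustrates modality routing in general, not BAGEL's architecture.

```python
# Minimal modality-routed expert sketch: text vs. image tokens go through
# separate FFN experts (structural illustration only; not BAGEL's design).
import torch

torch.manual_seed(0)
D = 32

class ModalityExperts(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.text_ffn = torch.nn.Sequential(
            torch.nn.Linear(D, 4 * D), torch.nn.GELU(), torch.nn.Linear(4 * D, D))
        self.image_ffn = torch.nn.Sequential(
            torch.nn.Linear(D, 4 * D), torch.nn.GELU(), torch.nn.Linear(4 * D, D))

    def forward(self, h, is_image):
        # Route each token to the expert matching its modality tag.
        out = torch.empty_like(h)
        out[~is_image] = self.text_ffn(h[~is_image])
        out[is_image] = self.image_ffn(h[is_image])
        return out

h = torch.randn(10, D)                       # 10 interleaved tokens
is_image = torch.tensor([0, 0, 1, 1, 1, 0, 1, 1, 0, 0], dtype=torch.bool)
print(ModalityExperts()(h, is_image).shape)  # torch.Size([10, 32])
```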

20 May 2025
multi-modal-learning · transformers · visual-reasoning

USC researchers demonstrate that textual steering vectors extracted from language model backbones can improve visual understanding in multimodal LLMs, enabling better performance on spatial reasoning and counting tasks while requiring no additional training data or model modifications.
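
A common recipe for steering vectors, which may differ from the paper's, is to take the mean activation difference between two contrastive prompt sets and add a scaled copy to hidden states at inference; the sketch below does this with a forward hook on a stand-in layer. The layer choice and scale alpha are assumptions.

```python
# Minimal steering-vector sketch: mean activation difference between two
# prompt sets, injected via a forward hook (generic recipe; the layer and
# scale alpha are assumptions, not the paper's settings).
import torch

torch.manual_seed(0)
layer = torch.nn.Linear(64, 64)              # stand-in for one transformer layer

def acts(prompt_feats):
    with torch.no_grad():
        return layer(prompt_feats)

pos = torch.randn(8, 64) + 0.5               # features of prompts WITH the property
neg = torch.randn(8, 64) - 0.5               # features of prompts WITHOUT it
steer = acts(pos).mean(0) - acts(neg).mean(0)   # the steering vector

alpha = 4.0
hook = layer.register_forward_hook(lambda m, i, o: o + alpha * steer)
steered = layer(torch.randn(1, 64))          # activations now shifted along 'steer'
hook.remove()
```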

15 May 2025
agents · agentic-frameworks · vision-language-models

A comprehensive taxonomy establishes clear distinctions between AI Agents (autonomous task-specific entities) and Agentic AI (orchestrated multi-agent systems), mapping their architectural differences, capabilities, and limitations while providing structured frameworks for system design and evaluation across domains like robotics, healthcare, and research automation.
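
Purely as an illustration of the distinction (not the survey's formal taxonomy), the sketch below types an AI Agent as a self-contained task loop and Agentic AI as an orchestrator that fans a goal out across several agents.

```python
# Illustrative type sketch of the distinction (not the survey's taxonomy):
# an AI Agent is a self-contained task loop; Agentic AI orchestrates many agents.
from dataclasses import dataclass, field

@dataclass
class AIAgent:
    """Autonomous, task-specific entity: one goal, its own tools and loop."""
    name: str
    task: str
    def run(self, inp: str) -> str:
        return f"[{self.name}] handled '{self.task}' on: {inp}"

@dataclass
class AgenticAI:
    """Orchestrated multi-agent system: decomposes goals across agents."""
    agents: list[AIAgent] = field(default_factory=list)
    def run(self, goal: str) -> list[str]:
        # A real orchestrator would plan, route, and reconcile; this just fans out.
        return [a.run(goal) for a in self.agents]

system = AgenticAI([AIAgent("retriever", "literature search"),
                    AIAgent("writer", "report drafting")])
print(system.run("survey recent multimodal papers"))
```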