Publications

Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.

    Productionizing Quantum Mass Production
    Bill Huggins
    Nathan Wiebe
    arXiv for now (2026) (to appear)
    Preview abstract For many practical applications of quantum computing, the slowest and most costly steps involve coherently accessing classical data. We help address this challenge by applying mass production techniques, which can sometimes allow us to perform an operation many times in parallel for a cost comparable to a single execution [1-3]. We combine existing mass-production results with modern approaches for loading classical data using "quantum read-only memory." We show that quantum mass production techniques offer no benefit under a cost model that counts only non-Clifford gates. However, analyzing the constant factors in a more nuanced cost model, we find that it may be possible to reduce costs by an order of magnitude or more for a variety of reasonably sized fault-tolerant quantum algorithms. We present several applications of quantum mass-production techniques beyond naive parallelization, including a strategy for reducing the cost of serial calls to the same data-loading step. View details
    FreshBrew: A Benchmark for Evaluating AI Agents on Java Code Migration
    Diganta Misra
    Yanqi Luo
    Anjali Sridhar
    Justine Gehring
    Silvio Soares Ribeiro Junior
    2026
    Preview abstract AI coding assistants are rapidly becoming integral to modern software development. A key challenge in this space is the continual need to migrate and modernize codebases in response to evolving software ecosystems. Traditionally, such migrations have relied on rule-based systems and human intervention. With the advent of powerful large language models (LLMs), AI-driven agentic frameworks offer a promising alternative, but their effectiveness remains underexplored. In this paper, we introduce FreshBrew, a novel benchmark for evaluating AI-based agentic frameworks on project-level Java migrations. We benchmark several such frameworks, powered by state-of-the-art LLMs, and compare their performance against established rule-based tools. Our evaluation of AI agents on this benchmark of 228 repositories shows that the top-performing model, Gemini 2.5 Flash, can successfully migrate 56.5% of projects to JDK 17. Our empirical analysis reveals the critical strengths and limitations of current agentic approaches, offering actionable insights into their real-world applicability. By releasing FreshBrew publicly upon acceptance, we aim to facilitate rigorous, reproducible evaluation and catalyze progress in AI-driven codebase modernization. View details
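
To make the evaluation protocol concrete, here is a minimal sketch of how a harness of this kind might drive an agent over a benchmark of repositories and count successful migrations. The `run_agent` callable and the Maven invocation are hypothetical stand-ins, not FreshBrew's actual interface.

import subprocess
from pathlib import Path

def migrates_cleanly(repo: Path, jdk: str = "17") -> bool:
    """Return True if the repository builds and its tests pass on the target JDK."""
    result = subprocess.run(
        ["mvn", "-q", "test", f"-Dmaven.compiler.release={jdk}"],
        cwd=repo, capture_output=True, text=True)
    return result.returncode == 0

def evaluate(repos: list[Path], run_agent) -> float:
    """Fraction of repositories the agent migrates successfully."""
    successes = 0
    for repo in repos:
        run_agent(repo)               # the agent edits the checkout in place
        if migrates_cleanly(repo):
            successes += 1
    return successes / len(repos)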
    Dynamical-generative downscaling of climate model ensembles
    Tapio Schneider
    John Anderson
    Fei Sha
    Proceedings of the National Academy of Sciences, 122 (2025), e2420288122
    Preview abstract Regional high-resolution climate projections are crucial for many applications, such as agriculture, hydrology, and natural hazard risk assessment. Dynamical downscaling, the state-of-the-art method to produce localized future climate information, involves running a regional climate model (RCM) driven by an Earth System Model (ESM), but it is too computationally expensive to apply to large climate projection ensembles. We propose an approach combining dynamical downscaling with generative AI to reduce the cost and improve the uncertainty estimates of downscaled climate projections. In our framework, an RCM dynamically downscales ESM output to an intermediate resolution, followed by a generative diffusion model that further refines the resolution to the target scale. This approach leverages the generalizability of physics-based models and the sampling efficiency of diffusion models, enabling the downscaling of large multimodel ensembles. We evaluate our method against dynamically downscaled climate projections from the Coupled Model Intercomparison Project Phase 6 (CMIP6) ensemble. Our results demonstrate its ability to provide more accurate uncertainty bounds on future regional climate than alternatives such as dynamical downscaling of smaller ensembles, or traditional empirical statistical downscaling methods. We also show that dynamical-generative downscaling results in significantly lower errors than popular statistical downscaling techniques, and captures more accurately the spectra, tail dependence, and multivariate correlations of meteorological fields. These characteristics make the dynamical-generative framework a flexible, accurate, and efficient way to downscale large ensembles of climate projections, currently out of reach for pure dynamical downscaling. View details
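
As a rough, non-authoritative illustration of the two-stage framework the abstract describes, the sketch below chains a physics-based regional model with a generative refinement step; `run_rcm` and `diffusion_model` are hypothetical stand-ins for the authors' components.

import numpy as np

def downscale_ensemble(esm_members, run_rcm, diffusion_model, n_samples=8):
    """Downscale each ESM ensemble member in two stages."""
    out = []
    for member in esm_members:
        intermediate = run_rcm(member)      # physics-based, coarse -> intermediate grid
        # Several diffusion samples per member give an ensemble of
        # high-resolution fields for uncertainty estimation.
        samples = [diffusion_model.sample(intermediate) for _ in range(n_samples)]
        out.append(np.stack(samples))
    return np.stack(out)                    # (members, samples, ...field dims)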
    Linear-Time Multilevel Graph Partitioning via Edge Sparsification
    Peter Sanders
    Dominik Rosch
    Nikolai Maas
    Lars Gottesbüren
    Daniel Seemaier
    2025
    Preview abstract The current landscape of balanced graph partitioning is divided into high-quality but expensive multilevel algorithms and cheaper approaches with linear running time, such as single-level algorithms and streaming algorithms. We demonstrate how to achieve the best of both worlds with a linear time multilevel algorithm. Multilevel algorithms construct a hierarchy of increasingly smaller graphs by repeatedly contracting clusters of nodes. Our approach preserves their distinct advantage, allowing refinement of the partition over multiple levels with increasing detail. At the same time, we use edge sparsification to guarantee geometric size reduction between the levels and thus linear running time. We provide a proof of the linear running time as well as additional insights into the behavior of multilevel algorithms, showing that graphs with low modularity are most likely to trigger worst-case running time. We evaluate multiple approaches for edge sparsification and integrate our algorithm into the state-of-the-art multilevel partitioner KaMinPar, maintaining its excellent parallel scalability. As demonstrated in detailed experiments, this results in a 1.49x average speedup (up to 4x for some instances) with only 1% loss in solution quality. Moreover, our algorithm clearly outperforms state-of-the-art single-level and streaming approaches. View details
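
A toy sketch of the core idea, under the assumption that sparsification keeps the heaviest edges (the paper evaluates several rules); this is not KaMinPar's implementation. Enforcing a geometric reduction in edge count at every level bounds total work by a geometric series, hence linear running time.

def sparsify(edges, target):
    """Keep the `target` heaviest edges; one simple sparsification rule."""
    return sorted(edges, key=lambda e: e[2], reverse=True)[:target]

def build_hierarchy(graph, contract, shrink=0.6, min_nodes=64):
    """graph = (nodes, edges) with edges as (u, v, weight) triples.
    `contract` is a hypothetical clustering/contraction step."""
    levels = [graph]
    while len(levels[-1][0]) > min_nodes:
        nodes, edges = contract(levels[-1])
        budget = int(shrink * len(levels[-1][1]))
        if len(edges) > budget:          # enforce geometric size reduction
            edges = sparsify(edges, budget)
        levels.append((nodes, edges))
    return levels                        # refine a partition back down the levels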
    Global earthquake detection and warning using Android phones
    Marc Stogaitis
    Youngmin Cho
    Richard Allen
    Boone Spooner
    Patrick Robertson
    Micah Berman
    Greg Wimpey
    Robert Bosch
    Nivetha Thiruverahan
    Steve Malkos
    Alexei Barski
    Science, 389 (2025), pp. 254-259
    Preview abstract Earthquake early-warning systems are increasingly being deployed as a strategy to reduce losses in earthquakes, but the regional seismic networks they require do not exist in many earthquake-prone countries. We use the global Android smartphone network to develop an earthquake detection capability, an alert delivery system, and a user feedback framework. Over 3 years of operation, the system detected an average of 312 earthquakes per month with magnitudes from M 1.9 to M 7.8 in Türkiye. Alerts were delivered in 98 countries for earthquakes with M ≥4.5, corresponding to ~60 events and 18 million alerts per month. User feedback shows that 85% of people receiving an alert felt shaking, and 36, 28, and 23% received the alert before, during, and after shaking, respectively. We show how smartphone-based earthquake detection algorithms can be implemented at scale and improved through postevent analysis. View details
    Speculative RAG: Enhancing Retrieval Augmented Generation through Drafting
    Zilong Wang
    Steven Zheng
    Swaroop Mishra
    Yuwei Zhang
    Anush Mattapalli
    Ankur Taly
    Jingbo Shang
    ICLR 2025
    Preview abstract Retrieval augmented generation (RAG) has attracted significant attention across both academia and industry for its ability to insert timely and accurate evidence into the generation of large language models. However, the retrieved evidence considerably lengthens the input prompt, which degrades the comprehension of large language models and slows them down in practical usage scenarios. To address these issues, we propose Speculative RAG, which leverages a smaller LLM to conduct retrieval augmented generation on behalf of a larger LLM. The smaller LLM digests a few pieces of evidence at a time and rapidly generates multiple drafts in parallel; these drafts are then verified by the large LLM to guarantee quality. This yields both higher speed and better quality in the RAG results. View details
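
A minimal sketch of the draft-then-verify pattern described above; `small_llm` and `large_llm` are hypothetical callables, and the prompts and scoring scheme are illustrative rather than the paper's exact recipe.

from concurrent.futures import ThreadPoolExecutor

def speculative_rag(query, evidence, small_llm, large_llm, n_drafts=4):
    # Each draft sees only a slice of the evidence, keeping prompts short.
    subsets = [evidence[i::n_drafts] for i in range(n_drafts)]
    with ThreadPoolExecutor() as pool:      # drafts are generated in parallel
        drafts = list(pool.map(
            lambda docs: small_llm(f"Answer {query!r} using: {docs}"), subsets))
    # The large LLM only verifies, so it never ingests the full evidence set;
    # we assume here that it returns a numeric quality score.
    scores = [float(large_llm(f"Score 0-10: does this answer {query!r}? {d}"))
              for d in drafts]
    return drafts[scores.index(max(scores))]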
    REGEN: A Dataset and Benchmarks with Natural Language Critiques and Narratives
    Kun Su
    Krishna Sayana
    Hubert Pham
    James Pine
    Yuri Vasilevski
    Raghavendra Vasudeva
    Liam Hebert
    Ambarish Jash
    Anushya Subbiah
    Sukhdeep Sodhi
    (2025)
    Preview abstract This paper introduces a novel dataset REGEN (Reviews Enhanced with GEnerative Narratives), designed to benchmark the conversational capabilities of recommender Large Language Models (LLMs), addressing the limitations of existing datasets that primarily focus on sequential item prediction. REGEN extends the Amazon Product Reviews dataset by inpainting two key natural language features: (1) user critiques, representing user "steering" queries that lead to the selection of a subsequent item, and (2) narratives, rich textual outputs associated with each recommended item taking into account prior context. The narratives include product endorsements, purchase explanations, and summaries of user preferences. Further, we establish an end-to-end modeling benchmark for the task of conversational recommendation, where models are trained to generate both recommendations and corresponding narratives conditioned on user history (items and critiques). For this joint task, we introduce a modeling framework LUMEN (LLM-based Unified Multi-task Model with Critiques, Recommendations, and Narratives) which uses an LLM as a backbone for critiquing, retrieval and generation. We also evaluate the dataset's quality using standard auto-rating techniques and benchmark it by training both traditional and LLM-based recommender models. Our results demonstrate that incorporating critiques enhances recommendation quality by enabling the recommender to learn language understanding and integrate it with recommendation signals. Furthermore, LLMs trained on our dataset effectively generate both recommendations and contextual narratives, achieving performance comparable to state-of-the-art recommenders and language models. View details
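
For illustration only, a REGEN-style record might pair an interaction history with the two inpainted features the abstract describes; the field names and values below are hypothetical, not the released schema.

regen_record = {
    "user_history": ["B00X4WHP5E", "B01N5IB20Q"],   # prior item ids
    "critique": "Something similar, but with longer battery life.",
    "next_item": "B07FZ8S74R",
    "narrative": ("Given your preference for compact speakers, this model "
                  "keeps the same form factor while doubling battery life."),
}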
    Deep Researcher with Test-time Diffusion
    Guan Sun
    Zoey CuiZhu
    Yuanjun (Sophia) Bi
    Weiming Wen
    Hui Wan
    Chunfeng Wen
    Solène Maître
    George Lee
    Vishy Tirumalashetty
    Emily Xue
    Burak Gokturk
    2025
    Preview abstract Deep research agents, powered by Large Language Models (LLMs), are rapidly advancing; yet, their performance often plateaus when generating complex, long-form research reports using generic test-time scaling algorithms. Drawing inspiration from the iterative nature of human research, which involves cycles of searching, reasoning, and revision, we propose the Test-Time Diffusion Deep Researcher (TTD-DR). This novel framework conceptualizes research report generation as a diffusion process. TTD-DR initiates this process with a preliminary draft, an updatable skeleton that serves as an evolving foundation to guide the research direction. The draft is then iteratively refined through a "denoising" process, which is dynamically informed by a retrieval mechanism that incorporates external information at each step. The core process is further enhanced by a self-evolutionary algorithm applied to each component of the agentic workflow, ensuring the generation of high-quality context for the diffusion process. This draft-centric design guides the report writing process to be more timely and coherent while reducing information loss during the iterative search process. We demonstrate that our TTD-DR achieves state-of-the-art results on a wide array of benchmarks that require intensive search and multi-hop reasoning, significantly outperforming existing deep research agents. View details
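
A minimal sketch of the draft-as-diffusion loop described above, with hypothetical `llm` and `search` callables standing in for the agent's components.

def ttd_dr(question, llm, search, steps=5):
    """Sketch of report generation as iterative 'denoising' of a draft."""
    draft = llm(f"Write a preliminary draft answering: {question}")
    for _ in range(steps):
        gap = llm(f"Identify the weakest or least supported claim:\n{draft}")
        evidence = search(gap)               # retrieval informs each revision
        draft = llm("Revise the draft using this evidence.\n"
                    f"Draft:\n{draft}\nEvidence:\n{evidence}")
    return draft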
    Preview abstract Electrocardiograms (ECGs) are fundamental to cardiac diagnostics, providing noninvasive insights into cardiovascular conditions. Recent advancements in deep learning have led to foundation models (FMs) capable of learning powerful representations of ECG signals. However, these models often fail to fully exploit the periodic nature and diagnostic frequency bands of ECGs, leading to inefficiencies in computational cost and interpretability. We propose a novel ECG foundation model that learns nested embeddings, where each subset of dimensions encodes progressively higher-frequency information. By explicitly modeling frequency structures and applying a correlation penalty, the method achieves compact, high-rank representations that reduce model size without sacrificing performance. We evaluate our approach on two large-scale datasets for embedding redundancy and prediction performance on downstream clinical tasks such as arrhythmia classification and cardiac condition detection. We observe similar prediction performance (AUROC scores) with lower embedding redundancy, offering a computationally efficient and interpretable framework for ECG analysis. Finally, the representations obtained from our model in UK Biobank data capture known cardiovascular variants and detect novel loci, which can be applied to drug discovery. View details
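
The nested-embedding idea can be illustrated in a few lines: any prefix of the learned vector is itself a usable, coarser representation. The dimension split per frequency band below is an assumption for illustration, not the model's actual layout.

import numpy as np

embedding = np.random.randn(256)   # stand-in for a full ECG embedding
coarse = embedding[:64]            # low-frequency content only (assumed split)
medium = embedding[:128]           # adds mid-frequency detail
full = embedding                   # all diagnostic frequency bands
# A downstream task can use the shortest prefix that meets its accuracy
# target, trading compute for fidelity without retraining the encoder.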
    Scaling Large Language Models For Next-Generation Single-Cell Analysis
    Syed Asad Rizvi
    Daniel Levine
    Aakash Patel
    Shiyang Zhang
    Eric Wang
    Curtis Jamison Perry
    Nicole Mayerli Constante
    Sizhuang He
    David Zhang
    Cerise Tang
    Zhuoyang Lyu
    Rayyan Darji
    Chang Li
    Emily Sun
    David Jeong
    Lawrence Zhao
    Jennifer Kwan
    David Braun
    Brian Hafler
    Hattie Chung
    Rahul M. Dhodapkar
    Paul Jaeger
    Jeffrey Ishizuka
    David van Dijk
    bioRxiv (2025)
    Preview abstract Single-cell RNA sequencing has transformed our understanding of cellular diversity, yet current single-cell foundation models (scFMs) remain limited in their scalability, flexibility across diverse tasks, and ability to natively integrate textual information. In this work, we build upon the Cell2Sentence (C2S) framework, which represents scRNA-seq profiles as textual “cell sentences,” to train Large Language Models (LLMs) on a corpus comprising over one billion tokens of transcriptomic data, biological text, and metadata. Scaling the model to 27 billion parameters yields consistent improvements in predictive and generative capabilities and supports advanced downstream tasks that require synthesis of information across multi-cellular contexts. Targeted fine-tuning with modern reinforcement learning techniques produces strong performance in perturbation response prediction, natural language interpretation, and complex biological reasoning. This predictive strength directly enabled a dual-context virtual screen that uncovered a striking context split for the kinase inhibitor silmitasertib (CX-4945), suggesting its potential as a synergistic, interferon-conditional amplifier of antigen presentation. Experimental validation in human cell models unseen during training confirmed this hypothesis, demonstrating that C2S-Scale can generate biologically grounded, testable discoveries of context-conditioned biology. C2S-Scale unifies transcriptomic and textual data at unprecedented scales, surpassing both specialized single-cell models and general-purpose LLMs to provide a platform for next-generation single-cell analysis and the development of “virtual cells.” View details
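
The underlying Cell2Sentence transformation is simple to sketch: rank genes by expression and emit their names as text, which a standard LLM can then consume. Gene names and counts below are illustrative.

def cell_to_sentence(expression: dict[str, float], top_k: int = 100) -> str:
    """Turn an scRNA-seq profile into a 'cell sentence' of rank-ordered genes."""
    ranked = sorted(expression, key=expression.get, reverse=True)
    return " ".join(ranked[:top_k])

profile = {"MALAT1": 182.0, "CD3D": 41.0, "IL7R": 17.0, "GNLY": 0.0}
print(cell_to_sentence(profile, top_k=3))   # -> "MALAT1 CD3D IL7R"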
    Matryoshka Model Learning for Improved Elastic Student Models
    Cho-Jui Hsieh
    Chetan Verma
    Inderjit Dhillon
    Xin Liu
    Wen Chen
    Ngot Bui
    Yang Zhang
    2025
    Preview abstract Production machine learning models in the industry are often developed with a primary focus on maximizing model quality. However, these models must ultimately operate within the resource constraints of their serving infrastructure, including limitations on compute, memory and bandwidth. The rapid evolution of serving hardware, particularly with advancements in accelerator technology, necessitates periodic retraining to leverage newer, more efficient infrastructure. This cyclical retraining process is resource-intensive, demanding significant model development time and incurring substantial training costs. This challenge is further amplified by the trend towards increasingly complex models, which inherently require greater computational resources for training and deployment. While prior work has explored techniques like supernet sub-model extraction to address training efficiency, a critical gap remains: the efficient generation of a spectrum of high-quality models from an existing production model, a common requirement in diverse industrial applications. To bridge this gap, we introduce a novel approach leveraging a "Teaching Assistant" (TA) model, derived from a given production model (referred to as the Student model). We demonstrate that through co-training the Student and TA models with Matryoshka structure while using online distillation, we not only enhance the Student model's performance but also enable the flexible creation of a model family offering a compelling trade-off between model quality and model size. View details
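
A schematic of the co-training loop in PyTorch-style code; the loss weighting and distillation recipe are assumptions for illustration, not the paper's tuned setup, and the Matryoshka slicing of shared parameters is omitted for brevity.

import torch
import torch.nn.functional as F

def co_train_step(student, ta, x, y, optimizer, alpha=0.5):
    """One online-distillation step: both models train on the task while the
    Student also matches the Teaching Assistant's current predictions."""
    s_logits = student(x)
    t_logits = ta(x)
    task_loss = F.cross_entropy(s_logits, y) + F.cross_entropy(t_logits, y)
    distill = F.kl_div(F.log_softmax(s_logits, dim=-1),
                       F.softmax(t_logits.detach(), dim=-1),
                       reduction="batchmean")
    loss = task_loss + alpha * distill
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()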
    On Design Principles for Private Adaptive Optimizers
    Abhradeep Guha Thakurta
    Arun Ganesh
    Privacy-Preserving Machine Learning Workshop 2025 (2025) (to appear)
    Preview abstract The spherical noise added to gradients in differentially private (DP) training undermines the performance of adaptive optimizers like AdaGrad and Adam, and hence many recent works have proposed algorithms to address this challenge. However, the empirical results in these works focus on simple tasks and models, and the conclusions may not generalize to model training in practice. In this paper we survey several of these variants, develop better theoretical intuition for them, and perform empirical studies comparing them. We find that the common intuition of aiming for unbiased estimates of the second moments of gradients in adaptive optimizers is misguided, and that a simple technique called scale-then-privatize (which does not achieve unbiased second moments) has more desirable theoretical behavior and outperforms all other variants we study on a small-scale language-model training task. We additionally argue that scale-then-privatize causes the noise addition to better match the application of correlated noise mechanisms, which are more desirable to use in practice. View details
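
One possible reading of scale-then-privatize, sketched under the assumption that the averaged clipped gradient is rescaled before noise is added and the result is handed to an unmodified adaptive optimizer; the constants are illustrative and `adam_update` is a hypothetical vanilla Adam step passed in by the caller.

import numpy as np

def dp_adaptive_step(per_example_grads, adam_update, adam_state,
                     clip=1.0, sigma=1.0, scale=0.1):
    clipped = [g * min(1.0, clip / (np.linalg.norm(g) + 1e-12))
               for g in per_example_grads]
    mean_grad = np.mean(clipped, axis=0)
    # Scale first, then privatize: the noise is scaled by the same constant,
    # so the privacy guarantee is unchanged, while the optimizer's
    # second-moment estimates (though biased) behave better.
    n = len(per_example_grads)
    noisy = scale * mean_grad + np.random.normal(
        0.0, scale * sigma * clip / n, size=mean_grad.shape)
    return adam_update(noisy, adam_state)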
    Preview abstract In the differentially private partition selection problem (a.k.a. private set union, private key discovery), users hold subsets of items from an unbounded universe. The goal is to output as many items as possible from the union of the users' sets while maintaining user-level differential privacy. Solutions to this problem are a core building block for many privacy-preserving ML applications including vocabulary extraction in a private corpus, computing statistics over categorical data and learning embeddings over user-provided items. We propose an algorithm for this problem, MaxAdaptiveDegree(MAD), which adaptively reroutes weight from items with weight far above the threshold needed for privacy to items with smaller weight, thereby increasing the probability that less frequent items are output. Our algorithm can be efficiently implemented in massively parallel computation systems allowing scalability to very large datasets. We prove that our algorithm stochastically dominates the standard parallel algorithm for this problem. We also develop a two-round version of our algorithm, MAD2R, where results of the computation in the first round are used to bias the weighting in the second round to maximize the number of items output. In experiments, our algorithms provide the best results across the board among parallel algorithms and scale to datasets with hundreds of billions of items, up to three orders of magnitude larger than those analyzed by prior sequential algorithms. View details
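
A toy sketch of the adaptive rerouting idea (not the massively parallel implementation): items far above the release threshold donate excess weight to lighter items before noising and thresholding. This version omits the sensitivity bookkeeping the real algorithm needs for its privacy guarantee.

import numpy as np

def mad_select(item_weights, threshold, sigma, margin=2.0):
    """item_weights: dict item -> total (clipped) user weight."""
    boosted = dict(item_weights)
    bar = threshold + margin * sigma     # keep donors safely above the threshold
    light = [i for i, w in item_weights.items() if w <= threshold]
    excess = 0.0
    for i, w in item_weights.items():
        if w > bar:
            excess += w - bar            # weight beyond the bar adds nothing
            boosted[i] = bar
    for i in light:                      # reroute excess to less frequent items
        boosted[i] += excess / max(len(light), 1)
    # Standard DP release: add noise, keep items that clear the threshold.
    return [i for i, w in boosted.items()
            if w + np.random.normal(0.0, sigma) > threshold]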
    Contextual Dynamic Pricing with Heterogeneous Buyers
    Thodoris Lykouris
    Sloan Nietert
    Princewill Okorafor
    Chara Podimata
    2025
    Preview abstract We initiate the study of contextual dynamic pricing with a heterogeneous population of buyers, where a seller repeatedly (over T rounds) posts prices that depend on an observable d-dimensional context and receives binary purchase feedback. Unlike prior work assuming homogeneous buyer types, in our setting the buyer's valuation type is drawn from an unknown distribution with finite support of size K*. We develop a contextual pricing algorithm based on Optimistic Posterior Sampling with regret of order K*·sqrt(dT), which we prove to be tight in d and T up to logarithmic terms. Finally, we refine our analysis for the non-contextual pricing case, proposing a variance-aware Zooming algorithm that achieves the optimal dependence on K*. View details
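
As a toy illustration of posterior-sampling-based pricing in the simplest non-contextual case with a known type grid; this is not the paper's Optimistic Posterior Sampling algorithm, and the posterior update below is deliberately crude.

import numpy as np

types = np.array([0.3, 0.6, 0.9])       # candidate valuations (K* = 3)
alpha = np.ones(len(types))             # Dirichlet posterior over the type mix
rng = np.random.default_rng(0)
true_mix = np.array([0.2, 0.5, 0.3])    # unknown to the seller

for t in range(1000):
    mix = rng.dirichlet(alpha)          # sample a type distribution
    # Post the price maximizing expected revenue under the sampled mix.
    revenue = [p * mix[types >= p].sum() for p in types]
    price = types[int(np.argmax(revenue))]
    buyer_val = rng.choice(types, p=true_mix)
    bought = buyer_val >= price         # binary purchase feedback
    # Crude update: credit the types consistent with the observation.
    mask = types >= price if bought else types < price
    alpha[mask] += 1.0 / max(mask.sum(), 1)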