Abstract
In the evolving landscape of scientific research, the complexity of global challenges demands innovative approaches to experimental planning and execution. Self-Driving Laboratories (SDLs) automate experimental tasks in chemical and materials sciences and the design and selection of experiments to optimize research processes and reduce material usage. This perspective explores improving access to SDLs via centralized facilities and distributed networks. We discuss the technical and collaborative challenges in realizing SDLs' potential to enhance human-machine and human-human collaboration, ultimately fostering a more inclusive research community and facilitating previously untenable research projects.
Introduction
The execution and planning of experiments have become increasingly rigorous and automated in response to the growing complexity of world problems. Experimental planning has gradually evolved from random to statistically driven design of experiments; meanwhile, experimental tools have advanced from simple facilitators of manual actions to highly automated platforms. As the complexity, intersectionality, and scope of challenges in energy1, medicine2,3, ecological harm reduction, and nonrenewable-resource management increase, laboratory research must again leap forward: from individualized research to massively collaborative efforts that incorporate diverse expertise and techniques. To bring experimentation into the hands of a broad and diverse community of scientists, however, a certain level of automation, throughput, and accessible design must be achieved.
Self-driving (also known as autonomous) laboratories (SDLs) are the result of technological efforts to automate the execution of experimental tasks to meet the demands of industry and academia, the design and selection of experiments to minimize the material and temporal costs of research, the refinement and generation of hypotheses to discover new relationships and knowledge, and the collaboration of multiple research groups to accelerate research4,5. An SDL typically comprises a suite of digital tools to make predictions, propose experiments, and update beliefs between experimental campaigns and a suite of automated hardware to carry out experiments in the physical world (Fig. 1); these two components then work jointly toward a human-defined objective (e.g., process or material property optimization, compound or property-set discovery, self-improvement, and combinations thereof). The primary differences between established high-throughput/cloud laboratories and SDLs lie in the judicious selection of experiments6,7, the adaptation of experimental methods8, and the development of workflows that can integrate the operation of multiple tools. This automation of experimental design provides the leverage for expert and literature knowledge to efficiently tackle the increasingly incomprehensible, multivariate design spaces required by modern problems9. Adaptability allows for SDLs to develop new techniques to handle new applications, expand the feasible experimental design space, and modify workflows on-the-fly to address and preclude safety and sustainability concerns10. Furthermore, the integration of tools, learning modules, and data enables SDLs to accumulate knowledge and continually improve.
Charged with a human-defined goal, an SDL automatically performs iterations of scientific inquiry by designing and planning experiments, formulating precursors from a local material library, performing a reaction or assembly, preparing samples for characterization or subsequent reaction/assembly steps, performing characterization and analysis, contributing to the broader literature or providing its solution to the goal, learning from the results, and finally refining its design of experiments for the next iteration.
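The iteration described above can be illustrated with a deliberately small sketch. Everything in it is an assumption made for illustration: `run_experiment` is a made-up noisy response standing in for automated hardware, and `propose` is a naive local-search planner standing in for the far more capable experiment-selection methods used in real SDLs.

```python
import random

def run_experiment(x, rng):
    """Stand-in for the automated hardware: a hypothetical noisy response
    that peaks at x = 0.7 (an assumed, made-up optimum)."""
    return -(x - 0.7) ** 2 + rng.uniform(-0.01, 0.01)

def propose(history, rng, width=0.2):
    """Stand-in planner: perturb the best condition observed so far."""
    best_x, _ = max(history, key=lambda record: record[1])
    candidate = best_x + rng.uniform(-width, width)
    return min(1.0, max(0.0, candidate))  # keep the condition within [0, 1]

def campaign(n_iterations=20, seed=0):
    """Design -> execute -> learn, repeated for a fixed experimental budget."""
    rng = random.Random(seed)
    # Initial screening of three evenly spaced conditions.
    history = [(x, run_experiment(x, rng)) for x in (0.1, 0.5, 0.9)]
    for _ in range(n_iterations):
        x = propose(history, rng)   # design the next experiment
        y = run_experiment(x, rng)  # carry it out and characterize the result
        history.append((x, y))      # learn: update the record used for planning
    best = max(history, key=lambda record: record[1])
    return best, history
```

Calling `campaign()` returns the best condition found together with the full experimental record, which is the piece a collaborator would inspect or share.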
SDLs, by acting as highly capable collaborators in the research process, can serve as nexuses for collaboration and inclusion in the sciences, helping coordinate and optimize grand and intersectional research efforts and reducing the physical and technical obstacles of performing research manually. When combined with collaborators who bring broad domain expertise, this potential new paradigm for research (SDL-assisted research) could allow the scientific community to adequately address previously intractable Grand Challenges11 (such as developing economically viable solar power technologies and industrial processes, making breakthroughs in personalized health and safety, and creating new analytical devices and methods). Already, SDLs have shown promise in accelerating molecular discovery3,12, the discovery of new synthesis routes for nanoparticles13, crystallographic phase mapping14, microscopy15,16, and HPLC method development17, among other feats reviewed more thoroughly in the literature4,18,19.
The widespread accessibility of and to SDLs is necessary to fully realize their promise20,21. A team science22 research paradigm involves more actors to conduct research and incorporates more diverse ideas into the formulation and execution of research problems. In discussion of the democratization of research, we have chosen to focus on the questions of how many researchers can participate in the scientific method and how to make the generation and analysis of hypotheses more accessible to said researchers. Toward the democratization of research through SDLs, there is an open question as to how SDL technologies will be balanced between open-access, centralized facilities (Centralized approach; cf. the European Organization for Nuclear Research (CERN) research facilities and the BioPacific MIP23) and networks of distributed facilities (Distributed approach; cf. the Galaxy Zoo24 and Foldit25 projects and the Harvard Clean Energy Project26). Both the access to scientific research that SDLs provide and the communities that SDLs necessitate and foster position this research paradigm particularly well for enhancing human-machine collaboration, multi-disciplinary and data-driven research, public outreach, and science education27. Furthermore, by accelerating research, SDLs attract industry partnerships, whose economic interest in efficient research and development (R&D) can provide the support to build and improve SDL technologies. These external actors will require the personnel to build and maintain SDLs, which in turn engages more people with the technology and can bring in more industry scientists.
Recent efforts on prototype SDL platforms and their associated technologies demonstrate the first steps of democratizing SDL-assisted research. Numerous systems have been released with open-source tools such as Chemspyd28, PyLabRobot29, PerQueue30, and Jubilee31 among others32,33,34,35,36,37,38,39. Access to such tools facilitates others in developing their own research platforms. Others have demonstrated collaborations between research groups40,41, across academic levels42,43, and with industry partners44 as well as tools which increase accessibility to non-computer scientists45,46,47. Moreover, additional studies have begun characterizing and benchmarking the performance of current SDLs13,42,48,49,50,51 to facilitate communal comparison and improvement. While these efforts and other SDL technology demonstrations5 inspire hope for the future of SDLs as a democratizing agent in scientific research, there are critical challenges which need to be addressed.
In this perspective, we first discuss two paradigms (centralized and distributed) by which SDLs can be made accessible to the research community and why we believe both avenues should be explored. We then discuss the current roadblocks toward achieving such a democratized future of SDL-assisted research before addressing how we might overcome these obstacles.
Balancing a centralized and distributed future
While creating automated experimental apparatus may be feasible for a general laboratory, the effort required to develop and maintain an SDL is unarguably large. Centralized facilities that allow (virtual) access by applicants concentrate efforts and personnel52,53,54,55,56,57,58 (Fig. 2, left); alternatively, open-source59,60 networks encourage peer-to-peer collaborations that can leverage specialization and modularization (Fig. 2, right).
Laboratories are represented with Greek letters, their requested research projects (jobs) with numbers (for simplicity, each laboratory is planning one job), and their individual capabilities with icons in gray circles. In the centralized approach, multiple laboratories submit their job requests to a single, highly capable facility, and time on various center facilities is equitably distributed between applicants throughout the day. In the distributed approach, various (temporary) groups form around capabilities, and the groups can distribute tasks amongst themselves (jobs can freely flow between collaborators), share information, and transmit instructions and materials (digital letter and airplane) as appropriate.
The categorization of SDL deployment as centralized or distributed is useful but relative. The delineation between these can vary: an SDL for every individual researcher, research group, university, etc., extending hyperbolically to a single centralized SDL for the entire world. We have chosen to divide the distributed and centralized paradigms between research groups and shared university facilities as there is a noticeable change in the degree of customization and flexibility in addressing research challenges between these two levels.
Hybrid approaches are also feasible, and potentially preferable at this stage. Individual laboratories could utilize simplified, low-cost automation systems61 for workflow development, testing, and troubleshooting before submitting the finalized workflow to an external facility. Moreover, to tailor centralized facilities' capabilities to the needs of specific research groups, individual laboratories could develop an instrument in accordance with facility guidelines such that the unit can "plug in" to a centralized facility62,63,64. This requires financial and logistic support for transporting and integrating bespoke equipment, but addresses the throughput concerns of an individual laboratory and the specialization concerns of a centralized facility. Finally, collaborations with national laboratories provide a unique opportunity to explore large-scale coordination and to test how various centralized and distributed technologies can be implemented. National laboratories' intermediate scales may be optimal for the development of self-driving systems to manage both academically and industrially relevant data provenance and metadata, a challenge that transcends fields and informs how disparate industrial sectors must adapt to embrace self-driving workflows.
Centralized, distributed, and hybrid approaches seek to keep SDLs open to researchers regardless of background or financial means and ensure that an enclave of privileged facilities does not have sole access to SDLs, a configuration that would worsen disparity between laboratories for funding and publication and squander the potential of an SDL-assisted research paradigm. Despite these varied approaches, the challenge remains of how to ideally balance priorities between advanced, communal automation technologies and networks of specialized platforms. The optimal strategy for SDL deployment must consider how the initial investments (the barriers to entry) are overcome, how logistics and legal concerns are managed, and how the workforce is prepared to engage with SDLs in industry and academia.
In terms of financing and staffing, centralized facilities may be more attractive to industry and national investors, as funding a single meta-project helps guard against splintering and wasted/redundant effort, helps maintain long-term collaboration, and provides more stability (less risk) than an individual research group65. As the costs of commercial automation units and software decline, distributed SDLs become more feasible. Smaller, designer SDLs are in turn likely to attract local businesses and universities. The proximity and flexibility of distributed SDLs can facilitate rapid collaboration on novel and cutting-edge research for a given scientific or industrial niche. Conversely, a centralized facility may have too much inertia to rapidly address changing needs or may struggle to justify providing highly specialized equipment only a handful of users ever use.
The maintenance and sustainability of the SDL ecosystem is also dependent on good management. Team science management, regardless of approach, requires the coordination of data and experiments in a manner that is robust, equitable, and accountable66,67. The distributed approach, with its more liquid boundaries, would require considerably more coordination, making maintaining the digital aspect (digital twins68 and datasets) more difficult69. The centralized approach raises ethical questions of how projects are selected, time and resources allocated, and experiments managed. For credit in both approaches, the SDLs could be given identifiers (cf. ORCID) for data provenance and attribution in manuscripts, though the task of acknowledging the personnel behind each SDL at the time of publication remains a logistical challenge. Unlike high-performance and distributed computing (a close analogue to future SDLs), SDLs consume and produce material, and both the purchasing of that material and the rights to it may be challenged by funding agreements.
In democratized science, data must be generated safely, ethically, and legally, and the quality of the data must be made trustworthy70. With respect to local and national regulations for hazardous materials, dangerous processes, and sensitive data, a centralized facility may have an easier time becoming certified but must also acquire more (or higher-grade) certifications, whereas a smaller SDL in a distributed network can apply for only the certifications it needs but may struggle to acquire all the engineering controls required to meet them. Concerning data quality, any SDL would require routine testing and quality control in order to maintain public trust in addition to study-specific benchmarking and control experiments to ensure the validity of results for new or novel materials and processes. In the centralized approach, a consortium of key facilities could develop these protocols and use them as the standard, whereas in the distributed approach, more effort would be required to create robust future- and site-proof standards for interoperability, shareability, and reproducibility, with periodic checkups needed to remain a trusted member of the network of collaborators. While overall maintenance and standardization may be easier for centralized facilities, it is worth noting that the research groups using these facilities may find the established protocols and standards limiting in what research can be conducted.
SDLs must engage with and support their collaborators: people40. Any proposed education strategy implemented should reinforce itself to ensure the sustainability of the SDL ecosystem. In the centralized paradigm, key facilities create centers of learning and can act as educational institutions to provide intense and fulfilling educations for participants. In the distributed paradigm, having more diffuse facilities can more effectively provide access to a geographically diverse cohort of future SDL researchers and may have a larger overall capacity, increasing the number and diversity of future-researchers benefited. Furthermore, low-cost and do-it-yourself SDL technologies can serve as educational tools for burgeoning researchers (e.g., "frugal twin" platforms61, Educational ARES71,72, and Legolas73). Ultimately, SDLs are in service of people, and how users are trained in and engage with these powerful research tools must be thoughtfully considered.
In both paradigms, the goals to increase throughput and reduce cost must be balanced against the quality of the data. Experimental fidelity can greatly impact the number of experiments required to arrive at a solution50,74, and further work is required to determine the optimal tradeoffs between these design goals. Any low-cost system would need rigorous reproducibility analysis to be of value to SDL-assisted research. The relationship between setup and operating costs and data fidelity will evolve with the advancement of automated laboratory technologies and in turn modify the optimal balance between centralized and distributed SDL research.
In summary, centralized approaches pool resources to achieve more technologically advanced SDLs but face challenges in generalizability while distributed approaches provide flexibility but require greater coordination (Table 1). As SDL-related technologies evolve, however, many of the capital and operating expenses (for the creators, managers, and users of SDLs) will change.
Roadblocks on the path to future SDLs
SDLs have the potential to expand (or restrict, if improperly managed) who is afforded the opportunity to do research. The same positive feedback loop that stands to accelerate SDL proliferation also forestalls their widespread use (Fig. 3). Individual SDLs are large projects, and current demonstrations of SDLs are limited by hardware and software capabilities or are deployed conservatively to reduce risk. As a result, there are few exemplar systems that are industrially relevant enough to attract widespread funding, and funding is required for high-impact demonstrations. Support from outside this cycle is needed to kickstart SDL proliferation, and efforts made to improve the power, generalizability, and accessibility of both the physical and digital aspects of SDLs are necessary.
Incomplete integration of artificial intelligence (AI), hardware, and software prevents the creation of agile experimental systems, and the complexity of these systems renders SDLs into black-box systems, which can struggle to balance all the requests and questions given to them. Questions and requests to different aspects of SDLs receive unclear answers and this incomplete information results in difficulties collaborating with other scientists and industry partners, which in turn impacts funding and support. A lack of cohesive support challenges the advancement of human-AI-robot collaboration and the improvement of SDL technologies, starting the cycle anew.
In this section, we review the major roadblocks to SDL proliferation and to the subsequent democratization of automated research. The discussion will focus initially on development of SDLs and their workforce, then into laboratory- and community-level challenges and opportunities for advancement.
In the workforce
The transition from conventional to self-driving research in the materials and chemical space will require developing a specialized, yet multi-faceted, workforce. The status quo of research favors collaborations among researchers in closely related fields or among members of the same research campus. Unfortunately, the applications where SDLs are most promising also require the greatest diversity and mass of knowledge.
The needed workforce is not, however, a monolithic body: Developers combine hardware, software, processing, and materials innovations to realize new self-driving systems; technicians maintain and tune such systems; and users interact with self-driving research systems through the digital world by selecting hypotheses, guiding learning, and analyzing data. While these may be distinct roles, individuals may move between these roles throughout the course of their careers and as their research needs evolve. Moreover, these roles will require differing levels of expertise within and between fields as well as collaboration skills. As complex, intersectional engineered systems, SDLs will need a healthy distribution of actors to be successful.
Developers of self-driving experimentation systems must combine expertise in their experimental domain with expertise in automation. They must know the subtleties and pitfalls of various laboratory and research techniques in their discipline and be skilled in automation to reify these techniques with programmatic control. In contrast, traditional academic departments are fairly siloed, with robotics being separate from chemistry or materials science, and learning to be a developer generally requires practice and dedicated training or self-teaching to fill the educational gaps. The need for practice systems can be ameliorated with low-cost systems that have been specifically designed as pedagogical tools73,75,76,77. The development and dissemination of such systems is an important part of efficiently training SDL developers.
The challenges associated with maintaining SDLs mirror the challenges in developing them. Technicians may be required to monitor processing signals to ensure smooth operation, enhance performance through collaboration78, intervene to repair or recalibrate the system when needed, and restock supplies. Technicians require less depth of expertise in the underlying science and robotics optimization than developers. As such, individuals with some experience with physical experiments (whether vocational or through undergraduate training) and instrument-specific training will be able to perform this role. Here, micro-certifications and online training are appropriate for delivering the specific knowledge needed to operate the relevant systems. Both the academy and industry should be involved in the development of these resources.
A user's two main responsibilities are to (1) formulate good scientific hypotheses for the SDLs to explore and (2) (virtually) oversee the learning process to make adjustments as needed78. While the former can be thought of as a goal of doctoral education (implying most first-generation users will be domain experts), in contrast to a developer, this domain expertise need not include automation. For proficiency in the latter task, users will have to practice overseeing an SDL, requiring facile tools that allow for this experience and pedagogical resources that define the types of situations that can be experienced (and how to resolve them)79.
While the collaboration between industry and academia is necessary for the advancement of SDL technologies, the two groups seek conflicting gains from that advancement (cf. Fig. 3, where each circle corresponds roughly to a different role). A user seeking to apply the SDL to solve a scientific problem may desire a more vertically integrated SDL technology, which packages hardware control, data-management, and experimental planning together (such as Atinary or a lab-as-a-service provider52,80,81). A developer seeking to advance laboratory capabilities may desire the ability to combine, adapt, and create new modules, favoring horizontal integration. And, presently, there is a large overlap between technicians and both the developer and user roles, introducing a bias in the user experience toward having expert knowledge and permitting band-aid solutions in design. A party more interested in workforce development, education, or mitigating job displacement82,83 may desire in-house development and implementation as a form of training. We encourage the community to thoughtfully consider these cases and to state explicitly in their publications which one is their focus. This way there can be more equitable advancements of the technology (theoretical and demonstrated) and the workforce.
Successful efforts in a collaborative SDL will require team science to communicate with diverse researchers and people who may not have shared research experience, vocabulary, or knowledge base. Large groups will need people skilled in team science who can facilitate the development and success of the team, a role for which academia does not currently prepare people, though there are some programs such as the "Growing convergence research" program at the National Science Foundation (NSF)84 that are pushing this area. A consequence of this lack of a teaching culture is that teamwork is most often learned "on the job". While collaboration is a lifelong learning objective, it has yet to be determined how team science should best be integrated into educational programs. Playing to the strengths of a diverse community, we need a multipronged strategy to address the training of academic and industrial researchers. Universities should engage with industry partners to integrate team and automated sciences into the curriculum; ideally these programs should overlap with the micro-credential programs for upskilling existing workers.
In the lab
The integration and automation of advanced data-science strategies for the analysis of results and the proposition of new experiments is a key feature which separates SDLs from prior work in high-throughput experimentation. Despite efforts to democratize research automation technologies, their transferability in practice can be limited by the tension between generality (being applicable to any hardware system) and specificity (taking full advantage of the nuances and capabilities of a given instrument)85. Similarly, machine learning (ML) for materials and molecular discovery struggles to be universal and is currently used primarily to simulate or plan experiments and analyze data.
Laboratory automation is a powerful tool that bridges the digital and physical worlds of SDLs, enabling active learning. Over the past few years, proof-of-concept automated setups have been demonstrated for many upstream86,87 and downstream42,88,89,90,91 steps in the traditional materials science research workflow, as well as more holistic SDLs3,52,55. The integration of disparate components into a single, usable platform, even in a modular fashion, requires establishing engineering controls, networking systems, defining data-collection and management structures, and providing human access points for troubleshooting and collaboration.
Developing automation tools and integrating them into SDLs relies heavily on access to application programming interfaces (APIs) and the documentation of both these APIs and the tools' capabilities92. Few APIs are readily provided or supported by manufacturers and many are poorly documented or come with restrictive licensing agreements, resulting in individual groups producing redundant, rarely universal, solutions. In addition, many vendor-provided APIs may transmit data in opaque data formats, curtailing active learning. For commercial equipment, preferential purchasing of solutions that provide native programmatic control with high-quality APIs and documentation (as well as vendor technical support and interoperable data) is a good start. Due to the size of the research market, however, vendors may not foresee sufficient economic returns in recapitalizing their product lines, especially when contrasted with closed ecosystem approaches that are potentially more profitable. As fields emerging into automation, chemical and materials research impose heavy demands on system robustness (e.g., temperature, pressure, and chemical compatibility) and specialization (e.g., flexibility and cutting-edge applications); meanwhile, existing automation suppliers, many of which originated in the biological domain, are often focused on general-purpose use, high throughput, and industrial-scale safety. It is imperative, then, that researchers seek to engage with industry R&D departments to co-develop prototype automation tools (even if these collaborative efforts cannot be promptly released), as by having a seat at the table, gradual improvement to SDL integrations can be achieved.
The technologies the SDL community itself creates must consider accessibility to the broader research community as open-source solutions are less likely to come from industry. Open-source systems that are interchangeable between vendors hinder competitiveness. Academic29, commercial64, and governmental93,94 research efforts into providing open-source and high-quality APIs and SDL development tools show promise in making SDLs more accessible and interoperable. Groups currently developing SDLs should make an effort to reuse (or consult) as much code and data from prior studies as possible, even if it must be modified, and should publicly release their code during publication. While it is often beyond the scope of a single laboratory to develop universal code or experimental techniques, taking the time to analyze code and data reporting decisions can act as an additional form of dialogue between SDL developers. This encourages community involvement, reduces redundant effort, brings people into dialogue, and can help work out the bugs in our standardization efforts. Unlike the popular ML development libraries, which are maintained by the technology industry, these code libraries need long-term maintenance95 and cybersecurity as data formats and software evolve, best practices and standards change, and programming language preferences shift, and such upkeep is incompatible with the current funding paradigm96 (see section "In the community").
Given the effort required to "glue" each hardware and software module of an SDL together, there have been efforts to automate or assist developers as they construct new SDLs. While powerful middleware, protocol, and orchestration tools that aim to address these interoperability issues have been developed (such as ROS, SiLA 2, and BlueSky64), their adoption is limited to groups developing the most complex SDLs, pointing to a broader issue of fragmented technology ecosystems between laboratories. Universal (highly abstract) middleware suites have learning curves that can challenge less experienced groups and can be difficult to quickly deploy for a specific application. Conversely, for the larger projects of more experienced groups, version management and overhead can become a concern when using these frameworks97 and may motivate a group to create its own middleware suite. Consequently, progress by less experienced groups is limited by their lack of awareness or expertise in the technology and progress by more experienced groups is not easily transferable to other groups. In an attempt to address the inaccessibility of designing or using such middleware suites, several groups have turned toward using pre-trained LLMs to generate this "glue" code automatically from API documentation98. The unpredictability in existing LLMs requires special attention be put into engineering operational safeguards99. For the near term, achieving genuine planning with LLMs requires considerable supplementation with more traditional, logic-based verification workflows100.
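To make the "glue" problem concrete, the following is a minimal sketch of the kind of uniform driver interface that middleware layers aim to provide. It does not reproduce the API of ROS, SiLA 2, or BlueSky; the `Instrument` protocol, the two mock drivers, and the hard-coded absorbance value are all hypothetical and exist only to show how an orchestrator can address heterogeneous instruments through one call signature.

```python
from typing import Any, Protocol

class Instrument(Protocol):
    """Hypothetical common interface that every driver is assumed to expose."""
    name: str
    def execute(self, command: str, **params: Any) -> dict[str, Any]: ...

class MockDispenser:
    name = "dispenser"
    def execute(self, command: str, **params: Any) -> dict[str, Any]:
        if command != "dispense":
            raise ValueError(f"unsupported command: {command}")
        return {"instrument": self.name, "dispensed_ml": params["volume_ml"]}

class MockSpectrometer:
    name = "spectrometer"
    def execute(self, command: str, **params: Any) -> dict[str, Any]:
        if command != "measure":
            raise ValueError(f"unsupported command: {command}")
        # A fixed placeholder reading; a real driver would query the hardware.
        return {"instrument": self.name,
                "wavelength_nm": params["wavelength_nm"],
                "absorbance": 0.42}

def run_workflow(instruments: list[Instrument],
                 steps: list[tuple[str, str, dict[str, Any]]]) -> list[dict[str, Any]]:
    """Dispatch each (instrument name, command, parameters) step in order
    and collect the returned records for downstream analysis."""
    registry = {instrument.name: instrument for instrument in instruments}
    return [registry[target].execute(command, **params)
            for target, command, params in steps]
```

A workflow is then plain data, for example `[("dispenser", "dispense", {"volume_ml": 1.5}), ("spectrometer", "measure", {"wavelength_nm": 450})]`, which is easier to share between groups than instrument-specific scripts. The trade-off noted in the text still applies: a thin common interface like this one hides instrument-specific capabilities.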
Machine learning is crucial to realize efficient experimental design and drive active learning in SDLs. While ML for SDLs shares in the challenges of multi-objective and multi-property optimization101,102,103 that exist throughout data science, their coupling to physical platforms and the nontrivial input-output spaces of chemical and materials science presents challenging opportunities to advance artificial scientific intelligence.
In practice, experiments are subject to constraints, be they inherent to the physical system or imposed by humans for safety. Constraints are crucial for knowledge transfer between collaborators and evaluating the transferability of a model between physical systems. Constraints can be encoded by using prior knowledge104 (which may introduce bias into the system) or can be learned during experimentation105 (which can necessitate more data: increasing cost). The relative nature of constraints, however, can hinder generalizability and predictive power, as is notoriously the case with predicting synthesizability106,107,108,109. Similarly, the objectives by which experiments are evaluated need to be quantifiable and measurable within the active learning cycle. For example, future-seeking objectives such as "optimize for sustainability" need to be translated into short-term measurables, each of which comes with its own set of questions (e.g., metric/assay choice, comparison techniques, economic contexts, etc.). Presently, neither human- nor ML-generated decompositions of these objectives seem to suffice, and new ways of combining and translating objectives are required to address complex or unclear connections between objectives (low- and high-level, immediate and far-off) so that collaborators can make targeted progress toward their diverse goals110,111,112. Recent efforts to address the measurement component of this problem113 include the use of proxies92,114 (i.e., the estimation of inaccessible properties of interest by using correlations to more accessible measurables, including the estimation of device-level properties via inexpensive replicas of real-world devices). While proxy measurements increase the scope of research problems an SDL can investigate (and can often reduce costs), they can struggle in extrapolative campaigns (e.g., material discovery) and so require continually maintaining the correlations with new data115.
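As a rough illustration of the two ideas in this paragraph, the sketch below encodes constraints from prior knowledge as a hard feasibility filter and translates a high-level goal into a weighted sum of short-term measurables. The field names, the temperature limit, the banned solvent, the waste weight, and the candidate values are all invented for the example; choosing such quantities well is precisely the open problem described above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Candidate:
    temperature_c: float
    solvent: str
    predicted_yield: float    # hypothetical model estimate (fraction)
    predicted_waste_g: float  # hypothetical proxy for "sustainability"

def is_feasible(c, max_temperature_c=150.0, banned_solvents=frozenset({"benzene"})):
    """Constraints encoded from prior knowledge (assumed safety limits)."""
    return c.temperature_c <= max_temperature_c and c.solvent not in banned_solvents

def score(c, waste_weight=0.05):
    """Scalarized objective: reward yield, penalize waste (assumed weighting)."""
    return c.predicted_yield - waste_weight * c.predicted_waste_g

def select(candidates):
    """Return the best feasible candidate, or None if nothing is feasible."""
    feasible = [c for c in candidates if is_feasible(c)]
    return max(feasible, key=score) if feasible else None

example_pool = [
    Candidate(200.0, "toluene", 0.95, 1.0),  # too hot: excluded
    Candidate(90.0, "benzene", 0.90, 1.0),   # banned solvent: excluded
    Candidate(80.0, "ethanol", 0.80, 2.0),   # score 0.80 - 0.10 = 0.700
    Candidate(100.0, "water", 0.75, 0.5),    # score 0.75 - 0.025 = 0.725
]
```

Here `select(example_pool)` picks the water-based candidate even though two excluded candidates have higher predicted yields, which shows how strongly hand-encoded constraints and weights can steer a campaign.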
A final, foundational ML challenge for SDLs is understanding and quantifying uncertainties. Uncertainty provides a crucial touchstone for human collaborators as a rough estimate of how confident116,117 the ML algorithms are and is used to determine which experiments are proposed. Unlike many data science applications where the uncertainties of observations must be estimated, SDLs present an opportunity to directly measure and transform uncertainties between platforms. Current SDL hardware fails to capture most experimental metadata; and, despite manufacturer testing, experimental uncertainties must be measured for the particular chemical or material system being investigated. An SDL could use such information to learn about laboratory praxis and improve its own workflows for materials discovery and process optimization. Less speculatively, uncertainty, calibration, and benchmark studies are required for human researchers and collaborating SDLs to determine their trust in SDL-generated results50. Despite how crucial such studies are, the current suite of tests and the integration of these controls into SDL workflows are lacking.
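As a rough illustration of how directly measured uncertainty can steer experiment selection, the sketch below estimates the standard error of replicate measurements and ranks conditions with an upper-confidence-bound rule. The rule, the `kappa` parameter, and the function names are assumptions chosen for brevity; they stand in for whatever acquisition strategy a given SDL actually uses.

```python
import math
import statistics
from typing import Dict, List, Tuple


def summarize_replicates(values: List[float]) -> Tuple[float, float]:
    """Return (mean, standard error of the mean) for replicate measurements."""
    if len(values) < 2:
        raise ValueError("at least two replicates are needed to estimate spread")
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, sem


def upper_confidence_bound(mean: float, uncertainty: float, kappa: float) -> float:
    # Larger kappa favors exploring poorly characterized conditions.
    return mean + kappa * uncertainty


def propose_next(replicates: Dict[str, List[float]], kappa: float = 2.0) -> str:
    """Pick the condition with the highest upper confidence bound."""
    scores = {}
    for condition, values in replicates.items():
        mean, sem = summarize_replicates(values)
        scores[condition] = upper_confidence_bound(mean, sem, kappa)
    return max(scores, key=scores.get)
```

With `kappa = 0` the rule reduces to picking the best measured mean; raising `kappa` shifts the proposal toward conditions whose replicates disagree, which is one simple way that measured uncertainty changes what is proposed next.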
In the community
Industrial adoption and implementation of SDL technologies are currently limited to a few specific areas, such as biotech, biopharma, and specialty chemicals/materials, where companies have applied SDLs in their discovery pipelines118,119,120,121,122,123,124,125. While traditional funding bodies are typically limited in the budget and scope of the SDL research initiatives they can support (making it difficult to fund the ambitious, large-scale projects needed for breakthrough demonstrations), industry participation can be instrumental in the development of large-scale SDL initiatives. While examples have recently emerged in related fields, further de-risking of the technology is needed for broad market penetration.
Thorough cost-benefit and other techno-economic analyses are typically required to illustrate the potential savings in time and resources that SDLs offer over traditional R&D processes, and such analyses will help collaborators choose the best tool for the challenge at hand. Mature approaches are often viewed as safer investments for addressing new, complex problems; and even as SDL technologies advance, the question of whether to use (or the temptation to use) brute-force, high-throughput experimentation will remain, as advances in one often apply to the other. While transparent analyses of SDLs may help partners overcome their own barriers (viability, security, IP, and competitiveness), external factors, such as legislation on whether compounds or processes discovered by SDLs are patentable, loom over industry support.
Data and records of prior work are essential to science, and it is no different for SDL-assisted research. Data management and modeling must be flexible and interoperable and must provide representations of experiments and results126. While the motivations and challenges of FAIR scientific data have been discussed elsewhere127, attempts have been made to organize experimental information in general-purpose relational schema128 and in semantic knowledge graphs38,129, alongside efforts to provide provenance tracking130; these features, once mature, will be indispensable for combating dubious data and for general data management in a distributed paradigm131. While epistemological and ontological frameworks are in development38,132 and can facilitate collaboration across linguistic, disciplinary, and cultural barriers, they represent yet another complex system which must be integrated into an SDL; as such, the coming generation of these technologies must seek to be easy to deploy and integrate, potentially requiring a standardized interface.
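One simple way to make provenance tamper-evident is to chain records by hash, so that each entry commits to its predecessor. The sketch below is an illustrative assumption on our part (function names, record layout, and the choice of SHA-256 over JSON are all hypothetical) and is not the approach taken by the cited provenance systems.

```python
import hashlib
import json
from typing import Any, Dict, List, Optional


def record_hash(payload: Dict[str, Any], parent_hash: Optional[str]) -> str:
    # Serialize deterministically so the same content always hashes the same.
    body = json.dumps({"payload": payload, "parent": parent_hash}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_record(chain: List[Dict[str, Any]], payload: Dict[str, Any]) -> Dict[str, Any]:
    # Each new entry stores the hash of the entry before it.
    parent = chain[-1]["hash"] if chain else None
    entry = {"payload": payload, "parent": parent, "hash": record_hash(payload, parent)}
    chain.append(entry)
    return entry


def verify_chain(chain: List[Dict[str, Any]]) -> bool:
    # Recompute every hash; any edited payload or broken link is detected.
    parent = None
    for entry in chain:
        if entry["parent"] != parent:
            return False
        if entry["hash"] != record_hash(entry["payload"], parent):
            return False
        parent = entry["hash"]
    return True
```

A collaborator receiving such a chain can call `verify_chain` to check that no record was altered after the fact, without needing to trust the platform that produced it.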
As a consequence of the volume of data produced and the (mostly ad hoc) automation of experimental planning, execution, and analysis, there have been concerns about whether the data generated by an SDL can be trusted133,134. While ad hoc solutions are indispensable during the early stages of a technology’s development, they often result in redundant efforts29,135,136, increased setup times, and suboptimal outcomes, and they introduce the potential for unreliable experiments. For distributed SDLs, the completeness and thoroughness of platform and results characterization (meta-characterization)50,74,137 must be studied such that collaborators can properly assess literature data and train future users and technicians. Data must encompass not only experimental results (inputs and outputs) but state and environmental information as well (e.g., temperature, relative humidity, and any metadata required to reproduce the results of the ML models, measurables often overlooked by commercial units)70,138,139. With complete details of an experiment, it may become possible to learn the mapping between different hardware architectures (e.g., batch vs. flow) and further the transferability and interoperability of SDL technologies. Until either a critical mass of SDLs is online or the self-reporting of platform metrics is sufficient to assuage investment risks, the lack of trust needed to justify SDL investment will continue to stymie the growth of SDL technologies.
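A record that carries state and environmental information alongside inputs and outputs might look like the following sketch. The field names, the list of required environment keys, and the use of a random seed as the example of ML metadata are assumptions for illustration; a real platform would define its own required set.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical minimum set of environmental readings for a record to be reusable.
REQUIRED_ENVIRONMENT_KEYS = ("temperature_c", "relative_humidity_pct")


@dataclass
class ExperimentRecord:
    inputs: Dict[str, float]
    outputs: Dict[str, float]
    environment: Dict[str, float] = field(default_factory=dict)
    model_metadata: Dict[str, str] = field(default_factory=dict)

    def missing_fields(self) -> List[str]:
        """List the metadata a collaborator would need but cannot find."""
        missing = [k for k in REQUIRED_ENVIRONMENT_KEYS if k not in self.environment]
        # Without the seed, the ML model's proposals cannot be reproduced.
        if "random_seed" not in self.model_metadata:
            missing.append("random_seed")
        return missing
```

Running `missing_fields` at the moment of data capture, rather than at publication, would flag incomplete records while the missing readings can still be taken.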
Many extant SDLs, however, are beholden to their externally funded project’s goals (particularly in academia) and either do not have the bandwidth for external validation studies or must conceal crucial aspects of their workflows to protect their sponsors’ IP (curtailing the cross-validation of their results by other SDLs). Currently, most laboratories can only report calibration, benchmarking, and self-validation data to garner support, with only a fraction of self-driving technologies actually incorporating repeatability or reproducibility metrics. Moreover, there are no standards for performing or enforcing such studies, and incentives are non-existent. Different fields and applications have different ideas of what is considered standard for experimental verification; and for multidisciplinary and multi-application SDLs, these discrepancies make defining a singular standard difficult.
While there is pressure to present research in the most promising format while still being fair and honest, supplemental information needs to better characterize the theory and reality of platforms in operation50 (e.g., their performance in well-defined tasks, devices, capabilities, and interfaces) in order to avoid overpromising and backlash. The definition of (partial) success must be considered and discussed when reporting any metrics about SDL performance with respect to its objectives. For a “discovery” campaign, for example: Is there success in gaining any insights or only when a new compound is created? How do the properties of interest affect the degree of success? When does the failure of a chemistry count as a failure of the automation? SDL reports should include the calibrations, standards, and comparative or benchmarking studies performed so that others can better build on the reported results. In addition, metrics such as overall operational performance, the degree of human involvement with the workflow, and resources used and wastes generated, as well as an open discussion of areas where the platform could be reasonably improved, should be reported (cf. ref. 140, ref. 141, and supplemental materials of ref. 3).
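Two of the reporting quantities mentioned above, repeatability and the degree of human involvement, can be computed from simple logs. The sketch below uses the relative standard deviation of repeated runs and the fraction of workflow steps completed without intervention; both definitions and all names are assumptions chosen for illustration rather than community standards.

```python
import statistics
from typing import Dict, List


def relative_standard_deviation(values: List[float]) -> float:
    """Sample standard deviation divided by the magnitude of the mean."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("relative standard deviation is undefined for a zero mean")
    return statistics.stdev(values) / abs(mean)


def autonomy_fraction(total_steps: int, human_interventions: int) -> float:
    """Fraction of workflow steps completed without a human stepping in."""
    if total_steps <= 0 or not 0 <= human_interventions <= total_steps:
        raise ValueError("counts must satisfy 0 <= interventions <= total_steps, total > 0")
    return (total_steps - human_interventions) / total_steps


def platform_report(repeat_runs: List[float], total_steps: int,
                    human_interventions: int) -> Dict[str, float]:
    # Bundle the metrics so they can be published alongside the results.
    return {
        "repeatability_rsd": relative_standard_deviation(repeat_runs),
        "autonomy_fraction": autonomy_fraction(total_steps, human_interventions),
    }
```

Reporting even these two numbers for a benchmark task gives readers a baseline against which a later platform, or the same platform after an upgrade, can be compared.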
These demands represent additional work in the present; however, it is important to establish rich descriptions of SDLs as the precedent for future work. This rigor will help to build and reinforce trust within and, perhaps more crucially, beyond the SDL community. Similarly, by making code and data available (when legal to do so), the communal development of SDLs can be accelerated, consensus on best practices can be achieved, and a shared understanding of SDL language can be developed. Additionally, rigorous reporting on SDLs, especially their shortcomings, establishes the norm that not every published SDL needs to be flawless (an admission that helps the groups behind less “spectacular” SDLs to enter the conversation) and facilitates the identification of opportunities for improvement (inviting collaboration for perpetual improvement).
The current structures for funding, publication, and maintenance are insufficient for SDLs to act as democratizing agents for scientific research96. Research funds are mostly allocated toward scientific results rather than the infrastructure and development needed for SDLs, resulting in bootstrapping and leveraging of funds to build SDLs. The singular focus on the scientific results of interest and publishability can result in overlapping technologies tailored to (even pigeonholed in) a specific application. Instead, proposals should be tailored to drive the diversification of collaborators. Diverse expertise supplies the breadth of knowledge required to build SDLs and helps to foster and maintain collaborative relationships within the scientific and industrial communities so that future SDLs can thrive in a democratic and collaborative ecosystem. These cross-disciplinary collaborations will also facilitate the training of individuals in scientific communication and team science. Even when projects are simple, the inclusion of non-experts in SDL-related discussions helps prevent the gradual increase of the SDL skill-floor which often occurs when only experts are allowed to participate in discussions; outside (even naive) opinions help to challenge assumptions and bring in new ideas. Therefore, it will be important that investments be made in SDL infrastructure (both human and machine), with the understanding and expectation that these new tools will lead to new and impactful scientific advances.
Conclusions
In our collective advancement toward a more democratized future of SDL-assisted research, we must focus our efforts. If the role of SDLs is to enable more scientists to participate in research and to open the act of research to more diverse and collaborative ideas, then the efficiency, accessibility, and interoperability of autonomous technologies must be improved. This will only be possible by getting a head-start on team science and incorporating academic, industrial, and national input, building standards and protocols, and opening access to our tools and data. Both Centralized and Distributed approaches will require a consortium to outline living standards for automation, software, and data interfaces; Centralized-leaning technologies will require the initial investment; and Distributed-leaning technologies will require a confluence of low-cost assays and modules with interfaces tailored toward the layperson. Advances along these thrusts are mutually beneficial in making both SDLs and research more accessible; and the breadth of their coverage will enable SDLs to assist research in addressing problems of industrial and societal impact. The ultimate goal of this democratization is for the increase in participation and ideas to germinate into better and more creative solutions that could not be envisioned by traditional, insular approaches to research.
While advancements to improve individual components must be made, the bottlenecks of SDL-aided research identified in this perspective hint that additional work is needed in the prediction and management of scopes and throughputs when an SDL is being designed114,142. Such analysis can also help identify which bottlenecks are the result of capital expenditure and which are the result of fundamental limitations of the technology or technique. The latter invite innovation and can better illuminate the cross-SDL benefit of addressing these limitations, catalyzing the development of new SDLs. In this spirit, some effort should be put into automating the act of automation by creating SDL “installation wizards” which can help in selecting equipment and developing variations of traditional workflows to make the most of the resources available.
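A first-pass throughput prediction of the kind described here can be sketched by treating the workflow as a serial pipeline at steady state, where the slowest stage sets the overall rate. The `Stage` fields, the example numbers, and the steady-state assumption (no buffering limits, no downtime) are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Stage:
    name: str
    minutes_per_sample: float  # set by the technique itself
    parallel_units: int = 1    # can be raised with capital expenditure

    @property
    def samples_per_hour(self) -> float:
        return 60.0 * self.parallel_units / self.minutes_per_sample


def find_bottleneck(stages: List[Stage]) -> Tuple[Stage, float]:
    """Return the rate-limiting stage and the pipeline throughput it implies."""
    if not stages:
        raise ValueError("a workflow needs at least one stage")
    slowest = min(stages, key=lambda s: s.samples_per_hour)
    return slowest, slowest.samples_per_hour


# Hypothetical three-stage workflow used only to exercise the functions.
example_workflow = [
    Stage("synthesis", minutes_per_sample=10.0, parallel_units=4),
    Stage("purification", minutes_per_sample=15.0, parallel_units=1),
    Stage("characterization", minutes_per_sample=5.0, parallel_units=1),
]
```

In this example, purification limits the pipeline to 4 samples per hour; adding purification units moves the bottleneck to characterization, whereas lowering `minutes_per_sample` would require improving the technique itself, which mirrors the distinction between capital-limited and technique-limited bottlenecks.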
An institute for automated laboratory infrastructure could better focus the SDL community’s efforts and provide partners with a meta-project with which to engage and interface. A consortium for SDLs would more readily sustain long-term funding to develop and maintain SDL software infrastructure (as opposed to specific hypothesis-driven research projects) as well as provide pre-competitive, non-proprietary support for academic and industrial researchers. Such a consortium could, as an external entity, attract long-term staff who could cultivate a set of best practices and develop educational materials (online tutorials, in-person workshops) to train users. By focusing on core SDL issues, the tools developed as part of the consortium would serve both centralized and distributed SDL paradigms.
SDLs provide a means by which to further democratize research. While there are outstanding issues with the technology, they are surmountable; and while the approaches to address these issues vary depending on how centralized or distributed SDLs are implemented, the future of SDLs will encompass both. Addressing these challenges of automation, modeling, data management, collaboration, and training from both angles will help to bring about a future of inclusive and accessible research that is flexible, robust against changing paradigms, and better fit to address the increasingly complex problems of the world.
References
Chu, S. & Majumdar, A. Opportunities and challenges for a sustainable energy future. Nature 488, 294–303 (2012).
Campos, K. R. et al. The importance of synthetic chemistry in the pharmaceutical industry. Science 363, eaat0805 (2019).
Koscher, B. A. et al. Autonomous, multiproperty-driven molecular discovery: From predictions to measurements and back. Science 382, eadi1407 (2023).
Tom, G. et al. Self-driving laboratories for chemistry and materials science. Chem. Rev. 124, 9633–9732 (2024). A nearly comprehensive review of self-driving research literature through 2023 showcasing the applications and impact of self-driving laboratory technologies.
Bayley, O., Savino, E., Slattery, A. & Noël, T. Autonomous chemistry: navigating self-driving labs in chemical and material sciences. Matter 7, 2382–2398 (2024). A breakdown of automated laboratory paradigms (flow, batch, mobile), automated experimental design, and applications with a discussion on decentralization towards increasing the accessibility of self-driving laboratory technologies.
Boswell-Koller, C. et al. Accelerated Materials Experimentation Enabled by the Autonomous Materials Innovation Infrastructure (AMII) A Workshop Report. https://www.mgi.gov/autonomous-experimentation-materials-rd (2024).
Noack, M. M. et al. Autonomous materials discovery driven by Gaussian process regression with inhomogeneous measurement noise and anisotropic kernels. Sci. Rep. 10, 17663 (2020).
Canty, R. B., Koscher, B. A., McDonald, M. A. & Jensen, K. F. Integrating autonomy into automated research platforms. Digit. Discov. 2, 1259–1268 (2023).
Montoya, J. H. et al. Toward autonomous materials research: Recent progress and future challenges. Appl. Phys. Rev. 9, 011405 (2022).
Sadeghi, S. et al. Engineering a sustainable future: harnessing automation, robotics, and artificial intelligence with self-driving laboratories. ACS Sustain. Chem. Eng. https://doi.org/10.1021/acssuschemeng.4c02177 (2024).
Grand Challenges - 14 Grand Challenges for Engineering. https://www.engineeringchallenges.org/challenges.aspx.
Wu, T. C. et al. A materials acceleration platform for organic laser discovery. Adv. Mater. 35, 2207070 (2023).
Volk, A. A. et al. AlphaFlow: autonomous discovery and optimization of multi-step chemistry using a self-driven fluidic lab guided by reinforcement learning. Nat. Commun. 14, 1403 (2023).
He, D. et al. Algorithm-driven robotic discovery of polyoxometalate-scaffolding metal–organic frameworks. J. Am. Chem. Soc. 146, 28952–28960 (2024).
Pratiush, U. et al. Building Workflows for Interactive Human in the Loop Automated Experiment (hAE) in STEM-EELS. Preprint at https://doi.org/10.48550/arXiv.2404.07381 (2024).
Liu, Y., Ziatdinov, M. A., Vasudevan, R. K. & Kalinin, S. V. Explainability and human intervention in autonomous scanning probe microscopy. Patterns 4, 100858 (2023).
Dixon, T. M. et al. Operator-free HPLC automated method development guided by Bayesian optimization. Digit. Discov. 3, 1591–1601 (2024).
Smith, S. C., Horbaczewskyj, C. S., Tanner, T. F. N., Walder, J. J. & Fairlamb, I. J. S. Automated approaches, reaction parameterisation, and data science in organometallic chemistry and catalysis: towards improving synthetic chemistry and accelerating mechanistic understanding. Digit. Discov. 3, 1467–1495 (2024).
Abolhasani, M. & Kumacheva, E. The rise of self-driving labs in chemical and materials sciences. Nat. Synth. 1–10, https://doi.org/10.1038/s44160-022-00231-0 (2023).
Stach, E. et al. Autonomous experimentation systems for materials development: a community perspective. Matter 4, 2702–2726 (2021).
Baird, S. G. & Sparks, T. D. What is a minimal working example for a self-driving laboratory? Matter 5, 4170–4178 (2022).
The Science of Team Science | National Academies. https://www.nationalacademies.org/our-work/the-science-of-team-science.
NSF BioPACIFIC MIP (DMR-1933487). https://biopacificmip.org/.
Masters, K. L. & Galaxy Zoo Team. Twelve years of Galaxy Zoo. Proc. Int. Astron. Union 14, 205–212 (2019).
Khatib, F. et al. Crystal structure of a monomeric retroviral protease solved by protein folding game players. Nat. Struct. Mol. Biol. 18, 1175–1177 (2011).
Hachmann, J. et al. The Harvard Clean Energy Project: large-scale computational screening and design of organic photovoltaics on the world community grid. J. Phys. Chem. Lett. 2, 2241–2251 (2011).
NSF’s 10 Big Ideas - Special Report | NSF - National Science Foundation. https://www.nsf.gov/news/special_reports/big_ideas/.
Seifrid, M. et al. Chemspyd: an open-source python interface for Chemspeed robotic chemistry and materials platforms. Digit. Discov. 3, 1319–1326 (2024).
Wierenga, R. P., Golas, S. M., Ho, W., Coley, C. W. & Esvelt, K. M. PyLabRobot: an open-source, hardware-agnostic interface for liquid-handling robots and accessories. Device 1, 100111 (2023).
Heckscher Sjølin, B. et al. PerQueue: managing complex and dynamic workflows. Digit. Discov. 3, 1832–1841 (2024).
Politi, M. et al. A high-throughput workflow for the synthesis of CdSe nanocrystals using a sonochemical materials acceleration platform. Digit. Discov. 2, 1042–1057 (2023).
Ziatdinov, M. A. et al. Hypothesis learning in automated experiment: application to combinatorial materials libraries. Adv. Mater. 34, 2201345 (2022).
Raghavan, A. et al. Evolution of Ferroelectric Properties in SmxBi1−xFeO3 via Automated Piezoresponse Force Microscopy across combinatorial spread libraries. ACS Nano 18, 25591–25600 (2024).
Liu, Y. et al. Autonomous scanning probe microscopy with hypothesis learning: exploring the physics of domain switching in ferroelectric materials. Patterns 4, 100704 (2023).
Liu, Y. et al. Experimental discovery of structure–property relationships in ferroelectric materials via active learning. Nat. Mach. Intell. 4, 341–350 (2022).
Roccapriore, K. M., Kalinin, S. V. & Ziatdinov, M. Physics discovery in nanoplasmonic systems via autonomous experiments in scanning transmission electron microscopy. Adv. Sci. 9, 2203422 (2022).
Pratiush, U., Funakubo, H., Vasudevan, R., Kalinin, S. V. & Liu, Y. Scientific Exploration with Expert Knowledge (SEEK) in autonomous scanning probe microscopy with active learning. Digit. Discov. 4, 252–263 (2025).
Bai, J. et al. A dynamic knowledge graph approach to distributed self-driving laboratories. Nat. Commun. 15, 462 (2024).
Liu, Y. et al. Exploring the relationship of microstructure and conductivity in metal halide perovskites via active learning-driven automated scanning probe microscopy. J. Phys. Chem. Lett. 14, 3352–3359 (2023).
Leins, A. D., Haase, S. B., Eslami, M., Schrier, J. & Freeman, J. T. Collaborative methods to enhance reproducibility and accelerate discovery. Digit. Discov. 2, 12–27 (2023).
Vogler, M. et al. Autonomous Battery Optimization by Deploying Distributed Experiments and Simulations. Adv. Energy Mater. 14, 2403263 (2024).
Gongora, A. E. et al. A Bayesian experimental autonomous researcher for mechanical design. Sci. Adv. 6, eaaz1708 (2020).
Quinn, H. et al. PANDA: a self-driving lab for studying electrodeposited polymer films. Mater. Horiz. https://doi.org/10.1039/D4MH00797B (2024).
Liu, Y. et al. Machine learning-based reward-driven tuning of scanning probe microscopy: towards fully automated microscopy. Preprint at https://arxiv.org/abs/2408.04055v1 (2024).
Mehr, S. H. M., Craven, M., Leonov, A. I., Keenan, G. & Cronin, L. A universal system for digitization and automatic execution of the chemical synthesis literature. Science 370, 101–108 (2020).
Darvish, K. et al. ORGANA: A robotic assistant for automated chemistry experimentation and characterization. Matter 8, 101897 (2025).
Ren, Z., Zhang, Z., Tian, Y. & Li, J. CRESt – copilot for real-world experimental scientist. Preprint at https://doi.org/10.26434/chemrxiv-2023-tnz1x-v4 (2023).
Gongora, A. E. et al. Using simulation to accelerate autonomous experimentation: a case study using mechanics. iScience 24, 102262 (2021).
Snapp, K. L. et al. Superlative mechanical energy absorbing efficiency discovered through self-driving lab-human partnership. Nat. Commun. 15, 4290 (2024).
Volk, A. A. & Abolhasani, M. Performance metrics to unleash the power of self-driving labs in chemistry and materials science. Nat. Commun. 15, 1378 (2024).
Suvarna, M. et al. Active learning streamlines development of high performance catalysts for higher alcohol synthesis. Nat. Commun. 15, 5844 (2024).
Emerald Cloud Lab: Remote Controlled Life Sciences Lab. https://www.emeraldcloudlab.com/.
German-Canadian Materials Acceleration Centre. https://gcmac.ca/.
CMU Cloud Lab | A Future of Science Initiative. https://cloudlab.cmu.edu/.
Arias, D. S. & Taylor, R. E. Scientific discovery at the press of a button: navigating emerging cloud laboratory technology. Adv. Mater. Technol. 9, 2400084 (2024).
Office of Science User Facilities. Energy.gov. https://www.energy.gov/science/office-science-user-facilities.
Automating Labs through Green Button Go | Biosero. https://biosero.com/ (2021).
Burke, M. D. et al. Molecule Maker Lab Institute: accelerating, advancing, and democratizing molecular innovation. AI Mag. 45, 117–123 (2024).
Li, J. et al. Autonomous discovery of optically active chiral inorganic perovskite nanocrystals through an intelligent cloud lab. Nat. Commun. 11, 2046 (2020).
Kalinin, S. V. et al. Probe microscopy is all you need. Mach. Learn. Sci. Technol. 4, 023001 (2023).
Lo, S. et al. Review of low-cost self-driving laboratories in chemistry and materials science: the “frugal twin” concept. Digit. Discov. 3, 842–868 (2024). Introduction to the concept of “Frugal twins”, wherein complex physical systems are modeled with low-cost alternatives for the purpose of instruction, integration, troubleshooting, and method development.
Autonomous Discovery. Argonne National Laboratory. https://www.anl.gov/autonomous-discovery.
Beaucage, P. A. & Martin, T. B. The Autonomous Formulation Laboratory: an open liquid handling platform for formulation discovery using X-ray and neutron scattering. Chem. Mater. 35, 846–852 (2023).
Allan, D., Caswell, T., Campbell, S. & Rakitin, M. Bluesky’s ahead: a multi-facility collaboration for an a la carte software project for data acquisition and management. Synchrotron Radiat. N. 32, 19–22 (2019).
Bach, U. & Leisten, I. How to Structure and Foster Innovative Research. In Automation, Communication and Cybernetics in Science and Engineering 2009/2010 (eds Jeschke, S., Isenhardt, I. & Henning, K.) 3–13. https://doi.org/10.1007/978-3-642-16208-4_1 (Springer, Berlin, Heidelberg, 2011). A breakdown of funding priorities and project management for meta-projects in scientific research (as applied to occupational health and safety).
Stokols, D., Misra, S., Moser, R. P., Hall, K. L. & Taylor, B. K. The ecology of team science: understanding contextual influences on transdisciplinary collaboration. Am. J. Prev. Med. 35, S96–S115 (2008).
Bennett, L. M. Team Science: An Exercise in Difference and Diversity. In An Astronomical Inclusion Revolution: Advancing Diversity, Equity, and Inclusion in Professional Astronomy and Astrophysics. https://doi.org/10.1088/2514-3433/ad2174ch8 (IOP Astronomy, 2024).
Slautin, B. N. et al. Bayesian co-navigation: dynamic designing of the materials digital twins via active learning. ACS Nano 18, 24898–24908 (2024).
National Science Data Fabric. https://nationalsciencedatafabric.org/.
Pelkie, B. G. & Pozzo, L. D. The laboratory of Babel: highlighting community needs for integrated materials data management. Digit. Discov. 2, 544–556 (2023). An overview of and guideline for the collaborative management of data in group research projects, notably highlighting the importance of meta-characterization and the logging of the environment under which experiments are conducted so as to evaluate collective trust.
AI research robots key to “democratizing and revolutionizing science”, world-class AFRL re. One AFRL – One Fight. https://www.afrl.af.mil/News/Article-Display/Article/3559877/ai-research-robots-key-to-democratizing-and-revolutionizing-science-world-class (2023).
ARES Learning | Project-Based Research-Driven Space-Focused School & Enrichment. ARES Learning. https://www.areslearning.com.
Saar, L. et al. The LEGOLAS Kit: a low-cost robot science kit for education with symbolic regression for hypothesis discovery and validation. MRS Bull. 47, 881–885 (2022).
Epps, R. W. & Abolhasani, M. Modern nanoscience: convergence of AI, robotics, and colloidal synthesis. Appl. Phys. Rev. 8, 041316 (2021).
Ganitano, G. S., Wallace, S. G., Maruyama, B. & Peterson, G. L. A hybrid metaheuristic and computer vision approach to closed-loop calibration of fused deposition modeling 3D printers. Prog. Addit. Manuf. 9, 767–777 (2024).
Baird, S. G. & Sparks, T. D. Building a “Hello World” for self-driving labs: The Closed-loop Spectroscopy Lab Light-mixing demo. STAR Protoc. 4, 102329 (2023).
Norquist, A. J., Jones-Thomson, G., He, K., Egg, T. & Schrier, J. A modern twist on an old measurement: using laboratory automation and data science to determine the solubility product of lead iodide. J. Chem. Educ. 100, 3445–3453 (2023).
Hung, L. et al. Autonomous laboratories for accelerated materials discovery: a community survey and practical insights. Digit. Discov. 3, 1273–1279 (2024). A survey analysis providing insight into the desires and fears of automation and autonomization that also highlights key distinctions (such as flexibility vs robustness) as well as the utility of keeping humans in-the-loop.
Snapp, K. L. & Brown, K. A. Driving school for self-driving labs. Digit. Discov. 2, 1620–1629 (2023).
Chemistry Testing Laboratory. Certified Laboratories. https://certified-laboratories.com/chemistry/.
Services & Solutions • Frontage Laboratories. Frontage Laboratories. https://www.frontagelab.com/services/.
Vermeulen, B., Kesselhut, J., Pyka, A. & Saviotti, P. P. The Impact of Automation on Employment: Just the Usual Structural Change? Sustainability 10, 1661 (2018). A labor economic theory analysis of automation describing the observed (rather than theoretical) effects of automation in the workforce and how automation has historically resulted in a restructuring of labor rather than its elimination.
Howard, J. Artificial intelligence: Implications for the future of work. Am. J. Ind. Med. 62, 917–926 (2019).
GROWING CONVERGENCE RESEARCH (GCR)|NSF - National Science Foundation. https://new.nsf.gov/funding/opportunities/growing-convergence-research-gcr (2024).
Canty, R. B. & Abolhasani, M. Reproducibility in automated chemistry laboratories using computer science abstractions. Nat. Synth. 1–13, https://doi.org/10.1038/s44160-024-00649-8 (2024).
Moliner, M. et al. Application of artificial neural networks to high-throughput synthesis of zeolites. Microporous Mesoporous Mater. 78, 73–81 (2005).
Kirman, J. et al. Machine-learning-accelerated perovskite crystallization. Matter 2, 938–947 (2020).
Ludwig, A. Discovery of new materials using combinatorial synthesis and high-throughput characterization of thin-film materials libraries combined with computational methods. Npj Comput. Mater. 5, 1–7 (2019).
Stein, H. S. & Gregoire, J. M. Progress and prospects for accelerating materials science with automated and autonomous workflows. Chem. Sci. 10, 9640–9649 (2019).
Desai, B. et al. Rapid discovery of a novel series of Abl kinase inhibitors by application of an integrated microfluidic synthesis and screening platform. J. Med. Chem. 56, 3033–3047 (2013).
Adamo, A. et al. On-demand continuous-flow production of pharmaceuticals in a compact, reconfigurable system. Science 352, 61–67 (2016).
Seifrid, M. et al. Autonomous chemical experiments: challenges and perspectives on establishing a self-driving lab. Acc. Chem. Res. 55, 2454–2466 (2022). An introduction to proxy measurements and the internal challenges of self-driving laboratory technology (specifically robotics and cognitive models).
MolSSI – The Molecular Sciences Software Institute. https://molssi.org/.
Vescovi, R. et al. Towards a modular architecture for science factories. Digit. Discov. 2, 1980–1998 (2023).
Avoiding “Bit Rot”: Long-Term Preservation of Digital Information [Point of View] | IEEE Journals & Magazine | IEEE Xplore. https://ieeexplore.ieee.org/document/5768098. Definition of the term “bit rot” and the systematic challenges of producing and maintaining high-quality software.
Monteith, J. Y., McGregor, J. D. & Ingram, J. E. Scientific Research Software Ecosystems. In Proc 2014 European Conference on Software Architecture Workshops 1–6, https://doi.org/10.1145/2642803.2642812 (Association for Computing Machinery, New York, NY, USA, 2014).
Rahmanian, F. et al. Enabling modular autonomous feedback-loops in materials science through hierarchical experimental laboratory automation and orchestration. Adv. Mater. Interfaces 9, 2101987 (2022).
Liang, J. et al. Code as policies: language model programs for embodied control. In 2023 IEEE International Conference on Robotics and Automation (ICRA) 9493–9500. https://doi.org/10.1109/ICRA48891.2023.10160591 (2023).
Tang, X. et al. Prioritizing safeguarding over autonomy: risks of LLM agents for science. Preprint at https://doi.org/10.48550/arXiv.2402.04247 (2024).
Kambhampati, S. et al. Position: LLMs can’t plan, but can help planning in LLM-modulo frameworks. In Proceedings of the 41st International Conference on Machine Learning vol. 235 22895–22907 (JMLR.org, Vienna, Austria, 2024).
Skalse, J., Howe, N., Krasheninnikov, D. & Krueger, D. Defining and Characterizing Reward Gaming. Adv. Neural Inf. Process. Syst. 35, 9460–9471 (2022).
Fromer, C. J., Graff, E. D. & Coley, C. W. Pareto optimization to accelerate multi-objective virtual screening. Digit. Discov. 3, 467–481 (2024).
Dietz, T. et al. Introducing multiobjective complex systems. Eur. J. Oper. Res. 280, 581–596 (2020).
Kapusuzoglu, B. & Mahadevan, S. Information fusion and machine learning for sensitivity analysis using physics knowledge and experimental data. Reliab. Eng. Syst. Saf. 214, 107712 (2021).
Epps, W. R., Volk, A. A., Reyes, G. K. & Abolhasani, M. Accelerated AI development for autonomous materials synthesis in flow. Chem. Sci. 12, 6025–6036 (2021).
Gao, W. & Coley, C. W. The synthesizability of molecules proposed by generative models. J. Chem. Inf. Model. 60, 5714–5723 (2020).
Fromer, J. C. & Coley, C. W. An algorithmic framework for synthetic cost-aware decision making in molecular design. Nat. Comput. Sci. 4, 440–450 (2024).
Wang, J. et al. ChemistGA: a chemical synthesizable accessible molecular generation algorithm for real-world drug discovery. J. Med. Chem. 65, 12482–12496 (2022).
Swanson, K. et al. Generative AI for designing and validating easily synthesizable and structurally novel antibiotics. Nat. Mach. Intell. 6, 338–353 (2024).
Fromer, J. C. & Coley, C. W. Computer-aided multi-objective optimization in small molecule discovery. Patterns 4, 100678 (2023).
Slautin, B. N. et al. Co-orchestration of multiple instruments to uncover structure–property relationships in combinatorial libraries. Digit. Discov. 3, 1602–1611 (2024).
Batatia, I. et al. A foundation model for atomistic materials chemistry. Preprint at https://doi.org/10.48550/arXiv.2401.00096 (2024).
Strieth-Kalthoff, F. et al. Delocalized, asynchronous, closed-loop discovery of organic laser emitters. Science 384, eadk9227 (2024).
Scheurer, C. & Reuter, K. Role of the human-in-the-loop in emerging self-driving laboratories for heterogeneous catalysis. Nat. Catal. 8, 13–19 (2025).
Fare, C., Fenner, P., Benatan, M., Varsi, A. & Pyzer-Knapp, E. O. A multi-fidelity machine learning approach to high throughput materials screening. npj Comput. Mater. 8, 1–9 (2022).
Kahle, L. & Zipoli, F. Quality of uncertainty estimates from neural network potential ensembles. Phys. Rev. E 105, 015311 (2022).
Zhan, N. & Kitchin, J. R. Model-specific to model-general uncertainty for physical properties. Ind. Eng. Chem. Res. 61, 8368–8377 (2022).
Bhowmik, A. et al. Implications of the BATTERY 2030+ AI-assisted toolkit on future low-TRL battery discoveries and chemistries. Adv. Energy Mater. 12, 2102698 (2022).
Acceleration Consortium. https://acceleration.utoronto.ca/.
CAPeX. https://capex.dtu.dk/.
Savage, N. Tapping into the drug discovery potential of AI. Biopharma Deal. https://doi.org/10.1038/d43747-021-00045-7 (2021).
Hede, K. PNNL Kicks Off Multi-Year Energy Storage, Scientific Discovery Collaboration with Microsoft. https://www.pnnl.gov/news-media/pnnl-kicks-multi-year-energy-storage-scientific-discovery-collaboration-microsoft (2024).
Umicore enters AI platform agreement with Microsoft. https://www.umicore.com/en/newsroom/umicore-enters-ai-platform-agreement-with-microsoft-to-accelerate-and-scale-its-battery-materials-technologies-development/ (2024).
Northvolt’s embrace of machine learning. https://northvolt.com/articles/northvolt-machine-learning/ (2023).
MIF Welcome Video. https://liverpool.cloud.panopto.eu/Panopto/Pages/Viewer.aspx?id=969aa046-7742-4f18-bfeb-b0730069a06c.
Allec, S. I. et al. A case study of multimodal, multi-institutional data management for the combinatorial materials science community. Integrating Mater. Manuf. Innov. 13, 406–419 (2024).
Scheffler, M. et al. FAIR data enabling new horizons for materials research. Nature 604, 635–642 (2022).
Pendleton, I. M. et al. Experiment Specification, Capture and Laboratory Automation Technology (ESCALATE): a software pipeline for automated chemical experimentation and data management. MRS Commun. 9, 846–859 (2019).
Bai, J. et al. From platform to knowledge graph: evolution of laboratory automation. JACS Au 2, 292–309 (2022).
Pruyne, J., Wozniak, J. M. & Foster, I. Tracking dubious data: protecting scientific workflows from invalidated experiments. In 2022 IEEE 18th International Conference on e-Science (e-Science) 456–461, https://doi.org/10.1109/eScience55777.2022.00082 (2022).
Delgado-Licona, F. & Abolhasani, M. Research acceleration in self-driving labs: technological roadmap toward accelerated materials and molecular discovery. Adv. Intell. Syst. 5, 2200331 (2023).
Clark, S. et al. Toward a unified description of battery data. Adv. Energy Mater. 12, 2102702 (2022).
Szymanski, N. J. et al. An autonomous laboratory for the accelerated synthesis of novel materials. Nature 624, 86–91 (2023).
Leeman, J. et al. Challenges in high-throughput inorganic materials prediction and autonomous synthesis. PRX Energy 3, 011002 (2024).
Yoo, H. J. et al. Bespoke metal nanoparticle synthesis at room temperature and discovery of chemical knowledge on nanoparticle growth via autonomous experimentations. Adv. Funct. Mater. 34, 2312561 (2024).
Rodríguez, O., Pence, M. A. & Rodríguez-López, J. Hard potato: a python library to control commercial potentiostats and to automate electrochemical experiments. Anal. Chem. 95, 4840–4845 (2023).
Hein, J. E. & Schrier, J. Guidelines for hardware-focused articles. Digit. Discov. 3, 447–448 (2024).
Willoughby, C. & Frey, J. G. Data management matters. Digit. Discov. 1, 183–194 (2022).
Maffettone, P. M. et al. What is missing in autonomous discovery: open challenges for the community. Digit. Discov. 2, 1644–1659 (2023).
Fakhruldeen, H., Pizzuto, G., Glowacki, J. & Cooper, A. I. ARChemist: autonomous robotic chemistry system architecture. In 2022 International Conference on Robotics and Automation (ICRA) 6013–6019. https://doi.org/10.1109/ICRA46639.2022.9811996 (2022).
MacLeod, B. P. et al. Self-driving laboratory for accelerated discovery of thin-film materials. Sci. Adv. 6, eaaz8867 (2020).
Christensen, M. et al. Automation isn’t automatic. Chem. Sci. 12, 15473–15490 (2021). An instructive guide for the establishment of a self-driving laboratory and the practical concerns, especially managing the relative throughputs of each component and the cost-benefit analysis of various approaches.
Acknowledgements
M.A. gratefully acknowledges the financial support of the Technology, Innovation, and Partnerships (TIP) Directorate of the National Science Foundation (award #2332452) and the records made by Pragyan Jha, Fernando Delgado-Licona, Sina Sadeghi, and Nikolai Mukhin during the FUTURE Labs Workshop (NSF #2332452) at North Carolina State University. S.V.K. was supported by the Center for Advanced Materials and Manufacturing (CAMM) and the NSF Materials Research Science and Engineering Centers (MRSEC). R.G.M. acknowledges support from the INTERSECT Initiative as part of the Laboratory Directed Research and Development Program of Oak Ridge National Laboratory, managed by UT-Battelle, LLC, for the US Department of Energy under contract DE-AC05-00OR22725. J.S. acknowledges support from the National Science Foundation (PHY-2226511, OAC-2320718). T.V. acknowledges support from the Pioneer Center for Accelerating P2X Materials Discovery (CAPeX), DNRF grant number P3.
Author information
Contributions
R.B.C., J.A.B., K.A.B., T.B., S.V.K., J.R.K., B.M., R.G.M., J.S., M.S., S.S., T.V., and M.A. all contributed to the writing of this perspective article. R.B.C., J.A.B., and M.A. prepared the figures and edited the manuscript. M.A. acquired the funding for the workshop that resulted in this perspective article.
Ethics declarations
Competing interests
J.S. is on the scientific advisory board of Atinary, mentioned in the article. The authors declare no additional competing interests.
Peer review
Peer review information
Nature Communications thanks the anonymous reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Canty, R.B., Bennett, J.A., Brown, K.A. et al. Science acceleration and accessibility with self-driving labs. Nat Commun 16, 3856 (2025). https://doi.org/10.1038/s41467-025-59231-1
DOI: https://doi.org/10.1038/s41467-025-59231-1