Introduction

The execution and planning of experiments have become increasingly rigorous and automated in response to the growing complexity of real-world problems. Experimental planning has gradually evolved from random to statistically driven design of experiments; meanwhile, experimental tools have advanced from simple facilitators of manual actions to highly automated platforms. As the complexity, intersectionality, and scope of challenges in energy1, medicine2,3, ecological harm reduction, and nonrenewable-resource management increase, laboratory research must again leap forward: from individualized research to massively collaborative efforts that incorporate diverse expertise and techniques. In order to bring experimentation into the hands of a broad and diverse community of scientists, however, a certain level of automation, throughput, and accessible design must be achieved.

Self-driving (also known as autonomous) laboratories (SDLs) are the result of technological efforts to automate the execution of experimental tasks to meet the demands of industry and academia, the design and selection of experiments to minimize the material and temporal costs of research, the refinement and generation of hypotheses to discover new relationships and knowledge, and the collaboration of multiple research groups to accelerate research4,5. An SDL typically comprises a suite of digital tools to make predictions, propose experiments, and update beliefs between experimental campaigns and a suite of automated hardware to carry out experiments in the physical world (Fig. 1); these two components then work jointly toward a human-defined objective (e.g., process or material property optimization, compound or property-set discovery, self-improvement, and combinations thereof). The primary differences between established high-throughput/cloud laboratories and SDLs lie in the judicious selection of experiments6,7, the adaptation of experimental methods8, and the development of workflows that can integrate the operation of multiple tools. This automation of experimental design provides the leverage for expert and literature knowledge to efficiently tackle the increasingly incomprehensible, multivariate design spaces required by modern problems9. Adaptability allows for SDLs to develop new techniques to handle new applications, expand the feasible experimental design space, and modify workflows on-the-fly to address and preclude safety and sustainability concerns10. Furthermore, the integration of tools, learning modules, and data enables SDLs to accumulate knowledge and continually improve.

Fig. 1: A schematic overview of how a self-driving laboratory (SDL) operates.

Charged with a human-defined goal, an SDL automatically performs iterations of scientific inquiry by designing and planning experiments, formulating precursors from a local material library, performing a reaction or assembly, preparing samples for characterization or subsequent reaction/assembly steps, performing characterization and analysis, contributing to the broader literature or providing its solution to the goal, learning from the results, and finally refining its design of experiments for the next iteration.
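
To make this loop concrete, the following minimal Python sketch traces one possible structure for the cycle in Fig. 1. It is a hedged illustration only: the class names, methods, and parameter values are hypothetical placeholders, not the implementation of any specific SDL.

```python
# A minimal sketch of the closed loop in Fig. 1 (all names are hypothetical).
from dataclasses import dataclass


@dataclass
class Experiment:
    parameters: dict               # e.g., precursor ratios, temperature, time
    result: float | None = None    # measured property, filled in after execution


class Planner:
    """Digital side: proposes the next experiment and updates its beliefs."""

    def __init__(self):
        self.history: list[Experiment] = []

    def propose(self) -> Experiment:
        # Placeholder for Bayesian optimization, active learning, etc.
        return Experiment(parameters={"temperature_C": 80, "ratio": 0.5})

    def update(self, experiment: Experiment) -> None:
        self.history.append(experiment)


class Robot:
    """Physical side: formulates, reacts, and characterizes a sample."""

    def run(self, experiment: Experiment) -> Experiment:
        experiment.result = 0.0    # stand-in for a real measurement
        return experiment


def campaign(iterations: int = 10) -> list[Experiment]:
    planner, robot = Planner(), Robot()
    for _ in range(iterations):        # human-defined budget/goal
        proposal = planner.propose()   # design and plan the experiment
        completed = robot.run(proposal)  # execute and characterize
        planner.update(completed)      # learn and refine the design
    return planner.history
```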

SDLs, by acting as highly capable collaborators in the research process, can serve as nexuses for collaboration and inclusion in the sciences—helping coordinate and optimize grand and intersectional research efforts and reducing the physical and technical obstacles of performing research manually. When combined with collaborators who bring broad domain expertise, this potential new paradigm for research (SDL-assisted research) could allow the scientific community to adequately address previously intractable Grand Challenges11 (such as developing economically viable solar power technologies and industrial processes, making breakthroughs in personalized health and safety, and creating new analytical devices and methods). Already, SDLs have shown promise in accelerating molecular discovery3,12, the discovery of new synthesis routes for nanoparticles13, crystallographic phase mapping14, microscopy15,16, and HPLC method development17, among other feats reviewed more thoroughly in the literature4,18,19.

Widespread access to SDLs is necessary to fully realize their promise20,21. A team science22 research paradigm involves more actors in conducting research and incorporates more diverse ideas into the formulation and execution of research problems. In discussing the democratization of research, we have chosen to focus on how many researchers can participate in the scientific method and how to make the generation and analysis of hypotheses more accessible to those researchers. Toward the democratization of research through SDLs, there is an open question as to how SDL technologies will be balanced between open-access, centralized facilities (Centralized approach)—cf. the European Organization for Nuclear Research (CERN) research facilities and the BioPacific MIP23—and networks of distributed facilities (Distributed approach)—cf. the Galaxy Zoo24 and Foldit25 projects and the Harvard Clean Energy Project26. Both the access to scientific research that SDLs provide and the communities that SDLs necessitate and foster position this research paradigm particularly well for enhancing human–machine collaboration, multi-disciplinary and data-driven research, public outreach, and science education27. Furthermore, by accelerating research, SDLs attract industry partnerships, whose economic interest in efficient research and development (R&D) can provide the support to build and improve SDL technologies. These external actors will require personnel to build and maintain SDLs—which in turn engages more people with the technology and can bring in more industry scientists.

Recent efforts on prototype SDL platforms and their associated technologies demonstrate the first steps of democratizing SDL-assisted research. Numerous systems have been released with open-source tools such as Chemspyd28, PyLabRobot29, PerQueue30, and Jubilee31 among others32,33,34,35,36,37,38,39. Access to such tools facilitates others in developing their own research platforms. Others have demonstrated collaborations between research groups40,41, across academic levels42,43, and with industry partners44 as well as tools which increase accessibility to non-computer scientists45,46,47. Moreover, additional studies have begun characterizing and benchmarking the performance of current SDLs13,42,48,49,50,51 to facilitate communal comparison and improvement. While these efforts and other SDL technology demonstrations5 inspire hope for the future of SDLs as a democratizing agent in scientific research, there are critical challenges which need to be addressed.

In this perspective, we first discuss two paradigms (centralized and distributed) by which SDLs can be made accessible to the research community and why we believe both avenues should be explored. We then discuss the current roadblocks toward achieving such a democratized future of SDL-assisted research before addressing how we might overcome these obstacles.

Balancing a centralized and distributed future

While creating automated experimental apparatus may be feasible for a general laboratory, the effort required to develop and maintain an SDL is unarguably large. Centralized facilities that allow (virtual) access by applicants concentrate efforts and personnel52,53,54,55,56,57,58 (Fig. 2, left); alternatively, open-source59,60 networks encourage peer-to-peer collaborations that can leverage specialization and modularization (Fig. 2, right).

Fig. 2: Schematic illustrating both centralized (top left) and distributed (top right) self-driving laboratories (SDLs).

Laboratories are represented with Greek letters, their requested research projects (jobs) with numbers (for simplicity, each laboratory is planning one job), and their individual capabilities with icons in gray circles. In the centralized approach, multiple laboratories submit their job requests to a single, highly capable facility, and time on various center facilities is equitably distributed between applicants throughout the day. In the distributed approach, various (temporary) groups form around capabilities, and the groups can distribute tasks amongst themselves (jobs can freely flow between collaborators), share information, and transmit instructions and materials (digital letter and airplane) as appropriate.

The categorization of SDL deployment as centralized or distributed is useful but relative. The delineation between these can vary: an SDL for every individual researcher, research group, university, etc.—extending hyperbolically to a single centralized SDL for the entire world. We have chosen to divide the distributed and centralized paradigms between research groups and shared university facilities as there is a noticeable change in the degree of customization and flexibility in addressing research challenges between these two levels.

Hybrid approaches are also feasible, and potentially preferable at this stage. Individual laboratories could utilize simplified, low-cost automation systems61 for workflow development, testing, and troubleshooting before submitting the finalized workflow to an external facility. Moreover, to tailor centralized facilities’ capabilities to the needs of specific research groups, individual laboratories could develop an instrument in accordance with facility guidelines such that the unit can “plug in” to a centralized facility62,63,64. This requires financial and logistic support for transporting and integrating bespoke equipment, but addresses the throughput concerns of an individual laboratory and the specialization concerns of a centralized facility. Finally, collaborations with national laboratories provide a unique opportunity to explore large-scale coordination and to test how various centralized and distributed technologies can be implemented. National laboratories’ intermediate scales may be optimal for the development of self-driving systems to manage both academically and industrially relevant data provenance and metadata—a challenge that transcends fields and informs how disparate industrial sectors must adapt to embrace self-driving workflows.

Centralized, distributed, and hybrid approaches seek to keep SDLs open to researchers regardless of background or financial means and ensure that an enclave of privileged facilities does not have sole access to SDLs—a configuration that would worsen funding and publication disparities between laboratories and squander the potential of an SDL-assisted research paradigm. Despite these varied approaches, the challenge remains of how best to balance priorities between advanced, communal automation technologies and networks of specialized platforms. The optimal strategy for SDL deployment must consider how the initial investments (the barriers to entry) are overcome, how logistics and legal concerns are managed, and how the workforce is prepared to engage with SDLs in industry and academia.

In terms of financing and staffing, centralized facilities may be more attractive to industry and national investors, as funding a single meta-project helps guard against splintering and wasted/redundant effort, helps maintain long-term collaboration, and provides more stability (less risk) than an individual research group65. As the costs of commercial automation units and software decline, distributed SDLs become more feasible. Smaller, designer SDLs are in turn likely to attract local businesses and universities. The proximity and flexibility of distributed SDLs can facilitate rapid collaboration on novel and cutting-edge research for a given scientific or industrial niche. Conversely, a centralized facility may have too much inertia to rapidly address changing needs or may struggle to justify providing highly specialized equipment only a handful of users ever use.

The maintenance and sustainability of the SDL ecosystem also depend on good management. Team science management, regardless of approach, requires the coordination of data and experiments in a manner that is robust, equitable, and accountable66,67. The distributed approach, with its more fluid boundaries, would require considerably more coordination, making the digital aspect (digital twins68 and datasets) more difficult to maintain69. The centralized approach raises ethical questions of how projects are selected, time and resources allocated, and experiments managed. For credit in both approaches, SDLs could be given identifiers (cf. ORCID) for data provenance and attribution in manuscripts—though the task of acknowledging the personnel behind each SDL at the time of publication remains a logistical challenge. Unlike high-performance and distributed computing (a close analogue to future SDLs), SDLs consume and produce physical material, and the rights to that material—and who may purchase it—can be constrained by funding agreements.

In democratized science, data must be generated safely, ethically, and legally, and the quality of the data must be trustworthy70. With respect to local and national regulations for hazardous materials, dangerous processes, and sensitive data, a centralized facility may have an easier time obtaining regulatory certification but must also acquire more (and higher-grade) certifications; a smaller SDL in a distributed network need only apply for the certifications it needs but may struggle to acquire all the engineering controls required to meet them. Concerning data quality, any SDL would require routine testing and quality control to maintain public trust, in addition to study-specific benchmarking and control experiments to ensure the validity of results for new or novel materials and processes. In the centralized approach, a consortium of key facilities could develop these protocols and use them as the standard; in the distributed approach, more effort would be required to create robust, future- and site-proof standards for interoperability, shareability, and reproducibility, with periodic checkups to remain a trusted member of the network of collaborators. While overall maintenance and standardization may be easier for centralized facilities, the research groups using these facilities may find the established protocols and standards limiting in what research can be conducted.

SDLs must engage with and support their collaborators: people40. Any education strategy implemented should reinforce itself to ensure the sustainability of the SDL ecosystem. In the centralized paradigm, key facilities become centers of learning and can act as educational institutions that provide intensive and fulfilling education for participants. In the distributed paradigm, more diffuse facilities can more effectively reach a geographically diverse cohort of future SDL researchers and may have a larger overall capacity—increasing the number and diversity of researchers who benefit. Furthermore, low-cost and do-it-yourself SDL technologies can serve as educational tools for burgeoning researchers (e.g., “frugal twin” platforms61, Educational ARES71,72, and Legolas73). Ultimately, SDLs are in service of people, and how users are trained in and engage with these powerful research tools must be thoughtfully considered.

In both paradigms, the goals to increase throughput and reduce cost must be balanced against the quality of the data. Experimental fidelity can greatly impact the number of experiments required to arrive at a solution50,74, and further work is required to determine the optimal tradeoffs between these design goals. Any low-cost system would need rigorous reproducibility analysis to be of value to SDL-assisted research. The relationship between setup and operating costs and data fidelity will evolve with the advancement of automated laboratory technologies and in turn modify the optimal balance between centralized and distributed SDL research.

Table 1 Summary comparison of centralized and distributed self-driving laboratory (SDL) paradigms

In summary, centralized approaches pool resources to achieve more technologically advanced SDLs but face challenges in generalizability, while distributed approaches provide flexibility but require greater coordination (Table 1). As SDL-related technologies evolve, however, many of the capital and operating expenses (for the creators, managers, and users of SDLs) will change.

Roadblocks on the path to future SDLs

SDLs have the potential to expand (or restrict, if improperly managed) who is afforded the opportunity to do research. The same positive feedback loop that stands to accelerate SDL proliferation also forestalls their widespread use (Fig. 3). Individual SDLs are large projects, and current demonstrations of SDLs are limited by hardware and software capabilities or are deployed conservatively to reduce risk. As a result, there are few exemplar systems that are industrially relevant enough to attract widespread funding, and funding is required for high-impact demonstrations. Support from outside this cycle is needed to kickstart SDL proliferation, as are efforts to improve the power, generalizability, and accessibility of both the physical and digital aspects of SDLs.

Fig. 3: The cycle of challenges forestalling the advancement of self-driving laboratory (SDL) technologies.

Incomplete integration of artificial intelligence (AI), hardware, and software prevents the creation of agile experimental systems, and the complexity of these systems renders SDLs black boxes that can struggle to balance all the requests and questions given to them. Questions and requests to different aspects of SDLs receive unclear answers, and this incomplete information creates difficulties in collaborating with other scientists and industry partners, which in turn impacts funding and support. A lack of cohesive support hinders the advancement of human–AI–robot collaboration and the improvement of SDL technologies, starting the cycle anew.

In this section, we review the major roadblocks to SDL proliferation and to the subsequent democratization of automated research. The discussion focuses first on the development of SDLs and their workforce, then moves to laboratory- and community-level challenges and opportunities for advancement.

In the workforce

The transition from conventional to self-driving research in the materials and chemical space will require developing a specialized, yet multi-faceted, workforce. The status quo of research favors collaborations between researchers in closely related fields or between members of the same research campus. Unfortunately, the applications where SDLs are most promising also demand the greatest diversity and breadth of knowledge.

The needed workforce is not, however, a monolithic body: Developers combine hardware, software, processing, and materials innovations to realize new self-driving systems; technicians maintain and tune such systems; and users interact with self-driving research systems through the digital world by selecting hypotheses, guiding learning, and analyzing data. While these may be distinct roles, individuals may move between these roles throughout the course of their careers and as their research needs evolve. Moreover, these roles will require differing levels of expertise within and between fields as well as collaboration skills. As complex, intersectional engineered systems, SDLs will need a healthy distribution of actors to be successful.

Developers of self-driving experimentation systems must combine expertise in their experimental domain with expertise in automation. They must know the subtleties and pitfalls of various laboratory and research techniques in their discipline and be skilled enough in automation to realize these techniques with programmatic control. In contrast, traditional academic departments are fairly siloed, with robotics separate from chemistry or materials science, and learning to be a developer generally requires practice and dedicated training or self-teaching to fill the educational gaps. The need for practice systems can be met with low-cost systems that have been specifically designed as pedagogical tools73,75,76,77. The development and dissemination of such systems is an important part of efficiently training SDL developers.

The challenges associated with maintaining SDLs mirror the challenges in developing them. Technicians may be required to monitor processing signals to ensure smooth operation, enhance performance through collaboration78, intervene to repair or recalibrate the system when needed, and restock supplies. Technicians require less depth of expertise in the underlying science and robotics optimization than developers. As such, individuals with some experience with physical experiments (whether vocational or through undergraduate training) and instrument-specific training will be able to perform this role. Here, micro-certifications and online training are appropriate for delivering the specific knowledge needed to operate the relevant systems. Both the academy and industry should be involved in the development of these resources.

A user’s two main responsibilities are to (1) formulate good scientific hypotheses for the SDLs to explore and (2) (virtually) oversee the learning process to make adjustments as needed78. While the former can be thought of as a goal of doctoral education (implying most first-generation users will be domain experts), this domain expertise, in contrast to a developer’s, need not include automation. For proficiency in the latter task, users will have to practice overseeing an SDL—requiring easy-to-use tools that provide this experience and pedagogical resources that define the types of situations that can arise (and how to resolve them)79.

While collaboration between industry and academia is necessary for the advancement of SDL technologies, there is tension between what each group seeks to gain from that advancement (cf. Fig. 3, where each circle corresponds roughly to a different role). A user seeking to apply the SDL to solve a scientific problem may desire a more vertically integrated SDL technology, which packages hardware control, data management, and experimental planning together (such as Atinary or a lab-as-a-service provider52,80,81). A developer seeking to advance laboratory capabilities may desire the ability to combine, adapt, and create new modules—favoring horizontal integration. And, presently, there is a large overlap between technicians and both the developer and user roles—biasing the user experience toward expert knowledge and permitting band-aid solutions in design. A party more interested in workforce development, education, or mitigating job displacement82,83 may desire in-house development and implementation as a form of training. We encourage the community to thoughtfully consider these cases and to state explicitly in their publications which is their focus. In this way, the technology (theoretical and demonstrated) and the workforce can advance more equitably.

Successful efforts in a collaborative SDL will require team science: communicating with diverse researchers and people who may not share research experience, vocabulary, or a knowledge base. Large groups will need people skilled in team science who can facilitate the development and success of the team—a role academia does not currently prepare, though some programs, such as the “Growing convergence research” program at the National Science Foundation (NSF)84, are pushing this area. A consequence of this lack of a teaching culture is that teamwork is most often learned “on the job”. While collaboration is a lifelong learning objective, it has yet to be determined how team science should best be integrated into educational programs. Playing to the strengths of a diverse community, we need a multipronged strategy to address the training of academic and industrial researchers. Universities should engage with industry partners to integrate team and automated sciences into the curriculum—ideally these programs should overlap with the micro-credential programs for upskilling existing workers.

In the lab

The integration and automation of advanced data-science strategies for the analysis of results and the proposition of new experiments is a key feature which separates SDLs from prior work in high-throughput experimentation. Despite efforts to democratize research automation technologies, their transferability in practice can be limited by the tension between generality (being applicable to any hardware system) and specificity (taking full advantage of the nuances and capabilities of a given instrument)85. Similarly, machine learning (ML) for materials and molecular discovery struggles to be universal and is currently used primarily to simulate or plan experiments and analyze data.

Laboratory automation is a powerful tool that bridges the digital and physical worlds of SDLs, enabling active learning. Over the past few years, proof-of-concept automated setups have been demonstrated for many upstream86,87 and downstream42,88,89,90,91 steps in the traditional materials science research workflow, as well as more holistic SDLs3,52,55. The integration of disparate components into a single, usable platform, even in a modular fashion, requires establishing engineering controls, networking the systems together, defining data-collection and management structures, and providing human access points for troubleshooting and collaboration.

Developing automation tools and integrating them into SDLs relies heavily on access to application programming interfaces (APIs) and the documentation of both these APIs and the tools’ capabilities92. Few APIs are readily provided or supported by manufacturers, and many are poorly documented or come with restrictive licensing agreements—resulting in individual groups producing redundant, rarely universal, solutions. In addition, many vendor-provided APIs transmit data in opaque formats—curtailing active learning. For commercial equipment, preferential purchasing of solutions that provide native programmatic control with high-quality APIs and documentation (as well as vendor technical support and interoperable data) is a good start. Due to the size of the research market, however, vendors may not foresee sufficient economic returns in recapitalizing their product lines, especially when contrasted with closed-ecosystem approaches that are potentially more profitable. As fields newly adopting automation, chemical and materials research imposes heavy demands on system robustness (e.g., temperature, pressure, and chemical compatibility) and specialization (e.g., flexibility and cutting-edge applications); meanwhile, existing automation suppliers, often originating in the biological domain, tend to focus on general-purpose use, high throughput, and industrial-scale safety. It is imperative, then, that researchers engage with industry R&D departments to co-develop prototype automation tools—even if these collaborative efforts cannot be promptly released—as having a seat at the table enables gradual improvement of SDL integrations.
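
As an illustration of what native programmatic control enables, the sketch below wraps a hypothetical instrument that speaks only a plain serial text protocol behind a small, documented Python class. The device name and command strings are invented, and pyserial is assumed as the transport layer; this is a sketch of the pattern, not any vendor's actual API.

```python
# A hedged sketch of a thin, documented driver for a hypothetical pump that
# exposes only a serial text protocol. Commands ("DISP", "STAT?") are invented.
import serial  # pyserial, assumed available


class HypotheticalPump:
    """Thin driver exposing typed, documented programmatic control."""

    def __init__(self, port: str, baudrate: int = 9600, timeout: float = 2.0):
        self._conn = serial.Serial(port, baudrate, timeout=timeout)

    def _query(self, command: str) -> str:
        """Send one command line and return the instrument's reply."""
        self._conn.write((command + "\r\n").encode("ascii"))
        return self._conn.readline().decode("ascii").strip()

    def dispense(self, volume_ul: float, rate_ul_s: float) -> str:
        """Dispense `volume_ul` microliters at `rate_ul_s` microliters/second."""
        return self._query(f"DISP {volume_ul:.1f} {rate_ul_s:.1f}")

    def status(self) -> str:
        """Return the raw status string for logging and provenance."""
        return self._query("STAT?")

    def close(self) -> None:
        self._conn.close()
```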

The technologies the SDL community itself creates must consider accessibility to the broader research community, as open-source solutions are less likely to come from industry: open-source systems that are interchangeable between vendors undercut vendors’ competitive advantage. Academic29, commercial64, and governmental93,94 research efforts into providing open-source, high-quality APIs and SDL development tools show promise in making SDLs more accessible and interoperable. Groups currently developing SDLs should make an effort to reuse (or consult) as much code and data from prior studies as possible, even if it must be modified, and should publicly release their code upon publication. While it is often beyond the scope of a single laboratory to develop universal code or experimental techniques, taking the time to analyze code and data-reporting decisions can act as an additional form of dialogue between SDL developers. This encourages community involvement, reduces redundant effort, brings people into dialogue, and can help work out the bugs in our standardization efforts. Unlike the popular ML development libraries maintained by the technology industry, the long-term maintenance95 and cybersecurity of these code libraries—as data formats and software evolve, best practices and standards change, and programming-language preferences shift—are incompatible with the current funding paradigm96 (see section “In the community”).

Given the effort required to “glue” each hardware and software module of an SDL together, there have been efforts to automate or assist developers as they construct new SDLs. While powerful middleware, protocol, and orchestration tools that aim to address these interoperability issues have been developed (such as ROS, SiLA 2, and BlueSky64), their adoption is limited to groups developing the most complex SDLs, pointing to a broader issue of fragmented technology ecosystems between laboratories. Universal (highly abstract) middleware suites have learning curves that can challenge less experienced groups and can be difficult to quickly deploy for a specific application. Conversely, for the larger projects of more experienced groups, version management and overhead can become a concern when using these frameworks97 and may motivate a group to create its own middleware suite. Consequently, progress by less experienced groups is limited by their lack of awareness or expertise in the technology, and progress by more experienced groups is not easily transferable to other groups. In an attempt to address the inaccessibility of designing or using such middleware suites, several groups have turned toward using pre-trained large language models (LLMs) to generate this “glue” code automatically from API documentation98. The unpredictability of existing LLMs requires that special attention be paid to engineering operational safeguards99. For the near term, achieving genuine planning with LLMs requires considerable supplementation with more traditional, logic-based verification workflows100.
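
A minimal sketch of the “glue” idea follows: a shared interface that heterogeneous modules can be adapted to, so an orchestrator need not know each device's native API. The class names, task fields, and fake vendor driver are illustrative only and are not drawn from ROS, SiLA 2, or BlueSky.

```python
# A hedged sketch of middleware "glue": adapters expose a common contract.
from abc import ABC, abstractmethod
from typing import Any


class Module(ABC):
    """Common contract every adapted device exposes to the orchestrator."""

    @abstractmethod
    def execute(self, task: dict[str, Any]) -> dict[str, Any]:
        """Run one task and return results plus metadata."""


class FakeVendorHeater:
    """Stand-in for a vendor's native driver object (illustrative only)."""

    def set_temperature(self, celsius: float) -> None:
        print(f"heater set to {celsius} C")


class HeaterAdapter(Module):
    """Wraps the vendor-specific object behind the shared interface."""

    def __init__(self, native_driver: FakeVendorHeater):
        self._driver = native_driver

    def execute(self, task: dict[str, Any]) -> dict[str, Any]:
        self._driver.set_temperature(task["setpoint_C"])
        return {"status": "ok", "setpoint_C": task["setpoint_C"]}


def run_workflow(steps: list[tuple[Module, dict]]) -> list[dict]:
    """Orchestrator: executes each (module, task) pair and collects metadata."""
    return [module.execute(task) for module, task in steps]


steps = [(HeaterAdapter(FakeVendorHeater()), {"setpoint_C": 80.0})]
print(run_workflow(steps))
```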

Machine learning is crucial to realizing efficient experimental design and driving active learning in SDLs. While ML for SDLs shares the challenges of multi-objective and multi-property optimization101,102,103 that exist throughout data science, its coupling to physical platforms and the nontrivial input–output spaces of chemical and materials science presents challenging opportunities to advance artificial scientific intelligence.

In practice, experiments are subject to constraints, be they inherent to the physical system or imposed by humans for safety. Constraints are crucial for knowledge transfer between collaborators and for evaluating the transferability of a model between physical systems. Constraints can be encoded by using prior knowledge104 (which may introduce bias into the system) or can be learned during experimentation105 (which can necessitate more data, increasing cost). The relative nature of constraints, however, can hinder generalizability and predictive power—as is notoriously the case with predicting synthesizability106,107,108,109. Similarly, the objectives by which experiments are evaluated need to be quantifiable and measurable within the active learning cycle. For example, future-seeking objectives such as “optimize for sustainability” need to be translated into short-term measurables, each of which comes with its own set of questions (e.g., metric/assay choice, comparison techniques, economic contexts, etc.). Presently, neither human- nor ML-generated decompositions of these objectives seem to suffice, and new ways of combining and translating objectives are required to address complex or unclear connections between objectives (low- and high-level, immediate and far-off) so that collaborators can make targeted progress toward their diverse goals110,111,112. Recent efforts to address the measurement component of this problem113 include the use of proxies92,114 (i.e., the estimation of inaccessible properties of interest by using correlations to more accessible measurables—including the estimation of device-level properties via inexpensive replicas of real-world devices). While proxy measurements increase the scope of research problems an SDL can investigate (and can often reduce costs), they can struggle in extrapolative campaigns (e.g., material discovery) and so require that the correlations be continually maintained with new data115.
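
As one hedged illustration of encoding a prior-knowledge constraint in experiment selection, the sketch below filters infeasible candidates (here an invented temperature limit) before scoring the remainder with an upper-confidence-bound acquisition on a Gaussian-process surrogate. The data are toy values, the constraint is hypothetical, and scikit-learn is assumed.

```python
# A minimal sketch of constrained experiment selection with a GP surrogate.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor


def select_next(X_obs, y_obs, candidates, kappa=2.0):
    """Return the feasible candidate with the highest UCB acquisition value."""
    # Prior-knowledge constraint (e.g., a safety limit on temperature);
    # a learned constraint would instead be modeled from data.
    feasible = candidates[candidates[:, 0] <= 150.0]  # column 0: temperature (C)
    gp = GaussianProcessRegressor(normalize_y=True).fit(X_obs, y_obs)
    mean, std = gp.predict(feasible, return_std=True)
    ucb = mean + kappa * std                          # explore/exploit tradeoff
    return feasible[np.argmax(ucb)]


# Toy observations: (temperature_C, concentration) -> yield.
X_obs = np.array([[60.0, 0.1], [90.0, 0.2], [120.0, 0.3]])
y_obs = np.array([0.35, 0.55, 0.48])
candidates = np.random.default_rng(0).uniform([40, 0.05], [200, 0.5], size=(100, 2))
print(select_next(X_obs, y_obs, candidates))
```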

A final, foundational ML challenge for SDLs is understanding and quantifying uncertainties. Uncertainty provides a crucial touchstone for human collaborators as a rough estimate of how confident116,117 the ML algorithms are and is used to determine which experiments are proposed. Unlike many data science applications where the uncertainties of observations must be estimated, SDLs present an opportunity to directly measure and transform uncertainties between platforms. Current SDL hardware fails to capture most experimental metadata, and despite manufacturer testing, experimental uncertainties must be measured for the particular chemical or material system being investigated. An SDL could use such information to learn about laboratory praxis and improve its own workflows for materials discovery and process optimization. Less speculatively, uncertainty, calibration, and benchmark studies are required for human researchers and collaborating SDLs to determine their trust in SDL-generated results50. Despite how crucial such studies are, the current suite of tests and the integration of these controls into SDL workflows are lacking.
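
One routine calibration check of the kind mentioned above is sketched below: comparing a model's predicted uncertainty against held-out replicate measurements by computing the empirical coverage of its nominal 95% prediction intervals. The numbers are invented for illustration; a well-calibrated model should cover roughly 95% of held-out points.

```python
# A hedged sketch of an uncertainty-calibration check via interval coverage.
import numpy as np


def interval_coverage(y_true, y_pred, y_std, z=1.96):
    """Fraction of observations falling inside mean +/- z * predicted std."""
    lower, upper = y_pred - z * y_std, y_pred + z * y_std
    return float(np.mean((y_true >= lower) & (y_true <= upper)))


# Toy numbers for illustration only.
y_true = np.array([0.52, 0.48, 0.61, 0.40])
y_pred = np.array([0.50, 0.50, 0.58, 0.45])
y_std = np.array([0.03, 0.03, 0.04, 0.02])
print(interval_coverage(y_true, y_pred, y_std))  # 0.75 here -> overconfident model
```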

In the community

Industrial adoption and implementation of SDL technologies are currently limited to a few specific areas, such as biotech, biopharma, and specialty chemicals/materials, where companies have applied SDLs in their discovery pipelines118,119,120,121,122,123,124,125. While traditional funding bodies are typically limited in the budget and scope of the SDL research initiatives they can support (making it difficult to fund the ambitious, large-scale projects needed for breakthrough demonstrations), industry participation can be instrumental in the development of large-scale SDL initiatives. While examples have recently emerged in related fields, further de-risking of the technology is needed for broad market penetration.

Thorough cost-benefit and other techno-economic analyses are typically required to illustrate the potential savings in time and resources that SDLs offer over traditional R&D processes and will help collaborators choose the best tool for the challenge at hand. Mature approaches are often viewed as safer investments for addressing new, complex problems, and even as SDL technologies advance, the question of whether to (or the temptation to) use brute-force, high-throughput experimentation will remain, as advances in one often apply to the other. While transparent analyses of SDLs may help partners overcome their own barriers (viability, security, IP, and competitiveness), external factors such as legislation over whether compounds or processes discovered by SDLs are patentable hang over industry support.

Data and records of prior work are essential to science, and SDL-assisted research is no different. Data management and modeling must be flexible and interoperable and must provide representations of experiments and results126. Whereas the motivations and challenges of FAIR scientific data have been discussed elsewhere127, attempts have been made to organize experimental information in general-purpose relational schemas128 and in semantic knowledge graphs38,129, alongside efforts to provide provenance tracking130—features which, once mature, will be indispensable for combating dubious data and for general data management in a distributed paradigm131. While epistemological and ontological frameworks are in development38,132 and can facilitate collaboration across linguistic, domain, and cultural barriers, they represent yet another complex system that must be integrated into an SDL; as such, the coming generation of these technologies must seek to be easy to deploy and integrate—potentially requiring a standardized interface.
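
To ground the knowledge-graph idea, the sketch below records one experiment as subject-predicate-object triples and shows a trivial traversal of the kind a provenance query would perform. The vocabulary and identifiers are invented for illustration and do not follow any existing ontology.

```python
# A hedged sketch of experiment provenance as subject-predicate-object triples.
triples = [
    ("experiment:0042", "performedBy", "sdl:alpha-01"),
    ("experiment:0042", "usedPrecursor", "material:ZnO-nanoparticle"),
    ("experiment:0042", "hasCondition", "condition:anneal-350C"),
    ("experiment:0042", "produced", "sample:0042-a"),
    ("sample:0042-a", "hasProperty", "measurement:band-gap-1.62eV"),
    ("measurement:band-gap-1.62eV", "measuredWith", "instrument:uvvis-07"),
]


def neighbors(graph, subject):
    """Outgoing edges of a node: the basic step in a provenance query."""
    return [(p, o) for s, p, o in graph if s == subject]


print(neighbors(triples, "experiment:0042"))
```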

As a consequence of the volume of data produced and the (mostly ad hoc) automation of experimental planning, execution, and analysis, there have been concerns about whether the data generated by an SDL can be trusted133,134. While ad hoc solutions are indispensable during the early stages of a technology’s development, they often result in redundant efforts29,135,136, increased setup times, and suboptimal outcomes, and they introduce the potential for unreliable experiments. For distributed SDLs, the completeness and thoroughness of platform and results characterization (meta-characterization)50,74,137 must be studied such that collaborators can properly assess literature data and train future users and technicians. Data must encompass not only experimental results (inputs and outputs) but also state and environmental information (e.g., temperature, relative humidity, and any metadata required to reproduce the results of the ML models—measurables often overlooked by commercial units)70,138,139. With complete details of an experiment, it may become possible to learn the mapping between different hardware architectures (e.g., batch vs. flow) and further the transferability and interoperability of SDL technologies. Until either a critical mass of SDLs is online or the self-reporting of platform metrics is sufficient to assuage investment risks, a lack of trust will continue to stymie the growth of SDL technologies.
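
A minimal sketch of what such a record might look like in practice is given below: a data class that keeps environmental and model metadata alongside inputs and outputs so that results remain reproducible and models can be retrained later. All field names and values are illustrative, not a proposed standard schema.

```python
# A hedged sketch of an experiment record carrying environmental metadata.
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class ExperimentRecord:
    platform_id: str   # which SDL produced the data
    inputs: dict       # e.g., composition, process settings
    outputs: dict      # e.g., measured properties
    environment: dict  # e.g., lab temperature, relative humidity
    model_metadata: dict = field(default_factory=dict)  # e.g., model version, seed
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2)


record = ExperimentRecord(
    platform_id="sdl-alpha-01",
    inputs={"precursor_A_mM": 10.0, "anneal_C": 350},
    outputs={"band_gap_eV": 1.62},
    environment={"lab_temperature_C": 21.3, "relative_humidity_pct": 38},
    model_metadata={"surrogate": "GP-v0.3", "random_seed": 7},
)
print(record.to_json())
```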

Many extant SDLs, however, are beholden to their externally funded project’s goals (particularly in academia) and either do not have the bandwidth for external validation studies or must conceal crucial aspects of their workflows to protect their sponsors’ IP (curtailing the cross-validation of their results by other SDLs). Currently, most laboratories can only report calibration, benchmarking, and self-validation data to garner support—with only a fraction of self-driving technologies actually incorporating repeatability or reproducibility metrics. Moreover, there are no standards for performing or enforcing such studies, and incentives are non-existent. Different fields and applications have different ideas of what is considered standard for experimental verification; and for multidisciplinary and multi-application SDLs, these discrepancies make defining a singular standard difficult.

Although there is pressure to present research in the most promising light while still being fair and honest, supplemental information needs to better characterize the theory and reality of platforms in operation50 (e.g., their performance in well-defined tasks, devices, capabilities, and interfaces) in order to avoid overpromising and backlash. The definition of (partial) success must be considered and discussed when reporting any metrics about SDL performance with respect to its objectives—e.g., for a “discovery” campaign, is there success in gaining any insights, or only when a new compound is created? How do the properties of interest affect the degree of success? And when does the failure of a chemistry count as a failure of the automation? SDL reports should include calibrations, standards, and comparative or benchmarking studies performed so that others can better build off of the reported results. In addition, metrics such as overall operational performance, the degree of human involvement with the workflow, resources used and wastes generated, as well as an open discussion of areas where the platform could be reasonably improved should be reported (cf. ref. 140, ref. 141, and supplemental materials of ref. 3).

These demands represent additional work in the present; however, it is important to establish rich descriptions of SDLs as the precedent for future work. This rigor will help to build and reinforce trust within and, perhaps more crucially, beyond the SDL community. Similarly, by having code and data available (when legal to do so), the communal development of SDLs can be accelerated and consensus can be achieved for best practices and a shared understanding of SDL language developed. Additionally, rigorous reporting on SDLs, especially their shortcomings, establishes norms that not every published SDL needs to be flawless (an admission that helps the groups behind less “spectacular” SDLs to enter the conversation) and facilitates the identification of opportunities for improvement (inviting collaboration for perpetual improvement).

The current structures for funding, publication, and maintenance are insufficient for SDLs to act as democratizing agents for scientific research96. Research funds are mostly allocated toward scientific results, rather than the infrastructure and development needed for SDLs—resulting in bootstrapping and leveraging of funds to build SDLs. The singular focus on the scientific results of interest and publishability can result in overlapping technologies tailored to (even pigeonholed in) a specific application. Instead, proposals should be tailored to drive the diversification of collaborators. Diverse expertise supplies SDLs with the breadth of knowledge required to build them and helps to foster and maintain collaborative relationships within the scientific and industrial communities so that future SDLs can thrive in a democratic and collaborative ecosystem. These cross-disciplinary collaborations will also facilitate the training of individuals in scientific communication and team science. Even when projects are simple, the inclusion of non-experts in SDL-related discussions helps prevent the gradual rise of the SDL skill floor that often occurs when only experts are allowed to participate in discussions—outside (even naive) opinions help to challenge assumptions and bring in new ideas. Therefore, it will be important that investments be made in SDL infrastructure (both human and machine), with the understanding and expectation that these new tools will lead to new and impactful scientific advances.

Conclusions

In our collective advancement toward a more democratized future of SDL-assisted research, we must focus our efforts. If the role of SDLs is to enable more scientists to participate in research and to open the act of research to more diverse and collaborative ideas, then the efficiency, accessibility, and interoperability of autonomous technologies must be improved. This will only be possible by getting a head start on team science and incorporating academic, industrial, and national input—building standards and protocols, and opening access to our tools and data. Both Centralized and Distributed approaches will require a consortium to outline living standards for automation, software, and data interfaces; Centralized-leaning technologies will require the initial investment; and Distributed-leaning technologies will require a confluence of low-cost assays and modules with interfaces tailored toward the layperson. Advancing these thrusts is mutually beneficial for making both SDLs and research more accessible, and the breadth of their coverage will enable SDLs to assist research in addressing problems of industrial and societal impact. The ultimate goal of this democratization is for the increase in participation and ideas to germinate into better and more creative solutions that could not be envisioned by traditional, insular approaches to research.

While advancements to improve individual components must be made, the bottlenecks of SDL-assisted research identified in this perspective article hint that additional work is needed in the prediction and management of scopes and throughputs when an SDL is being designed114,142. Such analysis can also help identify which bottlenecks are the result of capital expenditure and which are the result of fundamental limitations of the technology or technique. The latter invites innovation and can better illuminate the cross-SDL benefit of addressing these limitations—catalyzing the development of new SDLs. In this spirit, some effort should be made toward automating the act of automation by creating SDL “installation wizards” that can help in selecting equipment and developing variations of traditional workflows to make the most of the resources available.

An institute for automated laboratory infrastructure could better focus the SDL community’s efforts and provide partners with a meta-project with which to engage and interface. A consortium for SDLs would more readily sustain long-term funding to develop and maintain SDL software infrastructure (as opposed to specific hypothesis-driven research projects) as well as provide pre-competitive, non-proprietary support for academic and industrial researchers. Such a consortium could, as an external entity, attract long-term staff who could cultivate a set of best practices and engage in developing educational materials (online tutorials, in-person workshops) to train users. By focusing on core SDL issues, the tools developed as part of the consortium would serve both centralized and distributed SDL paradigms.

SDLs provide a means by which to further democratize research. While there are outstanding issues with the technology, they are surmountable; and while the approaches to address these issues vary depending on how centralized or distributed SDLs are implemented, the future of SDLs will encompass both. Addressing these challenges of automation, modeling, data management, collaboration, and training from both angles will help to bring about a future of inclusive and accessible research that is flexible, robust against changing paradigms, and better suited to address the world's ever more complex problems.