npm i @rdkit/rdkit, future directions
Preparing slides for RDKit UGM 2023 got me thinking more seriously about this project's future directions. While not many hands went up when I asked the audience who had used the package in the past, the number of web/JS talks and the hackathon interest mentioning rdkit-js (not to mention npm stats, issues created, and feature requests made on GH) made it clear that there is demand for using RDKit on the web, and that rdkit-js is usually the first thing people try to make this a reality.
The following Epic lays out what I think are the anticipated next steps for the project across different themes, which aim to improve various facets of the project: usability, usefulness/functionalities, robustness, and good DX. This issue (or related GH issues that will be logged soon) may evolve, but here are the current themes in no particular order:
- CI
- Initialization
- API documentation strategy
- Minimallib-level functionalities
- Higher-level functionalities, and components
- Local development setup
- Deprecation of open source contributed examples
CI
Context: rdkit-js is, at its core, a WebAssembly module (called the MinimalLib) that exposes functionality of the main RDKit project. Currently, when changes land in the main RDKit project, there is no guarantee that they won't break a downstream build of the rdkit-js library. Solution:
- Add an Azure Pipelines CI job to the GH RDKit/RDKit project that builds the rdkit-js library for each change made in the main RDKit package.
Initialization
Context: Many users have complained that the library is hard to use in certain contexts. Namely, if you work in a modern development environment such as Create React App or similar, you cannot simply import rdkit from "@rdkit/rdkit"; and then use it. The reasons are multi-faceted: 1) the minimallib is not built for >= ES6, 2) some environments enforce "use strict", and the minimallib isn't currently compiled with the "use strict" option, 3) since the library is wasm-based, you need to think about where to publicly serve the module for things to work, and because many users have no web background or experience managing assets in a web context, this can be a barrier to entry or a reason to simply abandon the package. Lastly, users have reported wanting a way to load the library offline, since currently the library is fetched on each page load (or, if you're not careful, every time you call initRDKitModule()). Solutions:
- Experiment with building the wasm module with the emscripten es6, strict, and environment options to see if it solves the import problem in modern environments (and doesn't break loading the lib in environments where it currently works). If this doesn't solve the problem, investigate compiling an initRDKitModule for each target environment vs a generic one for all environments.
- Based on findings, potentially expose more than one wasm module people can choose from depending on their use case. This also ties in with potentially having different builds for different use cases (depiction only vs "fullstack" rdkit-js).
- Add a wrapper on the low-level initializer using good defaults, making it so that users don't need to think about serving assets properly (the WASM module and JS file) before starting to use the library.
- Use IndexedDB as a caching mechanism for the wasm module, so that subsequent page loads read the module from disk and not from the network. The version of the cached module should be taken into account by the caching mechanism.
- Rewrite the main README to showcase the way(s) users can start using the library in modern web environments.
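The wrapper and the version-aware cache described above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the project's implementation: memoizeInit and wasmCacheKey are hypothetical names, the version string and file name are placeholders, and a fake loader stands in for initRDKitModule so the snippet runs without the wasm module. In a real wrapper the loader would call initRDKitModule, and the key would index an IndexedDB entry.

```typescript
// Hypothetical helper names; only the shape of the idea matters here.
type Loader<T> = () => Promise<T>;

/** Wraps a loader so the underlying work runs at most once per page. */
function memoizeInit<T>(loader: Loader<T>): Loader<T> {
  let pending: Promise<T> | undefined;
  return () => {
    if (pending === undefined) {
      pending = loader().catch((err: unknown) => {
        pending = undefined; // forget the failure so a later call can retry
        throw err;
      });
    }
    return pending;
  };
}

/** Builds a cache key that changes whenever the library version changes. */
function wasmCacheKey(version: string, fileName: string): string {
  return `rdkit-wasm:${version}:${fileName}`;
}

// Stand-in for the real initializer (assumption) so the sketch runs anywhere.
let loadCount = 0;
const getRDKit = memoizeInit(async () => {
  loadCount += 1;
  return { name: "fake RDKit module" };
});

// Both calls share one promise, so the loader body runs a single time.
const first = getRDKit();
const second = getRDKit();
console.log(first === second, loadCount);

// Placeholder version and file name, for illustration only.
console.log(wasmCacheKey("0.0.0-example", "RDKit_minimal.wasm"));
```

Putting the version in the key means an upgraded library never reads a stale module, at the cost of leaving old entries behind until something evicts them.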
Minimallib-level functionalities
Context: There are currently functionalities that are either 1) compiled but not exposed in the docs, 2) not compiled but already available as part of the minimallib build, or 3) not compiled and not available, yet user-requested. Solutions:
- Add documentation for existing functionalities not exposed in the docs: Reaction API.
- Compile functionalities already part of the build as part of the rdkit-js offering: MCS API.
- Work toward gradually adding user-requested features: R-group decomposition, similarity search, getters/setters for atom and bond properties (currently not exposed), an API for drawing "on top" of the 2D depiction images (ref "draw polygons" #295 and "Any quick feedback on rdkit-js? Add it here! And let the community upvote!" #150 (comment)), multi-color substructure highlights, lasso highlight, etc.
Higher-level components
Context: Once users are able to "install, then use" in various contexts, the foundation is laid for easy-to-use higher-level components, both in pure JS and in modern frameworks. These components should also abstract away some important performance considerations to keep in mind when leveraging a wasm module and performing a large number of manipulations on molecules. Solutions:
- Move the rdkit-structure-renderer (or its adaptation) inside the rdkit-js GH repository, which already abstracts performance considerations. Add tests.
- Implement a MoleculeStructure React component leveraging the rdkit-structure-renderer, with tests.
- Move the rdkit-js GH repo structure to a multi-npm-package, monorepo architecture: @rdkit/rdkit (core functionalities), @rdkit/renderer (@ptosco's adaptation of his rdkit-structure-renderer), and @rdkit/react (proof of concept of building and maintaining framework-specific components).
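One performance consideration that such components would presumably hide is memory management: objects handed out by an embind-based wasm module live on the wasm heap and have to be released explicitly with delete(). The sketch below is a minimal illustration of that idea, not the rdkit-structure-renderer's actual approach. withResource and the FakeMol stand-in are hypothetical names, and the fake is used only so the snippet runs without the wasm module.

```typescript
// Anything allocated on the wasm heap that must be freed by hand.
interface Deletable {
  delete(): void;
}

/**
 * Creates a resource, passes it to `use`, and always releases it afterwards.
 * Returns null when creation fails (assumed here to be signalled by null).
 */
function withResource<T extends Deletable, R>(
  create: () => T | null,
  use: (resource: T) => R,
): R | null {
  const resource = create();
  if (resource === null) {
    return null;
  }
  try {
    return use(resource);
  } finally {
    resource.delete(); // runs even if `use` throws
  }
}

// Hypothetical stand-in for a molecule object returned by the wasm module.
class FakeMol implements Deletable {
  static live = 0; // number of instances not yet deleted
  smiles: string;
  constructor(smiles: string) {
    this.smiles = smiles;
    FakeMol.live += 1;
  }
  describe(): string {
    return `molecule ${this.smiles}`;
  }
  delete(): void {
    FakeMol.live -= 1;
  }
}

const description = withResource(
  () => new FakeMol("CCO"),
  (mol) => mol.describe(),
);
console.log(description, FakeMol.live);
```

A React component could call a helper like this inside its render path so that callers never see the delete() requirement at all.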
API documentation strategy
Context: Examples-only documentation (current) is insufficient. Manually maintained API documentation (current) is error-prone. We must move towards API documentation that is as automated as possible. Solutions:
- Generate the index.d.ts file for the API during the wasm module build (possible via an emcc flag --embind-emit-tsd <name of .d.ts file>). Since this file is complete, but pretty raw, add a proper description of each type manually on the first generation, and make it the official types for the library. Then, commit the initial/raw index.d.ts to be able to maintain a diff of the API changes before each release; edit the client-facing index.d.ts according to the changes before a release. Finally, deploy the API documentation via typedoc (already implemented).
- Include TS definitions for API functionalities sitting on top of the core minimallib API (initializer, components, etc); this may simply mean implementing these functionalities in TypeScript directly.
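To make the "raw vs client-facing" distinction concrete, here is a hedged sketch of what a hand-annotated entry might look like. The interface name and the three methods are meant to mirror the existing molecule API, but the exact output of --embind-emit-tsd is an assumption here, as are the comment texts. The stub object exists only so the snippet type-checks and runs on its own.

```typescript
// Raw generated declarations would presumably look like this, without comments
// (assumed shape, not copied from an actual build):
//   interface JSMol { get_smiles(): string; get_svg(): string; delete(): void; }

/** A molecule held by the wasm module. Must be released with `delete()`. */
interface JSMol {
  /** Returns a SMILES string for this molecule. */
  get_smiles(): string;
  /** Returns an SVG depiction of the molecule as a string. */
  get_svg(): string;
  /** Frees the wasm-side memory backing this molecule. */
  delete(): void;
}

// Stub used only to show that code written against the types compiles.
const stubMol: JSMol = {
  get_smiles: () => "CCO",
  get_svg: () => "<svg></svg>",
  delete: () => undefined,
};

console.log(stubMol.get_smiles());
```

Because typedoc reads these doc comments, the annotated file would serve as both the published types and the source of the API reference.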
Local development of rdkit-js
Context: The above changes will add some complexity for the development of the library. Strategies to overcome it will be laid out once progress is made on segments of this epic. For example, to build higher-level JS functionalities on top of the minimallib, you need to build the right version of the minimallib locally first, and the local tooling to do that easily is not currently in place.
Deprecation of open source contributed examples
Context: Issues regarding the implementation of examples have steered the rdkit-js efforts in a direction that does not improve the library itself, and making them mainstream in the library introduces maintenance overhead. Future work and GitHub issues should encourage contributions that will either 1) make the library easier to use, 2) add functionality to the library, or 3) improve existing functionality of the library. Solutions:
- Warn contributors that the examples other than rdkit-js and rdkit-react will be removed from the GH repository, and that contributors will have X days to host the examples elsewhere. The rdkit-js project will maintain a list of these initiatives in the README of the project, but won't host/deploy the example code anymore.
- Remove the examples and adapt the README accordingly.
- For the remaining examples, add examples for minimallib functionalities not currently covered: MCS and Reaction APIs.
Relevant links
- Building Emscripten projects https://emscripten.org/docs/compiling/Building-Projects.html#emscripten-linker-output-files
- Emscripten (mostly) link time options https://github.com/emscripten-core/emscripten/blob/main/src/settings.js
- Emscripten compile time options https://emscripten.org/docs/tools_reference/emcc.html#emcc-compiler-optimization-options
- Discussion forum https://github.com/rdkit/rdkit-js/discussions
- GH Issues https://github.com/rdkit/rdkit-js/issues
- Minimallib source code https://github.com/rdkit/rdkit/tree/master/Code/MinimalLib