
RFC: poetry_install rule #340

Open · alexeagle opened this issue Jul 14, 2020 · 5 comments

@alexeagle (Collaborator) commented Jul 14, 2020

I've written a poetry_install rule and would like to upstream it.

High-level design

From my reading of https://github.com/soniaai/rules_poetry it runs pip to download and install individual wheels, and I don't understand why you would do it that way.

My proposal is that in a repository rule,

  • figure out the path to the python interpreter
  • http_archive to fetch a poetry distro like https://github.com/python-poetry/poetry/releases/download/1.0.9/poetry-1.0.9-darwin.tar.gz
  • stamp out a 10-line launcher similar to https://github.com/python-poetry/poetry/blob/1.0.9/get-poetry.py#L200-L218 so we don't have to "install" poetry, merely invoke it
  • create the new repository directory and copy/symlink pyproject.toml and poetry.lock there
  • simply run `python poetry_bin.py install --no-interaction --no-ansi` in that working directory, with `POETRY_VIRTUALENVS_IN_PROJECT=true` in the env
  • now you get `[output_base]/external/some_dir_deps/.venv/.../site-packages` with all the stuff poetry gave you
  • run a generate_build_files.py to populate the directory with py_library targets. This program always runs after poetry, so it can look at whatever it likes on disk to mirror the dependency graph and sources into BUILD files (see the sketch after this list)
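To make the shape concrete, here is a minimal sketch of the repository rule. It is not the actual prototype from the gist; the attribute names, the hard-coded poetry URL, the launcher body, and the `_generate_build_files` helper are all assumptions for illustration:

def _poetry_install_impl(rctx):
    # Path to the interpreter from the attribute (falling back to the
    # registered Python toolchain is elided here).
    python = rctx.path(rctx.attr.python_interpreter)

    # Fetch the poetry distro (a real rule should also pin a sha256).
    rctx.download_and_extract(
        url = "https://github.com/python-poetry/poetry/releases/download/1.0.9/poetry-1.0.9-darwin.tar.gz",
        output = "poetry_dist",
    )

    # Stamp out the ~10-line launcher so poetry runs straight from its
    # vendored sources and never needs to be "installed" (modeled on
    # get-poetry.py's entry point).
    rctx.file("poetry_bin.py", """
import os, sys
sys.path.insert(0, os.path.join(os.path.dirname(os.path.realpath(__file__)), "poetry_dist", "lib"))
from poetry.console import main
main()
""")

    # Copy/symlink the user's manifests into the new repository directory.
    rctx.symlink(rctx.attr.pyproject, "pyproject.toml")
    rctx.symlink(rctx.attr.lockfile, "poetry.lock")

    # Run poetry with the venv kept in-project, so site-packages lands
    # under [output_base]/external/<name>/.venv.
    result = rctx.execute(
        [python, "poetry_bin.py", "install", "--no-interaction", "--no-ansi"],
        environment = {"POETRY_VIRTUALENVS_IN_PROJECT": "true"},
    )
    if result.return_code != 0:
        fail("poetry install failed: " + result.stderr)

    # Mirror the on-disk dependency graph and sources into BUILD files.
    rctx.execute([python, rctx.path(rctx.attr._generate_build_files)])

poetry_install = repository_rule(
    implementation = _poetry_install_impl,
    attrs = {
        "lockfile": attr.label(allow_single_file = True),
        "pyproject": attr.label(allow_single_file = True),
        "python_interpreter": attr.label(allow_single_file = True),
        "_generate_build_files": attr.label(
            default = "//poetry:generate_build_files.py",
            allow_single_file = True,
        ),
    },
)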

It's simple and does exactly what `poetry install` does outside of Bazel. Here's the prototype code: https://gist.github.com/alexeagle/9fd6684e9306cf741f246dd3518a48ec

How it should look for users

In WORKSPACE you should:

load("@rules_python//poetry:install.bzl", "poetry_install")

poetry_install(
    # free for users to select name
    name = "some_dir_deps",
    lockfile = "//some/dir:poetry.lock",
    pyproject = "//some/dir:pyproject.toml",
    # optionally specify which interpreter binary to use; otherwise it should default to
    # whatever `toolchain_type = "@bazel_tools//tools/python:toolchain_type"` is set via `register_toolchains`
    python_interpreter = "@python_interpreter//:python_bin",
)

(it has no transitive dependencies)

That should create a @some_dir_deps repository with these labels:

  • @some_dir_deps//:all - a convenience py_library exposing all the dependencies; useful for users who have no BUILD file generator, and it matches existing semantics, which makes migration easier
  • @some_dir_deps//pkg - a py_library providing the pkg package (see the sketch after this list)
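For a rough picture, a target emitted by generate_build_files.py might look like the following. The glob patterns, the imports path, and whether targets live in subpackages or the root BUILD file are all layout assumptions, not decisions:

# Hypothetical generated target for the statsd package
py_library(
    name = "statsd",
    srcs = glob([".venv/lib/python3.8/site-packages/statsd/**/*.py"]),  # assumed venv layout
    imports = [".venv/lib/python3.8/site-packages"],
    deps = [],  # mirrored from the dependency graph in poetry.lock
    visibility = ["//visibility:public"],
)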

TODO: I need to figure out how "extras" should work - I think that's the reason pip makes you load a "requirements" helper function?

So you'd use it in your BUILD file:

    py_library(
        ...
        deps = ["@some_dir_deps//statsd"],
    )

(no load statement is required)

Here's my plan:

  • pick where it will go. The README already indicates that "The packaging rules (pip_import, etc.) are less stable. We may make breaking changes as they evolve." so my preference is a new top-level poetry folder
  • figure out the story for extras, native package dependencies
  • make sure the install runs only once. For example, on a CI setup we need to make sure --repository_cache is shared among workers, as in the .bazelrc line after this list (maybe this requires better documentation on bazel.build)
  • add a first version with bazel-integration-test'ed example directory
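For the repository-cache point above, the flag could live in a .bazelrc shared by CI workers (the path here is illustrative):

build --repository_cache=/shared/bazel-repository-cache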
@groodt (Contributor) commented Jul 17, 2020

I'm not a contributor, only a user of rules_python and rules_python_external.

My question(s): Why poetry and why this repo?

I'd personally prefer to see the existing rules improved for standard pip first, instead of secondary tools like pipenv or poetry. There is good work in https://github.com/dillon-giacoppo/rules_python_external that I would personally like to see upstreamed here instead.

If there is a desire for a "lockfile", it is perfectly valid to use a requirements.txt with all the hashes, produced by pip-tools.
See: https://hynek.me/articles/python-app-deps-2018/#pip-tools--everything-old-is-new-again

poetry, pipenv, etc. all end up using pip under the covers anyway, and with the new dependency resolver coming to pip, I'm wondering if there won't be a flight back to simple old pip.

I'm not saying I'd be against poetry (I've looked deeply into it too), rather that I'd like to see consistent treatment of packaging tools. If the contributions you are proposing can't be upstreamed to rules_poetry, why should they be added to the official rules over others?

Of course, I'd be generally supportive of seeing rules_python centralise support for all of pip-tools, poetry or pipenv so the community isn't fragmented, but equally, these rules still carry some Google baggage, so the general push has been to create external rules.

@alexeagle (Collaborator, Author) commented Jul 17, 2020

IMO we should fix the Google baggage here, and make this a monorepo that houses as much high-quality, self-consistent stuff as the maintainer community has time to properly support. So yes to pip, poetry, pipenv, and other package managers, provided of course that we minimize the surface area to what's strictly needed.

My understanding is that poetry is the only package manager that pins your transitive dependencies. It's not sufficient to just give the hashes of direct dependencies - you can still get a non-reproducible build if a newer version of a transitive dep is published and satisfies the semver constraints of the direct dependencies. If there isn't a file exhaustively listing the entire set of transitive dependencies, then it's not a sufficient guarantee IIUC.

Anyway even if poetry were feature-compatible with other package managers, I think we should make the Bazel onramp less steep by supporting all common existing codebases including their dependency manifest files.

@groodt (Contributor) commented Jul 17, 2020

> IMO we should fix the Google baggage here, and make this a monorepo that houses as much high-quality, self-consistent stuff as the maintainer community has time to properly support. So yes to pip, poetry, pipenv, and other package managers, provided of course that we minimize the surface area to what's strictly needed.

Yes, there have been recent movements which finally start to open this up as a possibility. rules_python is a bit different to the other language rules due to the history of how it is used in core bazelbuild/bazel. It isn't completely standalone like the other language rules, unfortunately. Hence the core vs. packaging split. Like I said, I too would probably prefer a single set of "official" Python language rules for the language and all ecosystem packaging. I also understand the viewpoint of having distinct rules, given that Bazel is so easily extensible with rules. However, I would prefer to see the "language native" tooling (pip) prioritised to the highest standard over the "secondary tools" like pipenv and poetry, particularly with the new resolver going into beta with pip 20.2.

> My understanding is that poetry is the only package manager that pins your transitive dependencies.

Probably off-topic to your RFC, but this is a common misconception. A fairly common way of "locking" a requirements.txt file is to use `python -m piptools compile --generate-hashes requirements.in`, which produces a fully hashed/locked transitive closure. Before pipenv and poetry existed, all one needed was venv, pip-tools and pip. Now one can also use `pipenv lock -r` or `poetry export` to get a requirements.txt with the transitive dependency hashes.
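For concreteness, that flow looks roughly like this; the pinned versions are only illustrative and the digests are placeholders, not real hashes:

# requirements.in - only the direct dependency
requests

# requirements.txt - output of: python -m piptools compile --generate-hashes requirements.in
certifi==2020.6.20 \
    --hash=sha256:<placeholder-digest>
    # via requests
requests==2.24.0 \
    --hash=sha256:<placeholder-digest>
# ...plus chardet, idna and urllib3, each pinned and hashed the same way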

Additional thought:

If it is decided to consolidate support for the various combinations of Python dependency management, package management and project management tools and specifications used in the ecosystem (pip, pipenv, poetry, Pipfile, pyproject.toml, requirements.txt, etc.), there would probably also need to be some criteria and documented levels of support for accepting a tool or contribution. For a hypothetical example: would conda be accepted? I'd be wary of the rules being pulled into the quagmire that Python packaging/dependency management is often renowned for.

Finally:

Good discussion. I'm only an interested party. Will be good to hear what the maintainers and others think.

@alexeagle (Collaborator, Author) commented Jul 19, 2020

> Probably off-topic to your RFC, but this is a common misconception. A fairly common way of "locking" a requirements.txt file is to use `python -m piptools compile --generate-hashes requirements.in`, which produces a fully hashed/locked transitive closure. Before pipenv and poetry existed, all one needed was venv, pip-tools and pip. Now one can also use `pipenv lock -r` or `poetry export` to get a requirements.txt with the transitive dependency hashes.

Ooh, this is interesting to clear up! But pip-tools isn't part of the Python distribution, right? You'd still need to get it onto your machine as a bootstrap step. Also, it's not maintained by Python core, so it doesn't seem obviously more canonical than pipenv or poetry; it just predates them, right?

@thundergolfer (Collaborator) commented Jul 20, 2020

pipenv actually vendored a bunch of the pip-tools code, I think. So yeah, it pre-dates them.

We used to use Pipenv with Bazel but abandoned it in favour of pip-tools because the former's locking behaviour wasn't good enough for us.

So far I like that pip-tools does one thing, transitive locking, and does it well. It also means we keep our locked requirements 'out in the open' outside of Bazel's internals, which is nice when managing platform-dependent behaviour in Python's packaging ecosystem.

Platform-dependent behaviour is my biggest reservation with the above. Delegating the locking to Poetry means you have low visibility into any platform inconsistency that arises. With pip-tools we have CI checks on the lock file that can detect platform inconsistency early, as sketched below.
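A minimal sketch of such a CI check, assuming the lock file is simply regenerated and diffed against the committed one (file names are illustrative):

# Fail the build if the committed lock file no longer matches requirements.in
python -m piptools compile --generate-hashes --output-file=requirements.check.txt requirements.in
diff -u requirements.txt requirements.check.txt || {
    echo "requirements.txt is stale; re-run pip-compile" >&2
    exit 1
}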
