The Wayback Machine - https://web.archive.org/web/20210104141812/https://github.com/bazelbuild/rules_python/issues/345

Py_binary implementation requires changes #345

Open
systemlogic opened this issue Aug 2, 2020 · 8 comments

systemlogic commented Aug 2, 2020

🚀 feature request

Python binaries are not hermetic

Relevant Rules

py_binary needs to be rewritten. I have to install pip packages when I try to ship it to a container. A copy of a py_binary in any external environment or container should be self-executable.

Description

The current behavior forces the user to install all dependencies in the external environment.

Describe the solution you'd like

The py_binary rule should first download the Python source,
then compile it into a Python package.
pip packages specified in the py_library or py_binary rule should go into its sandbox.
A copy of the py_binary should carry the complete environment: Python + pip + Python source files.

Describe alternatives you've considered

systemlogic (Author) commented Aug 2, 2020

wadejensen commented Aug 2, 2020

I'll let the owners chime in, but I believe this is working as intended.
At work we use a combination of https://github.com/dillon-giacoppo/rules_python_external and https://github.com/google/subpar to create self-contained python executables with pip dependencies.

Although I won't make a claim about their hermeticity.

thundergolfer (Collaborator) commented Aug 3, 2020

> py_binary needs to be rewritten.

I don't exactly disagree with you, but that's not really actionable feedback.

> I have to install pip packages when I try to ship it to a container. A copy of a py_binary in any external environment or container should be self-executable.

There is some odd behaviour in py_binary around its package dependencies and how those deps are managed as runfiles. How are you "[shipping] it to the container"?

Here's an example BUILD file of how I use the py_binary zip output to ship a self-executable to a container. All that's required is the interpreter.

load("@rules_python//python:defs.bzl", "py_binary", "py_library", "py_test")
load("@io_bazel_rules_docker//container:container.bzl", "container_image", "container_push")
load("@pypi//:requirements.bzl", "requirement")

py_binary(
    name = "producer",
    srcs = glob(["*.py"]),
    deps = [
        requirement("boto3"),
        requirement("click"),
        requirement("goodreads"),
        requirement("loguru"),
    ],
    python_version = "PY3",
)

py_test(
    name = "goodreads_data_collector_test",
    srcs = ["goodreads_data_collector_test.py"],
    deps = [
        ":producer",
    ],
    size = "small",
)

filegroup(
    name = "producer_zip",
    srcs = [":producer"],
    output_group = "python_zip_file",
)

container_image(
    name = "image",
    base = "@python_base//image",
    files = [
        ":producer_zip",
    ],
    entrypoint = [
        "python3",
        "producer.zip",
    ],
    cmd = []
)

container_push(
    name = "push_image",
    image = ":image",
    format = "Docker",
    registry = "index.docker.io",
    # NOTE: The repository must already exist, and when executing this rule can hang for a while not outputting anything.
    #       Expect to wait up to 10 minutes on slower internet connections.
    repository = "thundergolfer/workflows-and-pipelines-producer",
    tag = "dev",  # NOTE: Don't do this for non-demo stuff. Stick to immutable tagging/SHA256
    tags = [
        "docker_image",
    ],
)

rules_docker also provides py3_image, which helps get Python binaries into containers, e.g. https://github.com/bazelbuild/rules_docker/blob/4dedeca5e17d73d708f5f5acb01551c67ba4fcbb/tests/container/python/BUILD#L26


Regarding "Describe the solution you'd like", when you say "py binary rule first download the python source," do you mean that py_binary should depend on a downloaded (or workspace-built) interpreter, and not the system interpreter? If so, I agree, and there's an issue open about this.

systemlogic (Author) commented Aug 26, 2020

Let's say I have created a rest.py file, and the Bazel rule that represents it is rest. When I copy it to the container, I see two files, rest and rest.py.
Running the rest file looks for a runfiles directory.

root@307d91a65419:~# ls
rest  rest.py
root@307d91a65419:~# ./rest
Traceback (most recent call last):
  File "./rest", line 349, in <module>
    Main()
  File "./rest", line 272, in Main
    module_space = FindModuleSpace()
  File "./rest", line 111, in FindModuleSpace
    raise AssertionError('Cannot find .runfiles directory for %s' % sys.argv[0])
AssertionError: Cannot find .runfiles directory for ./rest
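
The traceback above comes from the launcher script that Bazel generates for py_binary. As a rough illustration of why copying only rest and rest.py fails, here is a simplified sketch of that kind of lookup. It is not the actual Bazel stub code; the function name find_runfiles_dir and the two checks are assumptions made for illustration.

```python
import os


def find_runfiles_dir(stub_path: str) -> str:
    """Simplified, hypothetical version of a runfiles lookup.

    The real launcher is more involved (symlinks, Windows, environment
    variables); this only shows why a bare copy of the stub fails.
    """
    stub_path = os.path.abspath(stub_path)

    # Case 1: a sibling directory named "<stub>.runfiles" exists.
    candidate = stub_path + ".runfiles"
    if os.path.isdir(candidate):
        return candidate

    # Case 2: the stub itself lives somewhere inside a "*.runfiles" tree.
    current = stub_path
    while True:
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root
            break
        if parent.endswith(".runfiles"):
            return parent
        current = parent

    raise AssertionError("Cannot find .runfiles directory for %s" % stub_path)
```

With only the stub and the source file present, neither case matches, so the lookup raises, which is consistent with the error shown above.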

Also, the problem with the above example is that the binary is developed and shipped with a Python version that may differ from the container's version. I know we have recently gotten rid of Python 2 completely.
In my view, shippable code should go with its compiler/interpreter to remove any conflict.

thundergolfer (Collaborator) commented Aug 31, 2020

> Running the rest file looks for a runfiles directory.

Yes, you can't simply copy the files into a container and have it work. py_binary files are intended to be executed by bazel run. As I've shown in my example posted above, there are ways in Bazel to ship a 'zipapp' style executable.
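
For readers unfamiliar with the 'zipapp' execution model, here is a minimal sketch using only the Python standard library. It is not how Bazel builds its python_zip_file output (Bazel uses its own tooling); it only illustrates that an archive containing a `__main__.py` can be run by passing it to an interpreter. The function name and the file names are made up for the example.

```python
import pathlib
import subprocess
import sys
import tempfile
import zipapp


def build_and_run_zipapp() -> str:
    """Build a tiny archive with a __main__.py and run it with the current interpreter."""
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical application directory containing only an entry point.
        src = pathlib.Path(tmp) / "app"
        src.mkdir()
        (src / "__main__.py").write_text('print("hello from the archive")\n')

        # Pack the directory into a single file.
        archive = pathlib.Path(tmp) / "app.pyz"
        zipapp.create_archive(str(src), str(archive))

        # Equivalent to running `python3 app.pyz` in a shell.
        result = subprocess.run(
            [sys.executable, str(archive)],
            capture_output=True,
            text=True,
            check=True,
        )
        return result.stdout.strip()


if __name__ == "__main__":
    print(build_and_run_zipapp())
```

This mirrors the `entrypoint = ["python3", "producer.zip"]` pattern from the BUILD file earlier in the thread: the archive carries the code, and the container only has to supply an interpreter.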

> Also, the problem with the above example is that the binary is developed and shipped with a Python version that may differ from the container's version.

Yes, this is something to watch out for. Are you using rules_docker? If so, you can use Bazel's toolchain and platform constraints features to avoid shipping mismatched Python code and containers.
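
Separately from Bazel's toolchain and platform constraints, a cheap runtime guard can at least make a version mismatch fail loudly at startup. The sketch below is an illustration only, not something rules_python or rules_docker provides; check_interpreter and the expected version tuple are hypothetical.

```python
import sys
from typing import Optional, Tuple


def check_interpreter(
    expected: Tuple[int, int],
    actual: Optional[Tuple[int, int]] = None,
) -> None:
    """Raise RuntimeError if the interpreter's major.minor differs from `expected`.

    `actual` defaults to the running interpreter; it is a parameter only so
    the check can be exercised without switching interpreters.
    """
    if actual is None:
        actual = (sys.version_info[0], sys.version_info[1])
    if tuple(actual) != tuple(expected):
        raise RuntimeError(
            "built for Python %d.%d but running on %d.%d"
            % (expected[0], expected[1], actual[0], actual[1])
        )
```

Calling this at the top of the entry point with the version used at build time would turn a silent mismatch between the shipped code and the container's interpreter into an immediate error.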

thundergolfer (Collaborator) commented Aug 31, 2020

(Also @systemlogic you've posted the same comment I think 15-20 times. Can you remove the duplicates?)

systemlogic (Author) commented Nov 12, 2020

@thundergolfer This is what I did at my previous employer. Since I have moved to a different organization, I thought the idea should be implemented by Bazel.
1. The py_binary rule first downloads the Python source defined in the WORKSPACE.
2. Compile it to a Python binary.
3. Download and install pip.
4. pip packages specified in the py_library or py_binary rule go to a specified pip binary definition location in bazel-bin.
5. Use makeself to create a self-executable that contains Python + pip packages + the user's Python source files.
6. The self-executable can be copied to an OS without Python on the same platform.
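
The steps above can be sketched as a command plan. This is not the tooling described in this comment (which is not shown in the thread); it is a hedged illustration that only builds the command lines and does not run them. Assumptions: the Python source is already downloaded and unpacked in python_src_dir, pip is bootstrapped with the standard ensurepip module rather than downloaded, the staging layout and the `./run.sh` startup script are hypothetical, and `makeself.sh` is on PATH. Whether the resulting interpreter is relocatable is not addressed here.

```python
from typing import List, Optional, Sequence, Tuple

# Each planned step is (working directory, argv).
Command = Tuple[str, List[str]]


def plan_self_contained_build(
    python_src_dir: str,
    stage_dir: str,
    packages: Sequence[str],
    output_file: str,
    index_url: Optional[str] = None,
) -> List[Command]:
    """Return the commands for steps 1-5 in order, without executing anything."""
    prefix = f"{stage_dir}/python"          # hypothetical install location
    python = f"{prefix}/bin/python3"
    site = f"{stage_dir}/site-packages"     # hypothetical package location

    pip_install = [python, "-m", "pip", "install", "--target", site]
    if index_url:
        # Optional private index, for example an internal PyPI mirror.
        pip_install += ["--index-url", index_url]
    pip_install += list(packages)

    return [
        # Steps 1-2: build the interpreter from the unpacked source tree.
        (python_src_dir, ["./configure", f"--prefix={prefix}"]),
        (python_src_dir, ["make"]),
        (python_src_dir, ["make", "install"]),
        # Step 3: make pip available in the freshly built interpreter.
        (stage_dir, [python, "-m", "ensurepip"]),
        # Step 4: install the declared packages into the staging area.
        (stage_dir, pip_install),
        # Step 5: makeself.sh <archive_dir> <file_name> <label> <startup_script>
        (stage_dir, ["makeself.sh", stage_dir, output_file, "self-contained app", "./run.sh"]),
    ]
```

Returning the plan instead of running it keeps the sketch side-effect free; each entry could be passed to `subprocess.run(argv, cwd=cwd, check=True)` once the paths are real.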

An option should be provided in the WORKSPACE file to use a custom JFrog Artifactory PyPI location.
It took me 2 months to implement the tooling, and it works pretty well, with complete isolation from other Python packages in the Bazel project.
In the example you shared above, you are still using a Python Docker image and the Python from that image, so it is not hermetic.

systemlogic (Author) commented Nov 13, 2020
