The Wayback Machine - https://web.archive.org/web/20210104142424/https://github.com/bazelbuild/rules_python/issues/382
Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sys.path[0] breaks out of runfile tree. #382

Open
dillon-giacoppo opened this issue Nov 19, 2020 · 3 comments
Open

sys.path[0] breaks out of runfile tree. #382

dillon-giacoppo opened this issue Nov 19, 2020 · 3 comments

Comments

@dillon-giacoppo
Copy link
Contributor

@dillon-giacoppo dillon-giacoppo commented Nov 19, 2020

🐞 bug report

Affected Rule

py_binary, py_test.

Is this a regression?

No

Description

When Python initializes it adds the directory of the script to the sys.path:

As initialized upon program startup, the first item of this list, path[0], is the directory containing the script that was used to invoke the Python interpreter. If the script directory is not available (e.g. if the interpreter is invoked interactively or if the script is read from standard input), path[0] is the empty string, which directs Python to search modules in the current directory first. Notice that the script directory is inserted before the entries inserted as a result of PYTHONPATH.

This would imply that a py target such as {REPO_DIR}/bazel-bin/{target_path}/{target}.runfiles/{script_path}/{python_script}.py would cause {REPO_DIR}/bazel-bin/{target_path}/{target}.runfiles/{script_path}/ to be inserted as sys.path[0].

However, because the script is a symlink to the real python script in the repository, Python follows the symlink and appends the actual underlying directory to sys.path[0], in this case: {REPO_DIR}/{script_path}/. This issue was raised in Python's bug tracker and noted as expected behaviour that won't be fixed issue17639.

This allows the script to "break out" of the runfiles tree and import code from the same directory which it does not necessarily have access to via src or deps.

This causes issues when creating multiple targets in the same directory which should not automatically depend on each other.

For example, given:

repo/
  src/
    main.py
    foo.py

main can depend on foo without the file being explicitly listed in the BUILD file.

🔬 Minimal Reproduction

src/BUILD

package(default_visibility = ["//visibility:public"])

py_binary(
    name = "main",
    srcs = ["main.py"],
    python_version = "PY3",
    srcs_version = "PY3",
)

src/main.py

import sys

import foo

print(sys.path[0])

src/foo.py

print("foo imported")

The output of bazel run //src:main will be:

{REPO}/src/
foo imported

Which should not be possible where foo.py is not an input to //src:main.

🌍 Your Environment

Operating System:

  
macOS 11.0.1 (20B29)
  

Output of bazel version:

  
Build label: 3.7.0-homebrew
Build target: bazel-out/darwin-opt/bin/src/main/java/com/google/devtools/build/lib/bazel/BazelServer_deploy.jar
Build time: Sat Nov 14 13:29:41 2020 (1605360581)
Build timestamp: 1605360581
Build timestamp as int: 1605360581
  

Rules_python version:

  
0.1.0
  

Anything else relevant?
The most obvious solution that comes to mind for this would be to copy into the runfile tree rather than symlinking. It seems unlikely this is going to change upstream in Python.

@thundergolfer
Copy link
Collaborator

@thundergolfer thundergolfer commented Nov 19, 2020

Thanks for the report. This same problem is currently documented here bazelbuild/bazel#7091 but I cannot find a counterpart already in this project.

The Python stub template has an example of a 'hack' to fix the problem https://github.com/bazelbuild/bazel/blob/master/src/main/java/com/google/devtools/build/lib/bazel/rules/python/python_stub_template.txt#L19. But that only fixes the problem in that python process, and the stub template starts the user's process and the problem is reintroduced like you observe.

@person142
Copy link

@person142 person142 commented Nov 20, 2020

@dillon-giacoppo
Copy link
Contributor Author

@dillon-giacoppo dillon-giacoppo commented Nov 22, 2020

@person142, neither -s or -S appears to stop the directory of the script being added to the path, rather, it prevents site wide packages being added to the path (perhaps this should be done anyway or provided as an option). For example, on my machine, it prevents /usr/local/lib/python3.7/site-packages' being added. This should probably be a separate issue ticket.

The problem will still exist because {REPO_DIR}/{script_path}/ is still added at sys.path[0]. It doesn't look like there is an interpreter option to disable this behavior.

There are some potential solutions noted in the bazel issue:

  1. Copy the runfiles instead of symlink.
  2. Invoke the script with -c or -m which does not set sys.path[0].
  3. Potentially reuse the interpreter of the wrapper rather than creating a new one (which already deletes sys.path[0]).

I am unsure which would be the most appropriate in this case. I think 1 might not be trivial because IIUC the runfiles behaviour is part of core Bazel and not rules_python specific.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
3 participants
You can’t perform that action at this time.