:::{default-domain} bzl
:::

# Using dependencies from PyPI

Using PyPI packages (aka "pip install") involves two main steps.

1. [Installing third party packages](#installing-third-party-packages)
2. [Using third party packages as dependencies](#using-third-party-packages)

{#installing-third-party-packages}
## Installing third party packages

### Using bzlmod

To add pip dependencies to your `MODULE.bazel` file, use the `pip.parse`
extension and call it to create the central external repo and the individual
wheel external repos. Your `MODULE.bazel` must also include the Python toolchain
extension.

```starlark
pip = use_extension("@rules_python//python/extensions:pip.bzl", "pip")
pip.parse(
    hub_name = "my_deps",
    python_version = "3.11",
    requirements_lock = "//:requirements_lock_3_11.txt",
)
use_repo(pip, "my_deps")
```

For more documentation, including how the rules can update or create a requirements
file, see the bzlmod examples under the {gh-path}`examples` folder or the documentation
for the {obj}`@rules_python//python/extensions:pip.bzl` extension.

```{note}
By default, a host-platform-compatible toolchain is used to set up pip dependencies.
The setup phase creates some symlinks, which may be inefficient on Windows
by default. If you have admin privileges, use the following `.bazelrc` option to
improve performance:

    startup --windows_enable_symlinks

This enables symlinks on Windows and speeds up bootstrapping the
hermetic host Python interpreter on this platform. Linux and macOS users should see no
difference.
```

### Using a WORKSPACE file

To add pip dependencies to your `WORKSPACE`, load the `pip_parse` function and
call it to create the central external repo and individual wheel external repos.

```starlark
load("@rules_python//python:pip.bzl", "pip_parse")

# Create a central repo that knows about the dependencies needed from
# requirements_lock.txt.
pip_parse(
    name = "my_deps",
    requirements_lock = "//path/to:requirements_lock.txt",
)
# Load the starlark macro, which will define your dependencies.
load("@my_deps//:requirements.bzl", "install_deps")
# Call it to define repos for your requirements.
install_deps()
```

(vendoring-requirements)=
#### Vendoring the requirements.bzl file

In some cases you may not want to generate the `requirements.bzl` file as a repository rule
while Bazel is fetching dependencies. For example, if you produce a reusable Bazel module
such as a ruleset, you may want to include the `requirements.bzl` file rather than make your users
install the WORKSPACE setup to generate it.
See https://github.com/bazelbuild/rules_python/issues/608

This is the same workflow as Gazelle, which creates `go_repository` rules with
[`update-repos`](https://github.com/bazelbuild/bazel-gazelle#update-repos).

To do this, use the "write to source file" pattern documented in
https://blog.aspect.dev/bazel-can-write-to-the-source-folder
to put a copy of the generated `requirements.bzl` into your project.
Then load the `requirements.bzl` file directly rather than from the generated repository.
See the example in `rules_python/examples/pip_parse_vendored`.

(per-os-arch-requirements)=
### Requirements for a specific OS/Architecture

In some cases you may need to use different requirements files for different
OS and architecture combinations. This is enabled via the
`requirements_by_platform` attribute of the `pip.parse` extension and the
`pip_parse` repository rule. The keys of the dictionary are labels of the
requirements files, and each value is a comma-separated string of target
(os, arch) tuples.

For example:
```starlark
    # ...
    requirements_by_platform = {
        "requirements_linux_x86_64.txt": "linux_x86_64",
        "requirements_osx.txt": "osx_*",
        "requirements_linux_exotic.txt": "linux_exotic",
        "requirements_some_platforms.txt": "linux_aarch64,windows_*",
    },
    # For the list of standard platforms that rules_python has toolchains for, default to
    # the following requirements file.
    requirements_lock = "requirements_lock.txt",
```

In case of duplicate platforms, `rules_python` will raise an error because there
must be an unambiguous mapping of the requirements files to the (os, arch) tuples.
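
The uniqueness requirement can be illustrated with a short Python sketch. This is not how `rules_python` implements the check: the `KNOWN_PLATFORMS` list and the `platform_to_file` helper are hypothetical names introduced here, and expanding wildcards with `fnmatchcase` is an assumption made for illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical platform list, used only to expand wildcards in this sketch;
# the real set of supported platforms is defined by rules_python.
KNOWN_PLATFORMS = [
    "linux_x86_64",
    "linux_aarch64",
    "osx_x86_64",
    "osx_aarch64",
    "windows_x86_64",
]


def platform_to_file(requirements_by_platform, known_platforms=KNOWN_PLATFORMS):
    """Map every platform to exactly one requirements file, or raise ValueError."""
    mapping = {}
    for req_file, spec in requirements_by_platform.items():
        for pattern in spec.split(","):
            pattern = pattern.strip()
            if "*" in pattern:
                # Expand a wildcard such as "osx_*" against the known platforms.
                platforms = [p for p in known_platforms if fnmatchcase(p, pattern)]
            else:
                # A literal name (even an unlisted one such as "linux_exotic")
                # stands for itself.
                platforms = [pattern]
            for platform in platforms:
                if platform in mapping:
                    raise ValueError(
                        "platform %r is claimed by both %r and %r"
                        % (platform, mapping[platform], req_file)
                    )
                mapping[platform] = req_file
    return mapping
```

With the dictionary from the example above, every platform resolves to a single file. Adding a second file that also claims `linux_x86_64` (directly or through `linux_*`) makes the function raise.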

An alternative is to use the per-OS requirement attributes.
```starlark
    # ...
    requirements_windows = "requirements_windows.txt",
    requirements_darwin = "requirements_darwin.txt",
    # For the remaining platforms (in practice, only Linux), use this file.
    requirements_lock = "requirements_lock.txt",
)
```

### pip rules

Note that since `pip_parse` and `pip.parse` are executed at evaluation time,
Bazel has no information about the Python toolchain and cannot enforce that the
interpreter used to invoke `pip` matches the interpreter used to run
`py_binary` targets. By default, `pip_parse` uses the system command
`"python3"`. To override this, pass the `python_interpreter` attribute or the
`python_interpreter_target` attribute to `pip_parse`. The `pip.parse` bzlmod extension
uses the hermetic Python toolchain for the host platform by default.

You can have multiple `pip_parse` calls in the same workspace, or use the pip
extension multiple times when using bzlmod. This configuration will create
multiple external repos that have no relation to one another and may result in
downloading the same wheels numerous times.

As with any repository rule, if you would like to ensure that `pip_parse` is
re-executed to pick up a non-hermetic change to your environment (e.g., updating
your system `python` interpreter), you can force it to re-execute by running
`bazel sync --only [pip_parse name]`.

{#using-third-party-packages}
## Using third party packages as dependencies

Each extracted wheel repo contains a `py_library` target representing
the wheel's contents. There are two ways to access this library. The
first uses the `requirement()` function defined in the central
repo's `//:requirements.bzl` file. This function maps a pip package
name to a label:

```starlark
load("@my_deps//:requirements.bzl", "requirement")

py_library(
    name = "mylib",
    srcs = ["mylib.py"],
    deps = [
        ":myotherlib",
        requirement("some_pip_dep"),
        requirement("another_pip_dep"),
    ],
)
```

`requirement()` exists to insulate users from changes to the underlying
repository and label strings. However, those labels are now widely used
directly, so they cannot easily change anyway.

On the other hand, using `requirement()` has several drawbacks; see
[this issue][requirements-drawbacks] for an enumeration. If you don't
want to use `requirement()`, you can use the library
labels directly instead. For `pip_parse`, the labels are of the following form:

```starlark
@{name}//{package}
```

Here `name` is the `name` attribute that was passed to `pip_parse` and
`package` is the pip package name with characters that are illegal in
Bazel label names (e.g. `-`, `.`) replaced with `_`. If you need to
update `name` from "old" to "new", then you can run the following
buildozer command:

```shell
buildozer 'substitute deps @old//([^/]+) @new//${1}' //...:*
```
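
As a rough illustration of the label form described above, the following Python sketch builds a label from a hub name and a pip package name. The `pip_label` helper is hypothetical. It handles only the two characters named in the text (`-` and `.`), and the lower-casing step is an assumption; the normalization in `rules_python` may differ in detail.

```python
import re


def pip_label(hub_name, package):
    """Build a label of the form @{name}//{package} for a pip package name."""
    # Replace the characters named in the text ("-" and ".") with "_".
    # Lower-casing is an assumption of this sketch.
    normalized = re.sub(r"[-.]", "_", package).lower()
    return "@{}//{}".format(hub_name, normalized)
```

For instance, `pip_label("my_deps", "sphinxcontrib-serializinghtml")` yields `@my_deps//sphinxcontrib_serializinghtml`.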

[requirements-drawbacks]: https://github.com/bazelbuild/rules_python/issues/414

### Entry points

If you would like to access [entry points][whl_ep], see the `py_console_script_binary` rule documentation,
which can help you create a `py_binary` target for a particular console script exposed by a package.

[whl_ep]: https://packaging.python.org/specifications/entry-points/

### 'Extras' dependencies

Any 'extras' specified in the requirements lock file will be automatically added
as transitive dependencies of the package. For example, if the lock file lists a
package `useful_dep` with extras, you'd just put
`requirement("useful_dep")` or `@pypi//useful_dep`.

### Consuming Wheel Dists Directly

If you need to depend on the wheel dists themselves, for instance, to pass them
to some other packaging tool, you can get a handle to them with the
`whl_requirement` macro. For example:

```starlark
load("@pypi//:requirements.bzl", "whl_requirement")

filegroup(
    name = "whl_files",
    data = [
        # This is equivalent to "@pypi//boto3:whl"
        whl_requirement("boto3"),
    ],
)
```

### Creating a filegroup of files within a whl

The rule {obj}`whl_filegroup` exists as an easy way to extract the necessary files
from a whl file without the need to modify the `BUILD.bazel` contents of the
whl repositories generated via `pip_repository`. Use it similarly to the `filegroup`
above. See the API docs for more information.

(advance-topics)=
## Advanced topics

(circular-deps)=
### Circular dependencies

Sometimes PyPI packages contain dependency cycles. For instance, a particular
version of `sphinx` (this is no longer the case in the latest version as of
2024-06-02) depends on `sphinxcontrib-serializinghtml`. When using them as
`requirement()`s, as in

```starlark
py_binary(
    name = "doctool",
    ...
    deps = [
        requirement("sphinx"),
    ],
)
```

Bazel will protest because it doesn't support cycles in the build graph:

```
ERROR: .../external/pypi_sphinxcontrib_serializinghtml/BUILD.bazel:44:6: in alias rule @pypi_sphinxcontrib_serializinghtml//:pkg: cycle in dependency graph:
    //:doctool (...)
    @pypi//sphinxcontrib_serializinghtml:pkg (...)
.-> @pypi_sphinxcontrib_serializinghtml//:pkg (...)
|   @pypi_sphinxcontrib_serializinghtml//:_pkg (...)
|   @pypi_sphinx//:pkg (...)
|   @pypi_sphinx//:_pkg (...)
`-- @pypi_sphinxcontrib_serializinghtml//:pkg (...)
```

The `experimental_requirement_cycles` argument allows you to work around these
issues by specifying groups of packages which form cycles. `pip_parse` will
transparently fix the cycles for you and provide the cyclic dependencies
simultaneously.

```starlark
pip_parse(
    ...
    experimental_requirement_cycles = {
        "sphinx": [
            "sphinx",
            "sphinxcontrib-serializinghtml",
        ],
    },
)
```

`pip_parse` supports fixing multiple cycles simultaneously; however, cycles must
be distinct. `apache-airflow`, for instance, has dependency cycles with a number
of its optional dependencies, which means those optional dependencies must all
be a part of the `airflow` cycle. For instance:

```starlark
pip_parse(
    ...
    experimental_requirement_cycles = {
        "airflow": [
            "apache-airflow",
            "apache-airflow-providers-common-sql",
            "apache-airflow-providers-postgres",
            "apache-airflow-providers-sqlite",
        ],
    },
)
```
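
One way to read "cycles must be distinct" is that no package may appear in more than one cycle group. The Python sketch below checks a cycles dictionary under that reading. The `check_cycles_distinct` helper is hypothetical and is not part of `rules_python`.

```python
def check_cycles_distinct(cycles):
    """Return {package: group}; raise ValueError if a package is in two groups."""
    owner = {}
    for group, packages in cycles.items():
        for package in packages:
            if package in owner and owner[package] != group:
                raise ValueError(
                    "%r appears in both %r and %r" % (package, owner[package], group)
                )
            owner[package] = group
    return owner
```

If two groups share a package, merge them into a single group, as the `airflow` example above does for all of its providers.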

Alternatively, one could resolve the cycle by removing one leg of it.

For example, while `apache-airflow-providers-sqlite` is "baked into" the Airflow
package, `apache-airflow-providers-postgres` is not; it is an optional feature.
Rather than listing `apache-airflow[postgres]` in your `requirements.txt`, which
would expose a cycle via the extra, one could _manually_ depend on
`apache-airflow` and `apache-airflow-providers-postgres` separately as
requirements. Bazel rules which need only `apache-airflow` can take it as a
dependency, and rules which explicitly want to mix in
`apache-airflow-providers-postgres` now can.

Alternatively, one could use `rules_python`'s patching features to remove one
leg of the dependency manually, for instance by making
`apache-airflow-providers-postgres` not explicitly depend on `apache-airflow` or
perhaps `apache-airflow-providers-common-sql`.

(bazel-downloader)=
### Bazel downloader and multi-platform wheel hub repository

The bzlmod `pip.parse` call supports pulling information from `PyPI` (or a
compatible mirror) and ensures that the [bazel
downloader][bazel_downloader] is used for downloading the wheels. This lets
users authenticate with the mirror via the [credential helper](#credential-helper)
and also ensures that the distribution downloads are cached.
It also avoids using `pip` altogether and results in much faster dependency
fetching.

This can be enabled by `experimental_index_url` and related attributes, as shown in
the {gh-path}`examples/bzlmod/MODULE.bazel` example.

When this feature is in use, the `pip` extension evaluation prints the indexes it
accesses, similar to the following:

```console
Loading: 0 packages loaded
    currently loading: docs/
    Fetching module extension pip in @@//python/extensions:pip.bzl; starting
    Fetching https://pypi.org/simple/twine/
```

This does not mean that `rules_python` is fetching the wheels eagerly. Rather,
it is calling the PyPI server to get the Simple API response, which
lists all available source and wheel distributions. Once it has
the list of available distributions, it selects the right ones based
on the `sha256` values in your `requirements_lock.txt` file. The compatible
distribution URLs are then written to the `MODULE.bazel.lock` file. Currently,
users wishing to use the lock file with `rules_python` and this feature have
to set the environment variable `RULES_PYTHON_OS_ARCH_LOCK_FILE=0`, which will
become the default in the next release.
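
The selection step can be pictured with a small Python sketch. It assumes the lock file pins digests with pip's `--hash=sha256:...` options. The function names and the dictionary shape used for distributions are hypothetical, and the digests in any sample data are shortened placeholders, not real hashes.

```python
import re


def locked_hashes(requirement_lines):
    """Collect the sha256 digests pinned via `--hash=sha256:...` options."""
    hashes = set()
    for line in requirement_lines:
        hashes.update(re.findall(r"--hash=sha256:([0-9a-f]+)", line))
    return hashes


def select_distributions(available, hashes):
    """Keep only the distributions whose digest is pinned in the lock file."""
    return [dist for dist in available if dist["sha256"] in hashes]
```

Distributions that the index lists but the lock file does not pin (for example, other versions of the same package) are dropped.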

Fetching the distribution information from PyPI allows `rules_python` to
know which `whl` should be used on which target platform; it determines
that by parsing the `whl` filename according to the [PEP600] and [PEP656] standards. This
allows the user to configure the behaviour by using the following publicly
available flags:
* {obj}`--@rules_python//python/config_settings:py_linux_libc` for selecting the Linux libc variant.
* {obj}`--@rules_python//python/config_settings:pip_whl` for selecting `whl` distribution preference.
* {obj}`--@rules_python//python/config_settings:pip_whl_osx_arch` for selecting macOS wheel preference.
* {obj}`--@rules_python//python/config_settings:pip_whl_glibc_version` for selecting the GLIBC version compatibility.
* {obj}`--@rules_python//python/config_settings:pip_whl_muslc_version` for selecting the musl version compatibility.
* {obj}`--@rules_python//python/config_settings:pip_whl_osx_version` for selecting macOS version compatibility.
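
To make the filename parsing more concrete, here is a minimal Python sketch that reads the platform tag of a wheel filename. It covers the `manylinux_X_Y_arch` form from PEP 600, the `musllinux_X_Y_arch` form from PEP 656, and the common `macosx_X_Y_arch` form. The `parse_wheel_platform` helper and its return shape are hypothetical; legacy aliases such as `manylinux2014`, Windows tags, and `any` are deliberately left unhandled, and the actual parser in `rules_python` may work differently.

```python
import re


def parse_wheel_platform(filename):
    """Return OS, libc, minimum version and arch for a wheel, or None."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel: %r" % filename)
    # The platform tag is the last dash-separated field; it may contain
    # several dot-separated tags (a compressed tag set).
    platform_tags = filename[: -len(".whl")].split("-")[-1].split(".")
    for tag in platform_tags:
        match = re.fullmatch(r"(manylinux|musllinux)_(\d+)_(\d+)_(\w+)", tag)
        if match:
            kind, major, minor, arch = match.groups()
            return {
                "os": "linux",
                "libc": "glibc" if kind == "manylinux" else "musl",
                "version": (int(major), int(minor)),
                "arch": arch,
            }
        match = re.fullmatch(r"macosx_(\d+)_(\d+)_(\w+)", tag)
        if match:
            major, minor, arch = match.groups()
            return {
                "os": "osx",
                "libc": None,
                "version": (int(major), int(minor)),
                "arch": arch,
            }
    # Tags this sketch does not understand (e.g. "any", "win_amd64").
    return None
```

The libc kind and version extracted this way correspond to what the `py_linux_libc`, `pip_whl_glibc_version` and `pip_whl_muslc_version` flags select on, and the macOS version and architecture correspond to `pip_whl_osx_version` and `pip_whl_osx_arch`.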

[bazel_downloader]: https://bazel.build/rules/lib/builtins/repository_ctx#download
[pep600]: https://peps.python.org/pep-0600/
[pep656]: https://peps.python.org/pep-0656/

(credential-helper)=
### Credential Helper

The "use Bazel downloader for python wheels" experimental feature includes support for the Bazel
[Credential Helper][cred-helper-design].

Your Python artifact registry may provide a credential helper for you. Refer to your index's docs
to see if one is provided.

See the [Credential Helper Spec][cred-helper-spec] for details.

[cred-helper-design]: https://github.com/bazelbuild/proposals/blob/main/designs/2022-06-07-bazel-credential-helpers.md
[cred-helper-spec]: https://github.com/EngFlow/credential-helper-spec/blob/main/spec.md

#### Basic Example

The simplest form of a credential helper is a bash script that accepts an argument and prints JSON to
stdout. For a service like Google Artifact Registry that uses ['Basic' HTTP Auth][rfc7617] and does
not provide a credential helper that conforms to the [spec][cred-helper-spec], the script might
look like:

```bash
#!/bin/bash
# cred_helper.sh
ARG=$1  # but we don't do anything with it as it's always "get"

# formatting is optional
echo '{'
echo '  "headers": {'
echo '    "Authorization": ["Basic dGVzdDoxMjPCow=="]'
echo '  }'
echo '}'
```

Configure Bazel to use this credential helper for your Python index `example.com`:

```
# .bazelrc
build --credential_helper=example.com=/full/path/to/cred_helper.sh
```

Bazel will call this file like `cred_helper.sh get` and use the returned JSON to inject headers
into whatever HTTP(S) request it performs against `example.com`.
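
The header value in the bash script is the base64 encoding of `username:password`, as defined by RFC 7617. As a sketch of how such a value could be produced, the following Python version of the helper builds the same JSON. The function names are hypothetical, and the credentials are the sample user `test` with password `123£` from RFC 7617, which is what `dGVzdDoxMjPCow==` decodes to.

```python
import base64
import json


def basic_auth_header(username, password):
    """Return the value of a 'Basic' Authorization header (RFC 7617)."""
    token = base64.b64encode((username + ":" + password).encode("utf-8"))
    return "Basic " + token.decode("ascii")


def credential_helper_response(username, password):
    """Build the JSON document a credential helper prints to stdout."""
    response = {
        "headers": {"Authorization": [basic_auth_header(username, password)]}
    }
    return json.dumps(response, indent=2)


if __name__ == "__main__":
    # Placeholder credentials (the RFC 7617 sample). A real helper would
    # obtain them from a secret store rather than hard-coding them.
    print(credential_helper_response("test", "123\u00a3"))
```

A script like this would be referenced from `--credential_helper` in the same way as `cred_helper.sh`, provided it is executable.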

[rfc7617]: https://datatracker.ietf.org/doc/html/rfc7617