:::{default-domain} bzl
:::

# Using dependencies from PyPI

Using PyPI packages (aka "pip install") involves two main steps.

1. [Installing third party packages](#installing-third-party-packages)
2. [Using third party packages as dependencies](#using-third-party-packages)

{#installing-third-party-packages}
## Installing third party packages

### Using bzlmod

To add pip dependencies to your `MODULE.bazel` file, use the `pip.parse`
extension, and call it to create the central external repo and individual wheel
external repos. Also include the toolchain extension in your `MODULE.bazel`, as
shown in the bzlmod examples.

```starlark
pip = use_extension("@rules_python//python/extensions:pip.bzl", "pip")
pip.parse(
    hub_name = "my_deps",
    python_version = "3.11",
    requirements_lock = "//:requirements_lock_3_11.txt",
)
use_repo(pip, "my_deps")
```

For more documentation, including how the rules can update or create a requirements
file, see the bzlmod examples under the {gh-path}`examples` folder or the documentation
for the {obj}`@rules_python//python/extensions:pip.bzl` extension.

```{note}
By default, we use a host-platform compatible toolchain to set up pip dependencies.
During the setup phase, we create some symlinks, which may be inefficient on Windows.
If you have admin privileges, you can improve performance with the following `.bazelrc`
option:

    startup --windows_enable_symlinks

This enables symlinks on Windows and helps with the bootstrap performance of setting up
the hermetic host Python interpreter on this platform. Linux and macOS users should see
no difference.
```

### Using a WORKSPACE file

To add pip dependencies to your `WORKSPACE`, load the `pip_parse` function and
call it to create the central external repo and individual wheel external repos.

```starlark
load("@rules_python//python:pip.bzl", "pip_parse")

# Create a central repo that knows about the dependencies needed from
# requirements_lock.txt.
pip_parse(
    name = "my_deps",
    requirements_lock = "//path/to:requirements_lock.txt",
)
# Load the starlark macro, which will define your dependencies.
load("@my_deps//:requirements.bzl", "install_deps")
# Call it to define repos for your requirements.
install_deps()
```

(vendoring-requirements)=
#### Vendoring the requirements.bzl file

In some cases you may not want to generate the `requirements.bzl` file as a repository rule
while Bazel is fetching dependencies. For example, if you produce a reusable Bazel module
such as a ruleset, you may want to include the `requirements.bzl` file rather than make your
users install the `WORKSPACE` setup to generate it.
See https://github.com/bazelbuild/rules_python/issues/608

This is the same workflow as Gazelle, which creates `go_repository` rules with
[`update-repos`](https://github.com/bazelbuild/bazel-gazelle#update-repos).

To do this, use the "write to source file" pattern documented in
https://blog.aspect.dev/bazel-can-write-to-the-source-folder
to put a copy of the generated `requirements.bzl` into your project.
Then load the `requirements.bzl` file directly rather than from the generated repository.
See the example in rules_python/examples/pip_parse_vendored.

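
A minimal sketch of this setup, assuming the `my_deps` hub repository from the `WORKSPACE`
example above exposes its `requirements.bzl` as a file target (as in the
`pip_parse_vendored` example); the target and output names are illustrative:

```starlark
# BUILD.bazel (sketch): produce a copy of the generated requirements.bzl so it
# can be written back into the source tree and loaded directly from there.
genrule(
    name = "vendor_requirements",
    srcs = ["@my_deps//:requirements.bzl"],
    outs = ["requirements.vendored.bzl"],
    cmd = "cp $< $@",
)
```

After building this target, copy the output into your repository (the `pip_parse_vendored`
example automates this and keeps the copy in sync with a `diff_test`), then `load()` the
vendored file instead of `@my_deps//:requirements.bzl`.
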
(per-os-arch-requirements)=
### Requirements for a specific OS/Architecture

In some cases you may need to use different requirements files for different OS and
architecture combinations. This is enabled via the `requirements_by_platform` attribute
of the `pip.parse` extension and the `pip_parse` repository rule. The keys of the
dictionary are labels of the requirements files and the values are comma-separated lists
of (os, arch) target tuples.

For example:
```starlark
    # ...
    requirements_by_platform = {
        "requirements_linux_x86_64.txt": "linux_x86_64",
        "requirements_osx.txt": "osx_*",
        "requirements_linux_exotic.txt": "linux_exotic",
        "requirements_some_platforms.txt": "linux_aarch64,windows_*",
    },
    # For the list of standard platforms that rules_python has toolchains for, default to
    # the following requirements file.
    requirements_lock = "requirements_lock.txt",
```

If a platform appears more than once, `rules_python` raises an error, because the
mapping of requirements files to (os, arch) tuples has to be unambiguous.

An alternative is to use the per-OS requirements attributes.
```starlark
    # ...
    requirements_windows = "requirements_windows.txt",
    requirements_darwin = "requirements_darwin.txt",
    # For the remaining platforms (which is effectively only the Linux OS), use this file.
    requirements_lock = "requirements_lock.txt",
)
```

### pip rules

Note that since `pip_parse` and `pip.parse` are executed at evaluation time,
Bazel has no information about the Python toolchain and cannot enforce that the
interpreter used to invoke `pip` matches the interpreter used to run
`py_binary` targets. By default, `pip_parse` uses the system command
`"python3"`. To override this, pass the `python_interpreter` or
`python_interpreter_target` attribute to `pip_parse`. The `pip.parse` bzlmod
extension uses the hermetic Python toolchain for the host platform by default.

You can have multiple `pip_parse`s in the same workspace, or use the pip
extension multiple times when using bzlmod. This configuration will create
multiple external repos that have no relation to one another and may result in
downloading the same wheels multiple times.

As with any repository rule, if you would like to ensure that `pip_parse` is
re-executed to pick up a non-hermetic change to your environment (e.g., updating
your system `python` interpreter), you can force it to re-execute by running
`bazel sync --only [pip_parse name]`.

{#using-third-party-packages}
## Using third party packages as dependencies

Each extracted wheel repo contains a `py_library` target representing
the wheel's contents. There are two ways to access this library. The
first uses the `requirement()` function defined in the central
repo's `//:requirements.bzl` file. This function maps a pip package
name to a label:

```starlark
load("@my_deps//:requirements.bzl", "requirement")

py_library(
    name = "mylib",
    srcs = ["mylib.py"],
    deps = [
        ":myotherlib",
        requirement("some_pip_dep"),
        requirement("another_pip_dep"),
    ],
)
```

The reason `requirement()` exists is to insulate users from
changes to the underlying repository and label strings. However, those
labels have become widely used directly, so they can't easily change regardless.

On the other hand, using `requirement()` has several drawbacks; see
[this issue][requirements-drawbacks] for an enumeration. If you don't
want to use `requirement()`, you can use the library
labels directly instead.

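
For example, the `py_library` above can reference the hub repository's labels directly
(a sketch reusing the `my_deps` hub name from the earlier examples; the package names
are illustrative):

```starlark
py_library(
    name = "mylib",
    srcs = ["mylib.py"],
    deps = [
        ":myotherlib",
        # Direct labels into the "my_deps" hub repo, instead of requirement().
        "@my_deps//some_pip_dep",
        "@my_deps//another_pip_dep",
    ],
)
```
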
For `pip_parse`, the labels are of the following form:

```starlark
@{name}//{package}
```

Here `name` is the `name` attribute that was passed to `pip_parse` and
`package` is the pip package name with characters that are illegal in
Bazel label names (e.g. `-`, `.`) replaced with `_`. If you need to
update `name` from "old" to "new", then you can run the following
buildozer command:

```shell
buildozer 'substitute deps @old//([^/]+) @new//${1}' //...:*
```

[requirements-drawbacks]: https://github.com/bazelbuild/rules_python/issues/414

### Entry points

If you would like to access [entry points][whl_ep], see the `py_console_script_binary` rule documentation,
which can help you create a `py_binary` target for a particular console script exposed by a package.

[whl_ep]: https://packaging.python.org/specifications/entry-points/

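
For example, a minimal sketch of exposing a console script from the `my_deps` hub (the
package and script names are illustrative; the load path follows the
`py_console_script_binary` documentation):

```starlark
load("@rules_python//python/entry_points:py_console_script_binary.bzl", "py_console_script_binary")

# Wraps the `flake8` console script from the `flake8` package in a py_binary.
py_console_script_binary(
    name = "flake8",
    pkg = "@my_deps//flake8",
)
```

You can then run the tool with `bazel run //:flake8`.
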
### 'Extras' dependencies

Any 'extras' specified in the requirements lock file will be automatically added
as transitive dependencies of the package. In the example above, you'd just depend
on the package itself, e.g. `requirement("useful_dep")` or `@pypi//useful_dep`.

### Consuming Wheel Dists Directly

If you need to depend on the wheel dists themselves, for instance, to pass them
to some other packaging tool, you can get a handle to them with the
`whl_requirement` macro. For example:

```starlark
load("@pypi//:requirements.bzl", "whl_requirement")

filegroup(
    name = "whl_files",
    data = [
        # This is equivalent to "@pypi//boto3:whl"
        whl_requirement("boto3"),
    ],
)
```

### Creating a filegroup of files within a whl

The rule {obj}`whl_filegroup` exists as an easy way to extract the necessary files
from a whl file without the need to modify the `BUILD.bazel` contents of the
whl repositories generated via `pip_repository`. Use it similarly to the `filegroup`
above. See the API docs for more information.

(advance-topics)=
## Advanced topics

(circular-deps)=
### Circular dependencies

Sometimes PyPI packages contain dependency cycles -- for instance, a particular
version of `sphinx` (this is no longer the case in the latest version as of
2024-06-02) depends on `sphinxcontrib-serializinghtml`. When using them as
`requirement()`s, as in

```
py_binary(
    name = "doctool",
    ...
    deps = [
        requirement("sphinx"),
    ],
)
```

Bazel will protest because it doesn't support cycles in the build graph --

```
ERROR: .../external/pypi_sphinxcontrib_serializinghtml/BUILD.bazel:44:6: in alias rule @pypi_sphinxcontrib_serializinghtml//:pkg: cycle in dependency graph:
    //:doctool (...)
    @pypi//sphinxcontrib_serializinghtml:pkg (...)
.-> @pypi_sphinxcontrib_serializinghtml//:pkg (...)
|   @pypi_sphinxcontrib_serializinghtml//:_pkg (...)
|   @pypi_sphinx//:pkg (...)
|   @pypi_sphinx//:_pkg (...)
`-- @pypi_sphinxcontrib_serializinghtml//:pkg (...)
```

The `experimental_requirement_cycles` argument allows you to work around these
issues by specifying groups of packages which form cycles. `pip_parse` will
transparently fix the cycles for you and provide the cyclic dependencies
simultaneously.

```starlark
pip_parse(
    ...
    experimental_requirement_cycles = {
        "sphinx": [
            "sphinx",
            "sphinxcontrib-serializinghtml",
        ]
    },
)
```

`pip_parse` supports fixing multiple cycles simultaneously; however, the cycles
must be distinct. `apache-airflow`, for instance, has dependency cycles with a
number of its optional dependencies, which means those optional dependencies must
all be a part of the `airflow` cycle. For instance --

```starlark
pip_parse(
    ...
    experimental_requirement_cycles = {
        "airflow": [
            "apache-airflow",
            "apache-airflow-providers-common-sql",
            "apache-airflow-providers-postgres",
            "apache-airflow-providers-sqlite",
        ]
    }
)
```

Alternatively, one could resolve the cycle by removing one leg of it.

For example, while `apache-airflow-providers-sqlite` is "baked into" the Airflow
package, `apache-airflow-providers-postgres` is not and is an optional feature.
Rather than listing `apache-airflow[postgres]` in your `requirements.txt`, which
would expose a cycle via the extra, one could instead depend on
`apache-airflow` and `apache-airflow-providers-postgres` _separately_ as
requirements. Bazel rules which need only `apache-airflow` can take it as a
dependency, and rules which explicitly want to mix in
`apache-airflow-providers-postgres` now can.

Alternatively, one could use `rules_python`'s patching features to remove one
leg of the dependency manually, for instance by making
`apache-airflow-providers-postgres` not explicitly depend on `apache-airflow` or
perhaps `apache-airflow-providers-common-sql`.

(bazel-downloader)=
### Bazel downloader and multi-platform wheel hub repository

The bzlmod `pip.parse` extension supports pulling information from PyPI (or a
compatible mirror) and ensures that the [Bazel
downloader][bazel_downloader] is used for downloading the wheels. This allows
users to authenticate with the mirror via the [credential helper](#credential-helper),
and it also ensures that the distribution downloads are cached. It also avoids
using `pip` altogether and results in much faster dependency fetching.

This can be enabled via `experimental_index_url` and related flags, as shown in
the {gh-path}`examples/bzlmod/MODULE.bazel` example.

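
A sketch of what enabling this might look like, extending the `pip.parse` call from the
bzlmod example earlier (the URL shown is the public PyPI Simple API index; adjust it for
your mirror):

```starlark
pip.parse(
    hub_name = "my_deps",
    python_version = "3.11",
    requirements_lock = "//:requirements_lock_3_11.txt",
    # Fetch metadata and wheels via the Bazel downloader from this index.
    experimental_index_url = "https://pypi.org/simple",
)
```
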
When using this feature, you will see the accessed indexes during the `pip`
extension evaluation, similar to the output below:

```console
Loading: 0 packages loaded
    currently loading: docs/
    Fetching module extension pip in @@//python/extensions:pip.bzl; starting
    Fetching https://pypi.org/simple/twine/
```

This does not mean that `rules_python` is fetching the wheels eagerly; rather, it
is calling the PyPI server to get the Simple API response listing all available
source and wheel distributions. Once it has all of the available distributions,
it selects the right ones based on the `sha256` values in your
`requirements_lock.txt` file. The compatible distribution URLs are then written
to the `MODULE.bazel.lock` file. Currently, users wishing to use the lock file
with this feature have to set the environment variable
`RULES_PYTHON_OS_ARCH_LOCK_FILE=0`, which will become the default in the next
release.

Fetching the distribution information from PyPI allows `rules_python` to
know which `whl` should be used on which target platform, and it determines
that by parsing the `whl` filename based on the [PEP600] and [PEP656] standards.
This allows the user to configure the behaviour by using the following publicly
available flags:
* {obj}`--@rules_python//python/config_settings:py_linux_libc` for selecting the Linux libc variant.
* {obj}`--@rules_python//python/config_settings:pip_whl` for selecting the `whl` distribution preference.
* {obj}`--@rules_python//python/config_settings:pip_whl_osx_arch` for selecting the macOS wheel preference.
* {obj}`--@rules_python//python/config_settings:pip_whl_glibc_version` for selecting the glibc version compatibility.
* {obj}`--@rules_python//python/config_settings:pip_whl_muslc_version` for selecting the musl version compatibility.
* {obj}`--@rules_python//python/config_settings:pip_whl_osx_version` for selecting the macOS version compatibility.

[bazel_downloader]: https://bazel.build/rules/lib/builtins/repository_ctx#download
[pep600]: https://peps.python.org/pep-0600/
[pep656]: https://peps.python.org/pep-0656/

(credential-helper)=
### Credential Helper

The "use Bazel downloader for python wheels" experimental feature includes support for the Bazel
[Credential Helper][cred-helper-design].

Your Python artifact registry may provide a credential helper for you. Refer to your index's docs
to see if one is provided.

See the [Credential Helper Spec][cred-helper-spec] for details.

[cred-helper-design]: https://github.com/bazelbuild/proposals/blob/main/designs/2022-06-07-bazel-credential-helpers.md
[cred-helper-spec]: https://github.com/EngFlow/credential-helper-spec/blob/main/spec.md

#### Basic example

The simplest form of a credential helper is a bash script that accepts an argument and writes
JSON to stdout. For a service like Google Artifact Registry that uses ['Basic' HTTP Auth][rfc7617]
and does not provide a credential helper that conforms to the [spec][cred-helper-spec], the script
might look like:

```bash
#!/bin/bash
# cred_helper.sh
ARG=$1 # but we don't do anything with it as it's always "get"

# formatting is optional
echo '{'
echo '  "headers": {'
echo '    "Authorization": ["Basic dGVzdDoxMjPCow=="]'
echo '  }'
echo '}'
```

Configure Bazel to use this credential helper for your Python index `example.com`:

```
# .bazelrc
build --credential_helper=example.com=/full/path/to/cred_helper.sh
```

Bazel will call this file like `cred_helper.sh get` and use the returned JSON to inject headers
into whatever HTTP(S) request it performs against `example.com`.

[rfc7617]: https://datatracker.ietf.org/doc/html/rfc7617

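
Per the [credential helper spec][cred-helper-spec], Bazel invokes the helper with the `get`
subcommand and passes the request as JSON on stdin, so you can sanity-check the script
locally with something like this (the URI is illustrative):

```console
$ echo '{"uri": "https://example.com/simple/requests/"}' | ./cred_helper.sh get
{
  "headers": {
    "Authorization": ["Basic dGVzdDoxMjPCow=="]
  }
}
```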