.. highlight:: c

***************************
Isolating Extension Modules
***************************

.. topic:: Abstract

   Traditionally, state belonging to Python extension modules was kept in C
   ``static`` variables, which have process-wide scope. This document
   describes problems of such per-process state and shows a safer way:
   per-module state.

   The document also describes how to switch to per-module state where
   possible. This transition involves allocating space for that state, potentially
   switching from static types to heap types, and, perhaps most
   importantly, accessing per-module state from code.


Who should read this
====================

This guide is written for maintainers of :ref:`C-API <c-api-index>` extensions
who would like to make that extension safer to use in applications where
Python itself is used as a library.


Background
==========

An *interpreter* is the context in which Python code runs. It contains
configuration (e.g. the import path) and runtime state (e.g. the set of
imported modules).

Python supports running multiple interpreters in one process. There are
two cases to think about. Users may run interpreters:

- in sequence, with several :c:func:`Py_InitializeEx`/:c:func:`Py_FinalizeEx`
  cycles, and
- in parallel, managing "sub-interpreters" using
  :c:func:`Py_NewInterpreter`/:c:func:`Py_EndInterpreter`.

Both cases (and combinations of them) would be most useful when
embedding Python within a library. Libraries generally shouldn't make
assumptions about the application that uses them, and that includes
assuming a process-wide "main Python interpreter".

Historically, Python extension modules have not handled this use case well.
Many extension modules (and even some stdlib modules) use *per-process*
global state, because C ``static`` variables are extremely easy to use.
Thus, data that should be specific to an interpreter ends up being shared
between interpreters. Unless the extension developer is careful, it is very
easy to introduce edge cases that lead to crashes when a module is loaded in
more than one interpreter in the same process.

Unfortunately, *per-interpreter* state is not easy to achieve. Extension
authors tend not to keep multiple interpreters in mind when developing,
and it is currently cumbersome to test the behavior.

Enter Per-Module State
----------------------

Instead of focusing on per-interpreter state, Python's C API is evolving
to better support the more granular *per-module* state.
This means that C-level data is attached to a *module object*.
Each interpreter creates its own module object, keeping the data separate.
For testing the isolation, multiple module objects corresponding to a single
extension can even be loaded in a single interpreter.

Per-module state provides an easy way to think about lifetime and
resource ownership: the extension module will initialize when a
module object is created, and clean up when it's freed. In this regard,
a module is just like any other :c:expr:`PyObject *`; there are no "on
interpreter shutdown" hooks to think (or forget) about.

Note that there are use cases for different kinds of "globals":
per-process, per-interpreter, per-thread or per-task state.
With per-module state as the default, these are still possible,
but you should treat them as exceptional cases:
if you need them, you should give them additional care and testing.
(Note that this guide does not cover them.)


Isolated Module Objects
-----------------------

The key point to keep in mind when developing an extension module is
that several module objects can be created from a single shared library.
For example:

.. code-block:: pycon

   >>> import sys
   >>> import binascii
   >>> old_binascii = binascii
   >>> del sys.modules['binascii']
   >>> import binascii  # create a new module object
   >>> old_binascii == binascii
   False

As a rule of thumb, the two modules should be completely independent.
All objects and state specific to the module should be encapsulated
within the module object, not shared with other module objects, and
cleaned up when the module object is deallocated.
Since this is just a rule of thumb, exceptions are possible
(see `Managing Global State`_), but they will need more
thought and attention to edge cases.

While some modules could do with less stringent restrictions, isolated
modules make it easier to set clear expectations and guidelines that
work across a variety of use cases.


Surprising Edge Cases
---------------------

Note that isolated modules do create some surprising edge cases. Most
notably, each module object will typically not share its classes and
exceptions with other similar modules. Continuing from the
`example above <Isolated Module Objects_>`__,
note that ``old_binascii.Error`` and ``binascii.Error`` are
separate objects. In the following code, the exception is *not* caught:

.. code-block:: pycon

   >>> old_binascii.Error == binascii.Error
   False
   >>> try:
   ...     old_binascii.unhexlify(b'qwertyuiop')
   ... except binascii.Error:
   ...     print('boo')
   ...
   Traceback (most recent call last):
     File "<stdin>", line 2, in <module>
   binascii.Error: Non-hexadecimal digit found

This is expected. Notice that pure-Python modules behave the same way:
it is a part of how Python works.

The goal is to make extension modules safe at the C level, not to make
hacks behave intuitively. Mutating ``sys.modules`` "manually" counts
as a hack.
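
To see where these separate classes come from, consider a minimal sketch of
an isolated module, assuming Python 3.10 or newer (for
``PyModule_AddObjectRef``). The module name ``errdemo`` and every identifier
in it are hypothetical. Its ``Py_mod_exec`` slot runs once for each new module
object, so every module object gets its own ``Error`` class, which it keeps
in per-module state (see `Managing Per-Module State`_):

.. code-block:: c

   #define PY_SSIZE_T_CLEAN
   #include <Python.h>

   /* Per-module state of the hypothetical "errdemo" module. */
   typedef struct {
       PyObject *error;  /* strong reference to this module object's Error class */
   } errdemo_state;

   static int
   errdemo_exec(PyObject *module)
   {
       errdemo_state *state = PyModule_GetState(module);
       if (state == NULL) {
           return -1;
       }
       /* A new exception class is created for every module object. */
       state->error = PyErr_NewException("errdemo.Error", NULL, NULL);
       if (state->error == NULL) {
           return -1;
       }
       /* Also expose it to Python code; this call does not steal a reference. */
       return PyModule_AddObjectRef(module, "Error", state->error);
   }

   static int
   errdemo_traverse(PyObject *module, visitproc visit, void *arg)
   {
       errdemo_state *state = PyModule_GetState(module);
       if (state != NULL) {
           Py_VISIT(state->error);
       }
       return 0;
   }

   static int
   errdemo_clear(PyObject *module)
   {
       errdemo_state *state = PyModule_GetState(module);
       if (state != NULL) {
           Py_CLEAR(state->error);
       }
       return 0;
   }

   static void
   errdemo_free(void *module)
   {
       /* Called on deallocation, which does not go through m_clear. */
       (void)errdemo_clear((PyObject *)module);
   }

   static PyModuleDef_Slot errdemo_slots[] = {
       {Py_mod_exec, errdemo_exec},
       {0, NULL},
   };

   static struct PyModuleDef errdemo_def = {
       PyModuleDef_HEAD_INIT,
       .m_name = "errdemo",
       .m_size = sizeof(errdemo_state),
       .m_slots = errdemo_slots,
       .m_traverse = errdemo_traverse,
       .m_clear = errdemo_clear,
       .m_free = errdemo_free,
   };

   PyMODINIT_FUNC
   PyInit_errdemo(void)
   {
       /* Multi-phase initialization: return the definition, not a module. */
       return PyModuleDef_Init(&errdemo_def);
   }

Deleting ``errdemo`` from ``sys.modules`` and importing it again, as in the
``binascii`` example above, would therefore give a module whose ``Error`` is a
different class from the old module's ``Error``.
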


Making Modules Safe with Multiple Interpreters
==============================================


Managing Global State
---------------------

Sometimes, the state associated with a Python module is not specific to that module, but
to the entire process (or something else "more global" than a module).
For example:

- The ``readline`` module manages *the* terminal.
- A module running on a circuit board wants to control *the* on-board
  LED.

In these cases, the Python module should provide *access* to the global
state, rather than *own* it. If possible, write the module so that
multiple copies of it can access the state independently (along with
other libraries, whether for Python or other languages). If that is not
possible, consider explicit locking.

If it is necessary to use process-global state, the simplest way to
avoid issues with multiple interpreters is to explicitly prevent a
module from being loaded more than once per process; see
`Opt-Out: Limiting to One Module Object per Process`_.


Managing Per-Module State
-------------------------

To use per-module state, use
:ref:`multi-phase extension module initialization <multi-phase-initialization>`.
This signals that your module supports multiple interpreters correctly.

Set ``PyModuleDef.m_size`` to a positive number to request that many
bytes of storage local to the module. Usually, this will be set to the
size of some module-specific ``struct``, which can store all of the
module's C-level state. In particular, it is where you should put
pointers to classes (including exceptions, but excluding static types)
and settings (e.g. ``csv``'s :py:func:`~csv.field_size_limit`)
which the C code needs to function.

.. note::
   Another option is to store state in the module's ``__dict__``,
   but you must avoid crashing when users modify ``__dict__`` from
   Python code.
   This usually means error- and type-checking at the C level,
   which is easy to get wrong and hard to test sufficiently.

   However, if module state is not needed in C code, storing it
   only in ``__dict__`` is a good idea.

If the module state includes ``PyObject`` pointers, the module object
must hold references to those objects and implement the module-level hooks
``m_traverse``, ``m_clear`` and ``m_free``. These work like
``tp_traverse``, ``tp_clear`` and ``tp_free`` of a class. Adding them will
require some work and make the code longer; this is the price for
modules which can be unloaded cleanly.

An example of a module with per-module state is currently available as
`xxlimited <https://github.com/python/cpython/blob/master/Modules/xxlimited.c>`__;
example module initialization is shown at the bottom of the file.


Opt-Out: Limiting to One Module Object per Process
--------------------------------------------------

A non-negative ``PyModuleDef.m_size`` signals that a module supports
multiple interpreters correctly. If this is not yet the case for your
module, you can explicitly make your module loadable only once per
process. For example::

    static int loaded = 0;

    static int
    exec_module(PyObject* module)
    {
        if (loaded) {
            PyErr_SetString(PyExc_ImportError,
                            "cannot load module more than once per process");
            return -1;
        }
        loaded = 1;
        // ... rest of initialization
        return 0;
    }


Module State Access from Functions
----------------------------------

Accessing the state from module-level functions is straightforward.
Functions get the module object as their first argument; for extracting
the state, you can use ``PyModule_GetState``::

    static PyObject *
    func(PyObject *module, PyObject *args)
    {
        my_struct *state = (my_struct*)PyModule_GetState(module);
        if (state == NULL) {
            return NULL;
        }
        // ... rest of logic
    }

.. note::
   ``PyModule_GetState`` may return ``NULL`` without setting an
   exception if there is no module state, i.e. ``PyModuleDef.m_size`` was
   zero. In your own module, you're in control of ``m_size``, so this is
   easy to prevent.


Heap Types
==========

Traditionally, types defined in C code are *static*; that is,
``static PyTypeObject`` structures defined directly in code and
initialized using ``PyType_Ready()``.

Such types are necessarily shared across the process. Sharing them
between module objects requires paying attention to any state they own
or access. To limit the possible issues, static types are immutable at
the Python level: for example, you can't set ``str.myattribute = 123``.

.. impl-detail::
   Sharing truly immutable objects between interpreters is fine,
   as long as they don't provide access to mutable objects.
   However, in CPython, every Python object has a mutable implementation
   detail: the reference count. Changes to the refcount are guarded by the GIL.
   Thus, code that shares any Python objects across interpreters implicitly
   depends on CPython's current, process-wide GIL.

Because they are immutable and process-global, static types cannot access
"their" module state.
If any method of such a type requires access to module state,
the type must be converted to a *heap-allocated type*, or *heap type*
for short. These correspond more closely to classes created by Python's
``class`` statement.

For new modules, using heap types by default is a good rule of thumb.


Changing Static Types to Heap Types
-----------------------------------

Static types can be converted to heap types, but note that
the heap type API was not designed for "lossless" conversion
from static types (that is, creating a type that works exactly like a given
static type).
So, when rewriting the class definition in a new API,
you are likely to unintentionally change a few details (e.g. pickleability
or inherited slots).
Always test the details that are important to you.

Watch out for the following two points in particular (but note that this is not
a comprehensive list):

* Unlike static types, heap type objects are mutable by default.
  Use the :c:data:`Py_TPFLAGS_IMMUTABLETYPE` flag to prevent mutability.
* Heap types inherit :c:member:`~PyTypeObject.tp_new` by default,
  so it may become possible to instantiate them from Python code.
  You can prevent this with the :c:data:`Py_TPFLAGS_DISALLOW_INSTANTIATION` flag.


Defining Heap Types
-------------------

Heap types can be created by filling a :c:struct:`PyType_Spec` structure, a
description or "blueprint" of a class, and calling
:c:func:`PyType_FromModuleAndSpec` to construct a new class object.

.. note::
   Other functions, like :c:func:`PyType_FromSpec`, can also create
   heap types, but :c:func:`PyType_FromModuleAndSpec` associates the module
   with the class, allowing access to the module state from methods.

The class should generally be stored in *both* the module state (for
safe access from C) and the module's ``__dict__`` (for access from
Python code).


Garbage-Collection Protocol
---------------------------

Instances of heap types hold a reference to their type.
This ensures that the type isn't destroyed before all its instances are,
but may result in reference cycles that need to be broken by the
garbage collector.

To avoid memory leaks, instances of heap types must implement the
garbage collection protocol.
That is, heap types should:

- Have the :c:data:`Py_TPFLAGS_HAVE_GC` flag.
- Define a traverse function using ``Py_tp_traverse``, which
  visits the type (e.g. using :c:expr:`Py_VISIT(Py_TYPE(self))`).
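
A minimal sketch of both requirements follows, assuming Python 3.10 or newer
(for ``Py_TPFLAGS_IMMUTABLETYPE``). The module ``gcdemo`` and its ``Box``
class are hypothetical. Besides the flag and the traverse function, the sketch
defines a ``Py_tp_dealloc`` slot that untracks the instance and then releases
the reference the instance held to its type. For brevity, the class is only
added to the module's ``__dict__``; a real module would usually also keep it
in module state, as recommended above:

.. code-block:: c

   #define PY_SSIZE_T_CLEAN
   #include <Python.h>

   /* Instances of the hypothetical heap type gcdemo.Box carry no extra data. */
   typedef struct {
       PyObject_HEAD
   } BoxObject;

   static int
   Box_traverse(PyObject *self, visitproc visit, void *arg)
   {
       /* Each instance owns a reference to its heap type; report it to the GC. */
       Py_VISIT(Py_TYPE(self));
       return 0;
   }

   static void
   Box_dealloc(PyObject *self)
   {
       PyTypeObject *tp = Py_TYPE(self);
       PyObject_GC_UnTrack(self);
       tp->tp_free(self);
       Py_DECREF(tp);  /* release the instance's reference to its type last */
   }

   static PyType_Slot Box_slots[] = {
       {Py_tp_traverse, Box_traverse},
       {Py_tp_dealloc, Box_dealloc},
       {0, NULL},
   };

   static PyType_Spec Box_spec = {
       .name = "gcdemo.Box",
       .basicsize = sizeof(BoxObject),
       .flags = Py_TPFLAGS_DEFAULT | Py_TPFLAGS_HAVE_GC | Py_TPFLAGS_IMMUTABLETYPE,
       .slots = Box_slots,
   };

   static int
   gcdemo_exec(PyObject *module)
   {
       PyObject *type = PyType_FromModuleAndSpec(module, &Box_spec, NULL);
       if (type == NULL) {
           return -1;
       }
       /* For brevity the class is only stored in the module's __dict__. */
       int rc = PyModule_AddType(module, (PyTypeObject *)type);
       Py_DECREF(type);
       return rc;
   }

   static PyModuleDef_Slot gcdemo_slots[] = {
       {Py_mod_exec, gcdemo_exec},
       {0, NULL},
   };

   static struct PyModuleDef gcdemo_def = {
       PyModuleDef_HEAD_INIT,
       .m_name = "gcdemo",
       .m_size = 0,
       .m_slots = gcdemo_slots,
   };

   PyMODINIT_FUNC
   PyInit_gcdemo(void)
   {
       return PyModuleDef_Init(&gcdemo_def);
   }

Because ``Box`` does not set ``Py_TPFLAGS_DISALLOW_INSTANTIATION``, it
inherits ``tp_new``, so ``gcdemo.Box()`` can be called from Python; the
resulting instance is tracked by the garbage collector.
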

Please refer to :ref:`the documentation <type-structs>` of
:c:data:`Py_TPFLAGS_HAVE_GC` and :c:member:`~PyTypeObject.tp_traverse`
for additional considerations.

If your traverse function delegates to the ``tp_traverse`` of its base class
(or another type), ensure that ``Py_TYPE(self)`` is visited only once.
Note that only heap types are expected to visit the type in ``tp_traverse``.

For example, if your traverse function includes::

    base->tp_traverse(self, visit, arg);

...and ``base`` may be a static type, then it should also include::

    if (base->tp_flags & Py_TPFLAGS_HEAPTYPE) {
        // a heap type's tp_traverse already visited Py_TYPE(self)
    } else {
        Py_VISIT(Py_TYPE(self));
    }

It is not necessary to handle the type's reference count in ``tp_new``
and ``tp_clear``.


Module State Access from Classes
--------------------------------

If you have a type object defined with :c:func:`PyType_FromModuleAndSpec`,
you can call :c:func:`PyType_GetModule` to get the associated module, and then
:c:func:`PyModule_GetState` to get the module's state.

To save some tedious error-handling boilerplate code, you can combine
these two steps with :c:func:`PyType_GetModuleState`, resulting in::

    my_struct *state = (my_struct*)PyType_GetModuleState(type);
    if (state == NULL) {
        return NULL;
    }


Module State Access from Regular Methods
----------------------------------------

Accessing the module-level state from methods of a class is somewhat more
complicated, but is possible thanks to an API introduced in Python 3.9.
To get the state, you need to first get the *defining class*, and then
get the module state from it.

The largest roadblock is getting *the class a method was defined in*, or
that method's "defining class" for short. The defining class can have a
reference to the module it is part of.
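
Before the details below, here is a minimal sketch that puts this section's
pieces together, assuming Python 3.10 or newer. The module ``pingdemo``, its
``Widget`` class and the ``ping`` method are hypothetical. ``ping`` uses the
``METH_METHOD | METH_FASTCALL | METH_KEYWORDS`` calling convention, so it
receives its defining class and reads a call counter from the state of that
class's module. To keep the sketch short, the class itself is stored only in
the module's ``__dict__``:

.. code-block:: c

   #define PY_SSIZE_T_CLEAN
   #include <Python.h>

   /* Per-module state of the hypothetical "pingdemo" module. */
   typedef struct {
       long calls;  /* zero-initialized when the module object is executed */
   } pingdemo_state;

   /* A PyCMethod: it receives the class the method was defined in. */
   static PyObject *
   Widget_ping(PyObject *self, PyTypeObject *defining_class,
               PyObject *const *args, Py_ssize_t nargs, PyObject *kwnames)
   {
       if (nargs != 0 || (kwnames != NULL && PyTuple_GET_SIZE(kwnames) != 0)) {
           PyErr_SetString(PyExc_TypeError, "ping() takes no arguments");
           return NULL;
       }
       /* defining_class is Widget even if type(self) is a Python subclass. */
       pingdemo_state *state = PyType_GetModuleState(defining_class);
       if (state == NULL) {
           return NULL;
       }
       state->calls++;
       return PyLong_FromLong(state->calls);
   }

   static PyMethodDef Widget_methods[] = {
       {"ping", (PyCFunction)(void (*)(void))Widget_ping,
        METH_METHOD | METH_FASTCALL | METH_KEYWORDS,
        PyDoc_STR("Increment and return the module-wide call counter.")},
       {NULL, NULL, 0, NULL},
   };

   static int
   Widget_traverse(PyObject *self, visitproc visit, void *arg)
   {
       Py_VISIT(Py_TYPE(self));  /* heap type instances visit their type */
       return 0;
   }

   static void
   Widget_dealloc(PyObject *self)
   {
       PyTypeObject *tp = Py_TYPE(self);
       PyObject_GC_UnTrack(self);
       tp->tp_free(self);
       Py_DECREF(tp);
   }

   static PyType_Slot Widget_slots[] = {
       {Py_tp_methods, Widget_methods},
       {Py_tp_traverse, Widget_traverse},
       {Py_tp_dealloc, Widget_dealloc},
       {0, NULL},
   };

   static PyType_Spec Widget_spec = {
       .name = "pingdemo.Widget",
       .basicsize = sizeof(PyObject),
       .flags = (Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE |
                 Py_TPFLAGS_HAVE_GC | Py_TPFLAGS_IMMUTABLETYPE),
       .slots = Widget_slots,
   };

   static int
   pingdemo_exec(PyObject *module)
   {
       /* Associates the new class with this module object. */
       PyObject *type = PyType_FromModuleAndSpec(module, &Widget_spec, NULL);
       if (type == NULL) {
           return -1;
       }
       int rc = PyModule_AddType(module, (PyTypeObject *)type);
       Py_DECREF(type);
       return rc;
   }

   static PyModuleDef_Slot pingdemo_slots[] = {
       {Py_mod_exec, pingdemo_exec},
       {0, NULL},
   };

   static struct PyModuleDef pingdemo_def = {
       PyModuleDef_HEAD_INIT,
       .m_name = "pingdemo",
       .m_size = sizeof(pingdemo_state),
       .m_slots = pingdemo_slots,
   };

   PyMODINIT_FUNC
   PyInit_pingdemo(void)
   {
       return PyModuleDef_Init(&pingdemo_def);
   }

For example, if Python code defines ``class Sub(pingdemo.Widget): pass`` in its
own module, ``Sub().ping()`` still updates ``pingdemo``'s counter, because the
state is reached through ``defining_class`` rather than through
``Py_TYPE(self)``.
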

Do not confuse the defining class with :c:expr:`Py_TYPE(self)`. If the method
is called on a *subclass* of your type, ``Py_TYPE(self)`` will refer to
that subclass, which may be defined in a different module than yours.

.. note::
   The following Python code can illustrate the concept.
   ``Base.get_defining_class`` returns ``Base`` even
   if ``type(self) == Sub``:

   .. code-block:: python

      class Base:
          def get_type_of_self(self):
              return type(self)

          def get_defining_class(self):
              return __class__

      class Sub(Base):
          pass

For a method to get its "defining class", it must use the
``METH_METHOD | METH_FASTCALL | METH_KEYWORDS``
:c:type:`calling convention <PyMethodDef>`
and the corresponding :c:type:`PyCMethod` signature::

    PyObject *PyCMethod(
        PyObject *self,               // object the method was called on
        PyTypeObject *defining_class, // defining class
        PyObject *const *args,        // C array of arguments
        Py_ssize_t nargs,             // length of "args"
        PyObject *kwnames)            // NULL, or tuple of keyword argument names

Once you have the defining class, call :c:func:`PyType_GetModuleState` to get
the state of its associated module.

For example::

    static PyObject *
    example_method(PyObject *self,
            PyTypeObject *defining_class,
            PyObject *const *args,
            Py_ssize_t nargs,
            PyObject *kwnames)
    {
        my_struct *state = (my_struct*)PyType_GetModuleState(defining_class);
        if (state == NULL) {
            return NULL;
        }
        // ... rest of logic
    }

    PyDoc_STRVAR(example_method_doc, "...");

    static PyMethodDef my_methods[] = {
        {"example_method",
          (PyCFunction)(void(*)(void))example_method,
          METH_METHOD|METH_FASTCALL|METH_KEYWORDS,
          example_method_doc},
        {NULL},
    };


Module State Access from Slot Methods, Getters and Setters
----------------------------------------------------------

.. note::

   This is new in Python 3.11.

   .. After adding to limited API:

         If you use the :ref:`limited API <stable>`,
         you must update ``Py_LIMITED_API`` to ``0x030b0000``, losing ABI
         compatibility with earlier versions.

Slot methods (the fast C equivalents for special methods, such as
:c:member:`~PyNumberMethods.nb_add` for :py:attr:`~object.__add__` or
:c:member:`~PyTypeObject.tp_new` for initialization) have a very simple API that
doesn't allow passing in the defining class, unlike with :c:type:`PyCMethod`.
The same goes for getters and setters defined with
:c:type:`PyGetSetDef`.

To access the module state in these cases, use the
:c:func:`PyType_GetModuleByDef` function, and pass in the module definition.
Once you have the module, call :c:func:`PyModule_GetState`
to get the state::

    PyObject *module = PyType_GetModuleByDef(Py_TYPE(self), &module_def);
    if (module == NULL) {
        return NULL;
    }
    my_struct *state = (my_struct*)PyModule_GetState(module);
    if (state == NULL) {
        return NULL;
    }

``PyType_GetModuleByDef`` works by searching the
:term:`method resolution order` (i.e. all superclasses) for the first
superclass that has a corresponding module.

.. note::

   In very exotic cases (inheritance chains spanning multiple modules
   created from the same definition), ``PyType_GetModuleByDef`` might not
   return the module of the true defining class. However, it will always
   return a module with the same definition, ensuring a compatible
   C memory layout.


Lifetime of the Module State
----------------------------

When a module object is garbage-collected, its module state is freed.
For each pointer to (a part of) the module state, you must hold a reference
to the module object.

Usually this is not an issue, because types created with
:c:func:`PyType_FromModuleAndSpec`, and their instances, hold a reference
to the module.
However, you must be careful with reference counting when you reference
module state from other places, such as callbacks for external
libraries.


Open Issues
===========

Several issues around per-module state and heap types are still open.

Discussions about improving the situation are best held on the `capi-sig
mailing list <https://mail.python.org/mailman3/lists/capi-sig.python.org/>`__.


Per-Class Scope
---------------

It is currently (as of Python 3.11) not possible to attach state to individual
*types* without relying on CPython implementation details (which may change
in the future; perhaps, ironically, to allow a proper solution for
per-class scope).


Lossless Conversion to Heap Types
---------------------------------

The heap type API was not designed for "lossless" conversion from static types;
that is, creating a type that works exactly like a given static type.