PythonExtensions

From HerzbubeWiki

This page provides information about writing Python extensions in C/C++. I compiled and used the information to implement the python-aprmd5 extension.

A lot of stuff on this page has been copied practically verbatim from the Python docs.


References

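In the Python docs:

  • Extending and Embedding the Python Interpreter (tutorial, including the chapter "Extending Python with C or C++")
  • Python/C API Reference Manual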


Template of an extension's .c file

// ---------------------------------------------------------------------------
// Python includes
// ---------------------------------------------------------------------------

// If this is defined, PyArg_ParseTuple() will use Py_ssize_t rather than int
// when it encounters the format string "s#".
// As per Python docs: "It is best to always define PY_SSIZE_T_CLEAN" because
// in some future version of Python, support for int will be dropped completely.
#define PY_SSIZE_T_CLEAN

// As per Python docs: This must be included before any standard headers are
// included, because Python may define some pre-processor definitions which
// affect the standard headers on some systems.
#include <Python.h>


// ---------------------------------------------------------------------------
// Normal includes
// ---------------------------------------------------------------------------
[...]


// ---------------------------------------------------------------------------
// Module functions
// ---------------------------------------------------------------------------
[...]

// ---------------------------------------------------------------------------
// The method table: List methods in this module.
// ---------------------------------------------------------------------------
static PyMethodDef <Modulename>Methods[] =
{
  {
    "foo", <modulename>_foo, METH_VARARGS,  // METH_VARARGS means the function should expect parameters to be passed
    "Short method description"              // in as a tuple acceptable for parsing via PyArg_ParseTuple
  },

  {
    "bar", (PyCFunction) <modulename>_bar, METH_VARARGS | METH_KEYWORDS,  // Keyword arguments are passed to the function; the
    "Short method description"                                            // function needs a 3rd parameter, hence the cast
  },

  [...]  // more methods

  {NULL, NULL, 0, NULL}   // Sentinel
};


// ---------------------------------------------------------------------------
// The module definition structure.
// Note: This exists only for Py3k
// ---------------------------------------------------------------------------
#if PY_MAJOR_VERSION >= 3

static struct PyModuleDef <modulename>module =
{
  PyModuleDef_HEAD_INIT,
  "<modulename>",      // name of module
  NULL,                // module documentation, may be NULL
  -1,                  // size of per-interpreter state of the module
                       // or -1 if the module keeps state in global variables
  <Modulename>Methods  // reference to the method table (see above)
};

#endif  // #if PY_MAJOR_VERSION >= 3


// ---------------------------------------------------------------------------
// The module’s initialization function. The initialization function must be
// named PyInit_name (Python 3) or initname (Python 2), where name is the name
// of the module, and should be the only non-static item defined in the module
// file. The function is called when the Python program imports the module for
// the first time.
// ---------------------------------------------------------------------------
PyMODINIT_FUNC

#if PY_MAJOR_VERSION >= 3

PyInit_<modulename>(void)
{
  return PyModule_Create(&<modulename>module);
}


#else   // #if PY_MAJOR_VERSION >= 3

init<modulename>(void)
{
  (void) Py_InitModule("<modulename>", <Modulename>Methods);
}

#endif  // #if PY_MAJOR_VERSION >= 3


Function definitions

Unless it expects parameters to be passed in as keyword arguments, a Python C function always has two arguments, conventionally named self and args:

static PyObject*
<modulename>_foo(PyObject* self, PyObject* args)
{
  // implementation
}

Notes:

  • self points to the module object for module-level functions (in Python 2 it is a NULL pointer). When the C function implements a method, self points to the object instance
  • args is a pointer to a Python tuple object containing the arguments. Each item of the tuple corresponds to an argument in the call’s argument list. The arguments are Python objects — in order to do anything with them in our C function we have to convert them to C values. The function PyArg_ParseTuple in the Python API checks the argument types and converts them to C values. It uses a template string to determine the required types of the arguments as well as the types of the C variables into which to store the converted values.
  • PyObject* is a pointer to an opaque data type representing an arbitrary Python object. Almost all Python objects live on the heap: you never declare an automatic or static variable of type PyObject, only pointer variables of type PyObject* can be declared. The sole exception is the type objects; since these must never be deallocated, they are typically static PyTypeObject objects.


NULL pointers

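Python generally does not pass NULL pointers to your functions. In particular, the argument list args is never NULL (it is always a tuple), so your functions do not need to check it.

Pointers that your function receives from calls it makes are a different matter: test for NULL at the source, i.e. whenever a pointer comes back from a function that may fail, such as malloc or a Python API function that may raise an exception.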


Responsibility for Python objects

Python uses reference counting to prevent memory leaks. There are two macros, Py_INCREF(x) and Py_DECREF(x), which handle the incrementing and decrementing of the reference count. Py_DECREF also frees the object when the count reaches zero.

Conventions/definitions/rules:

  • Nobody "owns" an object
  • Instead, a reference to an object is owned
  • An object's reference count is defined as "the number of owned references to it"
  • The owner of a reference is responsible for calling Py_DECREF when the reference is no longer needed
  • Ownership of a reference can be transferred
  • There are three ways to dispose of an owned reference:
    • Pass it on
    • Store it
    • Call Py_DECREF
  • Forgetting to dispose of an owned reference creates a memory leak
  • A reference can be borrowed from its owner
    • The borrower of a reference should not call Py_DECREF
    • The borrower must not hold on to the object longer than the owner from which it was borrowed
    • A borrowed reference can be changed into an owned reference by calling Py_INCREF. This does not affect the status of the owner from which the reference was borrowed — it creates a new owned reference
  • General ownership rules:
    • Most functions that return a reference to an object pass on ownership with the reference, i.e. they have increased the reference count but do not decrease it prior to returning. In particular, a C function called from Python must return an owned reference to its caller
    • When you pass an object reference into another function, in general, the function borrows the reference from you
    • When a C function is called from Python, it borrows references to its arguments from the caller. The caller owns a reference to the object, so the borrowed reference’s lifetime is guaranteed until the function returns
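  • A few API functions deviate from these rules: PyTuple_SetItem and PyList_SetItem take over ("steal") ownership of the item passed to them, while PyTuple_GetItem, PyList_GetItem, PyDict_GetItem and PyDict_GetItemString return borrowed references. See the "Ownership rules" section in the Python docs' "Extending and Embedding" tutorial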


Handling parameters and return values

PyArg_ParseTuple

Function signature:

int PyArg_ParseTuple(PyObject *arg, const char *format, ...);

Parameters and return value:

  • arg must be a tuple object containing an argument list passed from Python to a C function.
  • format must be a format string, whose syntax is explained further down
  • The remaining arguments must be addresses of variables whose type is determined by the format string
  • PyArg_ParseTuple returns true (nonzero) if all arguments have the right type and their components have been stored in the variables whose addresses are passed. It returns false (zero) if an invalid argument list was passed. In the latter case it also raises an appropriate exception (e.g. PyExc_TypeError) so the calling function can return NULL immediately.


Note: Any Python object references which are provided to the caller are borrowed references; do not decrement their reference count!


Format strings:

  • See: Python/C API Reference Manual -> Utilities -> Parsing arguments and building values


Examples:

int ok;
int i, j;
long k, l;
const char *s;
Py_ssize_t size;

// No arguments. Python call: f()
ok = PyArg_ParseTuple(args, "");
// A string. Possible Python call: f('whoops!')
ok = PyArg_ParseTuple(args, "s", &s);
// Two longs and a string. Possible Python call: f(1, 2, 'three')
ok = PyArg_ParseTuple(args, "lls", &k, &l, &s);
// A pair of ints and a string, whose size is also returned
// Possible Python call: f((1, 2), 'three')
ok = PyArg_ParseTuple(args, "(ii)s#", &i, &j, &s, &size);
// A string, and optionally another string and an integer
// Possible Python calls:
//   f('spam')
//   f('spam', 'w')
//   f('spam', 'wb', 100000)
const char *file;
const char *mode = "r";
int bufsize = 0;
ok = PyArg_ParseTuple(args, "s|si", &file, &mode, &bufsize);
// A rectangle and a point
// Possible Python call:
//   f(((0, 0), (400, 300)), (10, 10))
int left, top, right, bottom, h, v;
ok = PyArg_ParseTuple(args, "((ii)(ii))(ii)", &left, &top, &right, &bottom, &h, &v);
// A complex, also providing a function name for errors
// Possible Python call: myfunction(1+2j)
Py_complex c;
ok = PyArg_ParseTuple(args, "D:myfunction", &c);


Py_BuildValue

Function signature:

PyObject *Py_BuildValue(const char *format, ...);

Parameters and return value:

  • format is a set of format units similar to the ones recognized by PyArg_ParseTuple
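  • The remaining arguments (which are input to the function, not output) are passed as plain values, not as addresses of variables as with PyArg_ParseTuple
  • Py_BuildValue returns a new Python object, suitable for returning from a C function called from Python. If an error occurs it returns NULL and an exception is set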

Examples:

Py_BuildValue("")                        None
Py_BuildValue("i", 123)                  123
Py_BuildValue("iii", 123, 456, 789)      (123, 456, 789)
Py_BuildValue("s", "hello")              'hello'
Py_BuildValue("y", "hello")              b'hello'
Py_BuildValue("ss", "hello", "world")    ('hello', 'world')
Py_BuildValue("s#", "hello", 4)          'hell'
Py_BuildValue("y#", "hello", 4)          b'hell'
Py_BuildValue("()")                      ()
Py_BuildValue("(i)", 123)                (123,)
Py_BuildValue("(ii)", 123, 456)          (123, 456)
Py_BuildValue("(i,i)", 123, 456)         (123, 456)
Py_BuildValue("[i,i]", 123, 456)         [123, 456]
Py_BuildValue("{s:i,s:i}",
              "abc", 123, "def", 456)    {'abc': 123, 'def': 456}
Py_BuildValue("((ii)(ii)) (ii)",
              1, 2, 3, 4, 5, 6)          (((1, 2), (3, 4)), (5, 6))


Function without a return value

If you have a C function that returns no useful value (a function returning void), the corresponding Python function must return None. Do so with the following idiom, or with the Py_RETURN_NONE macro, which implements it:

Py_INCREF(Py_None);
return Py_None;


Errors and exceptions

Overview

An important convention throughout the Python interpreter is the following: when a function fails, it should set an exception condition and return an error value (usually a NULL pointer). Exceptions are stored in a static global variable inside the interpreter; if this variable is NULL no exception has occurred. A second global variable stores the “associated value” of the exception (the second argument to raise). A third variable contains the stack traceback in case the error originated in Python code. These three variables are the C equivalents of the result in Python of sys.exc_info(). It is important to know about them to understand how errors are passed around.


Handling errors and exceptions

When a function f() calls another function g() and then detects that g() has failed, f() should itself return an error value (usually NULL or -1). It should not call one of the PyErr_* functions, because g() has already called one. Once the error reaches the Python interpreter's main loop, the interpreter aborts the currently executing Python code and tries to find an exception handler specified by the Python programmer. Note: There are a few legitimate exceptions to this rule (for example, replacing the error with a more detailed one), but deviate from it only if you know exactly why it is allowed.


You can test non-destructively whether an exception has been set with PyErr_Occurred. This returns the current exception object, or NULL if no exception has occurred. You normally don’t need to call PyErr_Occurred to see whether an error occurred in a function call, since you should be able to tell from the return value.


To ignore an exception set by a function call that failed, the exception condition must be cleared explicitly by calling PyErr_Clear. The only time C code should call PyErr_Clear is if it doesn’t want to pass the error on to the interpreter but wants to handle it completely by itself (possibly by trying something else, or pretending nothing went wrong).


Setting errors and exceptions

The Python API defines a number of functions to set various types of exceptions. You don't need to increment the refcount of the objects passed to any of these functions. A few examples:

  • PyErr_SetString: Its arguments are an exception object and a C string. The exception object is usually a predefined object like PyExc_ZeroDivisionError. The C string indicates the cause of the error and is converted to a Python string object and stored as the “associated value” of the exception.
  • PyErr_SetFromErrno: It only takes an exception argument and constructs the associated value by inspection of the global variable errno.
  • PyErr_SetObject: The most general function. It takes two object arguments, the exception and its associated value.


malloc note: Every failing malloc call must be turned into an exception — the direct caller of malloc (or realloc) must call PyErr_NoMemory and return a failure indicator itself. All the object-creating functions (for example, PyLong_FromLong) already do this, so this note is only relevant to those who call malloc directly.


There are predeclared C objects corresponding to all built-in Python exceptions. Examples that you can use:

  • PyExc_ZeroDivisionError
  • PyExc_ValueError: If an argument value must be in a particular range or must satisfy other conditions
  • PyExc_TypeError: Usually raised automatically by PyArg_ParseTuple when an argument has the wrong type


Calling Python methods from C

Read the corresponding section in the tutorial "Extending Python with C or C++".


Defining new types

For information that goes beyond the stuff in this chapter, see the section "Defining New Types" in the tutorial "Extending Python with C or C++".

TODO

  • add templates
  • write about the following template
  // Allows new objects to be created
  aprmd5_md5Type.tp_new = PyType_GenericNew;
  • Mention that PyObject_New and PyObject_Del are usually default-assigned to tp_alloc and tp_dealloc