LearningPython

The purpose of this page is to keep my notes about my effort to learn the Python programming language. I doubt very much that this page is of any use to somebody besides myself. If you are not myself, you had probably better look at one of the tutorials listed in the "References" section.

Why Python?

The first job of my working career thoroughly taught me shell and awk script programming. Although I was (and am) aware that these script languages are just not suitable for some tasks, I never got around to learning another interpreted programming language that would fill the gap between shell scripts and C++ or Objective-C. I was forced to familiarize myself with Perl at some point, but never got along with it, so that doesn't count. When I learned of the existence of Ruby, I immediately intended to have a look at it, but somehow there were always other, more important things to do.

Now there is Python, which offers itself as another interesting candidate for the "stop-gap" role :-) ... And since I have just started to become involved with ISFDB (whose programs are written in Python), I finally have a reason to get acquainted with something new.

So let's get started...

References

Tutorials:

python.org: http://docs.python.org/tutorial/
Dive Into Python: http://diveintopython.org/

From python.org:

List of beginner's resources: http://wiki.python.org/moin/BeginnersGuide/Programmers
String handling: http://docs.python.org/library/stdtypes.html#string-methods
Style guide for Python code: http://www.python.org/dev/peps/pep-0008/
Docstring conventions: http://www.python.org/dev/peps/pep-0257/
Unit testing framework: http://www.python.org/doc/current/library/unittest.html
Installing Python modules: http://www.python.org/doc/current/install/index.html
Distributing Python modules: http://www.python.org/doc/current/distutils/index.html

From wikipedia.org:

Glossary

Also consult Python's own glossary.

PEP: Python Enhancement Proposal (see this index)
class object: when the interpreter has finished executing the statements of a class definition, a class object is created; the object can be referenced using the class name
class attribute: any name in the class object, i.e. both "variables" and functions
class instance object: the instance of a class
object: general term, i.e. there are other types of objects than just class instance objects (e.g. list objects)
data attribute: a variable that "belongs to" an object
method: a function that "belongs to" an object
function object: if a class MyClass defines a function foo(), the following refers to a function object: MyClass.foo
method object: if myObject is an instance of MyClass (see above), the following refers to a method object: myObject.foo
kwarg: keyword argument
PyPI: Python Package Index (sometimes also known as "The Cheese Shop")

Coding Python

First impressions

Statements are grouped by indentation. Seems yucky!
No variable or argument declarations are necessary. Is this a good thing? Is there a strict mode?
No char type, single characters are simply strings of length 1. Good!
Seems to have good Unicode support
Data types: integers, floating points, complex numbers, strings, lists
Strings are immutable, lists are not
Slices are useful to access parts of strings and lists
Right-hand side of an assignment is evaluated before any assignment takes place (important for multiple assignment)
Zero = false, non-zero = true; empty sequence = false, non-empty sequence = true
There is the concept of sequences - lists are sequences

Variables

Declare a variable by assigning it a value:

foo = 'bar'

Remove the variable (i.e. unbind the name) with the del statement:

del foo

Data types

Boolean

There are two boolean constants. Note: Case is important.

True
False

Sequences

There are six sequence types:

strings (immutable)
Unicode strings (immutable)
lists (mutable)
tuples (immutable)
buffers (immutable)
xrange objects.

Sequence elements are indexed by integers, starting at 0 for the first element; negative indices count from the end.

The in and not in keywords test whether or not a sequence contains a certain value:

a = ['cat', 'window', 'defenestrate']
if 'window' in a:
  print('is in list')

if 'n' not in ('y', 'ye', 'yes'):
  print('is not in list')

See http://www.python.org/doc/current/library/stdtypes.html#string-methods for useful stuff that you can do with sequence types.

Lists

A list is a mutable sequence type.

list1 = [12345, 54321, 'hello!']
# Refer to single list elements by index position
element = list1[2]
# Refer to list elements by slice notation (results in another list); in this example, list2 refers to [54321]
list2 = list1[1:2]
# Empty list
list3 = list()
# Two equivalent ways to append the elements of one list (list2) to another list (list1)
list1.extend(list2)
list1[len(list1):] = list2
# Append a single element to a list
list1.append('world')
# Remove the first occurrence of an element from a list, in place. The element must exist, otherwise a ValueError is raised.
list1.remove(12345)
# Counting list members
len(list1)
list1.count('world')
# Comparing lists
list1 = [12345, 54321]
list2 = [54321, 12345]
assert list1 != list2
assert sorted(list1) == sorted(list2)
list1.sort()
list2.sort()
assert list1 == list2
# Copying a list (shallow copy)
list2 = list1[:]
# Testing item presence
if 12345 in list1:
    pass  # do something
if 12345 not in list1:
    pass  # do something

Tuples

A tuple is an immutable sequence type. It consists of a number of values separated by commas:

t = 12345, 54321, 'hello!'
# Tuples may be nested:
u = t, (1, 2, 3, 4, 5)
# Empty tuple
empty = ()
# Tuple with 1 element needs a trailing comma
singleton = 'hello',

Sets

A set is an unordered collection with no duplicate elements.

basket = ['apple', 'orange', 'apple', 'pear', 'orange', 'banana']
fruit = set(basket)               # create a set without duplicates
a = set('abracadabra')
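Sets also support fast membership tests and the mathematical operations union, intersection, difference and symmetric difference. A short sketch (b is a second set added for illustration; the results are passed through sorted() because sets are unordered):

```python
basket = ['apple', 'orange', 'apple', 'pear', 'orange', 'banana']
fruit = set(basket)            # duplicates are dropped
print('orange' in fruit)       # membership test → True

a = set('abracadabra')         # unique letters: a, b, c, d, r
b = set('alacazam')            # unique letters: a, c, l, m, z
print(sorted(a - b))           # difference, in a but not in b → ['b', 'd', 'r']
print(sorted(a | b))           # union, in a or b → ['a', 'b', 'c', 'd', 'l', 'm', 'r', 'z']
print(sorted(a & b))           # intersection, in both → ['a', 'c']
print(sorted(a ^ b))           # symmetric difference, in exactly one → ['b', 'd', 'l', 'm', 'r', 'z']
```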

Dictionaries

A dictionary is a set of key/value pairs, with the requirement that the keys are unique. Dictionaries are unordered (since Python 3.7 they do preserve insertion order). A key can be any immutable (more precisely: hashable) type:

Strings and numbers can always be keys
Tuples can be used as keys if they contain only strings, numbers, or tuples; if a tuple contains any mutable object either directly or indirectly, it cannot be used as a key
Lists cannot be used as keys because lists can be modified in place

tel = {'jack': 4098, 'sape': 4139}
# Add an entry
tel['guido'] = 4127
# Use dict() to build dictionary from lists of key-value pairs stored as tuples
dict([('jack', 4098), ('sape', 4139), ('guido', 4127)])
# Use dict() with keyword arguments (keys are strings)
dict(jack=4098, sape=4139, guido=4127)
# Return the dictionary keys or values (a list in Python 2, a view object in Python 3)
tel.keys()
tel.values()
# Comparing dictionaries (the order of the entries does not matter)
dict1 = {"a": 17, "b": 42}
dict2 = {"b": 42, "a": 17}
assert dict1 == dict2
# Copying a dictionary
dict2 = dict1.copy()   # shallow copy (sufficient if values are immutable)
import copy
dict2 = copy.deepcopy(dict1)   # deep copy (e.g. if values are mutable, such as lists)
# Testing key presence
if "a" in dict1:
    pass  # do something
if "a" not in dict1:
    pass  # do something

Strings

References

Introduction: http://www.python.org/doc/current/tutorial/introduction.html#strings
String methods: http://www.python.org/doc/current/library/stdtypes.html#string-methods
String services: http://www.python.org/doc/current/library/string.html

Literals

String literals:

can be enclosed in single or double quotes
multi-line strings when using single or double quotes must use a backslash ("\") to indicate line continuation
"\n" indicates newlines
the following example defines a raw string literal where the backslash loses its special properties:

foo = r"one two \n three"

a string literal can also be enclosed in triple quotes (this is used e.g. in docstrings); there is no need for backslashes to indicate line continuation or newlines
the following example defines a Unicode string literal using the Unicode character with ordinal value 0x0020 (= a space); note that the interpretation of the "ä" character depends entirely on the encoding of the source file that contains the literal: if the file's actual encoding (e.g. latin-1) does not match the encoding declared for the file (see "Source code file encoding" below), the result will be incorrect!

foo = u"Patrick\u0020Näf"

Conversion

The str() function converts its argument into a string according to the string conversion rules specific to the argument's type. TODO: Find exact definitions, e.g. for numeric values, for class instance objects.
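A sketch of the rules for the common cases, assuming Python 3: for numbers, str() returns the usual decimal notation; for class instance objects, str() calls the object's __str__() method and, if the class does not define one, falls back to __repr__(). The classes Point and Bare are hypothetical:

```python
class Point:
    """Hypothetical class that defines both conversion hooks."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def __str__(self):
        # Used by str() and print(): the "informal", readable form
        return "(%s, %s)" % (self.x, self.y)

    def __repr__(self):
        # Used by repr() and the interactive prompt: an unambiguous form
        return "Point(%r, %r)" % (self.x, self.y)


class Bare:
    """Hypothetical class without __str__(): str() falls back to __repr__()."""

    def __repr__(self):
        return "Bare()"


print(str(42))             # → 42
print(str(2.5))            # → 2.5
print(str(Point(1, 2)))    # → (1, 2)
print(repr(Point(1, 2)))   # → Point(1, 2)
print(str(Bare()))         # → Bare()
```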

Operations

concatenation using the "+" operator
length using the "len()" function
subscription using the [] operator
- "foo[0]" refers to the first character
- "foo[-1]" refers to the last character
- "foo[2:5]" refers to characters at index positions 2-4
- "foo[:5]" refers to characters at index positions 0-4
- "foo[5:]" refers to characters from index position 5 until end-of-string
- "foo[-2:]" refers to the last two characters
splitting into parts using the split() method: "a,b,c".split(",") # results in the list ['a', 'b', 'c']
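The operations above, applied to a sample string (the value of foo is made up for illustration):

```python
foo = "Python rocks"

print("Py" + "thon")          # concatenation → Python
print(len(foo))               # length → 12
print(foo[0])                 # first character → P
print(foo[-1])                # last character → s
print(foo[2:5])               # index positions 2-4 → tho
print(foo[:5])                # index positions 0-4 → Pytho
print(foo[5:])                # from index 5 to the end → n rocks
print(foo[-2:])               # last two characters → ks
print("a,b,c".split(","))     # → ['a', 'b', 'c']

# Strings are immutable, so foo[0] = "J" raises a TypeError;
# build a new string instead
bar = "J" + foo[1:]           # → Jython rocks
```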

Flow control

if

if x < 0:
  x = 0
  print('Negative changed to zero')
elif x == 0:
  print('Zero')
elif x == 1:
  print('Single')
else:
  print('More')

for

Python's for statement iterates over the items of any sequence (more generally, of any iterable):

a = ['cat', 'window', 'defenestrate']
for x in a:
  print(x, len(x))

It is not safe to modify the sequence being iterated over in the loop. Instead, iterate over a copy, e.g. with a slice:

for x in a[:]: # make a slice copy of the entire list
  if len(x) > 6: a.insert(0, x)

To iterate over the indices of a sequence, combine range() and len() as follows:

a = ['Mary', 'had', 'a', 'little', 'lamb']
for i in range(len(a)):
  print(i, a[i])
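An alternative to combining range() and len() is the built-in enumerate(), which yields (index, item) pairs directly:

```python
a = ['Mary', 'had', 'a', 'little', 'lamb']
for i, word in enumerate(a):
    print(i, word)             # same output as the range(len(a)) loop

# The optional second argument sets the start value of the counter
numbered = list(enumerate(a, 1))
print(numbered[:2])            # → [(1, 'Mary'), (2, 'had')]
```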

while

a, b = 0, 1
while b < 10:
  print(b)
  a, b = b, a+b

break, continue, else on loops

break and continue work as expected.

Loop statements may have an else clause; it is executed when the loop terminates through exhaustion of the list (with for) or when the condition becomes false (with while), but not when the loop is terminated by a break statement.

for n in range(2, 10):
  for x in range(2, n):
    if n % x == 0:
      print(n, 'equals', x, '*', n//x)
      break
  else:
    # loop fell through without finding a factor
    print(n, 'is a prime number')

Logical and other operators

# Logical operators
if x < 0 and y > 42:
  print('and')
if x < 0 or y > 42:
  print('or')
if not (x < 0 and y > 42):
  print('not')

# Membership operators
if x in range(2, 10):
  print('in')
if x not in range(2, 10):
  print('not in')
 
# Identity operators
if x is y:
  print('is')
if x is not y:
  print('is not')

# Arithmetic operators
x = 50 % 42      # modulus; result is 8
x = 2 ** 8       # exponent; result is 256
x = 9 // 2       # floor division; result is 4
y = 9.0 // 2.0   # floor division; result is 4.0
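The identity operators compare object identity, not value; this is the difference between is and ==. A small sketch:

```python
x = [1, 2, 3]
y = [1, 2, 3]          # equal value, but a different list object
z = x                  # a second reference to the same object as x

print(x == y)          # → True  (same value)
print(x is y)          # → False (different objects)
print(x is z)          # → True  (same object)

# The idiomatic test for the null object uses "is"
result = None
print(result is None)  # → True
```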

Functions

Use def to start a function definition:

def foobar(n, paramwithdefvalue=17):
  # do something with n and paramwithdefvalue
  return 42

# Call function
result = foobar(2000)

Local variables (symbols) shadow global variables (symbols). Global variables can be read from within a function, but cannot be assigned to (rebound) unless they are declared with the global statement.
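A sketch of reading, shadowing and rebinding a global variable (the names are made up for illustration):

```python
counter = 0          # global variable

def read_counter():
    # Reading a global variable works without any declaration
    return counter

def increment():
    # Without this line, counter would be treated as a local variable
    # and "counter += 1" would raise an UnboundLocalError
    global counter
    counter += 1

def shadow():
    counter = 99     # local variable; the global counter is not changed
    return counter

increment()
increment()
print(read_counter())   # → 2
print(shadow())         # → 99
print(counter)          # → 2
```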

Parameters are passed using "call by [object] reference". If the parameter is a mutable object, changing the object will let the caller see those changes.
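What this means in practice: mutating a mutable argument is visible to the caller, while assigning a new object to the parameter name only rebinds the local name. The function names are hypothetical:

```python
def append_item(items):
    # Mutates the object the caller passed in: the caller sees the change
    items.append(4)

def rebind(items):
    # Rebinds the local name only: the caller's object is untouched
    items = [0]
    return items

numbers = [1, 2, 3]
append_item(numbers)
print(numbers)    # → [1, 2, 3, 4]
rebind(numbers)
print(numbers)    # → [1, 2, 3, 4]
```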

Arbitrary number of arguments

A function can be called with an arbitrary number of arguments. These arguments will be wrapped up in a tuple:

def fprintf(file, format, *args):
  file.write(format % args)

When the arguments are already in a list or tuple but need to be unpacked for a function call:

args = [3, 6]
range(*args)
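Packing and unpacking side by side. total() and packed() are hypothetical helpers; Python 3 is assumed, where range() returns a lazy range object, hence the list() call:

```python
def packed(*args):
    # All positional arguments arrive packed in the tuple "args"
    return args

def total(*numbers):
    return sum(numbers)

print(packed(1, 2, 3))        # → (1, 2, 3)
print(total(1, 2, 3))         # → 6

values = [3, 6]
print(total(*values))         # unpack the list into two arguments → 9
print(list(range(*values)))   # same as range(3, 6) → [3, 4, 5]
```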

Keyword arguments (kwargs)

Functions can be called using a "keyword = value" syntax. The keyword must match the name of a formal parameter. The main advantage is that we avoid cryptic function calls like

doIt(1, 7, 2.2, 'hmmm')

For instance, given the following function, we might call it like this:

def foobar(foo, bar):
  pass  # do something

foobar(bar=99, foo='alright')

Function call with arbitrary number of keyword arguments that will then be packed into a dictionary:

def foobar(foo, bar, **keywords):
  print('foo =', foo)
  print('bar =', bar)
  # sorted() returns a sorted list of the keys (in Python 3, keys() returns a view without a sort() method)
  for kw in sorted(keywords):
    print(kw, ':', keywords[kw])

foobar(bar=99, foo='alright', keyword1='value1', keyword2='value2')

When the arguments are already in a dictionary but need to be unpacked for a function call:

def foobar(foo, bar):
  pass  # do something

# Avoid naming the variable "dict", which would shadow the built-in dict type
args = {"foo": "99", "bar": "alright"}
foobar(**args)

Modules

How to use modules

# This loads the file "foobar.py"
import foobar

# Execute a function from the module
foobar.doIt(42)

# Assign the function itself (not the result of calling it) to a local name, then use it
localDoIt = foobar.doIt
localDoIt(42)

# Import certain items directly from module
from foobar import doIt, dontDoIt
dontDoIt("why not")

# Import everything from module (except names beginning with underscore)
from foobar import *

# Load a module from a package "Sound" and its sub-package "Effects"
import Sound.Effects.echo
Sound.Effects.echo.doSomething()

# Import an entire module from a package, making it available without package prefix
from Sound.Effects import echo
echo.doSomething()

# Import all modules from a package that are listed in the "__all__" variable
# The variable must be set by the package's file __init__.py
from Sound.Effects import *

The search path for modules and packages is the list of directories stored in the variable sys.path. This variable is initialized with the following values:

the directory containing the script that was started (or the current directory when the interpreter runs interactively)
the content of the environment variable PYTHONPATH; this has the same syntax as the shell variable PATH
an installation-dependent default path (e.g. /usr/local/lib/python)

Note: A program that knows what it is doing can change the content of sys.path to influence where modules are searched for.

Find out which names a module defines:

# Examine "sys" module (it must be imported first)
import sys
dir(sys)
# List currently defined names
dir()

How to define modules

If someone says

import foobar

the module foobar must be located in a file named foobar.py. The module file does not need to have a special structure.

A module can be located within a package, which is represented by a directory that contains a file

__init__.py

The file can contain

nothing
arbitrary initialization code
a definition of the variable "__all__"; this allows clients to say something like "from foo import *", which in the following example would import the modules "bar1", "bar2" and "bar3", but not module "bar4" or any other module also present within package "foo"

__all__ = ["bar1", "bar2", "bar3"]
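To see __all__ in action without setting up files by hand, the following sketch writes a throwaway package to a temporary directory and star-imports it. The package name starimport_demo is hypothetical and stands in for "foo" above; Python 3 is assumed:

```python
import importlib
import os
import sys
import tempfile

def star_import_names():
    """Create a throwaway package on disk and return what "import *" brings in."""
    root = tempfile.mkdtemp()
    pkg_dir = os.path.join(root, "starimport_demo")   # hypothetical package name
    os.mkdir(pkg_dir)
    # __init__.py lists only three of the four submodules in __all__
    with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
        f.write('__all__ = ["bar1", "bar2", "bar3"]\n')
    for name in ("bar1", "bar2", "bar3", "bar4"):
        with open(os.path.join(pkg_dir, name + ".py"), "w") as f:
            f.write("NAME = %r\n" % name)
    sys.path.insert(0, root)
    importlib.invalidate_caches()   # make sure the import system sees the new files
    try:
        namespace = {}
        exec("from starimport_demo import *", namespace)
    finally:
        sys.path.remove(root)
    # Ignore "__builtins__", which exec() adds to the namespace
    return sorted(name for name in namespace if not name.startswith("__"))

print(star_import_names())   # → ['bar1', 'bar2', 'bar3']
```

Note that bar4 exists in the package but is not imported, because it is not listed in __all__.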

Object Orientation

Features

multiple inheritance
no "virtual" or similar keyword, all methods can be overridden
all members are public
everything is an object: data types, classes
operators can be redefined
objects are passed by reference

Class definition

Example 1:

class MyClass:
  i = 12345
  def f(self):
    return 'hello world'

Class objects

A class definition must be executed before it has any effects. When a class definition is left, a class object is created. The class object acts as a wrapper around the contents of the namespace created by the class definition.

Class objects support two kinds of operations: attribute references and instantiation.

Attributes are referenced as expected: obj.name. Class attributes can also be assigned to.

Instantiation is done using function notation (). The special __init__() method works as a kind of constructor, to initialize the new object to a given state. The __init__() method may have arguments.

class Complex:
  def __init__(self, realpart, imagpart):
    self.r = realpart
    self.i = imagpart

x = Complex(3.0, -4.5)
x.r, x.i

Function objects and method objects

Consider this class definition:

class MyClass:
  i = 12345
  def f(self):
    return 'hello world'

MyClass.f is a reference to a function object. The function belongs to the class object.

MyClass().f is a reference to a method object. The method belongs to the class instance object.

If you have a reference to a method object m:

m.im_self refers to the instance object that the method belongs to (m.__self__ in Python 2.6+ and Python 3)
m.im_func refers to the function object that corresponds to the method (m.__func__ in Python 2.6+ and Python 3)
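Python 3 dropped the im_self and im_func names; only __self__ and __func__ remain. A sketch, assuming Python 3 and repeating MyClass from above so the block runs on its own:

```python
class MyClass:
    i = 12345
    def f(self):
        return 'hello world'

obj = MyClass()
m = obj.f                        # method object bound to obj

print(m.__self__ is obj)         # → True: the instance the method belongs to
print(m.__func__ is MyClass.f)   # → True: in Python 3, MyClass.f is the plain function
print(m.__func__(obj))           # calling the function with the instance explicitly → hello world
```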

Data hiding

Data hiding is not possible since Python has no concept of "private" or "protected". Clients may access a class object's and/or class instance object's data members in whatever way they want. They may even

change the value of a member
add new members
delete existing members (using the del keyword)

Inheritance

class DerivedClassName(BaseClassName):
  [...]

When a class attribute is referenced, the attribute is recursively searched for, first in the derived class itself, then in the base class, etc. This works both for data and for function attributes. For function attributes, this effectively provides the mechanism for method overriding.

To call the base class method:

BaseClassName.methodname(self, arguments)
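A sketch of overriding a method and calling the base class version (the class names are hypothetical). The super() variant avoids naming the base class; the zero-argument form shown requires Python 3 (in Python 2 it is written super(Derived2, self) and works only for new-style classes):

```python
class Base:
    def greet(self, name):
        return "Hello, " + name

class Derived(Base):
    def greet(self, name):
        # Explicitly call the base class method, passing self along
        text = Base.greet(self, name)
        return text + "!"

class Derived2(Base):
    def greet(self, name):
        # Same idea using super() (zero-argument form, Python 3 only)
        return super().greet(name) + "?"

print(Derived().greet("world"))    # → Hello, world!
print(Derived2().greet("world"))   # → Hello, world?
```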

Multiple inheritance:

class DerivedClassName(Base1, Base2, Base3):
  [...]

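With multiple inheritance, attribute lookup occurs depth-first, left-to-right. Strictly speaking this is the rule for classic classes in Python 2; new-style classes (i.e. all classes in Python 3) use the C3 method resolution order, which for most purposes also behaves depth-first, left-to-right, but searches a class shared by several bases only after all of those bases.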
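The "diamond" case shows where the two rules differ. The lookup order of a class is stored in its __mro__ attribute. A sketch with hypothetical classes, assuming Python 3:

```python
class A:
    def who(self):
        return "A"

class B(A):
    pass

class C(A):
    def who(self):
        return "C"

class D(B, C):
    pass

# The method resolution order: D, B, C, A, object
print([cls.__name__ for cls in D.__mro__])
# Pure depth-first lookup would find A.who (via B) before C.who;
# the C3 order visits C before the shared base class A
print(D().who())   # → C
```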

Method calls

When an instance object's method is called, the first parameter passed to the method is always the instance object (self, this, ...).

# Equivalent
MyClass().f()
MyClass.f(MyClass())

# Equivalent
myObject = MyClass()
myObject.f()
MyClass.f(myObject)

This is perfectly logical if you keep in mind that within methods you always have to use self to refer to data attributes of the instance that the method is operating on.

Static methods

Definition & use of static method through decorator @staticmethod:

class Foo:
  @staticmethod
  def doIt():
    pass

Foo.doIt()

Note: There is also a decorator called @classmethod. I have not (yet) understood what the difference is.
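The difference, in short: a static method receives no implicit first argument, while a class method receives the class it was called on (by convention named cls). This makes class methods useful e.g. as alternative constructors that also work for subclasses. A sketch with hypothetical names:

```python
class Foo:
    @staticmethod
    def doIt():
        # No implicit first argument: neither the instance nor the class
        return "static"

    @classmethod
    def create(cls):
        # cls is the class the method was called on (Foo or a subclass)
        return cls()

class Bar(Foo):
    """Hypothetical subclass that inherits both methods."""

print(Foo.doIt())                       # → static
print(type(Foo.create()).__name__)      # → Foo
print(type(Bar.create()).__name__)      # → Bar
print(type(Bar().create()).__name__)    # also works on an instance → Bar
```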

Object destruction

Objects are never explicitly destroyed; however, when they become unreachable they may be garbage-collected. An object becomes unreachable if there are no references left that point to the object.

# Create a reference to a dictionary object
a = {'foo': 123, 'bar': 456}
# The dictionary object is referenced twice
b = a
# Remove a reference
del a
# Remove the second reference; the dictionary object becomes unreachable and may be garbage-collected
b = None

Introspection

If you have a class instance object o:

o.__class__: refers to the class that the object is an instance of

If you have a class object c:

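c.__bases__: the tuple of base classes of a class object; for a class without base classes this is an empty tuple in Python 2 (classic classes), while in Python 3 every class implicitly derives from object, so the tuple is (object,)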
c.__name__: the name of the class or type
c.__doc__: the docstring belonging to the class

If you have a reference to a method object m:

m.im_self refers to the instance object that the method belongs to
m.im_func refers to the function object that corresponds to the method

To see the type of any object you use the type() function:

print(type(aVariable))

To check whether an object is an instance of a class (or of one of its subclasses):

isinstance(obj, cls)

To check whether a class is a subclass of another class:

issubclass(cls, baseclass)
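The introspection attributes and functions above, applied to two hypothetical classes (Python 3 assumed):

```python
class Animal:
    """Hypothetical base class."""

class Dog(Animal):
    """Hypothetical derived class."""

rex = Dog()
print(rex.__class__ is Dog)            # → True
print(type(rex) is Dog)                # → True
print(Dog.__name__)                    # → Dog
print(Dog.__bases__ == (Animal,))      # → True
print(Animal.__bases__ == (object,))   # → True: implicit base class in Python 3
print(Dog.__doc__)                     # → Hypothetical derived class.
print(isinstance(rex, Animal))         # → True: an instance of a subclass counts
print(issubclass(Dog, Animal))         # → True
print(issubclass(Animal, Dog))         # → False
```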

Exceptions

Handling exceptions

Exception handling is pretty straightforward:

the usual try clause
followed by the exception handlers
followed by an optional else clause which is executed if the try block did not raise an exception
followed by an optional finally clause which is always executed, regardless of whether an exception occurred or not

import sys

f = None
try:
  f = open('myfile.txt')
  s = f.readline()
  i = int(s.strip())

# Assign exception instance to a variable
except IOError as exc:
  # Print the error number and message stored in the exception
  print("I/O error(%s): %s" % (exc.errno, exc.strerror))

except ValueError:
  print("Could not convert data to an integer.")

# Catch different exception types by naming them in a parenthesized tuple
except (RuntimeError, TypeError, NameError):
  pass

# Catch all exceptions by omitting the name (this must be the last except clause)
except:
  (exc_type, exc_value, exc_traceback) = sys.exc_info()
  # exc_type = the object identifying the exception (object has class "type")
  # exc_value = the actual exception object (class depends on the raised exception); passing this to print() usually prints the "reason" embedded in the exception
  # exc_traceback = a traceback object (object has class "traceback") identifying the point in the program where the exception occurred
  print("Unexpected error: " + str(exc_value))
  raise

# Executes if no exception occurred
else:
  print("executing else clause")

# Always executes; f is still None if open() failed
finally:
  if f is not None:
    f.close()

The "with" keyword

PEP 343 introduced the with keyword to simplify standard cleanup of resources when exceptions occur.

General syntax:

with EXPR as VAR:
    BLOCK

Concrete example:

import json

with open('foo.json', 'r') as file:
    data = json.load(file)

What happens behind the scenes is, roughly, the following. Note that in reality it's more complicated, but it should cover the essentials:

try/except/finally clauses are inserted around the block of code BLOCK.
Initializing code is executed before entering the try clause. In the example this means that the open() function is called.
The result of the initialization is assigned to the variable VAR.
The block of code BLOCK is executed inside the try clause.
Cleanup code is executed regardless of whether an exception occurred or not. In the example the cleanup consists of calling file.close().

The with keyword expects that the expression EXPR represents a so-called "Context Manager" which follows a certain convention, the "Context Management Protocol". The protocol consists of

Implementing a method __enter__(), which is executed for initialization. This is expected to allocate the resource(s) that should be cleaned up. Its return value is what gets assigned to VAR.
Implementing a method __exit__(), which is executed for cleanup, regardless of whether an exception occurred or not. This is expected to clean up the resources that were allocated in __enter__().

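EXPR is an arbitrary expression whose result must be a context manager. It can be a function call that returns one (as with open() in the example above) or the instantiation of a class that implements the protocol (e.g. "with MyResource() as r", where MyResource stands for such a class). In both cases the object is created first, and then its __enter__() method is called.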
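A minimal sketch of a class that supports with by implementing both methods of the protocol. The class Resource and its log list are hypothetical, for illustration only; the second part shows that __exit__() also runs when the block raises an exception:

```python
class Resource:
    """Hypothetical resource that records when it is acquired and released."""

    def __init__(self, name):
        self.name = name
        self.log = []

    def __enter__(self):
        # Acquire the resource; the return value is bound to the "as" variable
        self.log.append("acquire " + self.name)
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Always called; the arguments describe the exception (or are all None)
        self.log.append("release " + self.name)
        # Returning False lets any exception propagate to the caller
        return False

with Resource("db") as r:
    r.log.append("use db")
print(r.log)    # → ['acquire db', 'use db', 'release db']

r2 = Resource("file")
try:
    with r2:
        raise ValueError("boom")
except ValueError:
    pass
print(r2.log)   # → ['acquire file', 'release file']
```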

TODO: Write details about how to support with when designing a class.

Raising exceptions

In Python, exceptions are raised, not thrown. This is how it works:

# Specify the exception name followed by the exception argument
try:
  raise NameError('HiThere')

# Catch the exception, then re-raise it
except NameError:
  print('An exception flew by!')
  raise

Custom exception types

Best practices:

derive from the Exception class
the exception name should end in "Error"
if a module can raise several exceptions, create a base class for exceptions defined by that module, and subclass that to create specific exception classes for different error conditions

See this example.
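A sketch of these practices for a hypothetical parser module (the module and all class and function names are made up):

```python
class ParserError(Exception):
    """Base class for all exceptions raised by the (hypothetical) parser module."""

class InvalidSyntaxError(ParserError):
    """Raised when the input cannot be parsed."""

    def __init__(self, line, message):
        super().__init__("line %d: %s" % (line, message))
        self.line = line   # extra information for the caller

class EmptyInputError(ParserError):
    """Raised when there is nothing to parse."""

def parse(text):
    """Hypothetical parser: accepts a single statement ending in ';'."""
    if not text:
        raise EmptyInputError("no input")
    if not text.endswith(";"):
        raise InvalidSyntaxError(1, "missing ';'")
    return text[:-1]

# Callers can catch one specific error, or every error of the module via the base class
for sample in ["", "x = 1", "x = 1;"]:
    try:
        print("parsed:", parse(sample))
    except ParserError as exc:
        print(type(exc).__name__ + ":", exc)
```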

Executing a Python script

Basics

To run a script directly (e.g. ./helloworld), it must have the executable bit set and contain a shebang line at the top.

osgiliath:~/py# ls -l helloworld 
-rwxr-xr-x 1 root root 39 Sep 27 22:17 helloworld
osgiliath:~/py# cat helloworld 
#!/usr/bin/python

print("hello world")

main() function

The following construct defines & executes a main() function. Note that it is not at all necessary to have a main() function!

def main():
  print("hello world")

if __name__ == "__main__":
  main()

Discussion

the __name__ attribute in this context refers to the name of the current module
the module named "__main__" is a special module provided by the Python runtime; the module represents the (otherwise anonymous) scope in which the interpreter's main program executes

If a module is executed like this:

python foobar.py <arguments>

the module's __name__ attribute is set to "__main__". The module can therefore include code such as the following to detect when it is run as a standalone program:

if __name__ == "__main__":
  do_something()

To quote from diveintopython.org:

The if __name__ trick allows this program do something useful when run by itself, without interfering with its use as a module for other programs.

Command line arguments

Command line arguments are stored in the sys module's argv attribute as a list:

import sys
print(sys.argv)

The getopt module processes sys.argv using the conventions of the Unix getopt() function. More powerful and flexible command line processing is provided by the optparse module (Python 2.7 and 3.2 added the argparse module, which supersedes optparse).

Modules:

sys: http://www.python.org/doc/current/library/sys.html
getopt: http://www.python.org/doc/current/library/getopt.html
optparse: http://www.python.org/doc/current/library/optparse.html (example from my mkroesti project)
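A sketch of getopt in use. The options -h/--help and -o/--output are made up for illustration; in a real script the list passed in would be sys.argv[1:]:

```python
import getopt

def parse_args(argv):
    """Parse the hypothetical options -h/--help and -o FILE/--output=FILE."""
    # "ho:" lists the short options; the colon after "o" means -o takes a value.
    # "output=" is the long form; the trailing "=" means it takes a value too.
    opts, args = getopt.getopt(argv, "ho:", ["help", "output="])
    settings = {"help": False, "output": None}
    for opt, value in opts:
        if opt in ("-h", "--help"):
            settings["help"] = True
        elif opt in ("-o", "--output"):
            settings["output"] = value
    # args holds the remaining non-option arguments
    return settings, args

settings, rest = parse_args(["-o", "out.txt", "in.txt"])
print(settings, rest)   # → {'help': False, 'output': 'out.txt'} ['in.txt']
```

An unknown option makes getopt.getopt() raise getopt.GetoptError, which a script would typically catch to print a usage message.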

Coding style guide

I break the following "rules" from the coding style guide in PEP 8:

Limit all lines to a maximum of 79 characters
- I do this for docstrings
- I do this for statements that lend themselves to elegant/clear representation on multiple lines
- I don't do this just for the sake of some hypothetical 80-characters-per-line limited device, because to my eyes a statement spaced out over multiple lines usually looks just like garbage (the example used in PEP 8 to demonstrate line wrapping is just such an example)

Rules that I no longer break because I have seen their wisdom :-)

Use 4 spaces per indentation level.
- I formerly used 2 spaces only because I was thinking 4 spaces is excessive
- After a relatively short time I noticed that the structure of the code was often hard to see: Where does the scope of the function/class/if-block begin/end?
- At first I blamed Python's group-statements-by-indentation and bitterly wished for the braces I was accustomed to from C/C++/Java
- After some time I stopped griping because this is just a fact that I can't change
- Instead I tried out 4-spaces-per-indent-level and suddenly my code looked better

Stuff

Callable object

A callable object is an instance object that is "called" as if it were a function.

The class must define a method __call__(); then "calling" an instance foo of that class like this

foo(arg1, arg2, ...)

is the same as saying

foo.__call__(arg1, arg2, ...)
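A sketch of a callable class (Multiplier is a hypothetical name):

```python
class Multiplier:
    """Hypothetical callable class: instances behave like functions."""

    def __init__(self, factor):
        self.factor = factor

    def __call__(self, value):
        return value * self.factor

triple = Multiplier(3)
print(triple(14))            # → 42
print(triple.__call__(14))   # the same call, spelled out → 42
print(callable(triple))      # → True
```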

Null object

The null object is returned by functions that don't explicitly return a value. It supports no special operations. There is exactly one null object, named

None

(a built-in name).

Statement continuation

Although a backslash can be used to continue a statement on the next line, it is usually better to use parentheses like this (example copied from the Idioms and Anti-Idioms in Python article):

value = (foo.bar()['first'][0]*baz.quux(1, 2)[5:9]
        + calculate_number(10, 20)*forbulate(500, 360))

(the main reason cited in the article to avoid backslashes is that a stray space character after a backslash will break line continuation)

Source code file encoding

PEP 263 describes how to specify the encoding of files that contain Python source code. This is interesting for me because my surname "Näf" contains a non-ASCII character.

It all boils down to the first or second line of the file containing a comment line that satisfies a regular expression described in the PEP. An example:

# coding=<encoding name>

My files all look like this:

#!/usr/bin/env python
# coding=utf-8

Documentation

Docstrings

Functions, classes, etc. can (and should) all be documented using a Python feature called "Documentation Strings" (or "docstrings" for short). For reference, see this overview and the docstring conventions in PEP 257.

A docstring must be a string literal that occurs as the first statement in a module, function, class or method definition. A function documentation might look like this:

def doSomething():
  """Summary line, should start with a capital letter and end with a period.

  The second line should always be blank to visually separate the summary sentence(s)
  from the follow-up detailed paragraphs. The detailed description may consist of
  multiple paragraphs and has no restrictions about what it should contain.

  Documentation parsers determine what indentation to use for formatting from the
  first non-blank line after the first line of the docstring.
  """

To print out an entity's docstring in code:

print(doSomething.__doc__)

To print out the documentation of a module "bar" within package "foo" on the command line:

pydoc foo.bar

Note: The docstrings feature does not define any specific markup, the markup depends on the tool that is desired for processing the docstrings.

reStructuredText

reStructuredText is a lightweight markup language that can be used to mark up Python docstrings (or any other source code documentation, for that matter). It has been developed by the docutils project, and the primary document is found here:

http://docutils.sourceforge.net/rst.html

With the 2.6 release, Python changed its documentation format from LaTeX to reStructuredText. A primer can be found here:

http://docs.python.org/documenting/rest.html

The toolset that processes the Python documentation into HTML is called Sphinx. Its web site is found here:

http://sphinx.pocoo.org/

Doxygen

Apparently Doxygen also supports the Python language, however I have not investigated this since I am quite happy with Python's docstring feature.

Distributing / Installing Python Modules

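Note: The information in this section may be outdated and needs to be reviewed! In particular, distutils was deprecated in Python 3.10 and removed in Python 3.12 (PEP 632); setuptools is the usual replacement.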

Overview

The standard way of distributing a Python module, or installing such a distributed module, is to use the module

distutils

from the Python Standard Library.

Some references:

Creating a distribution package: http://www.python.org/doc/current/distutils/introduction.html
Installing a package: http://www.python.org/doc/current/install/index.html
Index of distutils docs: http://www.python.org/doc/current/distutils/index.html

Creating a distribution

Steps required:

write a setup script (setup.py by convention)
(optional) write a setup configuration file (setup.cfg by convention)
(optional for source distribution) write a manifest template file (MANIFEST.in by convention)
create a source distribution
(optional) create one or more built (binary) distributions (e.g. a Debian package, a Windows installer, etc.)

A simple setup.py:

from distutils.core import setup
setup(name='foo',
      version='1.2.3',
      py_modules=['foo'],
      )

Note: Within the setup script, use "/" as path separator. distutils will take care of converting this into the platform specific path separator.

A simple MANIFEST.in:

include COPYING

Note: Instead of a manifest template, it is also possible to provide the actual manifest. In this case, the manifest file must specify every single file to include in the distribution (even setup.py).

To create a source distribution foo-1.2.3.tar.gz:

python setup.py sdist

The source distribution will contain the following stuff:

Python source files (py_modules and packages options in setup.py)
Script files (scripts options in setup.py)
README.txt (or README)
setup.py
setup.cfg
test/test*.py
files mentioned in MANIFEST.in

Note: Build files and versioning files (e.g. .svn) are removed automatically by distutils.

If a manifest file is already present when the "sdist" command is executed, it will be re-created automatically if setup.py or MANIFEST.in are newer. However, if files have only been added or removed (and they match an existing file pattern in setup.py or MANIFEST.in), the manifest file needs to be regenerated manually:

# Create a new source distribution
python setup.py sdist --force-manifest
# Regenerate manifest file but do not create a source distribution
python setup.py sdist --manifest-only

setup.py for my standard project directory layout

My usual project directory layout looks like this:

base
 +-- doc
 |    +-- README
 |        [...]
 +-- src
 |    +-- packages
 |    |    +-- package_A
 |    |    |    +-- foo.py
 |    |    +-- package_B
 |    |         +-- bar.py
 |    +-- tests
 |    |    +-- package_A
 |    |    |    +-- foo_test.py
 |    |    +-- package_B
 |    |         +-- bar_test.py
 |    +-- scripts
 |         +-- foo
 +-- setup.py
 +-- MANIFEST.in

Note: I would have preferred to have a dist subfolder that contains setup.py and MANIFEST.in. Unfortunately this did not work as intended: although in setup.py I was able to specify the package root as "../src/packages", the MANIFEST.in stubbornly refused to accept a recursive-include directory "../doc" (I always got the error "warning: no files found matching '*' under directory '../doc'")

A more complex setup.py that reflects the above directory structure looks like this:

TODO

The accompanying setup.cfg:

TODO

The accompanying MANIFEST.in:

TODO

Installing a distribution

The usual sequence is this:

tar xfvz python-foo-1.2.3.tar.gz 
cd python-foo-1.2.3
./setup.py build
./setup.py install

If something out of the ordinary is required for building/installing the module, it should be mentioned in the file README.

The build step:

This step is responsible for putting the files to install into a build directory
By default, this is named build, located directly below the distribution root
The build directory can be changed using the --build-base option, e.g.

python setup.py build --build-base=/tmp/pybuild/foo-1.2.3

The install step:

This step is responsible for copying everything under build/lib (or build/lib.plat) to the chosen installation directory
The standard location of the installation directory is system-dependent; to find out what it is, do the following in an interactive Python shell:

>>> import sys
>>> sys.prefix
'/System/Library/Frameworks/Python.framework/Versions/2.5'
>>> sys.exec_prefix
'/System/Library/Frameworks/Python.framework/Versions/2.5'

The installation directory can be changed using a number of different schemes
- The "home" scheme: python setup.py install --home=~
- The "prefix" scheme: python setup.py install --prefix=/usr/local
For details about the "home" and "prefix" schemes, or for even more customized schemes, see the "Installing Python modules" documentation listed in the References section
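On Python 2.7 / 3.2 and later, the standard sysconfig module reports not only the prefixes but also the concrete directories of each installation scheme (its schemes posix_prefix and posix_home roughly correspond to the "prefix" and "home" schemes above). A minimal sketch; the function name is made up, and which scheme get_paths() uses by default depends on the platform and Python version.

```python
import sys
import sysconfig


def show_install_locations():
    """Print the interpreter prefixes and the default installation directories."""
    print("sys.prefix      =", sys.prefix)
    print("sys.exec_prefix =", sys.exec_prefix)
    # Without arguments, get_paths() uses the default scheme of this platform
    paths = sysconfig.get_paths()
    for key in ("purelib", "platlib", "scripts"):
        print(f"{key:15} = {paths[key]}")
    # All scheme names known to this interpreter, e.g. posix_prefix, posix_home, nt
    print("schemes:", ", ".join(sysconfig.get_scheme_names()))
    return paths


if __name__ == "__main__":
    show_install_locations()
```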

Testing

Running the tests

Preferred way to run all tests of the project:

python setup.py test

(this requires some coding in setup.py; see the section about adding a "test" command to distutils further down for details)

Running all tests of a test module:

python module.py

(this requires that module.py contains code that calls unittest.main())

Running specific tests in a test module:

python -m unittest module.FooTestSuite
python -m unittest module.FooTestCase
python -m unittest module.FooTestCase.testBar

Directory layout

For maximum ease of use, I want to be able to run my test cases like this:

python setup.py test

To achieve this, I organize my unit tests in a directory structure that parallels the directory structure of the project's source code. An example is provided further up where I explain my standard project directory layout.

Adding a "test" command to `distutils`

Useful reference: http://da44en.wordpress.com/2002/11/22/using-distutils/. Another resource that might be worth investigating is http://peak.telecommunity.com/DevCenter/setuptools.

A bare-bones subclass of distutils.cmd.Command looks like this:

from distutils.cmd import Command

class TestCommand(Command):
    # This command has no command line options
    user_options = []

    def initialize_options(self):
        pass
    def finalize_options(self):
        pass
    def run(self):
        pass

A more interesting example is this:

# Python Standard Library (PSL) imports
from distutils.cmd import Command
from distutils.core import setup
import unittest
import sys

# Extend search path for packages and modules. This is required for finding the
# "tests" package and its modules.
PACKAGES_BASEDIR = "src/packages"
sys.path.append(PACKAGES_BASEDIR)

class test(Command):
    """Implements a distutils command to execute unit tests.

    The class name is the same as the command name string used in the 'cmdclass'
    dictionary passed to the setup() function further down. The reason for this
    is that, unfortunately, 'python setup.py test --help' will print out the
    class name instead of the name used in the dictionary (or the 'command_name'
    attribute defined in this class).
    """

    # This must be a class attribute; it is used by
    # "python setup.py --help-commands"
    description = "execute unit tests"

    # Options must be defined in a class attribute. The attribute value is a
    # list of tuples. Each tuple defines an option and must contain 3 values:
    # long option name, short option name, and a description to print with
    # --help. An option that should have an argument must have the suffix "=".
    # Each option defined in user_options must have a data attribute with a
    # name that corresponds to the long name of the option. For instance, an
    # option "--foo-bar" requires an attribute "foo_bar". If the user has
    # specified the option, a value is set to the data attribute. If the
    # option has no argument, the attribute value is set to 1. If the option
    # has an argument, the attribute value is set to the argument value.
    user_options = [("suite=", "s", "run test suite for a specific module [default: run all tests]")]

    def __init__(self, dist):
        # This data attribute is returned by Command.get_command_name()
        self.command_name = "test"
        Command.__init__(self, dist)

    def initialize_options(self):
        # The default value names a callable defined in tests/__init__.py. To
        # run the tests of a single module instead, the user specifies
        # something like this: "--suite tests.test_algorithm"
        self.suite = "tests.allTests"

    def finalize_options(self):
        pass

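    def run(self):
        tests = unittest.defaultTestLoader.loadTestsFromName(self.suite)
        testRunner = unittest.TextTestRunner(verbosity=1)
        result = testRunner.run(tests)
        # Report failures to the shell through a non-zero exit status
        if not result.wasSuccessful():
            sys.exit(1)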

setup(
      # Add a command named "test". The name string in the dict is also used by
      # "python setup.py --help-commands", but not by "python setup.py test -h"
      cmdclass = { "test" : test },
      [...]
     )
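The test command above defaults to the name "tests.allTests", but the content of tests/__init__.py is not shown on this page. The following is a hypothetical sketch of what it could look like, assuming that test modules follow the *_test.py naming of the directory layout further up. It uses unittest's test discovery, which needs Python 2.7 / 3.2 or later (so not the Python 2.5 mentioned elsewhere on this page). loadTestsFromName() calls a callable it finds under the given name and uses the TestSuite it returns.

```python
# Hypothetical content of tests/__init__.py (the "tests" package that the
# test command finds through sys.path)
import os
import unittest


def allTests(start_dir=None):
    """Return a TestSuite with the tests of all *_test.py modules below start_dir.

    start_dir defaults to the directory of this package. All directories on the
    way to a test module must be packages (i.e. contain __init__.py).
    """
    if start_dir is None:
        start_dir = os.path.dirname(os.path.abspath(__file__))
    start_dir = os.path.abspath(start_dir)
    # Use the parent directory as top-level directory so that test modules are
    # imported as e.g. "tests.package_A.foo_test" and do not clash with the
    # real package "package_A"
    top_level_dir = os.path.dirname(start_dir)
    # A fresh loader, because discover() remembers its top-level directory
    loader = unittest.TestLoader()
    return loader.discover(start_dir, pattern="*_test.py", top_level_dir=top_level_dir)
```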

Coding

Overview

the smallest unit to test is represented by the TestCase class
subclasses of TestCase implement various test methods; test method names usually begin with "test" (alternatively, a subclass can override the single method runTest())
subclasses of TestCase may implement setUp() and tearDown() to define a test fixture
each test method is executed with a new TestCase instance
the class TestSuite aggregates TestCase and other TestSuite instances
a test runner such as TextTestRunner finally executes a number of tests

TestCase examples:

import unittest

class MyTestCase(unittest.TestCase):
    def setUp(self):
        pass
    def tearDown(self):
        pass
    def testFoo(self):
        pass
    def testBar(self):
        pass

# Create instances that will execute the named test method
# Note: This gets tedious with lots of test cases and test methods.
# The TestSuite examples below show better ways to do this.
fooTestCase = MyTestCase('testFoo')
barTestCase = MyTestCase('testBar')

TestSuite examples:

# Simple way to aggregate test cases into a test suite
myTestSuite1 = unittest.TestSuite()
myTestSuite1.addTest(MyTestCase('testFoo'))
myTestSuite1.addTest(MyTestCase('testBar'))

# Another way
tests = ["testFoo", "testBar"]
myTestSuite2 = unittest.TestSuite(map(MyTestCase, tests))

# A third way. TestLoader relies on the fact that test method names
# begin with "test"
myTestSuite3 = unittest.defaultTestLoader.loadTestsFromTestCase(MyTestCase)

# A fourth way to get at all tests within an entire module
myTestSuite4 = unittest.defaultTestLoader.loadTestsFromModule(mymodule)

# A last way to get at tests within a module, TestCase, etc. See
# docs for the unittest module for exact behaviour, options and overloads
myTestSuite5 = unittest.defaultTestLoader.loadTestsFromName("mymodule")

If it should be possible to run a test module in standalone mode, the module must contain this code at the bottom (see docs for the unittest module for more options on the unittest.main() method):

if __name__ == "__main__":
    unittest.main()

Finally, these are some of the assertions that the TestCase class defines. Most of them take a string message as an optional last argument (msg) that can be used e.g. to indicate the exact nature of the failure:

assertTrue(expr)
assertEqual(first, second)
assertNotEqual(first, second)
assertRaises(exception, callable, *args, **kwargs)
fail(msg)
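A small, self-contained example that uses these assertions; divide() and the other names are made up for illustration. It runs the tests through a TextTestRunner instead of unittest.main(), because unittest.main() ends the program via sys.exit() by default.

```python
import unittest


def divide(a, b):
    """Function under test (hypothetical example)."""
    return a / b


class DivideTestCase(unittest.TestCase):
    def setUp(self):
        # Fixture: executed before every test method
        self.numerator = 10

    def testDivide(self):
        # The optional last argument is the failure message
        self.assertEqual(divide(self.numerator, 2), 5, "10 / 2 should be 5")
        self.assertNotEqual(divide(self.numerator, 3), 3)

    def testPositive(self):
        self.assertTrue(divide(self.numerator, 5) > 0)

    def testDivideByZero(self):
        # assertRaises passes the remaining arguments on to the callable
        self.assertRaises(ZeroDivisionError, divide, self.numerator, 0)


def run_divide_tests():
    """Run all tests of DivideTestCase and return the TestResult."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(DivideTestCase)
    return unittest.TextTestRunner(verbosity=1).run(suite)
```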

Using A Python Program In the Web

TODO

IDEs

PyCharm

Introduction

PyCharm is an IDE published by JetBrains. PyCharm's key advantages are that it is maintained by a company (could become a disadvantage if the company fails), that it is cross-platform, and that there is a certain recognition effect if you're used to other JetBrains IDE products (Rider for C#/.NET, WebStorm for web development).

A free-of-charge community edition is available.

Key bindings

On my Mac I like to use the "macOS" keymap and then change the following key bindings:

Cmd + E = Edit > Find > Add selection for next occurrence
Cmd + ' = Code > Comment with line comment

Pydev

Pydev is an Eclipse plugin for Python (and Jython) development. It can be installed from this update site:

http://pydev.sourceforge.net/updates/

Note: The former "Pydev Extensions" plugin is now open source and part of Pydev.

Workspace configuration

The minimal configuration is to define one or more Python interpreters in "Preferences -> Pydev -> Interpreter Python -> New". When such an interpreter is added, Pydev automatically finds a number of paths that contain modules for the interpreter. It then suggests adding these paths to the PYTHONPATH for that interpreter. This suggestion is usually OK and should be accepted.

I usually add the following interpreters:

The system interpreter /usr/bin/python (2.5.1 on Mac OS X 10.5)
The interpreter installed via fink /sw/bin/python2.5 (2.5.2 as of this writing)
Interpreters installed into /Library/Frameworks/Python (I usually add the latest 2.6.x and 3.x for compatibility testing)

Project configuration

Now that the interpreters have been configured, work on a project can begin:

Switch to the Pydev perspective
Create a new project: File -> New -> Pydev Project
- Pydev projects consist of a .project and a .pydevproject file
- Pydev suggests creating a src folder; if this is accepted, the folder will be added to the project's PYTHONPATH
To use the project with a new SVN repository
- Create a new, empty repository
- Select project, e.g. in the Pydev Package Explorer view
- Context Menu -> Team -> Share Project
- Select "SVN"
- Enter repository URL (e.g. http://www.herzbube.ch/svn/mkroesti)
- Folder name = trunk
- Enter an initial comment (e.g. "add Eclipse project files")
Alternatively, to connect the project to an SVN repository that already exists and has content
- Do the same as above, but enter a URL + folder name that point to the location where the SVN repository lives
- Subclipse will warn that the specified folder already exists in the given repository; you can now say "yes" to let Subclipse check out the folder and connect the project to the working copy
- Subclipse will offer to switch to the "Team Synchronize" perspective; say "yes"
- you will see that the two project files .project and .pydevproject are marked as "added"
- right-click on the project and select "Commit..." to add the project files to the repository
- switch back to the "Pydev" perspective
Configure the project's PYTHONPATH
- Select project, e.g. in the Pydev Package Explorer view
- Context Menu -> Properties -> PyDev PYTHONPATH
- Add whatever path is needed, e.g. if the project has packages, replace the src folder (which was automatically added when the project was created) by src/packages

Run unit tests

setup.py with "test" command
- open "Run configurations" dialog
- double-click "Python Run" to create a new configuration
- give it a name, e.g. "mkroesti tests (setup.py)"
- select project, e.g. mkroesti
- select main module, e.g. browse for "setup.py" (will result in something like "${workspace_loc:mkroesti/setup.py}")
- on the "Arguments" tab, set the program arguments to "test"
run all tests in a directory
- the easiest way is to let PyDev create the run configuration for you
- right-click on the folder that contains your tests and select "Run as... -> Python unittest"
- the resulting run configuration is configured as follows
- name = <projectname> tests (e.g. "mkroesti tests")
- project = <project> (e.g. mkroesti)
- main module = <folder-with-tests> (e.g. ${workspace_loc:mkroesti/src/packages/tests})
- on the "Arguments" tab, the working directory is set to the folder that contains <folder-with-tests> (e.g. ${workspace_loc:mkroesti/src/packages})

Python Package Index (PyPI)

Website: http://pypi.python.org/

How to submit a package to PyPI:

You need to register a user account before you can submit any packages.
Package submission works in one of three ways
- ./setup.py register
- Submit the file PKG-INFO that is generated by ./setup.py sdist (the file can be found inside the generated tar ball)
- Manually enter package information on the submission page
I found that setup.py's register command works well, although Python 2.5 choked on my name because it contains a non-ASCII character; I had to invoke setup.py with a Python 3 interpreter to make it work
Multiple submissions for a package
- The latest submission will overwrite the previous submissions for the same version
- The latest submission will "hide" all previous submissions for other versions, i.e. listings and searches will find only the version of the latest submission. Besides the obvious intention of displaying only the newest version of a package, this feature is also useful if a submission has been made for a wrong version: Simply fix the version number in setup.py and re-submit the package
- The package admin web interface can be used to "un-hide" a hidden submission (I have not investigated how this works)

Further references:

Dive Into Python has a nice overview of packaging Python software
The Cheese Shop Tutorial
The list of trove classifiers

Python environments and pip

pip

References:

https://python.land/virtual-environments/installing-packages-with-pip

pip (recursive acronym for "Pip Installs Packages") is the default package manager for Python. It draws on the Python Package Index as its package repository to install packages in the given Python environment. Custom repositories are possible as well.

pip is available in two forms:

As command line utility. The name is either pip or pip3. The latter is used to allow for side-by-side installations of Python 2 and Python 3. If multiple Python 3 versions are installed, then pip<version> is used to pick the correct version. Examples: pip3.12, pip3.13.
As a module that is run by a specific interpreter: python3 -m pip [...]. This form guarantees that pip operates on the environment of exactly that interpreter.

Some basic commands:

# Get useful information about command line usage
pip3 help

# Install/upgrade/uninstall packages
pip3 install <package-name>
pip3 install <package-name>==<version>   # is also used to downgrade a package
pip3 install --upgrade <package-name>
pip3 install --upgrade <package-name>==<version>
pip3 uninstall <package-name>

# Information about installed packages
pip3 list
pip3 list --outdated
pip3 show <package-name>

# Get information about the Python environment
pip3 inspect
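The information that pip3 list and pip3 show print is also available from within a Python program via the standard library module importlib.metadata (Python 3.8 and later). A minimal sketch; the helper names are made up, and unlike pip the code only looks at the environment of the interpreter that runs it.

```python
from importlib import metadata


def installed_packages():
    """Return {distribution name: version}, roughly what "pip3 list" shows."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip broken installations without a name
            packages[name] = dist.version
    return packages


def package_version(name):
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    for name, version in sorted(installed_packages().items(), key=lambda item: item[0].lower()):
        print(f"{name} {version}")
```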

Site vs. user installation

Without the use of a virtual environment (see next section), pip installs packages either site-wide (equivalent to system-wide), or for the current user only. In both cases the packages are installed only for the specific Python version with which pip is executed.

For the latter you need to specify the --user option:

pip3 install --user <package-name>

Because site-wide installations usually require some sort of administrator privileges, pip may be built or configured to default to per-user installs, in which case --user is not needed. Recent pip versions also fall back to a per-user install on their own when the site-wide location is not writable.

Example locations on a Mac with Homebrew installed:

Site-wide path for Python 3.13 = /opt/homebrew/lib/python3.13/site-packages (note that in this special case Homebrew is managing site-wide packages, i.e. you use brew install and not pip3 install to install site-wide packages)
Per-user path for Python 3.13 = /Users/dev/Library/Python/3.13/lib/python/site-packages

Although I have not tried this out, apparently these are common per-user locations on other platforms:

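Unix-like systems = ~/.local (packages end up in e.g. ~/.local/lib/python3.13/site-packages)
Windows = %APPDATA%\Python (packages end up in e.g. %APPDATA%\Python\Python313\site-packages)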
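Rather than guessing these locations, the standard site module can report them for the interpreter that runs it. A minimal sketch (the function name is made up); site.getsitepackages() is not guaranteed to exist in every environment (older virtualenv versions shipped their own site module without it), hence the defensive getattr().

```python
import site
import sys


def show_site_locations():
    """Print the site-wide and per-user package directories; return the per-user one."""
    print("Python %d.%d" % sys.version_info[:2])
    # Site-wide locations (there may be several)
    for path in getattr(site, "getsitepackages", lambda: [])():
        print("site-wide:", path)
    user_site = site.getusersitepackages()
    print("per-user: ", user_site)
    # True: per-user directory enabled; False/None: disabled (e.g. by python -s)
    print("user site enabled:", site.ENABLE_USER_SITE)
    return user_site


if __name__ == "__main__":
    show_site_locations()
```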

virtualenv (venv)

When working on different projects which probably have wildly differing package version dependencies, installing packages site-wide or in a per-user location can be insufficient to satisfy the needs of all projects. For this reason Python supports installing packages into virtual environments - venv for short - which are isolated from the rest of the system.

Important: Virtual environments are non-portable and should never be added to version control. Because of this, since Python 3.13 the venv module places a .gitignore file in newly created virtual environment folders by default.

You can use virtualenv in two ways:

Built-in Python module (available since Python 3.3; since Python 3.4 it also installs pip into new environments by default): python3 -m venv [...]
As a dedicated command line tool (requires installing the virtualenv package with pip): virtualenv [...]

The basic usage pattern is this:

Create the virtual environment (only once)
Activate the virtual environment (once before you start working in the venv)
Work with the virtual environment
Deactivate the virtual environment (once after you stop working in the venv)
Delete the virtual environment (only once)

Creating a virtual environment is done by specifying a folder where the venv is supposed to live. A commonly used folder name is just "venv" - everyone will then know what this folder contains.

python3 -m venv <venv-folder-name>
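The same can be done from Python code with the standard venv module; venv.EnvBuilder is the class that python3 -m venv uses. A minimal sketch with a made-up function name; with_pip=False skips the (slower) installation of pip via ensurepip, and pyvenv.cfg is the configuration file that every venv contains in its top-level folder.

```python
import os
import venv


def create_venv(folder, with_pip=True):
    """Create a virtual environment in folder and return the path of its pyvenv.cfg."""
    builder = venv.EnvBuilder(with_pip=with_pip)
    builder.create(folder)
    return os.path.join(folder, "pyvenv.cfg")


if __name__ == "__main__":
    # Example: create "venv" in the current directory, without pip
    print(create_venv("venv", with_pip=False))
```

The shortcut venv.create(folder, with_pip=...) does the same in a single call.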

The venv is then activated with one of the following commands. From now on all shell commands will operate within the context of that venv. pip should already be available as a command line tool within the venv. The main magic of a venv is letting the venv activation script manipulate the PATH environment variable, and possibly other environment variables (I haven't looked into the details) so that your shell uses the command line tools from the venv, and Python uses the modules from the venv.

# macOS, Linux
source <venv-folder-name>/bin/activate

# Windows cmd.exe shell
<venv-folder-name>\Scripts\activate.bat

# Powershell
<venv-folder-name>\Scripts\Activate.ps1

Deactivating the venv works the same on all platforms and is done with a single command:

deactivate
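To check from within Python whether a venv is in use, two facts help: inside a venv, sys.prefix points to the venv folder while sys.base_prefix (Python 3.3 and later) still points to the installation the venv was created from, and the activation scripts export the environment variable VIRTUAL_ENV, which deactivate removes again. A minimal sketch; the function name is made up.

```python
import os
import sys


def venv_status():
    """Return (running_in_venv, activated_venv_folder_or_None)."""
    # Inside a venv, sys.prefix is the venv folder; sys.base_prefix is the
    # Python installation that the venv was created from
    running_in_venv = sys.prefix != sys.base_prefix
    # The activation scripts export VIRTUAL_ENV; "deactivate" unsets it
    activated = os.environ.get("VIRTUAL_ENV")
    return running_in_venv, activated


if __name__ == "__main__":
    in_venv, activated = venv_status()
    print("interpreter runs inside a venv:", in_venv)
    print("activated venv:", activated)
```

Running the venv's interpreter directly (e.g. <venv-folder-name>/bin/python) without activating the venv typically yields True and None, which shows that activation is mostly about the shell environment.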

Deleting the venv is as simple as deleting the venv folder:

rm -r <venv-folder-name>

requirements.txt

pip can install a number of packages in one command by reading them from a "requirements file". The common name for this file is requirements.txt. This is what the command looks like:

pip3 install -r requirements.txt

The requirements file can be used to specify all dependencies of a project. It lets you specify not only the package names, but also the versions. You can use the comparison operators ==, !=, >, >=, < and <=, as well as ~= ("compatible release"). Some examples:

foo
foo==1.2.3
foo>=1.2.0
foo>=1.2.0,<=1.3.0

You can let pip generate the requirements file from the packages that are currently installed:

pip3 freeze >requirements.txt
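A related check is whether the installed packages still match the exact pins (==) in a requirements file, e.g. one generated by pip3 freeze. The following is a minimal sketch with a made-up helper name: it only understands simple lines of the form name==version, ignores all other specifiers and options such as -r, and compares version strings literally. A real tool would use the third-party packaging library, or simply pip itself.

```python
from importlib import metadata


def check_pins(lines):
    """Report packages whose installed version differs from an exact "==" pin."""
    problems = []
    for line in lines:
        # Drop comments and environment markers such as '; python_version < "3.8"'
        line = line.split("#", 1)[0].split(";", 1)[0].strip()
        if "==" not in line:
            continue
        name, wanted = (part.strip() for part in line.split("==", 1))
        name = name.split("[", 1)[0]  # drop extras such as "foo[bar]"
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed (wanted {wanted})")
            continue
        if installed != wanted:
            problems.append(f"{name}: installed {installed}, wanted {wanted}")
    return problems


if __name__ == "__main__":
    with open("requirements.txt") as f:
        for problem in check_pins(f):
            print(problem)
```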

Conda

So far I have not used Conda, but from what I have read about it these are its key points:

Unlike virtualenv, which entirely relies on pip to install packages, Conda is a package manager in its own right. This means that Python modules can be packaged specifically for Conda.
Within a Conda environment you can still use pip to install packages, and Conda will recognize the presence of such pip-installed packages. If you mix the two package sources, though, you have to be careful that they are compatible - especially when the packages are native binaries.
Conda environments are centralized (I assume per user), whereas virtualenv environments are per-project (or better: per-directory).
An empty Conda environment is "fatter" than a virtualenv environment, i.e. it requires more hard disk space.
Conda is not restricted to Python, in principle it is language-independent, allowing packages for multiple languages to be installed in the same environment. Because of this, Conda is apparently popular in the science community where it is used for projects that combine Python and R.

Python Poetry

TODO

pipenv

TODO

pipx

https://github.com/pypa/pipx

From the README:

pipx is a tool to help you install and run end-user applications written in Python. It's roughly similar to macOS's brew, JavaScript's npx, and Linux's apt.
It's closely related to pip. In fact, it uses pip, but is focused on installing and managing Python packages that can be run from the command line directly as applications.
[...] pipx creates an isolated environment for each application and its associated packages.
[...] By default, pipx uses the same package index as pip, PyPI. pipx can also install from all other sources pip can [...] In a way, it turns Python Package Index (PyPI) into a big app store for Python applications.

At the moment I have not explored pipx any further.

Software

Interesting software related to Python

py2app: Convert Python scripts into standalone Mac OS X applications
