LearningPython

From HerzbubeWiki

The purpose of this page is to keep my notes about my effort to learn the Python programming language. I doubt very much that this page is of any use to anybody besides myself. If you are not me, you had probably better look at one of the tutorials listed in the "References" section.


Why Python?

The first job of my working career thoroughly taught me shell and awk script programming. Although I was (and am) aware that these script languages are just not suitable for some tasks, I never got around to learning another interpreted programming language that would fill the gap between shell scripts and C++ or Objective-C. I was forced to familiarize myself with Perl at some time, but never got along with it, so that doesn't count. When I learned of the existence of Ruby, I immediately intended to have a look at it, but somehow there were always other, more important things to do.

Now there is Python which offers itself as another interesting candidate for the "stop-gap" role :-) ... And since I have just started to become involved with ISFDB (whose programs are written in Python) I finally have a reason to get acquainted with something new.

So let's get started...


References

Tutorials:

python.org
http://docs.python.org/tutorial/
Dive Into Python
http://diveintopython.org/


From python.org:

List of beginner's resources
http://wiki.python.org/moin/BeginnersGuide/Programmers
String handling
http://docs.python.org/library/stdtypes.html#string-methods
Style guide for Python code
http://www.python.org/dev/peps/pep-0008/
Docstring conventions
http://www.python.org/dev/peps/pep-0257/
Unit testing framework
http://www.python.org/doc/current/library/unittest.html
Installing Python modules
http://www.python.org/doc/current/install/index.html
Distributing Python modules
http://www.python.org/doc/current/distutils/index.html


From wikipedia.org:


Glossary

Also consult Python's own glossary.

PEP
Python Enhancement Proposal (see this index)
class object
when the interpreter has finished executing the statements of a class definition, a class object is created; the object can be referenced using the class name
class attribute
any name in the class object, i.e. both "variables" and functions
class instance object
the instance of a class
object
general term, i.e. there are other types of objects than just class instance objects (e.g. list objects)
data attribute
a variable that "belongs to" an object
method
a function that "belongs to" an object
function object
if a class MyClass defines a function foo(), the following refers to a function object: MyClass.foo
method object
if myObject is an instance of MyClass (see above), the following refers to a method object: myObject.foo
kwarg
keyword argument
PyPI
Python Package Index (sometimes also known as "The Cheese Shop")


Coding Python

First impressions

  • Statements are grouped by indentation. Seems yucky!
  • No variable or argument declarations are necessary. Is this a good thing? Is there a strict mode?
  • No char type, single characters are simply strings of length 1. Good!
  • Seems to have good unicode support
  • Data types: integers, floating points, complex numbers, strings, lists
  • Strings are immutable, lists are not
  • Slices are useful to access parts of strings and lists
  • Right-hand side of an assignment is evaluated before any assignment takes place (important for multiple assignment)
  • Zero = false, non-zero = true; empty sequence = false, non-empty sequence = true
  • There is the concept of sequences - lists are sequences
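
A few of the points above, sketched as code. The variable names are mine and only serve as illustration:

```python
# Strings are immutable, lists are mutable
word = 'python'
letters = list(word)
letters[0] = 'P'                   # fine: lists can be changed in place
try:
    word[0] = 'P'                  # strings cannot
except TypeError:
    capitalized = 'P' + word[1:]   # build a new string from a slice instead

# The right-hand side is evaluated completely before anything is assigned,
# so two names can be swapped without a temporary variable
a, b = 1, 2
a, b = b, a

# Zero and empty sequences count as false, everything else here as true
truth_values = [bool(''), bool([]), bool(0), bool('x'), bool([0]), bool(-1)]
```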


Variables

Declare a variable by assigning it a value:

foo = 'bar'

Remove the variable declaration with the del statement:

del foo


Data types

Sequences

There are six sequence types:

  1. strings (immutable)
  2. Unicode strings (immutable)
  3. lists (mutable)
  4. tuples (immutable)
  5. buffers (immutable)
  6. xrange objects.

Sequences are indexed by a range of numbers.

The in and not in keywords test whether or not a sequence contains a certain value:

a = ['cat', 'window', 'defenestrate']
if 'window' in a:
  print('is in list')

if 'n' not in ('y', 'ye', 'yes'):
  print('is not in list')


See http://www.python.org/doc/current/library/stdtypes.html#string-methods for useful stuff that you can do with sequence types.


Lists

A list is a mutable sequence type.

list1 = [12345, 54321, 'hello!']
# Refer to single list elements by index position
element = list1[2]
# Refer to list elements by slice notation (results in another list); in this example, list2 is the one-element list [54321]
list2 = list1[1:2]
# Empty list
list3 = list()
# Two equivalent ways to append the elements of one list (list2) to another list (list1)
list1.extend(list2)
list1[len(list1):] = list2
# Append a single element to a list
list1.append('world')
# Remove an element from a list, in place. The element must exist, otherwise an error is raised.
list1.remove(12345)
# Counting list members
len(list1)
list1.count('world')
# Comparing lists
list1 = [12345, 54321]
list2 = [54321, 12345]
assert(list1 != list2)
assert(sorted(list1) == sorted(list2))
list1.sort()
list2.sort()
assert(list1 == list2)
# Copying a list
list2 = list1[:]


Tuples

A tuple is an immutable sequence type. It consists of a number of values separated by commas:

t = 12345, 54321, 'hello!'
# Tuples may be nested:
u = t, (1, 2, 3, 4, 5)
# Empty tuple
empty = ()
# Tuple with 1 element needs a trailing comma
singleton = 'hello',
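
Writing comma-separated values is called tuple packing; the reverse operation, sequence unpacking, is not shown above. A brief sketch (all names are only for illustration):

```python
t = 12345, 54321, 'hello!'    # packing: the parentheses are optional
x, y, z = t                   # unpacking: needs exactly len(t) names on the left

singleton = 'hello',          # the trailing comma makes the tuple
not_a_tuple = ('hello')       # parentheses alone do not

try:
    t[0] = 88888              # tuples are immutable
except TypeError:
    tuple_is_immutable = True
```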


Sets

A set is an unordered collection with no duplicate elements.

basket = ['apple', 'orange', 'apple', 'pear', 'orange', 'banana']
fruit = set(basket)               # create a set without duplicates
a = set('abracadabra')
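
Sets also support membership tests and the usual mathematical set operations. A small sketch; the second word and the result names are my own choice:

```python
basket = ['apple', 'orange', 'apple', 'pear', 'orange', 'banana']
fruit = set(basket)

has_orange = 'orange' in fruit        # membership test

a = set('abracadabra')                # unique letters: a, b, c, d, r
b = set('alacazam')                   # unique letters: a, c, l, m, z
only_in_a = a - b                     # difference
in_either = a | b                     # union
in_both = a & b                       # intersection
in_one_only = a ^ b                   # symmetric difference
```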


Dictionaries

A dictionary is an unordered set of key/value pairs, with the requirement that the keys are unique. A key can be any immutable type:

  • Strings and numbers can always be keys
  • Tuples can be used as keys if they contain only strings, numbers, or tuples; if a tuple contains any mutable object either directly or indirectly, it cannot be used as a key
  • Lists cannot be used as keys because lists can be modified in place

tel = {'jack': 4098, 'sape': 4139}
# Add an entry
tel['guido'] = 4127
# Use dict() to build dictionary from lists of key-value pairs stored as tuples
dict([('jack', 4098), ('sape', 4139), ('guido', 4127)])
# Use dict() with keyword arguments (keys are strings)
dict(jack=4098, sape=4139, guido=4127)
# Return a list with dictionary keys or values
tel.keys()
tel.values()
# Comparing dictionaries
dict1 = {"a": 17, "b": 42}
dict2 = {"b": 42, "a": 17}
assert(dict1 == dict2)
# Copying a dictionary
dict2 = dict1.copy()   # shallow copy (sufficient if values are immutable)
import copy
dict2 = copy.deepcopy(dict1)   # deep copy (e.g. if values are mutable, such as lists)
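
The rules about which objects may serve as keys can be tried out directly. In this sketch (the dictionary and its contents are made up) the two unusable keys are rejected with a TypeError:

```python
phone = {}
phone['jack'] = 4098                  # string key
phone[42] = 'answer'                  # number key
phone[('x', 1)] = 'tuple key'         # tuple that contains only immutable objects

rejected = []
for bad_key in (['a', 'list'], ('tuple', ['with', 'list'])):
    try:
        phone[bad_key] = 'never stored'
    except TypeError:                 # the key is not hashable
        rejected.append(bad_key)
```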


Strings

References

Introduction
http://www.python.org/doc/current/tutorial/introduction.html#strings
String methods
http://www.python.org/doc/current/library/stdtypes.html#string-methods
String services
http://www.python.org/doc/current/library/string.html


Literals

String literals:

  • can be enclosed in single or double quotes
  • multi-line strings when using single or double quotes must use a backslash ("\") to indicate line continuation
  • "\n" indicates newlines
  • the following example defines a raw string literal where the backslash loses its special properties:
foo = r"one two \n three"
  • a string literal can also be enclosed in triple quotes, this is used e.g. in docstrings; there is no need for backslashes to indicate line continuation or newlines
  • the following example defines a unicode string literal using the unicode character with ordinal value 0x0020 (= a space); note that the interpretation of the "ä" character depends entirely on the encoding of the source file that contains the literal - if it's in latin-1 then the result will be incorrect!
foo = u"Patrick\u0020Näf"


Conversion

The str() function converts its argument into a string according to the string conversion rules specific to the argument's type. TODO: Find exact definitions, e.g. for numeric values, for class instance objects.
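
Until the TODO is resolved, a quick experiment at least shows the behaviour for a few types. For class instance objects, str() uses the object's __str__() method if the class defines one. The Point class is a hypothetical example of mine:

```python
class Point:
    """Hypothetical class, only here to show the __str__ hook."""
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __str__(self):
        return 'Point(%s, %s)' % (self.x, self.y)

as_text = [str(42), str(3.5), str([1, 'a']), str(None), str(Point(1, 2))]
```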


Operations

  • concatenation using the "+" operator
  • length using the "len()" function
  • subscription using the [] operator
    • "foo[0]" refers to the first character
    • "foo[-1]" refers to the last character
    • "foo[2:5]" refers to characters at index positions 2-4
    • "foo[:5]" refers to characters at index positions 0-4
    • "foo[5:]" refers to characters from index position 5 until end-of-string
    • "foo[-2:]" refers to the last two characters
  • splitting into parts using the split() function: "a,b,c".split(",") # results in a list
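
The operations above, applied to a sample string (the word and the dictionary labels are arbitrary):

```python
foo = 'defenestrate'
results = {
    'concat': foo + '!',
    'length': len(foo),
    'first': foo[0],
    'last': foo[-1],
    '2:5': foo[2:5],       # index positions 2-4
    ':5': foo[:5],         # index positions 0-4
    '5:': foo[5:],         # index position 5 until end-of-string
    '-2:': foo[-2:],       # the last two characters
    'split': 'a,b,c'.split(','),
}
```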


Flow control

if

if x < 0:
  x = 0
  print('Negative changed to zero')
elif x == 0:
  print('Zero')
elif x == 1:
  print('Single')
else:
  print('More')


for

Python's for statement iterates over the items of any sequence:

a = ['cat', 'window', 'defenestrate']
for x in a:
  print(x, len(x))

It is not safe to modify the sequence being iterated over in the loop. Instead, iterate over a copy, e.g. with a slice:

for x in a[:]: # make a slice copy of the entire list
  if len(x) > 6: a.insert(0, x)

To iterate over the indices of a sequence, combine range() and len() as follows:

a = ['Mary', 'had', 'a', 'little', 'lamb']
for i in range(len(a)):
  print(i, a[i])


while

a, b = 0, 1
while b < 10:
  print(b)
  a, b = b, a+b


break, continue, else on loops

break and continue work as expected.

Loop statements may have an else clause; it is executed when the loop terminates through exhaustion of the list (with for) or when the condition becomes false (with while), but not when the loop is terminated by a break statement.

for n in range(2, 10):
  for x in range(2, n):
    if n % x == 0:
      print(n, 'equals', x, '*', n//x)
      break
  else:
    # loop fell through without finding a factor
    print(n, 'is a prime number')


Logical and other operators

# Logical operators
if x < 0 and y > 42:
  print('and')
if x < 0 or y > 42:
  print('or')
if not (x < 0 and y > 42):
  print('not')

# Membership operators
if x in range(2, 10):
  print('in')
if x not in range(2, 10):
  print('not in')
 
# Identity operators
if x is y:
  print('is')
if x is not y:
  print('is not')

# Arithmetic operators
x = 50 % 42      # modulus; result is 8
x = 2 ** 8       # exponent; result is 256
x = 9 // 2       # floor division; result is 4
y = 9.0 // 2.0   # floor division; result is 4.0


Functions

Use def to start a function definition:

def foobar(n, paramwithdefvalue=17):
  # do something with n and paramwithdefvalue
  return 42

# Call function
result = foobar(2000)

Local variables (symbols) shadow global variables (symbols). Inside a function, global variables can be read, but assigning to the name creates a new local variable instead of changing the global one (unless the global statement is used).

Parameters are passed using "call by [object] reference". If the parameter is a mutable object, changing the object will let the caller see those changes.
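
The difference between changing an object and rebinding a name is easy to get wrong, so here is a small sketch. All function and variable names are made up for this illustration:

```python
counter = 0

def append_item(items):
    items.append('added')     # mutates the caller's list object

def rebind(items):
    items = ['replaced']      # rebinds the local name only

def increment():
    global counter            # without this line, the assignment below
    counter = counter + 1     # would raise UnboundLocalError

shared = ['original']
append_item(shared)           # the caller sees the change
rebind(shared)                # the caller sees nothing
increment()
```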


Arbitrary number of arguments

A function can be called with an arbitrary number of arguments. These arguments will be wrapped up in a tuple:

def fprintf(file, format, *args):
  file.write(format % args)

When the arguments are already in a list or tuple but need to be unpacked for a function call:

args = [3, 6]
range(*args) 


Keyword arguments (kwargs)

Functions can be called using a "keyword = value" syntax. The keyword must match the name of a formal parameter. The main advantage is that we don't have cryptic function calls like

doIt(1, 7, 2.2, 'hmmm')

For instance, a function foobar() with two parameters might be called like this:

def foobar(foo, bar):
  pass  # do something

foobar(bar = 99, foo = 'alright')

Function call with arbitrary number of keyword arguments that will then be packed into a dictionary:

def foobar(foo, bar, **keywords):
  print('foo = ', foo)
  print('bar = ', bar)
  # sorted() works on the dictionary directly and returns its keys in order
  for kw in sorted(keywords):
    print(kw, ':', keywords[kw])

foobar(bar = 99, foo = 'alright', keyword1 = 'value1', keyword2 = 'value2')

When the arguments are already in a dictionary but need to be unpacked for a function call:

def foobar(foo, bar):
  pass  # do something

# Named kwargs so that the built-in dict type is not shadowed
kwargs = {"foo": "99", "bar": "alright"}
foobar(**kwargs)


Modules

How to use modules

# This loads the file "foobar.py"
import foobar

# Execute a function from the module
foobar.doIt(42)

# Assign and use local name
localDoIt = foobar.doIt
localDoIt(42)

# Import certain items directly from module
from foobar import doIt, dontDoIt
dontDoIt("why not")

# Import everything from module (except names beginning with underscore)
from foobar import *

# Load a module from a package "Sound" and its sub-package "Effects"
import Sound.Effects.echo
Sound.Effects.echo.doSomething()

# Import an entire module from a package, making it available without package prefix
from Sound.Effects import echo
echo.doSomething()

# Import all modules from a package that are listed in the "__all__" variable
# The variable must be set by the package's file __init__.py
from foobar import *

The search path for modules and packages is the list of directories stored in the variable sys.path. This variable is initialized with the following values:

  • the current directory (".")
  • the content of the environment variable PYTHONPATH; this has the same syntax as the shell variable PATH
  • an installation-dependent default path (e.g. /usr/local/lib/python)

Note: A program that knows what it is doing can change the content of sys.path to influence where modules are searched for.
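
A minimal sketch of such a change. The directory name my_modules is purely hypothetical, and the directory does not need to exist for the list manipulation itself to work:

```python
import os
import sys

extra_dir = os.path.join(os.getcwd(), 'my_modules')   # hypothetical directory

if extra_dir not in sys.path:
    sys.path.append(extra_dir)      # appended entries are searched last
    # sys.path.insert(0, extra_dir) would make it win over other locations
```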


Find out which names a module defines:

# Examine "sys" module (it must be imported first)
import sys
dir(sys)
# List currently defined names
dir()


How to define modules

If someone says

import foobar

the module foobar must be located in a file named foobar.py. The module file does not need to have a special structure.

A module can be located within a package, which is represented by a directory that contains a file

__init__.py

The file can contain

  • nothing
  • arbitrary initialization code
  • a definition of the variable "__all__"; this allows clients to say something like "from foo import *", which in the following example would import the modules "bar1", "bar2" and "bar3", but not module "bar4" or any other module also present within package "foo"
__all__ = ["bar1", "bar2", "bar3"]


Object Orientation

Features

  • multiple inheritance
  • no "virtual" or similar keyword, all methods can be overridden
  • all members are public
  • everything is an object: data types, classes
  • operators can be redefined
  • objects are passed by reference


Class definition

Example 1:

class MyClass:
  i = 12345
  def f(self):
    return 'hello world'


Class objects

A class definition must be executed before it has any effects. When a class definition is left, a class object is created. The class object acts as a wrapper around the contents of the namespace created by the class definition.

Class objects support two kinds of operations: attribute references and instantiation.

Attributes are referenced as expected: obj.name. Class attributes can also be assigned to.

Instantiation is done using function notation (). The special __init__() method works as a kind of constructor, to initialize the new object to a given state. The __init__() method may have arguments.

class Complex:
  def __init__(self, realpart, imagpart):
    self.r = realpart
    self.i = imagpart

x = Complex(3.0, -4.5)
x.r, x.i


Function objects and method objects

Consider this class definition:

class MyClass:
  i = 12345
  def f(self):
    return 'hello world'

MyClass.f is a reference to a function object. The function belongs to the class object.

MyClass().f is a reference to a method object. The method belongs to the class instance object.

If you have a reference to a method object m:

  • m.im_self refers to the instance object that the method belongs to (spelled m.__self__ in Python 3)
  • m.im_func refers to the function object that corresponds to the method (spelled m.__func__ in Python 3)
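
A short check of these relationships. Note that im_self and im_func are Python 2 names; this sketch uses __self__ and __func__, which exist in Python 2.6 and later as well as in Python 3:

```python
class MyClass:
    i = 12345
    def f(self):
        return 'hello world'

obj = MyClass()
m = obj.f                      # bound method object

same_instance = m.__self__ is obj
same_function = m.__func__ is MyClass.__dict__['f']
result = m()                   # no need to pass obj again
```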


Data hiding

Data hiding is not possible since Python has no concept of "private" or "protected". Clients may access a class object's and/or class instance object's data members in whatever way they want. They may even

  • change the value of a member
  • add new members
  • delete existing members (using the del keyword)
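
All three operations in one sketch. The Account class is a made-up example, and the leading-underscore naming shown at the end is only a convention, not an enforcement mechanism:

```python
class Account:
    """Hypothetical class with one class attribute and one data attribute."""
    currency = 'CHF'
    def __init__(self):
        self.balance = 100

acct = Account()
acct.balance = -1              # change the value of a member
acct.nickname = 'savings'      # add a new member
del acct.balance               # delete an existing member
Account.currency = 'EUR'       # class objects can be modified the same way

# A leading underscore merely signals "please do not touch"
acct._internal = 'still accessible'
```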


Inheritance

class DerivedClassName(BaseClassName):
  [...]

When a class attribute is referenced, the attribute is recursively searched for, first in the derived class itself, then in the base class, etc. This works both for data and for function attributes. For function attributes, this effectively provides the mechanism for method overriding.

To call the base class method:

BaseClassName.methodname(self, arguments)


Multiple inheritance:

class DerivedClassName(Base1, Base2, Base3):
  [...]

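With multiple inheritance, attribute lookup for old-style classes occurs depth-first, left-to-right. New-style classes (those deriving from object, and every class in Python 3) use the C3 method resolution order instead, which is available as the class's __mro__ attribute.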
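
The lookup order can be inspected through the __mro__ attribute of a class. The following sketch uses a made-up "diamond" hierarchy of classes deriving from object; for such classes the lookup is not strictly depth-first, because Right is searched before the shared base class:

```python
class Base(object):
    def who(self):
        return 'Base'

class Left(Base):
    pass

class Right(Base):
    def who(self):
        return 'Right'

class Derived(Left, Right):
    pass

lookup_order = [cls.__name__ for cls in Derived.__mro__]
winner = Derived().who()       # found in Right, before Base is consulted
```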


Method calls

When an instance object's method is called, the first parameter passed to the method is always the instance object (self, this, ...).

# Equivalent
MyClass().f()
MyClass.f(MyClass())

# Equivalent
myObject = MyClass()
myObject.f()
MyClass.f(myObject)

This makes perfect sense if you keep in mind that within methods you always have to use self to refer to data attributes of the instance that the method is operating on.


Static methods

Definition & use of static method through decorator @staticmethod:

class Foo:
  @staticmethod
  def doIt():
    pass

Foo.doIt()

Note: There is also a decorator called @classmethod. I have not (yet) understood what the difference is.
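
A small experiment suggests where the difference lies: a class method receives the class it was called on as an implicit first argument (conventionally named cls), while a static method receives no implicit argument at all. The classes and the label attribute below are hypothetical:

```python
class Foo:
    label = 'Foo'

    @staticmethod
    def doIt():
        return 'no implicit argument'

    @classmethod
    def describe(cls):
        return 'called on ' + cls.label

class Bar(Foo):
    label = 'Bar'

static_result = Bar.doIt()
class_result = Bar.describe()   # cls is Bar here, not Foo
```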


Object destruction

Objects are never explicitly destroyed; however, when they become unreachable they may be garbage-collected. An object becomes unreachable if there are no references left that point to the object.

# Create a reference to a dictionary object
a = {'foo': 123, 'bar': 456}
# The dictionary object is referenced twice
b = a
# Remove a reference
del a
# Remove the second reference; the dictionary object becomes unreachable and may be garbage-collected
b = None
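
Whether an object is still alive can be observed with a weak reference, which does not count as a reference that keeps the object reachable. This is only a sketch: the Payload class is made up (plain dictionaries cannot be weakly referenced), and the immediate cleanup relies on CPython's reference counting, which is why gc.collect() is called for other implementations:

```python
import gc
import weakref

class Payload(object):
    """Hypothetical class; instances of it can be weakly referenced."""
    pass

a = Payload()
b = a                          # the object is now referenced twice
probe = weakref.ref(a)         # a weak reference does not keep it alive

del a
alive_after_first = probe() is not None    # b still refers to the object

b = None
gc.collect()                   # not needed with reference counting, harmless otherwise
alive_after_second = probe() is not None
```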



Introspection

If you have a class instance object o:

o.__class__
refers to the class that the object is an instance of


If you have a class object c:

c.__bases__
the tuple of base classes of a class object; if there are no base classes, this will be an empty tuple
c.__name__
the name of the class or type
c.__doc__
the docstring belonging to the class


If you have a reference to a method object m:

  • m.im_self refers to the instance object that the method belongs to (spelled m.__self__ in Python 3)
  • m.im_func refers to the function object that corresponds to the method (spelled m.__func__ in Python 3)


To check whether an object is an instance of a class that implements a certain interface:

isinstance(object, class)


To perform a similar operation on a class:

issubclass(class, class)
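
The introspection attributes and functions above, tried out on two hypothetical classes:

```python
class Animal(object):
    """Hypothetical base class."""

class Dog(Animal):
    """Hypothetical subclass."""

rex = Dog()
facts = {
    'class': rex.__class__ is Dog,
    'bases': Dog.__bases__ == (Animal,),
    'name': Dog.__name__,
    'doc': Dog.__doc__,
    'isinstance': isinstance(rex, Animal),      # true for subclasses as well
    'issubclass': issubclass(Dog, Animal),
    'tuple_form': isinstance(rex, (int, Animal)),   # matches any class in the tuple
}
```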


Exceptions

Handling exceptions

Exception handling is pretty straightforward:

  • the usual try clause
  • followed by the exception handlers
  • followed by an optional else clause which is executed if the try block did not raise an error
  • followed by an optional finally clause which is always executed, regardless of whether an exception occurred or not

import sys

try:
  f = open('myfile.txt')
  s = f.readline()
  i = int(s.strip())

# Assign exception instance to a variable
except IOError as exc:
  # Extract and print exception arguments
  errno, strerror = exc.args
  print("I/O error(%s): %s" % (errno, strerror))

except ValueError:
  print("Could not convert data to an integer.")

# Catch different exception types by naming them in a parenthesized tuple
except (RuntimeError, TypeError, NameError):
  pass

# Catch all exceptions by omitting the name
except:
  (exc_type, exc_value, exc_traceback) = sys.exc_info()
  # exc_type = the object identifying the exception (object has class "type")
  # exc_value = the actual exception object (class depends on the raised exception); passing this to print() usually prints the "reason" embedded in the exception
  # exc_traceback = a traceback object (object has class "traceback") identifying the point in the program where the exception occurred
  print("Unexpected error: " + str(exc_value))
  raise

# Executes if no exception occurred
else:
  print("executing else clause")

# Always executes
finally:
  # f is only bound if open() succeeded
  if 'f' in locals():
    f.close()


Raising exceptions

In Python, exceptions are raised, not thrown. This is how it works:

# Specify the exception name followed by the exception argument
try:
  raise NameError('HiThere')

# Catch the exception, then re-raise it
except NameError:
  print('An exception flew by!')
  raise


Custom exception types

Best practices:

  • derive from the Exception class
  • the exception name should end in "Error"
  • if a module can raise several exceptions, create a base class for exceptions defined by that module, and subclass that to create specific exception classes for different error conditions

See this example.
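
The best practices above can be sketched as follows. The module name foobar, the class names and the parse() function are all hypothetical:

```python
class FoobarError(Exception):
    """Base class for all exceptions raised by the (hypothetical) foobar module."""

class ConfigError(FoobarError):
    """Raised when the configuration is invalid."""

class InputError(FoobarError):
    """Raised when the input cannot be processed."""
    def __init__(self, value, message):
        FoobarError.__init__(self, value, message)
        self.value = value
        self.message = message

def parse(value):
    if not value.isdigit():
        raise InputError(value, 'not a number')
    return int(value)

try:
    parse('abc')
except FoobarError as exc:      # one handler covers every error of the module
    caught = exc
```

Clients that care about a specific error condition can still catch InputError or ConfigError individually.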


Executing a Python script

Basics

The script must have the executable bit set and contain a shebang at the top.

osgiliath:~/py# ls -l helloworld 
-rwxr-xr-x 1 root root 39 Sep 27 22:17 helloworld
osgiliath:~/py# cat helloworld 
#!/usr/bin/python

print("hello world")


main() function

The following construct defines & executes a main() function. Note that it is not at all necessary to have a main() function!

def main():
  print("hello world")

if __name__ == "__main__":
  main()

Discussion

  • the __name__ attribute in this context refers to the name of the current module
  • the module named "__main__" is a special module provided by the Python runtime; the module represents the (otherwise anonymous) scope in which the interpreter's main program executes


If a module is executed like this:

python foobar.py <arguments>

the module's __name__ attribute is set to "__main__". The module can therefore include code such as the following to detect when it is run as a standalone program:

if __name__ == "__main__":
  do_something()

To quote from diveintopython.org:

The if __name__ trick allows this program do something useful when run by itself, without interfering with its use as a module for other programs.


Command line arguments

Command line arguments are stored in the sys module's argv attribute as a list:

import sys
print(sys.argv)

The getopt module processes sys.argv using the conventions of the Unix getopt() function. More powerful and flexible command line processing is provided by the optparse module; newer Python versions (2.7 and 3.2 onwards) also offer argparse, which is intended to replace it.
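
A minimal getopt sketch. The options -v/--verbose and -o/--output are invented for this example, and the argument list is hard-coded so that the snippet does not depend on how it is started:

```python
import getopt

def parse_args(argv):
    """Parse a hypothetical command line: [-v] [-o FILE | --output=FILE] args..."""
    opts, remaining = getopt.getopt(argv, 'vo:', ['verbose', 'output='])
    verbose = False
    output = None
    for name, value in opts:
        if name in ('-v', '--verbose'):
            verbose = True
        elif name in ('-o', '--output'):
            output = value
    return verbose, output, remaining

# In a real script this would be parse_args(sys.argv[1:])
parsed = parse_args(['-v', '--output=out.txt', 'input.txt'])
```

An unknown option makes getopt.getopt() raise getopt.GetoptError, which a real script would catch in order to print a usage message.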

Modules:

sys
http://www.python.org/doc/current/library/sys.html
getopt
http://www.python.org/doc/current/library/getopt.html
optparse
http://www.python.org/doc/current/library/optparse.html (example from my mkroesti project)


Coding style guide

I break the following "rules" from the coding style guide in PEP 8:

  • Limit all lines to a maximum of 79 characters
    • I do this for docstrings
    • I do this for statements that lend themselves for elegant/clear representation on multiple lines
    • I don't do this just for the sake of some hypothetical 80-characters-per-line limited device, because to my eyes a statement spaced out over multiple lines usually looks just like garbage (the example used in PEP 8 to demonstrate line wrapping is just such an example)


Rules that I no longer break because I have seen their wisdom :-)

  • Use 4 spaces per indentation level.
    • I formerly used 2 spaces only because I was thinking 4 spaces is excessive
    • After a relatively short time I noticed that the structure of the code was often hard to see: Where does the scope of the function/class/if-block begin/end?
    • At first I blamed Python's group-statements-by-indentation and bitterly wished for the braces I was accustomed to from C/C++/Java
    • After some time I stopped griping because this is just a fact that I can't change
    • Instead I tried out 4-spaces-per-indent-level and suddenly my code looked better


Stuff

Callable object

A callable object is an instance object that is "called" as if it were a function.

The class must define a method __call__, then "calling" an instance foo of that class like this

foo(arg1, arg2, ...)

is the same as saying

foo.__call__(arg1, arg2, ...)
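
A concrete example, using a made-up Multiplier class:

```python
class Multiplier(object):
    """Hypothetical class whose instances can be called like functions."""
    def __init__(self, factor):
        self.factor = factor

    def __call__(self, value):
        return self.factor * value

triple = Multiplier(3)
direct = triple(14)                  # looks like a function call
explicit = triple.__call__(14)       # what actually happens
is_callable = callable(triple)
```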


Null object

The null object is returned by functions that don't explicitly return a value. It supports no special operations. There is exactly one null object, named

None

(a built-in name).
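
A quick illustration with two made-up functions. Because there is only one null object, the usual way to test for it is with the is operator:

```python
def no_return_statement():
    pass

def bare_return():
    return

result1 = no_return_statement()
result2 = bare_return()

both_none = result1 is None and result2 is None
```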


Statement continuation

Although a backslash can be used to continue a statement on the next line, it is usually better to use parentheses like this (example copied from the Idioms and Anti-Idioms in Python article):

value = (foo.bar()['first'][0]*baz.quux(1, 2)[5:9]
        + calculate_number(10, 20)*forbulate(500, 360))

(the main reason cited in the article to avoid backslashes is that a stray space character after a backslash will break line continuation)


Source code file encoding

PEP 263 describes how to specify the encoding of files that contain Python source code. This is interesting for me because my surname "Näf" contains a non-ASCII character.

It all boils down to the first or second line of the file containing a comment line that satisfies a regular expression described in the PEP. An example:

# coding=<encoding name>


My files all look like this:

#!/usr/bin/env python
# coding=utf-8


Documentation

Docstrings

Functions, classes, etc. can (and should) all be documented using a Python feature called "Documentation Strings" (or "docstrings" for short). For reference, see this overview and the docstring conventions in PEP 257.

A docstring must be a string literal that occurs as the first statement in a module, function, class or method definition. A function documentation might look like this:

def doSomething():
  """Summary line, should start with a capital letter and end with a period.

  The second line should always be blank to visually separate the summary sentence(s)
  from the follow-up detailed paragraphs. The detailed description may consist of
  multiple paragraphs and has no restrictions about what it should contain.

  Documentation parsers determine what indentation to use for formatting from the
  first non-blank line after the first line of the docstring.
  """

To print out an entity's docstring in code:

print(doSomething.__doc__)

To print out the documentation of a module "bar" within package "foo" on the command line:

pydoc foo.bar


Note: The docstrings feature does not define any specific markup; the markup depends on the tool chosen to process the docstrings.


reStructuredText

reStructuredText is a special way to mark up Python docstrings (or any other source code documentation, for that matter). It has been developed by the docutils project, and the primary document is found here:

http://docutils.sourceforge.net/rst.html

With the 2.6 release, Python has changed its documentation format from LaTeX to reStructuredText. A primer can be found here:

http://docs.python.org/documenting/rest.html

The toolset that processes the Python documentation into HTML is called Sphinx. Its web site is found here:

http://sphinx.pocoo.org/


Doxygen

Apparently Doxygen also supports the Python language, however I have not investigated this since I am quite happy with Python's docstring feature.


Distributing / Installing Python Modules

Overview

The standard way of distributing a Python module, or installing such a distributed module, is to use the module

distutils

from the Python Standard Library.


Some references:

Creating a distribution package
http://www.python.org/doc/current/distutils/introduction.html
Installing a package
http://www.python.org/doc/current/install/index.html
Index of distutils docs
http://www.python.org/doc/current/distutils/index.html


Creating a distribution

Steps required:

  • write a setup script (setup.py by convention)
  • (optional) write a setup configuration file (setup.cfg by convention)
  • (optional for source distribution) write a manifest template file (MANIFEST.in by convention)
  • create a source distribution
  • (optional) create one or more built (binary) distributions (e.g. a Debian package, a Windows installer, etc.)


A simple setup.py:

from distutils.core import setup
setup(name='foo',
      version='1.2.3',
      py_modules=['foo'],
      )

Note: Within the setup script, use "/" as path separator. distutils will take care of converting this into the platform specific path separator.

A simple MANIFEST.in:

include COPYING

Note: Instead of a manifest template, it is also possible to provide the actual manifest. In this case, the manifest file must specify every single file to include in the distribution (even setup.py).


To create a source distribution foo-1.2.3.tar.gz:

python setup.py sdist


The source distribution will contain the following stuff:

  • Python source files (py_modules and packages options in setup.py)
  • Script files (scripts options in setup.py)
  • README.txt (or README)
  • setup.py
  • setup.cfg
  • test/test*.py
  • files mentioned in MANIFEST.in

Note: Build files and versioning files (e.g. .svn) are removed automatically by distutils.


If a manifest file is already present when the "sdist" command is executed, it will be re-created automatically if setup.py or MANIFEST.in are newer. The manifest file needs to be regenerated manually, however, if only files have been added/removed that match an existing file pattern in setup.py or MANIFEST.in:

# Create a new source distribution
python setup.py sdist --force-manifest
# Regenerate manifest file but do not create a source distribution
python setup.py sdist --manifest-only


setup.py for my standard project directory layout

My usual project directory layout looks like this:

base
 +-- doc
 |    +-- README
 |        [...]
 +-- src
 |    +-- packages
 |    |    +-- package_A
 |    |    |    +-- foo.py
 |    |    +-- package_B
 |    |         +-- bar.py
 |    +-- tests
 |    |    +-- package_A
 |    |    |    +-- foo_test.py
 |    |    +-- package_B
 |    |         +-- bar_test.py
 |    +-- scripts
 |         +-- foo
 +-- setup.py
 +-- MANIFEST.in

Note: I would have preferred to have a dist subfolder that contains setup.py and MANIFEST.in. Unfortunately this did not work as intended: although in setup.py I was able to specify the package root as "../src/packages", the MANIFEST.in stubbornly refused to accept a recursive-include directory "../doc" (I always got the error "warning: no files found matching '*' under directory '../doc'")


A more complex setup.py that reflects the above directory structure looks like this:

TODO

The accompanying setup.cfg:

TODO

The accompanying MANIFEST.in:

TODO


Installing a distribution

The usual sequence is this:

tar xfvz python-foo-1.2.3.tar.gz 
cd python-foo-1.2.3
./setup.py build
./setup.py install

If something out of the ordinary is required for building/installing the module, it should be mentioned in the file README.


The build step:

  • This step is responsible for putting the files to install into a build directory
  • By default, this is named build, located directly below the distribution root
  • The build directory can be changed using --build-base option; e.g.
python setup.py build --build-base=/tmp/pybuild/foo-1.2.3


The install step:

  • This step is responsible for copying everything under build/lib (or build/lib.plat) to the chosen installation directory
  • The standard location of the installation directory is system-dependent; to find out what it is, do the following in an interactive Python shell:
>>> import sys
>>> sys.prefix
'/System/Library/Frameworks/Python.framework/Versions/2.5'
>>> sys.exec_prefix
'/System/Library/Frameworks/Python.framework/Versions/2.5'
  • The installation directory can be changed using a number of different schemes
    • The "home" scheme: python setup.py install --home=~
    • The "prefix" scheme: python setup.py install --prefix=/usr/local
  • For details about the "home" and "prefix" scheme, or for even more customized schemes, see this reference (already cited further up)


Testing

Running the tests

Preferred way to run all tests of the project:

python setup.py test

(this requires some coding in setup.py, see the next chapter for details)


Running all tests of a test module:

python module.py

(this requires that module.py contains code that calls unittest.main())


Running specific tests in a test module:

python unittest.py module.FooTestSuite
python unittest.py module.FooTestCase
python unittest.py module.FooTestCase.testBar
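
The dotted names given on the command line are resolved by the test loader. The same lookup can be sketched in code. FooTestCase, its test methods and the stand-in module object are hypothetical; they exist only to make the example self-contained.

```python
import types
import unittest


class FooTestCase(unittest.TestCase):
    def testBar(self):
        self.assertEqual(1 + 1, 2)

    def testBaz(self):
        self.assertTrue(True)


# Stand-in for a real test module named "module" (assumption for this sketch)
module = types.ModuleType("module")
module.FooTestCase = FooTestCase

loader = unittest.defaultTestLoader
# Names are resolved relative to the module object passed as second argument
wholeCase = loader.loadTestsFromName("FooTestCase", module)
singleTest = loader.loadTestsFromName("FooTestCase.testBar", module)

print(wholeCase.countTestCases())
print(singleTest.countTestCases())
```

Naming the class selects all of its test methods, while naming a single method selects just that one test.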


Directory layout

For maximum ease of use, I want to be able to run my test cases like this:

python setup.py test

To achieve this, I organize my unit tests in a directory structure that parallels the directory structure of the project's source code. An example is provided further up where I explain my standard project directory layout.


Adding a "test" command to distutils

Useful reference: http://da44en.wordpress.com/2002/11/22/using-distutils/. Another resource that might be worth investigating is http://peak.telecommunity.com/DevCenter/setuptools.


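A bare-bones subclass of distutils.cmd.Command looks like this:

from distutils.cmd import Command

class TestCommand(Command):
    # No command-specific options
    user_options = list()

    # All three methods must be overridden, even if they do nothing
    def initialize_options(self):
        pass
    def finalize_options(self):
        pass
    def run(self):
        pass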


A more interesting example is this:

# PSL
from distutils.cmd import Command
from distutils.core import setup
import unittest
import sys

# Extend search path for packages and modules. This is required for finding the
# "tests" package and its modules.
PACKAGES_BASEDIR = "src/packages"
sys.path.append(PACKAGES_BASEDIR)

class test(Command):
    """Implements a distutils command to execute unit tests.

    The class name is the same as the command name string used in the 'cmdclass'
    dictionary passed to the setup() function further down. The reason for this
    is that, unfortunately, 'python setup.py test --help' will print out the
    class name instead of the name used in the dictionary (or the 'command_name'
    attribute defined in this class).
    """

    # This must be a class attribute; it is used by
    # "python setup.py --help-commands"
    description = "execute unit tests"

    # Options must be defined in a class attribute. The attribute value is a
    # list of tuples. Each tuple defines an option and must contain 3 values:
    # long option name, short option name, and a description to print with
    # --help. An option that should have an argument must have the suffix "=".
    # Each option defined in user_options must have a data attribute with a
    # name that corresponds to the long name of the option. For instance, an
    # option "--foo-bar" requires an attribute "foo_bar". If the user has
    # specified the option, a value is set to the data attribute. If the
    # option has no argument, the attribute value is set to 1. If the option
    # has an argument, the attribute value is set to the argument value.
    user_options = [("suite=", "s", "run test suite for a specific module [default: run all tests]")]

    def __init__(self, dist):
        # This data attribute is returned by Command.get_command_name()
        self.command_name = "test"
        Command.__init__(self, dist)

    def initialize_options(self):
        # The default value is a callable defined in tests.__init__.py. The user
        # must specify something like this: "--suite tests.test_algorithm"
        self.suite = "tests.allTests"   

    def finalize_options(self):
        pass

    def run(self):
        tests = unittest.defaultTestLoader.loadTestsFromName(self.suite)
        testRunner = unittest.TextTestRunner(verbosity = 1)
        testRunner.run(tests)

setup(
      # Add a command named "test". The name string in the dict is also used by
      # "python setup.py --help-commands", but not by "python setup.py test -h"
      cmdclass = { "test" : test },
      [...]
     )
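
The default suite name "tests.allTests" refers to a callable. When loadTestsFromName() resolves a name to a callable, it calls it and expects a TestSuite (or a TestCase) in return. The following sketch shows what such a callable might look like. AlgorithmTest is a hypothetical placeholder, and the real tests/__init__.py may well be organized differently.

```python
import unittest


class AlgorithmTest(unittest.TestCase):
    """Hypothetical test case standing in for the project's real tests."""

    def testPlaceholder(self):
        self.assertEqual(sorted([3, 1, 2]), [1, 2, 3])


def allTests():
    """Return one suite that aggregates all test cases of the package."""
    suite = unittest.TestSuite()
    suite.addTests(unittest.defaultTestLoader.loadTestsFromTestCase(AlgorithmTest))
    return suite
```

Each additional test case class would be added to the suite in the same way.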


Coding

Overview

  • the smallest unit to test is represented by the TestCase class
  • subclasses of TestCase implement various test methods; test method names usually begin with "test" (although it's also possible to override the single method runTest())
  • subclasses of TestCase may implement setUp() and tearDown() to define a test fixture
  • each test method is executed with a new TestCase instance
  • the class TestSuite aggregates TestCase and other TestSuite instances
  • a test runner such as TextTestRunner finally executes a number of tests


TestCase examples:

import unittest

class MyTestCase(unittest.TestCase):
    def setUp(self):
        pass
    def tearDown(self):
        pass
    def testFoo(self):
        pass
    def testBar(self):
        pass

# Create instances that will execute the named test method
# Note: This gets tedious with lots of test cases and test methods.
# The TestSuite examples show better ways to do this.
fooTestCase = MyTestCase('testFoo')
barTestCase = MyTestCase('testBar')


TestSuite examples:

# Simple way to aggregate test cases into a test suite
myTestSuite1 = unittest.TestSuite()
myTestSuite1.addTest(MyTestCase('testFoo'))
myTestSuite1.addTest(MyTestCase('testBar'))

# Another way
tests = ["testFoo", "testBar"]
myTestSuite2 = unittest.TestSuite(map(MyTestCase, tests))

# A third way. TestLoader relies on the fact that test method names
# begin with "test"
myTestSuite3 = unittest.defaultTestLoader.loadTestsFromTestCase(MyTestCase)

# A fourth way to get at all tests within an entire module
myTestSuite4 = unittest.defaultTestLoader.loadTestsFromModule(mymodule)

# A last way to get at tests within a module, TestCase, etc. See
# docs for the unittest module for exact behaviour, options and overloads
myTestSuite5 = unittest.defaultTestLoader.loadTestsFromName("mymodule")
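
To check that the different ways of building a suite are equivalent, the suites can be counted and run. This is a self-contained sketch with a trivial MyTestCase; the variable names are chosen for illustration only, and io.StringIO (Python 3) merely captures the runner's report.

```python
import io
import unittest


class MyTestCase(unittest.TestCase):
    def testFoo(self):
        self.assertTrue(True)

    def testBar(self):
        self.assertTrue(True)


bySingleAdds = unittest.TestSuite()
bySingleAdds.addTest(MyTestCase("testFoo"))
bySingleAdds.addTest(MyTestCase("testBar"))

byMap = unittest.TestSuite(map(MyTestCase, ["testFoo", "testBar"]))
byLoader = unittest.defaultTestLoader.loadTestsFromTestCase(MyTestCase)

# Suites can be nested; nested suites are run recursively
everything = unittest.TestSuite([bySingleAdds, byMap, byLoader])
result = unittest.TextTestRunner(stream=io.StringIO()).run(everything)
print(result.testsRun)
```

Each of the three suites contains the same two tests, so the combined suite runs six tests in total.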


If it should be possible to run a test module in standalone mode, the module must contain this code at the bottom (see docs for the unittest module for more options on the unittest.main() method):

if __name__ == "__main__":
    unittest.main()


Finally, these are some of the assertions that the TestCase class defines. Most take a string message as an optional last argument that can be used e.g. to indicate the exact nature of the failure; assertRaises is the exception, since it passes any additional arguments on to the callable:

  • assertTrue(expr)
  • assertEqual(first, second)
  • assertNotEqual(first, second)
  • assertRaises(exception, callable)
  • fail()


Using A Python Program In the Web

TODO


Eclipse and Python

Pydev

Pydev is an Eclipse plugin for Python (and Jython) development. It can be installed from this update site:

http://pydev.sourceforge.net/updates/

Note: The former "Pydev Extensions" plugin is now open source and part of Pydev.


Workspace configuration

The minimal configuration is to define one or more Python interpreters in "Preferences -> Pydev -> Interpreter Python -> New". When such an interpreter is added, Pydev automatically finds a number of paths that contain modules for the interpreter. It then suggests adding these paths to the PYTHONPATH for that interpreter. This suggestion is usually OK and should be accepted.

I usually add the following interpreters:

  • The system interpreter /usr/bin/python (2.5.1 on Mac OS X 10.5)
  • The interpreter installed via fink /sw/bin/python2.5 (2.5.2 as of this writing)
  • Interpreters installed into /Library/Frameworks/Python (I usually add the latest 2.6.x and 3.x for compatibility testing)


Project configuration

Now that the interpreters have been configured, work on a project can begin:

  • Switch to the Pydev perspective
  • Create a new project: File -> New -> Pydev Project
    • Pydev projects consist of a .project and a .pydevproject file
    • Pydev suggests creating a src folder; if this is accepted, the folder will be added to the project's PYTHONPATH
  • To use the project with a new SVN repository
    • Create a new, empty repository
    • Select project, e.g. in the Pydev Package Explorer view
    • Context Menu -> Team -> Share Project
    • Select "SVN"
    • Enter repository URL (e.g. http://www.herzbube.ch/svn/mkroesti)
    • Folder name = trunk
    • Enter an initial comment (e.g. "add Eclipse project files")
  • Alternatively, to connect the project to an SVN repository that already exists and has content
    • Do the same as above, but enter a URL + folder name that point to the location where the SVN repository lives
    • Subclipse will warn that the specified folder already exists in the given repository; you can now say "yes" to let Subclipse check out the folder and connect the project to the working copy
    • Subclipse will offer to switch to the "Team Synchronize" perspective; say "yes"
    • you will see that the two project files .project and .pydevproject are marked as "added"
    • right-click on the project and select "Commit..." to add the project files to the repository
    • switch back to the "Pydev" perspective
  • configure the project's PYTHONPATH
    • Select project, e.g. in the Pydev Package Explorer view
    • Context Menu -> Properties -> PyDev PYTHONPATH
    • Add whatever path is needed, e.g. if the project has packages, replace the src folder (which was automatically added when the project was created) by src/packages


Run unit tests

  • setup.py with "test" command
    • open "Run configurations" dialog
    • double-click "Python Run" to create a new configuration
    • give it a name, e.g. "mkroesti tests (setup.py)"
    • select project, e.g. mkroesti
    • select main module, e.g. browse for "setup.py" (will result in something like "${workspace_loc:mkroesti/setup.py}")
    • on the "Arguments" tab, set the program arguments to "test"
  • run all tests in a directory
    • the easiest way is to let PyDev create the run configuration for you
    • right-click on the folder that contains your tests and select "Run as... -> Python unittest"
    • the resulting run configuration is configured as follows
    • name = <projectname> tests (e.g. "mkroesti tests")
    • project = <project> (e.g. mkroesti)
    • main module = <folder-with-tests> (e.g. ${workspace_loc:mkroesti/src/packages/tests})
    • on the "Arguments" tab, the working directory is set to <folder-with-tests> (e.g. ${workspace_loc:mkroesti/src/packages})


Python Package Index (PyPI)

Website: http://pypi.python.org/


How to submit a package to PyPI:

  • You need to register a user account before you can submit any packages.
  • Package submission works in one of three ways
    • ./setup.py register
    • Submit the file PKG-INFO that is generated by ./setup.py sdist (the file can be found inside the generated tar ball)
    • Manually enter package information on the submission page
  • I found that setup.py's register command works well, although Python 2.5 choked on my name because it contains a Unicode character; I had to invoke setup.py with a Python 3 interpreter to make it work
  • Multiple submissions for a package
    • The latest submission will overwrite the previous submissions for the same version
    • The latest submission will "hide" all previous submissions for other versions, i.e. listings and searches will find only the version of the latest submission. Besides the obvious intention of displaying only the newest version of a package, this feature is also useful if a submission has been made for a wrong version: Simply fix the version number in setup.py and re-submit the package
    • The package admin web interface can be used to "un-hide" a hidden submission (I have not investigated how this works)


Further references:


Software

Interesting software related to Python

py2app (url)
Convert Python scripts into standalone Mac OS X applications