LearningPython
The purpose of this page is to keep my notes about my effort to learn the Python programming language. I doubt very much that this page is of any use to anybody besides myself. If you are not me, you had probably better look at one of the tutorials listed in the "References" section.
Why Python?
The first job of my working career thoroughly taught me shell and awk script programming. Although I was (and am) aware that these script languages are just not suitable for some tasks, I never got around to learning another interpreted programming language that would fill the gap between shell scripts and C++ or Objective-C. I was forced to familiarize myself with Perl at some time, but never got along with it, so that doesn't count. When I learned of the existence of Ruby, I immediately intended to have a look at it, but somehow there were always other, more important things to do.
Now there is Python which offers itself as another interesting candidate for the "stop-gap" role :-) ... And since I have just started to become involved with ISFDB (whose programs are written in Python) I finally have a reason to get acquainted with something new.
So let's get started...
References
Tutorials:
- python.org
- http://docs.python.org/tutorial/
- Dive Into Python
- http://diveintopython.org/
From python.org:
- List of beginner's resources
- http://wiki.python.org/moin/BeginnersGuide/Programmers
- String handling
- http://docs.python.org/library/stdtypes.html#string-methods
- Style guide for Python code
- http://www.python.org/dev/peps/pep-0008/
- Docstring conventions
- http://www.python.org/dev/peps/pep-0257/
- Unit testing framework
- http://www.python.org/doc/current/library/unittest.html
- Installing Python modules
- http://www.python.org/doc/current/install/index.html
- Distributing Python modules
- http://www.python.org/doc/current/distutils/index.html
From wikipedia.org:
- http://en.wikipedia.org/wiki/Python_(programming_language)
- http://en.wikipedia.org/wiki/Python_syntax_and_semantics
Glossary
Also consult Python's own glossary.
- PEP
- Python Enhancement Proposal (see this index)
- class object
- when the interpreter has finished executing the statements of a class definition, a class object is created; the object can be referenced using the class name
- class attribute
- any name in the class object, i.e. both "variables" and functions
- class instance object
- the instance of a class
- object
- general term, i.e. there are other types of objects than just class instance objects (e.g. list objects)
- data attribute
- a variable that "belongs to" an object
- method
- a function that "belongs to" an object
- function object
- if a class MyClass defines a function foo(), the following refers to a function object: MyClass.foo
- method object
- if myObject is an instance of MyClass (see above), the following refers to a method object: myObject.foo
- kwarg
- keyword argument
- PyPI
- Python Package Index (sometimes also known as "The Cheese Shop")
Coding Python
First impressions
- Statements are grouped by indentation. Seems yucky!
- No variable or argument declarations are necessary. Is this a good thing? Is there a strict mode?
- No char type, single characters are simply strings of length 1. Good!
- Seems to have good unicode support
- Data types: integers, floating points, complex numbers, strings, lists
- Strings are immutable, lists are not
- Slices are useful to access parts of strings and lists
- Right-hand side of an assignment is evaluated before any assignment takes place (important for multiple assignment)
- Zero = false, non-zero = true; empty sequence = false, non-empty sequence = true
- There is the concept of sequences - lists are sequences
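A few of these first impressions can be checked directly. The following is a minimal sketch, assuming Python 3; all variable names are made up for illustration. It shows slicing, the "right-hand side first" rule of multiple assignment, and the truth value of empty and non-empty sequences.

```python
# Slices work the same way on strings and lists
word = 'Python'
letters = ['a', 'b', 'c', 'd']
first_two = word[:2]        # 'Py'
middle = letters[1:3]       # ['b', 'c']

# The right-hand side is evaluated completely before any assignment happens,
# so two names can be swapped without a temporary variable
x, y = 1, 2
x, y = y, x

# Empty sequences count as false, non-empty sequences as true
empty_is_false = not []
non_empty_is_true = bool([0])   # a list holding a zero is still non-empty
```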
Variables
Declare a variable by assigning it a value:
foo = 'bar'
Remove the variable declaration with the del statement:
del foo
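To see what del actually does, here is a small sketch (assuming Python 3): after the statement, the name is simply gone, and using it again raises a NameError.

```python
foo = 'bar'
del foo

# The name no longer exists, so referring to it is an error
try:
    foo
except NameError as exc:
    message = str(exc)
```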
Data types
Sequences
There are six sequence types:
- strings (immutable)
- Unicode strings (immutable)
- lists (mutable)
- tuples (immutable)
- buffers (immutable)
- xrange objects.
Sequences are indexed by a range of numbers.
The in and not in keywords test whether or not a sequence contains a certain value:
a = ['cat', 'window', 'defenestrate']
if 'window' in a:
    print('is in list')
if 'n' not in ('y', 'ye', 'yes'):
    print('is not in list')
See http://www.python.org/doc/current/library/stdtypes.html#string-methods for useful stuff that you can do with sequence types.
Lists
A list is a mutable sequence type.
list1 = [12345, 54321, 'hello!']
# Refer to single list elements by index position
element = list1[2]
# Refer to list elements by slice notation (results in another list); in this example, list2 is [54321]
list2 = list1[1:2]
# Empty list
list3 = list()
# Two equivalent ways to append the elements of one list (list2) to another list (list1)
list1.extend(list2)
list1[len(list1):] = list2
# Append a single element to a list
list1.append('world')
# Remove an element from a list, in place. The element must exist, otherwise a ValueError is raised.
list1.remove(12345)
# Counting list members
len(list1)
list1.count('world')
# Comparing lists
list1 = [12345, 54321]
list2 = [54321, 12345]
assert list1 != list2
assert sorted(list1) == sorted(list2)
list1.sort()
list2.sort()
assert list1 == list2
# Copying a list
list2 = list1[:]
Tuples
A tuple is an immutable sequence type. It consists of a number of values separated by commas:
t = 12345, 54321, 'hello!'
# Tuples may be nested:
u = t, (1, 2, 3, 4, 5)
# Empty tuple
empty = ()
# Tuple with 1 element needs a trailing comma
singleton = 'hello',
Sets
A set is an unordered collection with no duplicate elements.
basket = ['apple', 'orange', 'apple', 'pear', 'orange', 'banana']
fruit = set(basket)  # create a set without duplicates
a = set('abracadabra')
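Sets also support the usual mathematical operations. A short sketch, assuming Python 3; the second word 'alacazam' is just an arbitrary companion to the 'abracadabra' set from above:

```python
a = set('abracadabra')   # unique letters: a, b, r, c, d
b = set('alacazam')      # unique letters: a, l, c, z, m

only_in_a = a - b        # difference
in_either = a | b        # union
in_both = a & b          # intersection
in_exactly_one = a ^ b   # symmetric difference
```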
Dictionaries
A dictionary is an unordered set of key/value pairs, with the requirement that the keys are unique. A key can be any immutable type:
- Strings and numbers can always be keys
- Tuples can be used as keys if they contain only strings, numbers, or tuples; if a tuple contains any mutable object either directly or indirectly, it cannot be used as a key
- Lists cannot be used as keys because lists can be modified in place
tel = {'jack': 4098, 'sape': 4139}
# Add an entry
tel['guido'] = 4127
# Use dict() to build dictionary from lists of key-value pairs stored as tuples
dict([('jack', 4098), ('sape', 4139), ('guido', 4127)])
# Use dict() with keyword arguments (keys are strings)
dict(jack=4098, sape=4139, guido=4127)
# Return a list with dictionary keys or values
# (in Python 3 these are view objects; wrap them in list() to get a list)
tel.keys()
tel.values()
# Comparing dictionaries
dict1 = {"a": 17, "b": 42}
dict2 = {"b": 42, "a": 17}
assert dict1 == dict2
# Copying a dictionary
dict2 = dict1.copy()  # shallow copy (sufficient if values are immutable)
import copy
dict2 = copy.deepcopy(dict1)  # deep copy (e.g. if values are mutable, such as lists)
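The key rules listed above can be tried out as follows. This is only a sketch, assuming Python 3; the dictionary name points and its values are invented for the example.

```python
points = {}

# A tuple of numbers is immutable, so it works as a key
points[(1, 2)] = 'tuple keys are fine'

# A list is mutable and therefore rejected with a TypeError
try:
    points[[1, 2]] = 'list keys are not'
except TypeError as exc:
    list_key_error = str(exc)

# A tuple that contains a list is indirectly mutable and rejected as well
try:
    points[(1, [2, 3])] = 'indirectly mutable'
except TypeError:
    nested_rejected = True
else:
    nested_rejected = False
```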
Strings
References
- Introduction
- http://www.python.org/doc/current/tutorial/introduction.html#strings
- String methods
- http://www.python.org/doc/current/library/stdtypes.html#string-methods
- String services
- http://www.python.org/doc/current/library/string.html
Literals
String literals:
- can be enclosed in single or double quotes
- multi-line strings when using single or double quotes must use a backslash ("\") to indicate line continuation
- "\n" indicates newlines
- the following example defines a raw string literal where the backslash loses its special properties:
foo = r"one two \n three"
- a string literal can also be enclosed in triple quotes, this is used e.g. in docstrings; there is no need for backslashes to indicate line continuation or newlines
- the following example defines a unicode string literal using the unicode character with ordinal value 0x0020 (= a space); note that the interpretation of the "ä" character depends entirely on the encoding of the source file that contains the literal - if it's in latin-1 then the result will be incorrect!
foo = u"Patrick\u0020Näf"
Conversion
The str() function converts its argument into a string according to the string conversion rules specific to the argument's type. TODO: Find exact definitions, e.g. for numeric values, for class instance objects.
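Until the TODO is resolved, here is a rough sketch of what str() returns for a few argument types, assuming Python 3. It is not the exact definition, just observed behaviour; the classes Point and Plain are made up for the example. A class can control its own conversion by defining __str__(), otherwise a generic default representation is used.

```python
class Point:
    """Hypothetical class that defines its own string conversion."""
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __str__(self):
        return "Point(%s, %s)" % (self.x, self.y)


class Plain:
    """Hypothetical class without __str__; falls back to the default."""
    pass


int_text = str(42)
float_text = str(1.5)
list_text = str([1, 'a'])
point_text = str(Point(1, 2))
plain_text = str(Plain())   # generic text naming the class and an address
```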
Operations
- concatenation using the "+" operator
- length using the "len()" function
- subscription using the [] operator
- "foo[0]" refers to the first character
- "foo[-1]" refers to the last character
- "foo[2:5]" refers to characters at index positions 2-4
- "foo[:5]" refers to characters at index positions 0-4
- "foo[5:]" refers to characters from index position 5 until end-of-string
- "foo[-2:]" refers to the last two characters
- splitting into parts using the split() function:
"a,b,c".split(",") # results in a list
Flow control
if
if x < 0:
    x = 0
    print('Negative changed to zero')
elif x == 0:
    print('Zero')
elif x == 1:
    print('Single')
else:
    print('More')
for
Python's for statement iterates over the items of any sequence:
a = ['cat', 'window', 'defenestrate']
for x in a:
    print(x, len(x))
It is not safe to modify the sequence being iterated over in the loop. Instead, iterate over a copy, e.g. with a slice:
for x in a[:]:  # make a slice copy of the entire list
    if len(x) > 6:
        a.insert(0, x)
To iterate over the indices of a sequence, combine range() and len() as follows:
a = ['Mary', 'had', 'a', 'little', 'lamb']
for i in range(len(a)):
    print(i, a[i])
while
a, b = 0, 1
while b < 10:
    print(b)
    a, b = b, a+b
break, continue, else on loops
break and continue work as expected.
Loop statements may have an else clause; it is executed when the loop terminates through exhaustion of the list (with for) or when the condition becomes false (with while), but not when the loop is terminated by a break statement.
for n in range(2, 10):
    for x in range(2, n):
        if n % x == 0:
            # floor division keeps the result an integer in Python 3 as well
            print(n, 'equals', x, '*', n // x)
            break
    else:
        # loop fell through without finding a factor
        print(n, 'is a prime number')
Logical and other operators
# Logical operators
if x < 0 and y > 42:
    print('and')
if x < 0 or y > 42:
    print('or')
if not (x < 0 and y > 42):
    print('not')
# Membership operators
if x in range(2, 10):
    print('in')
if x not in range(2, 10):
    print('not in')
# Identity operators
if x is y:
    print('is')
if x is not y:
    print('is not')
# Arithmetic operators
x = 50 % 42      # modulus; result is 8
x = 2 ** 8       # exponent; result is 256
x = 9 // 2       # floor division; result is 4
y = 9.0 // 2.0   # floor division; result is 4.0
Functions
Use def to start a function definition:
def foobar(n, paramwithdefvalue = 17):
    # <do something>
    return 42

# Call function
result = foobar(2000)
Local variables (symbols) shadow global variables (symbols). Global variables can be accessed, but cannot be changed (unless using the global statement).
Parameters are passed using "call by [object] reference". If the parameter is a mutable object, changing the object will let the caller see those changes.
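The difference between changing a passed-in object and merely rebinding the parameter name is easy to miss, so here is a small sketch (assuming Python 3; the function and variable names are invented):

```python
def append_item(items):
    # Mutates the object the caller passed in; the caller sees the change
    items.append('new')


def rebind(items):
    # Only rebinds the local name; the caller's list is left untouched
    items = ['something else']


shopping = ['milk']
append_item(shopping)
rebind(shopping)
```

After both calls, shopping contains the appended element but has not been replaced by the list created inside rebind().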
Arbitrary number of arguments
A function can be called with an arbitrary number of arguments. These arguments will be wrapped up in a tuple:
def fprintf(file, format, *args):
    file.write(format % args)
When the arguments are already in a list or tuple but need to be unpacked for a function call:
args = [3, 6]
range(*args)  # same as range(3, 6)
Keyword arguments (kwargs)
Functions can be called using a "keyword = value" syntax. The keyword must match the name of a formal parameter. The main advantage is that we don't have cryptic function calls like
doIt(1, 7, 2.2, 'hmmm')
For instance, we might call the function in the above example like this:
def foobar(foo, bar):
    pass  # <do something>

foobar(bar = 99, foo = 'alright')
Function call with arbitrary number of keyword arguments that will then be packed into a dictionary:
def foobar(foo, bar, **keywords):
    print('foo = ', foo)
    print('bar = ', bar)
    # sorted() works on any iterable, including a dictionary's keys
    for kw in sorted(keywords):
        print(kw, ':', keywords[kw])

foobar(bar = 99, foo = 'alright', keyword1 = 'value1', keyword2 = 'value2')
When the arguments are already in a dictionary but need to be unpacked for a function call:
def foobar(foo, bar):
    pass  # <do something>

# Named kwargs rather than dict, to avoid shadowing the built-in dict type
kwargs = {"foo": "99", "bar": "alright"}
foobar(**kwargs)
Modules
How to use modules
# This loads the file "foobar.py"
import foobar
# Execute a function from the module
foobar.doIt(42)
# Assign and use local name (no parentheses: we want the function itself, not its result)
localDoIt = foobar.doIt
localDoIt(42)
# Import certain items directly from module
from foobar import doIt, dontDoIt
dontDoIt("why not")
# Import everything from module (except names beginning with underscore)
from foobar import *
# Load a module from a package "Sound" and its sub-package "Effects"
import Sound.Effects.echo
Sound.Effects.echo.doSomething()
# Import an entire module from a package, making it available without package prefix
from Sound.Effects import echo
echo.doSomething()
# Import all modules from a package that are listed in the "__all__" variable
# The variable must be set by the package's file __init__.py
from foobar import *
The search path for modules and packages is the list of directories stored in the variable sys.path. This variable is initialized with the following values:
- the directory containing the script being run (or the current directory when no script is given)
- the content of the environment variable PYTHONPATH; this has the same syntax as the shell variable PATH
- an installation-dependent default path (e.g. /usr/local/lib/python)
Note: A program that knows what it is doing can change the content of sys.path to influence where modules are searched for.
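As a sketch of such a program (assuming Python 3): the following writes a throwaway module into a temporary directory, adds that directory to sys.path, and then imports the module. The module name hypothetical_mod and its function doIt are invented for the example.

```python
import importlib
import os
import sys
import tempfile

# Create a directory that is not on the search path yet
module_dir = tempfile.mkdtemp()
with open(os.path.join(module_dir, "hypothetical_mod.py"), "w") as fh:
    fh.write("def doIt(n):\n    return n * 2\n")

# Extend the search path, then import as usual
sys.path.append(module_dir)
importlib.invalidate_caches()   # make sure the new file is noticed
import hypothetical_mod

result = hypothetical_mod.doIt(21)
```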
Find out which names a module defines:
# Examine "sys" module
import sys
dir(sys)
# List currently defined names
dir()
How to define modules
If someone says
import foobar
the module foobar must be located in a file named foobar.py. The module file does not need to have a special structure.
A module can be located within a package, which is represented by a directory that contains a file __init__.py. The file can contain:
- nothing
- arbitrary initialization code
- a definition of the variable "__all__"; this allows clients to say something like "from foo import *", which in the following example would import the modules "bar1", "bar2" and "bar3", but not module "bar4" or any other module also present within package "foo"
__all__ = ["bar1", "bar2", "bar3"]
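The effect of __all__ can be observed with a throwaway package. The following is only a sketch, assuming Python 3: the package name hypo_pkg and its modules are invented, and the star-import is run through exec() because Python 3 allows "from ... import *" only at module level.

```python
import importlib
import os
import sys
import tempfile

# Build a package directory with three modules, of which only two are
# listed in __all__
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "hypo_pkg")
os.mkdir(pkg_dir)
for name in ("bar1", "bar2", "bar4"):
    with open(os.path.join(pkg_dir, name + ".py"), "w") as fh:
        fh.write("NAME = %r\n" % name)
with open(os.path.join(pkg_dir, "__init__.py"), "w") as fh:
    fh.write('__all__ = ["bar1", "bar2"]\n')

sys.path.insert(0, root)
importlib.invalidate_caches()

# Run the star-import in a fresh namespace and look at what arrived
namespace = {}
exec("from hypo_pkg import *", namespace)
imported = sorted(name for name in namespace if not name.startswith("__"))
```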
Object Orientation
Features
- multiple inheritance
- no "virtual" or similar keyword, all methods can be overridden
- all members are public
- everything is an object: data types, classes
- operators can be redefined
- objects are passed by reference
Class definition
Example 1:
class MyClass:
    i = 12345
    def f(self):
        return 'hello world'
Class objects
A class definition must be executed before it has any effects. When a class definition is left, a class object is created. The class object acts as a wrapper around the contents of the namespace created by the class definition.
Class objects support two kinds of operations: attribute references and instantiation.
Attributes are referenced as expected: obj.name. Class attributes can also be assigned to.
Instantiation is done using function notation (). The special __init__() method works as a kind of constructor, to initialize the new object to a given state. The __init__() method may have arguments.
class Complex:
    def __init__(self, realpart, imagpart):
        self.r = realpart
        self.i = imagpart

x = Complex(3.0, -4.5)
x.r, x.i
Function objects and method objects
Consider this class definition:
class MyClass:
    i = 12345
    def f(self):
        return 'hello world'
MyClass.f is a reference to a function object. The function belongs to the class object.
MyClass().f is a reference to a method object. The method belongs to the class instance object.
If you have a reference to a method object m:
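- m.im_self refers to the instance object that the method belongs to (also spelled m.__self__ since Python 2.6; Python 3 has only m.__self__)
- m.im_func refers to the function object that corresponds to the method (also spelled m.__func__ since Python 2.6; Python 3 has only m.__func__)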
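A short sketch of the relationship between the two kinds of objects, assuming Python 3, where the attributes are spelled __self__ and __func__ (the im_self and im_func names exist only in Python 2):

```python
class MyClass:
    i = 12345

    def f(self):
        return 'hello world'


obj = MyClass()
m = obj.f                  # method object, bound to obj

bound_to = m.__self__      # the instance the method belongs to
underlying = m.__func__    # the plain function defined in the class
greeting = m()             # calling m passes obj as self automatically
```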
Data hiding
Data hiding is not possible since Python has no concept of "private" or "protected". Clients may access a class object's and/or class instance object's data members in whatever way they want. They may even
- change the value of a member
- add new members
- delete existing members (using the del keyword)
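The three operations in the list above look like this in practice. A minimal sketch, assuming Python 3; the class Account and its attributes are made up for illustration.

```python
class Account:
    def __init__(self):
        self.balance = 100


acct = Account()

acct.balance = -1          # change the value of a member
acct.nickname = 'savings'  # add a new member that the class never declared
del acct.balance           # delete an existing member

has_balance = hasattr(acct, 'balance')
```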
Inheritance
class DerivedClassName(BaseClassName):
    [...]
When a class attribute is referenced, the attribute is recursively searched for, first in the derived class itself, then in the base class, etc. This works both for data and for function attributes. For function attributes, this effectively provides the mechanism for method overriding.
To call the base class method:
BaseClassName.methodname(self, arguments)
Multiple inheritance:
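class DerivedClassName(Base1, Base2, Base3):
    [...]
With multiple inheritance, attribute lookup in old-style classes occurs depth-first, left-to-right. New-style classes (and all classes in Python 3) instead follow the method resolution order stored in the class's __mro__ attribute.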
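The lookup order matters most in a "diamond" hierarchy. The sketch below assumes Python 3, where every class is a new-style class and lookup follows the C3 method resolution order instead of a plain depth-first search; the class names are invented.

```python
class Base:
    def who(self):
        return 'Base'


class Left(Base):
    pass


class Right(Base):
    def who(self):
        return 'Right'


class Derived(Left, Right):
    pass


# ['Derived', 'Left', 'Right', 'Base', 'object']
order = [cls.__name__ for cls in Derived.__mro__]

# Right is searched before Base, so its override wins; a plain depth-first
# search would have reached Base via Left first
result = Derived().who()
```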
Method calls
When an instance object's method is called, the first parameter passed to the method is always the instance object (self, this, ...).
# Equivalent
MyClass().f()
MyClass.f(MyClass())
# Equivalent
myObject = MyClass()
myObject.f()
MyClass.f(myObject)
This makes perfect sense if you keep in mind that within methods you always have to use self to refer to data attributes of the instance that the method is operating on.
Static methods
Definition & use of static method through decorator @staticmethod:
class Foo:
    @staticmethod
    def doIt():
        pass

Foo.doIt()
Note: There is also a decorator called @classmethod. I have not (yet) understood what the difference is.
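As far as I can tell, the difference is this: a static method receives no implicit first argument at all, while a class method receives the class it was called on (conventionally named cls). A sketch assuming Python 3, with invented names:

```python
class Foo:
    label = 'foo'

    @staticmethod
    def static_doIt():
        # No access to the class or an instance unless passed explicitly
        return 'no implicit argument'

    @classmethod
    def class_doIt(cls):
        # cls is the class the method was called on, which may be a subclass
        return cls.label


class Bar(Foo):
    label = 'bar'


from_foo = Foo.class_doIt()
from_bar = Bar.class_doIt()
from_static = Bar.static_doIt()
```

Because cls follows the subclass, class methods are often used for alternative constructors.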
Object destruction
Objects are never explicitly destroyed; however, when they become unreachable they may be garbage-collected. An object becomes unreachable if there are no references left that point to the object.
# Create a reference to a dictionary object
a = {'foo': 123, 'bar': 456}
# The dictionary object is referenced twice
b = a
# Remove a reference
del a
# Remove the second reference; the dictionary object becomes unreachable and may be garbage-collected
b = None
Introspection
If you have a class instance object o:
- o.__class__
- refers to the class that the object is an instance of
If you have a class object c:
- c.__bases__
- the tuple of base classes of a class object; if there are no base classes, this will be an empty tuple
- c.__name__
- the name of the class or type
- c.__doc__
- the docstring belonging to the class
If you have a reference to a method object m:
- m.im_self refers to the instance object that the method belongs to (m.__self__ in Python 3)
- m.im_func refers to the function object that corresponds to the method (m.__func__ in Python 3)
To check whether an object is an instance of a class that implements a certain interface:
isinstance(object, class)
To perform a similar operation on a class:
issubclass(class, class)
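A compact sketch of the introspection attributes and the two built-in checks, assuming Python 3; the classes Animal and Dog are invented for the example.

```python
class Animal:
    """A hypothetical base class."""


class Dog(Animal):
    pass


rex = Dog()

instance_class = rex.__class__     # the class rex is an instance of
bases = Dog.__bases__              # tuple of direct base classes
name = Dog.__name__
doc = Animal.__doc__

checks = (
    isinstance(rex, Dog),      # direct class
    isinstance(rex, Animal),   # base classes count as well
    isinstance(rex, str),      # unrelated type
    issubclass(Dog, Animal),
    issubclass(Animal, Dog),   # the relation is not symmetric
)
```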
Exceptions
Handling exceptions
Exception handling is pretty straightforward:
- the usual try clause
- followed by the exception handlers
- followed by an optional else clause which is executed if the try block did not raise an error
- followed by an optional finally clause which is always executed, regardless of whether an exception occurred or not
import sys

f = None
try:
    f = open('myfile.txt')
    s = f.readline()
    i = int(s.strip())
# Assign exception instance to a variable
except IOError as exc:
    # Extract and print exception arguments
    errno, strerror = exc.args
    print("I/O error(%s): %s" % (errno, strerror))
except ValueError:
    print("Could not convert data to an integer.")
# Catch different exception types by naming them in a parenthesized tuple
except (RuntimeError, TypeError, NameError):
    pass
# Catch all exceptions by omitting the name
except:
    (exc_type, exc_value, exc_traceback) = sys.exc_info()
    # exc_type = the object identifying the exception (object has class "type")
    # exc_value = the actual exception object (class depends on the raised exception); passing this to print() usually prints the "reason" embedded in the exception
    # exc_traceback = a traceback object (object has class "traceback") identifying the point in the program where the exception occurred
    print("Unexpected error: " + str(exc_value))
    raise
# Executes if no exception occurred
else:
    print("executing else clause")
# Always executes
finally:
    # f is still None if open() itself failed
    if f is not None:
        f.close()
Raising exceptions
In Python, exceptions are raised, not thrown. This is how it works:
# Specify the exception name followed by the exception argument
try:
    raise NameError('HiThere')
# Catch the exception, then re-raise it
except NameError:
    print('An exception flew by!')
    raise
Custom exception types
Best practices:
- derive from the Exception class
- the exception name should end in "Error"
- if a module can raise several exceptions, create a base class for exceptions defined by that module, and subclass that to create specific exception classes for different error conditions
See this example.
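The best practices above might be applied roughly as follows. This is a sketch under my own assumptions (Python 3; the class names ModuleError, ConfigError, ParseError and the function load are all invented), not the example the link refers to.

```python
class ModuleError(Exception):
    """Base class for all exceptions raised by this (hypothetical) module."""


class ConfigError(ModuleError):
    """Raised when the configuration is missing."""


class ParseError(ModuleError):
    """Raised when a line cannot be parsed."""


def load(text):
    if not text:
        raise ConfigError('empty configuration')
    if '=' not in text:
        raise ParseError('missing "=" in %r' % text)
    return tuple(text.split('=', 1))


# Clients can catch the base class to handle every error of the module
caught = []
for sample in ('', 'novalue', 'a=b'):
    try:
        load(sample)
    except ModuleError as exc:
        caught.append(type(exc).__name__)
```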
Executing a Python script
Basics
The script must have the executable bit set and contain a shebang at the top.
osgiliath:~/py# ls -l helloworld
-rwxr-xr-x 1 root root 39 Sep 27 22:17 helloworld
osgiliath:~/py# cat helloworld
#!/usr/bin/python
print("hello world")
main() function
The following construct defines & executes a main() function. Note that it is not at all necessary to have a main() function!
def main():
    print("hello world")

if __name__ == "__main__":
    main()
Discussion
- the __name__ attribute in this context refers to the name of the current module
- the module named "__main__" is a special module provided by the Python runtime; the module represents the (otherwise anonymous) scope in which the interpreter's main program executes
If a module is executed like this:
python foobar.py <arguments>
the module's __name__ attribute is set to "__main__". The module can therefore include code such as the following to detect when it is run as a standalone program:
if __name__ == "__main__":
    do_something()
To quote from diveintopython.org:
The if __name__ trick allows this program do something useful when run by itself, without interfering with its use as a module for other programs.
Command line arguments
Command line arguments are stored in the sys module's argv attribute as a list:
import sys
print(sys.argv)
The getopt module processes sys.argv using the conventions of the Unix getopt() function. More powerful and flexible command line processing is provided by the optparse module.
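A minimal getopt sketch, assuming Python 3. The option names (-v/--verbose, -o/--output) and the argument list are made up; a real script would pass sys.argv[1:] instead of the hard-coded list.

```python
import getopt

# Stand-in for sys.argv; element 0 is the program name
argv = ['prog', '-v', '--output=result.txt', 'input1', 'input2']

# "vo:" = short options -v (flag) and -o (takes an argument);
# the long options are given as a list, with "=" marking an argument
opts, args = getopt.getopt(argv[1:], 'vo:', ['verbose', 'output='])

verbose = False
output = None
for name, value in opts:
    if name in ('-v', '--verbose'):
        verbose = True
    elif name in ('-o', '--output'):
        output = value
```

Unknown options make getopt.getopt() raise getopt.GetoptError, which a real script would catch in order to print a usage message.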
Modules:
- sys
- http://www.python.org/doc/current/library/sys.html
- getopt
- http://www.python.org/doc/current/library/getopt.html
- optparse
- http://www.python.org/doc/current/library/optparse.html (example from my mkroesti project)
Coding style guide
I break the following "rules" from the coding style guide in PEP 8:
- Limit all lines to a maximum of 79 characters
- I do this for docstrings
- I do this for statements that lend themselves to elegant/clear representation on multiple lines
- I don't do this just for the sake of some hypothetical 80-characters-per-line limited device, because to my eyes a statement spaced out over multiple lines usually looks just like garbage (the example used in PEP 8 to demonstrate line wrapping is just such an example)
Rules that I no longer break because I have seen their wisdom :-)
- Use 4 spaces per indentation level.
- I formerly used only 2 spaces because I thought 4 spaces were excessive
- After a relatively short time I noticed that the structure of the code was often hard to see: Where does the scope of the function/class/if-block begin/end?
- At first I blamed Python's group-statements-by-indentation and bitterly wished for the braces I was accustomed to from C/C++/Java
- After some time I stopped griping because this is just a fact that I can't change
- Instead I tried out 4-spaces-per-indent-level and suddenly my code looked better
Stuff
Callable object
A callable object is an instance object that is "called" as if it were a function.
The class must define a method __call__, then "calling" an instance foo of that class like this
foo(arg1, arg2, ...)
is the same as saying
foo.__call__(arg1, arg2, ...)
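A small sketch of a callable object, assuming Python 3; the class Multiplier is invented for the example.

```python
class Multiplier:
    def __init__(self, factor):
        self.factor = factor

    def __call__(self, value):
        return value * self.factor


double = Multiplier(2)

via_call_syntax = double(21)            # looks like a function call
via_method = double.__call__(21)        # the same thing spelled out
is_callable = callable(double)
```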
Null object
The null object is returned by functions that don't explicitly return a value. It supports no special operations. There is exactly one null object, named None (a built-in name).
Statement continuation
Although a backslash can be used to continue a statement on the next line, it is usually better to use parentheses like this (example copied from the Idioms and Anti-Idioms in Python article):
value = (foo.bar()['first'][0]*baz.quux(1, 2)[5:9]
         + calculate_number(10, 20)*forbulate(500, 360))
(the main reason cited in the article to avoid backslashes is that a stray space character after a backslash will break line continuation)
Source code file encoding
PEP 263 describes how to specify the encoding of files that contain Python source code. This is interesting for me because my surname "Näf" contains a non-ASCII character.
It all boils down to the first or second line of the file containing a comment line that satisfies a regular expression described in the PEP. An example:
# coding=<encoding name>
My files all look like this:
#!/usr/bin/env python
# coding=utf-8
Documentation
Docstrings
Functions, classes, etc. can (and should) all be documented using a Python feature called "Documentation Strings" (or "docstrings" for short). For reference, see this overview and the docstring conventions in PEP 257.
A docstring must be a string literal that occurs as the first statement in a module, function, class or method definition. A function documentation might look like this:
def doSomething():
    """Summary line, should start with a capital letter and end with a period.

    The second line should always be blank to visually separate the summary
    sentence(s) from the follow-up detailed paragraphs.

    The detailed description may consist of multiple paragraphs and has no
    restrictions about what it should contain.

    Documentation parsers determine what indentation to use for formatting
    from the first non-blank line after the first line of the docstring.
    """
To print out an entity's docstring in code:
print(doSomething.__doc__)
To print out the documentation of a module "bar" within package "foo" on the command line:
pydoc foo.bar
Note: The docstrings feature does not define any specific markup; the markup depends on the tool that will process the docstrings.
reStructuredText
reStructuredText is a special way to mark up Python docstrings (or any other source code documentation, for that matter). It was developed by the docutils project, and the primary document is found here:
http://docutils.sourceforge.net/rst.html
With the 2.6 release, Python has changed its documentation format from LaTeX to reStructuredText. A primer can be found here:
http://docs.python.org/documenting/rest.html
The toolset that processes the Python documentation into HTML is called Sphinx. Its web site is found here:
http://sphinx.pocoo.org/
Doxygen
Apparently Doxygen also supports the Python language, however I have not investigated this since I am quite happy with Python's docstring feature.
Distributing / Installing Python Modules
Overview
The standard way of distributing a Python module, or installing such a distributed module, is to use the module distutils from the Python Standard Library.
Some references:
- Creating a distribution package
- http://www.python.org/doc/current/distutils/introduction.html
- Installing a package
- http://www.python.org/doc/current/install/index.html
- Index of distutils docs
- http://www.python.org/doc/current/distutils/index.html
Creating a distribution
Steps required:
- write a setup script (setup.py by convention)
- (optional) write a setup configuration file (setup.cfg by convention)
- (optional for source distribution) write a manifest template file (MANIFEST.in by convention)
- create a source distribution
- (optional) create one or more built (binary) distributions (e.g. a Debian package, a Windows installer, etc.)
A simple setup.py:
from distutils.core import setup
setup(name='foo',
      version='1.2.3',
      py_modules=['foo'],
      )
Note: Within the setup script, use "/" as path separator. distutils will take care of converting this into the platform specific path separator.
A simple MANIFEST.in:
include COPYING
Note: Instead of a manifest template, it is also possible to provide the actual manifest. In this case, the manifest file must specify every single file to include in the distribution (even setup.py).
To create a source distribution foo-1.2.3.tar.gz:
python setup.py sdist
The source distribution will contain the following stuff:
- Python source files (py_modules and packages options in setup.py)
- Script files (scripts options in setup.py)
- README.txt (or README)
- setup.py
- setup.cfg
- test/test*.py
- files mentioned in MANIFEST.in
Note: Build files and versioning files (e.g. .svn) are removed automatically by distutils.
If a manifest file is already present when the "sdist" command is executed, it will be re-created automatically if setup.py or MANIFEST.in are newer. The manifest file needs to be regenerated manually, however, if only files have been added/removed that match an existing file pattern in setup.py or MANIFEST.in:
# Create a new source distribution
python setup.py sdist --force-manifest
# Regenerate manifest file but do not create a source distribution
python setup.py sdist --manifest-only
setup.py for my standard project directory layout
My usual project directory layout looks like this:
base
+-- doc
|   +-- README
|   [...]
+-- src
|   +-- packages
|   |   +-- package_A
|   |   |   +-- foo.py
|   |   +-- package_B
|   |       +-- bar.py
|   +-- tests
|   |   +-- package_A
|   |   |   +-- foo_test.py
|   |   +-- package_B
|   |       +-- bar_test.py
|   +-- scripts
|       +-- foo
+-- setup.py
+-- MANIFEST.in
Note: I would have preferred to have a dist subfolder that contains setup.py and MANIFEST.in. Unfortunately this did not work as intended: although in setup.py I was able to specify the package root as "../src/packages", the MANIFEST.in stubbornly refused to accept a recursive-include directory "../doc" (I always got the error "warning: no files found matching '*' under directory '../doc'")
A more complex setup.py that reflects the above directory structure looks like this:
TODO
The accompanying setup.cfg:
TODO
The accompanying MANIFEST.in:
TODO
Installing a distribution
The usual sequence is this:
tar xfvz python-foo-1.2.3.tar.gz
cd python-foo-1.2.3
./setup.py build
./setup.py install
If something out of the ordinary is required for building/installing the module, it should be mentioned in the file README.
The build step:
- This step is responsible for putting the files to install into a build directory
- By default, this is named build, located directly below the distribution root
- The build directory can be changed using --build-base option; e.g.
python setup.py build --build-base=/tmp/pybuild/foo-1.2.3
The install step:
- This step is responsible for copying everything under build/lib (or build/lib.plat) to the chosen installation directory
- The standard location of the installation directory is system-dependent; to find out what it is, do the following in an interactive Python shell:
>>> import sys
>>> sys.prefix
'/System/Library/Frameworks/Python.framework/Versions/2.5'
>>> sys.exec_prefix
'/System/Library/Frameworks/Python.framework/Versions/2.5'
- The installation directory can be changed using a number of different schemes
- The "home" scheme: python setup.py install --home=~
- The "prefix" scheme: python setup.py install --prefix=/usr/local
- For details about the "home" and "prefix" scheme, or for even more customized schemes, see this reference (already cited further up)
Testing
Running the tests
Preferred way to run all tests of the project:
python setup.py test
(this requires some coding in setup.py, see the next chapter for details)
Running all tests of a test module:
python module.py
(this requires that module.py contains code that calls unittest.main())
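A test module of this kind might look like the sketch below (assuming Python 3; the names FooTestCase and testBar are placeholders). In a standalone file the last lines would be the usual `if __name__ == "__main__": unittest.main()` guard; here the tests are run programmatically instead, so that the snippet does not exit the interpreter.

```python
import unittest


class FooTestCase(unittest.TestCase):
    def testBar(self):
        self.assertEqual(1 + 1, 2)


def run_tests():
    # Collect all test methods of FooTestCase and run them
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(FooTestCase)
    return unittest.TextTestRunner(verbosity=0).run(suite)


result = run_tests()
```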
Running specific tests in a test module:
python unittest.py module.FooTestSuite
python unittest.py module.FooTestCase
python unittest.py module.FooTestCase.testBar
Directory layout
For maximum ease of use, I want to be able to run my test cases like this:
python setup.py test
To achieve this, I organize my unit tests in a directory structure that parallels the directory structure of the project's source code. An example is provided further up where I explain my standard project directory layout.
Adding a "test" command to distutils
Useful reference: http://da44en.wordpress.com/2002/11/22/using-distutils/. Another resource that might be worth investigating is http://peak.telecommunity.com/DevCenter/setuptools.
A bare-bones subclass for distutils.cmd.Command looks like this:
class TestCommand(Command):
    user_options = list()

    def initialize_options(self):
        pass

    def finalize_options(self):
        pass

    def run(self):
        pass
A more interesting example is this:
# PSL
from distutils.cmd import Command
from distutils.core import setup
import unittest
import sys

# Extend search path for packages and modules. This is required for finding the
# "tests" package and its modules.
PACKAGES_BASEDIR = "src/packages"
sys.path.append(PACKAGES_BASEDIR)

class test(Command):
    """Implements a distutils command to execute unit tests.

    The class name is the same as the command name string used in the 'cmdclass'
    dictionary passed to the setup() function further down. The reason for this
    is that, unfortunately, 'python setup.py test --help' will print out the
    class name instead of the name used in the dictionary (or the 'command_name'
    attribute defined in this class).
    """

    # This must be a class attribute; it is used by
    # "python setup.py --help-commands"
    description = "execute unit tests"

    # Options must be defined in a class attribute. The attribute value is a
    # list of tuples. Each tuple defines an option and must contain 3 values:
    # long option name, short option name, and a description to print with
    # --help. An option that should have an argument must have the suffix "=".
    # Each option defined in user_options must have a data attribute with a
    # name that corresponds to the long name of the option. For instance, an
    # option "--foo-bar" requires an attribute "foo_bar". If the user has
    # specified the option, a value is set to the data attribute. If the
    # option has no argument, the attribute value is set to 1. If the option
    # has an argument, the attribute value is set to the argument value.
    user_options = [("suite=", "s", "run test suite for a specific module [default: run all tests]")]

    def __init__(self, dist):
        # This data attribute is returned by Command.get_command_name()
        self.command_name = "test"
        Command.__init__(self, dist)

    def initialize_options(self):
        # The default value is a callable defined in tests.__init__.py. The user
        # must specify something like this: "--suite tests.test_algorithm"
        self.suite = "tests.allTests"

    def finalize_options(self):
        pass

    def run(self):
        tests = unittest.defaultTestLoader.loadTestsFromName(self.suite)
        testRunner = unittest.TextTestRunner(verbosity = 1)
        testRunner.run(tests)

setup(
    # Add a command named "test". The name string in the dict is also used by
    # "python setup.py --help-commands", but not by "python setup.py test -h"
    cmdclass = { "test" : test },
    [...]
)
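The default suite name "tests.allTests" used above refers to a callable that lives in the tests package. The following is only a sketch of what such a callable could look like, not necessarily what my project does: the module names in TEST_MODULES are hypothetical placeholders, and the optional module_names parameter is an addition of mine so that the function can be tried out with other modules. Because the parameter has a default value, loadTestsFromName() can still call the function without arguments.

```python
import unittest

# Hypothetical test module names; replace with the modules of the real project
TEST_MODULES = ["tests.test_algorithm", "tests.test_main"]


def allTests(module_names=None):
    """Return a TestSuite that aggregates all tests of the named modules.

    Each name is imported by the loader, and every TestCase subclass found
    in the module contributes its test methods to the suite.
    """
    if module_names is None:
        module_names = TEST_MODULES
    return unittest.defaultTestLoader.loadTestsFromNames(module_names)
```

The explicit list has to be maintained by hand whenever a test module is added, but it works with every Python version that has the unittest module.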
Coding
Overview
- the smallest unit to test is represented by the TestCase class
- subclasses of TestCase implement various test methods; test method names usually begin with "test" (although it's also possible to override the single method runTest())
- subclasses of TestCase may implement setUp() and tearDown() to define a test fixture
- each test method is executed with a new TestCase instance
- the class TestSuite aggregates TestCase and other TestSuite instances
- a test runner such as TextTestRunner finally executes a number of tests
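To convince myself of the point about fresh instances, here is a minimal sketch. The class FixtureDemo, its attributes and the function run_demo are names I made up for illustration. Each test method records the instance it runs on, and setUp() gives every instance its own empty list, so the two tests cannot influence each other.

```python
import unittest


class FixtureDemo(unittest.TestCase):
    # Class attribute shared by all instances; records which instance ran a test
    seen_instances = []

    def setUp(self):
        # Runs before every test method, on the instance created for that method
        self.items = []

    def testFirst(self):
        self.items.append(1)
        FixtureDemo.seen_instances.append(self)
        self.assertEqual(self.items, [1])

    def testSecond(self):
        self.items.append(2)
        FixtureDemo.seen_instances.append(self)
        # Passes only because self.items does not carry over from testFirst
        self.assertEqual(self.items, [2])


def run_demo():
    """Run both tests and return the TestResult for inspection."""
    FixtureDemo.seen_instances = []
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(FixtureDemo)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

After run_demo() has finished, FixtureDemo.seen_instances holds two distinct objects, one per test method.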
TestCase examples:
import unittest

class MyTestCase(unittest.TestCase):
    def setUp(self):
        pass

    def tearDown(self):
        pass

    def testFoo(self):
        pass

    def testBar(self):
        pass

# Create instances that will execute the named test method
# Note: This gets tedious with lots of test cases and test methods.
# We will see a better way to do this.
fooTestCase = MyTestCase('testFoo')
barTestCase = MyTestCase('testBar')
TestSuite examples:
# Simple way to aggregate test cases into a test suite
myTestSuite1 = unittest.TestSuite()
myTestSuite1.addTest(MyTestCase('testFoo'))
myTestSuite1.addTest(MyTestCase('testBar'))

# Another way
tests = ["testFoo", "testBar"]
myTestSuite2 = unittest.TestSuite(map(MyTestCase, tests))

# A third way. TestLoader relies on the fact that test method names
# begin with "test"
myTestSuite3 = unittest.defaultTestLoader.loadTestsFromTestCase(MyTestCase)

# A fourth way to get at all tests within an entire module
myTestSuite4 = unittest.defaultTestLoader.loadTestsFromModule(mymodule)

# A last way to get at tests within a module, TestCase, etc. See
# docs for the unittest module for exact behaviour, options and overloads
myTestSuite5 = unittest.defaultTestLoader.loadTestsFromName("mymodule")
If it should be possible to run a test module in standalone mode, the module must contain this code at the bottom (see docs for the unittest module for more options on the unittest.main() method):
if __name__ == "__main__":
    unittest.main()
Finally, these are some of the assertions that the TestCase class defines. Except for assertRaises(), whose additional arguments are passed on to the callable, each takes a string message as an optional last argument that can be used e.g. to indicate the exact nature of the failure:
- assertTrue(expr) (older alias: assert_(expr))
- assertEqual(first, second)
- assertNotEqual(first, second)
- assertRaises(exception, callable)
- fail()
Using A Python Program In the Web
TODO
Eclipse and Python
Pydev
Pydev is an Eclipse plugin for Python (and Jython) development. It can be installed from this update site:
http://pydev.sourceforge.net/updates/
Note: The former "Pydev Extensions" plugin is now open source and part of Pydev.
Workspace configuration
The minimal configuration is to define one or more Python interpreters in "Preferences -> Pydev -> Interpreter - Python -> New". When such an interpreter is added, Pydev automatically finds a number of paths that contain modules for the interpreter. It then suggests adding these paths to the PYTHONPATH for that interpreter. This suggestion is usually OK and should be accepted.
I usually add the following interpreters:
- The system interpreter /usr/bin/python (2.5.1 on Mac OS X 10.5)
- The interpreter installed via fink /sw/bin/python2.5 (2.5.2 as of this writing)
- Interpreters installed into /Library/Frameworks/Python (I usually add the latest 2.6.x and 3.x for compatibility testing)
Project configuration
Now that the interpreters have been configured, work on a project can begin:
- Switch to the Pydev perspective
- Create a new project: File -> New -> Pydev Project
- Pydev projects consist of a .project and a .pydevproject file
- Pydev suggests creating a src folder; if this is accepted, the folder will be added to the project's PYTHONPATH
- To use the project with a new SVN repository
- Create a new, empty repository
- Select project, e.g. in the Pydev Package Explorer view
- Context Menu -> Team -> Share Project
- Select "SVN"
- Enter repository URL (e.g. http://www.herzbube.ch/svn/mkroesti)
- Folder name = trunk
- Enter an initial comment (e.g. "add Eclipse project files")
- Alternatively, to connect the project to an SVN repository that already exists and has content
- Do the same as above, but enter a URL and folder name that point to the location where the SVN repository lives
- Subclipse will warn that the specified folder already exists in the given repository; you can now say "yes" to let Subclipse check out the folder and connect the project to the working copy
- Subclipse will offer to switch to the "Team Synchronize" perspective; say "yes"
- you will see that the two project files .project and .pydevproject are marked as "added"
- right-click on the project and select "Commit..." to add the project files to the repository
- switch back to the "Pydev" perspective
- configure the project's PYTHONPATH
- Select project, e.g. in the Pydev Package Explorer view
- Context Menu -> Properties -> PyDev PYTHONPATH
- Add whatever path is needed, e.g. if the project has packages, replace the src folder (which was automatically added when the project was created) by src/packages
Run unit tests
- setup.py with "test" command
- open "Run configurations" dialog
- double-click "Python Run" to create a new configuration
- give it a name, e.g. "mkroesti tests (setup.py)"
- select project, e.g. mkroesti
- select main module, e.g. browse for "setup.py" (will result in something like "${workspace_loc:mkroesti/setup.py}")
- on the "Arguments" tab, set the program arguments to "test"
- run all tests in a directory
- the easiest way is to let PyDev create the run configuration for you
- right-click on the folder that contains your tests and select "Run as... -> Python unittest"
- the resulting run configuration is configured as follows
- name = <projectname> tests (e.g. "mkroesti tests")
- project = <project> (e.g. mkroesti)
- main module = <folder-with-tests> (e.g. ${workspace_loc:mkroesti/src/packages/tests})
- on the "Arguments" tab, the working directory is set to <folder-with-tests> (e.g. ${workspace_loc:mkroesti/src/packages})
Python Package Index (PyPI)
Website: http://pypi.python.org/
How to submit a package to PyPI:
- You need to register a user account before you can submit any packages.
- Package submission works in one of three ways
- ./setup.py register
- Submit the file PKG-INFO that is generated by ./setup.py sdist (the file can be found inside the generated tarball)
- Manually enter package information on the submission page
- I found that setup.py's register command works well, although Python 2.5 choked on my name because it contains a Unicode character; I had to invoke setup.py with a Python 3 interpreter to make it work
- Multiple submissions for a package
- The latest submission will overwrite the previous submissions for the same version
- The latest submission will "hide" all previous submissions for other versions, i.e. listings and searches will find only the version of the latest submission. Besides the obvious intention of displaying only the newest version of a package, this feature is also useful if a submission has been made for a wrong version: simply fix the version number in setup.py and re-submit the package
- The package admin web interface can be used to "un-hide" a hidden submission (I have not investigated how this works)
Further references:
- Dive Into Python has a nice overview about packaging Python software
- The Cheese Shop Tutorial
- The list of trove classifiers
Software
Interesting software related to Python
- py2app
- Convert Python scripts into standalone Mac OS X applications