TheoryAndAlgorithms

From HerzbubeWiki
Jump to: navigation, search

This page contains notes about algorithms, math theories, and similar abstract topics. I don't have a bachelor or master degree in computer science, but the stuff on this page is what I imagine would be on the curriculum.


Big O notation

References

This section paraphrases from the Big O notation page on Wikipedia.


Abstract

I'm interested in the application of the Big O notation in computer science:

In computer science, big O notation is used to classify algorithms according to how their running time or space requirements grow as the input size grows.

When we write about the time growth of a function or algorithm, and we write this:

f(x) = O(g(x)) as x → ∞

we actually say this:

  • The function/algorithm is f(x)
  • The input argument is "x"
  • The input argument "x" goes towards infinity
  • The time it takes for the algorithm to complete is determined by the function g(x)


The shorthand notation for the above is this:

O(g(x))


The letter "O" is used because the growth rate of a function is also referred to as the order of the function.


So how do we determine g(x)? A function/algorithm usually consists of several terms. Because "x" goes towards infinity, the contribution of the terms that grow "most quickly" will eventually make the other terms irrelevant. As a result, the following simplification rules can be applied:

  1. If f(x) is a sum of several terms, if there is one with the largest growth rate, it can be kept, and all others omitted.
  2. If f(x) is a product of several factors, any constants (i.e. terms in the product that do not depend on "x") can be omitted.


Example

Let's look at this example:

f(x) = 6x4 − 2x3 + 5
  • The function is the sum of three terms: 6x4, −2x3, and 5.
  • Of these three terms, the one with the highest growth rate is the one with the largest exponent as a function of x, namely 6x4.
  • So we can apply rule #1 and omit everything except 6x4.
  • The remaining term 6x4 is the product 6 and x4
  • The first factor, 6, is a constant that does not depend on x.
  • So we can apply rule #2 and omit everything except x4.


So the final result for g(x), after applying the simplification rules, is "x4", or

f(x) = O(x4)

which is pronounced "big-oh of (x4)".


Orders of common functions

Orders of common functions
Notation Name Example/Remark
O(1) Constant The function always takes the same time to execute, regardless of the input size. Example: Getting a value from a constant-sized lookup table - because the lookup table always has the same size the input size does not change.
O(log n) Logarithmic Example: Finding an item in a sorted list or array using a binary search.
O(n) Linear Execution time grows linearly (as opposed to exponentially) with the input size. For instance, if the input size grows by a factor of 2, execution time also grows by a factor of 2. Example: Finding an item in an unsorted list or array, where the input size is the list/array size.
O(n2) Quadratic Example: Bubble sort.
O(cn) Exponential Example: Finding the (exact) solution to the travelling salesman problem.


Graph theory

References


Glossary

Bipartite graph 
A graph type. A graph that consists of two distinct sets of vertices where every vertex is connected to all the vertices in the opposite set, but none of the vertices in its own set. Kxy denotes the bipartite graph with two sets of size x and y.
Complete graph 
A graph type. A graph is said to be complete if every vertex is connected to every other vertex. K3 denotes the complete graph with 3 vertices.
Cycle 
A graph type. A cycle is a graph which consists of a single "ring" of vertices.
Degree 
A vertex property. The degree of a vertex is the number of edges which meet at that vertex.
Directed graph 
A graph type. A graph in which the edges have a direction.
Disconnected graph 
A graph type. A graph that consists of distinct groups of vertices that are not connected by edges.
Edge 
A graph element. A line in a graph. The symbol used is E.
Euler's equation
F + V = E + 1.
Face 
When drawing a planar graph, a face is an area delimited by the edges of the graph. The symbol used is F.
Hamiltonian cycle 
A cycle that visits every vertex in a graph exactly once. A complete graph Kn contains n! Hamiltonian cycles.
Loop 
An edge that connects a vertex to itself.
Order 
A graph property. The order of a graph is its number of vertices.
Planar graph 
A graph type. A planar graph is a graph that can be drawn so that its edges do not overlap.
Vertex (pl. vertices) 
A graph element. A point in a graph. The symbol used is V.
Weighted graph 
A graph type. A weighted graph is a graph in which a number (the weight) is assigned to each of the graph's edges.


Basics

Graph theory is the study of graphs and their properties.

A graph consists of

  • Vertices: The points in the graph. The symbol used for "vertex" is V.
  • Edges: The lines connecting the points. The symbol used for "edge" is E.

When a graph is rendered visually, it is usually depicted with circles (representing the vertices) and lines (representing the edges). For specialized graph types the layout of vertices and edges can be important.


Graph properties

The order of a graph is its number of vertices.

The degree of a vertex in a graph is the number of edges which meet at that vertex.


Graph types

In a directed graph the edges have a direction.

A disconnected graph consists of distinct groups of vertices that are not connected by edges.

A cycle is a graph which consists of a single "ring" of vertices, i.e. the vertices are arranged sequentially and the last vertex is connected to the first vertex. A cycle has the same number of vertices and edges.

A complete graph is one where every vertex is connected to every other vertex. A complete graph with n vertices is written as Kn. A complete graph with n vertices has n * (n - 1) / 2 edges (handshake problem).

A bipartite graph is one that consists of two distinct sets of vertices where every vertex is connected to all the vertices in the opposite set, but none of the vertices in its own set (dating problem). A bipartite graph with two sets of size x and y is written as Kxy. A bipartite graph with two sets of size x and y has x * y edges.

A planar graph is a graph that can be drawn so that its edges do not overlap. The areas created by drawing a planar graph are called faces (symbol F). For complete graphs, only graphs up to K4 are planar, K5 and larger are not planar. The number of edges is always one less than the number of faces plus the number of vertices - this is called "Euler's equation": F + V = E + 1.

A weighted graph is a graph in which a number (the weight) is assigned to each of the graph's edges. The weight might represent a cost of some sort, or the length of a route segment, etc.


The "Bridges of Königsberg" problem

The "The Bridges of Königsberg" problem is about finding a route, or path, that uses every edge in a graph exactly once, possibly but not necessarily ending at the same vertex that was the start of the route/path.

To find out quickly whether or not there is a solution for any given graph: The graph must have no more than two vertices whose degree (= number of edges) is odd-numbered.


Map colouring

The "map colouring" problem examines the question: What is the maximum number of colours needed to fill areas on a map (e.g. countries) so that no two adjacent areas have the same colour?

The map can be converted a graph like this:

  • Areas on the map are converted to vertices
  • For areas that are adjacent on the map the corresponding vertices in the graph are connected to each other by an edge.

For maps on a flat plane or a sphere, where all countries consist of a single area, only four colours are needed (the four colour theorem, finally proven in 1976). More colours are needed if countries consist of multiple disconnected areas (enclaves/exclaves, or empires), or if the map is not on a flat plane or sphere. For instance, seven colours are needed for maps on a torus (doughnut shape).


Hamiltonian cycles

Here we try to find a route, or path, that visits every vertex in a graph exactly once (cf. the "Bridges of Königsberg" problem where the problem is to find a route, or path, that uses every edge). This is a so-called Hamiltonian cycle (cf. cycle).

In any complete graph Kn, the number of Hamiltonian cycles is n!.

If a graph is not complete (i.e. some of the graph's vertices have no direct connection) the number of Hamiltonian cycles cannot be "predicted" by a formula - in fact there may be no Hamiltonian cycle at all.


The "Travelling salesman" problem

The "Travelling salesman" problem (TSP) tries to find the shortest route, or path, that visits every vertex in a graph exactly once. On the Wikipedia page the requirement is added that the path must return to the origin vertex. The graph for which a TSP is calculated usually is a weighted graph where the weight indicates the length of the path segment. If the graph is unweighted, one can assume a weight of 1.

There are several algorithms to solve the TSP, or at least approximate a solution:

  • Brute force. This quickly becomes impractical because the number of calculations grows exponentially (O(n!)).
  • Greedy algorithm / Nearest Neighbour algorithm. Start at a random vertex, then always go the closest vertex that hasn't been visited before, until all vertices have been visited. Apparently it can be shown that on average paths found using this algorithm are 25% longer than the shortest possible path.
  • 2-Opt algorithm. Start with a random possible path, then repeatedly pick two edges and swap them around if that would reduce the length of the path, until the length can't be reduced any further by swapping any pairs of edges. TODO: Find a better description, and also find out whether this is an exact solution.
  • Ant Colony System (ACS) algorithms. These types of algorithms try to replicate ants' behaviour when they scout for food. Ants that find something return to the colony and leave behind a trail of pheromones, which attracts other ants that follow and strengthen the trail with their own pheromones. Over time the pheromones evaporate, thus a short path is favoured over a long path because it takes ants longer to travel along the long path and pheromones have more time to evaporate.


Shortest path problem

The shortest path problem is the problem of finding the shortest route, or path, between two given vertices in a graph. As in the "Travelling salesman" problem, the graph for which the shortest path is calculated is a weighted graph. Unlike the TSP, though, the shortest path is not required to visit all vertices of the graph.