Contents

Introducing Graphs

Graph is a mathematical object. Well, it sounds too complicated... :)

Let us start from something simpler. Have a look at the following picture:

Here you see 7 circles depicting nodes (or vertices) linked by edges. That is all! The graph is just the heap of nodes and edges. The particular placement of rounds and lines is not important - lines could be drawn curving for example.

So the graph is completely specified by the following data:

the set of vertices (usually marked with some letters, numbers of names);
the set of edges each of which is described by the pair of connected nodes.

For example, the graph above could be specified as:

nodes:
A, B, C, D, E, F, G

edges:
A-B, B-C, C-D, D-E, E-B, B-F, F-G

Popular examples which could be represented by graphs are the:

subway map (or any other road map);
the map of Sates in USA (edges will mark that two states are neighbor, i.e. share the border);
computer network topology (for example the whole web with documents referencing each other);
class hierarchy in the sources of some computer program built upon OOP principles.

Optionally graph can have additional information, for example:

Edges could have "weights" (lengths of roads, time of signal propagation between routers etc.)
Vertices can have weights too (though this could be transformed into weights of the edges)
Edges could be "directed", like one-way road between cities

Terminology

edge is said to be incident to given vertex if it connects this vertex with some other;
path (or, rarely, route) is the chain of edges, for example D-C, C-B, B-F, F-G in the picture above is the path between vertices D and G - alternatively we can describe path by its vertices D C B G;
cycle is the path connecting the vertex to itself, like C D E B C in our example;
directed graph is one which contains at least one directed edge (undirected graph is usually represented as directed, having a pair of directed edges in two directions instead one undirected edge);
connected graph is one for which the path exists for any pair of vertices (i.e. it is not broken in several unconnected isles);
tree is a connected graph without cycles (you can usually draw it as a tree-like structure).

Graph representation in programming

Shortly speaking, there are many ways. A lot of ways. Let us regard only few most popular - keep in mind that based on these examples you can invent other variants using common data structures.

Incidence Matrix

For graph with N vertices we can use the matrix N * N. The j-th cell in the i-th row will contain a number - the weight of the edge from vertex i to vertex j. The only question is how to represent the absent edges - usually some special value (or simply extremely large weight) could be used. For example our graph will look like:

   A B C D E F G

A  - 1 - - - - -
B  1 - 1 - 1 1 -
C  - 1 - 1 - - -
D  - - 1 - 1 - -
E  - 1 - 1 - - -
F  - 1 - - - - 1
G  - - - - - 1 -

As you see, all edges have marks in two cells (from i to j and from j to i) because they are undirected. All weights are 1 for our sample graph.

This variant was popular with languages like C or Pascal not having more efficient data structures than arrays. Good feature is that the edge between given vertex could be found in one operation. However it is significantly space-consuming (the web graph could not be represented in such way for sure).

Incidence Lists

To save the space we can store for each vertex only list of its existent edges. These lists could be grouped into an array (so that we can found a list by the vertex number etc.) Here is the example for our graph:

A: B               g['a'] = [('b', 1)]
B: A, C, E, F      g['b'] = [('a', 1), ('c', 1), ('e', 1), ('f', 1)]
C: B, D            g['c'] = [('b', 1), ('d', 1)]
D: C, E            g['d'] = [('c', 1), ('e', 1)]
E: B, D            g['e'] = [('b', 1), ('d', 1)]
F: B, G            g['f'] = [('b', 1), ('g', 1)]
G: F               g['a'] = [('f', 1)]

On the left we used simplified notation, without edge weights (because our example do not have them - or rather they are all equal) - however in general case we need to store the list of edges, each of which contains both the target vertex and the weight - like it is shown on the right, in Python-like syntax.

You may note that if there are no duplicating edges, we can use dictionaries (mapping target vertex name to weight) instead of lists. However with dictionaries we can do better.

Using Dictionary

These specific data structures are called "dictionaries" in Python or "hash maps" in Java and C++. They are also known as "associative arrays" - in languages like PHP and JavaScript all arrays are associative.

They work like ordinary array, but can use strings and other (constant) objects for indexes. So we can, for example, create a dictionary of edges, using the names of pair of vertices as the key (or index) and the weight of the edge as the value. Here is an example with our graph:

g['ab'] = 1
g['ba'] = 1
g['bc'] = 1
g['be'] = 1
    ...
g['gf'] = 1

This approach is really a space-efficient version of the incidence matrix.