Suppose you have a directed graph G with a source node s and a sink node t. You wish to count the number of different s-t paths (two paths are considered different unless they are the exact same sequence of edges). G may have cycles, so this number may be infinite; you wish to detect this case.
If you suppose G is acyclic, a natural approach might be to do a depth first search from s. Each time you reach t, you have found another path.
class Node
  # Enumerate every path from this node, bumping the class-wide
  # counter each time the target is reached.
  def dfs_count_paths(target)
    @@count += 1 if self == target
    @visited = true
    for child in @children
      child.dfs_count_paths(target)
    end
    @visited = false
  end
end
Besides never terminating when the graph has cycles, this has the potential to be very slow, because it enumerates every single path in the graph. For example, if w is another node, there could be 100 s->w paths and 100 w->t paths; all 10,000 combinations of these are valid s->t paths. Enumerating them all is expensive. However, if we knew both the number of s->w paths and the number of w->t paths, we could compute their product directly. With some thought, this idea leads to a better approach.

In a directed acyclic graph, we can order the vertices so that edges only point forward. In other words, if u is a descendant of v, then u will come after v in the sorted list of nodes. This is called a topological sort. Using this ordering, call s vertex 1, the next vertex 2, and so on, up to t, which we may call vertex n. The number of paths from 1 to 2 is just the number of edges from 1 to 2. The number of paths from 1 to 3 counts the (direct) edges from 1 to 3, as well as the paths that use 2 as an intermediate vertex. More generally, let d(j) be the number of paths from 1 to j; then we have that:

    d(1) = 1
    d(j) = sum over i < j of e(i, j) * d(i)

where e(i, j) is the number of edges from i to j.
This gives a dynamic programming algorithm: for j from 2 to n, compute d(j). Then d(n) is the number of paths from 1 to n. This is much more efficient: computing d(j) takes time proportional to the in-degree of node j, so over the whole graph the algorithm takes O(V + E) time, where V and E are the number of vertices and edges.
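To make the recurrence concrete, here is a minimal, self-contained sketch in Ruby. It assumes the vertices are already numbered in topological order (0 = s, n-1 = t) and represents the graph as a hash mapping each vertex to its predecessors and their edge multiplicities; both this representation and the name count_paths_dag are illustrative, separate from the Node/Graph classes used later in the post.

```ruby
# Count s-t paths in a DAG whose vertices are already numbered in
# topological order: vertex 0 is s, vertex n-1 is t.
# edges[j] is a hash mapping each predecessor i of j to the number of
# parallel edges from i to j.
def count_paths_dag(edges, n)
  d = Array.new(n, 0)
  d[0] = 1                         # one (empty) path from s to itself
  (1...n).each do |j|
    (edges[j] || {}).each do |i, multiplicity|
      d[j] += d[i] * multiplicity  # every path to i extends along each i->j edge
    end
  end
  d[n - 1]
end
```

For the diamond graph 0->1, 0->2, 1->3, 2->3, this returns 2, as expected.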
How do we topologically sort the graph? A depth-first search will visit all children of a node, and then finish visiting the node itself. This sounds a lot like the property we want for our topological sort: that the descendants of a node come after the node itself. We can express this as the following algorithm:
class Node
  # Return this node followed by all of its unvisited descendants, with
  # every node appearing before the nodes it points to.
  def topological_sort
    @visited = true
    sorted_nodes = []
    for child in @children
      if not child.visited
        sorted_nodes = child.topological_sort + sorted_nodes
      end
    end
    return [self] + sorted_nodes
  end
end
The combination of a topological sort with dynamic programming gives us a good algorithm for counting paths in an acyclic graph. What about when cycles are allowed? In the ordering produced by our depth-first search, every edge of an acyclic graph points forward. Suppose instead we find an edge from a node u back to some earlier node v. That can only happen if v was still being explored when the search reached u; in other words, there is a path from v down to u, and the edge from u to v completes a cycle. Such an edge is called a back edge, because it goes against the direction of the topological sort.
So, we may detect cycles by looking for back edges. However, such a cycle doesn't necessarily mean there are an infinite number of paths from s to t: the cycle may not lie on any s-t path at all, because some nodes are "dead ends" that cannot reach t. We only wish to consider cycles involving nodes that can reach t.
To detect this, we again turn to our trusty tool, the depth-first search. The nodes that can reach t are exactly the nodes we see when exploring backwards from t, against the direction of the edges. In other words, we will follow in-edges, not out-edges:
class Node
  # Walk backwards along in-edges from t, marking every node that can
  # reach the target.
  def prune_unreachable_nodes
    @visit = CAN_REACH_TARGET
    for parent in @parents
      if parent.visit == UNVISITED
        parent.prune_unreachable_nodes
      end
    end
  end
end
This marks only the nodes that can reach t; the rest cannot contribute to any s-t path and can be ignored.
Putting It Together
Now, to put it all together, the algorithm is as follows:
- Mark the nodes that can reach t using a DFS.
- Topologically sort these nodes.
- Starting from the position of node s in the topological sort, perform the dynamic programming algorithm to compute d for each node.
- If at any point a back edge is encountered, then there are an infinite number of paths.
- d(t) indicates the number of paths from s to t.
The following code gives the dynamic programming part of the approach:
class Graph
  def count_paths
    # Mark the nodes that can reach t; if s is not among them, there
    # are no s-t paths at all.
    @t.prune_unreachable_nodes
    if @s.visit != Node::CAN_REACH_TARGET
      return 0
    end

    # Topologically sort the nodes and record each node's position in
    # the ordering.
    sorted_nodes = @s.topological_sort
    start_pos = 0
    for i in 0...sorted_nodes.length
      sorted_nodes[i].position(i)
      if sorted_nodes[i] == @s
        start_pos = i
      end
    end

    # d(s) = 1: the empty path from s to itself.
    init = sorted_nodes[start_pos]
    init.paths(1)

    for i in (start_pos + 1)...sorted_nodes.length
      cur = sorted_nodes[i]

      # An edge from cur to a node at an earlier (or equal) position is
      # a back edge: the graph has a cycle on an s-t path.
      for child, count in cur.children
        if child.visit != Node::UNVISITED and child.position <= i
          return "INFINITE PATHS"
        end
      end

      # Count the paths from init to cur, and save that number in cur.
      paths = 0
      for prev, count in cur.parents
        # Parents outside the sorted range have no path count yet.
        paths += prev.paths * count if prev.paths
      end
      cur.paths(paths)
    end

    @t.paths
  end
end
I've been playing around with Twitter's Bootstrap library the last few days. I made a version of my site using it, which I think looks pretty nice. The design is minimal and appealing, and responds automatically to the browser and screen size. This means my site looks nice on mobile now, too.
Maybe I'll switch to that version of the site once I get social and music integrated.
The big limitation of my old webhost was the lack of control I had over the backend. It was great at php, and had some python support, but didn't even give me command-line access. Lately I've been playing with ruby on rails a lot, as well as thinking about things like programmatically generating pdf files, which I wouldn't be able to do with my old host. So I've switched to Slicehost. Now I get to run the entire backend myself, which meant making some choices.
First, I’m using nginx as the web server instead of apache. This is because the event-driven architecture of nginx is much better than apache at handling the massive traffic that my website gets. Ok, but seriously, nginx does do some things that make me like it.
Unlike Apache, it doesn’t try to do everything itself. It delegates dynamic page generation (that is, python, php, ruby, etc) to other processes over a socket or to a server listening locally on another port. For example, I get php support using FastCGI with the php5-fpm ubuntu package. php5-fpm listens on port 9000, and nginx forwards php requests (locally) to this server.
For python web services, I'm using uWSGI as a WSGI server. Again, nginx forwards python requests over a socket to uWSGI, which manages the python interpreter and responds to requests. This differs from other WSGI server implementations that double as web servers, and the separation pays off performance-wise, as this benchmark shows. Support for ruby and other web languages can be added similarly.
Another advantage of separating the webserver from the interpreter is that it allows the interpreter to be restarted (to roll out a new version of a script) without the webserver going down.
Configuring nginx to do this isn't hard: here's the relevant part from my nginx config:
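A representative sketch looks like the following; the domain, document root, location path, and socket path are placeholders, not my actual setup:

```nginx
server {
    listen 80;
    server_name example.com;            # placeholder domain
    root /var/www/example;              # placeholder document root

    # Forward PHP requests to php5-fpm listening on port 9000.
    location ~ \.php$ {
        include fastcgi_params;
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
        fastcgi_pass 127.0.0.1:9000;
    }

    # Forward requests for the Python app to uWSGI over a local socket.
    location /app {
        include uwsgi_params;
        uwsgi_pass unix:/tmp/uwsgi.sock;  # placeholder socket path
    }
}
```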
How do you remove the duplicates from an array of integers?
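One common answer, sketched here in Ruby (in practice the built-in Array#uniq does exactly this; the function name remove_duplicates is just for illustration):

```ruby
require 'set'

# Remove duplicates from an array of integers, preserving the order of
# first appearance. Runs in O(n) expected time using a Set of seen values.
def remove_duplicates(arr)
  seen = Set.new
  # Set#add? returns nil if the element was already present, so select
  # keeps only the first occurrence of each value.
  arr.select { |x| seen.add?(x) }
end

remove_duplicates([3, 1, 3, 2, 1])  # => [3, 1, 2]
```

If order doesn't matter and the values fit in a known range, sorting first or using a bit array are common alternatives.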
Let's say you have a group of people, or a network of computers, or the brain of an earthworm. Let's say a group of people: the city of Toronto. Some of the people in Toronto know each other, but most pairs of people do not. Within the city, there are groups of friends, and most of the people within these smaller groups know each other. In other words, people in Toronto cluster into smaller groups of inter-connected people. As an urban planner/sociologist/bored computer scientist, you want to know what these clusters are like. How many are there? How big are they? If you have all the people in Toronto, and know who knows who, you can search this information to find these clusters of dense connections.
But you're worried this approach won't work. First, Toronto is pretty big--it's hard to get that information from everyone, and hard to study it once you have it. (You could study a small part of Toronto, but you're worried that doing so will throw away long-range connections between people in different parts of the city, so it wouldn't be representative of the full city.) Second, you're not just interested in Toronto. You're interested in Boston, and New York, and in Europe. You're interested in networks of computers. So you decide not to work in Toronto; you decide to work in Proto-Toronto.
What's Proto-Toronto? Proto-Toronto is a lot like Toronto, except it's made of math. It has the same number of people, and some of the people know each other (but most don't). However, in Proto-Toronto, the only thing that determines who knows who is how close they live. People who live within a block of each other have a very high chance of knowing each other; people who live far apart probably do not. Now you ask: who are the groups of friends, the clusters, in Proto-Toronto? This you can answer! They can be big, but not too big--after all, not everyone can be friends. The bigger Proto-Toronto grows, the larger the clusters of people it can support.
At first you're happy, but it's late at night one night and you're having doubts. The people in Proto-Toronto are not like the people in Toronto. Why should we care about Proto-Toronto? It's not Toronto, after all. I'll tell you why. Because you can make the people in Proto-Toronto act like Torontonians much more easily than you can study Toronto itself. To figure out how a Proto-Torontonian acts, just take fifty or so real Torontonians, and figure out who they know. Take the average, and you have a Proto-Torontonian. Similarly, you can make Proto-Boston or Proto-New-York (or a proto-computer network). Even then, your Proto-villes aren't great--they aren't real cities. But they act kind of like real cities. Understanding what groups of friends look like in Proto-Toronto gives you some intuition about what groups of friends in Toronto look like.
This is a "human-readable" version of what I've been researching. Want more? Go read the real thing.