In this chapter, we will demonstrate path finding on the example of metabolic networks. We will work on a network assembled from all metabolic pathways annotated for the yeast S. cerevisiae in BioCyc (Release 10.6) [3]. We will also show the influence of the weighting scheme on path finding results.
The yeast network constructed from BioCyc data consists of 1,185 nodes and 2,656 edges. It has been obtained by unifying 171 metabolic pathways. Note that this network is bipartite, which means that it is made up of two different node types: reactions and compounds. An edge never connects two nodes of the same type. For the tutorial, we choose to represent the metabolic data as undirected network. Note that higher accuracies can be achieved by representing metabolic data by directed networks that contain for each reaction its direct and reverse direction, which are treated as mutually exclusive. See the advanced options of the Pathfinder tool for mutual exclusion of reactions in directed metabolic networks.
We will recover the heme biosynthesis II pathway given its start and end compound, namely glycine and protoheme. First, we will use the "degree" weighting scheme, which penalizes hub nodes. Second, we will infer the path using the "unit" weighting scheme and compare the results.
In the right panel, you should now see a form entitled ``Pathfinder''.
The form is now filled with the BioCyc demo network, and the parameters have been set up to their appropriate value for the demonstration. At the top of the form, you can read some information about the goal of the demo, and the source of the data.
The computation should take no more than two minutes. When it is finished, a link to the results should appear.
It lists a table of all paths found for the requested rank number (5 by default). You can also specify another type of output, for instance a network made up of all paths found. Vary the parameter Output type for this.
To see how results change with modified weight, you can repeat steps 1 and 2. Before clicking on GO, choose ``unit weight'' as Weighting scheme and set the Rank to 1. Continue as described above. You will obtain another paths table than before.
This section assumes that you have installed the RSAT/NeAT command line tools.
You can find the demo network Scer_biocyc.tab in $RSAT/public_html/demo_files.
Type the following command to enumerate paths up to the 5th rank in the weighted network:
java -Xmx800m graphtools.algorithms.Pathfinder -g Scer_biocyc.tab -f tab -s gly -t protoheme -y con
To find paths in the unweighted network, type:
java -Xmx800m graphtools.algorithms.Pathfinder -g Scer_biocyc.tab -f tab -s gly -t protoheme -y unit -r 1
First, we run Pathfinder with degree weighting scheme, which is the default weighting scheme of the demo. This weighting scheme sets the weights of compound nodes to their degree and of reaction nodes to one. The first ranked path obtained should look like this:
GLY 5-AMINOLEVULINIC-ACID-SYNTHASE-RXN 5-AMINO-LEVULINATE PORPHOBILSYNTH-RXN PORPHOBILINOGEN OHMETHYLBILANESYN-RXN HYDROXYMETHYLBILANE UROGENIIISYN-RXN UROPORPHYRINOGEN-III UROGENDECARBOX-RXN COPROPORPHYRINOGEN_III RXN0-1461 PROTOPORPHYRINOGEN PROTOPORGENOXI-RXN PROTOPORPHYRIN_IX PROTOHEMEFERROCHELAT-RXN PROTOHEME
This path recovers very well the annotated heme biosynthesis II pathway.
GLY GLUTATHIONE-SYN-RXN ADP PEPDEPHOS-RXN PROTON PROTOHEMEFERROCHELAT-RXN PROTOHEME
This path deviates strongly from the heme biosynthesis II pathway annotated in BioCyc. It contains two hub nodes: ADP and PROTON.
To sum up: path finding can predict pathways with high accuracy if an appropriate weighting scheme is applied to the network of interest. Our metabolic example shows that the heme biosynthesis II pathway is accurately predicted when using a weighted network and not found at all when using an unweighted network. The take home message is that in order to use Pathfinder on biological networks, weights have to be carefully adjusted.
Make sure that your start and end nodes are present in your network of interest. If no path could be found, none of the end nodes is reachable from the start nodes, thus no path exists. For big graphs and long waiting time, there is the possibility that the pre-processing step of REA, namely to compute the shortest paths from the source to all nodes with Dijkstra, was not finished before the server timeout. In this case, a path might exist but could not be detected due to the timeout.
When searching for paths with the "unit" weighting scheme in large networks, there might be a large number of possible paths for each requested rank. Although REA has a memory-efficient way to store paths with pointers, there is a limit for the number of paths that can be hold in memory. Reduce the number of requested paths or the size of the graph or use another weighting scheme.