Cytoscape Plugin Tutorial

Retrieve the latest Version

  1. Download and install Cytoscape. Make sure you have Java 1.6 installed and configured.
  2. If you prefer to install the plugins manually rather than from within Cytoscape, you can download them here: BLAST2SimGraph, TransClust, ClusterExplorer.

Blast2SimilarityGraph

Workflow 1 - Create a similarity network using the Blast2SimilarityGraph plugin.

The sample network consists of 866 proteins with known family assignments (a manually curated gold standard data set from Brown et al.). The proteins were BLASTed all vs. all, and the similarities between the proteins are represented by the edges in the similarity network. The (super)family assignments will later serve as the gold standard. Please see Brown et al. for more details.

Note: The BLAST results file has to be in the tab-delimited M8 format: run the all-vs-all BLAST with the -m 8 option.
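
To check a results file before importing it, the M8 layout can be read with a few lines of Python. This is only a minimal sketch: it assumes the standard twelve tab-separated columns of BLAST tabular output, and the function name `read_m8`, the column labels and the two sample hits are made up for illustration. It is not part of the plugin.

```python
import csv
import io

# Column order of BLAST tabular (M8) output; the labels are illustrative.
M8_COLUMNS = [
    "query", "subject", "identity", "length", "mismatches", "gap_opens",
    "q_start", "q_end", "s_start", "s_end", "evalue", "bit_score",
]

def read_m8(handle, max_evalue=100.0):
    """Yield one dict per BLAST hit whose E-value is at most max_evalue."""
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and '#' comment lines, should the file contain any.
        if not row or row[0].startswith("#"):
            continue
        hit = dict(zip(M8_COLUMNS, row))
        hit["evalue"] = float(hit["evalue"])
        hit["bit_score"] = float(hit["bit_score"])
        if hit["evalue"] <= max_evalue:
            yield hit

# Two made-up hits, for illustration only.
sample = (
    "protA\tprotB\t45.20\t210\t98\t5\t1\t205\t3\t208\t1e-40\t160\n"
    "protA\tprotC\t28.10\t90\t60\t2\t10\t95\t4\t92\t250\t22.3\n"
)
hits = list(read_m8(io.StringIO(sample)))
```

With the cutoff of 100 used in this example, only the first of the two sample hits is kept. To read a real file, pass an open file handle instead of the `io.StringIO` object.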

  1. Install Blast2SimilarityGraph via the Cytoscape Plugins Manager (Cytoscape Plugins Menu -> Manage Plugins -> Available for Install -> Analysis).
  2. After installing, start the plugin from the Cytoscape Plugins Menu (Cytoscape Plugins Menu -> Blast2SimilarityGraph -> Start Blast2SimilarityGraph GUI).
  3. The Blast2SimilarityGraph GUI is opened in a panel on the left side of the Cytoscape Desktop.
  4. Download the sample BLAST and FASTA files and save the files to disk.
  5. Import the BLAST and FASTA files via clicking on the 'Browse...' buttons in the Blast2SimilarityGraph GUI.
  6. Enter the BLAST e-value cutoff, which is 100 for this example.
  7. Choose "Sum of all Hits" as Similarity Function and set the Coverage Factor to 15, as suggested by Wittkop et al. 2007 for protein sequences. Keep the default values for all other settings.

    For a detailed description of the different BLAST-based cost models, refer to Wittkop et al. 2007. BeH uses the best BLAST hit only, while SoH (sum of all hits) incorporates all BLAST hits. If you are unsure which cost model to use, choose the default option BeH. It is faster to compute and usually provides good results.
  8. Press the 'Start Blast2SimGraph' button.
  9. The network is automatically created with nodes representing the input proteins and edges representing the similarities between them.
  10. Click on the nodes to see the Protein IDs in the 'Node Attribute Browser' at the bottom of the Cytoscape Desktop. Click on the edges to see the similarity values in the 'Edge Attribute Browser'.

TransClust

Workflow 2 - Clustering the created similarity network using the TransClust plugin.

The similarity network produced in Workflow 1 can be clustered to re-group the proteins according to their protein family based on the similarity values.

  1. Install TransClust via the Cytoscape Plugins Manager (Cytoscape Plugins Menu -> Manage Plugins -> Available for Install -> Analysis).
  2. After installing, start the plugin from the Cytoscape Plugins Menu (Cytoscape Plugins Menu -> TransClust -> Start TransClust).
  3. The TransClust GUI is opened in a panel on the left side of the Cytoscape Desktop.
  4. Select the Edge Weight Attribute, which is called 'Blast2SimGraph_sim'. This attribute will be used for the clustering.
  5. Set the Threshold to 70.
    Note: Higher values will result in many small clusters, while clustering on lower threshold values will result in fewer but larger clusters.
  6. Keep the default values for the other settings.

    On fast computers, you may open 'Advanced Settings' and set the 'Max. Subcluster Size' to 50 and the 'Max. Time (secs)' to 2. These options adjust the size of the subproblems that are solved exactly. The higher the values, the longer the running time but also the higher the accuracy.

    Depending on the specific problem, and especially on slower computers, enable 'Merge similar nodes into one?' and set the respective threshold (upper bound). Elements (here: proteins) with a similarity exceeding the upper bound will be virtually merged into one object during clustering, which may decrease the running time drastically. For protein sequence clustering using BLAST as similarity function, it might make sense to set the threshold to 323, since this is the highest reachable similarity (corresponding to a BLAST E-value of 0.0).
  7. Press the 'Run TransClust' button. A window will pop up as soon as the clustering process is finished, and the network will be laid out according to the clustering result.
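
The effect of the threshold described in step 5 can be illustrated with a much simpler technique than the one TransClust uses. The sketch below is not the TransClust algorithm: it only drops all edges at or below the threshold and reports the connected components that remain. The function name and the toy network are hypothetical.

```python
def threshold_components(nodes, weighted_edges, threshold):
    """Group nodes into connected components using only edges above threshold."""
    parent = {n: n for n in nodes}

    def find(n):
        # Follow parent links to the representative, compressing the path.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for a, b, weight in weighted_edges:
        if weight > threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for n in nodes:
        clusters.setdefault(find(n), set()).add(n)
    # Largest clusters first; ties are ordered by member names.
    return sorted(clusters.values(), key=lambda c: (-len(c), sorted(c)))

# Toy similarity network, for illustration only.
nodes = ["p1", "p2", "p3", "p4", "p5"]
edges = [("p1", "p2", 120.0), ("p2", "p3", 65.0),
         ("p3", "p4", 90.0), ("p4", "p5", 30.0)]

low = threshold_components(nodes, edges, 20)   # one cluster with all five nodes
high = threshold_components(nodes, edges, 70)  # {p1, p2}, {p3, p4}, {p5}
```

Raising the threshold from 20 to 70 splits the single large cluster into three smaller ones, which is the same qualitative behaviour as described in the note above.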

Workflow 3 - Determining the optimal clustering threshold.

The optimal threshold for clustering can be identified with TransClust by trying different thresholds and comparing the results to a gold standard.

  1. Download the Gold Standard file and save it to disk. See Brown et al. for more details on the gold standard.
  2. Import the gold standard as node attribute (Cytoscape File Menu -> Import -> Attributes from Table).
    Select the gold standard file as input file.
    Rename 'Column 2' via right-click to 'superfamily', and 'Column 3' to 'family'.
  3. For this application example, select the attribute 'family' as the Gold Standard Attribute in the TransClust GUI (bottom).
  4. Set the minimal threshold to 20 and the maximal threshold to 100.
  5. Set the stepsize to 10.
  6. Make sure the 'Edge Weight Attribute' and the other values are still set in the TransClust Settings as described in Workflow 2.
  7. Press the 'Run Comparison' button. TransClust will compute clusterings for the different thresholds and compare them to the gold standard. A window will pop up as soon as the clustering process is finished and the results will be shown in a new panel on the right side of the Cytoscape Desktop.
  8. The best threshold is the one where the F-measures are highest. Here, the optimal value seems to lie somewhere between 50 and 60. Choose, e.g., 55 as threshold, enter it in the TransClust Threshold field and press the 'Run TransClust' button. The clustering will be performed according to the newly selected threshold.

    F-Measure: Values near 0 indicate a "bad" clustering, values near 1 a "good" match with the gold standard. See Paccanaro et al. 2006 for more details about F-measure II. F-measure I is simply the mean of precision and recall.

    Note: Another way to find a reasonable density parameter is to use the cluster size distribution of the resulting clusterings. This can be done for instance with the ClusterExplorer plug-in.
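
To make the comparison against a gold standard more concrete, here is a small sketch of one common way to score a clustering: a best-match F-measure weighted by class size. It is an assumption that this resembles the measures reported by the plugin; refer to Paccanaro et al. 2006 for the exact definition used there. All names and the toy data are hypothetical.

```python
from collections import Counter

def f_measure(gold, predicted):
    """Class-size-weighted best-match F-measure between two labelings.

    gold and predicted map each element to a label. For every gold class,
    the best F value over all predicted clusters is taken; these values are
    then averaged with weights proportional to the class sizes.
    """
    elements = list(gold)
    gold_sizes = Counter(gold[e] for e in elements)
    pred_sizes = Counter(predicted[e] for e in elements)
    overlap = Counter((gold[e], predicted[e]) for e in elements)

    total = 0.0
    for g, g_size in gold_sizes.items():
        best = 0.0
        for p, p_size in pred_sizes.items():
            common = overlap[(g, p)]
            if common:
                precision = common / p_size
                recall = common / g_size
                best = max(best, 2 * precision * recall / (precision + recall))
        total += g_size * best
    return total / len(elements)

# Toy gold standard and two candidate clusterings, for illustration only.
gold = {"p1": "famA", "p2": "famA", "p3": "famA", "p4": "famB", "p5": "famB"}
perfect = {"p1": 1, "p2": 1, "p3": 1, "p4": 2, "p5": 2}
merged = {e: 1 for e in gold}  # everything in one cluster

score_perfect = f_measure(gold, perfect)
score_merged = f_measure(gold, merged)

# Thresholds visited by the sweep in this workflow: 20, 30, ..., 100.
thresholds = list(range(20, 101, 10))
```

A clustering that reproduces the families exactly scores 1, while merging everything into one cluster scores lower. In a sweep, such a score would be computed once per threshold, and the threshold with the highest value would be kept.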

ClusterExplorer

Workflow 4 - Analysis of cluster results using the ClusterExplorer plugin.

  1. Install ClusterExplorer via the Cytoscape Plugins Manager (Cytoscape Plugins Menu -> Manage Plugins -> Available for Install -> Analysis).
  2. After installing, start the plugin from the Cytoscape Plugins Menu (Cytoscape Plugins Menu -> ClusterExplorer -> Start ClusterExplorer GUI).
  3. The ClusterExplorer GUI is opened in a panel on the left side of the Cytoscape Desktop.

Workflow 5 - Element and Cluster Analysis

  1. Select the Edge Weight Attribute, which is 'Blast2SimGraph_sim' and the Cluster ID Attribute, which is called 'TransClust'.
  2. Choose 'Similarity to all other clusters' in the 'Element Analysis' dropdown menu. This would reflect the biological question "If my protein of interest was not in the assigned family (cluster), what would be the second best family (cluster) it would fit in?".
  3. Press 'Analyse Graph'. A results table will open on the right side of the Cytoscape Desktop, listing all other clusters in the graph and the mean similarities (weights) of the selected element (protein) to these clusters.
  4. Click on a row in the results table to highlight the respective cluster in the network.
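
The 'Similarity to all other clusters' analysis can be sketched as follows. This is a simplified illustration, not the plugin's implementation: it assumes that pairs without an edge count as similarity 0, and the function name and toy data are hypothetical.

```python
from collections import defaultdict

def mean_similarity_to_other_clusters(element, cluster_of, similarity):
    """Return {cluster_id: mean similarity} for all clusters except the element's own.

    similarity maps frozenset({a, b}) to an edge weight.
    Assumption: pairs without an edge contribute a similarity of 0.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    own = cluster_of[element]
    for other, cluster in cluster_of.items():
        if other == element or cluster == own:
            continue
        sums[cluster] += similarity.get(frozenset((element, other)), 0.0)
        counts[cluster] += 1
    return {c: sums[c] / counts[c] for c in sums}

# Toy clustering and similarities, for illustration only.
cluster_of = {"p1": 1, "p2": 1, "p3": 2, "p4": 2, "p5": 3}
similarity = {
    frozenset(("p1", "p2")): 150.0,
    frozenset(("p1", "p3")): 40.0,
    frozenset(("p1", "p4")): 20.0,
    frozenset(("p1", "p5")): 10.0,
}

result = mean_similarity_to_other_clusters("p1", cluster_of, similarity)
second_best = max(result, key=result.get)
```

For protein p1, cluster 2 has the highest mean similarity among the other clusters, so it would be the "second best" family in the sense of the question in step 2.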

Workflow 6 - Plot Histograms

  1. Open the plot histogram panel by clicking on the plus sign.
  2. Select the Edge Weight Attribute, which is 'Blast2SimGraph_sim' and the Cluster ID Attribute, which is called 'TransClust'.
  3. Select the 'Inter/Intra Edge Weight Histogram' box.
  4. Press 'Plot Histogram'. A new window will open displaying the histogram of the distributions. Increase the bucket size to 100 in the plot (to get a finer-grained plot) and press 'Update'.

    Note that the intra-cluster-similarity distribution is well separated from the inter-cluster-similarity distribution (red/blue) at around 55 (our threshold).
  5. Now close the distribution window and go back to the plug-in. Select our gold standard 'family' as Cluster ID Attribute and press 'Plot Histogram'.

    Note the overlap of the intra-cluster-similarity distribution with the inter-cluster-similarity distribution (red/blue) at around 55. This indicates that looking for the clustering threshold in the area of 50 and 60 was reasonable.

    In fact, if a gold standard is available, the first step should be to investigate these distributions. They indicate a reasonable range for the optimal threshold determination described in Workflow 3.
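
The two distributions compared in this workflow can be reproduced outside of Cytoscape with a short script. The following is a minimal sketch under the assumption that an edge is "intra" when both end nodes carry the same cluster ID and "inter" otherwise; the function names, the bucket width and the toy data are hypothetical.

```python
from collections import Counter

def split_edge_weights(weighted_edges, cluster_of):
    """Separate edge weights into intra-cluster and inter-cluster lists."""
    intra, inter = [], []
    for a, b, weight in weighted_edges:
        (intra if cluster_of[a] == cluster_of[b] else inter).append(weight)
    return intra, inter

def histogram(values, bucket_width):
    """Count values per bucket; each bucket is keyed by its lower bound."""
    return Counter(int(v // bucket_width) * bucket_width for v in values)

# Toy network and clustering, for illustration only.
edges = [("p1", "p2", 120.0), ("p2", "p3", 65.0), ("p3", "p4", 90.0),
         ("p4", "p5", 30.0), ("p1", "p5", 12.0)]
cluster_of = {"p1": 1, "p2": 1, "p3": 2, "p4": 2, "p5": 3}

intra, inter = split_edge_weights(edges, cluster_of)
intra_hist = histogram(intra, 50)
inter_hist = histogram(inter, 50)
```

If the clustering is good, most intra-cluster weights fall into higher buckets than the inter-cluster weights, and the region where the two histograms meet suggests a range for the threshold.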

Workflow 7 - Cluster Comparison

  1. Open the cluster comparison panel by clicking on the plus sign.
  2. Select the Gold Standard Attribute 'family' and the Cluster ID Attribute 'TransClust'.
  3. Press 'Run Comparison'. The results of the comparison are shown in a table on the right side of the Cytoscape Desktop.

Workflow 8 - Log Transformation of Edge Weights

This feature of ClusterExplorer is for users who did not use the BLAST2SimGraph plug-in to create a similarity network but imported a network with unprocessed BLAST E-values as edge attributes. Handling integer similarities is much easier than clustering with log-scaled similarities generated by BLAST.

  1. Open the log transformation panel by clicking on the plus sign.
  2. Select the Edge Weight Attribute. For this example, choose 'Blast2SimGraph_sim'.
  3. Keep the default values for the other settings in the panel.
  4. Press 'Run Log Transformation'.
  5. Open the Edge Attribute Browser of Cytoscape and select the 'Log Edge Weight' attribute. Click on an edge to see the log-transformed similarity value.
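
The exact transformation and default settings of the panel are not described in this tutorial. As a rough illustration only, the sketch below assumes the common convention of mapping an E-value to its negative decadic logarithm, and it assumes a cap of 323 for an E-value of 0.0, in line with the highest reachable similarity mentioned in Workflow 2. The function name and the constant are hypothetical.

```python
import math

# Assumed cap for an E-value of 0.0 (the value 323 is taken from Workflow 2).
MAX_SIMILARITY = 323.0

def log_transform(evalue, cap=MAX_SIMILARITY):
    """Map a BLAST E-value to -log10(E-value), capped at `cap`."""
    if evalue <= 0.0:
        # log10(0) is undefined, so fall back to the cap.
        return cap
    return min(cap, -math.log10(evalue))

strong_hit = log_transform(1e-50)  # close to 50
perfect_hit = log_transform(0.0)   # capped at 323
weak_hit = log_transform(100.0)    # negative, since the E-value exceeds 1
```

Under this assumption, smaller E-values give larger similarities, and E-values above 1 give negative ones. The cap is consistent with double-precision arithmetic, where the smallest positive number is about 4.9e-324, so the negative logarithm of any stored non-zero E-value stays just above 323 at most.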