Investigating the molecular processes altered in lung cancer

We will use the software application PathVisio to visualize the lung cancer transcriptomics dataset and perform pathway statistics to find pathways that are up- or down-regulated in lung cancer.

You can use UM remote desktop to run PathVisio (it is already installed there!) OR install it on your own computer. Make sure you have Java 8 installed. Then follow the installation instructions.

Set up

  • Check the supporting video (data-download.mp4) on the student portal (Practical 1). On the remote desktop, save the pathway-analysis.zip file, which contains all files required for the practical, in the location shown in the video (C:/Users/Public/Public Downloads/); otherwise, you might run out of space. Be aware that the data will be deleted once you log out, so copy anything you want to keep to the I: drive.

  • The folder contains:

    • a subset of the human pathway collection from WikiPathways
    • the lung cancer gene expression data (lung-cancer-data.txt - as described in the lecture)
  • Download the human BridgeDb identifier mapping file and unzip it in the pathway-analysis folder.
  • Start PathVisio from the Start Menu.
  • Open the cell cycle pathway in PathVisio (File → Open → browse to the pathway-analysis folder and select the "cell-cycle.gpml" file)
  • Load the human identifier mapping database from BridgeDb (Data → Select Gene Database → Browse to Hs_Derby_Ensembl_91.bridge)

Question 1: Computer readable annotated pathway models
Click on the GSK3B gene in the top left. In the "Backpage" tab on the right side, you can find the annotation and cross references for the gene (provided by BridgeDb).
In the cross references, can you find the Ensembl, Entrez Gene and HGNC identifier for this gene?

Assignment 1: Data visualization in PathVisio

Step 1: Import the gene expression data in PathVisio

  • Go to Data → Import expression data
  • Select the lung-cancer-data.txt file as the input file. Then click "Next".
  • Make sure the correct data delimiter is set and you see the following preview. Then click "Next".

[Screenshot: preview of the data in the import dialog]
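If you are unsure which delimiter the file uses, you can also check it outside PathVisio. The following is a minimal Python sketch and not part of PathVisio; the function name `guess_delimiter` and the sample rows are made up for illustration, and the real file may look different.

```python
def guess_delimiter(lines, candidates=("\t", ",", ";")):
    """Return the candidate that occurs equally often on every non-empty line."""
    best = None
    best_columns = 1
    for delimiter in candidates:
        counts = {line.count(delimiter) for line in lines if line.strip()}
        # A plausible delimiter gives the same number of columns on each line.
        if len(counts) == 1:
            columns = counts.pop() + 1
            if columns > best_columns:
                best, best_columns = delimiter, columns
    return best


# Hypothetical rows: identifiers and values are invented for this example.
sample = [
    "GeneID\tlog2FC\tP.Value",
    "ID_A\t0.85\t0.003",
    "ID_B\t-1.40\t0.020",
]
print(repr(guess_delimiter(sample)))
```

To test the real file, pass its first few lines (without the trailing line breaks) instead of `sample`.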

Question 2: Gene identifier
The first column contains the identifier of the genes. From which of the three databases below are the identifiers in the dataset?
- Ensembl
- Entrez Gene
- OMIM
(Required for following steps!)

  • In the next dialog, you need to define the column containing the identifier and the database used. Select the "GeneID" column for the identifier. Based on the preview, which "Database" needs to be selected (see Question 2)?

[Screenshot: identifier column and database selection]

  • The data will now be imported. Before clicking "Finish", check how many rows were imported successfully and how many identifiers were not recognized.

Question 3: Data import
How many rows were imported successfully?
How many identifiers were not recognized?

The software has now created two files: lung-cancer-data.pgex and lung-cancer-data.pgex.xml. These files store all settings and visualizations you create. In the future, you can simply reload your dataset using "Data → Select Gene Expression Data" and browsing to the lung-cancer-data.pgex file.


Important! If the number of rows is the same as the number of identifiers not recognized, the data import was not done correctly; you probably did not select the correct database. Redo the import or ask one of the instructors for help.

After clicking "Finish", you should see a default visualization on the pathway. If all genes are gray, the data import was not successful: redo the import and make sure you select the correct database. Click on the GSK3B gene in the top left and check the "Data" tab on the right side; you should see that the GSK3B gene has a log2FC of -0.38.


Step 2: Data visualization

The default visualization is just a starting point. To explore your dataset in more detail, other visualization options are available that are more suitable for the data nodes (in this case gene nodes).

Question 4: Log2FC values
In the lecture, you got an explanation of the dataset. Can you describe in your own words what the log2FC means?
In this example dataset, we compare primary lung cancer with healthy lung tissue. What does a positive or negative log2FC mean?
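
As a reminder of the arithmetic only, the sketch below computes a log2 fold change from two made-up expression values. It assumes the common definition log2(condition / reference); check the lecture for how the values in this dataset were actually derived.

```python
import math


def log2_fold_change(condition, reference):
    """log2 of the ratio condition / reference (both values must be positive)."""
    return math.log2(condition / reference)


# Invented expression values, for illustration only.
print(log2_fold_change(400.0, 100.0))  # 2.0
print(log2_fold_change(100.0, 100.0))  # 0.0
print(log2_fold_change(25.0, 100.0))   # -2.0
```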

We will now create a visualization in PathVisio to visualize the log2FC as a gradient on the data nodes.

  • Go to Data → Visualization options
  • Create a new visualization named “log2FC visualization” by clicking on the following edit icon in the top right (→ New)

[Screenshot: edit icon in the Visualization options dialog]

  • Select the checkbox before "Text label"
  • Select the checkbox before "Expression as color" and then the "Basic" option
  • Select the checkbox before "log2FC" and define a new color set (click on the edit icon for color set)
  • Select "Gradient" and define a gradient from -2 over 0 to 2 (blue – white – red) and click Ok

[Screenshot: gradient settings of the color set]
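
To make the color rule concrete, here is a minimal Python sketch of a three-point linear gradient like the one defined in the steps above (-2 blue, 0 white, 2 red). It only illustrates the idea: it is not PathVisio's code, the function name `gradient_color` is hypothetical, and PathVisio's exact interpolation and handling of out-of-range values may differ.

```python
def gradient_color(value, low=-2.0, mid=0.0, high=2.0):
    """Map a log2FC value to an (R, G, B) tuple on a blue-white-red gradient."""
    blue, white, red = (0, 0, 255), (255, 255, 255), (255, 0, 0)
    # Assumption: values outside [low, high] get the color of the nearest end.
    value = max(low, min(high, value))
    if value <= mid:
        start, end, fraction = blue, white, (value - low) / (mid - low)
    else:
        start, end, fraction = white, red, (value - mid) / (high - mid)
    # Linear interpolation per color channel.
    return tuple(round(s + (e - s) * fraction) for s, e in zip(start, end))


for log2fc in (-2, -0.38, 0, 1, 3):
    print(log2fc, gradient_color(log2fc))
```

Values close to 0 come out nearly white, so only genes with a clear change stand out on the pathway.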

Question 5: Biological interpretation
As a whole, is the pathway more up- or down-regulated in lung cancer? Is this expected?
Make a screenshot of the pathway. What do the colors mean?


Assignment 2: Pathway statistics in PathVisio

Step 1: Find up-regulated pathways in lung cancer cells

We now know how the cell cycle pathway is altered in lung cancer but there could be other interesting pathways to look at.

How can we find those pathways? Using pathway statistics we can find those pathways that are more altered than expected (check lecture!).

  • Go to Data → Statistics
  • First, we want to find up-regulated pathways, so we need to define a criterion that selects all genes that are significantly up-regulated: [log2FC] > 1 AND [P.Value] < 0.05
  • Then, we select the directory that contains all human pathways available in WikiPathways. All these pathways will be tested and statistically evaluated if they are up-regulated in our dataset: click on browse and select the "wikipathways-human-pathway-collection" folder
  • Then, we click on "Calculate" and wait for the result table.
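
The criterion is a plain boolean filter over the data columns. The sketch below applies the same rule in Python to a few invented rows to show which genes would count as significantly up-regulated. It does not reproduce PathVisio's own expression parser, and the identifiers and values are hypothetical.

```python
# Invented rows that use the column names from the criterion.
rows = [
    {"GeneID": "ID_A", "log2FC": 1.8, "P.Value": 0.001},
    {"GeneID": "ID_B", "log2FC": 1.2, "P.Value": 0.200},   # not significant
    {"GeneID": "ID_C", "log2FC": 0.4, "P.Value": 0.010},   # change too small
    {"GeneID": "ID_D", "log2FC": -1.5, "P.Value": 0.002},  # down, not up
]


def is_up_regulated(row, min_log2fc=1.0, max_p=0.05):
    """Python equivalent of: [log2FC] > 1 AND [P.Value] < 0.05"""
    return row["log2FC"] > min_log2fc and row["P.Value"] < max_p


selected = [row["GeneID"] for row in rows if is_up_regulated(row)]
print(selected)  # ['ID_A']
```

Both conditions must hold at the same time: a large but non-significant change (ID_B) is excluded, and so is a significant but small one (ID_C).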

Question 6: Biological interpretation
What are the top five up-regulated pathways and what are their Z-scores?
Do you see highly ranked pathways in the result table that you expected to be upregulated in lung cancer? Can you link the shown biological processes to one or more of the hallmarks of cancer?

You can save the result table with all settings used as a .txt file, which can be opened in Excel.
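
For intuition about the Z-score column, the sketch below uses a formula that is commonly used for this kind of over-representation analysis: the observed number of criterion-meeting genes in a pathway is compared with the number expected by chance. Treat it as an assumption that PathVisio computes the score exactly this way, and check the lecture and the PathVisio documentation. All numbers are invented.

```python
import math


def pathway_z_score(N, R, n, r):
    """Z-score for over-representation of criterion-meeting genes in a pathway.

    N: measured genes in the dataset
    R: measured genes in the dataset that meet the criterion
    n: measured genes in the pathway
    r: measured genes in the pathway that meet the criterion
    """
    expected = n * R / N
    variance = n * (R / N) * (1 - R / N) * (1 - (n - 1) / (N - 1))
    return (r - expected) / math.sqrt(variance)


# Hypothetical counts: 10% of all measured genes meet the criterion.
print(round(pathway_z_score(N=1000, R=100, n=50, r=15), 2))  # more than expected
print(round(pathway_z_score(N=1000, R=100, n=50, r=5), 2))   # exactly as expected
```

A pathway with exactly the expected number of criterion-meeting genes gets a Z-score of 0; the more genes exceed the expectation, the higher the score.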


Step 2: Find down-regulated pathways in lung cancer cells

Repeat the same analysis for down-regulated genes, using the criterion [log2FC] < -1 AND [P.Value] < 0.05

Question 7: Biological interpretation
What are the top five down-regulated pathways and what are their Z-scores?
Do you see highly ranked pathways in the result table that you expected to be downregulated in lung cancer? Can you link the shown biological processes to one or more of the hallmarks of cancer?
