CORNET: Correlation networks: Frequently Asked Questions

Technical questions
Tool questions

Technical questions

1. Why does this site look all messy?: That's because you're using Internet Explorer. Don't use it, use Firefox.
To view this site use at least a 1280x1024 resolution and view it in Firefox.
2. Why does Firefox ask me to cancel the running script?: Firefox thinks the script has crashed because it takes so long. But that's normal, it takes a while to load some information from the database.
To fix this problem type in "about:config" in the address bar in Firefox, hit Enter and change the value of dom.max_script_run_time to 60 or higher.
3. Why is Cytoscape not starting?: Calculations are done and loading is finished, but Cytoscape does not start. You need to allow pop-ups in your browser. You also need an up-to-date version of Java (see Tool question 7).
4. Why do I have to download Cytoscape every time I launch the Cytoscape view?: You need to make sure caching is enabled for the Java plugin. You can find out how to enable caching on this page. Make sure the cache size is big enough (50-100 MB).

Tool questions

1. Can you give me some more information about the precalculated experiment sets?

This Zea mays CORNET version contains two public expression compendia:

Affymetrix Maize Genome Array compendium extracted from GEO (12/2010), comprising 128 unique experiments; replicates are summarized by their average value. In total, 24 Series (GSE21070, GSE8188, GSE15048, GSE8278, GSE8275, GSE10023, GSE8194, GSE7030, GSE16567, GSE19501, GSE8308, GSE10237, GSE10236, GSE8320, GSE8179, GSE11531, GSE8174, GSE22479, GSE8176, GSE18491, GSE15371, GSE12892, GSE12770, and GSE10243) were downloaded, combined, and pre-processed using a custom-made CDF file. They cover studies of cis-transcriptional variation in different inbred lines, expression profiling of mutants, non-adaptive and imprinted gene expression, different tissues, and infection with pathogens.
Nimblegen Maize Whole-Genome Microarray 385K (VersionV1_4a.53) taken from Sekhon et al. (2011), including 60 experiments (GEO, 06/2011).

Both compendia contain absolute expression values resulting from the microarray experiments. All possible correlation coefficients are pre-calculated, and only those with a corresponding p-value < 0.05 (Bonferroni corrected) are stored in a database for searching (no calculations on the fly).
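
The pre-calculation described above can be sketched roughly as follows. This is only an illustration, not the code used by CORNET: the function names and the input layout are made up, and since this FAQ does not say how the p-values were computed, the sketch swaps in the Fisher z-transform approximation for the p-value of a Pearson correlation coefficient.

```python
import math
from itertools import combinations
from statistics import NormalDist


def pearson(x, y):
    """Pearson correlation coefficient of two equally long value lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def approx_p_value(r, n):
    """Two-sided p-value via the Fisher z-transform (assumed method, approximate)."""
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite for |r| close to 1
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def significant_pairs(expression, alpha=0.05):
    """Return {(gene_a, gene_b): r} for all pairs passing the Bonferroni cut-off.

    expression: {gene_id: [expression value per experiment]} (hypothetical layout)
    """
    pairs = list(combinations(sorted(expression), 2))
    cutoff = alpha / len(pairs)  # Bonferroni: divide alpha by the number of tests
    kept = {}
    for gene_a, gene_b in pairs:
        r = pearson(expression[gene_a], expression[gene_b])
        if approx_p_value(r, len(expression[gene_a])) < cutoff:
            kept[(gene_a, gene_b)] = r
    return kept
```

Only the pairs returned by significant_pairs would be stored; all other pairs are dropped before any search takes place.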

2. Why and how to use multiple precalculated expression datasets?

The correlation coefficient for two genes can vary considerably depending on the input expression dataset.

By selecting multiple expression datasets, co-expression in different conditions can be studied simultaneously. The expression datasets that can be selected are described in Question 1.

Co-expression links are reported when they meet the requirements set in Step 3 (Pearson correlation coefficient threshold, top most correlated genes, or both; see Question 5). One can report only those co-expression links that meet the requirements in all selected expression datasets, or in at least X expression datasets.

The correlation coefficients found in the different datasets will be reported in the "dataset coefficients" attribute (see Question 7). On the edges, either the minimum, maximum or average correlation coefficient ("corrcoeff" attribute) over the datasets meeting the requirements will be shown.
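
As a rough illustration of how results from several datasets could be combined, the sketch below keeps a link only if it passes the threshold in at least X datasets and then summarizes the passing coefficients. The function name, the parameter names and the input layout are assumptions, and for simplicity only a single positive threshold is applied.

```python
def combine_links(dataset_coeffs, threshold=0.8, atleast=1, summary="average"):
    """Combine per-dataset correlation coefficients into reported links.

    dataset_coeffs: {(gene_a, gene_b): {dataset_name: r}} (hypothetical layout)
    summary: "minimum", "maximum" or "average" over the datasets that pass
    """
    summarise = {
        "minimum": min,
        "maximum": max,
        "average": lambda values: sum(values) / len(values),
    }[summary]

    links = {}
    for pair, per_dataset in dataset_coeffs.items():
        # Coefficients from datasets in which this pair meets the requirement.
        matching = [r for r in per_dataset.values() if r >= threshold]
        if len(matching) >= atleast:
            links[pair] = {
                "corrcoeff": summarise(matching),
                "dataset coefficients": dict(per_dataset),
                "matching datasets": len(matching),
            }
    return links
```

Setting atleast to the number of selected datasets corresponds to requiring the link in all datasets; atleast=1 reports a link as soon as any single dataset supports it.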

3. What kind of annotation data is used?

Both experimentally identified and predicted localizations are used. The source of the data and the corresponding evidence code are mentioned in the Cytoscape attributes.

Data type	Data source	Download date
cellular_component (GO)	MaizeSequence.Org	2010-07-21
biological_process (GO)	MaizeSequence.Org	2010-07-21
molecular_function (GO)	MaizeSequence.Org	2010-07-21
INTERPRO_domain	MaizeSequence.Org	2010-07-21
pathways	MapMan	2010-02-12

4. What kind of protein-protein interaction data is used?

The data are transferred from the species Arabidopsis thaliana via homology, using OrthoMCL results from the PLAZA v2 framework. Both experimentally identified and predicted protein-protein interactions are used. The source of the data is mentioned in the Cytoscape attributes.

Data type	Data source	Download date
protein-protein_interaction	ArathReactome	2011-09-29
protein-protein_interaction	AtPID	2011-09-20
protein-protein_interaction	BioGRID	2011-09-01
protein-protein_interaction	Geisler-Lee et al., 2007 (BAR, Arabidopsis Interactions Viewer)	2007-08-03
protein-protein_interaction	IntAct	2011-09-14
protein-protein_interaction	TAIR	2011-09-15
protein-protein_interaction	De Bodt et al., 2009 (filtered and predicted)	2008-12-09
protein-protein_interaction	DIP	2010-10-10
protein-protein_interaction	MINT	2011-07-08
protein-protein_interaction	Arabidopsis Interactome Mapping Consortium - Yeast-2-hybrid	2011-08-01
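
The homology-based transfer of Arabidopsis interactions to maize can be pictured with the following minimal sketch. It is not the actual CORNET pipeline: the function name, the input layout and the gene names in the usage example are invented, and it simply assumes that every maize gene in the orthologous group of one partner is paired with every maize gene in the group of the other partner.

```python
from itertools import product


def transfer_interactions(ath_interactions, ath_to_maize):
    """Predict maize interactions from Arabidopsis ones via ortholog groups.

    ath_interactions: iterable of (ath_gene_a, ath_gene_b) pairs
    ath_to_maize: {ath_gene: [maize orthologs]} (hypothetical layout)
    """
    predicted = set()
    for ath_a, ath_b in ath_interactions:
        maize_a = ath_to_maize.get(ath_a, ())
        maize_b = ath_to_maize.get(ath_b, ())
        # An interaction is only transferred if both partners have orthologs.
        for gene_1, gene_2 in product(maize_a, maize_b):
            predicted.add(tuple(sorted((gene_1, gene_2))))
    return predicted


# Usage with made-up placeholder names (not real gene identifiers):
example = transfer_interactions(
    [("ATH_A", "ATH_B")],
    {"ATH_A": ["ZM_1", "ZM_2"], "ATH_B": ["ZM_3"]},
)
print(sorted(example))
```

Storing each pair in sorted order avoids reporting the same undirected interaction twice.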

5. What guidelines should I follow when setting the options in Step 3?

Start with stringent parameters depending on the number of query genes:
- High correlation coefficient (e.g. 0.8)
- Low number of neighbours (e.g. 10)
Then gradually loosen the stringency of the parameters.
When the co-expression network is too large, a warning will be given. A text output can then be chosen instead of the visual Cytoscape output.

6. What do all the options in Step 3 mean exactly?

Pairwise comparisons:
All possible combinations between query genes are made. A correlation coefficient is calculated for each pair of genes.
Neighbours:
Add extra genes. Every query gene is compared to the complete maize genome. A gene pair with a correlation coefficient above the chosen thresholds or a gene belonging to the top X most correlated genes is reported.
Thresholds:
Two thresholds can be chosen.

Correlation coefficients higher and lower than a certain value (if given) are reported.
The top X genes with highest correlation coefficients are reported.
When using multiple datasets, the same approach is followed for each expression dataset and results are combined or intersected based on the "atleast" parameter. For example, if atleast=2 is chosen, only co-expression links meeting the requirements for two or more datasets are reported.

The most limiting threshold always has priority.
(e.g. If you ask for top 10 neighbours but only 6 meet the first threshold's requirement, only 6 are shown. If you ask for top 10 neighbours but many more meet the first threshold's requirement, only 10 are shown.)
Relations between neighbours:
If you want to test if the neighbours of your query gene are co-expressed, you need to choose this option.
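
The interplay of the two thresholds described under "Thresholds" (the most limiting one has priority) can be sketched for a single query gene and a single dataset as shown below. The function and parameter names are hypothetical, and only a positive coefficient threshold is modelled.

```python
def select_neighbours(correlations, min_coeff=None, top_x=None):
    """Pick the neighbours of one query gene.

    correlations: {candidate_gene: r} for the query gene (hypothetical layout)
    min_coeff: correlation coefficient threshold, or None if not given
    top_x: maximum number of neighbours to report, or None if not given
    """
    # Rank candidates from most to least correlated.
    ranked = sorted(correlations.items(), key=lambda item: item[1], reverse=True)
    if min_coeff is not None:
        ranked = [(gene, r) for gene, r in ranked if r >= min_coeff]
    if top_x is not None:
        ranked = ranked[:top_x]  # the more limiting of the two filters wins
    return ranked
```

With a threshold of 0.8 and top_x=10, a query gene with only 6 candidates above 0.8 yields 6 neighbours, while a query gene with many more candidates above 0.8 yields exactly 10.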

7. What is Cytoscape and how do I use it with this tool?

Cytoscape is an open source bioinformatics software platform for visualizing molecular interaction networks and integrating these interactions with gene expression profiles and other state data. Cytoscape will launch through the JAVA™ Web Start Launcher.
You may need to update JAVA™ for the Cytoscape Web Start.
Get the latest JAVA™ Runtime Environment
You can also download and install the latest version of Cytoscape locally on your computer.

To view more information about an edge or a node, click on the "Select attributes" button in the Data Panel.

Then select all the desired columns you want to be visible.
This table is an overview of the available attributes:

*Node attributes*
ID	GRMZM code
description	GRMZM code (and alias, if available)
descriptionLong	short MaizeSequence.Org functional description
gene.selected	indicates in which step of the application this gene was a query gene
localization	localization of a gene derived from the GO cellular component information and used for the visualization within the network (see legend)
localizationReferences	references and evidence codes of the localization of the gene. Hint: move your mouse over this column and wait for the tooltip to appear. The tooltip is nicely formatted HTML and easier to read.
biologicalProcess	GO biological process
molecularFunction	GO molecular function
cellularComponent	GO cellular component
proteinDomains	InterPro protein domains
link:db	identifiers for knowledge search in external databases, e.g. db = {KEGG, KOG, PLAZAv2, Panther, Pfam, EntrezGene, GenBank, RefSeq, UniGene, UniProt, Enzyme}. Note that some of the identifiers are based on orthologous gene information!
athOrtholog	orthologous information for Arabidopsis
osaOrtholog	orthologous information for rice
zmbOrtholog	orthologous information for maize
athDescOrthologs	short TAIR functional description of the orthologous Arabidopsis genes
MapMan	MapMan pathway information
*Edge attributes*
ID	label describing the relation between 2 genes: (cor) = correlation, (pp) = protein-protein interaction
interaction	kind of relation between genes
corrcoeff	The correlation value [-1.0,1.0] between 2 genes if the edge indicates a correlation
n_ref	how many times this protein-protein interaction is referenced
features	detailed information about the references and the orthologous relationship which was used to predict this protein-protein interaction. Hint: move your mouse over this column and wait for the tooltip to appear. The tooltip is nicely formatted HTML and easier to read.
prop	indicates if the protein-protein interaction is experimental or predicted
dataset coefficients	individual coefficients for the selected datasets
dataset pvalues	individual p-values for the selected datasets
matching datasets	number of datasets returning a correlation coefficient matching the selected criteria

8. What do the colors and shapes mean in the networks in Cytoscape?

Visualization legend:

*Correlation networks*
Edge color
Query gene
*Protein-protein interactions*
Edge color	black
Edge width	the more references, the wider the edge
Edge style	= experimental = predicted
Query gene
*COR and PPI*
Query gene
*Localization pie colors* (the size of a pie slice represents how much a value is referenced compared to the others)
	CYTOSKELETON
	PEROXISOME
	OTHER
	CELLPLATE
	NUCLEOLUS
	CYTOSOL
	CHLOROPLAST
	NUCLEUS
	MITOCHONDRIA
	ENDOSOME
	PLASTID
	VACUOLE
	ENDOPLASMICRETICULUM
	EXTRACELLULAR
	PLASMAMEMBRANE
	GOLGI

9. What kind of input data could be used?

By default, the input data are the GRMZM codes of your genes of interest. Other maize identifiers, such as PLAZA v2.0 identifiers (e.g. ZM08G15930), Affymetrix probe identifiers (e.g. ZM.17362.s1_at), or Maize Oligonucleotide Array identifiers (e.g. MZ00002380 from the Arizona Maize Array), are also accepted as input. In addition, the program accepts orthologous gene identifiers from Arabidopsis (AGI codes, i.e. TAIR identifiers, e.g. At2g33610) and orthologous gene locus identifiers from rice (TIGR identifiers without the LOC_ prefix, e.g. Os02g10060).

Note that if an orthologous gene (from Arabidopsis and/or Oryza) is given and it is part of a group of orthologous genes with a many-to-many relationship (extracted using OrthoMCL results from the PLAZA v2 framework), then all maize genes from this orthologous group will be taken into account.
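
To check a gene list before submitting it, the accepted identifier types could be told apart with a small script like the one below. The regular expressions are guesses derived only from the example identifiers given above (and, for GRMZM codes, from the prefix alone); they are not the validation rules used by CORNET, and the function name is made up.

```python
import re

# (label, pattern) pairs; patterns are assumptions inferred from the examples.
ID_PATTERNS = [
    ("maize GRMZM code", re.compile(r"^GRMZM\w+$", re.I)),
    ("maize PLAZA v2.0 identifier", re.compile(r"^ZM\d{2}G\d{5}$", re.I)),
    ("maize Affymetrix probe identifier", re.compile(r"^ZM\.\d+\.\S+_at$", re.I)),
    ("Maize Oligonucleotide Array identifier", re.compile(r"^MZ\d{8}$", re.I)),
    ("Arabidopsis AGI code", re.compile(r"^AT[1-5CM]G\d{5}$", re.I)),
    ("rice locus identifier", re.compile(r"^OS\d{2}G\d{5}$", re.I)),
]


def classify_identifier(identifier):
    """Return a label for the identifier type, or "unknown" if nothing matches."""
    cleaned = identifier.strip()
    for label, pattern in ID_PATTERNS:
        if pattern.match(cleaned):
            return label
    return "unknown"


for gene_id in ["ZM08G15930", "ZM.17362.s1_at", "MZ00002380",
                "At2g33610", "Os02g10060", "LOC_Os02g10060"]:
    print(gene_id, "->", classify_identifier(gene_id))
```

A rice identifier that still carries the LOC_ prefix is reported as "unknown" here, which mirrors the requirement that the prefix be removed before input.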
