Intergenomics on QTLs
Quantitative trait loci (QTL) mapping efforts have been performed independently in several species for the same traits. If the animal models are driven by the same genetic mechanisms as the human diseases, we should expect to find conserved sequences shared by the QTLs and susceptibility regions of all three organisms (human, mouse and rat). Genes present as homologues in the QTLs of all three species arise as the best candidates for relevance to the onset and/or further development of the disease.
A comprehensive QTL database for rodent EAE was created with data collected from the public databases of the NCBI, the Jackson Laboratory (MGI: Mouse Genome Informatics) and the Rat Genome Database (RGD) of the Medical College of Wisconsin. The data were complemented with human MS predisposition loci and genes assembled both from public databases of the NCBI and from recent large-scale genetic association studies on MS (mainly: J Neuroimmunol vol 143/1-2).
The remaining databases used (alternative QTLs, homologous gene pairs, sequence similarity for syntenic blocks) rely on a local installation of EnsEMBL.
Consensuses may occur by chance with a certain probability. To determine that probability, we test the performance of the intergenomics QTL synteny/homology tool on randomly located QTLs of the same size as the original ones (permutation test). With an increasing number of iterations, the calculated distribution approaches the real random distribution. That distribution yields the probability of finding a certain number of genes in consensuses by chance. Whenever the observed number of genes in consensuses is above the 95th percentile limit, it can be concluded that the observed number is reached or surpassed by chance in fewer than five out of a hundred cases (i.e. p<0.05).
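The comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the PHP code of the tool: the function names, the nearest-rank percentile rule and the example counts are all hypothetical.

```python
import math
from typing import Sequence


def percentile_threshold(null_counts: Sequence[int], q: float = 0.95) -> int:
    """Nearest-rank q-th percentile of the permuted consensus gene counts."""
    if not null_counts:
        raise ValueError("at least one permutation result is required")
    ordered = sorted(null_counts)
    rank = max(1, math.ceil(q * len(ordered)))  # 1-based nearest rank
    return ordered[rank - 1]


def empirical_p_value(observed: int, null_counts: Sequence[int]) -> float:
    """Fraction of permutations that reach or surpass the observed count."""
    if not null_counts:
        raise ValueError("at least one permutation result is required")
    hits = sum(1 for count in null_counts if count >= observed)
    return hits / len(null_counts)


# Hypothetical consensus gene counts from 10 random rearrangements.
example_counts = [3, 5, 2, 8, 4, 6, 1, 7, 5, 4]
```

With these made-up counts, an observed value of 12 lies above every permuted result, so the estimated p-value is 0, while an observed value of 7 is reached or surpassed in 2 of 10 permutations.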
WARNING: Depending on the number and size of the loci added to the system, the consensus search may take several hours to complete. In most browsers, however, the progress can be followed as the search runs (this does not apply to some versions of Konqueror).
There are two ways of setting the limits of a QTL / susceptibility locus. The first (default) fixes the borders at the corresponding centroid value, i.e. the most probable physical position of the given genetic marker as calculated from its genetic position by CARTOGRAPHER (Voigt et al. 2004 Non-linear conversion between genetic and physical chromosomal distances. Bioinformatics, in press). The conversion was made with an outlier tolerance of 20% and a scrolling window size of 20 points. For those markers not listed in base pair position or, instead, the physical position when available in EnsEMBL. The second way extends the borders to the outer confidence interval as provided by CARTOGRAPHER. This option is only valid for a search based on synteny.
This checkbox switches between the full table display (see OUTPUT) and the compact display, which shows only the final results (quicker). This option is only valid for a search based on synteny.
The output is formatted as a large HTML table. The left column series (yellow) displays the chromosomal regions analyzed for the source species. The middle part (light orange), if present, shows the chromosomal regions inside QTLs of the second source species that are syntenic to those of the left column series. The right column series displays the chromosomal regions of the target species that are syntenic to the consensus regions between both source species (three-way analysis) or to the single source species (two-way analysis); they are shown in dark orange if they lie outside and in red if they lie inside the QTLs or susceptibility loci of the target species. All loci displayed link to the EnsEMBL contigview to assist in further analyses.
The output statistics are also summarized in a table. It reports the number of base pairs and genes covered by the consensus and, for a search based on synteny, the number of DNA fragments matched and later merged, with their corresponding sizes in base pairs.
The validation process is based on a permutation test. The original QTLs and SLs submitted are merged where overlapping (shown in the textareas under "Merged QTLs and SLs") and then randomly rearranged over the genome. The size of the QTLs / SLs is respected and overlaps avoided, but their position is set randomly in each iteration.
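One iteration of this procedure might look like the following Python sketch. It is a hedged illustration, not the tool's own PHP implementation: the `(chromosome, start, end)` tuples, the function names, the rejection-sampling strategy and the `max_tries` limit are assumptions made for the example.

```python
import random
from typing import Dict, List, Tuple

Locus = Tuple[str, int, int]  # (chromosome, start, end), closed interval


def merge_overlapping(loci: List[Locus]) -> List[Locus]:
    """Merge loci that overlap on the same chromosome."""
    merged: List[Locus] = []
    for chrom, start, end in sorted(loci):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            prev = merged[-1]
            merged[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


def permute_loci(loci: List[Locus], chrom_lengths: Dict[str, int],
                 rng: random.Random, max_tries: int = 1000) -> List[Locus]:
    """Place each locus at a random position, keeping its size, avoiding overlaps."""
    placed: List[Locus] = []
    for _chrom, start, end in loci:
        size = end - start
        chroms = [c for c in sorted(chrom_lengths) if chrom_lengths[c] >= size]
        if not chroms:
            raise ValueError("locus is larger than every chromosome")
        # Weight chromosomes by their number of valid start positions.
        weights = [chrom_lengths[c] - size + 1 for c in chroms]
        for _attempt in range(max_tries):
            chrom = rng.choices(chroms, weights=weights)[0]
            new_start = rng.randint(0, chrom_lengths[chrom] - size)
            new_end = new_start + size
            if not any(c == chrom and new_start <= e and s <= new_end
                       for c, s, e in placed):
                placed.append((chrom, new_start, new_end))
                break
        else:
            raise RuntimeError("no overlap-free position found")
    return placed
```

Rejecting and redrawing a position whenever it overlaps an already placed locus is the simplest way to avoid overlaps, but the number of retries grows as the loci cover more of the genome.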
Normalize QTL sizes (recommended)
QTLs and SLs may be located in chromosomal regions with unusually high or low gene density. This is particularly clear for QTLs / SLs affecting the MHC locus (very high gene density). When such a QTL / SL is rearranged in each iteration, its size is by default only respected in terms of base pairs. Normalization of the QTLs / SLs instead respects the original number of genes in the region, to ameliorate the effect of such disproportions in gene density on the permutation test.
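The idea of respecting the gene count rather than the base pair size can be illustrated as follows. How the tool performs the normalization internally is not described here, so this Python sketch is only one possible reading: it assumes that gene positions on a chromosome are available as a list of distinct integers, and the function name is hypothetical.

```python
import random
from typing import List, Tuple


def random_region_with_gene_count(gene_positions: List[int], n_genes: int,
                                  rng: random.Random) -> Tuple[int, int]:
    """Pick a random region spanning exactly n_genes consecutive genes."""
    ordered = sorted(gene_positions)
    if n_genes < 1 or n_genes > len(ordered):
        raise ValueError("n_genes must be between 1 and the number of genes")
    # Choose the first gene of the window; the window ends n_genes - 1 later.
    first = rng.randrange(len(ordered) - n_genes + 1)
    return ordered[first], ordered[first + n_genes - 1]
```

A region drawn this way becomes short in gene-dense areas and long in gene-poor ones, so a locus such as one covering the MHC keeps its original number of genes wherever it is placed.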
With an increasing number of iterations, the distribution of the permutation test results approximates an ideal random distribution. The number of observed consensus genes is then compared against that distribution. If it lies clearly at the marginal percentiles of the random distribution (e.g. over the 95th percentile, p<0.05), the result can be considered not to be due to chance. The on-line version of the script allows a maximum of 10 iterations, clearly below the 1000 recommended; because of limited computing resources we cannot offer more on the web. We encourage interested users to consider a (free) local installation of our software to solve this (see availability).
WARNING: With increasing sizes of QTLs / SLs, the difficulty of generating a random rearrangement of the QTLs / SLs without overlaps increases geometrically, thus greatly affecting the computing time. For whole genome comparisons (see INPUT) the validation option has been intentionally disabled.
This box shows the type of permutation currently being performed. For validation of data generated by means of synteny, the type of permutation should be synteny, and analogously for data generated by means of homology. The web interface allows only permutations based on homology, again because of limited computing resources. Local installations, however, only need to activate a certain part of the script to use permutations based on synteny (it is already implemented). It is also important to keep in mind that the differences between both types of permutation are relatively small (about 10% difference in the number of genes detected), while the difference in computing time is dramatic (from a few seconds to a few minutes for homology, to several hours for synteny). It is therefore advisable to always run the type "homology" first and to attempt the type "synteny" only if the results for "homology" are promising enough.
Show detailed results list
This option allows the user to get the detailed list of the results generated for each iteration. This may be particularly useful for exporting to other software programs in order to represent the random distribution curve and the cutoff of the observed value.
The figure illustrates the distribution of the number of consensus genes calculated for each iteration in the permutation test (blue vertical bars) in comparison to the observed number (red bar). If the observed number of genes is found in the green region of the distribution, it can be said that the observed number of consensus genes is significantly higher than expected by pure chance. In other words, the QTLs / SLs analyzed must indeed share a common, at least partial, explanation for the observed phenotype. However, with the maximum of 10 iterations offered by this web interface, the calculated p-value should be taken with caution (e.g. here p=0). A local installation of the application will allow you to increase the number of iterations. The p-value for 100 iterations is a good trend indicator, and it stabilises at the second decimal position before reaching the 1000th iteration.
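The caution about few iterations can be made concrete with a short calculation. The notation is introduced here for illustration only and is not part of the application: N is the number of iterations, c_i the number of consensus genes in iteration i, c_obs the observed number, and p the true (unknown) probability. The second formula is the usual binomial standard error of an estimated proportion.

```latex
\hat{p} = \frac{k}{N}, \qquad
k = \#\{\, i : c_i \ge c_{\mathrm{obs}} \,\}, \qquad
\hat{p} \in \left\{ 0, \tfrac{1}{N}, \tfrac{2}{N}, \dots, 1 \right\}

\mathrm{SE}(\hat{p}) \approx \sqrt{\frac{p\,(1-p)}{N}}
```

With N = 10 the estimate can only take the values 0, 0.1, 0.2 and so on, so p=0 merely states that none of the 10 rearrangements reached the observed number. For a true p of 0.05, the standard error is about 0.022 at N = 100 and about 0.007 at N = 1000, which is consistent with the second decimal position settling only towards the 1000th iteration.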
Each consensus region, taken as a whole, may be syntenic or homologous to one or more consensus regions of the other species. In the latter case there is a reasonable hint that a consensus that segregates into two in another species could carry at least two genes relevant for the trait. The argument is analogous for increasing numbers of combinations of consensuses. The table above shows all combinations found and includes a number in parentheses after each merged QTL / SL that stands for the number of different combinations with consensuses of the other species homologous / syntenic to it. This number is an estimate of the minimum number of disease-relevant genes in that consensus.
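The number in parentheses could be derived roughly as in the following Python sketch. It assumes, as a simplification, that each match is available as a pair of identifiers (merged locus, consensus region in the other species) and that the number of combinations equals the number of distinct partner regions; the function name and the identifiers used in the example are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def count_partner_consensuses(matches: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Count the distinct partner consensus regions matched by each merged locus."""
    partners: Dict[str, Set[str]] = defaultdict(set)
    for locus, partner in matches:
        partners[locus].add(partner)  # repeated matches are counted once
    return {locus: len(found) for locus, found in partners.items()}


# Hypothetical matches between merged loci and consensus regions.
example_matches = [
    ("locus_A", "region_1"),
    ("locus_A", "region_2"),
    ("locus_A", "region_1"),
    ("locus_B", "region_3"),
]
```

In this made-up example, locus_A matches two distinct regions of the other species and would be listed with (2), locus_B with (1).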
The web interface is programmed in PHP4 and is divided into three subprograms:
The local databases are available in different formats and always separately for each metatrait: multiple sclerosis (MS) and rheumatoid arthritis (RA), with their respective animal models in mouse and rat, experimental autoimmune encephalomyelitis (EAE) and collagen- or pristane-induced arthritis (CIA/PIA).
The scripts and database available here are Open Source. This means they are license-free in use and distribution. However, the authorship should be indicated whenever the scripts/database are publicly used. The links and paths to the database in the PHP scripts and between the PHP scripts refer strictly to our local network architecture and should therefore be readjusted after a local installation (!). Developers are encouraged to feed back corrections and improvements, and any user is of course welcome to collaborate in debugging the software.
Authors
QTL View was created by Pablo Serrano-Fernández, Steffen Möller, Saleh M. Ibrahim, Hans-Jürgen Thiesen (Immunology, University of Rostock), Uwe K. Zettl (Neurology, University of Rostock), René Gödde and Jörg T. Epplen (Human Genetics, University of Bochum).
For technical help, comments or questions please contact Pablo Serrano-Fernández or Steffen Möller.