THE MAP CONVERSION TOOL

                       1. Goal

  This application was designed to meet two major, complementary goals:

  • It converts individual positions from the different genetic linkage maps included in our database (both traditional recombination-based maps and radiation hybrid maps) into estimated physical positions in base pairs (click here to see an example). This is important for further analyses such as those provided by EnsEMBL, for example transcriptomics, promoter or synteny analyses, among many others.
    Furthermore, mapping systems can be compared simply by checking a second map for the same chromosome in the interface. Note that in this case the values entered for conversion are ignored and only the correlation values (classical c.c. and Spearman c.c.) are calculated.

  • It gives a global overview of the relationship between these mapping systems and reflects how irregularly the probability of recombination is distributed over equal physical distances.
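
The two correlation values mentioned in the first goal can be illustrated with a short Python sketch. It assumes that "classical c.c." refers to the product-moment (Pearson) coefficient and that Spearman's coefficient is obtained by applying the same formula to ranks, with tied values sharing their average rank. The function names are hypothetical and are not taken from the tool itself.

```python
from math import sqrt


def pearson(xs, ys):
    """Product-moment correlation of two equally long sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))
```

For marker positions that increase together on both maps but not proportionally, `spearman` returns 1 while `pearson` stays below 1, which is why reporting both values is informative.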

                       2. Conversion algorithm

The algorithm used to calculate the regression curve is based on a scrolling (sliding) window model. Multiple regression lines are calculated, one for each window as it slides through the data set in overlapping positions. The regression lines are tangents (first derivatives) to the calculated regression curve, which passes through the averages of each window.
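
As a rough illustration of the windowing described above, the following Python sketch fits one least-squares line per window and records the window averages through which the curve passes. The function names, the one-marker step between consecutive windows and the use of ordinary least squares are assumptions made for this example; the tool's actual implementation is not documented here.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0  # flat line if all x are equal
    return slope, my - slope * mx


def scrolling_window_curve(points, window):
    """Slide a window of `window` markers over the sorted data.

    `points` is a list of (genetic_position, physical_position) pairs.
    Returns one (mean_x, mean_y, local_slope) triple per window: the
    curve passes through (mean_x, mean_y) and local_slope is the slope
    of that window's regression line.
    """
    points = sorted(points)
    curve = []
    for start in range(len(points) - window + 1):
        chunk = points[start:start + window]
        xs = [p[0] for p in chunk]
        ys = [p[1] for p in chunk]
        slope, _ = fit_line(xs, ys)
        curve.append((sum(xs) / window, sum(ys) / window, slope))
    return curve
```

With n markers and a window of w markers this yields n - w + 1 curve points, so the curve starts and ends inside the data range rather than at its extremes.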

The resulting regression curve is continuous but does not necessarily have a positive slope everywhere, so it adapts closely to any kind of marker distribution. The regression curve is plotted in red. Analogously, the 95% confidence interval is calculated separately for each window and is therefore sensitive to local data density and distribution. The confidence intervals are plotted in cyan.
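
The text does not state which formula is used for the 95% confidence interval. Purely as a sketch, the Python below assumes a normal-approximation interval around the mean physical position of each window (mean ± z · s / √n). The function name and this choice of interval are assumptions, and the tool may well compute its bands differently, for example around the local regression line.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def window_intervals(points, window, level=0.95):
    """Per-window interval around the mean physical position.

    `points` is a list of (genetic_position, physical_position) pairs
    and `window` must be at least 2. Returns one
    (mean_x, lower, upper) triple per window.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    points = sorted(points)
    bands = []
    for start in range(len(points) - window + 1):
        chunk = points[start:start + window]
        xs = [p[0] for p in chunk]
        ys = [p[1] for p in chunk]
        centre = mean(ys)
        half_width = z * stdev(ys) / sqrt(window)
        bands.append((mean(xs), centre - half_width, centre + half_width))
    return bands
```

Because each interval uses only the markers inside its own window, a sparse or widely scattered region widens the band locally without affecting the rest of the chromosome.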


                       3. User interface



The user interface includes some features required for further calculations, as well as optional ones. All required features are preset to a default value that can be changed, within a given range, by the user through the web interface (see figure above). First of all, the user has to select the chromosome for which conversions or comparisons are going to be made (chromosome Y is unfortunately not sufficiently mapped at the moment).

The graphic display can be optionally enriched with the plot of the linear model (regression line) best fitting the data pool.

In some cases the user may decide to include genes as well as genetic markers in the data set, to improve the accuracy of the conversion by increasing the sample size. A good example of the usefulness of this feature is mouse chromosome 2. The regression curve calculated for the marker set alone (click here to see an example) differs markedly from the one calculated for the combined set of markers and genes (click here to see an example).

The algorithm used to calculate the regression curve (see conversion algorithm, above) can be adjusted by the user, who may choose the size (in number of markers) of the scrolling window. A smaller window produces a regression curve that fits the data more closely but is more sensitive to local distortions (click here to see an example), and vice versa (click here to see an example). The threshold that decides whether a marker is treated as an outlier can also be set manually. A threshold of 100% tolerance includes all data in the set. Very small tolerances, on the other hand, may distort the regression curve (see below, interpretation of the results).

As the last step, the user types into the text box the genetic position to be converted into a physical position (or vice versa). Once the form is filled in, clicking the submit button refreshes the display.


                      4. Interpretation of the results

Unfortunately, accuracy and reliability of estimates do not go hand in hand. Usually, as in this case, increasing one decreases the other. Depending on the research topic, the user may be interested in a more accurate or a more reliable estimate of the physical position corresponding to a given genetic position.

In the first case - getting the most likely position - the confidence intervals are not considered and the estimate is simply the corresponding y-value on the regression curve (A->B; see figure below).

In the second case - getting a reliable interval as an estimate of the position - the confidence intervals are taken as the borders of the estimate. We recommend taking the most distal estimate (A->B') if the position being estimated is a flanking marker (see figure below).
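
The two readings of a conversion can be sketched in Python as follows. The sketch assumes that the regression curve and the two interval borders are available as rows sorted by genetic position, and that values between rows are obtained by linear interpolation. The names and the interpolation scheme are hypothetical; the tool may read values off the curve differently.

```python
from bisect import bisect_right


def interpolate(xs, ys, x):
    """Linear interpolation of y at x; xs must be sorted ascending."""
    i = bisect_right(xs, x)
    if i == len(xs):          # x coincides with the last point
        return ys[-1]
    x0, x1 = xs[i - 1], xs[i]
    y0, y1 = ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def convert(genetic_pos, band):
    """Convert a genetic position into a physical estimate.

    `band` is a list of (genetic, physical, lower, upper) rows sorted
    by genetic position. Returns (estimate, lower, upper), or None if
    the position lies outside the range covered by the curve.
    """
    xs = [row[0] for row in band]
    if not band or not xs[0] <= genetic_pos <= xs[-1]:
        return None
    return tuple(
        interpolate(xs, [row[k] for row in band], genetic_pos)
        for k in (1, 2, 3)
    )
```

The first element of the result corresponds to the most likely position (B); the other two are the interval borders, of which the more distal one would be taken as B' for a flanking marker.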


The most frequent complaints when using this program are dealt with below:
  • The regression curve looks abruptly interrupted!
a)  The threshold chosen for the outlier tolerance is too low for the current data set. The threshold is too sensitive and normal data are classified as outliers.
b)  The map used is not a detailed description of the current chromosome. There are simply no more data to show.
  • The display appears empty!
a)  The map used is not a detailed description of the current chromosome. There are simply no data to show. This is currently the case for chromosome Y.
b)  There could be a bug in the script (we apologize in advance). Please send a description of the problem to this e-mail address: serrano@pzr.uni-rostock.de and we will try to solve it as soon as possible.
  • The regression curve does not reach the telomeres!
This is normal. The scrolling window used in the conversion algorithm (see above) slides over the data and overlaps with its neighbouring windows everywhere except at the telomeres. The first and last windows end at a distance equal to half the chosen window size (in number of markers) from the ends of the chromosome. For unit conversions in these regions we recommend using a linear regression algorithm (with great care, because the distribution of the data is particularly irregular at the telomeres).
  • The linear model (regression line) goes partially outside of the confidence intervals!
The linear model ("regression line" in the user interface; see above) may go beyond the confidence intervals of the regression curve calculated here. Such departures can be used as a criterion for considering the regression line, at least locally, a poor predictor of the physical equivalent of a genetic position. This is exemplified in the figure above at about 100 megabase pairs / 55 centimorgans.
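
The last check can be sketched in Python. The example fits a single least-squares line to all markers and lists the genetic positions at which that line falls outside a given confidence band. The function names and the band layout of (genetic position, lower, upper) rows are assumptions made for illustration only.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0  # flat line if all x are equal
    return slope, my - slope * mx


def line_outside_band(xs, ys, band):
    """Genetic positions where the global line leaves the band.

    `xs`, `ys` are the genetic and physical positions of all markers;
    `band` is a list of (genetic_position, lower, upper) rows taken
    from the confidence intervals of the regression curve.
    """
    slope, intercept = fit_line(xs, ys)
    flagged = []
    for x, lower, upper in band:
        predicted = slope * x + intercept
        if predicted < lower or predicted > upper:
            flagged.append(x)
    return flagged
```

An empty result means the line stays within the band at every checked position; any listed position marks a region where the linear model should be treated with caution. The same fitted line is what a linear conversion near the telomeres would rely on.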