[The GelBuddy Project]

GelBuddy's Automatic Signal Detection Algorithm



This document provides an outline of GelBuddy's automatic signal detection algorithm and attempts to explain the significance of many of the settings available in the Analyze Gel window. A detailed description of the algorithm is in press.

GelBuddy begins automatic signal detection by extracting one-dimensional electropherogram data from the two-dimensional image data, using previously established lane tracks.
1. Raw image data, 700nm channel.


2. Extracted electropherogram data, presented as a "virtual gel" image.
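A minimal Python sketch of the extraction step follows. It assumes the image is a list of pixel rows and that each lane track supplies an integer centre column for every row. The names `extract_lane`, `track`, and `half_width` are hypothetical, and averaging over a fixed-width window is an assumed choice; GelBuddy may bound or weight the lane differently.

```python
def extract_lane(image, track, half_width):
    """Collapse one lane of a 2-D image into a 1-D intensity profile.

    image      -- list of rows, each a list of pixel intensities (assumed layout)
    track      -- hypothetical lane track: track[y] is the lane's centre column at row y
    half_width -- pixels on each side of the centre included in the average
    """
    profile = []
    for y, row in enumerate(image):
        centre = track[y]
        lo = max(0, centre - half_width)             # clamp the window to the image edges
        hi = min(len(row), centre + half_width + 1)
        window = row[lo:hi]
        profile.append(sum(window) / len(window))    # mean intensity across the lane width
    return profile
```

For example, `extract_lane([[0, 10, 20, 10, 0], [0, 0, 30, 60, 30]], [2, 3], 1)` follows a lane that drifts one column to the right between the two rows.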

Next, the data are resampled, using the de-smiling curves to stretch or squish each lane to compensate for artifactual differences in mobility. This "flattens" the data so that co-migrating bands appear at the same y-coordinate.

3. Resampled data.
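If each de-smiling curve is expressed as a hypothetical `warp` list, where `warp[y]` is the (possibly fractional) position in a lane's extracted profile that corresponds to row `y` of the flattened output, then resampling reduces to interpolation. The sketch below uses linear interpolation; this representation of the curves and the interpolation method are assumptions for illustration, not GelBuddy's code.

```python
def interp(profile, pos):
    """Linearly interpolate profile at a fractional index, clamping at both ends."""
    if pos <= 0:
        return profile[0]
    last = len(profile) - 1
    if pos >= last:
        return profile[last]
    i = int(pos)
    frac = pos - i
    return profile[i] * (1 - frac) + profile[i + 1] * frac


def resample_lane(profile, warp):
    """Stretch or squish one lane so co-migrating bands share a row.

    warp -- hypothetical de-smiling curve: warp[y] is where output row y falls
            in this lane's raw profile
    """
    return [interp(profile, p) for p in warp]
```

A warp that advances faster than one row per output row squishes the lane; one that advances more slowly stretches it.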

The intensity values in each lane are scaled to a mean value of 1 (reducing variation in overall lane intensity), and the 20th percentile intensity value in each row is calculated, forming an artificial background pattern. The Background Percentile value is user-adjustable. A higher value yields a more accurate background pattern but makes it more likely that dark bands will appear in the background pattern at the positions of common polymorphisms, resulting in erratic detection of signals at those mobilities.

4. Rescaled data and calculated background pattern.
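The scaling and background steps can be sketched as follows, assuming every lane has a positive mean intensity. The percentile uses linear interpolation between sorted values (the same rule as NumPy's default `percentile` method); the exact percentile rule GelBuddy uses is not stated, so treat it as an assumption. `pct` plays the role of the Background Percentile setting.

```python
def scale_to_unit_mean(lane):
    """Divide a lane by its mean so that its values average to 1 (assumes mean > 0)."""
    m = sum(lane) / len(lane)
    return [v / m for v in lane]


def percentile(values, pct):
    """Linear-interpolation percentile of a non-empty list."""
    s = sorted(values)
    pos = (len(s) - 1) * pct / 100.0
    i = int(pos)
    if i + 1 >= len(s):
        return s[-1]
    frac = pos - i
    return s[i] * (1 - frac) + s[i + 1] * frac


def background_pattern(lanes, pct=20):
    """Return (scaled lanes, per-row background) for resampled lanes of equal length."""
    scaled = [scale_to_unit_mean(lane) for lane in lanes]
    n_rows = len(scaled[0])
    background = [percentile([lane[y] for lane in scaled], pct) for y in range(n_rows)]
    return scaled, background
```

A low `pct` keeps a band carried by only a few lanes (a rare polymorphism) out of the background; raising it lets bands shared by more lanes leak in.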


GelBuddy then uses a decorrelation algorithm to subtract the background pattern from each lane, producing a set of "foreground" data in which bands are more easily detected.
5. Decorrelation output.
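The decorrelation algorithm itself is not specified here. One simple stand-in, shown purely as an illustration, subtracts from each lane the multiple of the background pattern given by the least-squares regression slope. What remains has zero covariance with the background, so only lane-specific features such as bands survive. The function name `decorrelate` is hypothetical.

```python
def mean(xs):
    return sum(xs) / len(xs)


def decorrelate(lane, bg):
    """Remove the component of a lane that is linearly correlated with the background.

    Stand-in technique (regression-slope subtraction), not necessarily GelBuddy's method.
    """
    mb, ml = mean(bg), mean(lane)
    cov = sum((b - mb) * (v - ml) for b, v in zip(bg, lane))
    var = sum((b - mb) ** 2 for b in bg)
    slope = cov / var if var else 0.0   # flat background: nothing to remove
    return [v - slope * b for v, b in zip(lane, bg)]
```

Because `cov(lane - slope*bg, bg) = cov(lane, bg) - slope*var(bg) = 0`, the foreground is uncorrelated with the background by construction.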


The decorrelated data contain high-spatial-frequency artifacts caused by inaccuracies in the de-smiling curves and by lane-specific variation in background banding. A smoothing pass reduces the intensity of these artifacts.
6. Smoothed data.
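The smoothing filter is not described; a Gaussian kernel is a common low-pass choice and is assumed in this sketch. `sigma` (in rows) is a hypothetical parameter. Near the ends of a profile the kernel is renormalized over the samples that exist, so a flat profile stays flat.

```python
import math


def gaussian_kernel(sigma):
    """Normalized Gaussian weights spanning +/- 3 sigma (at least one row each side)."""
    radius = max(1, int(3 * sigma))
    k = [math.exp(-(i * i) / (2 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(k)
    return [w / total for w in k]


def smooth(profile, sigma=1.0):
    """Convolve a 1-D profile with a Gaussian, renormalizing at the edges."""
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    n = len(profile)
    out = []
    for y in range(n):
        acc = wsum = 0.0
        for j, w in enumerate(k):
            idx = y + j - r
            if 0 <= idx < n:           # skip taps that fall off the profile
                acc += w * profile[idx]
                wsum += w
        out.append(acc / wsum)
    return out
```

Larger `sigma` suppresses more artifact energy but also broadens and weakens genuine bands.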


Each peak in the smoothed data is assigned a signal score based on signal strength. Very weak signals (those below the signal detection threshold) are eliminated, as are signals that appear at the same location in both channels and those appearing where lane markers are expected. Next, GelBuddy searches for pairs of strong signals (dark bands) in the two channels whose fragment lengths sum to approximately the full PCR product length. A weak signal is given a higher pair score if a strong complementary signal is present in the other channel. Signals whose pair score exceeds the confirmation threshold are then marked for review by the user.
7. Marked bands. 700nm bands are marked in red; 800nm bands (not visible in this image) are marked in blue.
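The scoring, filtering, and pairing logic can be sketched as below. Several details are assumptions made only for illustration: a peak is a strict local maximum whose score is its height, "same location" and "lane marker" mean within `tol` rows, a hypothetical `size_of` list maps each row to a fragment length in bp, and the pair score is simply the sum of the two signal scores, so a strong partner can lift a weak signal over the confirmation threshold. None of these names or formulas come from GelBuddy.

```python
def find_peaks(profile):
    """Return (row, score) for every strict local maximum; score is peak height."""
    return [(i, profile[i]) for i in range(1, len(profile) - 1)
            if profile[i - 1] < profile[i] > profile[i + 1]]


def filter_signals(peaks, other_peaks, marker_rows, threshold, tol=2):
    """Drop weak peaks, peaks shared with the other channel, and lane-marker peaks."""
    kept = []
    for i, score in peaks:
        if score < threshold:
            continue                                      # below detection threshold
        if any(abs(i - j) <= tol for j, _ in other_peaks):
            continue                                      # same location in both channels
        if any(abs(i - m) <= tol for m in marker_rows):
            continue                                      # expected lane-marker position
        kept.append((i, score))
    return kept


def pair_signals(sig700, sig800, size_of, full_length, size_tol, confirm):
    """Mark 700/800nm signal pairs whose sizes sum to about the full product length."""
    marked = []
    for i, s1 in sig700:
        for j, s2 in sig800:
            if abs(size_of[i] + size_of[j] - full_length) <= size_tol:
                pair_score = s1 + s2        # hypothetical scoring rule
                if pair_score > confirm:
                    marked.append((i, j, pair_score))
    return marked
```

In this toy scheme, a weak 800nm signal (score 2) would never pass a confirmation threshold of 6 alone, but paired with a strong 700nm partner (score 5) it is marked.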