GelBuddy Data Format Documentation
This document describes the data format that GelBuddy uses to
save and reload gel markups and to post markup data to a server.
If the files produced by GelBuddy differ from the description in this
document, the format implemented by GelBuddy is correct and this document
is in error.
Introduction
The data saved by GelBuddy to disk and posted by GelBuddy to a web server are identical. As a result, some of the data
in the file are redundant:
some portions of the file (such as the signal pair list and the signal
pair group list) are generated when the
file is saved or posted but are not parsed when the file is reloaded by GelBuddy.
Conversely, when SQUINT receives data from GelBuddy, it parses
only the data it requires. SQUINT stores the
complete unparsed data in its database for future data mining and debugging.
This file format can also be used to transfer data from other applications
to GelBuddy -- for example, an external lane-calling package
could generate a partial markup file containing only lane tracking information,
which the user would then load into GelBuddy.
The GelBuddy file format is based on XML.
The hierarchy of data generated by GelBuddy is guaranteed to follow the
description provided by this document. However, the presence or absence of
tags and the order of tags (except in ordered lists) should not be assumed
and may differ between various releases of GelBuddy.
All GelBuddy .XML files are "well-formed" XML documents, in the sense that they contain properly
nested tags and may be parsed by a generic XML parser. GelBuddy .XML files are not "valid" XML
documents in the sense of having been validated according to a predefined DTD or XML Schema.
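Because the presence and order of tags are not guaranteed, a reader is safer looking tags up by name than by position. The following is a minimal sketch using Python's standard xml.etree.ElementTree parser. The sample markup and the helper name child_text are illustrative assumptions, not GelBuddy output or GelBuddy code.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed markup; real files contain many more tags.
SAMPLE = """<?xml version="1.0" ?>
<squintml>
  <createdby>ABC</createdby>
  <generatedby>GelBuddy</generatedby>
</squintml>"""

def child_text(parent, tag, default=None):
    """Return the stripped text of the first element matching tag
    (a tag name or a simple path), or default if it is absent or empty."""
    node = parent.find(tag)
    if node is None or node.text is None:
        return default
    return node.text.strip()

root = ET.fromstring(SAMPLE)
# Lookup by name works regardless of the order in which tags were written.
created_by = child_text(root, "createdby")
# A missing tag yields the default instead of raising an error.
gel_name = child_text(root, "gelinformation/identity/name", default="")
```

Supplying a default for every lookup keeps the reader working across releases that add or omit tags.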
Primitives
All elements of a GelBuddy .XML file are of the form
element = <tagname> elements </tagname>
where elements is either a sequence of elements or one of the following primitives:
- string
A string not containing the characters '<' or '>'.
- boolean
The string true or the string false
- integer
An ASCII-formatted integer value.
- real
An ASCII-formatted decimal value. Scientific notation is not permitted.
- signalid , signalid
The indices of two signals, corresponding to complementary 700-labeled and 800-labeled
cleavage fragments. Each signalid is either a positive integer or the string literal null.
- <comment> string </comment>
A comment tag may appear in any sequence of elements.
element ... element denotes an ordered list of elements.
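The primitives above could be decoded along the following lines. This is a sketch only: the function names are hypothetical, and accepting a leading sign or surrounding whitespace is an assumption that the format description does not confirm.

```python
import re

# Plain decimal only: optional sign, digits, optional fraction. No exponent,
# because scientific notation is not permitted for the real primitive.
_REAL = re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)")

def parse_boolean(text):
    """Accept exactly the strings true and false (whitespace ignored)."""
    value = text.strip()
    if value not in ("true", "false"):
        raise ValueError(f"not a boolean: {text!r}")
    return value == "true"

def parse_real(text):
    """Parse a decimal value, rejecting scientific notation such as 1e3."""
    value = text.strip()
    if not _REAL.fullmatch(value):
        raise ValueError(f"not a plain decimal: {text!r}")
    return float(value)

def parse_signalid_pair(text):
    """Parse 'signalid , signalid' into a 2-tuple of int, or None for null."""
    parts = [p.strip() for p in text.split(",")]
    if len(parts) != 2:
        raise ValueError(f"expected two signal ids: {text!r}")
    return tuple(None if p == "null" else int(p) for p in parts)
```

For example, parse_signalid_pair("12 , null") yields (12, None), which represents a 700nm channel signal with no 800nm partner.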
High Level File Structure
All GelBuddy XML files are of the form
<?xml version="1.0" ?>
<squintml> generatedby, createdby, gelinformation </squintml>
where
- <generatedby> string </generatedby>
Denotes the program (specifically, the version of GelBuddy) that saved the most recent revision of this markup.
- <createdby> string </createdby>
Denotes the person (usually the squinter's initials) who generated the markup, as entered in the Gel Information dialog.
- <gelinformation> identity, calibrationpointlist, options, groupmode, productlength, signallist, signalpairlist, signalpairgrouplist, laneinfolist </gelinformation>
Contains gel markup information.
Gel Identity Information
- <identity> name, sourcelist </identity>
Contains run name (as entered in the Gel Information dialog) and references to the image files used in generating the markup.
- <sourcelist> source, source </sourcelist>
The list of image files used in the markup. Contains two source entries -- one for the 700nm channel image, one for the 800nm channel image.
- <source> name, path </source>
A reference to an image file. name is platform-independent and contains only the file name.
path is platform-dependent and contains the full path to the image file.
- <name> string </name>
A run name or image file name, depending on context.
- <path> string </path>
A platform-dependent path name
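As an illustration of the identity block, the sketch below reads the run name and the two image file names. The fragment, file names, and paths are made up. Using name rather than path reflects the fact that the stored path may not exist on the machine reading the file.

```python
import xml.etree.ElementTree as ET

# Hypothetical identity fragment; the run name and paths are invented.
IDENTITY = """<identity>
  <name>run42</name>
  <sourcelist>
    <source><name>run42_700.tif</name><path>/data/gels/run42_700.tif</path></source>
    <source><name>run42_800.tif</name><path>/data/gels/run42_800.tif</path></source>
  </sourcelist>
</identity>"""

identity = ET.fromstring(IDENTITY)
# The direct <name> child of <identity> is the run name.
run_name = identity.findtext("name").strip()
# The <name> child of each <source> is the platform-independent file name.
source_names = [s.findtext("name").strip() for s in identity.iter("source")]
```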
Gel Calibration Information
- <calibrationpointlist> calibrationpoint ... calibrationpoint, lowerlimitpos, upperlimitpos, desmilepoint ... desmilepoint </calibrationpointlist>
A list of calibration points, containing:
- <calibrationpoint> mw, pos, y </calibrationpoint>
The molecular weight, relative mobility, and y-coordinate of a calibration point. GelBuddy always creates
markups with two calibration points
(typically one for the 200bp marker and one for the 700bp marker).
The usage of the mw, pos, and y tags is described in the Signals and Signal Grouping Information section below.
- <lowerlimitpos> real </lowerlimitpos>
The position of the lower limit (100%) marker.
- <upperlimitpos> real </upperlimitpos>
The position of the upper limit (0%) marker.
- <desmilepoint> y </desmilepoint>
Specifies the y coordinate of a de-smiling point. (These are present only in the markups of
the AFLP images discussed in the NAR paper. Released versions of GelBuddy allow the user to manipulate
the position of these de-smiling points but do not provide the ability to create new markups with
arbitrarily located de-smiling points. For TILLING image markups, de-smiling curves are constructed
only at the 200bp marker, 700bp marker, and full length product.)
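A calibration point list might be read as sketched below. All numeric values in the fragment are invented for illustration, and treating y as an integer follows the definition of the y tag in the signal section.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment; every number here is made up.
CALIBRATION = """<calibrationpointlist>
  <calibrationpoint><mw>200</mw><pos>100.0</pos><y>1450</y></calibrationpoint>
  <calibrationpoint><mw>700</mw><pos>35.5</pos><y>520</y></calibrationpoint>
  <lowerlimitpos>100.0</lowerlimitpos>
  <upperlimitpos>0.0</upperlimitpos>
</calibrationpointlist>"""

node = ET.fromstring(CALIBRATION)
# Each calibration point becomes a (mw, pos, y) tuple.
points = [
    (float(p.findtext("mw")), float(p.findtext("pos")), int(p.findtext("y")))
    for p in node.findall("calibrationpoint")
]
# De-smiling points are optional, so an empty list is a normal result.
desmile_ys = [int(d.findtext("y")) for d in node.findall("desmilepoint")]
```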
Option Settings
- <options>calibrationoptions, analysisoptions, channeloptions, channeloptions, showpoolboundaries</options>
Specifies calibration and image display options. The two channeloptions elements describe
settings for the 700nm channel and 800nm channel images, respectively.
- <calibrationoptions>
desmileprops, channel, calibratefulllengthmarker, calibrateuppermarker, calibratelowermarker,
forcezeropercentmw, fixlimits </calibrationoptions>
- <desmileprops> leftlaneistemplate, topbandlimit, bottombandlimit, window </desmileprops>
Specifies parameters to the calibration algorithm. See the NAR paper for an explanation of how GelBuddy
uses these parameters.
- <leftlaneistemplate> boolean </leftlaneistemplate>
Allows the use of the left lane as a template for all de-smiling offsets.
This option is always false for TILLING and Eco-Tilling markups.
- <topbandlimit> integer </topbandlimit>
The maximum offset, in pixels per lane, for the full-length product and upper marker (700bp) de-smiling curves.
- <bottombandlimit> integer </bottombandlimit>
The maximum offset, in pixels per lane, for the lower marker (200bp) de-smiling limit. This is set to a lower
value than topbandlimit to provide greater immunity to spurious 'junk' at the bottom of the gel.
- <window> integer </window>
The log-base-2 of the size of the window used to calculate de-smiling lines, in pixels.
- <channel> integer </channel>
Specifies which image is to be used for construction of de-smiling lines; 0 denotes the 700 channel and 1 denotes the 800 channel.
- <calibratefulllengthmarker> boolean </calibratefulllengthmarker>
Specifies whether image data is to be used to calibrate the full length (0%) marker.
If false, the position of the full-length marker in each lane is extrapolated from the
positions of the 200bp and 700bp markers.
- <calibrateuppermarker> boolean </calibrateuppermarker>
Specifies whether image data is to be used to calibrate the upper (700bp) calibration marker.
If false, the position of the 700bp marker in each lane is extrapolated from the
positions of the 200bp and 0% markers.
NOTE: If calibratefulllengthmarker and calibrateuppermarker are both
false, GelBuddy will assume that the position of both markers is the same for every lane of
the gel.
- <calibratelowermarker> boolean </calibratelowermarker>
Specifies whether image data is to be used to calibrate the lower (200bp) calibration marker.
If false, the position of the 200bp marker is assumed to be the same for every lane of the gel.
- <forcezeropercentmw> boolean </forcezeropercentmw>
Specifies whether the 0% marker is forced to the size of the full-length product, as entered
in the Gel Information dialog. If false, the size of the 0% marker is inferred by its
position relative to the 200bp and 700bp markers.
- <fixlimits> boolean </fixlimits>
Corresponds to the Reposition 0%/100% Markers When Adjusting Upper/Lower Standards option setting.
- <analysisoptions> detectionmode, detectionthreshold, backgroundpercentile, confirmationmode,
confirmationthreshold, ignorelanemarkers, ignorecoincidentsignals, onesignalperpool </analysisoptions>
Specifies the options for automatic signal analysis.
- <detectionmode> integer </detectionmode>
The background-subtraction method.
- <detectionthreshold> integer </detectionthreshold>
The signal-detection threshold.
- <confirmationmode> integer </confirmationmode>
The signal-confirmation method.
- <confirmationthreshold> integer </confirmationthreshold>
The signal-confirmation threshold.
- <ignorelanemarkers> boolean </ignorelanemarkers>
If true, GelBuddy will ignore signals at 200bp in lanes 3-5, 11-13, 19-21, etc.
- <ignorecoincidentsignals> boolean </ignorecoincidentsignals>
If true, GelBuddy will ignore signals that occur at the same locations in both channels.
- <onesignalperpool> boolean </onesignalperpool>
If true, GelBuddy detects only the strongest-scoring cleavage fragment in a pool set. Not completely
implemented, and not exposed in the GelBuddy user interface.
- <backgroundpercentile> integer </backgroundpercentile>
The percentile rank used for background pattern construction.
- <excludedregionlist> excludedregion ... excludedregion </excludedregionlist>
A list of regions to be excluded from automatic analysis. This is intended as a performance evaluation aid and
is not exposed in the GelBuddy user interface.
- <excludedregion> min, max </excludedregion>
- <min> real </min>
- <max> real </max>
The extent of a single excluded region, in base pairs.
- <channeloptions> blacklevel, gammalevel, whitelevel </channeloptions>
Determines how each channel of the image should be displayed.
- <blacklevel> real </blacklevel>
A value between 0.0 and 1.0. Default is 0.0. Larger values darken the image.
- <gammalevel> real </gammalevel>
A value between 0.0 and 1.0. Default is 0.5 (corresponding to an exponent of 1 in the image display transfer function).
Smaller values darken the image, larger values brighten the image.
- <whitelevel> real </whitelevel>
A value between 0.0 and 1.0. Default is 1.0. Smaller values lighten the image.
- <showpoolboundaries> boolean </showpoolboundaries>
Corresponds to the Show Pool Boundaries option setting.
- <productlength> real </productlength>
The size of the full PCR product, in base pairs, entered in the Gel Information dialog.
- <groupmode> string </groupmode>
The grouping mode, as entered in the Gel Information dialog. The grouping mode may be either
all, specifying that signals from any two lanes may be grouped, or group16, specifying that
only signals from the same row/column set (i.e. set of 16 consecutive lanes) may be grouped.
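Two of the option values lend themselves to a short worked example: window, which stores a base-2 logarithm, and groupmode, which restricts grouping. The sketch below uses hypothetical function names. It assumes that the sets of 16 consecutive lanes begin at lane 1 (lanes 1-16, 17-32, and so on), which the description above does not state explicitly.

```python
def window_size_pixels(window):
    """<window> stores log2 of the de-smiling window size, so size = 2**window."""
    return 1 << window

def may_group(lane_a, lane_b, groupmode):
    """Return True if signals in the two lanes may share a group."""
    if groupmode == "all":
        return True
    if groupmode == "group16":
        # Assumption: sets are lanes 1-16, 17-32, ... (lane numbers are 1-based).
        return (lane_a - 1) // 16 == (lane_b - 1) // 16
    raise ValueError(f"unknown groupmode: {groupmode!r}")
```

Under these assumptions a window value of 5 denotes a 32-pixel window, and lanes 16 and 17 fall in different sets when groupmode is group16.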
Signals And Signal Grouping Information
- <signallist>signal ... signal </signallist>
A list of all signals (bands) in the gel markup.
- <signal> id, groupid, channel, lane, mw, pos, x, y </signal>
A signal (band). Each signal entry contains the following information:
- <id> integer </id>
The serial number of a signal.
- <groupid> integer </groupid>
The group number of a signal. Each set of co-migrating bands in each channel is given a unique group number.
Grouping information is generated automatically by GelBuddy, but may be changed by the user (in Group Edit mode),
and may be constrained by the pool size and signal grouping settings in the Gel Information dialog.
- <channel> integer </channel>
Channel number; 0 denotes the 700 channel and 1 denotes the 800 channel.
- <lane> integer </lane>
Lane number. Lane 1 denotes the leftmost lane.
- <mw> real </mw>
Predicted fragment length in base pairs.
- <pos> real </pos>
Relative mobility, in percentage migration between the 0% marker and the 100% marker. GelBuddy does not actually store the relative mobility
of each signal -- instead, relative mobility is calculated from fragment length (in bp) and the reported
calibration information and in some cases will not represent the true relative mobility of the fragment.
This allows GelBuddy to use more accurate calibration formulas yet retain backward compatibility with SQUINT.
For most applications it is recommended that <mw> be used instead.
- <x> integer </x>
The x-coordinate of the signal, in source image pixels.
- <y> integer </y>
The y-coordinate of the signal, in source image pixels. 0 represents the top of the image as displayed
by GelBuddy, and not necessarily the top of the image file itself. GelBuddy flips raw 16-bit TIFF images but
does not flip compressed 8-bit JPEG images.
- <detectionscore> integer </detectionscore>
S_signal (for automatically detected signals only)
- <confirmationscore> integer </confirmationscore>
S_pair (for automatically detected signals only)
- <verified> boolean </verified>
Set to true if the user has marked the signal as "verified". (See the "GelBuddy Automatic Gel Analysis"
document for details.)
- <signalpairlist> signalpair ... signalpair </signalpairlist>
A list of signal pairs. Automatically generated by GelBuddy.
- <signalpair> signalid, signalid </signalpair>
A signal pair, representing corresponding 700nm channel and 800nm channel signals.
Each signal pair corresponds to a line in the
squint (not eco-squint) form. The first signalid is the index of a 700nm channel signal and the second signalid
is the index of an 800nm channel signal. Either signalid may be set to the text value null,
indicating that a 700nm channel signal has been entered without a corresponding 800nm channel signal, or vice versa.
- <signalpairgrouplist> signalpairgroup ... signalpairgroup </signalpairgrouplist>
A list of signal pair groups. Automatically generated by GelBuddy.
- <signalpairgroup> signalpair ... signalpair </signalpairgroup>
A signal pair group. Each signal pair group corresponds to a line in the eco-squint form.
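The sketch below shows one way a consumer might resolve signal pairs back to fragment lengths. It assumes that each signalid refers to a signal's id value, which the description above does not state explicitly. The fragment is invented and trimmed to the tags the example needs.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment; ids, lanes, and fragment lengths are made up.
MARKUP = """<gelinformation>
  <signallist>
    <signal><id>1</id><groupid>1</groupid><channel>0</channel><lane>4</lane><mw>312.5</mw></signal>
    <signal><id>2</id><groupid>2</groupid><channel>1</channel><lane>4</lane><mw>688.0</mw></signal>
    <signal><id>3</id><groupid>3</groupid><channel>0</channel><lane>9</lane><mw>450.0</mw></signal>
  </signallist>
  <signalpairlist>
    <signalpair>1,2</signalpair>
    <signalpair>3,null</signalpair>
  </signalpairlist>
</gelinformation>"""

gel = ET.fromstring(MARKUP)
# Index signals by <id> (assumed to be what signalid refers to).
signals = {int(s.findtext("id")): s for s in gel.iter("signal")}

def fragment_lengths(pair_text):
    """Map 'id700,id800' to a tuple of <mw> values, with None for null entries."""
    ids = [part.strip() for part in pair_text.split(",")]
    return tuple(
        None if i == "null" else float(signals[int(i)].findtext("mw")) for i in ids
    )

# Read only the top-level pair list, not the pairs nested in signalpairgroup.
pairs = [fragment_lengths(p.text) for p in gel.findall("signalpairlist/signalpair")]
```

The example reads mw rather than pos, following the recommendation above that mw be preferred for most applications.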
Lane Track Information
- <laneinfolist> laneinfo ... laneinfo </laneinfolist>
A list of lane tracks and other per-lane information. One <laneinfo> tag is stored per lane.
- <laneinfo> pointcount, firstpoint, lastpoint, pointlist, flags </laneinfo>
A single lane track and other per-lane information.
- <pointcount> integer </pointcount>
Each lane track contains up to pointcount control points, equally spaced.
NOTE: GelBuddy will behave erratically if pointcount is not equal for all lanes in a markup.
- <firstpoint> integer </firstpoint>
The index of the first control point.
- <lastpoint> integer </lastpoint>
The index of the last control point.
- <pointlist> x ... x </pointlist>
A list of lastpoint-firstpoint+1 control points.
- <x> integer </x>
The x-coordinate of the control point.
- <failed> boolean </failed>
Set to true if the lane has been marked as 'failed' by the user.
- <edited> boolean </edited>
Set to true if the lane or any of its control points have been displaced by the user.
- <inserted> boolean </inserted>
Set to true if this lane was manually inserted using the Insert Lanes dialog.
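An external lane-calling package that wants to hand lane tracks to GelBuddy could emit a partial markup along the following lines. This is a sketch under stated assumptions: the function name, the generatedby text, and the choice of which tags to omit are all hypothetical, and it has not been checked against any GelBuddy release.

```python
import xml.etree.ElementTree as ET

def build_lane_markup(lane_tracks, pointcount, generated_by="lane-caller (hypothetical)"):
    """Build a partial markup holding only lane tracks.

    lane_tracks: one (firstpoint, [x, x, ...]) tuple per lane, leftmost lane first.
    """
    root = ET.Element("squintml")
    ET.SubElement(root, "generatedby").text = generated_by
    lanes = ET.SubElement(ET.SubElement(root, "gelinformation"), "laneinfolist")
    for firstpoint, xs in lane_tracks:
        if len(xs) > pointcount:
            raise ValueError("a lane track holds at most pointcount control points")
        lane = ET.SubElement(lanes, "laneinfo")
        # The same pointcount is written for every lane, as the NOTE above requires.
        ET.SubElement(lane, "pointcount").text = str(pointcount)
        ET.SubElement(lane, "firstpoint").text = str(firstpoint)
        # pointlist holds lastpoint - firstpoint + 1 entries.
        ET.SubElement(lane, "lastpoint").text = str(firstpoint + len(xs) - 1)
        pointlist = ET.SubElement(lane, "pointlist")
        for x in xs:
            ET.SubElement(pointlist, "x").text = str(x)
    return '<?xml version="1.0" ?>\n' + ET.tostring(root, encoding="unicode")

xml_text = build_lane_markup([(0, [10, 11, 12]), (0, [40, 41, 42])], pointcount=3)
```

Deriving lastpoint from the length of the point list keeps the three values consistent by construction.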
Client/Server Communication
GelBuddy uses an HTTP POST request to send the following data to a URL specified by the user:
name=username&pw=password&data=XML data
The server should respond with a plaintext message indicating success or failure. GelBuddy will present
this message to the user but will not parse it.
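For testing a server without GelBuddy, a request body with the same three fields can be assembled as sketched below. The function name and sample values are hypothetical. Percent-encoding the fields in the usual form-encoded manner is an assumption about what GelBuddy sends.

```python
from urllib.parse import urlencode, parse_qs

def build_post_body(username, password, xml_data):
    """Encode the three form fields described above: name, pw, and data."""
    fields = {"name": username, "pw": password, "data": xml_data}
    # urlencode percent-encodes characters such as '<', '>' and '&'.
    return urlencode(fields).encode("ascii")

body = build_post_body("squinter", "secret", "<squintml></squintml>")

# A server-side test harness can recover the fields with parse_qs.
decoded = parse_qs(body.decode("ascii"))
```

On the server side, decoded["data"][0] holds the markup, which can then be stored unparsed or handed to an XML parser.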