Genomic Messaging System (GMS)
The main thrust of the genomics side of our research is currently the Genomic Messaging System (GMS), a system for representing, transmitting, and storing patient genomic information. The work focuses on helping to build a unified clinical and genomic record and on exploring the standards that such a record requires.
The core function of the GMS software is to prepare genomic information, compress and encrypt it, transmit or store it, and then decrypt and decompress it on receipt or on recovery from storage. This core function, however, is only the underlying data-representation layer of a larger system that has the potential to cover many features of clinical bioinformatics.
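The order of these steps matters. Compression works by exploiting structure in the data, and good ciphertext looks random and barely compresses, so compression has to come before encryption and the receiver must undo the steps in reverse. The Python sketch below illustrates only that ordering and is not GMS code. The functions `pack_message` and `unpack_message` are hypothetical, `zlib` stands in for the GMSL compression scheme, and the SHA-256 counter-mode keystream is a toy cipher with no authentication that should not be used for real patient data.

```python
import hashlib
import os
import zlib

NONCE_SIZE = 16  # bytes of random nonce prepended to each message


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream (illustration only): SHA-256(key || nonce || counter) blocks."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def _xor(data: bytes, stream: bytes) -> bytes:
    return bytes(d ^ s for d, s in zip(data, stream))


def pack_message(payload: bytes, key: bytes) -> bytes:
    """Sender (hypothetical): compress first, then encrypt the compressed bytes."""
    compressed = zlib.compress(payload, 9)
    nonce = os.urandom(NONCE_SIZE)  # fresh nonce so a key does not reuse a keystream
    return nonce + _xor(compressed, _keystream(key, nonce, len(compressed)))


def unpack_message(message: bytes, key: bytes) -> bytes:
    """Receiver (hypothetical): reverse the order, decrypt first, then decompress."""
    nonce, ciphertext = message[:NONCE_SIZE], message[NONCE_SIZE:]
    compressed = _xor(ciphertext, _keystream(key, nonce, len(ciphertext)))
    return zlib.decompress(compressed)


record = b"ACGT" * 1000  # placeholder payload, not real patient data
key = b"shared-secret-key"
message = pack_message(record, key)
assert unpack_message(message, key) == record
print(f"{len(record)} bytes in, {len(message)} bytes on the wire")
```

Swapping the two sender steps would still round-trip correctly, but the message would be roughly as large as the input, because the compressor would be handed bytes that look random.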
The information transmitted or stored is a stream of commands in a language called the Genomic Messaging System Language (GMSL). The language is highly condensed using principles from Shannon's information theory. Each command and data element is represented by an 8-bit byte, including the bytes that represent the DNA bases themselves, at several optional levels of compression, down to four bases (two bits each) per byte. GMSL handles typing of the transmitted data, embedding of applets, control of the transmission process, and error checking. It also adds features such as password levels at different points in the stream and for new files, giving layered levels of access. Importantly, the language provides basic support for annotation of the DNA by the clinical genomicist. Translation of the DNA sequence into protein in all six reading frames is built into the GMS program and is performed on receipt.
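Packing four bases into a byte amounts to giving each base a 2-bit code. The Python sketch below assumes the mapping A=00, C=01, G=10, T=11, with the first base in the high-order bits. The actual GMSL byte values are not given here, so this mapping and the names `pack_bases`, `unpack_bases`, and `six_frame_translation` are illustrative assumptions. The sketch handles only the four unambiguous bases (no N or other IUPAC codes), and because a final partial byte is zero-padded, the receiver must also know the sequence length. The receive-side step then translates the recovered sequence in all six reading frames (three offsets on each strand) using the standard genetic code.

```python
# Hypothetical 2-bit codes; the real GMSL byte values are not given in the text.
BASE_TO_BITS = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BITS_TO_BASE = "ACGT"


def pack_bases(seq: str) -> bytes:
    """Pack four bases per byte, first base in the high-order bits.

    A short final chunk is left-aligned and zero-padded, so the receiver
    must also be told the sequence length.
    """
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | BASE_TO_BITS[base]
        out.append(byte << 2 * (4 - len(chunk)))
    return bytes(out)


def unpack_bases(data: bytes, length: int) -> str:
    """Inverse of pack_bases; drops any padding beyond `length` bases."""
    bases = [BITS_TO_BASE[(byte >> shift) & 0b11]
             for byte in data for shift in (6, 4, 2, 0)]
    return "".join(bases[:length])


# Standard genetic code, with codons enumerated in TCAG order.
_ORDER = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: _AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(_ORDER)
    for j, b in enumerate(_ORDER)
    for k, c in enumerate(_ORDER)
}


def reverse_complement(seq: str) -> str:
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq: str) -> str:
    """Translate whole codons starting at position 0; '*' marks a stop codon."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def six_frame_translation(seq: str) -> dict[str, str]:
    """Frames +1..+3 read the given strand; -1..-3 read its reverse complement."""
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


gene = "ATGGCCATTGTAATGGGCCGCTGA"
received = unpack_bases(pack_bases(gene), len(gene))
assert received == gene
for frame, protein in six_frame_translation(received).items():
    print(frame, protein)  # frame +1 reads MAIVMGR*
```

The 24-base example packs into 6 bytes, a quarter of the 24 bytes needed at one character per base.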
The functionality is greatly extended by plug-in packages, or "cartridges," at both the input and output ends of the messaging pipeline. These enable conversion between GMSL and other XML representations, including the Clinical Document Architecture (CDA). Cartridges also provide miniature "expert systems" that add automatic annotations at both the DNA and protein sequence levels and merge them with any annotations the user has added interactively, as well as specialized display and interaction tools.
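One way to picture the cartridge architecture is as a registry of plug-ins, each taking a sequence and returning annotations, whose combined output is merged with the user's own annotations. In the Python sketch below everything is hypothetical, since the actual GMS cartridge interface is not described in this section: the `Annotation` record, the `register` decorator, the two toy rule-based "expert systems" (which simply flag ATG and stop codons on the given strand, ignoring reading frame), and the merge policy, under which a user annotation replaces any automatic annotation covering exactly the same span.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Annotation:
    start: int   # 0-based start position in the DNA sequence
    end: int     # exclusive end position
    label: str
    source: str  # "user" or the name of the cartridge that produced it


# Hypothetical cartridge signature: a function from a DNA sequence to annotations.
Cartridge = Callable[[str], list[Annotation]]
CARTRIDGES: dict[str, Cartridge] = {}


def register(name: str) -> Callable[[Cartridge], Cartridge]:
    """Decorator that adds an annotation cartridge to the registry under `name`."""
    def decorator(func: Cartridge) -> Cartridge:
        CARTRIDGES[name] = func
        return func
    return decorator


def _find_codons(seq: str, codons: set[str], label: str, source: str) -> list[Annotation]:
    # Scan every position on the given strand; reading frame is deliberately ignored.
    return [Annotation(i, i + 3, label, source)
            for i in range(len(seq) - 2) if seq[i:i + 3] in codons]


@register("start-codons")
def start_codon_rule(seq: str) -> list[Annotation]:
    """Toy rule: flag every ATG as a candidate start codon."""
    return _find_codons(seq, {"ATG"}, "candidate start codon", "start-codons")


@register("stop-codons")
def stop_codon_rule(seq: str) -> list[Annotation]:
    """Toy rule: flag every TAA, TAG, or TGA as a candidate stop codon."""
    return _find_codons(seq, {"TAA", "TAG", "TGA"}, "candidate stop codon", "stop-codons")


def annotate(seq: str, user_annotations: list[Annotation]) -> list[Annotation]:
    """Run every registered cartridge, then merge in the user's annotations.

    A user annotation replaces any automatic one with exactly the same span.
    """
    user_spans = {(a.start, a.end) for a in user_annotations}
    automatic = [a for rule in CARTRIDGES.values() for a in rule(seq)
                 if (a.start, a.end) not in user_spans]
    return sorted(automatic + list(user_annotations),
                  key=lambda a: (a.start, a.end, a.source))


seq = "ATGAAATGA"
user = [Annotation(0, 3, "confirmed start codon", "user")]
for a in annotate(seq, user):
    print(a.start, a.end, a.label, a.source)
```

A real annotation cartridge would need frame awareness and richer conflict rules; the exact-span override is simply the smallest policy that keeps the clinician's interactive annotations authoritative.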
The most sophisticated cartridge is a basic automated protein-modeling suite that models the patient's polymorphic protein from the transmitted gene. The first genuine transmission of patient DNA data, followed by modeling of the resulting protein, took place in 2001 at IBM Research in Yorktown, a small but interesting "first" in clinical bioinformatics. Ultimately, this process will be extended to the direct screening (and even design) of drugs against such proteins in virtual reality, in support of fast-response personalized medicine.