IBM Systems Journal 
Volume 43, Number 3, 2004
Unstructured Information Management

Text analytics for life science using the Unstructured Information Management Architecture - References

by R. Mack, S. Mukherjea, A. Soffer, N. Uramoto, E. Brown, A. Coden, J. Cooper, A. Inokuchi, B. Iyer, Y. Mass, H. Matsuzawa, and L. V. Subramaniam

Cited references and notes

  1. Pharma2010: The Threshold of Innovation, IBM Corporation (2002), http://www-1.ibm.com/industries/healthcare/doc/content/resource/thought/390030105.html.
  2. W. C. Swope, “Deep Computing for the Life Sciences,” IBM Systems Journal 40, No. 2, 248–262 (2001), http://www.research.ibm.com/journal/sj/402/swope.html.
  3. J. Augen, “The Evolving Role of Information Technology in the Drug Discovery Process,” Drug Discovery Today 7, No. 5, 315–323 (March 2002).
  4. D. Ferrucci and A. Lally, “Building an Example Application with the Unstructured Information Management Architecture,” IBM Systems Journal 43, No. 3, 455–475 (2004, this issue).
  5. D. Ferrucci and A. Lally, “Accelerating Corporate Research in the Development, Application and Deployment of Human Language Technologies,” Proceedings of the Workshop on Software Engineering and Architecture of Language Technology Systems (SEALTS), Edmonton, Canada (May 31, 2003).
  6. A. D. Marwick, “Knowledge Management Technology,” IBM Systems Journal 40, No. 4, 814–830 (2001).
  7. R. Mack, R. Byrd, and Y. Ravin, “Knowledge Portals and the Emerging Digital Knowledge Workplace,” IBM Systems Journal 40, No. 4, 925–955 (2001).
  8. D. Jurafsky and J. Martin, Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, Prentice-Hall, Upper Saddle River, NJ (2000).
  9. R. Baeza-Yates and B. Ribeiro-Neto (Editors), Modern Information Retrieval, ACM Press, New York (1999).
  10. C. Blaschke, L. Hirschman, and A. Valencia, “Information Extraction in Molecular Biology,” Briefings in Bioinformatics 3, No. 2, 154–165 (June 2002).
  11. J. Tsujii, “Tutorial on Information Extraction in Biological Sciences,” Proceedings of the Pacific Symposium on Biocomputing (2001), http://www-tsujii.is.s.u-tokyo.ac.jp/~genia/tutorial/index.html#psb2001.
  12. L. Hunter, On-line Course Notes on Bioinformatics, Center for Computational Pharmacology, University of Colorado Health Sciences Center, http://compbio.uchsc.edu/hunter/.
  13. T. Nasukawa and T. Nagano, “Text Analysis and Knowledge Mining System,” IBM Systems Journal 40, No. 4, 967–984 (2001).
  14. W. Cody, J. Kreulen, V. Krishna, and W. S. Spangler, “The Integration of Business Intelligence and Knowledge Management,” IBM Systems Journal 41, No. 4, 697–713 (2002).
  15. J. Chang, S. Raychaudhuri, and R. Altman, “Including Biological Literature Improves Homology Search,” Proceedings of the Pacific Symposium on Biocomputing, World Scientific, River Edge, NJ, 374–383 (2001).
  16. S.-K. Ng and M. Wong, “Toward Routine Automatic Pathway Discovery from Online Scientific Text Abstracts,” Genome Informatics 10, 104–112 (1999).
  17. T. Ono, H. Hishigaki, A. Tanigami, and T. Takagi, “Automated Extraction of Information on Protein-protein Interactions from the Biological Literature,” Bioinformatics 17, 155–161 (2001).
  18. T.-K. Jenssen, A. Lægreid, J. Komorowski, and E. Hovig, “A Literature Network of Human Genes for High Throughput Analysis of Gene Expression,” Nature Genetics 28, 21–28 (May 2001).
  19. T. Huynh, I. Rigoutsos, L. Parida, D. Platt, and T. Shibuya, “The Web Server of IBM's Bioinformatics and Pattern Discovery Group,” Nucleic Acids Research 31, No. 13, 3645–3650 (2003).
  20. IBM Research, Computational Biology Center, http://www.research.ibm.com/bioinformatics/.
  21. Swiss-Prot is available (along with other related databases) at http://www.expasy.org/sprot/.
  22. Life Sciences Framework, IBM Corporation (October, 2003), http://www-3.ibm.com/solutions/lifesciences/solutions/framework.html.
  23. DB2 Information Integrator for Content, IBM Corporation, http://www-3.ibm.com/software/data/eip/.
  24. IBM LanguageWare Linguistic Engine, http://www-306.ibm.com/software/globalization/topics/languageware/index.jsp.
  25. The LanguageWare linguistic engine provides dictionaries for several languages. A legacy version, LanguageWare v2, was embedded in the pre-UIMA Textract tool, available in the IM4T toolkit.
  26. Intelligent Miner for Text Product Review, IBM Corporation (February, 1997), http://www-3.ibm.com/software/data/iminer/fortext/.
  27. M. Neff, R. Byrd, and B. Boguraev, “The Talent System: TEXTRACT Architecture and Data Model,” Proceedings of the Workshop on Software Engineering and Architecture of Language Technology Systems (SEALTS), Edmonton, Canada (May 31, 2003).
  28. The default training corpus used to train the POS tagger is available from the Linguistic Data Consortium (http://www.ldc.upenn.edu/) and is based on non-biomedical text content and style.
  29. “IBM, SAS join to help automakers to comply with TREAD act,” Computerworld (April 3, 2003).
  30. M. McCord, “Slot Grammar: A System for Simpler Construction of Practical Natural Language Grammars,” in Natural Language and Logic: International Scientific Symposium, Lecture Notes in Computer Science, R. Studer, Editor, Springer Verlag, Berlin (1990), pp. 118–145.
  31. Unified Medical Language System (UMLS), http://www.nlm.nih.gov/research/umls.
  32. LocusLink, http://www.ncbi.nlm.nih.gov/LocusLink/.
  33. National Library of Medicine, http://www.nlm.nih.gov/libserv.html.
  34. L. Subramaniam, S. Mukherjea, P. Kankar, B. Srivastava, V. Batra, P. Kamesam, and R. Kothari, “Information Extraction from Biomedical Literature: Methodology, Evaluation and an Application,” Proceedings of the 2003 ACM CIKM Conference, New Orleans, LA (2003).
  35. International Union of Pure and Applied Chemistry, http://www.chem.qmw.ac.uk/iupac/.
  36. WEKA Machine Learning Project, http://www.cs.waikato.ac.nz/~ml/.
  37. Personal communication, David E. Johnson, IBM Thomas J. Watson Research Center.
  38. T. Zhang, F. Damerau, and D. E. Johnson, “Text Chunking Based on a Generalization of Winnow,” Journal of Machine Learning Research 2, 615–637 (2002).
  39. T. Zhang, “Regularized Winnow Methods,” Advances in Neural Information Processing Systems 13, 703–709 (2001).
  40. J. Cooper and R. Byrd, “Lexical Navigation: Visually Prompted Query Expansion and Refinement,” Proceedings of ACM DIGLIB97, Philadelphia, PA (1997), pp. 237–246.
  41. M. Ashburner, C. A. Ball, J. A. Blake, D. Botstein, H. Butler, J. M. Cherry, A. P. Davis, K. Dolinski, S. S. Dwight, J. T. Eppig, M. A. Harris, D. P. Hill, L. Issel-Tarver, A. Kasarskis, S. Lewis, J. C. Matese, J. E. Richardson, M. Ringwald, G. M. Rubin, and G. Sherlock, “Gene Ontology: Tool for the Unification of Biology,” Nature Genetics 25, 25–29 (2000).
  42. E. Brown, A. Dolbey, and L. Hunter, “IBM Research and the University of Colorado TREC 2003 Genomics Track,” Proceedings of the 12th NIST TREC Conference (November, 2003), http://trec.nist.gov/pubs/trec12/papers/ibm-brown.genomics.pdf.
  43. A. Inokuchi and H. Kashima, “Mining Significant Pairs of Patterns from Graph Structures with Class Labels,” Proceedings of the Third IEEE International Conference on Data Mining, Melbourne, FL (November 2003).
  44. The GENIA project Web site provides links to several resources of Junichi Tsujii, http://www-tsujii.is.s.u-tokyo.ac.jp/GENIA/.
  45. These three measures are standard measures of accuracy in finding specified categories of objects in search or information extraction. “Recall” is the percentage of correct identifications relative to all possible correct answers in the collection. “Precision” is the percentage of correct identifications relative to all answers provided by the system. The F-value is derived from precision and recall and tries to balance the trade-off between them (see Reference 8, page 578).
  46. Y. Park and R. Byrd, “Hybrid Text Mining for Finding Terms and Their Abbreviations,” Proceedings of the 2001 Conference on Empirical Methods in Natural Language Processing (EMNLP-2001), Carnegie Mellon University, Pittsburgh, PA (June 2001), http://www.cs.cornell.edu/home/llee/emnlp.html.
  47. Munich Information Center for Protein Sequences, http://www.biochem.mpg.de/home_en.html.
  48. J. Cooper, “An Evaluation of Unnamed Relations in Discovery of Protein-Protein Interactions,” Presented at ACM SIGIR 2003, Workshop on Text Analysis and Search for Bioinformatics, Toronto, Canada (2003).
  49. C. Blaschke, M. Andrade, C. Ouzounis, and A. Valencia, “Automatic Extraction of Biological Information from Scientific Text: Protein-Protein Interactions,” Proceedings of the International Conference on Intelligent Systems in Molecular Biology (1999), pp. 60–67.
  50. T. Rindflesch, L. Tanabe, J. Weinstein, and L. Hunter, “EDGAR: Extraction of Drugs, Genes and Relations from the Biomedical Literature,” Proceedings of the 5th Pacific Symposium on Biocomputing (2000), pp. 538–549.
  51. J. Sowa, Conceptual Structures: Information Processing in Mind and Machine, Addison-Wesley, Reading, MA (1984).
  52. M. Gruninger and J. Lee, “Ontology Applications and Design, Introduction,” Communications of the ACM 45, No. 2, 39–41 (February 2002).
  53. B. Humphreys, D. Lindberg, H. Schoolman, and G. Barnett, “The Unified Medical Language System: An Informatics Research Collaboration,” Journal of the American Medical Informatics Association 5, 1–11 (1998).
  54. D. Carmel, E. Amitay, M. Herscovici, Y. Maarek, Y. Petruschka, and A. Soffer, “Juru at TREC 10 - Experiments with Index Pruning,” Proceedings of the 10th Text Retrieval Conference (2001).
  55. D. Carmel, Y. Maarek, M. Mandelbrod, Y. Mass, and A. Soffer, “Searching XML Documents via XML Fragments,” Proceedings of the Twenty-Sixth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2003), Toronto, Canada (August 2003), pp. 151–158.
  56. The Juru text search engine is described in http://www.haifa.il.ibm.com/km/ir/juru/ and the version of Juru for indexing and searching XML fields is described in “Juru XML - an XML retrieval system at INEX'02,” http://qmir.dcs.qmul.ac.uk/inex/Slides/YosiMass_etal_talk.pdf.
  57. R. Y. Ando, B. Boguraev, R. Byrd, and M. Neff, “Multidocument Summarization by Visualizing Topic Content,” Proceedings of ANLP/NAACL Workshop on Automatic Summarization, Seattle, WA (2000), pp. 79–88.
  58. S. Vaithyanathan and B. Dom, “Model-Based Hierarchical Clustering,” Proceedings of UAI-2000, Stanford (2000), http://citeseer.nj.nec.com/vaithyanathan00modelbased.html.
  59. K. Krishna and R. Krishnapuram, “A Clustering Algorithm for Asymmetrically Related Data with its Applications to Text Mining,” Proceedings of the 2001 ACM CIKM Conference, Atlanta, GA (2001), pp. 571–573.
  60. N. Uramoto, H. Matsuzawa, T. Nagano, A. Murakami, H. Takeuchi, and K. Takeda, “A Text-Mining System for Knowledge Discovery from Biomedical Documents,” IBM Systems Journal 43, No. 3, 516–533 (2004, this issue).
  61. S. Grell, “Information Retrieval in Life Sciences: How to Discover Relations between Concepts,” Unpublished master's thesis, University of Heidelberg (December, 2001).
  62. R. Mack and M. Hehenberger, “Text-Based Knowledge Discovery: Search and Mining of Life-Sciences Documents,” Drug Discovery Today 7, 89–98 (2002).
  63. D. Swanson, “Implicit Text Linkages between MEDLINE Records: Using Arrowsmith as an Aid to Scientific Discovery,” Library Trends 48, No. 1, 48–59 (1999).
  64. M. Weeber, H. Klein, L. T. W. de Jong-van den Berg, and R. Vos, “Using Concepts in Literature-Based Discovery: Simulating Swanson's Raynaud-Fish Oil and Migraine-Magnesium Discoveries,” Journal of the American Society for Information Science and Technology 52, No. 7, 548–557 (2001).
  65. P. Kankar, S. Adak, A. Sarkar, K. Murari, and G. Sharma, “MedMeSH Summarizer: Text Mining for Gene Clusters,” Proceedings of the Second SIAM International Conference on Data Mining, Arlington, VA (2002), pp. 548–565.
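
The recall, precision, and F-value measures described in note 45 can be illustrated with a minimal sketch. The function name `precision_recall_f`, the `beta` weighting parameter, and the sample gene names are assumptions made for illustration only; they do not come from the article or from any system it describes. The F-value here uses the common weighted harmonic mean form, which reduces to the balanced F1 score when `beta` is 1.

```python
def precision_recall_f(predicted, gold, beta=1.0):
    """Return (precision, recall, F) for a set of system answers.

    predicted: items the system returned (hypothetical example data)
    gold:      items that are actually correct in the collection
    beta:      weight of recall relative to precision (1.0 = balanced)
    """
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted & gold)  # answers that are both returned and correct

    # Precision: correct answers relative to all answers the system returned.
    precision = correct / len(predicted) if predicted else 0.0
    # Recall: correct answers relative to all possible correct answers.
    recall = correct / len(gold) if gold else 0.0

    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0

    # Weighted harmonic mean of precision and recall.
    b2 = beta * beta
    f_value = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f_value


if __name__ == "__main__":
    system_answers = {"BRCA1", "TP53", "EGFR", "MYC"}  # illustrative only
    correct_answers = {"BRCA1", "TP53", "KRAS"}        # illustrative only
    p, r, f = precision_recall_f(system_answers, correct_answers)
    print(f"precision={p:.2f} recall={r:.2f} F={f:.2f}")
```

In this example two of the four returned names are correct, so precision is 2/4, and two of the three correct names were found, so recall is 2/3. The balanced F-value sits between the two and is pulled toward the lower one.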