Activities
Research
Current
- BioTeKS (Biology Text Knowledge Server) - I'm currently working on
text analysis and text mining techniques for the Biomedical and Life
Sciences domains. The goal is to support biomedical research
(e.g., genomics, proteomics, and drug discovery) with techniques that
facilitate access to the literature in the domain. One of our first
demonstrations of this technology is document clustering, which has been
integrated into the BioDictionary built by the Bioinformatics
& Pattern Discovery Group at IBM Research. To see the
clustering tool in action, go to the BioDictionary
demo, select one of the sample sequences, press "compute",
follow the "Relevant PUBMED References" link, then click one of
the "Cluster abstracts" links.
- Question Answering - I worked with John Prager, Anni Coden, and
Dragomir Radev to develop Predictive Annotation for Question
Answering (see our SIGIR
paper). I also developed the core search engine used by our
Question Answering system, and I continue to collaborate with John's group in
this area.
Recent
- MeetingMiner - The primary goal of this project is to make meetings
more productive by automatically analyzing the meeting discussion and
providing the meeting participants with relevant information from related
knowledge sources. In particular, the system uses the IBM
ViaVoice® speech recognition system to convert the speech to a text
transcript, applies a series of analyzers to identify various information
elements in the transcript (e.g., questions, named entities, and topics),
automatically creates queries, submits those queries to other relevant
knowledge repositories, and integrates the results of those searches back
into the meeting. For more information, see the recent Think
Research article on MeetingMiner.
- Text Categorization - I have also worked on automatic text
categorization, with particular emphasis on Web environments. The goal
is to automatically classify documents into a Yahoo!®-style
taxonomy. We developed a system, called WebCat, that uses a k-Nearest
Neighbor classifier at its core and exploits many of the attributes specific
to Web documents. For more information on this project, see the
article in IBM's Think
Research magazine.
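
To illustrate the k-Nearest Neighbor idea mentioned in the Text Categorization item, here is a minimal sketch in Python. It is not WebCat's code: the function names, the bag-of-words vectors, the cosine similarity measure, the similarity-weighted vote, and the tiny training set are all assumptions made for illustration, and the sketch ignores the Web-specific document attributes that WebCat exploits.

```python
import math
from collections import Counter


def vectorize(text):
    """Turn text into a bag-of-words term-frequency vector (assumed representation)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_classify(text, training, k=3):
    """Label text by a similarity-weighted vote of its k most similar training documents."""
    query = vectorize(text)
    scored = sorted(
        ((cosine(query, vectorize(doc)), label) for doc, label in training),
        reverse=True,
    )
    votes = Counter()
    for similarity, label in scored[:k]:
        votes[label] += similarity
    return votes.most_common(1)[0][0] if votes else None


# Hypothetical training documents with taxonomy labels (illustrative only).
TRAINING = [
    ("stock market shares trading investors", "Business"),
    ("bank interest rates loans investors", "Business"),
    ("football league match goal score", "Sports"),
    ("tennis match player tournament score", "Sports"),
]

if __name__ == "__main__":
    print(knn_classify("investors watch the stock market", TRAINING))
```

A real categorizer would typically add term weighting (such as TF-IDF), a much larger labeled collection, and a way to assign a document to more than one category in the taxonomy.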
Workshops
Professional
- I am currently the Information Director for ACM
SIGIR (the Special Interest Group on Information Retrieval).
If you're interested in information retrieval, web search, clustering,
categorization, question answering, text analysis, or anything related,
check out the SIGIR website and consider
joining the SIG!
- I typically serve on the program committees for the ACM SIGIR
and CIKM conferences, and review articles
for journals in the field of information retrieval, such as ACM TOIS.
- National Engineers Week - I'm involved in the IBM Research program to
visit local middle schools and educate students about engineers, including
what they do, who they are, and the advantages of pursuing a career in
engineering.