Computational Lexicology, Computational Linguistics, Natural Language
Processing, Information Retrieval, Text Analysis and Mining, Syntax and
Semantics, Parsing, Finite State Automata, Knowledge Management, Visualization
General Research Areas
- Text Analysis: recognition and extraction of lexical entities such
as names, terms, dates, money, abbreviations, vocabulary generation.
- Document Representation: includes topic shift detection, single
document summaries (derived via different strategies), multiple document
summarization, multi-threaded summaries, document structure analysis.
- Cross-document Coreference: inter- and intra-document aggregation
of disambiguated entities in text and corpora.
- Natural Language Processing: shallow parsing, part-of-speech tagging,
anaphora resolution, coherence determination.
- Search Enhancements: query refinement, document expansion.
- Speech Mining: cleanup and text analysis of ASR transcripts.
- Navigation and Visualization: active markup, lexical navigation, dynamic
- Knowledge Management: the extraction, representation, and application
of domain-specific vocabularies and relationships from text.
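Abbreviation processing, listed above under Text Analysis, can be illustrated with a small sketch. This is not TALENT's implementation. It is a simplified Python version of the Schwartz and Hearst (2003) abbreviation-definition heuristic, and it assumes abbreviations are introduced in the form "long form (SHORT)". The characters of each parenthesized short form are matched right-to-left against the preceding words, and the first character must begin a word. The names `SHORT_FORM`, `best_long_form` and `find_abbreviations` are hypothetical.

```python
import re
from typing import Dict, Optional

# Hypothetical pattern: a parenthesized short form that starts with a letter,
# is 2-10 characters long and contains no spaces.
SHORT_FORM = re.compile(r"\(([A-Za-z][\w-]{1,9})\)")


def best_long_form(short: str, candidate: str) -> Optional[str]:
    """Match short-form characters right-to-left inside candidate.

    The first short-form character must also begin a word. Returns the
    matched suffix of candidate, or None if the characters cannot be found.
    """
    s = len(short) - 1
    l = len(candidate) - 1
    while s >= 0:
        ch = short[s].lower()
        if not ch.isalnum():  # skip punctuation such as '-' in the short form
            s -= 1
            continue
        # Move left until the character matches; for the first short-form
        # character, also require that the match starts a word.
        while l >= 0 and (
            candidate[l].lower() != ch
            or (s == 0 and l > 0 and candidate[l - 1].isalnum())
        ):
            l -= 1
        if l < 0:
            return None
        l -= 1
        s -= 1
    # Cut the candidate at the last space at or before the first matched character.
    start = candidate.rfind(" ", 0, l + 1) + 1
    return candidate[start:]


def find_abbreviations(text: str) -> Dict[str, str]:
    """Map each 'long form (SHORT)' short form in text to its long form."""
    pairs: Dict[str, str] = {}
    for m in SHORT_FORM.finditer(text):
        short = m.group(1)
        words = text[: m.start()].split()
        # Look back at most min(|SF| + 5, 2 * |SF|) words, as in Schwartz and Hearst.
        window = min(len(short) + 5, 2 * len(short))
        long_form = best_long_form(short, " ".join(words[-window:]))
        if long_form and len(long_form) > len(short):
            pairs[short] = long_form
    return pairs
```

For example, on "The Text Analysis and Language Engineering Technology (TALENT) toolkit uses Finite State Transducers (FSTs).", this sketch maps `TALENT` to "Text Analysis and Language Engineering Technology" and `FSTs` to "Finite State Transducers", while a parenthetical such as "(Monday)" with no matching preceding words is ignored. A fuller system would also handle the reverse pattern "SHORT (long form)".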
Research Projects and Technologies
- TALENT (Text Analysis and Language Engineering Technology): These
tools analyze text and extract meaningful lexical information from it,
including concept names and relationships among them. Toolkit highlights include:
  - morpho-lexical analysis
  - named entity extraction
  - technical terminology identification
  - abbreviation processing
  - part-of-speech tagging
  - lexical relations highlighting
  - topic segmentation
  - cross-document coreference
- Summarizer: A system that uses advanced text analysis techniques
to produce indicative summaries of documents. Summaries are intended
for use in document management and retrieval systems, where their role
is to provide users with concise, readable representations of documents'
contents. Summarizer uses a "summarization by sentence extraction"
approach to generate a document summary. Its algorithm comprises a set
of strategies for ranking the sentences in a document by salience, and
for extracting the most salient sentences to produce a summary of any
- Finite State Language Modeling (INTEX): INTEX is a Natural Language
Processing development environment based on Finite State Transducers
(FSTs). It parses texts of several million words, and includes large-coverage
dictionaries and grammars. INTEX builds lemmatized concordances
and indices of texts with respect to all types of Finite State patterns;
it can be used as a lexical parser that produces input for a syntactic parser,
but can also be viewed as an information retrieval system.
- Lexical Navigation: A technology that shows how extracted information
can be browsed and navigated to enhance information discovery and search.
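The "summarization by sentence extraction" approach described for Summarizer can be sketched as follows. The ranking strategies Summarizer actually uses are not specified here, so this sketch substitutes one simple, commonly used salience measure: the average within-document frequency of a sentence's content words. The stopword list, the regular-expression sentence splitter, and the names `summarize`, `content_words` and `k` are assumptions for illustration only.

```python
import re
from collections import Counter
from typing import List

# Hypothetical, deliberately tiny stopword list; a real system would use a fuller one.
STOPWORDS = {
    "a", "an", "and", "are", "as", "be", "by", "for", "from", "in",
    "is", "it", "its", "of", "on", "or", "that", "the", "this", "to", "with",
}


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]


def content_words(sentence: str) -> List[str]:
    """Lowercased alphanumeric tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z0-9]+", sentence.lower()) if w not in STOPWORDS]


def summarize(text: str, k: int = 3) -> List[str]:
    """Return the k most salient sentences, in their original document order."""
    sentences = split_sentences(text)
    # Document-level frequency of each content word.
    freq = Counter(w for s in sentences for w in content_words(s))

    def salience(i: int) -> float:
        # Average frequency of the sentence's content words; 0 if it has none.
        words = content_words(sentences[i])
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    # Rank sentence indices by salience; the stable sort keeps earlier sentences first on ties.
    ranked = sorted(range(len(sentences)), key=salience, reverse=True)[:k]
    # Emit the selected sentences in document order for readability.
    return [sentences[i] for i in sorted(ranked)]
```

Returning the selected sentences in document order, rather than in rank order, keeps the extract readable, and `k` controls how long the summary is. Swapping in a different `salience` function (position in the document, overlap with the title, and so on) is how further ranking strategies could be combined in this sketch.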