Improving Health Care Through Advanced Information Retrieval and Data Mining Techniques

_Principal Investigators:_
Jimmy Huang, Associate Professor
Aijun An, Associate Professor
Xiaohui Yu, Assistant Professor
School of Information Technology and Department of Computer Science, York University

Motivation and Objectives: As more and more medical data (such as patient records and the latest medical articles) become available on-line, the need for advanced information retrieval and knowledge discovery systems increases. Our objective in this research is to develop advanced tools that help doctors and patients search for the most relevant information, discover interesting patterns in medical databases, and obtain recommendations for cost-effective treatments. Our long-term goal is to lower the cost and improve the quality of health care. To this end, we will also investigate how virtualization can be leveraged to achieve these goals.

Research Plan: In this research, we will address these problems by adding advanced information retrieval and data mining components into health care systems. We will work on the following two themes.

1. Discovering Patterns in Medical Databases

We will construct a probabilistic framework for clustering medical event sequences (and patients). Each data point in a sequence can be a multivariate clinical feature vector, and the dimensionality may vary from point to point. In particular, we will develop a framework based on mixtures of Markov models for clustering such heterogeneous clinical data. Model-based clustering places cluster analysis on a principled statistical footing. It rests on probability models in which objects are assumed to follow a finite mixture of probability distributions, with each component distribution representing a distinct cluster. This form of clustering has two advantages over traditional non-probabilistic approaches. First, the model-based approach provides a general description of each identified cluster, so patients are naturally profiled as the sequences of clinical feature vectors are clustered, while specific information about individual patients is preserved. Second, statistical models quantify the uncertainty in cluster membership, which matters most for objects that lie close to cluster boundaries. In practice, a patient may suffer from several syndromes at the same time, so it is critically important to be able to assign the patient to more than one group and to prescribe appropriate therapies accordingly.
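
As a rough illustration of the soft assignment described above, the following sketch fits a mixture of first-order Markov chains with EM. It is a simplification under stated assumptions, not the proposed framework: each event is a single discrete code rather than a multivariate clinical feature vector, and the function names, the smoothing constant `alpha`, and the toy sequences are all hypothetical.

```python
import math
import random


def log_seq_prob(seq, init, trans):
    """Log-probability of a state sequence under one first-order Markov chain."""
    lp = math.log(init[seq[0]])
    for a, b in zip(seq, seq[1:]):
        lp += math.log(trans[a][b])
    return lp


def responsibilities(seqs, weights, inits, transs):
    """E-step: posterior probability of every cluster for every sequence."""
    resp = []
    for seq in seqs:
        logs = [math.log(w) + log_seq_prob(seq, i, t)
                for w, i, t in zip(weights, inits, transs)]
        top = max(logs)  # log-sum-exp trick for numerical stability
        norm = top + math.log(sum(math.exp(lg - top) for lg in logs))
        resp.append([math.exp(lg - norm) for lg in logs])
    return resp


def normalize(row):
    total = sum(row)
    return [x / total for x in row]


def fit_markov_mixture(seqs, n_states, n_clusters, n_iter=50, seed=0, alpha=0.1):
    """EM for a mixture of Markov chains over discrete event codes.

    alpha is an assumed additive-smoothing constant that keeps every
    probability strictly positive; it is not taken from the source text.
    """
    rng = random.Random(seed)
    weights = [1.0 / n_clusters] * n_clusters
    inits = [normalize([rng.random() + 0.5 for _ in range(n_states)])
             for _ in range(n_clusters)]
    transs = [[normalize([rng.random() + 0.5 for _ in range(n_states)])
               for _ in range(n_states)]
              for _ in range(n_clusters)]
    for _ in range(n_iter):
        resp = responsibilities(seqs, weights, inits, transs)
        # M-step: re-estimate each chain from responsibility-weighted counts.
        for k in range(n_clusters):
            init_counts = [alpha] * n_states
            trans_counts = [[alpha] * n_states for _ in range(n_states)]
            for seq, r in zip(seqs, resp):
                init_counts[seq[0]] += r[k]
                for a, b in zip(seq, seq[1:]):
                    trans_counts[a][b] += r[k]
            inits[k] = normalize(init_counts)
            transs[k] = [normalize(row) for row in trans_counts]
        weights = normalize([sum(r[k] for r in resp) + alpha
                             for k in range(n_clusters)])
    resp = responsibilities(seqs, weights, inits, transs)
    return weights, inits, transs, resp


if __name__ == "__main__":
    # Toy event sequences: 0 and 1 stand for two hypothetical event codes.
    toy = [[0, 0, 0, 1, 0, 0], [0, 0, 0, 0, 1, 0],
           [0, 1, 0, 1, 0, 1], [1, 0, 1, 0, 1, 0]]
    _, _, _, memberships = fit_markov_mixture(toy, n_states=2, n_clusters=2)
    for seq, row in zip(toy, memberships):
        print(seq, [round(p, 3) for p in row])
```

Each row of the returned memberships is a probability distribution over clusters rather than a single label, which is what would let a patient near a cluster boundary be associated with more than one group.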

2. Personalized Retrieval and Recommendations

No two people are exactly alike, especially when it comes to medicine. Medical conditions can vary by age, race, gender, and a variety of other factors. On-line search engines, however, are notorious for overloading users with irrelevant information. One reason is that a search engine's retrieval decision is based primarily on the current query and the document collection: the results for a given query are identical regardless of the user or the context in which the request is made. Yet it is unlikely that different users are so similar in their interests or conditions that one standardized way of retrieving information fits all needs. We propose a novel approach that integrates a user's search context into medical information retrieval (IR) systems. Our method will combine an extended Probabilistic Latent Semantic Analysis (PLSA) model with the probabilistic retrieval model to improve retrieval effectiveness in the medical domain. In particular, we will develop an extended PLSA model that learns task-oriented contexts from users' search histories recorded in search logs, and we will incorporate these contexts into the Okapi retrieval system.
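
To make the idea of context-aware ranking concrete, here is a minimal sketch that scores documents with the Okapi BM25 formula and then blends in terms from a user context. The text above does not specify how the learned contexts enter the Okapi system, so the blending rule below is purely an assumption. The names `contextual_query` and `mix`, the per-context term probabilities (standing in for what a PLSA-style model might produce), and the toy documents are hypothetical. The IDF uses the common non-negative variant with `+ 1` inside the logarithm.

```python
import math
from collections import Counter


def bm25_scores(weighted_query, docs, k1=1.2, b=0.75):
    """Score tokenized documents against a {term: weight} query using BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    doc_freq = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        length_norm = k1 * (1 - b + b * len(d) / avg_len)
        score = 0.0
        for term, weight in weighted_query.items():
            if tf[term] == 0:
                continue
            n_t = doc_freq[term]
            idf = math.log((n_docs - n_t + 0.5) / (n_t + 0.5) + 1.0)
            score += weight * idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


def contextual_query(query_terms, context_term_probs, mix=0.5):
    """Assumed blending rule: query terms get weight 1, and each context
    term adds mix * P(term | context) on top of that."""
    weighted = {t: 1.0 for t in query_terms}
    for term, p in context_term_probs.items():
        weighted[term] = weighted.get(term, 0.0) + mix * p
    return weighted


if __name__ == "__main__":
    docs = [
        ["chest", "pain", "adult", "cardiac", "risk"],
        ["chest", "pain", "child", "asthma", "cough"],
        ["knee", "pain", "sports", "injury", "rest"],
    ]
    query = ["chest", "pain"]
    # Hypothetical context inferred from a user's earlier pediatric searches.
    pediatric = {"child": 0.6, "asthma": 0.4}
    print("no context:  ", bm25_scores(contextual_query(query, {}), docs))
    print("with context:", bm25_scores(contextual_query(query, pediatric), docs))
```

Without a context, the first two toy documents are indistinguishable for the query; with the pediatric context, the second one moves ahead, which is the kind of user-dependent ranking the approach aims for.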

Potential Benefit to Ontario: This research will benefit Ontario by improving the quality of health care, reducing inefficiency and cost, and improving the health of Ontarians.

_Other Projects_

  • Fine-grained Resource Management and Problem Detection in Dynamic Content Servers
  • Semantically Configurable Modelling Notations and Tools
  • Model Management for Continuously Evolving Systems
  • Modeling, Evolution, and Automated Configuration of Software Services
  • Elaborating and Evaluating UML's 3-Layer Semantics Architecture
  • Intelligent Autonomic Computing for Computational Biology
  • Performance Management of IT Infrastructure
  • Performance-Model-Assisted Creation and Management of Service Systems
  -- JimmyHuang - 02 Dec 2007

    Topic revision: r2 - 2007-12-10 - MarinLitoiu
     