DBRank 2008

09:00 - 10:00 Keynote 1

"Graph Mining: Laws, Generators and Tools"

Prof. Christos Faloutsos
Carnegie Mellon University

Abstract: What do graphs look like? How do they evolve over time? How can we generate realistic-looking graphs? We review some static and temporal 'laws', and we describe the "Kronecker" graph generator, which naturally matches all of the known properties of real graphs. Moreover, we present tools for discovering anomalies and patterns in two types of graphs, static and time-evolving. For the former, we present 'CenterPiece' subgraphs (CePS), which take q query nodes (e.g., suspicious people) and find the node that is best connected to all q of them (e.g., the mastermind of a criminal group). We also show how to compute CenterPiece subgraphs efficiently. For time-evolving graphs, we present tensor-based methods and apply them to real data, like the DBLP author-paper dataset, where they are able to find natural research communities and track their evolution. Finally, we also briefly mention some results on influence and virus propagation on real graphs.

Speaker's Bio: Christos Faloutsos is a Professor at Carnegie Mellon University. He has received the Presidential Young Investigator Award from the National Science Foundation (1989), the Research Contributions Award at ICDM 2006, nine "best paper" awards, and several teaching awards. He has served as a member of the executive committee of SIGKDD; he has published over 160 refereed articles, 11 book chapters and one monograph. He holds five patents and has given over 20 tutorials and over 10 invited distinguished lectures. His research interests include data mining for streams and networks, fractals, indexing for multimedia and bio-informatics data, and database performance.

10:00 - 10:30 Coffee break  
10:30 - 12:30 Session 1

(30 Minutes)
A General and Efficient Algorithm for “top” Queries
Goetz Graefe (Hewlett-Packard Laboratories)

(30 Minutes)
On the Semantics and Evaluation of Top-k Queries in Probabilistic Databases
Xi Zhang and Jan Chomicki (University at Buffalo)

(30 Minutes)
A Rank-Rewrite Framework for Summarizing XML Documents
Maya Ramanath and Kondreddi Sarath Kumar (Max-Planck Institute for Informatics)

(15 Minutes)
OutRank: Ranking Outliers in High Dimensional Data
Emmanuel Muller, Ira Assent, Uwe Steinhausen and Thomas Seidl (RWTH Aachen University, Germany)

(15 Minutes)
Automated Generation of Object Summaries from Relational Databases: A Novel Keyword Searching Paradigm
Georgios Fakas (Manchester Metropolitan University, UK)

12:30 - 14:00 Break for lunch
14:00 - 15:00 Keynote 2

"Information Extraction Over Text Databases: What's Ranking Got To Do With It?"

Prof. Luis Gravano
Columbia University

Abstract: Information extraction systems identify and extract intrinsically structured data that is embedded in natural-language text documents, hence enabling expressive SQL-style querying of this data. Unfortunately, information extraction is a time-consuming process, often involving complex text analysis, so exhaustively processing all documents in a large text database --or on the Web-- could be prohibitively expensive. As an alternative to "complete" query results, variants of the top-k query model are then appropriate, for efficiency reasons. Beyond efficiency, query result quality is also important: information extraction is error-prone and not all extracted data is equally likely to be correct, so result quality is an important consideration during top-k query processing. In this talk, I will discuss recent work on cost-based optimization of top-k query variants in this information extraction scenario, where modeling query result quality --in addition to execution efficiency-- is a distinctive and important challenge.

Speaker's Bio: Luis Gravano has been on the faculty of the Computer Science Department, Columbia University, since September 1997, where he has been an associate professor since July 2002. From January through August 2001, Luis was a Senior Research Scientist at Google (on leave from Columbia University). He received his Ph.D. degree in Computer Science from Stanford University in 1997 and a B.S. degree from the Escuela Superior Latinoamericana de Informática (ESLAI), Argentina, in 1991. Luis is an associate editor of the ACM Transactions on Database Systems and a recipient of a CAREER award from the National Science Foundation.

15:00 - 15:30 Coffee break  
15:30 - 17:30 Session 2

(30 Minutes)
Skyline Ranking for Uncertain Data with Maybe Confidence
Hyountaek Yong, Jin-ha Kim and Seung-won Hwang (POSTECH, Korea)

(30 Minutes)
elGiza, A Research-Pyramid Based Search Tool for Vertical Literature Digital Libraries
Sulieman Bani-Ahmad and Gultekin Ozsoyoglu (Case Western Reserve University)

(15 Minutes)
Ranking Multimedia Databases via Relevance Feedback with History and Foresight Support
Marc Wichterich, Christian Beecks and Thomas Seidl (RWTH Aachen University, Germany)

(15 Minutes)
Weighted Boolean Conditions for Ranking
Matthias Beck and Burkhard Freitag (University of Passau, Germany)

(30 Minutes)
Adapting Ranking Functions to User Preference
Keke Chen, Ya Zhang, Zhaohui Zheng, Hongyuan Zha, and Gordon Sun (Yahoo!)