09:00 - 10:00 | Keynote 1 | "Graph Mining: Laws, Generators and Tools" Abstract: What do graphs look like? How do they evolve over time? How can we generate realistic-looking graphs? We review some static and temporal 'laws', and we describe the "Kronecker" graph generator, which naturally matches all of the known properties of real graphs. Moreover, we present tools for discovering anomalies and patterns in two types of graphs, static and time-evolving. For the former, we present the 'CenterPiece' subgraphs (CePS), which take q query nodes (e.g., suspicious people) and find the node that is best connected to all q of them (e.g., the mastermind of a criminal group). We also show how to compute CenterPiece subgraphs efficiently. For time-evolving graphs, we present tensor-based methods and apply them to real data, such as the DBLP author-paper dataset, where they find natural research communities and track their evolution. Finally, we briefly mention some results on influence and virus propagation on real graphs. Speaker's Bio: Christos Faloutsos is a Professor at Carnegie Mellon University. He has received the Presidential Young Investigator Award from the National Science Foundation (1989), the Research Contributions Award at ICDM 2006, nine "best paper" awards, and several teaching awards. He has served as a member of the executive committee of SIGKDD; he has published over 160 refereed articles, 11 book chapters and one monograph. He holds five patents and has given over 20 tutorials and over 10 invited distinguished lectures. His research interests include data mining for streams and networks, fractals, indexing for multimedia and bioinformatics data, and database performance. |
10:00 - 10:30 | Coffee break | |
10:30 - 12:30 | Session 1 | (30 Minutes) (30 Minutes) (30 Minutes) (15 Minutes) (15 Minutes) |
12:30 - 14:00 | Lunch break | |
14:00 - 15:00 | Keynote 2 | "Information Extraction Over Text Databases: What's Ranking Got To Do With It?" Abstract: Information extraction systems identify and extract intrinsically structured data that is embedded in natural-language text documents, thereby enabling expressive SQL-style querying of this data. Unfortunately, information extraction is a time-consuming process, often involving complex text analysis, so exhaustively processing all documents in a large text database (or on the Web) could be prohibitively expensive. For efficiency, variants of the top-k query model are then an appropriate alternative to "complete" query results. Beyond efficiency, query result quality also matters: information extraction is error-prone, and not all extracted data is equally likely to be correct, so result quality is an important consideration during top-k query processing. In this talk, I will discuss recent work on cost-based optimization of top-k query variants in this information extraction scenario, where modeling query result quality, in addition to execution efficiency, is a distinctive and important challenge. Speaker's Bio: Luis Gravano has been on the faculty of the Computer Science Department at Columbia University since September 1997, and an associate professor there since July 2002. From January through August 2001, Luis was a Senior Research Scientist at Google (on leave from Columbia University). He received his Ph.D. degree in Computer Science from Stanford University in 1997 and a B.S. degree from the Escuela Superior Latinoamericana de Informática (ESLAI), Argentina, in 1991. Luis is an associate editor of the ACM Transactions on Database Systems and a recipient of a CAREER award from the National Science Foundation. |
15:00 - 15:30 | Coffee break | |
15:30 - 17:30 | Session 2 | (30 Minutes) (30 Minutes) (15 Minutes) (15 Minutes) (30 Minutes) |