| 08:30am - 10:00am | Keynote | Speaker: Alon Halevy (Google) Title: Table Search Abstract: The Web contains hundreds of millions of high-quality 
		structured data sets, in HTML tables, HTML lists and databases stored behind 
		forms. Presenting these data sets in response to user queries is important 
		for the quality of Web search. In addition, there are a growing number of 
		contexts in which users would like to search specifically within a collection 
		of structured data. Such search is hard because the typical signals used 
		for text search do not apply as effectively to structured data. I will describe 
		several scenarios in which table search is needed and contrast the treatments 
		we’ve developed and continue to pursue at Google. In particular, I’ll touch 
		on our efforts to surface content from the Deep Web, our work on collecting 
		and answering queries over a collection of 150 million HTML tables, and 
		our work on supporting the creation and sharing of tables with Google Fusion 
		Tables, a new service for collaborating on structured data on the Web
		 Speaker's Bio: Dr. Alon Halevy received his Bachelors 
		degree in Computer Science and Mathematics from the Hebrew University in 
		Jerusalem in 1988, and his Ph.D in Computer Science from Stanford University 
		in 1993. From 1993 to 1997, Dr. Halevy was a principal member of technical 
		staff at AT&T Bell Laboratories, and then at AT&T Laboratories. He joined 
		the faculty of the Computer Science and Engineering Department at the University 
		of Washington in 1998. In 2004, Dr. Halevy founded Transformic Inc., a company 
		that creates search engines for the deep web, content residing in 
		databases behind web forms. Currently, he is at Google Inc. in Mountain 
		View, California. Dr. Halevy was a Sloan Fellow (1999-2000), and received 
		the Presidential Early Career Award for Scientists and Engineers (PECASE) 
		in 2000. He serves on the editorial boards of the VLDB Journal, the Journal 
		of Artificial Intelligence Research (currently, a member of the advisory 
		committee), and ACM Transactions on Internet Technology. He served as the 
		program chair for the ACM SIGMOD 2003 Conference, and has given several 
		keynotes at top conferences.
 | 
	
		| 10:30am - 12:00pm | Session 1 | Subspace Similarity Search Using the Ideas of Ranking and Top-k 
		Retrieval Thomas Bernecker , Tobias Emrich , Franz Graf , Hans-Peter Kriegel 
		, Peer Kröger, Matthias Renz , Erich Schubert , Arthur Zimek (Ludwig-Maximilians-Universität 
		München)
 
 Abstract: There are abundant scenarios for applications of similarity 
		search in databases where the similarity of objects is defined for a subset 
		of attributes, i.e., in a subspace, only. While much research has been done 
		in efficient support of single column similarity queries or of similarity 
		queries in the full space, scarcely any support of similarity search in 
		subspaces has been provided so far. The three existing approaches are variations 
		of the sequential scan. Here, we propose the first index-based solution 
		to subspace similarity search in arbitrary subspaces which is based on the 
		concepts of nearest neighbor ranking and top-k retrieval.
 
 
 Efficient k-Nearest Neighbor Queries with the Signature Quadratic 
		Form DistanceChristian Beecks, Merih Uysal, Thomas Seidl (RWTH Aachen University)
 
 Abstract: A frequently encountered query type in multimedia 
		databases is the k-nearest neighbor query which finds the k-nearest neighbors 
		of a given query. To speed up such queries and to meet the user requirements 
		in low response time, approximation techniques play an important role. In 
		this paper, we present an efficient approximation technique applicable to 
		distance measures defined over flexible feature representations, i.e. feature 
		signatures. We apply our approximation technique to the recently proposed 
		Signature Quadratic Form Distance applicable to feature signatures. We performed 
		our experiments on numerous image databases, gathering k-nearest neighbor 
		query rankings in significantly low computation time with an average speed-up 
		factor of 13.
 
 Top-k pipe joinDavide Martinenghi, Marco Tagliasacchi (Politecnico di Milano)
 
 Abstract: In the context of service composition and orchestration, 
		service invocation is typically scheduled according to execution plans, 
		whose topology establishes whether different services are to be invoked 
		in parallel or in a sequence. In the latter case, we may have a configuration, 
		called pipe join, in which the output of a service is used as input for 
		another service. When the services involved in a pipe join output results 
		sorted by score, the problem arises of efficiently determining the join 
		tuples (aka combinations) with the highest combined scores as computed by 
		some aggregation function. In this paper, we study the top-k pipe join problem, 
		i.e., the problem of joining the results of two services in a pipe join 
		with the goal of finding the top-k combinations while minimizing the access 
		cost.
 | 
	
		| 1:30pm - 2:30pm | Session 2 | On Novelty in Publish/Subscribe DeliveryDimitris Souravlias, Marina Drosou, Kostas Stefanidis, Evaggelia Pitoura 
		(University of Ioannina)
 
 Abstract: In publish/subscribe systems, users express their 
		interests in specific items of information and get notified when relevant 
		data items are produced. Such systems allow users to stay informed without 
		the need of going through huge amounts of data. However, as the volume of 
		data being created increases, some form of ranking of matched events is 
		needed to avoid overwhelming the users. In this work-in-progress paper, 
		we explore novelty as a ranking criterion. An event is considered novel, 
		if it matches a subscription that has rarely been matched in the past.
 
 Ranking for Data RepairsMohamed Yakout, Ahmed Elmagarmid, Jennifer Neville (Purdue University)
 
 Abstract: Improving data quality is a time-consuming, labor-intensive 
		and often domain specific operation. A recent principled approach for repairing 
		dirty database is to use data quality rules in the form of database constraints 
		to identify dirty tuples and then use the rules to derive data repairs. 
		Most of existing data repair approaches focus on providing fully automated 
		solutions, which could be risky to depend upon especially for critical data. 
		To guarantee the optimal quality repairs applied to the database, users 
		should be involved to confirm each repair. This highlights the need for 
		an interactive approach that combines the best of both; automatically generating 
		repairs, while efficiently employing user's efforts to verify the repairs. 
		In such approach, the user will guide an online repairing process to incrementally 
		generate repairs. A key challenge in this approach is the response time 
		within the user's interactive sessions, because the process of generating 
		the repairs is time consuming due to the large search space of possible 
		repairs. To this end, we present in this paper a mechanism to continuously 
		generate repairs only to the current top k important violated data quality 
		rules. Moreover, the repairs are grouped and ranked such that the most beneficial 
		in terms of improving data quality comes first to consult the user for verification 
		and feedback. Our experiments on real-world dataset demonstrate the effectiveness 
		of our ranking mechanism to provide a fast response time for the user while 
		improving the data quality as quickly as possible.
 |