[cosi10] Brian F. Cooper, Adam Silberstein, Erwin Tam, Raghu Ramakrishnan, and Russell Sears. Benchmarking cloud serving systems with YCSB. In Proc. ACM Symp. on Cloud Computing, June 2010. [ bib | .pdf ]
The benchmark defines two so-called tiers: performance and scale-up. The former considers latency and throughput as offered load increases with a fixed amount of resources. The latter looks at traditional scale-up (does performance stay flat as more data, offered load, and resources are added?) and elastic speedup (does performance improve if more resources are added under constant load?). The benchmark is designed to be extensible, but the core workload consists of randomized inserts, updates, reads, and sequential scans of keyed records. The benchmark is implemented as a multi-threaded Java program with an interface layer used to customize interactions with specific data managers. It is not clear whether this is a closed-loop or open-loop client.
[daag10] Sudipto Das, Divyakant Agrawal, and Amr El Abbadi. G-Store: A scalable data store for transactional multi key access in the cloud. In Proc. ACM Symp. on Cloud Computing, June 2010. [ bib | .pdf ]
Argues that many web applications need atomic multi-key access. Allows definition of transient, arbitrary key groups, across which atomic operations are possible. Key groups are implemented by transferring ownership of all keys in a group to a single leader node in the underlying storage system, so that the leader can coordinate atomic operations without the need for a distributed coordination protocol. The leader uses write-ahead logging to support recovery from its own failures. However, it seems that while the leader is down, the group is unavailable.
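A minimal Python sketch of the key-group idea described in this note, not G-Store's actual protocol. The names `Node`, `create_group`, and `dissolve_group` are hypothetical, and failure handling and the ownership-transfer handshake are omitted.

```python
class Node:
    """A storage node that owns some keys and keeps a write-ahead log."""

    def __init__(self, name):
        self.name = name
        self.store = {}  # keys currently owned by this node
        self.wal = []    # write-ahead log of applied update batches

    def atomic_update(self, updates):
        # All keys must be owned locally, so no distributed protocol is needed.
        missing = [k for k in updates if k not in self.store]
        if missing:
            raise KeyError(f"{self.name} does not own {missing}")
        self.wal.append(dict(updates))  # log first, then apply
        self.store.update(updates)


def create_group(leader, others, keys):
    """Transfer ownership of every key in the group to the leader.

    Returns a map from key to its original node so the group can be dissolved.
    """
    home = {}
    for key in keys:
        for node in others:
            if key in node.store:
                leader.store[key] = node.store.pop(key)
                home[key] = node
    return home


def dissolve_group(leader, home):
    """Hand each borrowed key (with its current value) back to its original node."""
    for key, node in home.items():
        node.store[key] = leader.store.pop(key)


n1, n2 = Node("n1"), Node("n2")
n1.store["x"] = 1
n2.store["y"] = 2
home = create_group(n1, [n2], ["x", "y"])
n1.atomic_update({"x": 0, "y": 3})  # both keys change together at the leader
dissolve_group(n1, home)
```

After the group is dissolved, `n2` owns `y` again with the updated value, which illustrates why groups can be transient.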
[kili10] Emre Kiciman, Benjamin Livshits, Madanlal Musuvathi, and Kevin C. Webb. Fluxo: A system for internet service programming by non-expert developers. In Proc. ACM Symp. on Cloud Computing, June 2010. [ bib | .pdf ]
Restricted application programming model supporting common architectural patterns for web services. Dataflow programming model with nodes representing computation and edges representing data flow.
[alco10] Peter Alvaro, Tyson Condie, Neil Conway, Khaled Elmeleegy, Joseph M. Hellerstein, and Russell Sears. BOOM analytics: Exploring data-centric, declarative programming for the cloud. In Proc. EuroSys Conf., April 2010. [ bib | .pdf | .pdf ]
[voch10] Hoang Tam Vo, Chun Chen, and Beng Chin Ooi. Towards elastic transactional cloud storage with range query support. In Proc. Int'l Conf. on Very Large Data Bases, 2010. [ bib | .pdf | .pdf ]
[wuji10] Sai Wu, Dawei Jiang, Beng Chin Ooi, and Kun-Lung Wu. Efficient B+-tree based indexing for cloud data processing. In Proc. Int'l Conf. on Very Large Data Bases, 2010. [ bib | .pdf | .pdf ]
[tiiy09] Omesh Tickoo, Ravi Iyer, Ramesh Illikkal, and Don Newell. Modeling virtual machine performance: Challenges and approaches. In Proc. Workshop on Hot Topics in Measurement and Modeling of Computer Systems, June 2009. [ bib | .pdf | .pdf ]
[kroe09] Kirk L. Kroeker. The evolution of virtualization. Communications of the ACM, 52(3):18-20, March 2009. [ bib ]
Tech-lite article talking about virtualization on hand-held devices, about virtualization for software deployment, and about performance and management.
[arfo09] Michael Armbrust, Armando Fox, Rean Griffith, Anthony D. Joseph, Randy H. Katz, Andrew Konwinski, Gunho Lee, David A. Patterson, Ariel Rabkin, Ion Stoica, and Matei Zaharia. Above the clouds: A Berkeley view of cloud computing. Technical Report UCB/EECS-2009-28, University of California at Berkeley, February 2009. [ bib | .pdf | .pdf ]
[agsi09] Parag Agrawal, Adam Silberstein, Brian F. Cooper, Utkarsh Srivastava, and Raghu Ramakrishnan. Asynchronous view maintenance for VLSD databases. In Proc. ACM SIGMOD Int'l Conference on Management of Data (SIGMOD'09), pages 179-192, 2009. [ bib | DOI | .pdf ]
[krhe09] Tim Kraska, Martin Hentschel, Gustavo Alonso, and Donald Kossmann. Consistency rationing in the cloud: Pay only when it matters. Proc. of the VLDB Endowment, 2(1):253-264, 2009. [ bib | .pdf | .pdf ]
Proposes that data be assigned to one of three consistency levels: A, B, or C. Data assigned to level C have only session consistency and eventual consistency of updates. Data assigned to level A have serializable consistency. Data in the B category have adaptive consistency, switching between session consistency and serializability at runtime.
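A minimal Python sketch of the three-level rationing rule summarized above. The switching condition for category B (an estimated conflict probability compared against a threshold) and the names `choose_protocol`, `conflict_probability`, and `threshold` are illustrative assumptions, not the paper's actual policies.

```python
def choose_protocol(category, conflict_probability=0.0, threshold=0.1):
    """Pick the consistency protocol for a record based on its category."""
    if category == "A":
        return "serializable"  # always pay for strong consistency
    if category == "C":
        return "session"       # session consistency, eventual propagation
    if category == "B":
        # Adaptive: decided at runtime (assumed rule, see lead-in).
        if conflict_probability > threshold:
            return "serializable"
        return "session"
    raise ValueError(f"unknown category: {category!r}")


account_balance = choose_protocol("A")
user_preference = choose_protocol("C")
stock_quiet = choose_protocol("B", conflict_probability=0.01)
stock_contended = choose_protocol("B", conflict_probability=0.5)
```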
[lawh09] Horacio Andrés Lagar-Cavilla, Joseph Andrew Whitney, Adin Matthew Scannell, Philip Patchin, Stephen M. Rumble, Eyal de Lara, Michael Brudno, and Mahadev Satyanarayanan. SnowFlock: Rapid virtual machine cloning for cloud computing. In Proc. ACM European Conference on Computer Systems (EuroSys'09), pages 1-12, 2009. [ bib | DOI | .pdf ]
SnowFlock implements a fork (clone) operation for running VMs. There is no implicit synchronization or communication between parent and clone after the fork; anything required must be coded explicitly. Cloned children live on a virtual network with the parent, and can only communicate within this network. SnowFlock starts clones with little initial state, and additional state is shipped on demand from the parent, which uses copy-on-write to preserve a snapshot of its state as of the time of cloning. Each clone also gets a virtual disk that is a snapshot of the parent's disk as of the time of cloning. This is implemented using copy-on-write at the parent, which serves pages to the clones (via blocktap) as necessary. The disk mechanism is intended for the root device, not for I/O-intensive data devices.
[hude08] Wenjin Hu, Todd Deshane, and Jeanna Matthews. Solaris virtualization options. ;login:, 33(5):7-17, October 2008. [ bib ]
Mostly a how-to guide for system administrators, covering Containers, Solaris xVM, and Solaris xVM VirtualBox.
[chje08] Ronnie Chaiken, Bob Jenkins, Paul Larson, Bill Ramsey, Darren Shakib, Simon Weaver, and Jingren Zhou. Scope: Easy and efficient parallel processing of massive data sets. In Proc. Int'l Conference on Very Large Data Bases (VLDB'08), 2008. [ bib | .pdf ]
[cora08] Brian F. Cooper, Raghu Ramakrishnan, Utkarsh Srivastava, Adam Silberstein, Philip Bohannon, Hans-Arno Jacobsen, Nick Puz, Daniel Weaver, and Ramana Yerneni. PNUTS: Yahoo!'s hosted data serving platform. Proc. of the VLDB Endowment, 1(2):1277-1288, 2008. [ bib | DOI | .pdf ]
[cule08] Brendan Cully, Geoffrey Lefebvre, Dutch T. Meyer, Mike Feeley, Norman C. Hutchinson, and Andrew Warfield. Remus: High availability via asynchronous virtual machine replication. In Proc. USENIX Symposium on Networked Systems Design and Implementation (NSDI), page 161, 2008. [ bib | .pdf | .pdf ]
[degh08] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified data processing on large clusters. Communications of the ACM, 51(1):107-113, 2008. [ bib | DOI ]
[minh08] Umar Farooq Minhas. A performance evaluation of database systems on virtual machines. Technical Report CS-2008-01, David R. Cheriton School of Computer Science, University of Waterloo, January 2008. Masters thesis. [ bib | .pdf | .pdf ]
[olre08] Christopher Olston, Benjamin Reed, Utkarsh Srivastava, Ravi Kumar, and Andrew Tomkins. Pig Latin: A not-so-foreign language for data processing. In Proc. ACM SIGMOD Int'l Conference on Management of Data, pages 1099-1110, 2008. [ bib | .pdf ]
[sico08] Adam Silberstein, Brian F. Cooper, Utkarsh Srivastava, Erik Vee, Ramana Yerneni, and Raghu Ramakrishnan. Efficient bulk insertion into a distributed ordered table. In Proc. ACM Int'l Conference on Management of Data (SIGMOD'08), pages 765-778, 2008. [ bib | http | .pdf ]
[shde07] Piyush Shivam, Azbayar Demberel, Pradeep Gunda, David E. Irwin, Laura E. Grit, Aydan R. Yumerefendi, Shivnath Babu, and Jeffrey S. Chase. Automated and on-demand provisioning of virtual machines for database applications. In Proc. ACM SIGMOD International Conference on Management of Data (SIGMOD'07), pages 1079-1081, June 2007. [ bib | DOI | .pdf ]
Demo paper.
[sopo07] Stephen Soltesz, Herbert Potzl, Marc Fiuczynski, Andy Bavier, and Larry Peterson. Container-based operating system virtualization: A scalable high-performance alternative to hypervisors. In Proc. EuroSys 2007, pages 275-288, March 2007. [ bib | .pdf ]
[deha07] Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, and Avinash Lakshman. Dynamo: Amazon's highly available key-value store. In Proc. ACM Symposium on Operating Systems Principles (SOSP'07), pages 205-220, 2007. [ bib | DOI | .pdf ]
[isbu07] Michael Isard, Mihai Budiu, Yuan Yu, Andrew Birrell, and Dennis Fetterly. Dryad: distributed data-parallel programs from sequential building blocks. In Proc. EuroSys Conference, pages 59-72, 2007. [ bib | .pdf ]
[pazh07] Pradeep Padala, Xiaoyun Zhu, Zhikui Wang, Sharad Singhal, and Kang G. Shin. Performance evaluation of virtualization technologies for server consolidation. Technical Report HPL-2007-59, HP Laboratories Palo Alto, 2007. [ bib | .pdf | .pdf ]
Compares Xen, OpenVZ, and base Linux configurations. Looks at a two-tier (Apache+PHP and MySQL) system under a RUBiS workload. Considers a variety of configurations: both tiers on a single physical node, each tier on a different node, and multiple application stacks with the web tiers on one node and the database tiers on another node. Found higher CPU overhead in the Xen configuration, relative to OpenVZ and base Linux. Found that Xen DomU had a much higher L2 cache miss count than the base Linux system, but it is not clear how much of this is from the kernel in DomU and how much is from the application.
[chde06] Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, and Robert E. Gruber. Bigtable: a distributed storage system for structured data. In Proc. USENIX Symposium on Operating System Design and Implementation (OSDI'06), 2006. [ bib | .pdf ]
The key space is partitioned into ranges called tablets. Bigtable uses multiple independent tablet servers to serve tablets. Tablets are assigned to servers by a Bigtable master node, and each tablet is served by only one server at a time. A tablet server uses a commit log in GFS to commit updates. Recent updates are kept in memory in a memtable. When the memtable fills, it is written to GFS as an immutable SSTable file.
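A minimal Python sketch of the write path described in this note: commit log, memtable, flush to an immutable SSTable. This is not Bigtable's implementation. The class name `TabletServer`, the `memtable_limit` parameter, and the read path (memtable first, then SSTables from newest to oldest) are assumptions added for illustration, and in-memory lists stand in for files in GFS.

```python
class TabletServer:
    def __init__(self, memtable_limit=2):
        self.commit_log = []  # stands in for the commit log kept in GFS
        self.memtable = {}    # recent updates, held in memory
        self.sstables = []    # immutable sorted tuples, oldest first
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        self.commit_log.append((key, value))  # commit to the log first
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Freeze the memtable as an immutable, sorted SSTable and start afresh.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}

    def read(self, key):
        # Assumed read path: the newest data wins.
        if key in self.memtable:
            return self.memtable[key]
        for sstable in reversed(self.sstables):
            for k, v in sstable:
                if k == key:
                    return v
        return None


server = TabletServer(memtable_limit=2)
server.write("a", 1)
server.write("b", 2)  # memtable is full, so it is flushed to an SSTable
server.write("a", 3)  # newer value lives in the fresh memtable
```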
[guch06] D. Gupta, L. Cherkasova, R. Gardner, and A. Vahdat. Enforcing performance isolation across virtual machines in Xen. In Proc. of the ACM/IFIP/USENIX 7th International Middleware Conference, 2006. [ bib | .pdf | .pdf ]
[irch06] David E. Irwin, Jeffrey S. Chase, Laura E. Grit, Aydan R. Yumerefendi, David Becker, and Ken Yocum. Sharing networked resources with brokered leases. In Proc. USENIX Technical Conference, pages 199-212, 2006. [ bib | .pdf | .pdf ]
Resource providers make resources available to brokers, which in turn use them to satisfy requests from clients. Clients get lease tickets from brokers, which know which resources are available from which providers, and which implement policies controlling which clients get which resources. Clients can redeem tickets with resource providers to obtain the lease, which gives the client access to resources for a fixed time window. Shirako is a toolkit to facilitate the construction of clients, brokers, and resource providers.
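A minimal Python sketch of the ticket-then-lease flow described in this note. It does not use Shirako's API; `Ticket`, `Broker`, and `Provider` are hypothetical names, the first-fit allocation policy is an assumption, and time is modeled as plain integers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ticket:
    client: str
    provider: str
    units: int
    start: int  # lease window is [start, end)
    end: int


class Broker:
    def __init__(self, inventory):
        # provider name -> units the provider has made available to this broker
        self.inventory = dict(inventory)

    def request(self, client, units, start, end):
        # Assumed policy: first provider with enough free units wins.
        for provider, free in self.inventory.items():
            if free >= units:
                self.inventory[provider] = free - units
                return Ticket(client, provider, units, start, end)
        return None  # request cannot be satisfied


class Provider:
    def __init__(self, name):
        self.name = name
        self.leases = []

    def redeem(self, ticket):
        if ticket.provider != self.name:
            raise ValueError("ticket was issued for a different provider")
        self.leases.append(ticket)  # the ticket becomes an active lease
        return ticket

    def has_access(self, client, now):
        return any(l.client == client and l.start <= now < l.end
                   for l in self.leases)


broker = Broker({"siteA": 4})
site_a = Provider("siteA")
ticket = broker.request("alice", units=3, start=10, end=20)
site_a.redeem(ticket)
denied = broker.request("bob", units=3, start=10, end=20)
```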
[khbe06] G. Khanna, K. Beaty, G. Kar, and A. Kochut. Application performance management in virtualized server environments. In Proc. IEEE/IFIP Network Operations and Management Symposium, pages 373-381, 2006. [ bib | .pdf ]
[rair06] Lavanya Ramakrishnan, David E. Irwin, Laura E. Grit, Aydan R. Yumerefendi, Adriana Iamnitchi, and Jeffrey S. Chase. Toward a doctrine of containment: Grid hosting with adaptive resource control. In Proc. ACM/IEEE Conference on High Performance Networking and Computing (SC2006), 2006. [ bib | DOI | .pdf ]
[clfr05] Christopher Clark, Keir Fraser, Steven Hand, Jacob Gorm Hansen, Eric Jul, Christian Limpach, Ian Pratt, and Andrew Warfield. Live migration of virtual machines. In Proc. Symposium on Networked Systems Design and Implementation (NSDI 2005), May 2005. [ bib | .pdf | .pdf ]
[fotu05] Ian Foster and Steven Tuecke. Describing the elephant: The different faces of IT as service. Queue, 3(6):26-29, 2005. [ bib | .pdf ]
[pido05] Rob Pike, Sean Dorward, Robert Griesemer, and Sean Quinlan. Interpreting the data: Parallel analysis with Sawzall. Scientific Programming, 13(4):277-298, 2005. [ bib | .pdf ]
[rose05] Mendel Rosenblum. The reincarnation of virtual machines. Queue, 2(5):34-40, 2005. [ bib | .pdf ]
[roga05] Mendel Rosenblum and Tal Garfinkel. Virtual machine monitors: Current technology and future trends. IEEE Computer, 38(5):39-47, 2005. [ bib | .pdf ]
[smna05] James E. Smith and Ravi Nair. The architecture of virtual machines. IEEE Computer, 38(5):32-38, 2005. [ bib | .pdf ]
[waha05] Andrew Warfield, Steven Hand, Keir Fraser, and Tim Deegan. Facilitating the development of soft devices. In Proc. USENIX Annual Technical Conference, pages 379-382, 2005. [ bib | .pdf | .pdf ]
[wimo04] John Wilkes, Jeffrey Mogul, and Jaap Suermondt. Utilification. In Proceedings of the 11th ACM SIGOPS European Workshop, September 2004. [ bib | .pdf | .pdf ]
Discusses the process of preparing software applications and application stacks for execution in a utility computing environment.
[dahe04] Shaul Dar, Gil Hecht, and Eden Shochat. dbswitch: Towards a database utility. In Proc. ACM SIGMOD International Conference on Management of Data (SIGMOD'04), pages 892-896, 2004. [ bib | .pdf ]
[degh04] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified data processing on large clusters. In Proc. Symposium on Operating Systems Design and Implementation (OSDI'04), pages 137-150, 2004. [ bib | .pdf ]
Proposes a programming model for highly parallelizable computations, and describes a system that implements this model. The computation input is a set of input key/value pairs, and the output is a set of output key/value pairs. The computation itself is defined by two functions. A Map function takes an input key/value pair and produces a set of intermediate key/value pairs. A Reduce function takes an intermediate key and the set of values associated with that key, and produces a single value.
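A minimal, single-process Python sketch of the programming model as summarized above, using word counting as the example. It is not the distributed system from the paper; `run_mapreduce`, `map_fn`, and `reduce_fn` are illustrative names, and the grouping step is done in memory.

```python
from collections import defaultdict


def map_fn(key, value):
    """Map: (document name, text) -> list of intermediate (word, 1) pairs."""
    return [(word, 1) for word in value.split()]


def reduce_fn(key, values):
    """Reduce: (word, list of counts) -> a single total."""
    return sum(values)


def run_mapreduce(inputs, map_fn, reduce_fn):
    # Group all intermediate values by intermediate key.
    groups = defaultdict(list)
    for in_key, in_value in inputs:
        for mid_key, mid_value in map_fn(in_key, in_value):
            groups[mid_key].append(mid_value)
    # Apply Reduce once per intermediate key.
    return {k: reduce_fn(k, vs) for k, vs in groups.items()}


counts = run_mapreduce([("d1", "a b a"), ("d2", "b c")], map_fn, reduce_fn)
```

Because each Map call depends on only one input pair and each Reduce call on only one key, both phases can be spread across many machines, which is the property the paper exploits.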
[hupe04] Lan Huang, Gang Peng, and Tzi-cker Chiueh. Multi-dimensional storage virtualization. In Proc. Joint International Conference on Measurement and Modeling of Computer Systems, pages 14-24, 2004. [ bib | .pdf ]
[krga04] Ivan Krsul, Arijit Ganguly, Jian Zhang, José A. B. Fortes, and Renato J. O. Figueiredo. VMPlants: Providing and managing virtual machine execution environments for grid computing. In Proc. ACM/IEEE Conference on High Performance Networking and Computing (SC2004), 2004. [ bib | DOI | .pdf ]
[chgo03] A. Chandra, P. Goyal, and P. Shenoy. Quantifying the benefits of resource multiplexing in on-demand data centers. In Proc. First Workshop on Algorithms and Architectures for Self-Managing Systems, June 2003. [ bib | .pdf ]
[badr03] Paul Barham, Boris Dragovic, Keir Fraser, Steven Hand, Tim Harris, Alex Ho, Rolf Neugebauer, Ian Pratt, and Andrew Warfield. Xen and the art of virtualization. In Proceedings of the Nineteenth ACM Symposium on Operating Systems Principles (SOSP'03), pages 164-177. ACM Press, 2003. [ bib | .pdf ]
Very nice paper describing the hardware virtualization approach used by Xen and changes it necessitates in the OS. Also includes some empirical performance evaluation.
[maei03] Susan Malaika, Andrew Eisenberg, and Jim Melton. Standards for databases on the grid. SIGMOD Record, 32(3), 2003. [ bib | .pdf ]
An overview of some data-related parts of the grid standardization process, including OGSA, DAIS (Data Access and Integration) for standardizing access to relational and XML data sources, OREP (OGSA Replication Services), and DFDL (Data Format and Description Language).
[anar02] Artur Andrzejak, Martin Arlitt, and Jerry Rolia. Bounding the resource savings of utility computing models. Technical Report HPL-2002-339, HP Laboratories, 2002. [ bib | .pdf | .pdf ]
[foke02] I. Foster, C. Kesselman, J. Nick, and S. Tuecke. Grid services for distributed system integration. Computer, 35(6), 2002. [ bib | .pdf | .pdf ]
An extended version can be found at http://www.globus.org/research/papers/ogsa.pdf. This is an overview of the Open Grid Services Architecture (OGSA), which defines something very much like a distributed object system.
[rozh02] Jerry Rolia, Xiaoyun Zhu, Martin Arlitt, and Artur Andrzejak. Statistical service assurances for applications in utility grid environments. In IEEE International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunications Systems (MASCOTS'02), pages 247-256, 2002. [ bib | .pdf ]
[sach02] Constantine P. Sapuntzakis, Ramesh Chandra, Ben Pfaff, Jim Chow, Monica S. Lam, and Mendel Rosenblum. Optimizing the migration of virtual computers. In Proc. Symposium on Operating System Design and Implementation (OSDI'02), 2002. [ bib | .pdf ]
[chfo01] A. Chervenak, I. Foster, C. Kesselman, C. Salisbury, and S. Tuecke. The data grid: Towards an architecture for the distributed management and analysis of large scientific datasets. Journal of Network and Computer Applications, 23:187-200, 2001. [ bib | .pdf | .pdf ]
Defines the core services of a data grid as a file-oriented storage service plus a distributed directory for meta-data. Also has some discussion of higher level services, like replication.