CS 755
System and Network Architectures and Implementation
(Fall 2013)
Relevant Papers to Read and Discuss
This page is heavily dependent on the list that Martin Karsten developed for the Spring 2011 version of the course.
For off-campus access to digital libraries, please see Connect from Home. Burkay Genc has kindly written a Firefox plugin that simplifies the process. On any page, right-click and select "MyLibrary" to be redirected to the library login page (if necessary) and then to the proxy-based web page.
Legend
(*) - research paper suitable for critical review
[C] - to be discussed in class
Network Layer
- (*) Loop-Free Routing Using
Diffusing Computations. J. J. Garcia-Luna-Aceves. IEEE/ACM
Transactions on Networking, 1(1):130-141. Feb 1993.
- (*) Internet
Indirection Infrastructure. Ion Stoica, Daniel Adkins, Shelley Zhuang,
Scott Shenker and Sonesh Surana. IEEE/ACM Transactions on
Networking, 12(2):205-218. Apr 2004.
- [C] Declarative Routing:
Extensible Routing with Declarative Queries. Boon Thau Loo, Joseph M.
Hellerstein, Ion Stoica and Raghu Ramakrishnan. ACM SIGCOMM Computer
Communication Review (Proceedings of SIGCOMM 2005), 35(4):289-300.
Oct 2005.
- [C] On Compact Routing for the Internet. Dmitri Krioukov, k c claffy, Kevin Fall and Arthur Brady. ACM SIGCOMM
Computer Communication Review, 37(3):41-52. Jul 2007.
- (*) Resolving
Inter-Domain Policy Disputes. Cheng Tien Ee, Vijay Ramachandran,
Byung-Gon Chun, Kaushik Lakshminarayanan and Scott Shenker. ACM SIGCOMM
Computer Communication Review (Proceedings of SIGCOMM 2007),
37(4):157-168. Oct 2007.
- A Survey of Remote
Procedure Calls. B.H. Tay and A.L. Ananda. ACM SIGOPS Operating
Systems Review, 24(3):68-79. Jul 1990. - early overview of remote
procedure calls.
- Implementing remote procedure calls. Andrew D. Birrell and Bruce Jay Nelson. ACM Transactions on Computer Systems, 2(1):39-59, 1984. - all you want to know about implementing remote
procedure calls.
- (*) Network
Objects. Andrew Birrell, Greg Nelson, Susan Owicki, and Edward Wobber.
ACM SIGOPS Operating Systems Review (Proceedings of SOSP '93),
27(5):217-230. Dec 1993. - introduction of object orientation.
- (*) The JBoss
Extensible Server. Marc Fleury and Francisco Reverbel. Middleware
'03 Proceedings of the ACM/IFIP/USENIX 2003 International Conference on
Middleware, LNCS 2003:344-373. Jun 2003. - state of the art web
services system.
- [C] Convenience Over Correctness. Steve Vinoski. IEEE Internet Computing, 12(4):89-92, July/August 2008. - argues why RPC is problematic in an enterprise setting.
- [C] Unraveling the Web services web: an introduction to SOAP, WSDL, and UDDI. Francisco Curbera, Matthew Duftler, Rania Khalaf, William Nagy, Nirmal Mukhi, and Sanjiva Weerawarana. IEEE Internet Computing, 6(2):86-93, March/April 2002. - an overview of web services.
- Web services are not distributed objects. Werner Vogels. IEEE Internet Computing, 7(6): 59-66, November/December 2003. - discusses differences between services and objects.
- Anatomy of a Web Service. Kamalsinh F Chavda. Journal of Computing Sciences in Colleges, 19(3): 124-134, 2004.
- The Impact of
Research on the Development of Middleware Technology. Wolfgang
Emmerich, Mikio Aoyama, and Joe Sventek. ACM Transactions on Software
Engineering and Methodology, 17(4). Aug 2008. - comprehensive
overview of middleware history.
- SOAP Tutorial. (online tutorial)
- (*) Design, implementation, and performance measurement of a native-mode ATM transport layer (extended version). R. Ahuja, S. Keshav, H. Saran. IEEE/ACM Transactions on Networking, 4(4): 502-515, 1996.
- (*) Performance of Firefly RPC. Michael Schroeder and Michael Burrows. ACM SIGOPS Operating Systems Review (Proceedings of SOSP '89),
23(5): 83-90, 1989.
- Naming
and Binding of Objects. Jerome H. Saltzer. Operating Systems,
LNCS 60, Springer Verlag. 1978.
- On the Naming and Binding
of Network Destinations. Jerome H. Saltzer. IETF RFC 1498.
Aug 1993.
- (*) An Axiomatic Basis
for Communication. Martin Karsten, S. Keshav, Sanjiva Prasad, and Mirza
Beg. ACM SIGCOMM Computer Communication Review (Proceedings of SIGCOMM
2007), 37(4):217-228. Oct 2007.
- Domain Names - Concepts and
Facilities. Paul Mockapetris. IETF RFC 1034, Nov 1987.
- Domain Names -
Implementation and Specification. Paul Mockapetris. IETF RFC
1035, Nov 1987.
- Internet
Indirection Infrastructure. Ion Stoica, Daniel Adkins, Shelley Zhuang,
Scott Shenker, and Sonesh Surana. IEEE/ACM Transactions on
Networking, 12(2):205-218. Apr 2004.
- (*) A Review of
Mobility Support Paradigms for the Internet. Deguang Le, Xiaoming Fu,
and Dieter Hogrefe. IEEE Communications Surveys & Tutorials,
8(1):38-51. Feb 2006.
- (*) Routing
Algorithms for Content-based Publish/Subscribe Systems. J. Legatheaux
Martins and Sergio Duarte. IEEE Communications Surveys &
Tutorials, 12(1):39-58. Feb 2010.
- The rise and fall of CORBA. Michi Henning. ACM Queue, June 2006.
- Robust
Synchronization of Software Clocks Across the Internet. Darryl Veitch,
Satish Babu and Attila Pásztor. IMC'04 - Proceedings of the 4th ACM
SIGCOMM Conference on Internet Measurement. pp. 219-232. Oct 2004.
- physical clock synchronization is difficult.
- (*) Robust
Synchronization of Absolute and Difference Clocks Over Networks. Darryl
Veitch, Julien Ridoux, and Satish Babu Korada. IEEE/ACM Transactions on
Networking, 17(2):417-430. Apr 2009. - physical clock
synchronization is difficult.
- Time, Clocks, and the
Ordering of Events in a Distributed System. Leslie Lamport.
Communications of the ACM, 21(7):558-565. Jul 1978. -
introduction of logical clocks.
- Detecting causal
relationships in distributed computations: In search of the holy grail.
Reinhard Schwarz and Friedemann Mattern. Distributed Computing,
7(3):149-174. Mar 1994. - improved logical clocks.
- [C] Understanding the
Limitations of Causally and Totally Ordered Communication. David R.
Cheriton and Dale Skeen. ACM SIGOPS Operating Systems Review
(Proceedings of SOSP '93), 27(5):44-57. Dec 1993. - debate about
strict event ordering.
- [C] Why bother with
CATOCS?. Robbert van Renesse. ACM SIGOPS Operating Systems
Review, 28(1):22-27. Jan 1994. - debate about strict event
ordering.
- Practical Impact of
Group Communication Theory. Andre Schiper. Future Directions in
Distributed Computing, LNCS 2584. 2003. - application of event
ordering.
- Total Order
Broadcast and Multicast Algorithms: Taxonomy and Survey. Xavier Defago,
Andre Schiper, and Peter Urban. ACM Computing Surveys,
36(4):372-421. Dec 2004. - survey / application of event ordering.
- (*) Plausible clocks:
Constant size logical clocks for Distributed Systems. Francisco J.
Torres-Rojas and Mustaque Ahamad. Distributed Algorithms, LNCS
1151:71-88. 1996. - proposal for efficient logical clocks.
- (*) Interval
Tree Clocks. Paulo Sergio Almeida, Carlos Baquero, and Victor Fonte.
Principles of Distributed Systems, LNCS 5401:259-274. 2008. -
recent proposal for efficient logical clocks.
- (*) The
Chubby Lock Service for Loosely-Coupled Distributed Systems. Mike
Burrows. Proceedings of OSDI 2006, pages 335-350. November 2006. -
Google's synchronization service.
- A Critique of ANSI SQL Isolation Levels. Hal Berenson, Philip A. Bernstein, Jim Gray, Jim Melton, Elizabeth J. O'Neil, Patrick E. O'Neil. Proc. ACM SIGMOD International Conference on Management of Data, pages 1-10, 1995. - a good discussion of problems with the ANSI SQL-92 specification of isolation levels; also discusses snapshot isolation, which is now quite important.
- (*) Serializable isolation for snapshot databases. Michael J. Cahill, Uwe Röhm, Alan D. Fekete. Proc. ACM SIGMOD International Conference on Management of Data, pages 729–738, 2008. - demonstrates how to make snapshot isolation serializable; won the SIGMOD 2008 best paper award.
- [C] Bigtable: A
Distributed Storage System for Structured Data. Fay Chang, Jeffrey
Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows,
Tushar Chandra, Andrew Fikes, and Robert E. Gruber. Proceedings of OSDI
2006, pages 205-218. November 2006. - Google's system.
- [C] Dynamo:
Amazon's Highly Available Key-value Store. Giuseppe DeCandia, Deniz
Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex
Pilchin, Swaminathan Sivasubramanian, Peter Vosshall and Werner Vogels.
ACM SIGOPS Operating Systems Review (Proceedings of SOSP '07),
41(6):205-220. Dec 2007. - Amazon's system.
- (*) Megastore:
Providing Scalable, Highly Available Storage for Interactive Services.
Jason Baker, Chris Bond, James Corbett, JJ Furman, Andrey Khorlin, James
Larson, Jean-Michel Leon, Yawei Li, Alexander Lloyd, and Vadim Yushprakh.
Proceedings of CIDR 2011, pages 223-234. Jan 2011. - Google's
system.
- (*) A Unified Theory
of Shared Memory Consistency. Robert C. Steinke and Gary J. Nutt.
Journal of the ACM, 51(5):800-849. Sep 2004. - consistency
models
- (*) Brewer's
Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant
Web Services. Seth Gilbert and Nancy Lynch. ACM SIGACT News,
33(2):51-59. Jun 2002. - problem description for replicated
storage.
- [C] Database
Replication: a Tale of Research across Communities. Bettina Kemme and
Gustavo Alonso. Proceedings of the VLDB Endowment, 3(10):5-12. Sep
2010. - the database perspective.
- (*) The Costs and
Limits of Availability for Replicated Services. Haifeng Yu and Amin
Vahdat. ACM Transactions on Computer Systems, 24(1):70-113. Feb
2006. - research proposal.
- (*) PRACTI
Replication. Nalini Belaramani, Mike Dahlin, Lei Gao, Amol Nayate, Arun
Venkataramani, Praveen Yalagandula, and Jiandan Zheng. Proceedings of
NSDI 2006, pages 59-72. May 2006. - research proposal.
- Database Replication. Bettina Kemme, Ricardo Jimenez-Peris, Marta Patino-Martinez. Synthesis Lectures on Data Management, Morgan & Claypool, 2010.
- [C] Transaction Support for Log-Based Middleware Server Recovery. Rui Wang, Betty Salzberg, and David Lomet. Proc. IEEE 25th International Conference on Data Engineering, pages 353-356, 2009. -
transactional fault-tolerance in middleware systems.
- (*) Reaching Agreement
in the Presence of Faults. Marshall Pease, Robert Shostak, and Leslie
Lamport. Journal of the ACM, 27(2):228-234. Apr 1980. -
classical paper on fault tolerance
- [C] The Byzantine
Generals Problem. Leslie Lamport, Robert Shostak, and Marshall Pease. ACM Transactions on Programming
Languages and Systems, 4(3):382-401. Jul 1982. - classical paper on
fault tolerance
- (*) Impossibility of
Distributed Consensus with One Faulty Process. Michael J. Fischer,
Nancy A. Lynch, and Michael S. Paterson. Journal of the ACM,
32(2):374-382. Apr 1985. - classical paper on fault tolerance
- Implementing
Fault-Tolerant Services Using the State Machine Approach: A Tutorial.
Fred B. Schneider. ACM Computing Surveys, 22(4):299-319. Dec
1990. - systematic approach to building fault-tolerant services.
- [C] Fault Tolerance
for Highly Available Internet Services: Concepts, Approaches, and
Issues. Narjess Ayari, Denis Barbaron, and Pascale Primet. IEEE
Communications Surveys & Tutorials, 10(2):34-46. May 2008. -
survey paper.
- (*) Definition, Detection, and Recovery of Single-Page Failures, a Fourth Class of Database Failures. Goetz Graefe and Harumi A. Kuno. Proceedings of the VLDB Endowment, 5(7):646-655, 2012. -
database focussed, but a good discussion of failure types.
- A Comparative
Analysis of Network Dependability, Fault-tolerance, Reliability, Security,
and Survivability. M. Al-Kuwaiti, N. Kyriakopoulos, and S. Hussein.
IEEE Communications Surveys & Tutorials, 11(2):106-124. May 2009.
- survey paper.
- (*) Practical
Byzantine Fault Tolerance and Proactive Recovery. Miguel Castro and
Barbara Liskov. ACM Transactions on Computer Systems, 20(4):398-461.
Nov 2002. - research proposal.
- (*) BASE: Using
Abstraction to Improve Fault Tolerance. Miguel Castro, Rodrigo
Rodrigues and Barbara Liskov. ACM Transactions on Computer Systems,
21(3):236-269. Aug 2003. - research proposal.
- (*) DepSpace: A
Byzantine Fault-Tolerant Coordination Service. Alysson Neves Bessani,
Eduardo Pelison Alchieri, Miguel Correia, and Joni Silva Fraga. ACM
SIGOPS Operating Systems Review (Proceedings of Eurosys '08),
42(4):163-176. May 2008. - research proposal.
- (*) Prophecy:
Using History for High-Throughput Fault Tolerance. Proceedings of
NSDI 2010. Apr 2010. - research proposal.
- (*) A Method for
Obtaining Digital Signatures and Public-Key Cryptosystems. R. L.
Rivest, A. Shamir, and L. Adleman. Communications of the ACM,
21(2):120-126. Feb 1978. - classical paper.
- (*) How to Share a
Secret. Adi Shamir. Communications of the ACM, 22(11):612-613.
Nov 1979. - classical paper.
- [C] Why Cryptosystems
Fail. Ross Anderson. Communications of the ACM, 37(11):32-40.
Nov 1994. - low-tech problems with crypto-based security.
- [C] XML and Web
Services Security Standards. Nils Agne Nordbotten. IEEE
Communications Surveys & Tutorials, 11(3):4-21. Aug 2009. -
survey paper re web services.
- A Survey of DHT
Security Techniques. Guido Urdaneta, Guillaume Pierre, and Maarten Van
Steen. ACM Computing Surveys, 43(2):8. Jan 2011. - survey paper
re DHT security.
- [C] k-Anonymity: a Model for
Protecting Privacy. Latanya Sweeney. International Journal of
Uncertainty, Fuzziness and Knowledge-Based Systems, 10(5):557-570. Oct
2002. - research proposal.
- [C] Achieving
k-Anonymity Privacy Protection using Generalization and Suppression.
Latanya Sweeney. International Journal of Uncertainty, Fuzziness and
Knowledge-Based Systems, 10(5):571-588. Oct 2002. - research
proposal.
- Privacy-Preserving
Data Publishing: A Survey of Recent Developments. Benjamin C. M.
Fung, Ke Wang, Rui Chen, and Philip S. Yu. ACM Computing Surveys,
42(4):14. Jun 2010. - recent survey paper.
- (*) Distributed Key
Generation for the Internet. Aniket Kate and Ian Goldberg.
Proceedings of ICDCS 2009, pages 119-128. June 2009. - research
proposal.
- [C] TrInc:
Small Trusted Hardware for Large Distributed Systems. Dave Levin, John
R. Douceur, Jacob R. Lorch, and Thomas Moscibroda. Proceedings of NSDI
2009. Apr 2009. - research proposal.
Peer-to-Peer
- Chord: A scalable peer-to-peer lookup service for internet applications. Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek, and Hari Balakrishnan. Proc. 2001 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications (SIGCOMM '01), pages 149-160, 2001. - a popular DHT-based P2P system
- [C] Improving Data Access in P2P Systems. Karl Aberer, Manfred Hauswirth, Magdalena Punceva, Roman Schmidt. IEEE Internet Computing, 6(1), January/February 2002. - a tree-based P2P system
- Making gnutella-like P2P systems scalable. Yatin Chawathe, Sylvia Ratnasamy, Lee Breslau, Nick Lanham, and Scott Shenker. Proc. 2003 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications (SIGCOMM '03), pages 407-418, 2003. - improving unstructured P2P systems
- [C] Load balancing in dynamic structured P2P systems. Brighten Godfrey, Karthik Lakshminarayanan, Sonesh Surana, Richard Karp, and Ion Stoica. Proc. 23rd Annual Joint Conference of the IEEE Computer and Communications Societies (INFOCOM 2004), Volume 4, pages 2253-2262, 2004. - improving one aspect of structured systems
- (*) PeerDB: a P2P-based system for distributed data sharing. Beng Chin Ooi, Kian-Lee Tan, Aoying Zhou. Proc. 19th International Conference on Data Engineering, pages 633 - 644, 2003. - how to design a P2P database system
- (*) Adaptive Replication in Peer-to-Peer Systems. Vijay Gopalakrishnan, Bujor D. Silaghi, Bobby Bhattacharjee. Proc. 24th International Conference on Distributed Computing Systems, pages 360-369, 2004. - addressing replication in P2P systems
- (*) Replication strategies in unstructured peer-to-peer networks. Edith Cohen and Scott Shenker. Proc. 2002 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications (SIGCOMM '02), pages 177-190, 2002. - focuses on replication in unstructured systems
- [C] Data currency in replicated DHTs. Reza Akbarinia, Esther Pacitti, and Patrick Valduriez. Proc. ACM Int. Conf. on Management of Data (SIGMOD), pages 211-222, 2007. - focuses on replication in DHT-based systems
- (*) The Impact of Caching on BitTorrent-Like Peer-to-Peer Systems. Frank Lehrieder, György Dán, Tobias Hoßfeld, Simon Oechsner, and Vlad Singeorzan. Proc. 10th IEEE International Conference on Peer-to-Peer Computing. pages 1-10, 2010. - addresses caching that is similar to replication
Cloud Computing
- [C] Kingfisher: Cost-aware elasticity in the cloud. Upendra Sharma, Prashant Shenoy, Sambit Sahu, and Anees Shaikh. Proc. IEEE INFOCOM, pages 206-210, 2011.
- (*) Improving large graph processing on partitioned graphs in the cloud. Rishan Chen, Mao Yang, Xuetian Weng, Byron Choi, Bingsheng He, Xiaoming Li. Proc. 3rd ACM Symp. on Cloud Computing, 2012.
- (*) Generalized resource allocation for the cloud. Anshul Rai, Ranjita Bhagwan, Saikat Guha. Proc. 3rd ACM Symp. on Cloud Computing, 2012.
- (*) True elasticity in multi-tenant data-intensive compute clusters. Ganesh Ananthanarayanan, Chris Douglas, Raghu Ramakrishnan, Sriram Rao, Ion Stoica. Proc. 3rd ACM Symp. on Cloud Computing, 2012.
- [C] MDCC: multi-data center consistency. Tim Kraska, Gene Pang, Michael J. Franklin, Samuel Madden, Alan Fekete. Proc. 8th ACM SIGOPS/EuroSys European Conf. on Comp. Syst., 2013, pages 113-126.
- (*) Consistency-Based Service Level Agreements for Cloud Storage.
Douglas B. Terry, Vijayan Prabhakaran, Ramakrishna Kotla, Mahesh Balakrishnan, Marcos K. Aguilera, Hussam Abu-Libdeh. Proc. 24th ACM Symp. on Operating System Principles, 2013.
- [C] Performance Isolation and Fairness for Multi-Tenant Cloud Storage.
David Shue, Michael J. Freedman, and Anees Shaikh. Proc. 10th USENIX Symp. on Operating System Design and Implementation, 2012.