CS 755 - Literature
The first few classes will be held as lectures. The list of papers below
does not show 'class papers' for these classes and the lecture material is
usually well-covered in textbooks. The papers listed below are primarily
intended as background material and starting points for advanced readings.
Later classes will be a mix of lectures and student presentations of
'assigned class papers' as listed below.
All papers listed below are directly accessible from computers on the UW
network. To access digital library papers from outside the UW network, use the
library proxy. For Firefox, the URL Swap add-on is quite useful.
Legend:
[C] - potential class paper: suitable for student presentation and critical review
[A] - assigned class paper: short review & student presentation
(B) - background paper: suitable for critical review
- (B) Loop-Free Routing Using
Diffusing Computations. J. J. Garcia-Luna-Aceves. IEEE/ACM
Transactions on Networking, 1(1):130-141. Feb 1993.
- (B) An Algebraic
Theory of Dynamic Network Routing. Joao Luis Sobrinho. IEEE/ACM
Transactions on Networking, 13(5):1160-1173. Oct 2005.
- (B) Declarative
Routing: Extensible Routing with Declarative Queries. Boon Thau Loo,
Joseph M. Hellerstein, Ion Stoica and Raghu Ramakrishnan. ACM SIGCOMM
Computer Communication Review (Proceedings of SIGCOMM 2005),
35(4):289-300. Oct 2005.
- On Compact Routing
for the Internet. Dmitri Krioukov, k c claffy, Kevin Fall and Arthur
Brady. ACM SIGCOMM Computer Communication Review, 37(3):41-52. Jul
2007.
- Resolving
Inter-Domain Policy Disputes. Cheng Tien Ee, Vijay Ramachandran,
Byung-Gon Chun, Kaushik Lakshminarayanan and Scott Shenker. ACM SIGCOMM
Computer Communication Review (Proceedings of SIGCOMM 2007),
37(4):157-168. Oct 2007.
- The
Transmission Control Protocol. Wael Noureddine and Fouad Tobagi.
Technical Report. Jul 2002.
- Host-to-Host
Congestion Control for TCP. Alexander Afanasyev, Neil Tilley, Peter
Reiher, and Leonard Kleinrock. IEEE Communications Surveys &
Tutorials, 12(3):304-342. Aug 2010.
- Stream Control Transmission
Protocol. R. Stewart, Ed. RFC 4960. Sep 2007.
- (B) Designing DCCP:
Congestion Control Without Reliability. Eddie Kohler, Mark Handley, and
Sally Floyd. ACM SIGCOMM Computer Communication Review (Proceedings of
SIGCOMM 2006), 36(4):27-38. Oct 2006.
- (B) Structured
Streams: a New Transport Abstraction. Bryan Ford. ACM SIGCOMM
Computer Communication Review (Proceedings of SIGCOMM 2007),
37(4):361-372. Oct 2007.
- Routing
Algorithms for Content-based Publish/Subscribe Systems. J. Legatheaux
Martins and Sergio Duarte. IEEE Communications Surveys &
Tutorials, 12(1):39-58. Feb 2010.
- Implementing remote
procedure calls. Andrew D. Birrell and Bruce Jay Nelson. ACM
Transactions on Computer Systems, 2(1): 39-59, 1984. - all you ever wanted
to know about implementing remote procedure calls.
- (B) Performance of
Firefly RPC. Michael Schroeder and Michael Burrows. ACM SIGOPS
Operating Systems Review (Proceedings of SOSP '89), 23(5): 83-90, 1989.
- A Survey of Remote
Procedure Calls. B.H. Tay and A.L. Ananda. ACM SIGOPS Operating
Systems Review, 24(3):68-79. Jul 1990. - early overview of remote
procedure calls.
- Network Objects.
Andrew Birrell, Greg Nelson, Susan Owicki, and Edward Wobber. ACM SIGOPS
Operating Systems Review (Proceedings of SOSP '93), 27(5):217-230. Dec
1993 - introduction of object orientation.
- The JBoss
Extensible Server. Marc Fleury and Francisco Reverbel. Middleware
'03 Proceedings of the ACM/IFIP/USENIX 2003 International Conference on
Middleware, LNCS 2003:344-373. Jun 2003. - state of the art web
services system.
- The Rise and Fall of
CORBA. Michi Henning. ACM Queue, 4(5):28-34. Jun 2006 -
interesting historical perspective on CORBA middleware
- Convenience Over
Correctness. Steve Vinoski. IEEE Internet Computing, 12(4):
89-92, Jul/Aug 2008. - argues why RPC is problematic in an enterprise
setting.
- Unraveling the Web
services web: an introduction to SOAP, WSDL, and UDDI. Francisco
Curbera, Matthew Duftler, Rania Khalaf, William Nagy, Nirmal Mukhi, and
Sanjiva Weerawarana. IEEE Internet Computing, 6(2): 86-93,
Mar/Apr 2002. - an overview of web services.
- Web services are
not distributed objects. Werner Vogels. IEEE Internet
Computing, 7(6): 59-66, Nov/Dec 2003. - discusses differences
between services and objects.
- The Impact of
Research on the Development of Middleware Technology. Wolfgang
Emmerich, Mikio Aoyama, and Joe Sventek. ACM Transactions on Software
Engineering and Methodology, 17(4). Aug 2008. - comprehensive
overview of middleware history.
- A Method for Obtaining
Digital Signatures and Public-Key Cryptosystems. R. L. Rivest, A.
Shamir, and L. Adleman. Communications of the ACM, 21(2):120-126.
Feb 1978. - classical paper.
- How to Share a
Secret. Adi Shamir. Communications of the ACM, 22(11):612-613.
Nov 1979. - classical paper.
- A
Simple Active Attack Against TCP. Laurent Joncheray. Proceedings of
5th USENIX Security Symposium. Jun 1995. - early TCP attack.
- Why Cryptosystems
Fail. Ross Anderson. Communications of the ACM, 37(11):32-40.
Nov 1994. - low-tech problems with crypto-based security.
- XML and Web
Services Security Standards. Nils Agne Nordbotten. IEEE
Communications Surveys & Tutorials, 11(3):4-21. Aug 2009 -
survey paper covering web services.
- Naming
and Binding of Objects. Jerome H. Saltzer. Operating Systems,
LNCS 60, Springer Verlag. 1978.
- On the Naming and Binding
of Network Destinations. Jerome H. Saltzer. IETF RFC 1498.
Aug 1993.
- [C] An Axiomatic
Basis for Communication. Martin Karsten, S. Keshav, Sanjiva Prasad,
and Mirza Beg. ACM SIGCOMM Computer Communication Review (Proceedings of
SIGCOMM 2007), 37(4):217-228. Oct 2007.
- Domain Names - Concepts and
Facilities. Paul Mockapetris. IETF RFC 1034, Nov 1987.
- Domain Names -
Implementation and Specification. Paul Mockapetris. IETF RFC
1035, Nov 1987.
- [C] Chord: A
Scalable Peer-to-Peer Lookup Protocol for Internet Applications. Ion
Stoica, Robert Morris, David Liben-Nowell, David R. Karger, M. Frans
Kaashoek, Frank Dabek, and Hari Balakrishnan. IEEE/ACM Transactions on
Networking, 11(1):17-32. Feb 2003.
- [A] Internet
Indirection Infrastructure. Ion Stoica, Daniel Adkins, Shelley Zhuang,
Scott Shenker, and Sonesh Surana. IEEE/ACM Transactions on
Networking, 12(2):205-218. Apr 2004.
- A Review of
Mobility Support Paradigms for the Internet. Deguang Le, Xiaoming Fu,
and Dieter Hogrefe. IEEE Communications Surveys & Tutorials,
8(1):38-51. Feb 2006.
- Reaching Agreement in
the Presence of Faults. Marshall Pease, Robert Shostak, and Leslie
Lamport. Journal of the ACM, 27(2):228-234. Apr 1980. -
classical paper on fault tolerance
- The Byzantine Generals
Problem. Leslie Lamport, Robert Shostak, and Marshall Pease. ACM Transactions on Programming Languages
and Systems, 4(3):382-401. Jul 1982. - classical paper on fault
tolerance
- Impossibility of
Distributed Consensus with One Faulty Process. Michael J. Fischer,
Nancy A. Lynch, and Michael S. Paterson. Journal of the ACM,
32(2):374-382. Apr 1985. - classical paper on fault tolerance
- [A] The Part-Time
Parliament. Leslie Lamport. ACM Transactions on Computer
Systems, 16(2):133-169. May 1998. - classical fault tolerance
solution
- Unreliable Failure
Detectors for Reliable Distributed Systems. Tushar Chandra and Sam
Toueg. Journal of the ACM, 43(2):225-267. Mar 1996.
- formal treatment of fault tolerance approaches
- Implementing
Fault-Tolerant Services Using the State Machine Approach: A Tutorial.
Fred B. Schneider. ACM Computing Surveys, 22(4):299-319. Dec 1990.
- systematic approach to building fault-tolerant services.
- Fault Tolerance
for Highly Available Internet Services: Concepts, Approaches, and
Issues. Narjess Ayari, Denis Barbaron, and Pascale Primet. IEEE
Communications Surveys & Tutorials, 10(2):34-46. May 2008. -
survey paper.
- A Comparative
Analysis of Network Dependability, Fault-tolerance, Reliability, Security,
and Survivability. M. Al-Kuwaiti, N. Kyriakopoulos, and S. Hussein.
IEEE Communications Surveys & Tutorials, 11(2):106-124. May 2009.
- survey paper.
- [C] Practical
Byzantine Fault Tolerance and Proactive Recovery. Miguel Castro and
Barbara Liskov. ACM Transactions on Computer Systems, 20(4):398-461.
Nov 2002. - research proposal.
- [A] The
Chubby Lock Service for Loosely-Coupled Distributed Systems. Mike
Burrows. Proceedings of OSDI 2006, pages 335-350. Nov 2006. -
Google's fault-tolerant synchronization service.
- [C] DepSpace: A
Byzantine Fault-Tolerant Coordination Service. Alysson Neves Bessani,
Eduardo Pelison Alchieri, Miguel Correia, and Joni Silva Fraga. ACM
SIGOPS Operating Systems Review (Proceedings of Eurosys '08),
42(4):163-176. May 2008. - research proposal.
- [C] Prophecy:
Using History for High-Throughput Fault Tolerance. Proceedings of
NSDI 2010. Apr 2010. - research proposal.
- [C] Transaction Support
for Log-Based Middleware Server Recovery. Rui Wang, Betty Salzberg, and
David Lomet. Proceedings ICDE 2009, pages 353-356, Mar 2009. -
transactional fault-tolerance in middleware systems.
- [C] Robust
Synchronization of Absolute and Difference Clocks Over Networks. Darryl
Veitch, Julien Ridoux, and Satish Babu Korada. IEEE/ACM Transactions on
Networking, 17(2):417-430. Apr 2009. - physical clock
synchronization is difficult.
- Time, Clocks, and the
Ordering of Events in a Distributed System. Leslie Lamport.
Communications of the ACM, 21(7):558-565. Jul 1978. -
introduction of logical clocks.
- Detecting causal
relationships in distributed computations: In search of the holy grail.
Reinhard Schwarz and Friedemann Mattern. Distributed Computing,
7(3):149-174. Mar 1994. - improved logical (vector) clocks.
- [A] Interval
Tree Clocks. Paulo Sergio Almeida, Carlos Baquero, and Victor Fonte.
Principles of Distributed Systems, LNCS 5401:259-274. 2008. -
recent proposal for efficient logical clocks.
- Understanding the
Limitations of Causally and Totally Ordered Communication. David R.
Cheriton and Dale Skeen. ACM SIGOPS Operating Systems Review
(Proceedings of SOSP '93), 27(5):44-57. Dec 1993. - debate about
strict event ordering.
- Why bother with
CATOCS? Robbert van Renesse. ACM SIGOPS Operating Systems
Review, 28(1):22-27. Jan 1994. - debate about strict event
ordering.
- Practical Impact of
Group Communication Theory. Andre Schiper. Future Directions in
Distributed Computing, LNCS 2584. 2003. - application of event
ordering.
- [A] Distributed
Snapshots: Determining Global States of Distributed Systems. Kanianthra
Mani Chandy and Leslie Lamport. ACM Transactions on Computer
Systems, 3(1):63-75. Feb 1985. - classic paper on capturing the
global state of a distributed system
- Total Order
Broadcast and Multicast Algorithms: Taxonomy and Survey. Xavier Defago,
Andre Schiper, and Peter Urban. ACM Computing Surveys,
36(4):372-421. Dec 2004. - survey / application of event ordering.
- A Unified Theory of
Shared Memory Consistency. Robert C. Steinke and Gary J. Nutt.
Journal of the ACM, 51(5):800-849. Sep 2004. - memory
consistency models
- [C] Disconnected
Operation in the Coda File System. James J. Kistler and M.
Satyanarayanan. ACM Transactions on Computer Systems, 10(1):3-25.
Feb 1992. - traditional distributed and replicated file system
- Brewer's Conjecture
and the Feasibility of Consistent, Available, Partition-Tolerant Web
Services. Seth Gilbert and Nancy Lynch. ACM SIGACT News,
33(2):51-59. Jun 2002. - problem description for replicated
storage.
- [C] BASE: Using
Abstraction to Improve Fault Tolerance. Miguel Castro, Rodrigo
Rodrigues and Barbara Liskov. ACM Transactions on Computer Systems,
21(3):236-269. Aug 2003. - research proposal.
- [C] The Costs and
Limits of Availability for Replicated Services. Haifeng Yu and Amin
Vahdat. ACM Transactions on Computer Systems, 24(1):70-113. Feb
2006. - research proposal.
- [C] PRACTI
Replication. Nalini Belaramani, Mike Dahlin, Lei Gao, Amol Nayate, Arun
Venkataramani, Praveen Yalagandula, and Jiandan Zheng. Proceedings of
NSDI 2006, pages 59-72. May 2006. - research proposal.
- [A] Bigtable: A
Distributed Storage System for Structured Data. Fay Chang, Jeffrey
Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows,
Tushar Chandra, Andrew Fikes, and Robert E. Gruber. Proceedings of OSDI
2006, pages 205-218. Nov 2006. - Google's system.
- [C] Dynamo: Amazon's
Highly Available Key-value Store. Giuseppe DeCandia, Deniz Hastorun,
Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin,
Swaminathan Sivasubramanian, Peter Vosshall and Werner Vogels. ACM
SIGOPS Operating Systems Review (Proceedings of SOSP '07),
41(6):205-220. Dec 2007. - Amazon's system.
- Cassandra - A
Decentralized Structured Storage System. Avinash Lakshman and Prashant
Malik. ACM SIGOPS Operating Systems Review, 44(2):35-40. Apr 2010.
- Facebook's system, now an Apache project
- [A] Megastore:
Providing Scalable, Highly Available Storage for Interactive Services.
Jason Baker, Chris Bond, James Corbett, JJ Furman, Andrey Khorlin, James
Larson, Jean-Michel Leon, Yawei Li, Alexander Lloyd, and Vadim Yushprakh.
Proceedings of CIDR 2011, pages 223-234. Jan 2011. - Google's
system.
- [C] MDCC:
Multi-Data Center Consistency. Tim Kraska, Gene Pang, Michael J.
Franklin, Samuel Madden, and Alan Fekete. Proceedings of ACM
EuroSys'13, pages 113-126. Apr 2013.
- Database
Replication: a Tale of Research across Communities. Bettina Kemme and
Gustavo Alonso. Proceedings of the VLDB Endowment, 3(10):5-12. Sep
2010. - the database perspective.
- [C] Don't Settle for
Eventual: Scalable Causal Consistency for Wide-Area Storage with COPS.
Wyatt Lloyd, Michael J. Freedman, Michael Kaminsky, and David G. Andersen.
Proceedings of SOSP '11, pages 401-416. Oct 2011. - next round of
consistency debate
- [A] Consistency-Based Service
Level Agreements for Cloud Storage. Douglas B. Terry,
Vijayan Prabhakaran, Ramakrishna Kotla, Mahesh Balakrishnan, Marcos K.
Aguilera, and Hussam Abu-Libdeh. Proceedings of ACM SOSP'13, pages
309-324. Nov 2013. - service level agreements
- Kingfisher:
Cost-aware Elasticity in the Cloud. Upendra Sharma, Prashant Shenoy,
Sambit Sahu and Anees Shaikh. Proceedings of IEEE INFOCOM 2011,
pages 206-210. Apr 2011. - intro to resource provisioning and
elasticity
- [A] Generalized
Resource Allocation for the Cloud. Anshul Rai, Ranjita Bhagwan, and
Saikat Guha. Proceedings of ACM SoCC 2012, article 15. Oct 2012. -
resource allocation
- [C] Performance
Isolation and Fairness for Multi-Tenant Cloud Storage. David Shue,
Michael J. Freedman, and Anees Shaikh. Proceedings of OSDI'12,
pages 349-362. Oct 2012. - resource sharing mandates isolation and
fairness
- [C] On Fault
Resilience of OpenStack. Xiaoen Ju, Livio Soares, Kang G. Shin, Kyung
Dong Ryu, and Dilma Da Silva. Proceedings of ACM SoCC 2013, article
2. Oct 2013. - experimental investigation of OpenStack
- [C] Small is Better:
Avoiding Latency Traps in Virtualized Data Centers. Yunjing Xu, Michael
Bailey, Brian Noble, and Farnam Jahanian. Proceedings of ACM SoCC
2013, article 7. Oct 2013. - investigates latency challenges
- [A] MapReduce:
Simplified Data Processing on Large Clusters. Jeffrey Dean and Sanjay
Ghemawat. Proceedings of OSDI'04, article 10-10. Oct 2004. -
classical paper introducing MapReduce
- [C] Apache Hadoop
YARN: Yet Another Resource Negotiator. Vinod Kumar Vavilapalli, Arun C
Murthy, Chris Douglas, Sharad Agarwal, Mahadev Konar, Robert Evans, Thomas
Graves, Jason Lowe, Hitesh Shah, Siddharth Seth, Bikas Saha, Carlo Curino,
Owen O'Malley, Sanjay Radia, Benjamin Reed, and Eric Baldeschwieler.
Proceedings of ACM SoCC 2013, article 5. Oct 2013. - new compute
platform for Hadoop
- [A] Improving Large
Graph Processing on Partitioned Graphs in the Cloud. Rishan Chen, Mao
Yang, Xuetian Weng, Byron Choi, Bingsheng He, and Xiaoming Li.
Proceedings of ACM SoCC 2012, article 3. Oct 2012. - large graph
processing
- [C] Resilient
Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster
Computing. Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur
Dave, Justin Ma, Murphy McCauley, Michael J. Franklin, Scott Shenker, and
Ion Stoica. Proceedings of NSDI'12, article 2-2. Apr 2012. -
large-scale in-memory computations