Systems Software for Datacenters; CS 7942, Spring 2012
Organization
Instructor: John Regehr
Meeting time: Weds, 09:40 AM-10:30 AM in WEB 2460
Mailing list
Schedule
Some ideas for topics and papers
-
General overview of warehouse-scale machines
-
Warehouse scale computing [book, available online]
-
Three ages of Google [blog post]
-
Evolution timeline from here: http://highscalability.com/are-cloud-based-memory-architectures-next-big-thing
-
Some examples of available data-center architectures: traditional, container
-
Storage stack:
-
http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/archive/bigtable-osdi06.pdf
http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/pubs/archive/36971.pdf
Megastore on top of Bigtable (Google) [Bigtable and Megastore papers]
-
HBase on top of Haystack (Facebook) [HBase slides? and Haystack paper]
- A side story of how Facebook ditched NetApp (Facebook slides)
-
RAMClouds (Stanford) [RAMCloud paper]
-
SSD caching (FusionIO talk)
-
Networking stack
-
traditional architectures [datacenter networks book - available online]
-
- A side story: It's Time for Low Latency (Stanford) [short paper]
-
- problems with traditional designs (hamilton)
-
Datacenter TCP (DCTCP), and possibly similar TCP modifications from Google
-
a case for an optical data-center [Vahdat, Google papers]
- 100GbE and beyond for warehouse scale computing interconnects
-
Application stack:
-
LAMP + virtualization
-
- A side story: Datacenter needs an operating system [NOX paper]
-
- again some review of software stack evolution from here:
http://highscalability.com/are-cloud-based-memory-architectures-next-big-thing
-
App Engine (Google), Azure (MS)
http://www.theregister.co.uk/2011/06/07/inside_google_app_engine/
-
Real-world examples (I argue these should show up early)
-
Google real-time incremental indexing
-
Facebook
-
http://royal.pingdom.com/2010/06/18/the-software-behind-facebook/
-
http://www.facebook.com/note.php?note_id=454991608919#
-
Netflix (Netflix's Transition to High-Availability Storage Systems)
-
eBay
-
http://highscalability.com/blog/2009/11/17/10-ebay-secrets-for-planet-wide-scaling.html
-
Reddit
-
http://highscalability.com/blog/2010/5/17/7-lessons-learned-while-building-reddit-to-270-million-page.html
-
Stack Overflow:
http://highscalability.com/blog/2011/3/3/stack-overflow-architecture-update-now-at-95-million-page-vi.html
-
TripAdvisor
http://highscalability.com/blog/2011/6/27/tripadvisor-architecture-40m-visitors-200m-dynamic-page-view.html
...
-
Data processing
-
MapReduce
-
Dryad/DryadLINQ
-
Dremel maybe?
-
Graph processing (G2 from MS; Pregel, a similar paper from Google)
-
Scheduling and load balancing
- ???
-
Modeling and Synthesizing Task Placement Constraints in Google Compute Clusters (Google)
-
Performance monitoring, debugging
-
http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/pubs/archive/36356.pdf
SLA stuff...
-
Fault tolerance
-
On Designing and Deploying Internet-Scale Services [paper]
- Side note: summaries of these two papers:
- DRAM failures in the wild [paper], disk failure tendencies [paper]
-
Netflix lessons learnt from AWS outage
http://techblog.netflix.com/2011/04/lessons-netflix-learned-from-aws-outage.html
add some hardcore papers
-
Power and energy efficiency
-
Architecture-level power breakdown
-
- something from James Hamilton
-
- Architecture papers
-
Energy proportional networking (Google)
-
Security