Berkeley RAD Lab: Research in Internet-scale Computing Systems
Randy H. Katz
randy@cs.berkeley.edu
28 March 2007

2 Five Year Mission
• Observation: Internet systems are complex, fragile, manually managed, and evolving rapidly
  – To scale Ebay, must build Ebay-sized company
  – To scale YouTube, get acquired by a Google-sized company
• Mission: Enable a single person to create, evolve, and operate the next-generation IT service
  – “The Fortune 1 Million” by enabling rapid innovation
• Approach: Create core technology spanning systems, networking, and machine learning
• Focus: Making the datacenter easier to manage to enable one person to Analyze, Deploy, Operate a scalable IT service

3 Jan 07 Announcements by Microsoft and Google
• Microsoft and Google race to build next-gen DCs
  – Microsoft announces a $550 million DC in TX
  – Google confirms plans for a $600 million site in NC
  – Google: two more DCs in SC; may cost another $950 million; about 150,000 computers each
• Internet DCs are the next computing platform
• Power availability drives deployment decisions

4 Datacenter is the Computer
• Google program == Web search, Gmail,…
• Google computer == Warehouse-sized facilities and workloads likely more common
  (Luiz Barroso’s talk at RAD Lab, 12/11/06)
• Sun Project Blackbox (10/17/06): Compose datacenter from 20 ft. containers!
  – Power/cooling for 200 KW
  – External taps for electricity, network, cold water
  – 250 Servers, 7 TB DRAM, or 1.5 PB disk in 2006
  – 20% energy savings
  – 1/10th? cost of a building

5 Datacenter Programming System
• Ruby on Rails: open source Web framework optimized for programmer happiness and sustainable productivity:
  – Convention over configuration
  – Scaffolding: automatic, Web-based UI to stored data
  – Program the client: write browser-side code in Ruby, compile to Javascript
  – “Duck Typing/Mix-Ins”
• Proven Expressiveness
  – Lines of code Java vs. RoR: 3:1
  – Lines of configuration Java vs. RoR: 10:1
• More than a fad
  – Java on Rails, Python on Rails, …
See web2.wsj2.com/ruby_on_rails_11_web_20_on_rocket_fuel.htm
See http://www.theserverside.com/news/thread.tss?thread_id=33120

6 Datacenter Synthesis + OS
• Synthesis: change DC via written specification
  – DC Spec Language compiled to logical configuration
• OS: allocate, monitor, adjust during operation
  – Director using machine learning, Drivers send commands
[Diagram labels: Synth, OS]

7 “System” Statistical Machine Learning
• S²ML Strengths
  – Handle SW churn: Train vs. write the logic
  – Beyond queuing models: Learns how to handle/make policy between steady states
  – Beyond control theory: Coping with complex cost functions
  – Discovery: Finding trends, needles in data haystack
  – Exploit cheap processing advances: fast enough to run online
• S²ML as an integral component of DC OS

8 Datacenter Monitoring
• S²ML needs data to analyze
• DC components come with sensors already
  – CPUs (performance counters)
  – Disks (SMART interface)
• Add sensors to software
  – Log files
  – D-trace for Solaris, Mac OS
• Trace 10K++ nodes within and between DCs
  – *Trace: App-oriented path recording framework
  – X-Trace: Cross-layer/-domain including network layer

9 Middleboxes in Today’s DC
• Middleboxes inserted on physical path
  – Policy via plumbing
  – Weakest link: 1 point of failure, bottleneck
  – Expensive to upgrade and introduce new functionality
• Identity-based Routing Layer: policy not plumbing to route classified packets to appropriate middlebox services
[Diagram labels: High Speed Network, load balancer, intrusion detector, firewall]

10 First Milestone: DC Energy Conservation
• DCs limited by power
  – For each dollar spent on servers, add $0.48 (2005)/$0.71 (2010) for power/cooling
  – $26B spent to power and cool servers in 2005 grows to $45B in 2010
• Attractive application of S²ML
  – Bringing processor resources on/off-line: Dynamic environment, complex cost function, measurement-driven decisions
• Preserve 100% Service Level Agreements
• Don’t hurt hardware reliability
• Then conserve energy
• Conserve energy and improve reliability
  – MTTF: stress of on/off cycle vs. benefits of off-hours

[...]

... point for startups

14 Why a New Funding Model?
• DARPA has exited long-term research in experimental computing systems
• NSF swamped with proposals, yielding even more conservative decisions
• Community emphasis on theoretical vs. experimental-oriented systems-building research
• Alternative: turn to Industry for funding
  – Opportunity to shape research agenda

15 New Funding Model
• 30 grad students + 5...

... of systems students to industry (and academia)
• National Academy study mentions Berkeley in 7 of 19 $1B+ industries from IT research, Stanford 4 times
• Timesharing (SDS 940), Client-Server Computing (BSD Unix), Graphics, Entertainment, Internet, LANs, Workstations (SUN), GUI, VLSI Design (Spice), RISC (ARM, MIPS, SPARC), Relational DB (Ingres/Postgres), Parallel DB, Data Mining, Parallel Computing, ...

... Berkeley & Stanford
• Swing for Fences (R, not r or D)
• Even if hit a single, train next generation of leaders
• Technology Transfer engine
  – Success = Train students to go forth & multiply
  – Publish everything, including source code
  – Ideal launching point for startups

29 Chance to Partner with a Great University
• Chance to Work on the “Next Great Thing”
• US News & World Report ranking of CS Systems universities:...
(Meta Object Programming)
  2. Scaffolding: automatic, Web-based, (pedestrian) User Interface to stored data
  3. Program the client (v 1.1): write browser-side code in Ruby, then compile to Javascript
  4. “Duck Typing/Mix-Ins”
     • Looks like string, responds like string, it’s a string!
     • Mix-in improvement over multiple inheritance

20 DC Monitoring
• Imagine a world where path information always passed along so...

... modern computing systems
• Draw on talented but inexperienced people
  – Pick from worldwide talent pool for students & faculty
  – Don’t know what they can’t do
• Inexpensive allows focus on speculative ideas
  – Mostly grad student salaries
  – Faculty part time
• Tech Transfer engine
  – Success = Train students to go forth and replicate
  – Promiscuous publication, including source code
  – Ideal launching point...

... by System Statistical Machine Learning
  – Virtual Machines and Network Storage for flexible resource allocation
  – Power reduction and reliability enhancement by fast powerdown/restart for processing nodes
  – Pervasive monitoring, tracing, simulation, workload generation for runtime analysis/operation

18 Discussion Points
• Jointly designed datacenter testbed
  – Mini-DC consisting of clusters, middleboxes,...

... Cross-layer
  – Include network and middleware services such as IP and LDAP
• Cross-domain
  – Multiple datacenters, composed services, overlays, mash-ups
  – Control to individual administrative domains
• “Network path” sensor
  – Put individual requests/responses, at different network layers, in the context of an end-to-end request

23 Actuator: Policy-based Routing Layer
• Assign ID to incoming packets (hash...
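
The “Duck Typing/Mix-Ins” item on the Ruby on Rails slide can be illustrated with a small sketch. It is written in Python rather than Ruby, so it is an analogy, not Rails code; the names `Hostname`, `ComparableByLen`, and `label` are hypothetical and do not come from the slides.

```python
class ComparableByLen:
    """Mix-in (hypothetical): adds ordering to any class that defines __len__."""

    def __lt__(self, other):
        return len(self) < len(other)


class Hostname(ComparableByLen):
    """Not a str subclass, but it responds to the string-like calls used below."""

    def __init__(self, value):
        self.value = value

    def upper(self):
        return self.value.upper()

    def __len__(self):
        return len(self.value)


def label(s):
    # Duck typing: no isinstance check. Anything that responds to
    # upper() and len() is treated as "a string" here.
    return f"{s.upper()} ({len(s)})"


if __name__ == "__main__":
    print(label("web1"))             # a real str
    print(label(Hostname("db1")))    # an object that merely behaves like one
    print(Hostname("db1") < Hostname("web10"))  # ordering comes from the mix-in
```

The mix-in contributes one behaviour (ordering) without `Hostname` having to inherit from a full second base class hierarchy, which is the sense in which the slide calls mix-ins an improvement over multiple inheritance.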
to > $1B IT industry from Research Start
National Research Council Computer Science & Telecommunications

Physical RAD Lab: Radical Collocation
• Innovation from spontaneous meetings of people with different areas of expertise
• Communication inversely proportional to distance
  – Almost never if > 100 feet or on different floor
• Everyone (including faculty) in open offices
• Great Meeting Rooms,...

... networking
  – Evaluation of existing network elements
  – Platform for investigating power reduction schemes in network elements
• Mutual information exchange
  – Network storage architecture
  – System Statistical Machine Learning

19 Ruby on Rails = DC PL
• Reasons to love Ruby on Rails
  1. Convention over Configuration
     • Rails framework feature enabled by Ruby language feature (Meta Object Programming)
  2...

... Bodik, Michael Armbrust, Kevin Canini, Armando Fox, Michael Jordan and David Patterson, 2007

27 RAD Lab 2.0 2nd Milestone: Killer Web 2.0 Apps
• Demonstrate RAD Lab vision of 1 person creating next great service and scale up
• Where get example great apps, given grad students creating the technology?
• Use “Undergraduate Computing Clubs” to create exciting apps in RoR using RAD Lab equipment, technology
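
The “Convention over Configuration” point on the Ruby on Rails slide, a framework feature enabled by meta-object programming, can be sketched as follows. This is a minimal Python analogy, not the Rails or ActiveRecord API: the `Model` base class, the `table_name` attribute, and the naive "append s" pluralization are all assumptions made for illustration.

```python
import re


class Model:
    """Hypothetical base class: derives a table name from the class name."""

    table_name = None

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Convention: only derive a name when the subclass did not
        # configure one explicitly in its own class body.
        if "table_name" not in cls.__dict__:
            # CamelCase -> snake_case, then a naive plural (assumption).
            snake = re.sub(r"(?<!^)(?=[A-Z])", "_", cls.__name__).lower()
            cls.table_name = snake + "s"


class User(Model):
    pass  # no configuration needed


class BlogPost(Model):
    pass


class Person(Model):
    table_name = "people"  # explicit configuration overrides the convention


if __name__ == "__main__":
    for model in (User, BlogPost, Person):
        print(model.__name__, "->", model.table_name)
```

The hook runs once per subclass at class-creation time, so the common case needs zero lines of configuration and the exception needs one, which is the trade-off behind the slide's lines-of-configuration comparison.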