Ronaldo Ama:Future of big data analytics.pdf

Uploaded by: 来看看 | Document ID: 3330892 | Uploaded: 2019-08-13 | Format: PDF | Pages: 12 | Size: 1,008.65 KB


Big Data in the Cloud
Ronaldo Am
VP, R&D, Data Services
VMware, Inc
2009 VMware Inc. All rights reserved

Slide 2
  The answer is
  What was your question?
  Well, this is a Hadoop show after all

Slide 3: Big Data Landscape
  ETL (Informatica, Talend, Spring Integration)
  Real Time Streams (Social, sensors)
  Structured and Unstructured Data (HDFS, MapR)
  Real Time Database (Shark, GemFire, HBase, Cassandra)
  Interactive Analytics (Impala, Greenplum, AsterData, Netezza)
  Batch Processing (Map-Reduce)
  Real-Time Processing (S4, Storm, Spark)
  Data Visualization (Excel, Tableau)
  Machine Learning (Mahout, etc.)
  Cloud Infrastructure: Compute, Storage, Networking

Slide 4: Technology Stack
  Hadoop: batch analysis
  HDFS
  HBase: real-time queries
  NoSQL: Cassandra, Mongo, etc.
  Big SQL: Impala
  Other: Spark, Shark, Solr, Platfora, etc.
  Layers: Compute layer, Data layer
  Cloud Infrastructure: Compute, Storage, Networking
  [Diagram: a row of hosts under "Some sort of distributed resource management OS + Filesystem"]

Slide 5: Why Virtualize Hadoop
  Elasticity & Multi-tenancy
    Shrink and expand cluster on demand
    Independent scaling of compute and data
    Strong multi-tenancy
  High Availability
    High availability for the entire Hadoop stack
    One click to set up
    Proven solution
  Operational Simplicity
    Rapid deployment, cloning
    Unified life-cycle management
    Easy to configure/reconfigure

Slide 6: Common Infrastructure for Big Data
  Single-purpose clusters for various business applications lead to cluster sprawl.
  Simplify
    Single hardware infrastructure
    Unified operations
  Optimize
    Shared resources = higher utilization
    Elastic resources = faster on-demand access
  [Diagram: "Cluster Sprawling" (separate MPP DB, Hadoop, and HBase clusters) versus "Cluster Consolidation" (MPP DB, Hadoop, and HBase on one Virtualization Platform)]

Slide 7: Mixing Workloads: Three Big Types of Isolation Are Required
  Resource Isolation
    Control the greedy noisy neighbor
    Reserve resources to meet needs
  Version Isolation
    Allow concurrent OS, App, Distro versions
  Security Isolation
    Provide privacy between users/groups
    Runtime and data privacy required
  [Diagram: a row of hosts under "Some sort of distributed resource management OS + Filesystem"]

Slide 8: Virtual Storage Architecture: Include Local Disk
  Shared Storage: SAN or NAS
    Easy to provision
    Automated cluster rebalancing
    Leverage vMotion/HA/FT
  Local Storage: Local Disks
    Local disk for Hadoop
    Scalable bandwidth, lower cost/GB
  [Diagram: hosts running a mix of Hadoop VMs and other VMs, backed by shared storage and local storage]

Slide 9: Hadoop Runs Well on Virtualization
  [Chart: elapsed time in seconds (lower is better) for TeraGen, TeraSort, and TeraValidate; series: Native, 1 VM, 2 VMs, 4 VMs; the bar values did not survive extraction]
  Source: http:/

Slide 10: Project Serengeti
  Open source project launched in June 2012, meta-updates released on a regular schedule (3-month intervals)
  Toolkit that leverages virtualization to simplify Hadoop deployment and operations
  Commercial support via Data Director
  Deploy a Hadoop cluster in 10 minutes
  Customize the Hadoop cluster
  Use your favorite Hadoop distribution
  One-stop command center

Slide 11: Hadoop Resources
  Download and try Serengeti: projectserengeti.org
  Commercial support via Data Director: platform/vfabric-data-director/overview.html
  VMware Hadoop site
  Hadoop performance on vSphere: Performance-vSphere5.pdf
  Hadoop High Availability solution: VMware-HA-solution.pdf

THANK YOU!
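
As a rough illustration of the resource-isolation bullets on slide 7 ("Control the greedy noisy neighbor", "Reserve resources to meet needs"), the sketch below models one possible reservation-based allocation policy in Python. It is not taken from the slides and does not describe how vSphere or Serengeti actually schedule resources; the names `Tenant` and `allocate` and the even split of spare capacity are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Tenant:
    name: str
    reservation: float  # capacity guaranteed to this tenant (hypothetical unit)
    demand: float       # capacity the tenant is currently asking for


def allocate(capacity: float, tenants: list[Tenant]) -> dict[str, float]:
    """Grant reservations first, then share spare capacity among unmet demand."""
    if sum(t.reservation for t in tenants) > capacity:
        raise ValueError("reservations exceed capacity")

    # Step 1: each tenant gets up to its reservation, never more than it asks for.
    grant = {t.name: min(t.demand, t.reservation) for t in tenants}
    spare = capacity - sum(grant.values())

    # Step 2: split spare capacity evenly among tenants that still want more,
    # repeating until the spare is used up or all demand is met.
    unmet = {t.name: t.demand - grant[t.name] for t in tenants}
    while spare > 1e-9:
        hungry = [name for name, need in unmet.items() if need > 1e-9]
        if not hungry:
            break
        share = spare / len(hungry)
        for name in hungry:
            extra = min(share, unmet[name])
            grant[name] += extra
            unmet[name] -= extra
            spare -= extra
    return grant


if __name__ == "__main__":
    cluster = [
        Tenant("hadoop", reservation=40, demand=500),  # the greedy neighbor
        Tenant("hbase", reservation=30, demand=30),
        Tenant("mpp_db", reservation=30, demand=10),
    ]
    for name, amount in allocate(100, cluster).items():
        print(f"{name}: {amount:.1f}")
```

In this toy model the greedy Hadoop tenant can only absorb capacity that the other tenants are not using, so HBase still receives its full reservation even though Hadoop asks for far more than the cluster holds.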
