Tools Used by Large Websites


Perlbal - http:/
Load balancing across multiple web servers.

MogileFS - http:/
Distributed file system. Some companies consider MogileFS better suited than Hadoop for handling small files.

memcached - http://memcached.org/
A kind of shared memory: keeps the database, or any other data that is read frequently, in an in-memory cache.

Moxi - http:/
A proxy for memcached.

More resources: http:/ http:/

How were web services scaled up in the past?
Source: http:/ http:/ http:/ http:/

Intro
王耀聰, 陳威宇
jazz@nchc.org.tw, waue@nchc.org.tw
Training course

HBase is a distributed column-oriented database built on top of HDFS.

HBase is...
- A distributed data store that can scale horizontally to thousands of commodity servers and petabytes of indexed storage.
- Designed to operate on top of the Hadoop Distributed File System (HDFS) or the Kosmos File System (KFS, aka CloudStore) for scalability, fault tolerance, and high availability.
- Integrated into the Hadoop MapReduce platform and paradigm.

Benefits
- Distributed storage
- Table-like data structure (a multi-dimensional map)
- High scalability
- High availability
- High performance

Who uses HBase
- Adobe: internal use (structured data)
- Kalooga: image search engine, http:/
- Meetup: social meetup site, http:/
- Streamy: successfully migrated from MySQL to HBase, http:/
- Trend Micro: cloud-based antivirus scanning architecture, http:/
- Yahoo!: stores document fingerprints to avoid duplicates, http:/
- More: http://wiki.apache.org/hadoop/Hbase/PoweredBy

Backdrop
Started by Chad Walters and Jim.
- 2006.11: Google releases its paper on Bigtable.
- 2007.2: Initial HBase prototype created as a Hadoop contrib module.
- 2007.10: First usable HBase.
- 2008.1: Hadoop becomes an Apache top-level project and HBase becomes a subproject.
- 2008.10: HBase 0.18 and 0.19 released.

HBase Is Not...
- Tables have one primary index, the row key.
- No join operators.
- Scans and queries can select a subset of available columns, perhaps by using a wildcard.
- There are three types of lookups:
  - Fast lookup using the row key and an optional timestamp
  - Full table scan
  - Range scan from region start to end

HBase Is Not... (2)
- Limited atomicity and transaction support: HBase supports multiple batched mutations of single rows only.
- Data is unstructured and untyped.
- Not accessed or manipulated via SQL:
  - Programmatic access via Java, REST, or Thrift APIs
  - Scripting via JRuby

Why Bigtable?
- RDBMS performance is good for transaction processing, but for very large-scale analytic processing the solutions are commercial, expensive, and specialized.
- Very large-scale analytic processing means:
  - Big queries, typically range or table scans
  - Big databases (100s of TB)

Why Bigtable? (2)
- MapReduce on Bigtable, optionally with Cascading on top to support some relational algebra, may be a cost-effective solution.
- Sharding is not a solution for scaling open-source RDBMS platforms:
  - It is application-specific.
  - (Re)partitioning is labor-intensive.

Why HBase?
- HBase is a Bigtable clone.
- It is open source.
- It has a good community and promise for the future.
- It is developed on top of, and integrates well with, the Hadoop platform, which helps if you are using Hadoop already.
- It has a Cascading connector.

HBase benefits over RDBMS
- No real indexes
- Automatic partitioning
- Scales linearly and automatically with new nodes
- Commodity hardware
- Fault tolerance
- Batch processing

Data Model
- Tables are sorted by row.
- A table schema only defines its column families.
  - Each family consists of any number of columns.
  - Each column consists of any number of versions.
  - Columns only exist when inserted; NULLs are free.
  - Columns within a family are sorted and stored together.
- Everything except table names is byte[].
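The data model above can be pictured as a sorted, multi-dimensional map from (row, family:qualifier, timestamp) to value. The following is a minimal in-memory Python sketch of that idea, not HBase code: the class ToyTable and its put, get and scan methods are hypothetical names chosen for illustration, and the sketch ignores regions, persistence and distribution entirely.

```python
from bisect import bisect_left
from collections import defaultdict


class ToyTable:
    """Toy in-memory model of (row, family:qualifier, timestamp) -> value.

    Hypothetical illustration only: ToyTable, put, get and scan are not
    the HBase client API. Keys and values are bytes, mirroring byte[].
    """

    def __init__(self, families):
        # The schema only declares column families; qualifiers are free-form.
        self.families = set(families)
        # row -> b"family:qualifier" -> {timestamp: value}
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, column, value, ts):
        family = column.split(b":", 1)[0]
        if family not in self.families:
            raise KeyError(f"unknown column family: {family!r}")
        # Only inserted cells are stored, so absent columns ("NULLs") cost nothing.
        self.rows[row][column][ts] = value

    def get(self, row, column, versions=1):
        # .get() on the defaultdicts does not create empty entries.
        cells = self.rows.get(row, {}).get(column, {})
        # Return up to `versions` values, newest timestamp first.
        newest = sorted(cells, reverse=True)[:versions]
        return [cells[ts] for ts in newest]

    def scan(self, start_row, stop_row=None):
        # Rows are visited in byte order; stop_row is exclusive.
        keys = sorted(self.rows)
        for row in keys[bisect_left(keys, start_row):]:
            if stop_row is not None and row >= stop_row:
                break
            # Report the latest version of every column present in this row.
            yield row, {col: cells[max(cells)] for col, cells in self.rows[row].items()}


table = ToyTable([b"data"])
table.put(b"row1", b"data:1", b"value1", ts=1)
table.put(b"row1", b"data:1", b"value1-new", ts=2)
table.put(b"row10", b"data:1", b"value10", ts=1)
table.put(b"row2", b"data:2", b"value2", ts=1)

print(table.get(b"row1", b"data:1", versions=2))  # → [b'value1-new', b'value1']
print([row for row, _ in table.scan(b"row1", b"row2")])  # → [b'row1', b'row10']
```

The scan from b"row1" to b"row2" also returns b"row10", because row keys are compared as raw bytes rather than as numbers; this is why the choice of row key decides which rows end up stored next to each other.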

(Row, Family:Column, Timestamp) → Value
[Diagram labels: row key, column family, timestamp, value]

Members
- Master
  - Responsible for monitoring region servers
  - Load balancing for regions
  - Redirects clients to the correct region servers
  - Currently a single point of failure (SPOF)
- Region servers (slaves)
  - Serve client requests (write/read/scan)
  - Send heartbeats to the master
  - Throughput and the number of regions scale with the number of region servers

Regions
- A table consists of one or more regions.
- A region is specified by its startKey and endKey.
- Each region may reside on several different nodes and is made up of several HDFS files and blocks; replication of these is handled by Hadoop.

Case study: a blog
Logical data model
- A blog entry consists of the fields title, date, author, type, and text.
- A user consists of fields such as username and password.
- Each blog entry can have many comments.
- Each comment consists of title, author, and text.
[Figure: ERD]

Blog HBase table schema
- The row key combines the type (a two-character abbreviation) with the timestamp, so rows are sorted first by type and then by timestamp. This makes it easy to read the table's data with scan().
- The one-to-many relationship between BLOGENTRY and COMMENT is represented by a dynamic number of columns in the column families comment_title, comment_author, and comment_text. Each column is named by the timestamp of its comment, so the columns in each column family are automatically sorted by time.

Architecture
[Diagram: HBase architecture]

ZooKeeper
HBase depends on ZooKeeper (Chapter 13), and by default it manages a ZooKeeper instance as the authority on cluster state.

Operation
- The -ROOT- table holds the list of .META. table regions.
- The .META. table holds the list of all user-space regions.

Installation (1)
  $ wget http:/
  $ sudo tar -zxvf hbase-*.tar.gz -C /opt/
  $ sudo ln -sf /opt/hbase-0.20.3 /opt/hbase
  $ sudo chown -R $USER:$USER /opt/hbase
  $ sudo mkdir /var/hadoop/
  $ sudo chmod 777 /var/hadoop
Start Hadoop.

Setup (1)
Edit hbase-env.sh:
  $ vim /opt/hbase/conf/hbase-env.sh
  export JAVA_HOME=/usr/lib/jvm/java-6-sun
  export HADOOP_CONF_DIR=/opt/hadoop/conf
  export HBASE_HOME=/opt/hbase
  export HBASE_LOG_DIR=/var/hadoop/hbase-logs
  export HBASE_PID_DIR=/var/hadoop/hbase-pids
  export HBASE_MANAGES_ZK=true
  export HBASE_CLASSPATH=$HBASE_CLASSPATH:/opt/hadoop/conf
Copy the Hadoop configuration files into the HBase conf directory:
  $ cd /opt/hbase/conf
  $ cp /opt/hadoop/conf/core-site.xml ./
  $ cp /opt/hadoop/conf/hdfs-site.xml ./
  $ cp /opt/hadoop/conf/mapred-site.xml ./

Setup (2)
[Configuration property entries: name, value]

Startup & Stop
Start/stop everything:
  $ bin/start-hbase.sh
  $ bin/stop-hbase.sh
Start/stop individual daemons:
  $ bin/hbase-daemon.sh start/stop zookeeper
  $ bin/hbase-daemon.sh start/stop master
  $ bin/hbase-daemon.sh start/stop regionserver
  $ bin/hbase-daemon.sh start/stop thrift
  $ bin/hbase-daemon.sh start/stop rest

Testing (4)
  $ hbase shell
  create 'test', 'data'
  0 row(s) in 4.3066 seconds
  list
  test
  1 row(s) in 0.1485 seconds
  put 'test', 'row1', 'data:1', 'value1'
  0 row(s) in 0.0454 seconds
  put 'test', 'row2', 'data:2', 'value2'
  0 row(s) in 0.0035 seconds
  put 'test', 'row3', 'data:3', 'value3'
  0 row(s) in 0.0090 seconds
  scan 'test'
  ROW    COLUMN+CELL
  row1   column=data:1, timestamp=1240148026198, value=value1
  row2   column=data:2, timestamp=1240148040035, value=value2
  row3   column=data:3, timestamp=1240148047497, value=value3
  3 row(s) in 0.0825 seconds
  disable 'test'
  09/04/19 06:40:13 INFO client.HBaseAdmin: Disabled test
  0 row(s) in 6.0426 seconds
  drop 'test'
  09/04/19 06:40:17 INFO client.HBaseAdmin: Deleted test
  0 row(s) in 0.0210 seconds
  list
  0 row(s) in 2.0645 seconds

Connecting to HBase
- Java client
  get(byte[] row, byte[] column, long timestamp, int versions);
- Non-Java clients
  - Thrift server hosting an HBase client instance; sample Ruby, C++, and Java (via Thrift) clients
  - REST server hosting an HBase client
- TableInputFormat/TableOutputFormat for MapReduce: HBase as a MapReduce source or sink
- HBase Shell
  - JRuby IRB with a "DSL" that adds get, scan, and admin commands
  - ./bin/hbase shell YOUR_SCRIPT

Thrift
- A software framework for scalable cross-language services development, created by Facebook.
- Works seamlessly between C++, Java, Python, PHP, and Ruby.
- The command below starts the server instance, by default on port 9090.
- A similar project: "rest".
  $ hbase-daemon.sh start thrift
  $ hbase-daemon.sh stop thrift

References
- HBase 介紹 (HBase Introduction), http://www.wretch.cc/blog/trendnop09/21192672
- Hadoop: The Definitive Guide, by Tom White
- HBase Architecture 101, http:/
