Contents

一、 Product Introduction
  1.1 Product Overview
二、 Functional Tests
  2.1 Hadoop test
  2.2 Importing data into HBase
    2.2.1 Download the data-generation package
    2.2.2 Generate the data
    2.2.3 Upload the data to HDFS
    2.2.4 Import the HDFS data into HBase
  2.3 SQL syntax compatibility test
    2.3.1 Hive/HBase integration
    2.3.2 Create an HBase-backed table
    2.3.3 Insert data
  2.4 Hadoop test
    2.4.1 Write data
    2.4.2 Read data
    2.4.3 Clear the test data
  2.5 HBase PE test
    2.5.1 Sequential write (one million rows)
    2.5.2 Sequential read (one million rows)
    2.5.3 Random write (one million rows)
    2.5.4 Random read (one million rows)
  2.6 Hive test
    2.6.1 Basic Hive test
    2.6.2 Hive table-creation test
    2.6.3 Data query test
    2.6.4 Bucketing and partitioning test
    2.6.5 Index test
    2.6.6 Switching the execution engine to Tez
  2.7 Basic HBase test
  2.8 Phoenix test
  2.9 Spark test
    2.9.1 Spark-shell
    2.9.2 spark-shell on YARN
    2.9.3 Spark-sql
    2.9.4 spark-sql on YARN
    2.9.5 Spark in submit mode
  2.10 Docker test
    2.10.1 Search for available online images
    2.10.2 Pull an image from the official registry
  2.11 Flume test
    2.11.1 Create the data directory
    2.11.2 Create the configuration file
    2.11.3 Start Flume
    2.11.4 View the data
  2.12 Sqoop test
    2.12.1 Run Sqoop
    2.12.2 Import from MySQL into HDFS
    2.12.3 Copy a relational table's schema into Hive
    2.12.4 Import relational data into Hive
    2.12.5 Export HDFS data into MySQL
  2.13 Kafka test
  2.14 Zeppelin test
  2.15 Testing HBase with YCSB
    1. Introduction to YCSB
    2. Install YCSB
    3. Test HBase with YCSB
  2.16 HA test
    一、 Configure NameNode HA
    二、 Configure ResourceManager HA
    三、 Configure HBase Master
Customer Service

Copyright 北京红象云腾系统技术有限公司 (Beijing Redoop). All rights reserved. Beijing Redoop retains the final right of interpretation and amendment of this manual and of this statement. The copyright of this manual belongs to Beijing Redoop. Without Beijing Redoop's written permission, no part of this manual may be copied, excerpted, backed up, modified, distributed, translated into another language, or used in whole or in part for commercial purposes.

一、 Product Introduction

1.1 Product Overview

The Redoop CRH platform is the only big data platform in the industry that fully supports both domestic and international chips, covers five chip architectures, and provides FPGA/GPU hardware acceleration. Built entirely on open-source Apache Hadoop with a unified architecture (YARN), CRH is genuinely secure and reliable. Through countless iterations of testing, the platform has continuously refined its features, achieving optimal performance while guaranteeing enterprise-grade stability and reliability. CRH covers the full range of enterprise needs for static data analysis, helps enterprises accelerate the rollout of real-time analytics, data warehouses, machine learning, and data security, speeds up business decision-making, and provides reliable analytic capability for unlocking the value of data. CHINA REDOOP HYPERLOOP (CRH), meaning the "Redoop data high-speed rail", is a new generation of big data technology driven by distributed power, dedicated to speeding up China's IT systems.
二、 Functional Tests

2.1 Hadoop test

Create a test data file:

hdfs@Kylin:$ vi /demo
one 1
two 2
three 3

Upload the demo file to /tmp on HDFS:

hdfs@Kylin:$ hadoop fs -put /demo /tmp

Check that the demo file exists:

hdfs@Kylin:$ hadoop fs -ls /tmp
drwx------   - ambari-qa hdfs     0 2017-08-24 08:47 /tmp/ambari-qa
-rw-r--r--   3 hdfs      hdfs    27 2017-08-25 01:48 /tmp/demo
drwxr-xr-x   - hdfs      hdfs     0 2017-08-24 08:25 /tmp/entity-file-history
drwx-wx-wx   - hive      hdfs     0 2017-08-24 08:56 /tmp/hive
-rwxr-xr-x   3 hdfs      hdfs  2670 2017-08-24 08:22 /tmp/ida8c06100_date222417
-rwxr-xr-x   3 ambari-qa hdfs  2839 2017-08-24 08:57 /tmp/idtest.ambari-qa.1503536271.21.in
-rwxr-xr-x   3 ambari-qa hdfs   957 2017-08-24 08:57 /tmp/idtest.ambari-qa.1503536271.21.pig
drwx------   - ambari-qa hdfs     0 2017-08-24 08:39 /tmp/temp-1287162259
drwxr-xr-x   - ambari-qa hdfs     0 2017-08-24 08:40 /tmp/tezsmokeinput
drwxr-xr-x   - ambari-qa hdfs     0 2017-08-24 08:48 /tmp/tezsmokeoutput

Record the file's checksum:

hdfs@Kylin:$ md5sum demo > demo.md5

Delete the demo file:

hdfs@Kylin:$ rm -rf /tmp/demo

Check that the demo file is gone:

hdfs@Kylin:$ ls

Fetch the demo file back from HDFS:

hdfs@Kylin:$ hadoop fs -get /tmp/demo

The demo file has now been retrieved from HDFS:

hdfs@Kylin:$ ls
demo

The checksum matches, so the two files are identical:

hdfs@Kylin:$ md5sum -c demo.md5
demo: OK

Run hadoop-mapreduce-examples.jar:

hdfs@Kylin:/usr/crh/5.0.2.4-1136/hadoop-mapreduce$ hadoop jar hadoop-mapreduce-examples.jar wordcount /tmp/demo /output1
WARNING: Use yarn jar to launch YARN applications.
17/08/25 01:54:34 INFO impl.TimelineClientImpl: Timeline service address: http://kylin:8188/ws/v1/timeline/
17/08/25 01:54:34 INFO client.RMProxy: Connecting to ResourceManager at kylin/192.168.0.97:8050
17/08/25 01:54:36 INFO input.FileInputFormat: Total input paths to process : 1
17/08/25 01:54:36 INFO mapreduce.JobSubmitter: number of splits:1
17/08/25 01:54:36 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1503537271434_0002
17/08/25 01:54:37 INFO impl.YarnClientImpl: Submitted application application_1503537271434_0002
17/08/25 01:54:38 INFO mapreduce.Job: The url to track the job: http://Kylin:8088/proxy/application_1503537271434_0002/
17/08/25 01:54:38 INFO mapreduce.Job: Running job: job_1503537271434_0002
17/08/25 01:55:00 INFO mapreduce.Job: Job job_1503537271434_0002 running in uber mode : false
17/08/25 01:55:00 INFO mapreduce.Job: map 0% reduce 0%
17/08/25 01:55:11 INFO mapreduce.Job: map 100% reduce 0%
17/08/25 01:55:21 INFO mapreduce.Job: map 100% reduce 100%
17/08/25 01:55:22 INFO mapreduce.Job: Job job_1503537271434_0002 completed successfully
17/08/25 01:55:23 INFO mapreduce.Job: Counters: 49
  File System Counters
    FILE: Number of bytes read=81
    FILE: Number of bytes written=259199
    FILE: Number of read operations=0
    FILE: Number of large read operations=0
    FILE: Number of write operations=0
    HDFS: Number of bytes read=118
    HDFS: Number of bytes written=43
    HDFS: Number of read operations=6
    HDFS: Number of large read operations=0
    HDFS: Number of write operations=2
  Job Counters
    Launched map tasks=1
    Launched reduce tasks=1
    Data-local map tasks=1
    Total time spent by all maps in occupied slots (ms)=8376
    Total time spent by all reduces in occupied slots (ms)=16470
    Total time spent by all map tasks (ms)=8376
    Total time spent by all reduce tasks (ms)=8235
    Total vcore-milliseconds taken by all map tasks=8376
    Total vcore-milliseconds taken by all reduce tasks=8235
    Total megabyte-milliseconds taken by all map tasks=5360640
    Total megabyte-milliseconds taken by all reduce tasks=10540800
  Map-Reduce Framework
    Map input records=4
    Map output records=8
    Map output bytes=59
    Map output materialized bytes=81
    Input split bytes=91
    Combine input records=8
    Combine output records=8
    Reduce input groups=8
    Reduce shuffle bytes=81
    Reduce input records=8
    Reduce output records=8
    Spilled Records=16
    Shuffled Maps =1
    Failed Shuffles=0
    Merged Map outputs=1
    GC time elapsed (ms)=327
    CPU time spent (ms)=4890
    Physical memory (bytes) snapshot=397238272
    Virtual memory (bytes) snapshot=4828192768
    Total committed heap usage (bytes)=596115456
  Shuffle Errors
    BAD_ID=0
    CONNECTION=0
    IO_ERROR=0
    WRONG_LENGTH=0
    WRONG_MAP=0
    WRONG_REDUCE=0
  File Input Format Counters
    Bytes Read=27
  File Output Format Counters
    Bytes Written=43
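The wordcount job writes its result to the /output1 directory passed on the command line. As a quick check that is not part of the captured log, the output can be listed and printed back; the part-file name below is the usual MapReduce default and may differ on a given run:

hdfs@Kylin:$ hadoop fs -ls /output1
hdfs@Kylin:$ hadoop fs -cat /output1/part-r-00000    # word counts for the demo file (part-file name may vary)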
2.2 Importing data into HBase

2.2.1 Download the data-generation package

Copy the data-generation package RedoopTestData.jar to the root directory:

root@Kylin:$ ls
data RedoopTestData.jar

2.2.2 Generate the data

Generate the data for the three tables:

root@Kylin:/# java -cp RedoopTestData.jar DBGen -p ./data -b 1 -c 1 -t 1

2.2.3 Upload the data to HDFS

hdfs@Kylin:/$ hadoop fs -mkdir /data
hdfs@Kylin:/$ hadoop fs -put data/* /data
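Before (or after) uploading, it can be worth confirming what DBGen actually wrote locally. The commands below are a quick peek that is not part of the original log; they assume the generator lays the files out as data/<table>/<table>, which matches the HDFS paths used by the ImportTsv jobs later on:

root@Kylin:/# ls -lh data/books data/customers data/transactions    # sizes of the generated files
root@Kylin:/# head -n 3 data/books/books                            # first few pipe-delimited records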
Check HDFS; the three tables are now present:

hdfs@Kylin:$ hadoop fs -ls /data
Found 3 items
drwxr-xr-x   - hdfs hdfs 0 2017-08-25 10:15 /data/books
drwxr-xr-x   - hdfs hdfs 0 2017-08-25 10:16 /data/customers
drwxr-xr-x   - hdfs hdfs 0 2017-08-25 10:16 /data/transactions

2.2.4 Import the HDFS data into HBase

Create the tables in the HBase shell:

hbase(main):002:0> create 'books', {NAME => 'info', COMPRESSION => 'snappy'}
0 row(s) in 3.8830 seconds
=> Hbase::Table - books

hbase(main):007:0> create 'customers', {NAME => 'info', COMPRESSION => 'snappy'}
0 row(s) in 1.2640 seconds
=> Hbase::Table - customers

hbase(main):008:0> create 'transactions', {NAME => 'info', COMPRESSION => 'snappy'}
0 row(s) in 2.2700 seconds
=> Hbase::Table - transactions
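As an optional check that is not shown in the original log, the HBase shell can confirm that the three tables exist and that the 'info' column family carries the requested compression (the prompt counters below are illustrative):

hbase(main):001:0> list                 # should include books, customers and transactions
hbase(main):002:0> describe 'books'     # shows the 'info' family and its COMPRESSION setting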
Books table:

hdfs@Kylin:$ sudo -u hdfs hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.separator='|' -Dimporttsv.columns=HBASE_ROW_KEY,info:isbn,info:category,info:publish_date,info:publisher,info:price -Dimporttsv.bulk.output=/tmp/hbase/books books /data/books/books
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in jar:file:/usr/crh/5.0.2.4-1136/hadoop/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class
SLF4J: Found binding in jar:file:/usr/crh/5.0.2.4-1136/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
2017-08-26 13:33:39,743 INFO main mapreduce.Job: map 100% reduce 100%
2017-08-26 13:33:42,769 INFO main mapreduce.Job: Job job_1475035075157_0001 completed successfully
2017-08-26 13:33:44,121 INFO main mapreduce.Job: Counters: 50
  File System Counters
    FILE: Number of bytes read=6564638336
    FILE: Number of bytes written=9848373008
    FILE: Number of read operations=0
    FILE: Number of large read operations=0
    FILE: Number of write operations=0
    HDFS: Number of bytes read=1074660143
    HDFS: Number of bytes written=1206386815
    HDFS: Number of read operations=29
    HDFS: Number of large read operations=0
    HDFS: Number of write operations=3
  Job Counters
    Launched map tasks=8
    Launched reduce tasks=1
    Data-local map tasks=8
    Total time spent by all maps in occupied slots (ms)=1310071
    Total time spent by all reduces in occupied slots (ms)=1195138
    Total time spent by all map tasks (ms)=1310071
    Total time spent by all reduce tasks (ms)=597569
    Total vcore-milliseconds taken by all map tasks=1310071
    Total vcore-milliseconds taken by all reduce tasks=597569
    Total megabyte-milliseconds taken by all map tasks=1341512704
    Total megabyte-milliseconds taken by all reduce tasks=1223821312
  Map-Reduce Framework
    Map input records=15968981
    Map output records=15968981
    Map output bytes=3234412153
    Map output materialized bytes=3282319144
    Input split bytes=792
    Combine input records=15968981
    Combine output records=15968981
    Reduce input groups=15968981
    Reduce shuffle bytes=3282319144
    Reduce input records=15968981
    Reduce output records=79844905
    Spilled Records=47906943
    Shuffled Maps =8
    Failed Shuffles=0
    Merged Map outputs=8
    GC time elapsed (ms)=27277
    CPU time spent (ms)=731660
    Physical memory (bytes) snapshot=7252418560
    Virtual memory (bytes) snapshot=16257306624
    Total committed heap usage (bytes)=7138705408
  ImportTsv
    Bad Lines=0
  Shuffle Errors
    BAD_ID=0
    CONNECTION=0
    IO_ERROR=0
    WRONG_LENGTH=0
    WRONG_MAP=0
    WRONG_REDUCE=0
  File Input Format Counters
    Bytes Read=1074659351
  File Output Format Counters
    Bytes Written=1206386815
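Because -Dimporttsv.bulk.output is used, each ImportTsv run above only generates HFiles under /tmp/hbase/<table>; the rows are not visible in HBase until those HFiles are handed over to the region servers. The captured log does not include that step, but it is normally performed with the bulk-load utility that ships with HBase, roughly as follows (repeat per table; the paths here match the Books job above):

hdfs@Kylin:$ sudo -u hdfs hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /tmp/hbase/books books    # move the generated HFiles into the 'books' table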
Customers table:

hdfs@Kylin:$ sudo -u hdfs hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.separator='|' -Dimporttsv.columns=HBASE_ROW_KEY,info:name,info:data_of_birth,info:gender,info:state,info:email,info:phone -Dimporttsv.bulk.output=/tmp/hbase/customers customers /data/customers/customers
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in jar:file:/usr/crh/5.0.2.4-1136/hadoop/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class
SLF4J: Found binding in jar:file:/usr/crh/5.0.2.4-1136/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
2017-08-26 13:42:23,409 INFO main mapreduce.Job: map 100% reduce 100%
2017-08-26 13:42:27,433 INFO main mapreduce.Job: Job job_1475035075157_0002 completed successfully
2017-08-26 13:42:27,626 INFO main mapreduce.Job: Counters: 50
  File System Counters
    FILE: Number of bytes read=3079022558
    FILE: Number of bytes written=6159460878
    FILE: Number of read operations=0
    FILE: Number of large read operations=0
    FILE: Number of write operations=0
    HDFS: Number of bytes read=1074660192
    HDFS: Number of bytes written=1293085179
    HDFS: Number of read operations=29
    HDFS: Number of large read operations=0
    HDFS: Number of write operations=3
  Job Counters
    Launched map tasks=8
    Launched reduce tasks=1
    Data-local map tasks=8
    Total time spent by all maps in occupied slots (ms)=658668
    Total time spent by all reduces in occupied slots (ms)=687254
    Total time spent by all map tasks (ms)=658668
    Total time spent by all reduce tasks (ms)=343627
    Total vcore-milliseconds taken by all map tasks=658668
    Total vcore-milliseconds taken by all reduce tasks=343627
    Total megabyte-milliseconds taken by all map tasks=674476032
    Total megabyte-milliseconds taken by all reduce tasks=703748096
  Map-Reduce Framework
    Map input records=13353588
    Map output records=13353588
    Map output bytes=3038961746
    Map output materialized bytes=3079022558
    Input split bytes=856
    Combine input records=13353588
    Combine output records=13353588
    Reduce input groups=13353588
    Reduce shuffle bytes=3079022558
    Reduce input records=13353588
    Reduce output records=80121528
    Spilled Records=26707176
    Shuffled Maps =8
    Failed Shuffles=0
    Merged Map outputs=8
    GC time elapsed (ms)=28698
    CPU time spent (ms)=570690
    Physical memory (bytes) snapshot=7279644672
    Virtual memory (bytes) snapshot=16289914880
    Total committed heap usage (bytes)=7179075584
  ImportTsv
    Bad Lines=0
  Shuffle Errors
    BAD_ID=0
    CONNECTION=0
    IO_ERROR=0
    WRONG_LENGTH=0
    WRONG_MAP=0
    WRONG_REDUCE=0
  File Input Format Counters
    Bytes Read=1074659336
  File Output Format Counters
    Bytes Written=1293085179
Transactions table:

hdfs@Kylin:$ sudo -u hdfs hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.separator='|' -Dimporttsv.columns=HBASE_ROW_KEY,info:customer_id,info:book_id,info:quantity,info:transaction_date -Dimporttsv.bulk.output=/tmp/hbase/transactions transactions /data/transactions/transactions
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in jar:file:/usr/crh/5.0.2.4-1136/hadoop/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class
SLF4J: Found binding in jar:file:/usr/crh/5.0.2.4-1136/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
2017-08-26 13:53:45,291 INFO main mapreduce.Job: map 100% reduce 100%
2017-08-26 13:53:50,332 INFO main mapreduce.Job: Job job_1475035075157_0003 completed successfully
2017-08-26 13:53:50,467 INFO main mapreduce.Job: Counters: 50
  File System Counters
    FILE: Number of bytes read=8011341058
    FILE: Number of bytes written=12018427388
    FILE: Number of read operations=0
    FILE: Number of large read operations=0
    FILE: Number of write operations=0
    HDFS: Number of bytes read=1074660264
    HDFS: Number of bytes written=1513797552
    HDFS: Number of read operations=29
    HDFS: Number of large read operations=0
    HDFS: Number of write operations=3
  Job Counters
    Launched map tasks=8
    Launched reduce tasks=1
    Data-local map tasks=8
    Total time spent by all maps in occupied slots (ms)=1095666
    Total time spent by all reduces in occupied slots (ms)=1012006
    Total time spent by all map tasks (ms)=1095666
    Total time spent by all reduce tasks (ms)=506003
    Total vcore-milliseconds taken by all map tasks=1095666
    Total vcore-milliseconds taken by all reduce tasks=506003
    Total megabyte-milliseconds taken by all map tasks=1121961984
    Total megabyte-milliseconds taken by all reduce tasks=1036294144
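The captured counters end here. Once the HFiles for all three tables have been bulk-loaded (see the note after the Books import), a quick verification pass can be run with standard HBase tools; the commands below are illustrative and not part of the original log (shell prompt counters are arbitrary):

hdfs@Kylin:$ sudo -u hdfs hbase org.apache.hadoop.hbase.mapreduce.RowCounter transactions    # MapReduce job that counts rows in the 'transactions' table
hbase(main):001:0> count 'books', INTERVAL => 1000000                                        # shell-side count, reporting every 1,000,000 rows
hbase(main):002:0> scan 'customers', {LIMIT => 5}                                            # spot-check a few imported rows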