【Part 1】Source code for the book 《深入理解大数据》 and related references
http://download.csdn.net/detail/heming621/9423291
http://hadoop.apache.org/
https://www.zhihu.com/question/19795366
http://mooc.guokr.com/course/2194/%E5%A4%A7%E6%95%B0%E6%8D%AE%E7%B3%BB%E7%BB%9F%E5%9F%BA%E7%A1%80/
http://download.csdn.net/album/detail/3466/1/1
【Part 2】Installing Hadoop 1.2.1
【1】Install the Java JDK
The package jdk-6u45-linux-i586-rpm.rar unpacks to jdk-6u45-linux-i586-rpm.bin. Run the installer:
./jdk-6u45-linux-i586-rpm.bin
After a successful install, the JDK directory is /usr/java/jdk1.6.0_45:
A22811459:/usr/java/jdk1.6.0_45 # pwd
/usr/java/jdk1.6.0_45
A22811459:/usr/java/jdk1.6.0_45 # ls
COPYRIGHT  LICENSE  README.html  THIRDPARTYLICENSEREADME.txt  bin  include  jre  lib  man  src.zip
【1.2】Add the Java paths to /etc/profile so the tools can be invoked from anywhere:
#set java
export JAVA_HOME=/usr/java/jdk1.6.0_45
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib
export PATH=$PATH:$JAVA_HOME/bin
【1.3】Apply the new settings:
# source /etc/profile
【1.4】Check the Java version; output like this confirms the install:
A22811459:/usr/java/jdk1.6.0_45 # java -version
java version "1.6.0_45"
Java(TM) SE Runtime Environment (build 1.6.0_45-b06)
Java HotSpot(TM) Server VM (build 20.45-b01, mixed mode)
【1.5】Optionally, write, compile, and run a trivial Java program to make doubly sure the install works. HelloWel.java:

public class HelloWel {
    public static void main(String[] args) {
        System.out.println("JAVA OK");
    }
}

Compile and run:
# javac HelloWel.java
# java HelloWel
JAVA OK
Java is now definitely working, and the Java path (needed later) is /usr/java/jdk1.6.0_45.
【2】Install Hadoop 1.2.1 (following 《深入理解大数据》)
【2.1】Create a hadoop user:
# groupadd hadoop-user
# useradd -g hadoop-user hadoop
# passwd hadoop
【2.2】Set up passwordless SSH (copying the public key into authorized_keys lets ssh localhost log in without prompting for a password):
# ssh-keygen -t rsa
# cd /root/.ssh/
# cp id_rsa.pub authorized_keys
# ssh localhost
Check the result:
# ls
authorized_keys  id_rsa  id_rsa.pub  known_hosts
【2.3】Set up the Hadoop environment
The Hadoop release is hadoop-1.2.1.tar.gz; after extraction, the directory is /home/longhui/hadoop/hadoop-1.2.1/.
【2.3.1】In conf/hadoop-env.sh, set JAVA_HOME to the path from step 【1】:
export JAVA_HOME=/usr/java/jdk1.6.0_45
【2.3.2】Configure the three XML files.
【1】core-site.xml:
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/tmp/hadoop</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://A22811459:9000</value>
  </property>
</configuration>
Note: the temporary directory is /tmp/hadoop. Once everything is set up and running, it will contain two subdirectories, dfs and mapred, and /tmp will also hold a few pid files:
A22811459:/tmp # ls
hadoop/                    hadoop-root-jobtracker.pid    hadoop-root-secondarynamenode.pid
hadoop-root-datanode.pid   hadoop-root-namenode.pid      hadoop-root-tasktracker.pid
【2】hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/home/longhui/hadoop/dfs/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/home/longhui/hadoop/dfs/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
Note: once set up, /home/longhui/hadoop/dfs/name will contain current, image, in_use.lock, and previous.checkpoint, while /home/longhui/hadoop/dfs/data will contain blocksBeingWritten, current, detach, in_use.lock, storage, and tmp.
【3】mapred-site.xml:
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>A22811459:9001</value>
  </property>
  <property>
    <name>mapreduce.cluster.local.dir</name>
    <value>/home/longhui/hadoop/mapred/local</value>
  </property>
  <property>
    <name>mapreduce.jobtracker.system.dir</name>
    <value>/home/longhui/hadoop/mapred/system</value>
  </property>
</configuration>
【4】Because the hostname here is A22811459 rather than localhost, /etc/hosts must be updated accordingly:
127.0.0.1       A22811459
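Before going further, it can be worth checking that these files are actually picked up. Below is a minimal sketch, with PrintConf as a hypothetical class name not taken from the book; it assumes hadoop-core-1.2.1.jar and the conf/ directory are on the classpath, and prints the values the three files resolve to (a plain Configuration loads only core-site.xml by default, so the other two are added explicitly):

import org.apache.hadoop.conf.Configuration;

public class PrintConf {
    public static void main(String[] args) {
        Configuration conf = new Configuration();  // loads core-site.xml from the classpath
        conf.addResource("hdfs-site.xml");         // not loaded automatically by plain Configuration
        conf.addResource("mapred-site.xml");
        System.out.println("fs.default.name    = " + conf.get("fs.default.name"));
        System.out.println("hadoop.tmp.dir     = " + conf.get("hadoop.tmp.dir"));
        System.out.println("dfs.replication    = " + conf.get("dfs.replication"));
        System.out.println("mapred.job.tracker = " + conf.get("mapred.job.tracker"));
    }
}

Compile with javac -classpath $HADOOP_HOME/hadoop-core-1.2.1.jar PrintConf.java, then run with $HADOOP_HOME/conf and the jars under $HADOOP_HOME/lib also on the classpath; each printed value should match what was typed into the XML above.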
【2.3.3】Add the Hadoop paths to /etc/profile and apply them with # source /etc/profile:
#set hadoop
export HADOOP_HOME_WARN_SUPPRESS=1
export HADOOP_HOME=/home/longhui/hadoop/hadoop-1.2.1
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
【2.3.4】Format the HDFS filesystem
Run bin/hadoop namenode -format (or simply hadoop namenode -format, now that it is on the PATH) and answer Y when prompted:
# hadoop namenode -format
16/12/15 12:59:50 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = A22811459/127.0.0.1
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 1.2.1
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 1503152; compiled by 'mattf' on Mon Jul 22 15:23:09 PDT 2013
STARTUP_MSG:   java = 1.6.0_45
************************************************************/
Re-format filesystem in /home/longhui/hadoop/dfs/name ? (Y or N) Y
16/12/15 12:59:52 INFO util.GSet: Computing capacity for map BlocksMap
16/12/15 12:59:52 INFO util.GSet: VM type       = 32-bit
16/12/15 12:59:52 INFO util.GSet: 2.0% max memory = 932118528
16/12/15 12:59:52 INFO util.GSet: capacity      = 2^22 = 4194304 entries
16/12/15 12:59:52 INFO util.GSet: recommended=4194304, actual=4194304
16/12/15 12:59:53 INFO namenode.FSNamesystem: fsOwner=root
16/12/15 12:59:53 INFO namenode.FSNamesystem: supergroup=supergroup
16/12/15 12:59:53 INFO namenode.FSNamesystem: isPermissionEnabled=true
16/12/15 12:59:53 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
16/12/15 12:59:53 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
16/12/15 12:59:53 INFO namenode.FSEditLog: dfs.namenode.edits.toleration.length = 0
16/12/15 12:59:53 INFO namenode.NameNode: Caching file names occuring more than 10 times
16/12/15 12:59:53 INFO common.Storage: Image file /home/longhui/hadoop/dfs/name/current/fsimage of size 110 bytes saved in 0 seconds.
16/12/15 12:59:53 INFO namenode.FSEditLog: closing edit log: position=4, editlog=/home/longhui/hadoop/dfs/name/current/edits
16/12/15 12:59:53 INFO namenode.FSEditLog: close success: truncate to 4, editlog=/home/longhui/hadoop/dfs/name/current/edits
16/12/15 12:59:53 INFO common.Storage: Storage directory /home/longhui/hadoop/dfs/name has been successfully formatted.
16/12/15 12:59:53 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at A22811459/127.0.0.1
************************************************************/
Note: if you see the warning "Warning: $HADOOP_HOME is deprecated.", add the line below to /etc/profile, apply it with # source /etc/profile, and rerun bin/hadoop namenode -format; the warning will be gone:
export HADOOP_HOME_WARN_SUPPRESS=1
【2.3.5】Start the Hadoop daemons (to stop them later, use stop-all.sh):
# start-all.sh
starting namenode, logging to /home/longhui/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-root-namenode-A22811459.out
localhost: starting datanode, logging to /home/longhui/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-root-datanode-A22811459.out
localhost: starting secondarynamenode, logging to /home/longhui/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-root-secondarynamenode-A22811459.out
starting jobtracker, logging to /home/longhui/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-root-jobtracker-A22811459.out
localhost: starting tasktracker, logging to /home/longhui/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-root-tasktracker-A22811459.out
【2.3.6】Check the cluster state with jps. Besides the Jps process itself, all five daemon processes must be present; output like the following means the cluster started correctly:
# jps
2352 TaskTracker
1940 DataNode
1802 NameNode
2465 Jps
2211 JobTracker
2106 SecondaryNameNode
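Beyond jps, two more checks are possible at this point. Hadoop 1.x serves web interfaces, normally at http://A22811459:50070/ for the NameNode and http://A22811459:50030/ for the JobTracker. The HDFS Java API can also exercise the NameNode directly; here is a minimal sketch (ListRoot is an illustrative name, not from the book) that connects to the fs.default.name address configured above and lists the filesystem root:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ListRoot {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();   // reads fs.default.name from core-site.xml
        FileSystem fs = FileSystem.get(conf);       // connects to hdfs://A22811459:9000
        System.out.println("Connected to " + fs.getUri());
        for (FileStatus s : fs.listStatus(new Path("/"))) {
            System.out.println((s.isDir() ? "d " : "- ") + s.getPath());
        }
        fs.close();
    }
}

If this prints the HDFS URI without an exception, the NameNode is up and reachable on port 9000.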
【3】Run the first built-in example: estimating the value of Pi
A22811459:/home/longhui/hadoop/hadoop-1.2.1 # hadoop jar hadoop-examples-1.2.1.jar pi 2 5
Number of Maps  = 2
Samples per Map = 5
Wrote input for Map #0
Wrote input for Map #1
Starting Job
16/12/15 14:06:04 INFO mapred.FileInputFormat: Total input paths to process : 2
16/12/15 14:06:04 INFO mapred.JobClient: Running job: job_201612151254_0001
16/12/15 14:06:05 INFO mapred.JobClient:  map 0% reduce 0%
16/12/15 14:06:10 INFO mapred.JobClient:  map 100% reduce 0%
16/12/15 14:06:18 INFO mapred.JobClient:  map 100% reduce 33%
16/12/15 14:06:19 INFO mapred.JobClient:  map 100% reduce 100%
16/12/15 14:06:19 INFO mapred.JobClient: Job complete: job_201612151254_0001
16/12/15 14:06:19 INFO mapred.JobClient: Counters: 30
16/12/15 14:06:19 INFO mapred.JobClient:   Job Counters
16/12/15 14:06:19 INFO mapred.JobClient:     Launched reduce tasks=1
16/12/15 14:06:19 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=6864
16/12/15 14:06:19 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
16/12/15 14:06:19 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
16/12/15 14:06:19 INFO mapred.JobClient:     Launched map tasks=2
16/12/15 14:06:19 INFO mapred.JobClient:     Data-local map tasks=2
16/12/15 14:06:19 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=8661
16/12/15 14:06:19 INFO mapred.JobClient:   File Input Format Counters
16/12/15 14:06:19 INFO mapred.JobClient:     Bytes Read=236
16/12/15 14:06:19 INFO mapred.JobClient:   File Output Format Counters
16/12/15 14:06:19 INFO mapred.JobClient:     Bytes Written=97
16/12/15 14:06:19 INFO mapred.JobClient:   FileSystemCounters
16/12/15 14:06:19 INFO mapred.JobClient:     FILE_BYTES_READ=50
16/12/15 14:06:19 INFO mapred.JobClient:     HDFS_BYTES_READ=478
16/12/15 14:06:19 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=160889
16/12/15 14:06:19 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=215
16/12/15 14:06:19 INFO mapred.JobClient:   Map-Reduce Framework
16/12/15 14:06:19 INFO mapred.JobClient:     Map output materialized bytes=56
16/12/15 14:06:19 INFO mapred.JobClient:     Map input records=2
16/12/15 14:06:19 INFO mapred.JobClient:     Reduce shuffle bytes=56
16/12/15 14:06:19 INFO mapred.JobClient:     Spilled Records=8
16/12/15 14:06:19 INFO mapred.JobClient:     Map output bytes=36
16/12/15 14:06:19 INFO mapred.JobClient:     Total committed heap usage (bytes)=377028608
16/12/15 14:06:19 INFO mapred.JobClient:     CPU time spent (ms)=3100
16/12/15 14:06:19 INFO mapred.JobClient:     Map input bytes=48
16/12/15 14:06:19 INFO mapred.JobClient:     SPLIT_RAW_BYTES=242
16/12/15 14:06:19 INFO mapred.JobClient:     Combine input records=0
16/12/15 14:06:19 INFO mapred.JobClient:     Reduce input records=4
16/12/15 14:06:19 INFO mapred.JobClient:     Reduce input groups=4
16/12/15 14:06:19 INFO mapred.JobClient:     Combine output records=0
16/12/15 14:06:19 INFO mapred.JobClient:     Physical memory (bytes) snapshot=376963072
16/12/15 14:06:19 INFO mapred.JobClient:     Reduce output records=0
16/12/15 14:06:19 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=1132392448
16/12/15 14:06:19 INFO mapred.JobClient:     Map output records=4
Job Finished in 15.585 seconds
Estimated value of Pi is 3.60000000000000000000
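The crude 3.60 is expected with so few samples. The pi example scatters sample points over a unit square and counts how many fall inside the inscribed circle; since that circle covers pi/4 of the square's area, Pi is estimated as 4 × inside/total. Here 2 maps × 5 samples = 10 points, of which 9 evidently landed inside: 4 × 9/10 = 3.60. Below is a minimal single-machine sketch of the same dartboard idea (LocalPi is illustrative; Hadoop's own PiEstimator draws quasi-random Halton-sequence points rather than Random ones):

import java.util.Random;

public class LocalPi {
    public static void main(String[] args) {
        long total = args.length > 0 ? Long.parseLong(args[0]) : 10;
        long inside = 0;
        Random r = new Random();
        for (long i = 0; i < total; i++) {
            // draw a point in the unit square centered on the origin
            double x = r.nextDouble() - 0.5;
            double y = r.nextDouble() - 0.5;
            if (x * x + y * y <= 0.25) inside++;   // inside the circle of radius 0.5
        }
        System.out.println("Estimated Pi = " + 4.0 * inside / total);
    }
}

Accuracy improves with the sample count; rerunning the Hadoop job as hadoop jar hadoop-examples-1.2.1.jar pi 10 100000 should land much closer to 3.14159.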