Performance Comparison and Analysis of Hadoop Clusters Built with VM and Docker Technology

(Thesis, 14,000 characters)
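Abstract: Cloud computing is an Internet-based model of computing in which shared hardware, software, and information are provided to computers and other devices on demand. As cloud computing has matured, many companies have developed their own cloud platforms. Cloud computing relies heavily on virtualization because it separates workloads so that resource usage can be controlled. The additional layer of abstraction, however, reduces performance and so lowers the value users get for their cost. Recent advances in container-based virtualization simplify application deployment and provide isolated environments in which applications can be managed uniformly. This thesis compares and analyzes the performance of a Hadoop cluster deployed on traditional virtual machines (VMs) and a Hadoop cluster built with Docker. It examines CPU performance, memory, network resources, and storage, evaluates node performance, and then validates the resulting performance figures against data obtained from running jobs. With these results, users can understand the characteristics of each platform more clearly and choose a suitable platform according to their actual production needs.
Keywords: virtual machine; cloud computing; Hadoop cluster; Docker; performance analysis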
Performance Analysis of Hadoop Cluster Based on VMs and Docker
ABSTRACT: Cloud computing is an Internet-based model in which shared software and hardware resources and information are provided to computers and other devices on demand. With the development of cloud computing technology, many enterprises have built platforms of their own. Cloud computing makes wide use of virtual machines (VMs) because they separate workloads so that resource usage can be controlled. However, the extra level of abstraction reduces performance, which makes the approach less cost-effective for users. Newer advances in container-based virtualization simplify the deployment of applications while continuing to permit control of the resources allocated to different applications. This paper compares and analyzes the performance of a Hadoop cluster deployed on traditional virtual machines (VMs) and a Hadoop cluster built with Docker. CPU performance, memory, network resources, and storage are analyzed, node performance is calculated, and the calculated performance is then verified against the measured data. With these results, users can understand the characteristics of each platform more clearly and make a reasonable choice of platform.
Keywords: Cloud computing; Virtual machine (VM); Docker; Performance analysis; Hadoop cluster

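Contents
1 Introduction
1.1 Research background, purpose, and significance
1.2 Related work in China and abroad
1.2.1 Virtual machines
1.2.2 Containers
1.3 Main research content
1.4 Thesis structure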
2 Runtime environment
2.1 Linux containers: Docker
2.2 Virtual machines: KVM
2.3 Chapter summary
3 Hadoop clusters
3.1 Introduction to Hadoop
3.1.1 MapReduce
3.1.2 HDFS
3.2 Building a Hadoop cluster on Docker
3.2.1 Installing Docker
3.2.2 Installing the Ubuntu image
3.2.3 Installing Java
3.2.4 Installing Hadoop
3.2.5 Configuring environment variables
3.2.6 Configuring Hadoop
3.2.7 Formatting the NameNode
3.2.8 Installing SSH
3.2.9 Saving a copy of the image
3.2.10 Building the cluster
3.3 Chapter summary
4 Experiments
4.1 Metrics
4.1.1 CPU performance
4.1.2 I/O performance
4.1.3 Memory performance
4.1.4 Network performance
4.2 Hadoop performance benchmarks
4.2.1 UnixBench
4.2.2 IOzone
4.2.3 Netperf
4.3 Data analysis of the Docker and VM platforms
4.3.1 UnixBench
4.3.2 IOzone
4.3.3 Netperf
4.4 Computing Hadoop node performance
4.4.1 WordCount program
4.4.2 Data deduplication program
4.4.3 Data sorting program
4.4.4 Experimental validation results
4.5 Chapter summary
5 Conclusions and outlook
5.1 Conclusions
5.2 Limitations and future work
References
Acknowledgements