Abstract: This tutorial covers the big-data topic of using Sqoop 1.4.6 with Hadoop 2.7.4. We hope reading it deepens your understanding of the material.
This article describes the installation, configuration, and use of Sqoop 1.4.6.

I. Installation and Configuration
1. Installing Sqoop
[hadoop@hdp01 ~]$ wget http://mirror.bit.edu.cn/apache/sqoop/1.4.6/sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz
[hadoop@hdp01 ~]$ tar -xzf sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz
[hadoop@hdp01 ~]$ mv sqoop-1.4.6.bin__hadoop-2.0.4-alpha /u01/sqoop
-- Edit the Sqoop environment script
[hadoop@hdp01 ~]$ cd /u01/sqoop/conf
[hadoop@hdp01 conf]$ cp sqoop-env-template.sh sqoop-env.sh
[hadoop@hdp01 conf]$ vi sqoop-env.sh
export HADOOP_COMMON_HOME=/u01/hadoop
export HADOOP_MAPRED_HOME=/u01/hadoop
export HBASE_HOME=/u01/hbase
export HIVE_HOME=/u01/hive
export ZOOCFGDIR=/u01/zookeeper/conf
-- Comment out the following block in configure-sqoop
#if [ -z "${HCAT_HOME}" ]; then
#  if [ -d "/usr/lib/hive-hcatalog" ]; then
#    HCAT_HOME=/usr/lib/hive-hcatalog
#  elif [ -d "/usr/lib/hcatalog" ]; then
#    HCAT_HOME=/usr/lib/hcatalog
#  else
#    HCAT_HOME=${SQOOP_HOME}/../hive-hcatalog
#    if [ ! -d ${HCAT_HOME} ]; then
#      HCAT_HOME=${SQOOP_HOME}/../hcatalog
#    fi
#  fi
#fi
#if [ -z "${ACCUMULO_HOME}" ]; then
#  if [ -d "/usr/lib/accumulo" ]; then
#    ACCUMULO_HOME=/usr/lib/accumulo
#  else
#    ACCUMULO_HOME=${SQOOP_HOME}/../accumulo
#  fi
#fi
## Moved to be a runtime check in sqoop.
#if [ ! -d "${HCAT_HOME}" ]; then
#  echo "Warning: $HCAT_HOME does not exist! HCatalog jobs will fail."
#  echo 'Please set $HCAT_HOME to the root of your HCatalog installation.'
#fi
#
#if [ ! -d "${ACCUMULO_HOME}" ]; then
#  echo "Warning: $ACCUMULO_HOME does not exist! Accumulo imports will fail."
#  echo 'Please set $ACCUMULO_HOME to the root of your Accumulo installation.'
#fi
-- Edit the user's environment variables
[hadoop@hdp01 ~]$ vi .bash_profile
export SQOOP_HOME=/u01/sqoop
export SQOOP_CONF_DIR=$SQOOP_HOME/conf
export SQOOP_CLASSPATH=$SQOOP_CONF_DIR
export PATH=$PATH:$SQOOP_HOME/bin
[hadoop@hdp01 ~]$ source .bash_profile
-- Verify the Sqoop installation
[hadoop@hdp01 ~]$ sqoop version
2017-12-28 09:30:01,801 [myid:] - INFO [main:Sqoop@92] - Running Sqoop version: 1.4.6
Sqoop 1.4.6
git commit id c0c5a81723759fa575844a0a1eae8f510fa32c25
Compiled by root on Mon Apr 27 14:38:36 CST 2015
Alternatively, you can run sqoop-version.
-- Copy the JDBC drivers
Copy the MySQL, PostgreSQL, and Oracle JDBC drivers into $SQOOP_HOME/lib.
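The exact jar names depend on which driver versions you download; as a sketch (the file names below are assumptions, substitute the versions you actually have):
-- Example: copying JDBC driver jars into Sqoop's lib directory (jar names are illustrative)
[hadoop@hdp01 ~]$ cp mysql-connector-java-5.1.44-bin.jar /u01/sqoop/lib/
[hadoop@hdp01 ~]$ cp postgresql-42.1.4.jar /u01/sqoop/lib/
[hadoop@hdp01 ~]$ cp ojdbc7.jar /u01/sqoop/lib/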
II. Using Sqoop
1. Testing each JDBC driver connection with Sqoop
1.1 Connecting Sqoop to MySQL
[hadoop@hdp01 bin]$ sqoop list-tables --username root -P --connect jdbc:mysql://192.168.120.92:3306/smsqw?useSSL=false
2017-12-28 09:38:19,587 [myid:] - INFO [main:Sqoop@92] - Running Sqoop version: 1.4.6
Enter password:
2017-12-28 09:38:23,067 [myid:] - INFO [main:MySQLManager@69] - Preparing to use a MySQL streaming resultset.
Phone
TestPhone
history_store
tbAreaprefix
tbAreaprefix_bak
tbBill
tbBilltmp
tbCat
tbContact
tbDataPath
tbDeliverMsg
tbDeliverMsg2
tbDest
tbLocPrefix
tbMessage
tbPrice
tbReceiver
tbSSLog
tbSendState
tbSendState2
tbSmsSendState
tbTest
tbUser
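If list-tables fails, it can help to first confirm the driver and credentials work at the server level. A minimal sketch using Sqoop's built-in list-databases tool, assuming the same host and credentials as above:
[hadoop@hdp01 ~]$ sqoop list-databases --connect jdbc:mysql://192.168.120.92:3306/?useSSL=false --username root -P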
1.2 Connecting Sqoop to PostgreSQL
[hadoop@hdp01 ~]$ sqoop list-tables --username rhnuser -P --connect jdbc:postgresql://192.168.120.93:5432/rhndb
2017-12-28 09:40:24,842 [myid:] - INFO [main:Sqoop@92] - Running Sqoop version: 1.4.6
Enter password:
2017-12-28 09:40:29,775 [myid:] - INFO [main:SqlManager@98] - Using default fetchSize of 1000
rhnservergroupmembers
rhntemplatestring
rhnservergrouptypefeature
rhnserverhistory
qrtz_fired_triggers
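The -P flag prompts for the password interactively, which does not suit scripted jobs. Sqoop 1.4.x also accepts --password-file, which reads the password from a file, typically stored on HDFS with restrictive permissions. A sketch, assuming the password value and HDFS paths shown here; adapt them to your cluster:
-- Store the password on HDFS and reference it (password and paths are assumptions)
[hadoop@hdp01 ~]$ echo -n "mypassword" > rhnuser.pwd
[hadoop@hdp01 ~]$ hdfs dfs -put rhnuser.pwd /user/hadoop/rhnuser.pwd
[hadoop@hdp01 ~]$ hdfs dfs -chmod 400 /user/hadoop/rhnuser.pwd
[hadoop@hdp01 ~]$ sqoop list-tables --connect jdbc:postgresql://192.168.120.93:5432/rhndb --username rhnuser --password-file /user/hadoop/rhnuser.pwd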
1.3 Connecting Sqoop to Oracle
[hadoop@hdp01 ~]$ sqoop list-tables --username spwuser -P --connect jdbc:oracle:thin:@192.168.120.121:1521/rhndb --driver oracle.jdbc.driver.OracleDriver
2017-12-28 10:01:43,337 [myid:] - INFO [main:Sqoop@92] - Running Sqoop version: 1.4.6
Enter password:
2017-12-28 10:01:43,425 [myid:] - INFO [main:SqlManager@98] - Using default fetchSize of 1000
rhnservergroupmembers
rhntemplatestring
rhnservergrouptypefeature
rhnserverhistory
qrtz_fired_triggers
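Note that passing --driver explicitly makes Sqoop use its generic JDBC connection manager instead of the Oracle-specific one. To verify the connection end-to-end without listing the whole schema, sqoop eval can run a trivial query; a sketch assuming the same credentials as above:
[hadoop@hdp01 ~]$ sqoop eval --connect jdbc:oracle:thin:@192.168.120.121:1521/rhndb --username spwuser -P --query "SELECT 1 FROM dual"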
1.4 Connecting Sqoop to Hive
Using the PostgreSQL table as the source, create a table named rhnpackagefile in Hive without importing any data; the data import itself is covered later.
[hadoop@hdp01 ~]$ sqoop create-hive-table --connect jdbc:postgresql://192.168.120.93:5432/rhndb --table rhnpackagefile --username rhnuser -P --hive-database hivedb
2017-12-28 10:32:01,376 [myid:] - INFO [main:Sqoop@92] - Running Sqoop version: 1.4.6
Enter password:
2017-12-28 10:32:04,699 [myid:] - INFO [main:BaseSqoopTool@1353] - Using Hive-specific delimiters for output. You can override
2017-12-28 10:32:04,699 [myid:] - INFO [main:BaseSqoopTool@1354] - delimiters with --fields-terminated-by, etc.
2017-12-28 10:32:04,819 [myid:] - INFO [main:SqlManager@98] - Using default fetchSize of 1000
2017-12-28 10:32:05,015 [myid:] - INFO [main:SqlManager@757] - Executing SQL statement: SELECT t.* FROM "rhnpackagefile" AS t LIMIT 1
2017-12-28 10:32:05,674 [myid:] - INFO [main:HiveImport@194] - Loading uploaded data into Hive
2017-12-28 10:32:09,089 [myid:] - INFO [Thread-6:LoggingAsyncSink$LoggingThread@85] - SLF4J: Class path contains multiple SLF4J bindings.
2017-12-28 10:32:09,090 [myid:] - INFO [Thread-6:LoggingAsyncSink$LoggingThread@85] - SLF4J: Found binding in [jar:file:/u01/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
2017-12-28 10:32:09,090 [myid:] - INFO [Thread-6:LoggingAsyncSink$LoggingThread@85] - SLF4J: Found binding in [jar:file:/u01/spark/jars/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
2017-12-28 10:32:09,090 [myid:] - INFO [Thread-6:LoggingAsyncSink$LoggingThread@85] - SLF4J: Found binding in [jar:file:/u01/hbase/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
2017-12-28 10:32:09,091 [myid:] - INFO [Thread-6:LoggingAsyncSink$LoggingThread@85] - SLF4J: Found binding in [jar:file:/u01/tez/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
2017-12-28 10:32:09,091 [myid:] - INFO [Thread-6:LoggingAsyncSink$LoggingThread@85] - SLF4J: Found binding in [jar:file:/u01/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
2017-12-28 10:32:09,091 [myid:] - INFO [Thread-6:LoggingAsyncSink$LoggingThread@85] - SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
2017-12-28 10:32:09,095 [myid:] - INFO [Thread-6:LoggingAsyncSink$LoggingThread@85] - SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
2017-12-28 10:32:11,996 [myid:] - INFO [Thread-6:LoggingAsyncSink$LoggingThread@85] -
2017-12-28 10:32:11,996 [myid:] - INFO [Thread-6:LoggingAsyncSink$LoggingThread@85] - Logging initialized using configuration in jar:file:/u01/hive/lib/hive-common-2.3.2.jar!/hive-log4j2.properties Async: true
2017-12-28 10:32:16,650 [myid:] - INFO [Thread-6:LoggingAsyncSink$LoggingThread@85] - OK
2017-12-28 10:32:16,783 [myid:] - INFO [Thread-6:LoggingAsyncSink$LoggingThread@85] - Time taken: 3.433 seconds
2017-12-28 10:32:17,248 [myid:] - INFO [main:HiveImport@242] - Hive import complete.
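Before importing data, it is worth confirming that the table definition actually landed in Hive. One quick check, assuming the hive CLI is on this user's PATH:
[hadoop@hdp01 ~]$ hive -e "DESCRIBE hivedb.rhnpackagefile;"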
2. Data Migration
2.1 PostgreSQL → Hive
[hadoop@hdp01 ~]$ sqoop import --connect jdbc:postgresql://192.168.120.93:5432/rhndb --table rhnpackagefile --username rhnuser -P --fields-terminated-by ',' --hive-import --hive-database hivedb --columns package_id,capability_id,device,inode,file_mode,username,groupname,rdev,file_size,mtime,checksum_id,linkto,flags,verifyflags,lang,created,modified --split-by modified -m 4
2017-12-28 11:24:46,666 [myid:] - INFO [main:Sqoop@92] - Running Sqoop version: 1.4.6
Enter password:
2017-12-28 11:24:48,891 [myid:] - INFO [main:SqlManager@98] - Using default fetchSize of 1000
2017-12-28 11:24:48,894 [myid:] - INFO [main:CodeGenTool@92] - Beginning code generation
2017-12-28 11:24:49,091 [myid:] - INFO [main:SqlManager@757] - Executing SQL statement: SELECT t.* FROM "rhnpackagefile" AS t LIMIT 1
2017-12-28 11:24:49,127 [myid:] - INFO [main:CompilationManager@94] - HADOOP_MAPRED_HOME is /u01/hadoop
Note: /tmp/sqoop-hadoop/compile/ca09f6bb133fa32808220902aedc0437/rhnpackagefile.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
2017-12-28 11:24:50,481 [myid:] - INFO [main:CompilationManager@330] - Writing jar file: /tmp/sqoop-hadoop/compile/ca09f6bb133fa32808220902aedc0437/rhnpackagefile.jar
2017-12-28 11:24:50,493 [myid:] - WARN [main:PostgresqlManager@119] - It looks like you are importing from postgresql.
2017-12-28 11:24:50,493 [myid:] - WARN [main:PostgresqlManager@120] - This transfer can be faster! Use the --direct
2017-12-28 11:24:50,494 [myid:] - WARN [main:PostgresqlManager@121] - option to exercise a postgresql-specific fast path.
2017-12-28 11:24:50,495 [myid:] - INFO [main:ImportJobBase@235] - Beginning import of rhnpackagefile
2017-12-28 11:24:50,496 [myid:] - INFO [main:Configuration@1019] - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2017-12-28 11:24:50,634 [myid:] - INFO [main:Configuration@1019] - mapred.jar is deprecated. Instead, use mapreduce.job.jar
2017-12-28 11:24:51,160 [myid:] - INFO [main:Configuration@1019] - mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
2017-12-28 11:24:51,506 [myid:] - INFO [main:TimelineClientImpl@123] - Timeline service address: http://hdp01:8188/ws/v1/timeline/
2017-12-28 11:24:51,696 [myid:] - INFO [main:AHSProxy@42] - Connecting to Application History server at hdp01.thinkjoy.tt/192.168.120.96:10201
2017-12-28 11:24:53,801 [myid:] - INFO [main:DBInputFormat@192] - Using read commited transaction isolation
2017-12-28 11:24:53,805 [myid:] - INFO [main:Configuration@1019] - mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
2017-12-28 11:24:53,805 [myid:] - INFO [main:DataDrivenDBInputFormat@147] - BoundingValsQuery: SELECT MIN("modified"), MAX("modified") FROM "rhnpackagefile"
2017-12-28 11:25:14,854 [myid:] - WARN [main:TextSplitter@64] - Generating splits for a textual index column.
2017-12-28 11:25:14,854 [myid:] - WARN [main:TextSplitter@65] - If your database sorts in a case-insensitive order, this may result in a partial import or duplicate records.
2017-12-28 11:25:14,854 [myid:] - WARN [main:TextSplitter@67] - You are strongly encouraged to choose an integral split column.
2017-12-28 11:25:14,903 [myid:] - INFO [main:JobSubmitter@396] - number of splits:6
2017-12-28 11:25:14,997 [myid:] - INFO [main:JobSubmitter@...
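Note the TextSplitter warnings above: modified is not an integral column, so Sqoop fell back to textual splits (producing 6 splits rather than the requested 4) and warned that the result may be partial or contain duplicates. If package_id is numeric, which its name suggests but the transcript does not confirm, re-running with it as the split column would follow Sqoop's own recommendation; a sketch:
[hadoop@hdp01 ~]$ sqoop import --connect jdbc:postgresql://192.168.120.93:5432/rhndb --table rhnpackagefile --username rhnuser -P --fields-terminated-by ',' --hive-import --hive-database hivedb --split-by package_id -m 4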