
Creating an LZO-Compressed Table in Hive and Inserting Data into It

Big Data Technology | 九万里大数据 | 2021-07-26

Creating LZO Compressed Text Tables

Use Hive to create an LZO-compressed table in Text format:

CREATE TABLE IF NOT EXISTS bank.account_lzo (
  `id_card` int,
  `tran_time` string,
  `name` string,
  `cash` int
  )
partitioned by(ds string)
STORED AS
INPUTFORMAT "com.hadoop.mapred.DeprecatedLzoTextInputFormat"
OUTPUTFORMAT "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"

Viewing the table definition

show create table bank.account_lzo

CREATE TABLE `bank.account_lzo`(
  `id_card` int, 
  `tran_time` string, 
  `name` string, 
  `cash` int)
PARTITIONED BY ( 
  `ds` string)
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' 
STORED AS INPUTFORMAT 
  'com.hadoop.mapred.DeprecatedLzoTextInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'hdfs://nameservice1/user/hive/warehouse/bank.db/account_lzo'
TBLPROPERTIES (
  'transient_lastDdlTime'='1625551444')

Inserting data into the LZO table

SET mapreduce.output.fileoutputformat.compress=true;
SET hive.exec.compress.output=true;
SET mapreduce.output.fileoutputformat.compress.codec=com.hadoop.compression.lzo.LzopCodec;

INSERT INTO bank.account_lzo partition(ds='2020-09-21') values (1000, '2020-09-21 14:30:00', 'Tom', 100);
INSERT INTO bank.account_lzo partition(ds='2020-09-20') values (1000, '2020-09-20 14:30:05', 'Tom', 50);
INSERT INTO bank.account_lzo partition(ds='2020-09-20') values (1000, '2020-09-20 14:30:10', 'Tom', -25);
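
The three SET statements apply per session, so they have to accompany every load. As a minimal sketch, they can be kept together with the load statement in a script file and run with `hive -f`. The source table `bank.account_text` and the file name `load_account_lzo.hql` are hypothetical and only illustrate a bulk INSERT ... SELECT; everything else is taken from the statements above.

```shell
# Write the session settings and one bulk load into a HiveQL script file.
# Assumption: bank.account_text is a hypothetical uncompressed table with the
# same columns and the same ds partition column as bank.account_lzo.
cat > load_account_lzo.hql <<'EOF'
SET hive.exec.compress.output=true;
SET mapreduce.output.fileoutputformat.compress=true;
SET mapreduce.output.fileoutputformat.compress.codec=com.hadoop.compression.lzo.LzopCodec;

-- Copy one day of rows into the matching partition of the LZO table.
INSERT INTO bank.account_lzo PARTITION (ds='2020-09-21')
SELECT id_card, tran_time, name, cash
FROM bank.account_text
WHERE ds = '2020-09-21';
EOF

# To run it on a cluster node with the Hive CLI available:
#   hive -f load_account_lzo.hql
```

Keeping the settings in the same file as the INSERT avoids the case where a load runs in a fresh session without compression enabled and writes plain text files into the LZO table's directory.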

Creating the index

After loading data into an LZO-compressed text table, index the files so that they can be split.

hadoop jar /opt/cloudera/parcels/GPLEXTRAS/lib/hadoop/lib/hadoop-lzo.jar com.hadoop.compression.lzo.DistributedLzoIndexer hdfs://nameservice1/user/hive/warehouse/bank.db/account_lzo/ds=2020-09-21
[root@jwldata ~]# hadoop fs -ls hdfs://nameservice1/user/hive/warehouse/bank.db/account_lzo/ds=2020-09-21
Found 2 items
-rwxrwx--x+  3 hive hive         83 2021-07-06 14:09 hdfs://nameservice1/user/hive/warehouse/bank.db/account_lzo/ds=2020-09-21/000000_0.lzo
-rwxrwx--x+  3 hive hive          8 2021-07-23 10:23 hdfs://nameservice1/user/hive/warehouse/bank.db/account_lzo/ds=2020-09-21/000000_0.lzo.index
[root@jwldata ~]# 
[root@jwldata ~]# hadoop fs -ls hdfs://nameservice1/user/hive/warehouse/bank.db/account_lzo/ds=2020-09-20
Found 2 items
-rwxrwx--x+  3 hive hive         82 2021-07-06 14:10 hdfs://nameservice1/user/hive/warehouse/bank.db/account_lzo/ds=2020-09-20/000000_0.lzo
-rwxrwx--x+  3 hive hive         83 2021-07-06 14:11 hdfs://nameservice1/user/hive/warehouse/bank.db/account_lzo/ds=2020-09-20/000000_0_copy_1.lzo
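
In the listings above, only ds=2020-09-21 has a .lzo.index file, because the indexer was run against that partition alone; the two files under ds=2020-09-20 are still unindexed. A minimal sketch for covering several partitions is shown here. It reuses the jar path and warehouse location from the command above, and the helper name `lzo_index_cmd` is hypothetical. The script only prints the commands, so it can be reviewed before anything runs.

```shell
# Jar path and table location are copied from the article; adjust for your cluster.
LZO_JAR=/opt/cloudera/parcels/GPLEXTRAS/lib/hadoop/lib/hadoop-lzo.jar
TABLE_DIR=hdfs://nameservice1/user/hive/warehouse/bank.db/account_lzo

# Hypothetical helper: print the indexer command for one partition value.
lzo_index_cmd() {
  printf 'hadoop jar %s com.hadoop.compression.lzo.DistributedLzoIndexer %s/ds=%s\n' \
    "$LZO_JAR" "$TABLE_DIR" "$1"
}

# Dry run: print one indexer command per partition.
for ds in 2020-09-20 2020-09-21; do
  lzo_index_cmd "$ds"
done
```

To execute the commands instead of printing them, the output of the loop can be piped to `sh`. Each new load into a partition adds unindexed .lzo files, so the indexer needs to be run again after loading.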

Querying the LZO table from Impala

After creating the table and inserting data with Hive, refresh the metadata in Impala; the table can then be queried from Impala.

INVALIDATE METADATA [table]
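
As a rough sketch, the same step can be scripted with `impala-shell -q`. INVALIDATE METADATA is what makes a table newly created in Hive visible to Impala; for later loads into a table Impala already knows about, REFRESH on that table is the lighter option. The impalad address `impalad-host:21000` and the helper name `impala_cmd` are placeholders, not values from the article. The script prints the commands rather than running them.

```shell
# Table name from the article; the impalad address is a placeholder assumption.
TABLE=bank.account_lzo
IMPALAD=impalad-host:21000

# Hypothetical helper: print an impala-shell invocation for one statement.
impala_cmd() {
  printf 'impala-shell -i %s -q "%s"\n' "$IMPALAD" "$1"
}

# After the table is first created in Hive:
impala_cmd "INVALIDATE METADATA ${TABLE}"
# After later Hive inserts into the existing table:
impala_cmd "REFRESH ${TABLE}"
# Quick check that Impala can read every partition:
impala_cmd "SELECT ds, count(*) FROM ${TABLE} GROUP BY ds"
```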

Notes

  • Impala does not currently support LZO compression in Parquet files.
  • In Impala 2.0 and later, you can also use text data compressed in the gzip, bzip2, or Snappy formats. Because these compressed formats are not “splittable” in the way that LZO is, there is less opportunity for Impala to parallelize queries on them. Therefore, use these types of compressed data only for convenience if that is the format in which you receive the data. Prefer to use LZO compression for text data if you have the choice, or convert the data to Parquet using an INSERT … SELECT statement to copy the original data into a Parquet table.

