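1) What is the impact?
(1) How much NameNode memory does one file block occupy? About 150 bytes.
100 million small files * 150 bytes = 15 billion bytes (about 14 GiB)
1 file block * 150 bytes
How many file blocks can 128 GB hold? 128 * 1024 * 1024 * 1024 bytes / 150 bytes ≈ 916 million file blocks (roughly 900 million)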
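The arithmetic above can be checked with a few lines of shell. This is only a rough sketch: the 150-byte figure is the estimate used in these notes, and the variable names are made up for illustration.

```shell
#!/bin/sh
# Rough NameNode memory estimate, assuming ~150 bytes per file block
# (the per-block figure quoted in the notes, not a measured value).
bytes_per_block=150
small_files=100000000                       # 100 million small files, one block each
heap_bytes=$((128 * 1024 * 1024 * 1024))    # 128 GB of NameNode memory, in bytes

needed_bytes=$((small_files * bytes_per_block))   # memory used by the small files
needed_mib=$((needed_bytes / 1024 / 1024))        # same value in MiB (integer division)
max_blocks=$((heap_bytes / bytes_per_block))      # blocks that fit in 128 GB

echo "memory for ${small_files} small files: ${needed_bytes} bytes (~${needed_mib} MiB)"
echo "file blocks that fit in 128 GB: ${max_blocks}"
```

Integer division is enough here because only the order of magnitude matters.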
2) How to solve it
(1) Use HAR archiving to pack the small files into a single archive
hadoop archive -archiveName 20200701.har -p /user/hadoop/login/202007/01(parent path of the source files) /user/hadoop/login/202007/01(destination path)
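A minimal dry-run sketch of the archiving step, assuming the `-p <parent>` form of `hadoop archive` that current Hadoop releases expect. The `DRY_RUN` switch, the `run` helper, and the variable names are hypothetical and exist only for illustration; the paths are the ones from the command above.

```shell
#!/bin/sh
# Sketch: build the archive command and a follow-up listing command for one day.
# DRY_RUN=1 (the default here) only prints the commands; set DRY_RUN=0 on a
# host that has a Hadoop client configured to actually run them.
DRY_RUN="${DRY_RUN:-1}"
day_dir="/user/hadoop/login/202007/01"   # directory from the example above
har_name="20200701.har"

# Archive everything under day_dir and write the .har into the same directory.
archive_cmd="hadoop archive -archiveName ${har_name} -p ${day_dir} ${day_dir}"
# List the archive contents through the har:// filesystem scheme.
list_cmd="hadoop fs -ls har://${day_dir}/${har_name}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$1"        # dry run: show the command only
  else
    sh -c "$1"       # real run: execute it
  fi
}

run "$archive_cmd"
run "$list_cmd"
```

Creating the archive does not delete the original small files, so the NameNode memory is only reclaimed once the archive has been checked and the originals removed.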
CREATE EXTERNAL TABLE login_har(
ldate string,
ltime string,
userid int,
name string)
PARTITIONED BY (
ym string,
d string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ' '
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'hdfs://h60:9000/user/hadoop/login';
First create the table on the parent directory, partitioned by year-month and day (PARTITIONED BY).
Then manually add the partition that points at the archive:
alter table login_har add partition(ym='202007',d='01') LOCATION 'har:///flume/loginlog/202007/01/20200701.har';
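When there are many days to register, the same statement can be generated from the date. The sketch below assumes the archive layout `<base>/<ym>/<d>/<yyyymmdd>.har` seen in the statement above; `har_base` and the other variable names are hypothetical.

```shell
#!/bin/sh
# Sketch: derive the ADD PARTITION statement for one day given as YYYYMMDD.
day="20200701"
ym=$(printf '%s' "$day" | cut -c1-6)   # year and month, e.g. 202007
d=$(printf '%s' "$day" | cut -c7-8)    # day of month, e.g. 01
har_base="har:///flume/loginlog"       # base location taken from the statement above

stmt="alter table login_har add partition(ym='${ym}',d='${d}') LOCATION '${har_base}/${ym}/${d}/${day}.har';"
echo "$stmt"
# On a host with the Hive CLI, the statement could then be run with: hive -e "$stmt"
```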







