首页 技术 正文
技术 2022年11月15日
0 收藏 448 点赞 3,390 浏览 4799 个字

在HDFS的文件默认生成文件大小1K,如何设置文件大小和数量

拷贝一份flume-conf.properties.template改名为hive-mem-size.properties
hive-mem-size.properties
  a1.sources = s1
  a1.channels = c1
  a1.sinks = k1
  # defined the source
  a1.sources.s1.type = exec
  a1.sources.s1.command = tail -F /opt/cdh-5.6.3/hive-0.13.1-cdh5.3.6/logs/hive.log
  a1.sources.s1.shell = /bin/sh -c
  # defined the channel
  a1.channels.c1.type = memory
  a1.channels.c1.capacity = 1000
  a1.channels.c1.transactionCapacity = 1000
  # defined the sink
  a1.sinks.k1.type = hdfs
  a1.sinks.k1.hdfs.path = /flume/hdfs/
  a1.sinks.k1.hdfs.fileType = DataStream
  a1.sinks.k1.hdfs.rollInterval = 0 # 依据时间进行roll,设置为0表示不启用
  a1.sinks.k1.hdfs.rollSize = 10240 # 依据大小进行roll,设置为10240表示文件大小在10k左右
  a1.sinks.k1.hdfs.rollCount = 0 # 依据event数目进行roll,设置为0表示不启用
  # The channel can be defined as follows.
  a1.sources.s1.channels = c1
  a1.sinks.k1.channel = c1
flmue目录下执行
  bin/flume-ng agent -c conf/ -n a1 -f conf/hive-mem-size.properties -Dflume.root.logger=INFO,console

使用Flume是为了将最新的数据或文件上传到HDFS上,那如果遇到分区表该如何解决

拷贝一份flume-conf.properties.template改名为hive-mem-part.properties
hive-mem-part.properties
  a1.sources = s1
  a1.channels = c1
  a1.sinks = k1
  # defined the source
  a1.sources.s1.type = exec
  a1.sources.s1.command = tail -F /opt/cdh-5.6.3/hive-0.13.1-cdh5.3.6/logs/hive.log
  a1.sources.s1.shell = /bin/sh -c
  # defined the channel
  a1.channels.c1.type = memory
  a1.channels.c1.capacity = 1000
  a1.channels.c1.transactionCapacity = 1000
  # defined the sink
  a1.sinks.k1.type = hdfs
  a1.sinks.k1.hdfs.useLocalTimeStamp = true # 注意使用时间时,本地时间戳设置为true
  a1.sinks.k1.hdfs.path = /flume/events/%y-%m-%d/%H-%M/
  a1.sinks.k1.hdfs.fileType = DataStream
  # The channel can be defined as follows.
  a1.sources.s1.channels = c1
  a1.sinks.k1.channel = c1
flmue目录下执行
  bin/flume-ng agent -c conf/ -n a1 -f conf/hive-mem-part.properties -Dflume.root.logger=INFO,console
  这里与上面的文件大小有冲突,即设置了时间分区,肯定不能在特定时间内满足文件大小

Flume上传文件默认是以FlumeData开头,如何更改开头信息

拷贝一份flume-conf.properties.template改名为hive-mem-pre.properties
hive-mem-pre.properties
  a1.sources = s1
  a1.channels = c1
  a1.sinks = k1
  # defined the source
  a1.sources.s1.type = exec
  a1.sources.s1.command = tail -F /opt/cdh-5.6.3/hive-0.13.1-cdh5.3.6/logs/hive.log
  a1.sources.s1.shell = /bin/sh -c
  # defined the channel
  a1.channels.c1.type = memory
  a1.channels.c1.capacity = 1000
  a1.channels.c1.transactionCapacity = 1000
  # defined the sink
  a1.sinks.k1.type = hdfs
  a1.sinks.k1.hdfs.useLocalTimeStamp = true # 注意使用时间时,本地时间戳设置为true
  a1.sinks.k1.hdfs.filePrefix = hive-log
  a1.sinks.k1.hdfs.path = /flume/events/%y-%m-%d/%H-%M/
  a1.sinks.k1.hdfs.fileType = DataStream
  # The channel can be defined as follows.
  a1.sources.s1.channels = c1
  a1.sinks.k1.channel = c1
flmue目录下执行
  bin/flume-ng agent -c conf/ -n a1 -f conf/hive-mem-pre.properties -Dflume.root.logger=INFO,console

企业中多台Flume如何解决磁盘IO问题

启动一个hadoop集群(官方图示为4台,这里使用三台),分别部署和配置flume机器
  hadoop09-linux-01.ibeifeng.com 10.0.0.108 collenct
  hadoop09-linux-02.ibeifeng.com 10.0.0.109 agent
  hadoop09-linux-03.ibeifeng.com 10.0.0.110 agent
选择一个agent,进入flume目录
拷贝一份flume-conf.properties.template改名为avro-agent-hive-file-hdfs.properties
avro-agent-hive-file-hdfs.properties
  a1.sources = s1
  a1.channels = c1
  a1.sinks = k1
  # defined the source
  a1.sources.s1.type = exec
  a1.sources.s1.command = tail -F /opt/cdh-5.6.3/hive-0.13.1-cdh5.3.6/logs/hive.log
  a1.sources.s1.shell = /bin/sh -c
  # defined the channel
  a1.channels.c1.type = memory
  a1.channels.c1.capacity = 1000
  a1.channels.c1.transactionCapacity = 1000
  # defined the sink
  a1.sinks.k1.type = avro
  a1.sinks.k1.hostname = hadoop09-linux-01.ibeifeng.com # 接收方的IP或hostname
  a1.sinks.k1.port = 50505
  # The channel can be defined as follows.
  a1.sources.s1.channels = c1
  a1.sinks.k1.channel = c1
  scp发送到另一台agent
  scp conf/avro-agent-hive-file-hdfs.properties hadoop09-linux-03.ibeifeng.com:/opt/cdh-5.6.3/apache-flume-1.5.0-cdh5.3.6-bin/conf/
  进入collenct机器下的flume下
  拷贝一份flume-conf.properties.template改名为avro-collenct-hive-file-hdfs.properties
  a1.sources = s1
  a1.channels = c1
  a1.sinks = k1
  # defined the source
  a1.sources.s1.type = avro
  a1.sources.s1.bind = hadoop09-linux-01.ibeifeng.com
  a1.sources.s1.port = 50505
  a1.sources.s1.
  # defined the channel
  a1.channels.c1.type = memory
  a1.channels.c1.capacity = 1000
  a1.channels.c1.transactionCapacity = 1000
  # defined the sink
  a1.sinks.k1.type = hdfs
  a1.sinks.k1.hdfs.filePrefix = avro
  a1.sinks.k1.hdfs.useLocalTimeStamp = true
  a1.sinks.k1.hdfs.path = /flume/hdfs
  a1.sinks.k1.hdfs.fileType = DataStream
  a1.sinks.k1.hdfs.rollInterval = 0
  a1.sinks.k1.hdfs.rollSize = 20480
  a1.sinks.k1.hdfs.rollCount = 0
  # The channel can be defined as follows.
  a1.sources.s1.channels = c1
  a1.sinks.k1.channel = c1
启动rpcbind服务
  再分别启动:
    bin/flume-ng agent -c conf/ -n a1 -f conf/avro-collenct-hive-file-hdfs.properties -Dflume.root.logger=INFO,console
    bin/flume-ng agent -c conf/ -n a1 -f conf/avro-agent-hive-file-hdfs.properties -Dflume.root.logger=INFO,console
    bin/flume-ng agent -c conf/ -n a1 -f conf/avro-agent-hive-file-hdfs.properties -Dflume.root.logger=INFO,console
测试

如何解决不同操作系统下Flume

搭建nfs服务器,挂载不同系统中的目录,直接使用
相关推荐
python开发_常用的python模块及安装方法
adodb:我们领导推荐的数据库连接组件bsddb3:BerkeleyDB的连接组件Cheetah-1.0:我比较喜欢这个版本的cheeta…
日期:2022-11-24 点赞:878 阅读:8,942
Educational Codeforces Round 11 C. Hard Process 二分
C. Hard Process题目连接:http://www.codeforces.com/contest/660/problem/CDes…
日期:2022-11-24 点赞:807 阅读:5,466
下载Ubuntn 17.04 内核源代码
zengkefu@server1:/usr/src$ uname -aLinux server1 4.10.0-19-generic #21…
日期:2022-11-24 点赞:569 阅读:6,281
可用Active Desktop Calendar V7.86 注册码序列号
可用Active Desktop Calendar V7.86 注册码序列号Name: www.greendown.cn Code: &nb…
日期:2022-11-24 点赞:733 阅读:6,095
Android调用系统相机、自定义相机、处理大图片
Android调用系统相机和自定义相机实例本博文主要是介绍了android上使用相机进行拍照并显示的两种方式,并且由于涉及到要把拍到的照片显…
日期:2022-11-24 点赞:512 阅读:7,728
Struts的使用
一、Struts2的获取  Struts的官方网站为:http://struts.apache.org/  下载完Struts2的jar包,…
日期:2022-11-24 点赞:671 阅读:4,765