
Kafka Performance Testing

Posted by blue




1. Test Objectives

Use Kafka's built-in performance test script to write data and measure write throughput.

Use a Spark program to consume the data from Kafka and measure read throughput.

See the appendix for details on the Kafka test scripts.

2. Test Environment

Software environment:

  • kafka_2.10-0.8.1.1

  • JDK: 1.7.0_67

Hardware environment:

  • OS: CentOS 6.3 x86_64

  • CPU: 2 × 4-core (Intel(R) Xeon(R) CPU E5-2609 v2 @ 2.50GHz)

  • Memory: 62.9 GB

  • Disk: 12 × 3.4 TB

The Kafka and JDK versions and the Kafka configuration file config/server.properties are identical to the existing "31" environment; on the hardware side, the 31 environment has 2 × 8-core CPUs and 126 GB of memory.

Kafka test cluster configuration:

Spark test cluster configuration:

3. Topic Creation

On the Kafka cluster, create each of the following topics.

Creation commands:

[root@slave181 kafka_2.10-0.8.1.1]# bin/kafka-topics.sh --create --zookeeper master185:2181,master180:2181,slave181:2181 --replication-factor 1 --partitions 12 --topic testKafka_A
Created topic "testKafka_A".

[root@slave181 kafka_2.10-0.8.1.1]# bin/kafka-topics.sh --create --zookeeper master185:2181,master180:2181,slave181:2181 --replication-factor 1 --partitions 12 --topic testKafka_B
Created topic "testKafka_B".

[root@slave181 kafka_2.10-0.8.1.1]# bin/kafka-topics.sh --create --zookeeper master185:2181,master180:2181,slave181:2181 --replication-factor 1 --partitions 12 --topic testKafka_C
Created topic "testKafka_C".

[root@slave181 kafka_2.10-0.8.1.1]# bin/kafka-topics.sh --create --zookeeper master185:2181,master180:2181,slave181:2181 --replication-factor 1 --partitions 12 --topic testKafka_D
Created topic "testKafka_D".

[root@slave181 kafka_2.10-0.8.1.1]# bin/kafka-topics.sh --create --zookeeper master185:2181,master180:2181,slave181:2181 --replication-factor 1 --partitions 12 --topic testKafka_E
Created topic "testKafka_E".

[root@slave181 kafka_2.10-0.8.1.1]# bin/kafka-topics.sh --create --zookeeper master185:2181,master180:2181,slave181:2181 --replication-factor 1 --partitions 12 --topic testKafka_G
Created topic "testKafka_G".		

[root@slave186 kafka_2.10-0.8.1.1]# bin/kafka-topics.sh --create --zookeeper master185:2181,master180:2181,slave181:2181 --replication-factor 1 --partitions 48 --topic testKafka_H
Created topic "testKafka_H".

[root@slave181 kafka_2.10-0.8.1.1]# bin/kafka-topics.sh --create --zookeeper master185:2181,master180:2181,slave181:2181 --replication-factor 1 --partitions 24 --topic testKafka_I
Created topic "testKafka_I".
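
The eight create commands above differ only in topic name and partition count, so they can be generated rather than typed by hand. A minimal sketch (a dry run that only prints the commands; the ZooKeeper quorum and partition counts are copied from the commands above):

```python
# Generate the kafka-topics.sh create commands for the test topics (dry run).
ZK = "master185:2181,master180:2181,slave181:2181"
PARTITIONS = {"testKafka_A": 12, "testKafka_B": 12, "testKafka_C": 12,
              "testKafka_D": 12, "testKafka_E": 12, "testKafka_G": 12,
              "testKafka_H": 48, "testKafka_I": 24}

def create_cmd(topic, partitions, replication=1):
    return ("bin/kafka-topics.sh --create --zookeeper %s "
            "--replication-factor %d --partitions %d --topic %s"
            % (ZK, replication, partitions, topic))

for name in sorted(PARTITIONS):
    print(create_cmd(name, PARTITIONS[name]))
```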

4. Write Tests

1) 1 topic, 1 producer

On host slave186, write 100 million messages of 700 bytes each to testKafka_A (see the appendix for how this size was chosen):

[root@slave186 kafka_2.10-0.8.1.1]# bin/kafka-producer-perf-test.sh --messages 100000000  --message-size 700  --batch-size 10000 --topics testKafka_A --threads 12 --broker-list slave181:9092,slave182:9092,slave183:9092,slave184:9092
start.time, end.time, compression, message.size, batch.size, total.data.sent.in.MB, MB.sec, total.data.sent.in.nMsg, nMsg.sec
2015-10-15 11:03:54:163, 2015-10-15 11:30:05:187, 0, 700, 10000, 66757.20, 42.4928, 99999996, 63652.7488
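
The reported MB.sec can be cross-checked from nMsg.sec and the 700-byte message size (the script counts 1 MB as 1024 × 1024 bytes):

```python
# Cross-check the reported write throughput from the message rate.
msg_size_bytes = 700
nmsg_sec = 63652.7488                # nMsg.sec reported above
mb_sec = msg_size_bytes * nmsg_sec / (1024.0 * 1024.0)
print(round(mb_sec, 4))              # matches the reported MB.sec of 42.4928
```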

Write results:

2) 4 topics, 4 producers

On slave186, slave187, slave188 and slave189, write simultaneously to testKafka_B, testKafka_C, testKafka_D and testKafka_E: 25,000,000 messages of 700 bytes each per topic.

slave186:

[root@slave186 kafka_2.10-0.8.1.1]# bin/kafka-producer-perf-test.sh --messages 25000000  --message-size 700  --batch-size 10000 --topics testKafka_B --threads 12 --broker-list slave181:9092,slave182:9092,slave183:9092,slave184:9092
start.time, end.time, compression, message.size, batch.size, total.data.sent.in.MB, MB.sec, total.data.sent.in.nMsg, nMsg.sec
2015-10-15 11:44:02:873, 2015-10-15 11:50:38:527, 0, 700, 10000, 16689.30, 42.1815, 24999996, 63186.5114

slave187:

[root@slave187 kafka_2.10-0.8.1.1]# bin/kafka-producer-perf-test.sh --messages 25000000  --message-size 700  --batch-size 10000 --topics testKafka_C --threads 12 --broker-list slave181:9092,slave182:9092,slave183:9092,slave184:9092
start.time, end.time, compression, message.size, batch.size, total.data.sent.in.MB, MB.sec, total.data.sent.in.nMsg, nMsg.sec
2015-10-15 11:44:04:370, 2015-10-15 11:51:07:910, 0, 700, 10000, 16689.30, 39.4043, 24999996, 59026.2927

slave188:

[root@slave188 kafka_2.10-0.8.1.1]# bin/kafka-producer-perf-test.sh --messages 25000000  --message-size 700  --batch-size 10000 --topics testKafka_D --threads 12 --broker-list slave181:9092,slave182:9092,slave183:9092,slave184:9092
start.time, end.time, compression, message.size, batch.size, total.data.sent.in.MB, MB.sec, total.data.sent.in.nMsg, nMsg.sec
2015-10-15 11:44:05:428, 2015-10-15 11:50:50:615, 0, 700, 10000, 16689.30, 41.1891, 24999996, 61699.8966

slave189:

[root@slave189 kafka_2.10-0.8.1.1]# bin/kafka-producer-perf-test.sh --messages 25000000  --message-size 700  --batch-size 10000 --topics testKafka_E --threads 12 --broker-list slave181:9092,slave182:9092,slave183:9092,slave184:9092
start.time, end.time, compression, message.size, batch.size, total.data.sent.in.MB, MB.sec, total.data.sent.in.nMsg, nMsg.sec
2015-10-15 11:44:06:545, 2015-10-15 11:51:09:299, 0, 700, 10000, 16689.30, 39.4776, 24999996, 59136.0366

Write results:

3) 1 topic, 4 producers (topic with 12 partitions)

On slave186, slave187, slave188 and slave189, write simultaneously to testKafka_G; each producer writes 25,000,000 messages of 700 bytes each.

slave186:

[root@slave186 kafka_2.10-0.8.1.1]# bin/kafka-producer-perf-test.sh --messages 25000000  --message-size 700  --batch-size 10000 --topics testKafka_G --threads 12 --broker-list slave181:9092,slave182:9092,slave183:9092,slave184:9092
start.time, end.time, compression, message.size, batch.size, total.data.sent.in.MB, MB.sec, total.data.sent.in.nMsg, nMsg.sec
2015-10-15 12:35:39:578, 2015-10-15 12:42:36:298, 0, 700, 10000, 16689.30, 40.0492, 24999996, 59992.3114

slave187:

[root@slave187 kafka_2.10-0.8.1.1]# bin/kafka-producer-perf-test.sh --messages 25000000  --message-size 700  --batch-size 10000 --topics testKafka_G --threads 12 --broker-list slave181:9092,slave182:9092,slave183:9092,slave184:9092
start.time, end.time, compression, message.size, batch.size, total.data.sent.in.MB, MB.sec, total.data.sent.in.nMsg, nMsg.sec
2015-10-15 12:35:40:946, 2015-10-15 12:42:59:950, 0, 700, 10000, 16689.30, 38.0163, 24999996, 56947.0802

slave188:

[root@slave188 kafka_2.10-0.8.1.1]# bin/kafka-producer-perf-test.sh --messages 25000000  --message-size 700  --batch-size 10000 --topics testKafka_G --threads 12 --broker-list slave181:9092,slave182:9092,slave183:9092,slave184:9092
start.time, end.time, compression, message.size, batch.size, total.data.sent.in.MB, MB.sec, total.data.sent.in.nMsg, nMsg.sec
2015-10-15 12:35:42:180, 2015-10-15 12:42:40:544, 0, 700, 10000, 16689.30, 39.8918, 24999996, 59756.5661

slave189:

[root@slave189 kafka_2.10-0.8.1.1]# bin/kafka-producer-perf-test.sh --messages 25000000  --message-size 700  --batch-size 10000 --topics testKafka_G --threads 12 --broker-list slave181:9092,slave182:9092,slave183:9092,slave184:9092
2015-10-15 12:35:43:420, 2015-10-15 12:43:06:148, 0, 700, 10000, 16689.30, 37.6965, 24999996, 56468.0707

Write results:

4) 1 topic, 4 producers (topic with 24 partitions)

On slave186, slave187, slave188 and slave189, write simultaneously to testKafka_I; each producer writes 25,000,000 messages of 700 bytes each.

slave186:

[root@slave186 kafka_2.10-0.8.1.1]# bin/kafka-producer-perf-test.sh --messages 25000000  --message-size 700  --batch-size 10000 --topics testKafka_I --threads 12 --broker-list slave181:9092,slave182:9092,slave183:9092,slave184:9092
start.time, end.time, compression, message.size, batch.size, total.data.sent.in.MB, MB.sec, total.data.sent.in.nMsg, nMsg.sec
2015-10-15 14:01:15:964, 2015-10-15 14:08:40:376, 0, 700, 10000, 16689.30, 37.5537, 24999996, 56254.0975

slave187:

[root@slave187 kafka_2.10-0.8.1.1]# bin/kafka-producer-perf-test.sh --messages 25000000  --message-size 700  --batch-size 10000 --topics testKafka_I --threads 12 --broker-list slave181:9092,slave182:9092,slave183:9092,slave184:9092
start.time, end.time, compression, message.size, batch.size, total.data.sent.in.MB, MB.sec, total.data.sent.in.nMsg, nMsg.sec
2015-10-15 14:01:17:025, 2015-10-15 14:09:26:515, 0, 700, 10000, 16689.30, 34.0953, 24999996, 51073.5582

slave188:

[root@slave188 kafka_2.10-0.8.1.1]# bin/kafka-producer-perf-test.sh --messages 25000000  --message-size 700  --batch-size 10000 --topics testKafka_I --threads 12 --broker-list slave181:9092,slave182:9092,slave183:9092,slave184:9092
start.time, end.time, compression, message.size, batch.size, total.data.sent.in.MB, MB.sec, total.data.sent.in.nMsg, nMsg.sec
2015-10-15 14:01:17:925, 2015-10-15 14:08:59:764, 0, 700, 10000, 16689.30, 36.1366, 24999996, 54131.4094

slave189:

[root@slave189 kafka_2.10-0.8.1.1]# bin/kafka-producer-perf-test.sh --messages 25000000  --message-size 700  --batch-size 10000 --topics testKafka_I --threads 12 --broker-list slave181:9092,slave182:9092,slave183:9092,slave184:9092
start.time, end.time, compression, message.size, batch.size, total.data.sent.in.MB, MB.sec, total.data.sent.in.nMsg, nMsg.sec
2015-10-15 14:01:18:635, 2015-10-15 14:09:16:365, 0, 700, 10000, 16689.30, 34.9346, 24999996, 52330.8061

Write results:

5) 1 topic, 4 producers (topic with 48 partitions)

On slave186, slave187, slave188 and slave189, write simultaneously to testKafka_H; each producer writes 25,000,000 messages of 700 bytes each.

slave186:

[root@slave186 kafka_2.10-0.8.1.1]# bin/kafka-producer-perf-test.sh --messages 25000000  --message-size 700  --batch-size 10000 --topics testKafka_H --threads 12 --broker-list slave181:9092,slave182:9092,slave183:9092,slave184:9092
start.time, end.time, compression, message.size, batch.size, total.data.sent.in.MB, MB.sec, total.data.sent.in.nMsg, nMsg.sec
2015-10-15 12:49:10:340, 2015-10-15 12:58:32:970, 0, 700, 10000, 16689.30, 29.6630, 24999996, 44434.1681

slave187:

[root@slave187 kafka_2.10-0.8.1.1]# bin/kafka-producer-perf-test.sh --messages 25000000  --message-size 700  --batch-size 10000 --topics testKafka_H --threads 12 --broker-list slave181:9092,slave182:9092,slave183:9092,slave184:9092
start.time, end.time, compression, message.size, batch.size, total.data.sent.in.MB, MB.sec, total.data.sent.in.nMsg, nMsg.sec
2015-10-15 12:49:08:780, 2015-10-15 12:59:06:129, 0, 700, 10000, 16689.30, 27.9389, 24999996, 41851.5742

slave188:

[root@slave188 kafka_2.10-0.8.1.1]# bin/kafka-producer-perf-test.sh --messages 25000000  --message-size 700  --batch-size 10000 --topics testKafka_H --threads 12 --broker-list slave181:9092,slave182:9092,slave183:9092,slave184:9092
start.time, end.time, compression, message.size, batch.size, total.data.sent.in.MB, MB.sec, total.data.sent.in.nMsg, nMsg.sec
2015-10-15 12:49:07:105, 2015-10-15 12:58:40:283, 0, 700, 10000, 16689.30, 29.1171, 24999996, 43616.4612

slave189:

[root@slave189 kafka_2.10-0.8.1.1]# bin/kafka-producer-perf-test.sh --messages 25000000  --message-size 700  --batch-size 10000 --topics testKafka_H --threads 12 --broker-list slave181:9092,slave182:9092,slave183:9092,slave184:9092
start.time, end.time, compression, message.size, batch.size, total.data.sent.in.MB, MB.sec, total.data.sent.in.nMsg, nMsg.sec
2015-10-15 12:49:05:769, 2015-10-15 12:58:55:336, 0, 700, 10000, 16689.30, 28.3077, 24999996, 42403.9948

Write results:

6) Write stress test

On six hosts simultaneously, write to topic testKafka_G (12 partitions); each producer writes 100 million messages of 700 bytes each.

slave186:

[root@slave186 kafka_2.10-0.8.1.1]# bin/kafka-producer-perf-test.sh --messages 100000000  --message-size 700  --batch-size 10000 --topics testKafka_G --threads 12 --broker-list slave181:9092,slave182:9092,slave183:9092,slave184:9092
start.time, end.time, compression, message.size, batch.size, total.data.sent.in.MB, MB.sec, total.data.sent.in.nMsg, nMsg.sec
2015-10-15 15:20:28:535, 2015-10-15 15:47:43:588, 0, 700, 10000, 66757.20, 40.8288, 99999996, 61160.0945

slave187:

[root@slave187 kafka_2.10-0.8.1.1]# bin/kafka-producer-perf-test.sh --messages 100000000  --message-size 700  --batch-size 10000 --topics testKafka_G --threads 12 --broker-list slave181:9092,slave182:9092,slave183:9092,slave184:9092
start.time, end.time, compression, message.size, batch.size, total.data.sent.in.MB, MB.sec, total.data.sent.in.nMsg, nMsg.sec
2015-10-15 15:20:26:500, 2015-10-15 15:49:21:028, 0, 700, 10000, 66757.20, 38.4872, 99999996, 57652.5695

slave188:

[root@slave188 kafka_2.10-0.8.1.1]# bin/kafka-producer-perf-test.sh --messages 100000000  --message-size 700  --batch-size 10000 --topics testKafka_G --threads 12 --broker-list slave181:9092,slave182:9092,slave183:9092,slave184:9092
start.time, end.time, compression, message.size, batch.size, total.data.sent.in.MB, MB.sec, total.data.sent.in.nMsg, nMsg.sec
2015-10-15 15:20:24:390, 2015-10-15 15:48:42:076, 0, 700, 10000, 66757.20, 39.3225, 99999996, 58903.7054

slave189:

[root@slave189 kafka_2.10-0.8.1.1]# bin/kafka-producer-perf-test.sh --messages 100000000  --message-size 700  --batch-size 10000 --topics testKafka_G --threads 12 --broker-list slave181:9092,slave182:9092,slave183:9092,slave184:9092
start.time, end.time, compression, message.size, batch.size, total.data.sent.in.MB, MB.sec, total.data.sent.in.nMsg, nMsg.sec
2015-10-15 15:20:22:960, 2015-10-15 15:50:31:239, 0, 700, 10000, 66757.20, 36.9175, 99999996, 55301.1985

master180:

[root@master180 kafka_2.10-0.8.1.1]# bin/kafka-producer-perf-test.sh --messages 100000000  --message-size 700  --batch-size 10000 --topics testKafka_G --threads 12 --broker-list slave181:9092,slave182:9092,slave183:9092,slave184:9092
start.time, end.time, compression, message.size, batch.size, total.data.sent.in.MB, MB.sec, total.data.sent.in.nMsg, nMsg.sec
2015-10-15 15:20:19:769, 2015-10-15 15:48:24:705, 0, 700, 10000, 66757.20, 39.6200, 99999996, 59349.4329

master185:

[root@master185 kafka_2.10-0.8.1.1]# bin/kafka-producer-perf-test.sh --messages 100000000  --message-size 700  --batch-size 10000 --topics testKafka_G --threads 12 --broker-list slave181:9092,slave182:9092,slave183:9092,slave184:9092
start.time, end.time, compression, message.size, batch.size, total.data.sent.in.MB, MB.sec, total.data.sent.in.nMsg, nMsg.sec
2015-10-15 15:20:21:791, 2015-10-15 15:50:04:557, 0, 700, 10000, 66757.20, 37.4459, 99999996, 56092.6089

Summarizing the above test scenarios:
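
Summing the per-producer MB.sec figures from the outputs above gives the aggregate write throughput of each scenario, which is what the conclusions at the end of this report rest on:

```python
# Aggregate write throughput (MB/s) per scenario; per-producer rates are
# copied from the test outputs above.
scenarios = {
    "1 topic (12 part.), 1 producer":  [42.4928],
    "4 topics, 4 producers":           [42.1815, 39.4043, 41.1891, 39.4776],
    "1 topic (12 part.), 4 producers": [40.0492, 38.0163, 39.8918, 37.6965],
    "1 topic (24 part.), 4 producers": [37.5537, 34.0953, 36.1366, 34.9346],
    "1 topic (48 part.), 4 producers": [29.6630, 27.9389, 29.1171, 28.3077],
    "1 topic (12 part.), 6 producers": [40.8288, 38.4872, 39.3225, 36.9175,
                                        39.6200, 37.4459],
}
for name, rates in scenarios.items():
    print("%-34s %9.4f MB/s" % (name, sum(rates)))
```

Aggregate throughput falls as partitions grow (about 155.7 → 142.7 → 115.0 MB/s going from 12 to 24 to 48 partitions) and rises with more producers (about 155.7 MB/s with 4 producers vs. 232.6 MB/s with 6 on the same topic), while 1 topic vs. 4 topics with the same 4 producers differs little (155.7 vs. 162.3 MB/s).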

5. Read Tests

Because the consumer performance script shipped with Kafka is unstable, Spark is used instead to consume the data.

The two Spark consumer programs below run with identical resource allocations:

[root@master185 blue]# spark-submit --class com.fastweb.speed_count.SpeedApp --master spark://master185:7077 --driver-memory 2G --driver-cores 2  --executor-memory 5G   --total-executor-cores 45  speed_count-0.0.4-SNAPSHOT.jar

1) Consuming 1 topic

The Spark program consumes 1 topic (12 partitions), with 16 concurrent receiver threads pulling data, and writes the consumed data to HDFS.

Stages after the job completed:

2) Consuming 4 topics simultaneously

The Spark program consumes 4 topics simultaneously (12 partitions each), with 4 concurrent receiver threads per topic (16 receiver threads in total), and writes the consumed data to HDFS.

Stages while the job was running:

Statistics from the above:

6. Conclusions

Writes

  • With the same number of producers, writing to one topic or to several topics simultaneously makes little difference to throughput.

  • As the number of partitions in a topic increases, write throughput drops.

  • As long as the cluster's network and disk I/O are not the bottleneck, adding producers increases aggregate write throughput.

Reads

  • When Spark consumes multiple topics it reads from more Kafka partitions, and reads correspondingly faster.

In addition, monitoring hardware usage during the write and read tests shows that Kafka mainly consumes CPU, disk I/O and network bandwidth; memory usage is modest.


7. Appendix

1) Using the Apache Kafka performance test commands and building kafka-perf

Reference: http://www.aboutyun.com/thread-9905-1-1.html

The runs were configured to write 100 million messages, but each run finished 4 messages short of the requested count; the cause remains to be investigated.

2) Calculating the average log size

From the kssws table in Impala, export 100 log lines into a text file:

[root@slave30 ~]# impala-shell -o kssws_log_100.txt
[slave30:21000] > select log_content  from kssws where month_='201510' and day_='14' limit 100;
[slave30:21000] > 
[2]+  Stopped                 impala-shell -o kssws_log_100.txt

Divide the total file size by 100:

-rw-r--r--  1 root root   61672 Oct 14 22:46 kssws_log_100.txt

Rounding up, this gives the 700 bytes per message used above.
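
The arithmetic behind that figure, as a quick check (file size taken from the ls output above; the report rounds 616.72 up to 700, which a round-up to the next hundred bytes reproduces):

```python
import math

total_bytes = 61672             # size of kssws_log_100.txt (100 log lines)
avg = total_bytes / 100.0       # average bytes per log line
msg_size = int(math.ceil(avg / 100.0)) * 100   # round up to next 100 bytes
print(avg, msg_size)
```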