How wrk, the HTTP Benchmarking Tool, Works

Published: 2021-02-27, Last Updated: 2025-02-26

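Note: this article analyzes the source code of wrk v4.1.0.

wrk is an HTTP benchmarking tool written in C. All parameters are passed on the command line and there is no configuration file, so it is easy to use. The build produces a single binary, so it is also easy to deploy.

It accepts only a handful of options: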

$ ./wrk
Usage: wrk <options> <url>
  Options:
    -c, --connections <N>  Connections to keep open
    -d, --duration    <T>  Duration of test
    -t, --threads     <N>  Number of threads to use

    -s, --script      <S>  Load Lua script file
    -H, --header      <H>  Add header to request
        --latency          Print latency statistics
        --timeout     <T>  Socket/request timeout
    -v, --version          Print version details

  Numeric arguments may include a SI unit (1k, 1M, 1G)
  Time arguments may include a time unit (2s, 2m, 2h)

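wrk also supports customizing each test case with LuaJIT scripts, which makes it more powerful than ab.

Compared with the all-in-one feature set of Apache JMeter, wrk's statistics are fairly minimal. It reports only two metrics, Latency and QPS (Stdev in the output is short for standard deviation), and it cannot show how those metrics evolve over the course of the run.

A sample wrk run looks like this: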

$ ./wrk -t 2 -d 10s http://localhost:8000
Running 10s test @ http://localhost:8000
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    24.86ms   17.36ms  72.64ms   60.35%
    Req/Sec   207.41     21.92   262.00     70.00%
  4135 requests in 10.01s, 20.49MB read
Requests/sec:    413.02
Transfer/sec:      2.05MB

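With basic usage covered, let's look at how wrk is implemented internally.

Internally, wrk combines multithreading with I/O multiplexing. The overall design is as follows.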

First, each thread runs its own event loop (epoll on Linux) to handle non-blocking network events. The following functions do most of the work:

connect_socket
socket_connected
socket_readable
socket_writable
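
To make the callback-per-event idea concrete, here is a minimal sketch that assumes Linux and raw epoll. It is not wrk's code: a pipe stands in for a socket, and the names conn, handle_readable, poll_once, and demo_event_loop are hypothetical. The read callback plays the role that socket_readable plays in the list above.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical per-descriptor state: a callback plus a byte counter. */
typedef struct conn {
    int fd;
    void (*on_readable)(struct conn *c);
    size_t bytes_read;
} conn;

/* Read callback: drains whatever is currently available on the descriptor. */
static void handle_readable(conn *c) {
    char buf[64];
    ssize_t n = read(c->fd, buf, sizeof(buf));
    if (n > 0) c->bytes_read += (size_t)n;
}

/* Wait up to timeout_ms, then dispatch each ready descriptor to its callback.
 * Returns the number of events handled, or -1 on error. */
static int poll_once(int epfd, int timeout_ms) {
    struct epoll_event events[16];
    int n = epoll_wait(epfd, events, 16, timeout_ms);
    for (int i = 0; i < n; i++) {
        conn *c = events[i].data.ptr;
        if (events[i].events & EPOLLIN) c->on_readable(c);
    }
    return n;
}

/* Demo: register the read end of a pipe, write to the other end, and let
 * the loop deliver the data. Returns bytes seen by the callback, or -1. */
int demo_event_loop(void) {
    int fds[2];
    if (pipe(fds) != 0) return -1;

    int epfd = epoll_create1(0);
    if (epfd < 0) { close(fds[0]); close(fds[1]); return -1; }

    /* Non-blocking mode, as an event-driven client would use for its sockets. */
    fcntl(fds[0], F_SETFL, fcntl(fds[0], F_GETFL, 0) | O_NONBLOCK);

    conn c = { .fd = fds[0], .on_readable = handle_readable, .bytes_read = 0 };
    struct epoll_event ev;
    memset(&ev, 0, sizeof(ev));
    ev.events = EPOLLIN;
    ev.data.ptr = &c;

    int result = -1;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fds[0], &ev) == 0 &&
        write(fds[1], "hello", 5) == 5 &&
        poll_once(epfd, 1000) == 1) {
        result = (int)c.bytes_read;
    }

    close(epfd);
    close(fds[0]);
    close(fds[1]);
    return result;
}
```

A real client would call the polling step in a loop and register write and connect-completion callbacks in the same way; the sketch handles a single read event to keep the dispatch pattern visible.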

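Of these, connect_socket deserves special attention. Its job is to connect to the HTTP server. There is one caveat: the thread entry point, thread_main, creates all of the current thread's connections up front, and multiple benchmark requests then reuse the same TCP connection.

The relevant snippet from thread_main: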

    for (uint64_t i = 0; i < thread->connections; i++, c++) {
        c->thread = thread;
        c->ssl     = cfg.ctx ? SSL_new(cfg.ctx) : NULL;
        c->request = request;
        c->length  = length;
        c->delayed = cfg.delay;
        connect_socket(thread, c);
    }

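Capturing the traffic with tcpdump and inspecting it in Wireshark confirms both points: the connections are all opened at startup, and requests reuse them.

Besides the network operations, each thread also has a timer (the record_rate() function) that records the thread's own data into the global statistics (via the stats_record() function).

The statistics structure is designed as an array-backed hash table, in effect a histogram: the array index is the value of the metric, and the element stored there is the number of times that value occurred.

The relevant struct definitions: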

static struct {
    stats *latency;
    stats *requests;
} statistics;

typedef struct {
    uint64_t count;  // number of samples recorded in data
    uint64_t limit;
    uint64_t min;
    uint64_t max;
    uint64_t data[]; // holds limit elements
} stats;

For example, suppose data represents QPS and contains:

index(QPS):     0    1    2    3    4    5
value(count): | 0 | 12 | 44 | 90 | 29 | 42 |

Then the average QPS is (1*12+2*44+3*90+4*29+5*42)/(12+44+90+29+42) = 696/217 ≈ 3.21
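
The same weighted average can be sketched in C. This is an illustration of the arithmetic only, not wrk's own function; the name histogram_mean and its signature are assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Weighted mean over a histogram: data[i] is how many times the value i
 * was observed. Returns 0.0 when no samples have been recorded. */
double histogram_mean(const uint64_t *data, size_t limit) {
    uint64_t samples = 0;   /* total number of observations */
    uint64_t weighted = 0;  /* sum of value * occurrences */
    for (size_t i = 0; i < limit; i++) {
        samples += data[i];
        weighted += (uint64_t)i * data[i];
    }
    return samples ? (double)weighted / (double)samples : 0.0;
}
```

Called with the six counts from the table ({0, 12, 44, 90, 29, 42}), it computes 696 / 217, which is approximately 3.21.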

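Because the program is multithreaded and statistics is a global variable, updates must be synchronized so that concurrent accesses from different threads stay consistent. wrk uses atomic operations, including CAS, rather than a lock, which gives finer granularity:

__sync_fetch_and_add increments the count at the corresponding index;
__sync_val_compare_and_swap updates the min and max bounds for the current value; if the update fails because of a concurrent write, it keeps retrying until it succeeds.

Overall, the wrk code is well written, concise, and easy to read. Functionally, wrk is simple and easy to use, but keep two limitations in mind. First, all of its connections are established at startup and the benchmark's HTTP requests reuse those TCP connections, which may differ from real user traffic. Second, its statistics are fairly minimal and have no time dimension, so you cannot plot how the metrics change over time, which makes the results less intuitive.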
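
The two steps above can be sketched as follows. This is a minimal illustration modeled on the description, not a copy of wrk's source: the type hist and the functions hist_alloc and hist_record are hypothetical names, and the layout simply mirrors the stats struct shown earlier. It relies on the GCC/Clang __sync builtins.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical histogram type mirroring the stats layout shown earlier. */
typedef struct {
    uint64_t count;   /* number of samples recorded */
    uint64_t limit;   /* number of elements in data */
    uint64_t min;
    uint64_t max;
    uint64_t data[];  /* data[v] = how many times value v was seen */
} hist;

hist *hist_alloc(uint64_t limit) {
    assert(limit > 0);
    hist *h = calloc(1, sizeof(hist) + sizeof(uint64_t) * limit);
    if (h == NULL) return NULL;
    h->limit = limit;
    h->min = UINT64_MAX;  /* so the first sample always lowers min */
    return h;
}

/* Record one sample without taking a lock.
 * Returns 0 if n is out of range, 1 otherwise. */
int hist_record(hist *h, uint64_t n) {
    if (n >= h->limit) return 0;

    /* Step 1: atomically bump the bucket and the total. */
    __sync_fetch_and_add(&h->data[n], 1);
    __sync_fetch_and_add(&h->count, 1);

    /* Step 2: CAS loops for the bounds. The builtin returns the value that
     * was stored before the attempt, so each iteration re-checks n against
     * the latest min/max and stops once n no longer improves the bound. */
    uint64_t min = h->min;
    while (n < min) {
        min = __sync_val_compare_and_swap(&h->min, min, n);
    }
    uint64_t max = h->max;
    while (n > max) {
        max = __sync_val_compare_and_swap(&h->max, max, n);
    }
    return 1;
}
```

Because each update touches a single word atomically, threads recording different values never wait on each other, and threads racing on min or max only retry while their own sample would still change the bound.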
