Notes on Tuning nginx for 5,000 Concurrent Connections

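Introduction:

The HIS (hospital information system) I am responsible for at the Provincial People's Hospital went live at the end of September. This time last year the system was already in production at one campus of the hospital; after a year of stable operation, the decision was made to bring the Chaohui campus online on October 1. This article records the nginx performance tuning done for the 5,000 concurrent connections expected at Chaohui.

The tuning falls into two parts: system-level tuning and nginx-level tuning.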

System-level tuning:

1. Open file limit for user processes

How to check:

# Show the hard limit
[root@nginx2 logs]# ulimit -Hn
65535
# Show the soft limit
[root@nginx2 logs]# ulimit -Sn
65535
# Without -H or -S, ulimit -n also reports the soft limit
[root@nginx2 logs]# ulimit -n
65535

Temporary change (current shell only):

ulimit -n 65535

Permanent change (applied at the next login session):

vim /etc/security/limits.conf

* soft nofile 65535
* hard nofile 65535
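
To confirm that the new limit is actually in effect for a given process, the limits can also be read programmatically. The following is a minimal Python sketch using the standard `resource` module; the helper names `meets_limit` and `nofile_limits` and the 65535 threshold are illustrative choices, not part of the original setup.

```python
import resource


def meets_limit(limit, required):
    """Return True if an rlimit value allows at least `required` descriptors."""
    # RLIM_INFINITY means "no limit", so it always satisfies the requirement.
    return limit == resource.RLIM_INFINITY or limit >= required


def nofile_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    soft, hard = nofile_limits()
    print(f"soft={soft} hard={hard}")
    print("soft limit allows 65535 descriptors:", meets_limit(soft, 65535))
```

Run it from the same kind of session that starts nginx; a process started another way (for example by an init system) may see different limits.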

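2. Maximum listen backlog (somaxconn)

echo 20480 > /proc/sys/net/core/somaxconn

somaxconn caps the accept queue of a listening socket; it is not a limit on the total number of TCP connections. If not configured, it defaults to 128 (4096 since Linux 5.4). A value written this way is lost on reboot, so also add the following to /etc/sysctl.conf:

net.core.somaxconn = 20480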

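3. Fast recycling and reuse of TIME_WAIT connections (recycle, reuse)

echo 1 > /proc/sys/net/ipv4/tcp_tw_reuse
echo 1 > /proc/sys/net/ipv4/tcp_tw_recycle

Note that tcp_tw_recycle is known to break connections from clients behind NAT and was removed in Linux 4.12, so it only applies to older kernels.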

4. Disable SYN flood protection (SYN cookies)

echo 0 > /proc/sys/net/ipv4/tcp_syncookies

5. The final settings in /etc/sysctl.conf:

net.core.somaxconn = 20480
net.core.rmem_default = 262144
net.core.wmem_default = 262144
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 4096 16777216
net.ipv4.tcp_wmem = 4096 4096 16777216
net.ipv4.tcp_mem = 786432 2097152 3145728
net.ipv4.tcp_max_syn_backlog = 16384
net.core.netdev_max_backlog = 20000
net.ipv4.tcp_fin_timeout = 15
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_max_orphans = 131072
net.ipv4.tcp_syncookies = 0

Then apply the settings with:

sysctl -p
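
After `sysctl -p` it can be useful to check that the running kernel really holds the values from the file. The sketch below is one possible way to do that in Python; it assumes plain `key = value` lines and maps keys to /proc/sys by replacing dots with slashes, which is fine for the keys used here but not for keys containing interface names with dots. All function names are hypothetical helpers.

```python
from collections import Counter


def parse_sysctl(text):
    """Parse sysctl.conf-style text into a list of (key, value) pairs."""
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comments ('#' or ';').
        if not line or line[0] in "#;":
            continue
        key, sep, value = line.partition("=")
        if sep:
            pairs.append((key.strip(), value.strip()))
    return pairs


def duplicate_keys(pairs):
    """Return keys that are set more than once, in sorted order."""
    counts = Counter(key for key, _ in pairs)
    return sorted(key for key, n in counts.items() if n > 1)


def proc_path(key):
    """Map a sysctl key to its /proc/sys path (simple dot-to-slash mapping)."""
    return "/proc/sys/" + key.replace(".", "/")


def live_value(key):
    """Read the running kernel's value for `key`, or None if unavailable."""
    try:
        with open(proc_path(key)) as fh:
            # Multi-value settings are tab-separated in /proc; normalise spacing.
            return " ".join(fh.read().split())
    except OSError:
        return None


def mismatches(text):
    """Yield (key, wanted, live) for settings whose live value differs."""
    for key, wanted in parse_sysctl(text):
        live = live_value(key)
        if live != " ".join(wanted.split()):
            yield key, wanted, live
```

Calling `list(mismatches(open("/etc/sysctl.conf").read()))` on the target host lists the settings that did not take effect, including keys the kernel does not provide (their live value is None).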

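nginx-level tuning:

1. Number of nginx worker processes. Set it according to the number of CPU cores, typically equal to the core count or a multiple of it (for example, two quad-core CPUs count as 8).

worker_processes 8;

You can also simply write auto:

worker_processes auto;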

2. Increase worker_connections

events {
    worker_connections 20000;
}

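3. nginx has a worker_rlimit_nofile directive that sets the maximum number of open file descriptors for worker processes. It serves the same purpose as the ulimit setting; if both are configured, worker_rlimit_nofile overrides the ulimit value for the workers.

worker_rlimit_nofile 30000;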
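
The three directives above interact, so a quick sanity check helps before picking numbers. The Python sketch below rests on two rule-of-thumb assumptions that are not stated in the original notes: worker_connections counts connections to proxied servers as well as to clients, so a proxied client occupies up to two slots, and every connection needs one file descriptor. The function names are made up for illustration.

```python
def max_proxied_clients(worker_processes, worker_connections):
    """Rough upper bound on simultaneous clients when nginx acts as a reverse proxy."""
    # One slot for the client side and one for the upstream side of each request.
    return worker_processes * worker_connections // 2


def fd_headroom(worker_connections, worker_rlimit_nofile):
    """Descriptors left per worker once every connection slot is in use."""
    # Whatever remains is available for log files, cached files, temp files, etc.
    return worker_rlimit_nofile - worker_connections


if __name__ == "__main__":
    # Values taken from the examples in this section.
    print(max_proxied_clients(8, 20000))   # 80000
    print(fd_headroom(20000, 30000))       # 10000
```

A negative headroom would mean workers can run out of descriptors before they run out of connection slots, which is the case worth catching.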

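4. nginx keep-alive tuning

nginx persistent connections: keepalive

1) The goal is keep-alive on both hops: from the client to nginx, and from nginx to the backend server.

Keeping connections to the client alive:

By default, nginx already supports keep-alive on client connections (provided the client's HTTP request asks for it). This works as-is for most setups, but in some special scenarios a few parameters are worth adjusting (keepalive_timeout and keepalive_requests).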


http {
    keepalive_timeout  120s 120s;
    keepalive_requests 10000;
}

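Explanation:
keepalive_timeout

First parameter: how long an idle keep-alive client connection stays open on the server side (default 75s); a value of 0 disables keep-alive client connections.

Second parameter: optional; sets the value in the "Keep-Alive: timeout=time" response header. It can usually be left unset.

Note: the 75s default for keepalive_timeout is enough in most cases. For internal server-to-server traffic with large requests, raise it moderately to 120s or 300s.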

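keepalive_requests

The keepalive_requests directive sets the maximum number of requests that can be served over a single keep-alive connection; once that number is reached, the connection is closed. The default is 100 (1000 since nginx 1.19.10).

In practice, once a keep-alive connection is established, nginx attaches a counter to it that records how many client requests have been received and handled on that connection.

When the counter reaches the configured maximum, nginx closes the connection, and the client has to establish a new one.

When QPS (requests per second) is modest, the default of 100 is usually adequate. For high-QPS scenarios (say above 10000 QPS, or even 30000, 50000 and beyond), 100 is far too low.

A quick calculation: at QPS = 10000, clients send 10000 requests per second (usually spread over several keep-alive connections). Since each connection can carry at most 100 requests, on average 100 keep-alive connections per second are closed by nginx.

That also means that, to sustain the same QPS, clients have to open 100 new connections every second.

As a result, you will see a large number of sockets in TIME_WAIT (even though keep-alive is working between the client and nginx).

For high-QPS scenarios it is therefore well worth raising this parameter, to avoid connections being constantly created and thrown away and to reduce TIME_WAIT.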
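
The arithmetic above is easy to replay for other values. Here is a small Python sketch of it; the function names are illustrative, and the 60-second TIME_WAIT duration is an assumption about a typical Linux host rather than something stated in these notes.

```python
TIME_WAIT_SECONDS = 60  # assumption: how long Linux keeps a socket in TIME_WAIT


def reconnects_per_second(qps, keepalive_requests):
    """Connections nginx closes (and clients must reopen) each second."""
    return qps / keepalive_requests


def steady_state_time_wait(qps, keepalive_requests, hold=TIME_WAIT_SECONDS):
    """Approximate number of sockets sitting in TIME_WAIT at any moment."""
    return reconnects_per_second(qps, keepalive_requests) * hold


if __name__ == "__main__":
    for limit in (100, 10000):
        print(limit,
              reconnects_per_second(10000, limit),
              steady_state_time_wait(10000, limit))
```

With the default of 100 and 10000 QPS this gives 100 reconnects per second; raising keepalive_requests to 10000, as in the configuration above, brings it down to 1 per second.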

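2) Keeping connections to the backend server alive:

To keep connections between nginx and the backend servers (which nginx calls upstream) alive, configure the following: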

http {
    upstream MASTER {
        server 129.9.10.104:8080 weight=1 max_fails=2 fail_timeout=30s;
        server 129.9.10.105:8080 weight=1 max_fails=2 fail_timeout=30s;
        keepalive 300;    # important; explained in detail below
    }

    server {
        listen 8080 default_server;
        server_name "";
        location / {
            proxy_pass http://MASTER;
            proxy_set_header Host $host;
            proxy_set_header X-Forwarded-For $remote_addr;
            proxy_set_header X-Real-IP $remote_addr;
            add_header Cache-Control no-store;
            add_header Pragma no-cache;
            proxy_http_version 1.1;            # set these two as well; explained below
            proxy_set_header Connection "";
        }
    }
}

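Note 1:

What the keepalive setting in upstream does

keepalive sets the maximum number of idle keep-alive connections to the upstream servers that each worker process keeps in its cache. Set it with care, especially for high-QPS scenarios. It is best to make an estimate first: from the QPS and the average response time you can work out roughly how many persistent connections are needed.

For example, with the 10000 QPS from earlier and an average response time of 100 ms, about 10000 × 0.1 s = 1000 persistent connections are needed.

Then set keepalive to 10% to 30% of that number. If you would rather not do the math, simply setting something like keepalive 1000 is generally fine.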
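
The estimate above is an application of Little's law (connections in flight = request rate × time each request occupies a connection). A minimal Python sketch, with hypothetical function names and the 10% to 30% range from the text as defaults:

```python
def required_connections(qps, avg_response_ms):
    """Little's law: connections in flight = arrival rate x time in system."""
    # Integer ceiling division keeps the result exact for integer inputs.
    return (qps * avg_response_ms + 999) // 1000


def keepalive_range(connections, low_pct=10, high_pct=30):
    """Suggested (low, high) bounds for the upstream keepalive value."""
    return connections * low_pct // 100, connections * high_pct // 100


if __name__ == "__main__":
    needed = required_connections(10000, 100)
    print(needed)                   # 1000
    print(keepalive_range(needed))  # (100, 300)
```

For 10000 QPS at 100 ms this yields 1000 connections and a keepalive value between 100 and 300, which is where the `keepalive 300;` in the configuration above sits.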

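Note 2:

Persistent connections are only the default from HTTP/1.1 onward, and nginx talks HTTP/1.0 to upstreams by default, so the proxy_http_version directive should be set to "1.1".

The "Connection" header should also be cleared.

"Cleared", as I understand it, means removing the header that came from the client: even when the connection between client and nginx is short-lived, nginx can still keep a persistent connection to the upstream. In that case the "Connection" header from the client request must be cleared so that it is not passed on.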

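Summary:

When nginx connections show a large number of TIME_WAIT sockets

1) There are two causes of a large number of TIME_WAIT sockets on the nginx side:

keepalive_requests is set too low. Under high concurrency, once a connection reaches this value nginx force-closes the keep-alive connection to the client (nginx is the side that actively closes, so the TIME_WAIT lands on nginx).

The upstream keepalive value is set too low (too few idle connections allowed). Under high concurrency the connection count oscillates (idle connections beyond this value are closed), so nginx keeps closing and reopening keep-alive connections to the backend servers.

2) Cause of a large number of TIME_WAIT sockets on the backend server side:

nginx has not enabled keep-alive to the backend, that is, proxy_http_version 1.1; and proxy_set_header Connection ""; are not set. The backend server then closes the connection after every request, and under high concurrency a large number of TIME_WAIT sockets pile up on the server side.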
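
To see which of these cases applies, count sockets per TCP state on both the nginx host and the backend host. The Python sketch below does this by parsing /proc/net/tcp on Linux, where the fourth column holds the state as a hexadecimal code (06 is TIME_WAIT); the function names are hypothetical, and a tool such as `ss` or `netstat` reports the same information.

```python
from collections import Counter

# State codes used in the "st" column of /proc/net/tcp (hexadecimal).
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def count_states(proc_net_tcp_text):
    """Count sockets per TCP state from the contents of /proc/net/tcp."""
    counts = Counter()
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue
        code = fields[3].upper()
        counts[TCP_STATES.get(code, code)] += 1
    return counts


def read_states(paths=("/proc/net/tcp", "/proc/net/tcp6")):
    """Sum the per-state counts over the IPv4 and IPv6 socket tables."""
    total = Counter()
    for path in paths:
        try:
            with open(path) as fh:
                total += count_states(fh.read())
        except OSError:
            pass  # e.g. IPv6 disabled, or not running on Linux
    return total


if __name__ == "__main__":
    for state, n in read_states().most_common():
        print(f"{state:12} {n}")
```

A TIME_WAIT count that dominates on the nginx host points at case 1; one that dominates on the backend host points at case 2.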

Finally, my nginx configuration file:

user  nginx;
worker_processes  4;

error_log  logs/error.log;
#error_log  logs/error.log  notice;
#error_log  logs/error.log  info;

#pid        logs/nginx.pid;

worker_rlimit_nofile 65535;

events {
    worker_connections  10240;
}

http {
    include       mime.types;
    default_type  application/octet-stream;

    log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
                      '$status $body_bytes_sent "$http_referer" '
                      '"$http_user_agent" "$http_x_forwarded_for" $upstream_addr';

    add_header Cache-Control no-cache;

    #access_log  logs/access.log  main;
    access_log  off;
    fastcgi_connect_timeout 600;
    fastcgi_send_timeout 600;
    fastcgi_read_timeout 600;
    fastcgi_buffer_size 64k;
    fastcgi_buffers 4 64k;
    fastcgi_busy_buffers_size 128k;
    fastcgi_temp_file_write_size 128k;
    proxy_connect_timeout 600s;
    proxy_send_timeout 600s;
    proxy_read_timeout 600s;
    client_max_body_size 1000m;

    sendfile        on;
    #tcp_nopush     on;

    #keepalive_timeout  0;
    keepalive_timeout  120;
    #keepalive_timeout  65;

    gzip  on;
    gzip_min_length 1k;
    gzip_buffers 4 16k;
    gzip_comp_level 2;
    gzip_types text/plain text/css text/javascript application/x-javascript application/json application/xml;

    map $http_upgrade $connection_upgrade {
        default upgrade;
        '' close;
    }

    map $http_browser_type $browser_type {
        default 0;
        cef 1;
    }
    include trail_ip.conf;
    include heren_upstream.conf;
    include heren_upstream_trail.conf;
    include /etc/nginx/conf.d/*.conf;

    server {
        listen       80;
        server_name  localhost;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        proxy_set_header Host $host:$server_port;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header REMOTE-HOST $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

        include heren_location_page.conf;
        include heren_dict_location.conf;
        include heren_location_bill.conf;
        include heren_location_drug.conf;
        include heren_location.conf;

        location /ngx_status {
            stub_status on;
            access_log off;
            allow 127.0.0.1;
            deny all;
        }

        location / {
            root   html;
            index  index.html index.htm;
        }

        error_page   500 502 503 504  /50x.html;
        location = /50x.html {
            root   html;
        }
    }
}
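
The /ngx_status location in this configuration exposes nginx's stub_status counters, which are handy for checking whether keep-alive is paying off (many requests per handled connection) once the system is under load. Below is a minimal Python sketch that parses that output; the URL in `fetch_status` is an assumption based on the `listen 80` and `allow 127.0.0.1` lines above, and the function names are illustrative.

```python
import re
import urllib.request


def parse_stub_status(text):
    """Turn stub_status text output into a dict of integer counters."""
    active = re.search(r"Active connections:\s*(\d+)", text)
    totals = re.search(r"^[ \t]*(\d+)[ \t]+(\d+)[ \t]+(\d+)[ \t]*$", text, re.MULTILINE)
    rww = re.search(r"Reading:\s*(\d+)\s+Writing:\s*(\d+)\s+Waiting:\s*(\d+)", text)
    if not (active and totals and rww):
        raise ValueError("unrecognised stub_status output")
    accepts, handled, requests = map(int, totals.groups())
    reading, writing, waiting = map(int, rww.groups())
    return {
        "active": int(active.group(1)),
        "accepts": accepts, "handled": handled, "requests": requests,
        "reading": reading, "writing": writing, "waiting": waiting,
    }


def requests_per_connection(status):
    """Average requests served per handled connection since nginx started."""
    return status["requests"] / status["handled"] if status["handled"] else 0.0


def fetch_status(url="http://127.0.0.1/ngx_status"):
    """Fetch and parse the status page; must run on the nginx host itself."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return parse_stub_status(resp.read().decode("ascii", "replace"))
```

A ratio close to 1 from `requests_per_connection(fetch_status())` suggests clients are not reusing connections; a ratio in the hundreds or thousands suggests keep-alive is working as intended.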

That's all.