-
为什么会出现随机 504,408 错误代码?(一)
问题描述
这是一个我们系统测试人员对我们的微服务请求的 Server 端做模拟断网的时候,发生的一种异常处理的情况。 由于不是在 overload 情况下,所以如果微服务请求的 Server 断网,正常处理逻辑,我们的微服务会回复 500 错误代码给微服务的请求端。但是测试人员发现,这种情况下,我们的微服务会出现回复 504,408 的错误代码给请求端的异常情况。
问题分析
我们的测试人员提供了下面可参考信息:
istio 回复 408,504 的具体错误信息 408 downstream duration timeout
和 504 upstream request timeout
,以及提供了 ss 命令的监控结果。
Mon May 31 06:20:36 UTC 2021
tcp ESTAB 0 5809 192.168.209.20:41914 10.139.17.19:opsmessaging
...
Mon May 31 06:22:12 UTC 2021
tcp ESTAB 0 55766 192.168.209.20:41914 10.139.17.19:opsmessaging
Mon May 31 06:22:17 UTC 2021
tcp ESTAB 0 57413 192.168.209.20:41914 10.139.17.19:opsmessaging
Mon May 31 06:22:22 UTC 2021
tcp ESTAB 0 57413 192.168.209.20:41914 10.139.17.19:opsmessaging
192.168.209.20 这个 IP 是我们的微服务的地址,10.139.17.19 则是断网的那个 Server 的地址。由于看到这个 tcp 连接是建立的 (ESTAB),所以看到的 SEND-Q 那么大的数值,表述我们发送的但是没有收到 ACK 的数据包。
问题复现
部署两个虚拟机
具体如何部署,参考我之前的这篇文章 本地 ubuntu18.04 VM 安装 k8s。
目前我是两台虚拟机,下面测试情况下 192.168.56.21 为服务端,192.168.56.22 为客户端。
编写测试程序
服务端程序
package main
import (
"fmt"
"log"
"net/http"
//"time"
"golang.org/x/net/http2"
"golang.org/x/net/http2/h2c"
)
func main() {
handlerGet := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
fmt.Println("receive a new request")
w.WriteHeader(http.StatusNoContent)
})
mux := http.NewServeMux()
mux.Handle("/", handlerGet)
h2s := &http2.Server{}
h1s := &http.Server{
Addr: ":10001",
Handler: h2c.NewHandler(mux, h2s),
}
log.Fatal(h1s.ListenAndServe())
}
客户端程序
package main
import (
"bytes"
"strings"
//"context"
"bufio"
"crypto/tls"
"log"
"net"
"net/http"
"os"
"strconv"
"time"
"golang.org/x/net/http2"
)
const uri = "http://192.168.56.21:10001"
func createHTTPClient() *http.Client {
var tr *http2.Transport
tr = &http2.Transport{
TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
AllowHTTP: true,
DialTLS: func(network, addr string, cfg *tls.Config) (net.Conn, error) {
d := &net.Dialer{
Timeout: time.Duration(2 * time.Second),
}
return d.Dial(network, addr)
},
//ReadIdleTimeout: time.Duration(15 * time.Second),
}
return &http.Client{
Transport: tr,
Timeout: time.Duration(2 * time.Second)}
}
func main() {
cli := createHTTPClient()
reader := bufio.NewReader(os.Stdin)
for {
text, _ := reader.ReadString('\n')
text = strings.Replace(text, "\n", "", -1)
if text == "" {
continue
}
log.Println("input text ", text)
body := []byte(text)
req, _ := http.NewRequest("POST", uri, bytes.NewBuffer(body))
if len(body) > 0 {
req.Header.Set("Content-Type", "application/json")
req.Header.Set("Content-Length", strconv.Itoa(len(body)))
}
log.Println("start time", time.Now())
_, err := cli.Do(req)
log.Println("end time", time.Now())
if err != nil {
log.Println("error", err)
}
}
}
模拟测试
预备操作
保持网络正常情况下:
- 开启服务端程序;
- 开启 ss 命令监控 socket 状态;
- 开启 tcpdump 查看数据包传输情况(由于 Server 端网络后面会被关闭,所以在 client 端监听)。
- 运行客户端程序,并开启
GODEBUG=http2debug=2
客户端输出日志如下:
shenjie@shenjie-VirtualBox:~/go/src/mockhttp2client$ go run client.go
2021/06/03 12:49:33 input text abc
2021/06/03 12:49:33 start time 2021-06-03 12:49:33.2222355 +0800 CST m=+3.233222828
2021/06/03 12:49:33 http2: Transport creating client conn 0xc00010a000 to 192.168.56.21:10001
2021/06/03 12:49:33 http2: Framer 0xc00011a000: wrote SETTINGS len=18, settings: ENABLE_PUSH=0, INITIAL_WINDOW_SIZE=4194304, MAX_HEADER_LIST_SIZE=10485760
2021/06/03 12:49:33 http2: Framer 0xc00011a000: wrote WINDOW_UPDATE len=4 (conn) incr=1073741824
2021/06/03 12:49:33 http2: Transport encoding header ":authority" = "192.168.56.21:10001"
2021/06/03 12:49:33 http2: Transport encoding header ":method" = "POST"
2021/06/03 12:49:33 http2: Transport encoding header ":path" = "/"
2021/06/03 12:49:33 http2: Transport encoding header ":scheme" = "http"
2021/06/03 12:49:33 http2: Transport encoding header "content-type" = "application/json"
2021/06/03 12:49:33 http2: Transport encoding header "content-length" = "3"
2021/06/03 12:49:33 http2: Transport encoding header "accept-encoding" = "gzip"
2021/06/03 12:49:33 http2: Transport encoding header "user-agent" = "Go-http-client/2.0"
2021/06/03 12:49:33 http2: Framer 0xc00011a000: wrote HEADERS flags=END_HEADERS stream=3 len=55
2021/06/03 12:49:33 http2: Framer 0xc00011a000: wrote DATA flags=END_STREAM stream=3 len=3 data="abc"
2021/06/03 12:49:33 http2: Framer 0xc00011a000: read SETTINGS len=24, settings: MAX_FRAME_SIZE=1048576, MAX_CONCURRENT_STREAMS=250, MAX_HEADER_LIST_SIZE=1048896, INITIAL_WINDOW_SIZE=1048576
2021/06/03 12:49:33 http2: Transport received SETTINGS len=24, settings: MAX_FRAME_SIZE=1048576, MAX_CONCURRENT_STREAMS=250, MAX_HEADER_LIST_SIZE=1048896, INITIAL_WINDOW_SIZE=1048576
2021/06/03 12:49:33 http2: Framer 0xc00011a000: wrote SETTINGS flags=ACK len=0
2021/06/03 12:49:33 http2: Framer 0xc00011a000: read SETTINGS flags=ACK len=0
2021/06/03 12:49:33 http2: Transport received SETTINGS flags=ACK len=0
2021/06/03 12:49:33 http2: Framer 0xc00011a000: read WINDOW_UPDATE len=4 (conn) incr=983041
2021/06/03 12:49:33 http2: Transport received WINDOW_UPDATE len=4 (conn) incr=983041
2021/06/03 12:49:33 http2: Framer 0xc00011a000: read HEADERS flags=END_STREAM|END_HEADERS stream=3 len=26
2021/06/03 12:49:33 http2: decoded hpack field header field ":status" = "204"
2021/06/03 12:49:33 http2: decoded hpack field header field "date" = "Thu, 03 Jun 2021 04:49:34 GMT"
2021/06/03 12:49:33 http2: Transport received HEADERS flags=END_STREAM|END_HEADERS stream=3 len=26
2021/06/03 12:49:33 http2: Framer 0xc00011a000: read WINDOW_UPDATE len=4 (conn) incr=3
2021/06/03 12:49:33 http2: Transport received WINDOW_UPDATE len=4 (conn) incr=3
2021/06/03 12:49:33 end time 2021-06-03 12:49:33.225610556 +0800 CST m=+3.236597863
ss 命令输出结果:
shenjie@shenjie-VirtualBox:~$ while true;do date;ss|grep 10001;sleep 5;done
2021 年 06 月 03 日 星期四 12:49:24 CST
2021 年 06 月 03 日 星期四 12:49:29 CST
2021 年 06 月 03 日 星期四 12:49:34 CST
tcp ESTAB 0 0 192.168.56.22:47238 192.168.56.21:10001
tcpdump 命令显示建立连接信息:
12:49:33.222612 IP shenjie-VirtualBox.47238 > 192.168.56.21.10001: Flags [S], seq 1563218660, win 64240, options [mss 1460,sackOK,TS val 1794053933 ecr 0,nop,wscale 7], length 0
12:49:33.223026 IP 192.168.56.21.10001 > shenjie-VirtualBox.47238: Flags [S.], seq 1065696065, ack 1563218661, win 65160, options [mss 1460,sackOK,TS val 292741325 ecr 1794053933,nop,wscale 7], length 0
12:49:33.223074 IP shenjie-VirtualBox.47238 > 192.168.56.21.10001: Flags [.], ack 1, win 502, options [nop,nop,TS val 1794053934 ecr 292741325], length 0
12:49:33.223642 IP shenjie-VirtualBox.47238 > 192.168.56.21.10001: Flags [P.], seq 1:65, ack 1, win 502, options [nop,nop,TS val 1794053935 ecr 292741325], length 64
12:49:33.223934 IP 192.168.56.21.10001 > shenjie-VirtualBox.47238: Flags [.], ack 65, win 509, options [nop,nop,TS val 292741326 ecr 1794053935], length 0
12:49:33.224130 IP shenjie-VirtualBox.47238 > 192.168.56.21.10001: Flags [P.], seq 65:129, ack 1, win 502, options [nop,nop,TS val 1794053935 ecr 292741326], length 64
12:49:33.224172 IP 192.168.56.21.10001 > shenjie-VirtualBox.47238: Flags [P.], seq 1:34, ack 65, win 509, options [nop,nop,TS val 292741326 ecr 1794053935], length 33
12:49:33.224186 IP shenjie-VirtualBox.47238 > 192.168.56.21.10001: Flags [.], ack 34, win 502, options [nop,nop,TS val 1794053935 ecr 292741326], length 0
12:49:33.224326 IP shenjie-VirtualBox.47238 > 192.168.56.21.10001: Flags [P.], seq 129:141, ack 34, win 502, options [nop,nop,TS val 1794053935 ecr 292741326], length 12
12:49:33.224335 IP 192.168.56.21.10001 > shenjie-VirtualBox.47238: Flags [P.], seq 34:56, ack 65, win 509, options [nop,nop,TS val 292741326 ecr 1794053935], length 22
12:49:33.224336 IP 192.168.56.21.10001 > shenjie-VirtualBox.47238: Flags [.], ack 129, win 509, options [nop,nop,TS val 292741326 ecr 1794053935], length 0
12:49:33.224698 IP 192.168.56.21.10001 > shenjie-VirtualBox.47238: Flags [P.], seq 56:104, ack 141, win 509, options [nop,nop,TS val 292741327 ecr 1794053935], length 48
模拟 Server 断网
-
在 Server 机器上 输入该命令进行断网
sudo ifdown enp0s8
-
客户端程序发送消息
此时我们可以看到如下日志:
客户端程序输出下面日志(由于我们设置了 Timeout=2s, 所以可以看到后面我们的 Stream 被 cancel 了,但是这个 Frame 已经写了,从 ss 命令输出结果中也能看到):
2021/06/03 12:49:47 input text abcdgagd
2021/06/03 12:49:47 start time 2021-06-03 12:49:47.924363864 +0800 CST m=+17.935351175
2021/06/03 12:49:47 http2: Transport encoding header ":authority" = "192.168.56.21:10001"
2021/06/03 12:49:47 http2: Transport encoding header ":method" = "POST"
2021/06/03 12:49:47 http2: Transport encoding header ":path" = "/"
2021/06/03 12:49:47 http2: Transport encoding header ":scheme" = "http"
2021/06/03 12:49:47 http2: Transport encoding header "content-type" = "application/json"
2021/06/03 12:49:47 http2: Transport encoding header "content-length" = "8"
2021/06/03 12:49:47 http2: Transport encoding header "accept-encoding" = "gzip"
2021/06/03 12:49:47 http2: Transport encoding header "user-agent" = "Go-http-client/2.0"
2021/06/03 12:49:47 http2: Framer 0xc00011a000: wrote HEADERS flags=END_HEADERS stream=5 len=10
2021/06/03 12:49:47 http2: Framer 0xc00011a000: wrote DATA flags=END_STREAM stream=5 len=8 data="abcdgagd"
2021/06/03 12:49:49 http2: Framer 0xc00011a000: wrote RST_STREAM stream=5 len=4 ErrCode=CANCEL
2021/06/03 12:49:49 RoundTrip failure: context deadline exceeded
2021/06/03 12:49:49 end time 2021-06-03 12:49:49.931512103 +0800 CST m=+19.942499411
2021/06/03 12:49:49 error Post "http://192.168.56.21:10001": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
ss 命令结果如下, 我们可以看到 SEND-Q 增加到 36 了。TODO: 这里为啥是涨了 36????
2021 年 06 月 03 日 星期四 12:49:49 CST
tcp ESTAB 0 36 192.168.56.22:47238 192.168.56.21:10001
tcpdump 输出如下: 看到我们发送的数据包 seq 150:186 没有收到 ACK,由于 TCP 重传机制,所以它在程序 timeout 之前会触发重传,但最后程序 cancel 了这个 stream,所以没有继续重传。
12:49:47.924574 IP shenjie-VirtualBox.47238 > 192.168.56.21.10001: Flags [P.], seq 150:169, ack 104, win 502, options [nop,nop,TS val 1794068635 ecr 292741327], length 19
12:49:47.924712 IP shenjie-VirtualBox.47238 > 192.168.56.21.10001: Flags [P.], seq 169:186, ack 104, win 502, options [nop,nop,TS val 1794068636 ecr 292741327], length 17
12:49:47.940110 IP shenjie-VirtualBox.47238 > 192.168.56.21.10001: Flags [P.], seq 169:186, ack 104, win 502, options [nop,nop,TS val 1794068651 ecr 292741327], length 17
12:49:48.152971 IP shenjie-VirtualBox.47238 > 192.168.56.21.10001: Flags [P.], seq 150:186, ack 104, win 502, options [nop,nop,TS val 1794068864 ecr 292741327], length 36
12:49:48.584546 IP shenjie-VirtualBox.47238 > 192.168.56.21.10001: Flags [P.], seq 150:186, ack 104, win 502, options [nop,nop,TS val 1794069295 ecr 292741327], length 36
12:49:49.416558 IP shenjie-VirtualBox.47238 > 192.168.56.21.10001: Flags [P.], seq 150:186, ack 104, win 502, options [nop,nop,TS val 1794070127 ecr 292741327], length 36
12:49:51.085043 IP shenjie-VirtualBox.47238 > 192.168.56.21.10001: Flags [P.], seq 150:186, ack 104, win 502, options [nop,nop,TS val 1794071796 ecr 292741327], length 36
12:49:54.376554 IP shenjie-VirtualBox.47238 > 192.168.56.21.10001: Flags [P.], seq 150:186, ack 104, win 502, options [nop,nop,TS val 1794075087 ecr 292741327], length 36
12:50:01.032263 IP shenjie-VirtualBox.47238 > 192.168.56.21.10001: Flags [P.], seq 150:186, ack 104, win 502, options [nop,nop,TS val 1794081743 ecr 292741327], length 36
12:50:14.345196 IP shenjie-VirtualBox.47238 > 192.168.56.21.10001: Flags [P.], seq 150:186, ack 104, win 502, options [nop,nop,TS val 1794095056 ecr 292741327], length 36
再次测试一遍,结果如下:
2021/06/03 12:49:57 input text defagagrg
2021/06/03 12:49:57 start time 2021-06-03 12:49:57.233483241 +0800 CST m=+27.244470546
2021/06/03 12:49:57 http2: Transport encoding header ":authority" = "192.168.56.21:10001"
2021/06/03 12:49:57 http2: Transport encoding header ":method" = "POST"
2021/06/03 12:49:57 http2: Transport encoding header ":path" = "/"
2021/06/03 12:49:57 http2: Transport encoding header ":scheme" = "http"
2021/06/03 12:49:57 http2: Transport encoding header "content-type" = "application/json"
2021/06/03 12:49:57 http2: Transport encoding header "content-length" = "9"
2021/06/03 12:49:57 http2: Transport encoding header "accept-encoding" = "gzip"
2021/06/03 12:49:57 http2: Transport encoding header "user-agent" = "Go-http-client/2.0"
2021/06/03 12:49:57 http2: Framer 0xc00011a000: wrote HEADERS flags=END_HEADERS stream=7 len=10
2021/06/03 12:49:57 http2: Framer 0xc00011a000: wrote DATA flags=END_STREAM stream=7 len=9 data="defagagrg"
2021/06/03 12:49:59 http2: Framer 0xc00011a000: wrote RST_STREAM stream=7 len=4 ErrCode=CANCEL
2021/06/03 12:49:59 RoundTrip failure: context deadline exceeded
2021/06/03 12:49:59 end time 2021-06-03 12:49:59.234699462 +0800 CST m=+29.245686766
2021/06/03 12:49:59 error Post "http://192.168.56.21:10001": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
2021 年 06 月 03 日 星期四 12:49:59 CST
tcp ESTAB 0 99 192.168.56.22:47238 192.168.56.21:10001
我们看到程序中写了 HEADERS 和 Data,而从 ss 看到 SEND-Q 又增加了。
为了模拟 SEND-Q 增加到最大情况下,我修改了客户端程序,每次输入都开启 100 个 goroutine 去发送消息。
再多次输入数据后,是的 SEND-Q 达到最大,不再增加,测试观察到程序新上来的请求不能写数据了,并且阻塞在那里,不会 timeout 出来。
2021/06/03 13:04:00 http2: Framer 0xc000104000: wrote HEADERS flags=END_HEADERS stream=1191 len=8
2021/06/03 13:04:00 http2: Transport encoding header ":authority" = "192.168.56.21:10001"
2021/06/03 13:04:00 http2: Transport encoding header ":method" = "POST"
2021/06/03 13:04:00 http2: Transport encoding header ":path" = "/"
2021/06/03 13:04:00 http2: Transport encoding header ":scheme" = "http"
2021/06/03 13:04:00 http2: Transport encoding header "content-type" = "application/json"
2021/06/03 13:04:00 http2: Transport encoding header "content-length" = "96"
2021/06/03 13:04:00 http2: Transport encoding header "accept-encoding" = "gzip"
2021/06/03 13:04:00 http2: Transport encoding header "user-agent" = "Go-http-client/2.0"
2021/06/03 13:04:00 http2: Framer 0xc000104000: wrote HEADERS flags=END_HEADERS stream=1193 len=8
gghshthdhjyjfjagjalgapjagoalmdlfhaogfajgohanldkmflangufhaufodanflafjagoihanjdaghaugalgjagianjdal
2021/06/03 13:04:27 input text gghshthdhjyjfjagjalgapjagoalmdlfhaogfajgohanldkmflangufhaufodanflafjagoihanjdaghaugalgjagianjdal
2021/06/03 13:04:27 start time 2021-06-03 13:04:27.083829364 +0800 CST m=+119.981749071 in goroutine 30
2021/06/03 13:04:27 start time 2021-06-03 13:04:27.083909942 +0800 CST m=+119.981829639 in goroutine 0
2021/06/03 13:04:27 start time 2021-06-03 13:04:27.083934273 +0800 CST m=+119.981853968 in goroutine 1
2021 年 06 月 03 日 星期四 13:04:58 CST
tcp ESTAB 0 68891 192.168.56.22:47272 192.168.56.21:10001
2021 年 06 月 03 日 星期四 13:05:54 CST
tcp ESTAB 0 68891 192.168.56.22:47272 192.168.56.21:10001
这个时候基本上模拟出来 TCP 上 SEND-Q 满的情况,然后得到的结论是,此时发送出来的测试消息不能够被处理,并且一直阻塞,不能 timeout 出来。
而之所以真实环境上,请求端收到了 408,504 的错误代码,是因为我们在 VS 里面设置了 Timeout,所以如果请求端发送的请求由于微服务接收,但是在处理消息的时候,阻塞住了,导致了 Timeout 了。 这里下面会具体将为啥会回复 408 的错误代码,因为正常 Timeout,我们理解为,会回复 504 才对。对于 Istio 回复 504/408 的问题,我将在(二)中具体分析。
程序阻塞分析:
添加下面 log
//add log in net/http2/transport.go
func (cc *ClientConn) idleState() clientConnIdleState {
log.Println("start idleState")
cc.mu.Lock()
log.Println("end idleState")
defer cc.mu.Unlock()
return cc.idleStateLocked()
}
//add following logs in RoundTripOpt in net/http2/transport.go
log.Println("start t.connPool().GetClientConn")
cc, err := t.connPool().GetClientConn(req, addr)
log.Println("end t.connPool().GetClientConn")
if err != nil {
t.vlogf("http2: Transport failed to get client conn for %s: %v", addr, err)
return nil, err
}
//add logs in Write in net/http2/transport.go
func (sew stickyErrWriter) Write(p []byte) (n int, err error) {
if *sew.err != nil {
return 0, *sew.err
}
log.Println("start sew.w.Write")
n, err = sew.w.Write(p)
log.Println("end sew.w.Write")
*sew.err = err
return
}
发送直到阻塞:
2021/06/03 17:59:03 end in idleStateLocked freshConn false
2021/06/03 17:59:03 http2: Transport encoding header ":authority" = "192.168.56.21:10001"
2021/06/03 17:59:03 http2: Transport encoding header ":method" = "POST"
2021/06/03 17:59:03 http2: Transport encoding header ":path" = "/"
2021/06/03 17:59:03 http2: Transport encoding header ":scheme" = "http"
2021/06/03 17:59:03 http2: Transport encoding header "content-type" = "application/json"
2021/06/03 17:59:03 http2: Transport encoding header "content-length" = "2555"
2021/06/03 17:59:03 http2: Transport encoding header "accept-encoding" = "gzip"
2021/06/03 17:59:03 http2: Transport encoding header "user-agent" = "Go-http-client/2.0"
2021/06/03 17:59:03 start cc.writeHeaders
2021/06/03 17:59:03 http2: Framer 0xc000184460: wrote HEADERS flags=END_HEADERS stream=1969 len=8
2021/06/03 17:59:03 start sew.w.Write
下一条请求:
receive a new request
2021/06/03 17:59:35 start time 2021-06-03 17:59:35.070256042 +0800 CST m=+97.009212543 count 985
2021/06/03 17:59:35 start t.connPool().GetClientConn
2021/06/03 17:59:35 start idleState
再一条请求:
receive a new request
2021/06/03 17:59:42 start time 2021-06-03 17:59:42.93241995 +0800 CST m=+104.871376490 count 986
2021/06/03 17:59:42 start t.connPool().GetClientConn
所以分析下来,当 Write 的不了的时候,就不会释放 cc.mu,导致下一条请求拿不到 cc.mu 锁,从而阻塞了。 其实在这里,我加了 Pprof 去分析 goroutine 的调度栈,也发现了死锁的问题,但是还是加了点 log 看的更明显一点。
然后发现也有人遇到这个问题:x/net/http2: Blocked Write on single connection causes all future calls to block indefinitely。
番外
此时如果将 Sever 端网络重新开启 sudo ifup enp0s8
, 我们将看到,之前阻塞的消息能够被处理,从 ss 看到 SEND-Q 也被处理掉了。
2021 年 06 月 03 日 星期四 13:05:54 CST
tcp ESTAB 0 68891 192.168.56.22:47272 192.168.56.21:10001
2021 年 06 月 03 日 星期四 13:05:59 CST
tcp ESTAB 0 13 192.168.56.22:47280 192.168.56.21:10001
2021 年 06 月 03 日 星期四 13:06:04 CST
tcp ESTAB 0 0 192.168.56.22:47280 192.168.56.21:10001
问题定位
好了,问题复现出来了,我们抛开 istio 的处理,我们只需要关心如何修改程序去避免这个问题。
其实从上面的复现步骤,我们最终能发现,这个 Server 断网的操作,客户端并没有感知到,所以建立的 TCP 连接仍然是 ESTAB 的状态,这从而导致了上层 HTTP 并不能感知到这个 TCP 是坏的,它收到消息,仍然会往这个连接上发送,所以最终导致这个 TCP 上的 SEND-Q 满了,再上来的消息则没法再写,就一直阻塞在这里了。
所以这个问题的根本原因则是在 Server 端断网情况下,客户端并没有去关闭这个连接。所以如果想要修复这个问题,客户端程序则需要感知 Server 是否是正常的,或者客户端收到了某种情况下,可以主动去关闭这个连接。综合考虑下来,及参考 net/http, x/net/http2: TCP connection is not closed after http client request times out after missing RST (TCP) or GOAWAY(HTTP/2) frame 这个 issue,我们决定启用 Ping 消息,去感知这种异常状态。
解决方案
在客户端程序中添加 ReadIdleTimeout,从而开启 Ping 消息。
tr = &http2.Transport{
TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
AllowHTTP: true,
DialTLS: func(network, addr string, cfg *tls.Config) (net.Conn, error) {
d := &net.Dialer{
Timeout: time.Duration(2 * time.Second),
}
return d.Dial(network, addr)
},
ReadIdleTimeout: time.Duration(15 * time.Second),
}
这样我们就能够解决这个问题。
总结
这个问题从系统测试人员提供的信息上,我们大概能分析出来是我们的 HTTP2 客户端程序出现了问题,也就是对于 Server 端断网的情况下,我们的程序没有主动去关闭连接,或者说没有处理这种异常情况,导致连接的 SEND-Q 满了,还在发送消息。从而是后面的消息阻塞在那里,从而导致练级的问题。
该问题的出现,也是由于我们对 golang net/HTTP2 库的理解不够,对其中的很多参数并没有关注。