Keepalive is very important concept almost in all tcp connection like database etc, the same concept does support in gRPC as well.
Why we need keepalive in gRPC?
When a connection is idle for a while, if there were some other network component involved like load balancer or proxy, the connection may get disconnected.
In those kinds of cases, we will need to have a mechanism to reconnect to the server-side.
Configurations
In the grpc keepalive proposal, there are a lot of configurations that we can set, this time we will use golang as an example.
From the golang official pkg there were client side and server side parameters. Most of them are quite straightforward and easy to understand, but some configuration is still a bit ambiguous for the server side setting.
MaxConnectionAge vs MaxConnectionAgeGrace
First thing that I am not so sure about are these two configurations, from the description MaxConnectionAgeGrace is the time allowed for the rpc call before the connection was forcibly closed.
So I got one question, during `MaxConnectionAgeGrace` period, what happen if there are new requests?
Using the following setting to test with
- MaxConnectionAge — 10s
- MaxConnectionAgeGrace — 30s
Client Side
Server Side
running the server & client with `GODEBUG=http2debug=2` so it will print out internal logs.
Result
Server logs
2021/05/07 18:27:56 Received: world
2021/05/07 18:28:06 http2: Framer 0xc0001a0000: wrote GOAWAY len=8 LastStreamID=2147483647 ErrCode=NO_ERROR Debug=""
...
2021/05/07 18:28:16 http2: Framer 0xc00025a000: read DATA flags=END_STREAM stream=1 len=12 data="\x00\x00\x00\x00\a\n\x05world"
2021/05/07 18:28:16 Received: world
...
2021/05/07 18:28:26 returning world
...
2021/05/07 18:28:36 http2: Framer 0xc000346000: wrote GOAWAY len=8 LastStreamID=2147483647 ErrCode=NO_ERROR Debug=""
...
2021/05/07 18:28:45 http2: Framer 0xc0001a01c0: wrote GOAWAY len=8 LastStreamID=2147483647 ErrCode=NO_ERROR Debug=""
...
2021/05/07 18:28:46 returning world
Client logs
2021/05/07 18:27:56 http2: Framer 0xc0002a4000: wrote DATA flags=END_STREAM stream=1 len=12 data="\x00\x00\x00\x00\a\n\x05world"
...
2021/05/07 18:28:06 http2: Framer 0xc0002a4000: read GOAWAY len=8 LastStreamID=2147483647 ErrCode=NO_ERROR Debug=""
...
2021/05/07 18:28:16 http2: Framer 0xc0001f2000: wrote DATA flags=END_STREAM stream=1 len=12 data="\x00\x00\x00\x00\a\n\x05world"
...
2021/05/07 18:28:25 http2: Framer 0xc0002a41c0: read GOAWAY len=8 LastStreamID=2147483647 ErrCode=NO_ERROR Debug=""
...
2021/05/07 18:28:26 Greeting: Hello world
2021/05/07 18:28:36 http2: Framer 0xc00023c000: read GOAWAY len=8 ...
2021/05/07 18:28:46 Greeting: Hello world
We can see every 10 seconds the server side send a GOAWAY
to client connection and client reconnect after that, but since the previous request hasn’t finished the connection itself is still alive.
The call will take 30s and with MaxAge (10s) + Grace (30s), the call should have enough time to process.
As we can see in the log, the client side sends a request every 20s, and we can see both of them were done without any issues.
MaxAge with Client Side Keepalive
Another interesting problem that many people may face is grpc connection was force closed by transport closing
before MaxAge
, usually it was caused by Client Side Keepalive
In client side keepalive configuration, there was a Time
params that you can pass in, according to the description it is the client side ping interval, if you only set this params for example 5s, most likely your connection will close very regularly and randomly.
This is because there was a hidden restriction for ping interval, in the proposal it said within MinTime
, the server can only receive at max 3 pings.
Let’s try it out!
Server side
Client Side
Result
Client logs
2021/05/10 12:37:55 http2: Framer 0xc0000d4000: wrote SETTINGS len=0
2021/05/10 12:37:55 http2: Framer 0xc0000d4000: read SETTINGS len=6, settings: MAX_FRAME_SIZE=16384
2021/05/10 12:37:55 http2: Framer 0xc0000d4000: wrote SETTINGS flags=ACK len=0
2021/05/10 12:37:55 http2: Framer 0xc0000d4000: read SETTINGS flags=ACK len=0
2021/05/10 12:37:55 http2: Framer 0xc0000d4000: wrote HEADERS flags=END_HEADERS stream=1 len=74
2021/05/10 12:37:55 http2: Framer 0xc0000d4000: wrote DATA flags=END_STREAM stream=1 len=12 data="\x00\x00\x00\x00\a\n\x05world"
2021/05/10 12:37:55 http2: Framer 0xc0000d4000: read WINDOW_UPDATE len=4 (conn) incr=12
2021/05/10 12:37:55 http2: Framer 0xc0000d4000: read PING len=8 ping="\x02\x04\x10\x10\t\x0e\a\a"
2021/05/10 12:37:55 http2: Framer 0xc0000d4000: wrote PING flags=ACK len=8 ping="\x02\x04\x10\x10\t\x0e\a\a"
2021/05/10 12:38:05 http2: Framer 0xc0000d4000: wrote PING len=8 ping="\x00\x00\x00\x00\x00\x00\x00\x00"
2021/05/10 12:38:05 http2: Framer 0xc0000d4000: read PING flags=ACK len=8 ping="\x00\x00\x00\x00\x00\x00\x00\x00"
2021/05/10 12:38:15 http2: Framer 0xc0000d4000: wrote PING len=8 ping="\x00\x00\x00\x00\x00\x00\x00\x00"
2021/05/10 12:38:15 http2: Framer 0xc0000d4000: read PING flags=ACK len=8 ping="\x00\x00\x00\x00\x00\x00\x00\x00"
2021/05/10 12:38:25 http2: Framer 0xc0000d4000: wrote PING len=8 ping="\x00\x00\x00\x00\x00\x00\x00\x00"
2021/05/10 12:38:25 http2: Framer 0xc0000d4000: read PING flags=ACK len=8 ping="\x00\x00\x00\x00\x00\x00\x00\x00"
2021/05/10 12:38:35 http2: Framer 0xc0000d4000: wrote PING len=8 ping="\x00\x00\x00\x00\x00\x00\x00\x00"
2021/05/10 12:38:35 http2: Framer 0xc0000d4000: read PING flags=ACK len=8 ping="\x00\x00\x00\x00\x00\x00\x00\x00"
2021/05/10 12:38:35 http2: Framer 0xc0000d4000: read GOAWAY len=22 LastStreamID=1 ErrCode=ENHANCE_YOUR_CALM Debug="too_many_pings"
2021/05/10 12:38:35 could not greet: rpc error: code = Unavailable desc = transport is closing
We can see the client pinging the server every 10s while our setting is 3s, the reason is written in the config description:
// After a duration of this time if the client doesn't see any activity it
// pings the server to see if the transport is still alive.
// If set below 10s, a minimum value of 10s will be used instead.
As we can see at the 4th time the client pinging the server it throw a `ENHANCE_YOUR_CALM` error and send a GOAWAY
to the client, it make all the rpcs force closed and this was ignoring the Grace
Period.
Conclusion
In conclude, we know that the previous connection will still open but not accepting new request, after MaxAge
it will refresh to a new connection and plotting connection to the new connection instead.
Using keepalive in client-side sometime can be dangerous if you misconfigured, we need to pay extra attention and check the documentation carefully.