To demonstrate the capabilities of Synogate HashCache and highlight its impact on network appliances, we implemented a stateful firewall that uses Synogate HashCache for its connection tracking.
The stateful firewall with Synogate HashCache was implemented on a hostless Intel® Agilex™ 7 FPGA.
Four servers utilize their combined 192 hardware threads to generate and send random UDP packets (as used in QUIC, DNS, VoIP, …), i.e. “requests”, as quickly as possible.
This packet flow is passed through the FPGA, i.e. the stateful firewall, to another two servers which receive the packets and “respond” to them by reflecting them back.
These “responses” are passed back through the FPGA to the original four servers.
The randomized source and destination addresses of the “requests” ensure that every request is a new connection, a worst-case scenario for stateful firewalls.
Requests must adhere to a certain port range to be allowed through by the firewall (stateless check), while responses are only allowed back if they belong to a connection that was first established by a request packet (stateful check).
Because of the nature of UDP, the firewall is not aware when a connection is terminated.
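The two checks described above can be illustrated with a minimal software sketch. It is not the FPGA implementation: a plain Python dictionary stands in for the Synogate HashCache state table, and the port range, the `Packet` fields, and the per-entry response counter are hypothetical choices made only for illustration. As in the demo, entries are never removed, because UDP gives no signal that a connection has ended.

```python
from typing import NamedTuple


class Packet(NamedTuple):
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


# Hypothetical port range for the stateless check (not given in the source).
ALLOWED_PORTS = range(1024, 2048)


class StatefulFirewall:
    def __init__(self):
        # A dict stands in for the state table: connection key -> responses seen.
        self.state = {}

    def handle_request(self, pkt):
        """Stateless check; on success, insert a state table entry."""
        if pkt.dst_port not in ALLOWED_PORTS:
            return False
        self.state[pkt] = 0  # state table insert
        return True

    def handle_response(self, pkt):
        """Stateful check; allow only if a matching request was seen."""
        # A response travels in the opposite direction, so swap the endpoints.
        key = Packet(pkt.dst_ip, pkt.dst_port, pkt.src_ip, pkt.src_port)
        if key not in self.state:
            return False
        self.state[key] += 1  # state table update
        return True
```

A legal request followed by its reflected response passes both checks, while a response without a prior request, or a request outside the port range, is dropped.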
The servers can be switched to produce illegal requests or illegal responses to verify the correct filtering behavior of the firewall.
The packet size can be modified to control the connection rate from 8 million per second to 140 million per second.
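The relationship between packet size and packet rate can be sketched with a short calculation. The link speed is an assumption: the text does not state it, and 100 Gbit/s Ethernet is used here only because it gives rates in a similar range. Every Ethernet frame additionally occupies 20 bytes on the wire (preamble, start-of-frame delimiter and inter-frame gap).

```python
def max_packets_per_second(frame_bytes, link_bits_per_second=100e9):
    """Upper bound on frames per second for a given Ethernet frame size.

    link_bits_per_second defaults to 100 Gbit/s, an assumed value that is
    not taken from the source.
    """
    # 7-byte preamble + 1-byte start-of-frame delimiter + 12-byte inter-frame gap
    per_frame_overhead = 20
    bits_on_wire = (frame_bytes + per_frame_overhead) * 8
    return link_bits_per_second / bits_on_wire


# Smallest (64 bytes) and largest untagged (1518 bytes) standard frames.
for size in (64, 1518):
    print(size, round(max_packets_per_second(size) / 1e6, 1), "million packets/s")
```

Under this assumption, minimum-size frames allow roughly 149 million packets per second and maximum-size frames roughly 8 million, which is why a larger packet size lowers the connection rate.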
The demo shows that Synogate HashCache in the implemented configuration on a single device can handle around 120 million state table inserts (request packets) and 120 million state table updates (response packets) per second.
Since both operations have the same complexity, this demonstrates a peak rate of 240 million new connections per second while using commodity DRAM to hold 16 billion state entries.
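The arithmetic behind these figures can be written out as follows. The fill-time value is purely illustrative: it assumes that every insert occupies a fresh entry and that nothing is evicted or overwritten, which the text does not specify.

```python
INSERTS_PER_SECOND = 120e6  # request packets, from the demo figures
UPDATES_PER_SECOND = 120e6  # response packets, from the demo figures
TABLE_ENTRIES = 16e9        # state entries held in DRAM


def total_operations_per_second():
    # Inserts and updates are stated to have the same complexity,
    # so they are counted together as state table operations.
    return INSERTS_PER_SECOND + UPDATES_PER_SECOND


def seconds_to_fill_table():
    # Assumption: every insert takes a new entry and nothing is evicted.
    return TABLE_ENTRIES / INSERTS_PER_SECOND


print(total_operations_per_second() / 1e6, "million operations/s")
print(round(seconds_to_fill_table(), 1), "seconds to fill the table")
```

The first value is the 240 million figure quoted above; the second shows that, at the demonstrated insert rate, 16 billion entries correspond to a little over two minutes of entirely new connections.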
||(sponsored by Intel®)
||Intel® Agilex™ 7
|(type and amount)
||3x 64GB DDR4 + 1x 16GB DDR4 (HPS channel)
||(EMIF clock domain)
|(of Synogate HashCache implementation)
||(according to Quartus, including transceivers but excluding DRAM and board components)
||(per entry: 8 bytes + ca. 6 bytes overhead)
|(entries in state table)
||(1x insert + 1x update per connection)
||120 million/s (x2)