Packet Processing

What network appliances see

Network traffic is transmitted as packets. A network appliance sitting between endpoints sees only a stream of these packets, and since it usually serves many endpoints, packets from many different connections arrive interleaved.

Two packets belonging to the same connection can potentially be millions of packets apart. At the same time, many protocols allow content to be split arbitrarily across packets, making it difficult for network appliances to keep track of what each connection is doing or which state it is in.
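To make the interleaving concrete, here is a minimal Python sketch, assuming a hypothetical `Packet` type that carries the 5-tuple (addresses, ports, protocol) and a payload. The helper names `flow_key` and `demultiplex` are illustrative only. It regroups an interleaved stream per connection and ignores sequence numbers, retransmissions and reordering, which real reassembly would have to handle.

```python
from collections import defaultdict
from typing import NamedTuple


class Packet(NamedTuple):
    """Hypothetical minimal packet: 5-tuple header fields plus payload bytes."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    proto: str
    payload: bytes


def flow_key(p: Packet) -> tuple:
    """The 5-tuple identifies which connection a packet belongs to."""
    return (p.src_ip, p.dst_ip, p.src_port, p.dst_port, p.proto)


def demultiplex(packets) -> dict:
    """Regroup an interleaved packet stream per connection, concatenating payloads
    in arrival order (no sequence numbers, retransmissions or reordering handled)."""
    streams = defaultdict(bytes)
    for p in packets:
        streams[flow_key(p)] += p.payload
    return dict(streams)


# Two connections interleaved on the wire, each request split across two packets.
wire = [
    Packet("10.0.0.1", "10.0.0.9", 40000, 80, "tcp", b"GE"),
    Packet("10.0.0.2", "10.0.0.9", 40001, 80, "tcp", b"HE"),
    Packet("10.0.0.1", "10.0.0.9", 40000, 80, "tcp", b"T /"),
    Packet("10.0.0.2", "10.0.0.9", 40001, 80, "tcp", b"AD /"),
]
streams = demultiplex(wire)
```

The `streams` dictionary is already a form of per-connection state: recognizing `GET /` requires remembering bytes from an earlier packet of the same connection.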

Because of this, network processing can be roughly split into two categories: stateless and stateful processing.

Stateless Processing

Processing network traffic in easy mode

Because relevant information about a connection is spread across many packets, many network appliances limit themselves to stateless processing: each decision is made for an individual packet in isolation. This greatly simplifies the implementation: when a packet arrives, relevant information such as header fields is extracted, used to make the decision, and discarded immediately once the decision has been executed.
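A stateless decision can be sketched as a pure function of one packet's header fields. In this minimal example the `Header` type and the `ALLOWED` rule set are hypothetical; the point is that nothing survives the call.

```python
from typing import NamedTuple


class Header(NamedTuple):
    """Hypothetical header fields extracted from a single packet."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    proto: str


# Hypothetical rule set: (protocol, destination port) pairs that are let through.
ALLOWED = {("udp", 53), ("udp", 443), ("tcp", 443)}


def stateless_decide(h: Header) -> bool:
    """Allow or drop based only on this packet's header; nothing is remembered."""
    return (h.proto, h.dst_port) in ALLOWED


query = Header("192.0.2.10", "198.51.100.1", 51515, 53, "udp")
reply = Header("198.51.100.1", "192.0.2.10", 53, 51515, "udp")
print(stateless_decide(query))  # True
print(stateless_decide(reply))  # False: the reply's destination port is ephemeral
```

The reply to an allowed DNS query is dropped, because a per-packet rule cannot know that port 51515 belongs to an earlier query. The only stateless workaround is to open whole port ranges in the return direction.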

However, since no information from prior packets is used, the capabilities of stateless processing are severely limited. Information about connection states is not available. TLS encrypted streams spanning multiple packets cannot be decrypted by the device. Most protocols cannot be verified, translated, or terminated. The list goes on.

Stateless Capabilities:

  • Routing
  • Basic filtering
  • Load balancing without stable connection pinning
  • Echo service
  • No rate limiting
  • Nothing that is connection based

Stateful Processing

More capable but more difficult network processing

In contrast, stateful processing stores information or decisions relating to a connection (i.e. state) and uses them to inform future decisions that pertain to the same connection. The stored state can be as simple as the connection state of a flow, but it could also be the state of a TCP state machine or a regex parser.

Being able to exploit state unlocks a plethora of capabilities, as it allows the network appliance to actively participate in protocols, speaking or verifying them, rather than merely inspecting individual packets. But it also increases complexity, as state now needs to be managed and updated at very high rates. The component responsible for managing this added complexity, the state storage, is usually implemented as a key-value store.
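As a minimal sketch of stateful processing, the following uses a Python `dict` as the key-value store, mapping a flow key to a small per-connection record. The `FlowState` fields and the per-flow packet limit are hypothetical, chosen to show a per-connection rate limit that a stateless device cannot enforce. Each packet costs one lookup plus either an insert or an update.

```python
from dataclasses import dataclass


@dataclass
class FlowState:
    """Per-connection state kept between packets (hypothetical fields)."""
    packets: int = 0
    last_seen: float = 0.0


class FlowTable:
    """Dict-backed key-value store: flow key -> FlowState.

    Each packet costs one lookup and either an insert (first packet of a
    flow) or an update (every later packet)."""

    def __init__(self, max_packets_per_flow: int):
        self.max_packets = max_packets_per_flow
        self.table: dict = {}

    def process(self, key: tuple, now: float) -> bool:
        state = self.table.get(key)          # lookup
        if state is None:
            state = FlowState()
            self.table[key] = state          # insert
        state.packets += 1                   # update
        state.last_seen = now
        # Per-connection limit: something a stateless device cannot enforce.
        return state.packets <= self.max_packets


ft = FlowTable(max_packets_per_flow=3)
flow = ("192.0.2.10", "198.51.100.1", 51515, 443, "udp")
decisions = [ft.process(flow, now=float(t)) for t in range(5)]
print(decisions)  # [True, True, True, False, False]
```

This table grows without bound. A real appliance has fixed capacity, so it also needs a replacement policy that decides which entries to drop when the table is full.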

Stateful Capabilities:

  • Stateful Firewalls
  • Load Balancers
  • High fan-in or fan-out Gateways
  • VPN Aggregators
  • Protocol Parsers
  • Multi-stream Regex Filters
  • TCP/TLS Termination Devices
  • and many more

Key-Value Stores

The required component to build stateful network appliances

While such a key-value store is conceptually simple, implementing it such that it can sustain modern network bandwidths with commodity memory is quite difficult.

Modern and future network appliances are expected to handle 100-400 Gbit/s of traffic. This translates to 10s-100s of millions of packets per second, each of which requires a real-time lookup in the key-value store. To be resilient to DoS attacks or connection-heavy machine-to-machine communication, these implementations must also support inserts at similar rates. Finally, if active sessions are to be preserved even under heavy load or attack, the key-value store also needs a large storage capacity, as well as resilient replacement policies to decide which sessions to drop once the capacity is reached.
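The packet rates follow from the link speed and the frame size. This sketch assumes standard Ethernet framing (64 to 1518 byte frames, each with 20 extra bytes on the wire for preamble, start-of-frame delimiter and inter-frame gap) and ignores jumbo frames and VLAN tags.

```python
# Ethernet adds 20 bytes per frame on the wire beyond the frame itself:
# 7-byte preamble + 1-byte start-of-frame delimiter + 12-byte inter-frame gap.
WIRE_OVERHEAD_BYTES = 20


def packets_per_second(link_gbit_s: float, frame_bytes: int) -> float:
    """Maximum packet rate of a fully loaded Ethernet link for one frame size."""
    bits_per_packet = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return link_gbit_s * 1e9 / bits_per_packet


for link in (100, 400):
    for frame in (64, 1518):  # minimum and maximum standard frame sizes
        mpps = packets_per_second(link, frame) / 1e6
        print(f"{link} Gbit/s, {frame}-byte frames: {mpps:.1f} Mpps")
```

This gives roughly 8.1 to 148.8 million packets per second at 100 Gbit/s and 32.5 to 595.2 million at 400 Gbit/s. Small packets set the worst case for the lookup rate.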

When exploring the state of the art, we found that none of the existing approaches offered these ideal properties for high speed, internet facing network appliances.

Required Key-Value Store Properties:

  • High Lookup Rate: (10s - 100s of millions/sec)
  • Resilient Policy: Guaranteed state retention time
  • High Insert Rate: (10s - 100s of millions/sec)
  • High Capacity: (100s - 1000s of millions)
  • Thread Scalability: The ability to trade chip area (more parallel workers) for more performance
Comparison of existing approaches, grouped by type, with Synogate HashCache against the properties above:

  • Special: CAM (content-addressable memory), a very expensive type of memory that can do very fast lookups.
  • Policy: Rnd (random replacement), a replacement policy that is easy to implement but discards active entries surprisingly quickly.
  • Policy: LRU (least recently used), a replacement policy that keeps active entries alive but is hard to parallelize.
  • Tree: RB (red-black trees), a balanced binary search tree with guarantees on insert and retrieval speeds.
  • Hash-Tables: Cu. (cuckoo hashing), a concept for resolving hash collisions in hash tables that allows for very fast lookups, but suffers from complicated inserts.
  • Hash-Tables: Lin.P. (linear probing), a concept for resolving hash collisions in hash tables with decent lookup speed and a simple insert mechanism.
  • Synogate HashCache: High Insert Rate 240M/s, High Capacity 16G entries, Resilient Policy with guaranteed retention time (only old/inactive entries are evicted).
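To illustrate why LRU keeps active entries alive but parallelizes poorly, here is a textbook software LRU table built on Python's `OrderedDict`. It is a generic sketch for comparison, not how Synogate HashCache works; the class name `LRUTable` is hypothetical.

```python
from collections import OrderedDict


class LRUTable:
    """Fixed-capacity key-value store with least-recently-used eviction."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries: OrderedDict = OrderedDict()

    def lookup(self, key):
        if key not in self.entries:
            return None
        # Every hit reorders one shared structure: this serialization is what
        # makes exact LRU hard to parallelize across many workers.
        self.entries.move_to_end(key)
        return self.entries[key]

    def insert(self, key, value):
        if key in self.entries:
            self.entries.move_to_end(key)
        elif len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # evict least recently used entry
        self.entries[key] = value


t = LRUTable(capacity=2)
t.insert("a", 1)
t.insert("b", 2)
t.lookup("a")        # "a" becomes most recently used
t.insert("c", 3)     # table full: evicts "b", not the active "a"
print(list(t.entries))  # ['a', 'c']
```

Random replacement would instead delete an arbitrary key and skip the `move_to_end` calls. That removes the shared write on every lookup, but it could just as well have evicted the active entry `"a"`.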

Synogate HashCache

Our solution for implementing network appliances without sacrificing speed or DoS resilience

Because no existing state storage solution offered these properties, we developed Synogate HashCache.

Synogate HashCache is the key-value store that satisfies the demanding requirements of high speed, internet facing network appliances. Its key ingredients are twofold:

  • A novel algorithm that allows both high lookup and high insert rates, while guaranteeing that active connections are not discarded from the state table.
  • An efficient RTL implementation for FPGAs or ASICs that can scale to high speeds while simultaneously using commodity DRAM for the state table, enabling very large numbers of concurrent connections.

Synogate HashCache is a complete solution for state management inside hardware accelerators. State insertion does not require any CPU or host intervention, allowing complete offloading into the accelerator (FPGA or ASIC), and even hostless setups.


Benefits:

  • Very high lookup and insertion rates
  • Resilient to DoS attacks
  • Can utilize commodity memory (DRAM)
  • Can scale easily to future bandwidth demands

Demo Setup

What we showed live at FPGA Conference Europe 2023

To demonstrate the capabilities of Synogate HashCache and highlight its impact on network appliances, we implemented a stateful firewall that uses Synogate HashCache for its connection tracking.

The stateful firewall with Synogate HashCache was implemented on a hostless Intel® Agilex™ 7 FPGA. Four servers use their combined 192 hardware threads to generate and send random UDP packets (as used in QUIC, DNS, VoIP, …), i.e. “requests”, as quickly as possible. This packet flow passes through the FPGA, i.e. the stateful firewall, to two further servers, which “respond” to the packets by reflecting them back. These “responses” pass back through the FPGA to the original four servers.

The randomized source and destination addresses of the “requests” ensure that every request opens a new connection, a worst-case scenario for stateful firewalls. Requests must fall within a certain port range to be let through by the firewall (stateless check), while responses are only let back if they belong to a connection that was first established by a request packet (stateful check). Because UDP is connectionless, the firewall cannot tell when a connection has ended.
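The firewall rule described above can be sketched in a few lines of Python. A `dict` stands in for the state table; the `UdpPacket` type, the method names and `ALLOWED_REQUEST_PORTS` are hypothetical, since the demo's actual port range is not given.

```python
from typing import NamedTuple


class UdpPacket(NamedTuple):
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


# Hypothetical port range for permitted requests; the demo's actual range is not given.
ALLOWED_REQUEST_PORTS = range(10000, 20000)


class StatefulUdpFirewall:
    def __init__(self):
        # Stands in for the state table: connection key -> time of last activity.
        self.connections: dict = {}

    def on_request(self, p: UdpPacket, now: float) -> bool:
        """Stateless port-range check, then insert/refresh the connection."""
        if p.dst_port not in ALLOWED_REQUEST_PORTS:
            return False
        self.connections[(p.src_ip, p.src_port, p.dst_ip, p.dst_port)] = now
        return True

    def on_response(self, p: UdpPacket, now: float) -> bool:
        """Stateful check: only replies to a recorded request may pass."""
        key = (p.dst_ip, p.dst_port, p.src_ip, p.src_port)  # reverse direction
        if key not in self.connections:
            return False
        self.connections[key] = now  # update: record activity
        return True


fw = StatefulUdpFirewall()
req = UdpPacket("10.0.0.1", 5000, "10.1.0.1", 12345)
print(fw.on_request(req, now=0.0))                                         # True
print(fw.on_response(UdpPacket("10.1.0.1", 12345, "10.0.0.1", 5000), 0.1))  # True
print(fw.on_response(UdpPacket("10.1.0.1", 12345, "10.0.0.2", 5000), 0.1))  # False
```

Since UDP never signals the end of a connection, this sketch never removes entries. A fixed-size state table must instead rely on its replacement policy to evict stale connections.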

The servers can be switched to produce illegal requests or illegal responses to verify that the firewall filters correctly. Changing the packet size varies the connection rate between 8 million and 140 million per second.

The demo shows that Synogate HashCache, in the implemented configuration on a single device, can handle around 120 million state table inserts (request packets) and 120 million state table updates (response packets) per second. Since both operations have the same complexity, this corresponds to a peak rate of 240 million new connections per second, while using commodity DRAM to hold 16 billion state entries.
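A back-of-envelope check relates these rates to the 333 MHz EMIF clock listed in the demo specs. It assumes operations are spread evenly over time and says nothing about how the implementation pipelines them internally.

```python
inserts_per_s = 120e6   # request packets: new state table entries
updates_per_s = 120e6   # response packets: updates of existing entries
clock_hz = 333e6        # EMIF clock domain frequency of the demo implementation

ops_per_s = inserts_per_s + updates_per_s   # 240 million table operations per second
ops_per_cycle = ops_per_s / clock_hz        # ~0.72 operations per clock cycle
ns_between_ops = 1e9 / ops_per_s            # ~4.2 ns between operations on average

print(f"{ops_per_s / 1e6:.0f} M ops/s, {ops_per_cycle:.2f} ops/cycle, "
      f"{ns_between_ops:.2f} ns/op")
```

A DRAM access typically takes tens of nanoseconds. Sustaining one operation roughly every 4 ns therefore requires many operations to be in flight concurrently (Little's law).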

Demo Specs:

  • Device (sponsored by Intel®): Intel® Agilex™ 7 FPGA
  • External memory (4x4 channels): 3x 64GB DDR4 + 1x 16GB DDR4 (HPS channel)
  • Utilization of the FPGA (ALMs): 6.8 %
  • Frequency of the Synogate HashCache implementation (EMIF clock domain): 333 MHz
  • Estimated maximum power draw (according to Quartus, including transceivers but excluding DRAM and board components): 35 W
  • Capacity (entries in state table; per entry: 8 bytes + ca. 6 bytes overhead): 16 billion
  • Speed (connection rate; 1x insert + 1x update per connection): 120 million/s (x2)
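The capacity figure can be cross-checked against the installed memory. This sketch assumes that all four DIMMs, including the HPS channel, hold the state table, that "GB" means binary gigabytes, and that "16 billion" means 16 × 10^9 entries. These are assumptions, not stated in the specs.

```python
GiB = 2**30
installed_bytes = (3 * 64 + 16) * GiB   # 3x 64GB + 1x 16GB DDR4, read as binary GB
entries = 16e9                           # "16 billion" read as 16 * 10**9
entry_bytes = 8                          # payload per entry

bytes_per_entry = installed_bytes / entries       # ~13.96 bytes of DRAM per entry
overhead_bytes = bytes_per_entry - entry_bytes    # ~5.96 bytes, i.e. "ca. 6"

print(f"{bytes_per_entry:.2f} B/entry, {overhead_bytes:.2f} B overhead")
```

Under these assumptions, about 13.96 bytes of DRAM are available per entry. That matches 8 bytes of payload plus ca. 6 bytes of overhead.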

High Insert Rate

How our demo firewall compares to the best on the market in terms of insert speed

Because of this high insert rate, Synogate HashCache keeps accepting new connections even under DoS attacks.

Compared with flagship network appliances on the market, the stateful firewall using Synogate HashCache outperforms the best that money can buy by orders of magnitude.

In all fairness, these network appliances are fully featured, production ready devices while our stateful firewall is just a demonstrator. But outperforming 1000-2000W multi-CPU & multi-FPGA boxes with a single < 100W hostless FPGA board at 6.8 % utilization shows the potential of Synogate HashCache for network appliances.

The fact that these network appliances have much higher stateless processing speeds underlines the industry’s need for a key-value store such as Synogate HashCache.

Large State Table

How our demo firewall compares to the best on the market in terms of capacity

The 16 billion state entries, together with Synogate HashCache’s DoS-resilient replacement strategy, guarantee that even in a worst-case DoS attack scenario, every connection is kept alive as long as it sees activity at least once every 8 seconds.
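For a sense of scale, this sketch relates the table capacity to the demo's measured insert rate of 120 million new connections per second. It does not derive the 8-second figure. It only shows that the guarantee window stays well within what the raw capacity would permit.

```python
capacity = 16e9        # state table entries
insert_rate = 120e6    # worst-case new connections per second in the demo
guarantee_s = 8.0      # stated retention guarantee for active connections

# Time for a flood of never-repeating connections to insert as many entries
# as the table can hold.
fill_time_s = capacity / insert_rate                        # ~133 s
# Share of the capacity consumed by new entries within one 8-second window.
window_fraction = guarantee_s * insert_rate / capacity      # 0.06, i.e. 6 %

print(f"fill time {fill_time_s:.0f} s, 8 s window uses {window_fraction:.0%} of capacity")
```
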

The benefit of being able to utilize commodity DRAM for state storage (of which almost arbitrary amounts can be attached) becomes evident when comparing to network appliances on the market, where Synogate HashCache again outperforms its competition by orders of magnitude.

Interested?

We are more than happy to discuss your project and how we can help.

Call us: +49-30-62932062

Contact us and we will set up an online meeting with our engineers.

You can also schedule a demo directly:

Book Meeting

As a little treat for reading the entire presentation, here is a one-pager with the key facts about Synogate HashCache for you to download:

Product Brief



The development of the Synogate HashCache IP-Core in general and this demo in particular is being sponsored by the German Federal Ministry of Education and Research.