When NIDSes, especially hardware NIDSes, process packets of the Transmission Control Protocol (TCP), they need a preprocessor that reassembles discrete TCP packets in a flow to strengthen the detection of attack patterns that span multiple packets.
Introduction
Motivation
Nowadays, networks are vital to almost every organization. E-commerce is one instance, and it is growing rapidly on top of the web infrastructure; e-government has been deployed in some countries and is continually expanding; and many companies and schools use networks to communicate with their staff and customers. Because of this importance, network security is a serious issue. The network can be the most vulnerable part of an organization, and it must be protected from many crimes. Statistics show that information crime, widely known as cybercrime, has increased dramatically in recent years: Interpol reports that the worldwide cost of cybercrime, or computer crime [9], reached $8 billion in 2007 and 2008. This type of crime always uses a computer and a network [10] to carry out an illegal intrusion or simply to disable a server by a DoS (Denial of Service) attack. These intrusions are based on many types of protocols; however, the Transmission Control Protocol (TCP) is the most popular, and the authors in [4] showed that 85% of network traffic is TCP. This explains why many cybercrimes use TCP packets to attack a server. In order to protect an information system from these intrusions, Network Intrusion Detection / Prevention Systems (NIDS/NIPS) have been developed. However, because of the nature of the TCP protocol, packets can reach a destination in the original sequence or in a different sequence. If an intrusion pattern is contained in a single packet, it can be detected by a traditional NIDS/NIPS; but if the pattern spans several packets and these packets do not arrive in the original sequence (out-of-sequence), it cannot be detected. Thus, out-of-sequence TCP packets should be re-ordered before they enter an NIDS/NIPS.
Moreover, because the network speed can reach 1 Gbps or more and there can be a large number of concurrent connections, keeping track of all of these connections can lead to memory exhaustion. Study [4] shows that full TCP reassembly requires a large amount of memory, up to 2 GB for each 1 Gbps link. Therefore, it is necessary to develop a TCP Reassembly Engine (TCPRE) that supports high throughput (more than 1 Gbps), monitors a large number of concurrent connections and uses memory efficiently.
In addition, FPGAs are now a solution to many hardware-related problems. First introduced in the late 1980s, FPGAs were initially used mainly for prototyping hardware designs. Since then, FPGA technology has developed quite rapidly, and many high-speed FPGAs have been released that can fulfill the requirements of demanding hardware designs. Beyond their speed, FPGAs have other excellent properties: they can be easily and quickly reconfigured, they are very well suited to parallel processing and pipelining, and they are a low-cost solution.
Because of these advantages, there is a worldwide trend toward using FPGAs to solve networking problems. Several studies have used FPGAs to reassemble out-of-sequence TCP packets [1, 2, 3, 4, 5]. These works can be classified into three types, which use three different methods: (1) dropping out-of-sequence packets, (2) buffering out-of-sequence packets and (3) out-of-sequence matching (for TCP stream scanning). All of these systems implement the design on FPGAs.
Figure 1-1 Out-of-sequence packets passing an NIDS; the fragments should be reassembled as "this is an attack pattern"
Statement of problem
The rapid growth of networks, in which the TCP/IP protocol suite is the most widely used, motivates the development of network applications, and the release of many modern FPGAs offers high-speed, high-throughput solutions for them. Many of these applications require TCP packets to be reassembled beforehand so that the applications become more efficient and more powerful; an NIDS is such an application. NIDS systems can deploy a "deep packet inspection" function, which includes both static matching and Perl Compatible Regular Expression (PCRE) matching. Therefore, this thesis aims to build a TCP preprocessor with a special technique that supports both matching schemes efficiently. The main function of this TCP preprocessor is to analyze the packet protocol and send supported packets, such as UDP and TCP, to the application circuit (NIDS). It also re-orders TCP packets and reassembles TCP flows before passing them to the application circuit (NIDS), as shown in Figure 1-2. Besides, it manages the traffic on the line and keeps the preprocessor transparent to the user.
Figure 1-2 Deployment model of the preprocessor
Contribution
This thesis introduces a new TCP reassembly method that takes advantage of both reassembly techniques above, buffering out-of-sequence packets and out-of-sequence matching. Its contributions are the following:
It proposes a new method of TCP reassembly that supports both TCP re-ordering and flow reassembly.
It proposes a new data structure that manages the reassembly memory efficiently and supports buffering multi-hole connections.
The preprocessor is implemented on an FPGA platform; it supports hundreds of thousands of concurrent connections and tens of thousands of multi-hole connections.
Organization
The thesis is organized as follows:
Chapter 2 presents background on networking, the TCP/IP protocol and TCP reassembly; Network Intrusion Detection Systems are also introduced.
Chapter 3 briefly describes related research.
Chapter 4 explains our TCP reassembly method and the data structure used in this thesis.
Chapter 5 presents the implementation of our technique on the targeted hardware platform.
Chapter 6 presents our experimental results and evaluation.
Chapter 7 concludes the thesis and discusses future work.
Background
Transmission Control Protocol
In networking, when data travels from one machine to another, it is processed at many protocol levels before it actually propagates over the physical wires. Together these protocols form a communication model. There are two main models in networking, the OSI model and the TCP/IP model; the latter is the most widely used all over the world. This model has four levels, and there are several protocols at each level, as illustrated in Figure 2-1. When data is passed from one level to the next lower level, it is packed with additional information so that it can be unpacked at the same level on the destination machine.
The lowest level is close to the physical layer (optical fiber, twisted-pair cable, co-axial cable, etc.); its function is to encode and send data from the internet layer to the transmission media, or to receive and decode data from the transmission media to the internet layer.
Figure 2-1 TCP/IP model and packing data of TCP packet
Ethernet, Token Ring, ATM and similar protocols belong to this layer; they transfer data from one machine to another in the same network, and among them Ethernet is the most widely used. On top of this layer, further protocols are built; IP (Internet Protocol) is one example. It allows data to be transmitted from a machine in one network to a machine in another network. The data is packed with an IP header before it is sent down to the network layer, as shown in Figure 2-2. The Version field is always 4 for IPv4 (IPv6 has a different header format).
IHL, TOS and Total Length are the IP Header Length, the Type Of Service and the length of the IP packet, respectively. When an IP packet travels from a network with a large MTU (Maximum Transmission Unit) to a network with a smaller MTU, the size of the packet can exceed the MTU; in this case, the IP packet must be fragmented into smaller packets so that they can be transmitted on that network. The 3-bit Flags field indicates whether the packet is fragmented or not: the value 2 means the packet is not fragmented, and the value 4 means it is fragmented. In the latter case, the Identification field identifies the sub-packet, and the Fragment Offset field gives the offset of the first data byte of the packet from the first data byte of the original packet.
TTL means Time To Live; it is the maximum number of stations the packet may travel through. Protocol indicates the protocol of the next higher layer. Header Checksum is a 16-bit checksum calculated over the header only. Each machine communicating via IP is addressed by a 32-bit IP address (IPv4) or a 128-bit IP address (IPv6). The Options field carries optional information and can be omitted; it is padded with zeros to be 32-bit aligned.
Figure 2-2 IPv4 header format (Version, IHL, TOS, Total Length, Identification, Flags, Fragment Offset, TTL, Protocol, Header Checksum, Source IP Address, Destination IP Address, Options)
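As an illustration only, the IPv4 header fields described above can be sketched as a C structure. This is a simplified sketch rather than the representation used in the hardware design; options handling is omitted and a real parser would extract the bit-packed fields with shifts and masks.

#include <stdint.h>

/* Simplified IPv4 header layout (no options), matching the fields
 * described above.  All multi-byte fields are big-endian on the wire. */
struct ipv4_header {
    uint8_t  version_ihl;      /* Version (4 bits) + IHL (4 bits)             */
    uint8_t  tos;              /* Type Of Service                             */
    uint16_t total_length;     /* length of the whole IP packet               */
    uint16_t identification;   /* identifies the fragments of one packet      */
    uint16_t flags_frag_off;   /* Flags (3 bits) + Fragment Offset (13 bits)  */
    uint8_t  ttl;              /* Time To Live                                */
    uint8_t  protocol;         /* next higher layer protocol, e.g. 6 = TCP    */
    uint16_t header_checksum;  /* checksum over the header only               */
    uint32_t src_ip;           /* 32-bit source IP address                    */
    uint32_t dst_ip;           /* 32-bit destination IP address               */
    /* Options (if any) follow, padded with zeros to a 32-bit boundary */
};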
In the TCP/IP model, the TCP protocol is located in the transport layer, and it is built on top of the Internet Protocol (IP), a connectionless protocol. TCP itself, however, is connection-oriented: it requires both terminals to set up a connection before communicating over it. The header of a TCP packet is illustrated in Figure 2-3. Each connection is identified by the source and destination IP addresses in the IP header and the source and destination ports in the TCP header; the port fields identify the applications that process the data. The data is divided into smaller parts (if necessary) so that each of them can be packed into an IP packet; these IP packets are sent to the network layer and then to the physical media in the original order of the parts. The order of these packets is expressed by a 32-bit sequence number. The Acknowledgement number is usually the next sequence number the destination machine expects to receive. The Offset field gives the offset of the first data byte from the start of the header. There are many flags in the Flags field, but three of them are used frequently: SYN, ACK and FIN; their meanings are explained later. The Window size field is used for flow control and is also explained in the next part.
Urgent Pointer is an offset from the sequence number to the last urgent data byte; this field is meaningful only if the URG flag in the Flags field is set. The Options field carries optional information and can be omitted; it is padded with zeros to be 32-bit aligned.
Figure 2-3 TCP header format (Source Port, Destination Port, Sequence Number, Acknowledgement Number, Offset, Resv, Flags, Window size, Checksum, Urgent Pointer, Options)
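For illustration, the TCP header fields described above can likewise be sketched as a simplified C structure; as with the IPv4 sketch, this is not the hardware representation, and the 4-bit data offset plus reserved bits are collapsed into one byte.

#include <stdint.h>

/* Simplified TCP header layout (options omitted).  All multi-byte
 * fields are big-endian on the wire. */
struct tcp_header {
    uint16_t src_port;        /* sending application                          */
    uint16_t dst_port;        /* receiving application                        */
    uint32_t seq_number;      /* sequence number of the first data byte       */
    uint32_t ack_number;      /* next sequence number expected by the sender  */
    uint8_t  data_offset;     /* upper 4 bits: header length in 32-bit words  */
    uint8_t  flags;           /* URG, ACK, PSH, RST, SYN, FIN bits            */
    uint16_t window_size;     /* receive window, used for flow control        */
    uint16_t checksum;        /* checksum over header, data and pseudo-header */
    uint16_t urgent_pointer;  /* valid only when the URG flag is set          */
    /* Options (if any) follow, padded with zeros to a 32-bit boundary */
};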
Depending on the network infrastructure, a packet can travel through many routers, and it can be dropped at any router due to errors. The source machine has to detect this situation and retransmit any dropped packet. To ensure that the destination machine receives all packets, and that retransmission is efficient, flow control mechanisms are introduced. There are two main mechanisms, Go-Back-N ARQ (Automatic Repeat Request) and Selective Repeat ARQ, as described in Figure 2-4. In Go-Back-N ARQ, if the transmitter does not receive the acknowledgement of a packet within a reasonable time after transmission, it automatically retransmits that packet and all successive packets. This method is simple, but it can overload the network with many retransmitted packets. Selective Repeat requires the receiver to be able to buffer packets. The transmitter keeps track of a number of packets equal to the window size, and it retransmits only the packets that are not acknowledged correctly. This mechanism uses the bandwidth more efficiently; however, the number of bytes to be buffered can reach the maximum window size. Moreover, the window size can be left-shifted by up to 14 bits, so the number of bytes to be buffered can reach 1 GB for each direction of a connection.
Figure 2-4 Flow control mechanisms
Before data can be transmitted between two machines using TCP, a connection must be established between them. The procedure for establishing a connection is called three-way handshaking and is described in Figure 2-5. The client first sends a TCP packet with the SYN flag set and the sequence number set to a random number called the Initial Sequence Number (ISN_A), then waits. The server replies with a TCP packet with both the SYN and ACK flags set, the acknowledgement number set to ISN_A + 1 and the sequence number set to another random number ISN_B, then waits. Finally, the client sends a TCP packet with the ACK flag set and the acknowledgement number set to ISN_B + 1. At this point, both client and server can send and receive data.
TCP reassembly
Because of transmission errors and the retransmission mechanisms, packets can reach a receiver in an order different from the original one. The first TCP packet (the SYN packet) of a connection is always in the right order, so the SYN packet is in-sequence.
A TCP packet is called in-sequence if its sequence number is the next expected sequence number (ACK number) of the last in-sequence packet; otherwise it is called out-of-sequence.
Figure 2-5 Three-way handshaking in a TCP connection
We call one or more consecutive missing TCP packets a TCP hole, or simply a hole, and we call a run of consecutive received TCP packets a TCP segment, or simply a segment, as illustrated in Figure 2-6. In a connection there can be one or more concurrent holes, and one hole can be made up of one or more missing packets.
Once a hole exists, there are five situations in which a packet can fill it, as shown in Figure 2-7 and sketched in the code below. In Figure 2-7a, the packet is in-sequence but does not fill the whole hole; it only makes the hole narrower. In Figure 2-7b, the packet is in-sequence and fills the hole completely; therefore the first out-of-sequence segment becomes in-sequence and should be processed by the application. An out-of-sequence packet can also be pre-pended or appended to a segment, as in Figures 2-7c and 2-7d respectively, which again only narrows the hole. In Figure 2-7e, an out-of-sequence packet fills a hole and is adjacent to both neighbouring segments; in this case the two segments and the packet merge into a single segment.
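The five situations can be summarized by a small classification routine. The following C sketch is illustrative only (the function and enum names are hypothetical, and sequence-number wrap-around and overlapping payloads are ignored); it decides how an arriving packet relates to one hole, given the boundaries of that hole and whether the data on its left is the in-sequence stream or an out-of-sequence segment.

#include <stdint.h>

enum hole_action {
    NARROW_IN_SEQUENCE, /* a) in-sequence, shrinks the hole from the left     */
    FILL_IN_SEQUENCE,   /* b) in-sequence and fills the hole completely       */
    PREPEND,            /* c) attaches to the start of the following segment  */
    APPEND,             /* d) attaches to the end of the preceding segment    */
    MERGE,              /* e) fills the hole and joins the two segments       */
    ISOLATED            /* touches neither side: not one of the five cases,
                           it would open a new hole                           */
};

/* hole_start: first missing sequence number (end of the data on the left)
 * hole_end:   sequence number at which the segment on the right begins      */
enum hole_action classify(uint32_t pkt_seq, uint32_t pkt_len,
                          uint32_t hole_start, uint32_t hole_end,
                          int left_is_in_sequence)
{
    uint32_t pkt_end = pkt_seq + pkt_len;        /* one past the last byte */
    int touches_left  = (pkt_seq == hole_start);
    int touches_right = (pkt_end == hole_end);

    if (touches_left && touches_right)
        return left_is_in_sequence ? FILL_IN_SEQUENCE : MERGE;     /* b) / e) */
    if (touches_left)
        return left_is_in_sequence ? NARROW_IN_SEQUENCE : APPEND;  /* a) / d) */
    if (touches_right)
        return PREPEND;                                            /* c)      */
    return ISOLATED;
}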
Figure 2-6 Out-of-order TCP packets passing an NIDS: in-sequence packets, holes and segments
Figure 2-7 Five situations of filling up a TCP hole
Re-ordering TCP packets alone does not help the NIDS engine detect attack patterns that span multiple packets; the packets must also be reassembled so that the TCP flow appears logically continuous to the NIDS engine. If the application uses an FSM, assembling interleaved packets that are already in the right order can be achieved by storing and restoring the FSM state of the application circuit at the boundaries of each segment.
However, many applications do not deploy an FSM; in this case, flow reassembly can be carried out by storing and loading overlapping data at the edges of packets.
Network intrusion detection system
An Intrusion Detection System (IDS) is a software or hardware system that monitors a network and attempts to detect illegal intrusion activities [10]. An IDS can be a Network-based Intrusion Detection System (NIDS), a Host-based Intrusion Detection System (HIDS) or a Network-based Intrusion Prevention System (NIPS). An NIDS is usually installed on a backbone network [12, 13] to monitor all traffic to and from the protected network, as illustrated in Figure 2-8.
Figure 2-8 NIDS and the deployment model
A HIDS is usually installed on a host that needs to be protected [12, 13]; as the name indicates, a HIDS only monitors the host on which it is installed and does not monitor the entire network.
Figure 2-9 HIDS and deployment model
An NIPS is more powerful than an NIDS. An NIDS only monitors the traffic on the network and does not modify it; an NIPS, on the contrary, not only monitors the traffic but also drops or redirects data once that data is judged to be an intrusion.
Recently, IDSs have attracted many researchers. Initially, IDSs were usually software running on servers or personal computers, which was adequate because network speeds were not very high at that time. However, the network infrastructure has grown rapidly, and network speeds now reach tens of gigabits per second (Gbps). Software alone cannot keep up with such speeds; therefore, several hardware IDSs/IPSs have been introduced to protect a network at line rate. The following are some software and hardware IDSs currently being developed.
Snort is an open-source, lightweight IDS developed by Sourcefire [11]. First introduced in 1998 as a sniffer, Snort has been developed continually with ever more powerful functions and is now the most widely deployed IDS, with millions of downloads. It can operate on many platforms such as Windows, Linux, Solaris and MacOS, it is easy to use, and it can be configured to operate as an NIDS or an NIPS.
The operation of Snort is mainly based on a predefined rule set; the architecture of Snort, described in Figure 2-10, consists of the following components:
Sniffer: capturing all packets from the network
Preprocessor: reassembling TCP flows and defragmenting fragmented IP packets
Detection engine: matching the header and the content of packets with the rule set
Alert/ Logging: logging packets or generating alerts
As stated above, the operation of Snort is based on its rule set. A rule in the Snort rule set is well formatted, so that not only an expert but also an ordinary user can write a specific rule for his own purposes. The syntax of a rule is quite simple, as in Figure 2-11:
alert udp $EXTERNAL_NET any -> $HOME_NET 5060 (msg:"VOIP-SIP MultiTech INVITE field buffer overflow attempt"; content:"INVITE"; depth:6; nocase; pcre:"/^INVITE\s[^\s\r\n]{60}/smi"; reference:bugtraq,15711; reference:cve,2005-4050; classtype:attempted-user; sid:11981; rev:4;)
Though a Snort rule can have many fields, only the content and pcre fields are discussed here. The content keyword specifies a static pattern, in this case INVITE; it tells Snort to scan the entire payload for the text INVITE and to issue an alert if it is found. The pcre keyword specifies a regular expression to be matched against the payload; the expression is written in a Perl-like format, hence the name Perl Compatible Regular Expression (PCRE). If any text matches the pcre, Snort issues an alert as well.
2.3.2 NIDS project at Faculty of Computer Science and Engineering
Though Snort is a good IDS, it is not reliable when operating on high-speed networks, for example gigabit lines. Several hardware NIDS solutions have been proposed to meet the requirements of high-speed networks. At the Faculty of Computer Science and Engineering, University of Technology, there is a research project to implement an NIDS on an FPGA platform. This NIDS also uses the Snort rules to detect intrusion patterns, and it deploys both static pattern matching and PCRE matching. The detection engine includes two main parts: the packet classification module classifies packets based on the header and the Snort rules, and the content inspection module matches the payload of packets against the Snort rules.
The packet classification module uses Cuckoo hashing to classify packets. The 5-tuple of the header, {source IP, destination IP, source port, destination port, protocol}, is used to calculate the hash value. This module classifies UDP and TCP packets only.
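As an illustration of the 5-tuple classification, the C sketch below shows a Cuckoo-style lookup: each key can live in one of two tables indexed by two different hash functions, so a lookup needs at most two probes, and the stored tuple is compared to reject collisions. The hash functions, table sizes and field names are assumptions for illustration, not the ones used in the actual circuit, and insertion with eviction is omitted.

#include <stdint.h>

struct tuple5 {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
    uint8_t  protocol;                 /* e.g. 6 = TCP, 17 = UDP */
};

#define TABLE_SIZE 4096                /* illustrative size, power of two */
struct entry { struct tuple5 key; int valid; int flow_id; };
static struct entry table1[TABLE_SIZE], table2[TABLE_SIZE];

/* Two simple, independent hash functions (placeholders for the hardware ones). */
static uint32_t hash1(const struct tuple5 *t) {
    return (t->src_ip ^ (t->dst_ip << 1) ^ t->src_port ^ t->protocol) % TABLE_SIZE;
}
static uint32_t hash2(const struct tuple5 *t) {
    return ((t->src_ip >> 3) ^ t->dst_ip ^ ((uint32_t)t->dst_port << 2) ^ t->protocol) % TABLE_SIZE;
}

/* Field-by-field comparison (avoids comparing struct padding). */
static int key_equal(const struct tuple5 *a, const struct tuple5 *b) {
    return a->src_ip == b->src_ip && a->dst_ip == b->dst_ip &&
           a->src_port == b->src_port && a->dst_port == b->dst_port &&
           a->protocol == b->protocol;
}

/* Cuckoo-style lookup: probe both candidate slots and verify the full tuple. */
int lookup_flow(const struct tuple5 *t)
{
    const struct entry *e1 = &table1[hash1(t)];
    if (e1->valid && key_equal(&e1->key, t)) return e1->flow_id;
    const struct entry *e2 = &table2[hash2(t)];
    if (e2->valid && key_equal(&e2->key, t)) return e2->flow_id;
    return -1;                          /* not found */
}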
To carry out static pattern matching, the engine scans the input data and matches each byte against the patterns using Cuckoo hashing. For long patterns, it uses the method proposed by Dr. Tran Ngoc Thinh [8], in which a long pattern is split into smaller sub-patterns with lengths from 1 to 16 characters; these sub-patterns are marked to distinguish prefixes, infixes and short patterns.
The PCRE matching sub-engine is introduced in [9]; it uses an NFA-based approach with many Sub-RegEx Units for matching the characters of a PCRE and many CRBs (Constraint Repetition Blocks) for matching repetition operators.
These matching engines can currently detect intrusion patterns within an individual packet only, because when the engine finishes matching a packet and a packet from another flow arrives, all matching state of the old packet is lost. Besides, neither of them has an explicit FSM, so the system cannot store and restore the FSM state when the next packet of the same flow arrives. Therefore, a preprocessor supporting this system needs to apply another technique to reassemble flows, for example storing overlapping data between two consecutive packets.
NetFPGA board
NetFPGA is a low-cost, reconfigurable hardware platform optimized for high-speed networking and developed at Stanford University. It is equipped with an FPGA chip and several Gigabit Ethernet interfaces; Figure 2-11 shows a NetFPGA board.
The following are the specifications of the NetFPGA 1G board:
Xilinx FPGA: a Spartan chip to control the PCI interface and to program the Virtex chip
2.4.2 Design with the reference design
The NetFPGA package supplies the user with many reference designs, such as the reference router, the reference NIC, a DRAM controller and an Ethernet MAC. When designing with the NetFPGA board, the user can save much time by reusing these reference designs. The structure of a reference project is as follows:
src: contains all Verilog code to be synthesized
synth: contains the XCO files, the Makefile and the other files used to implement the design
sw: contains all software programs
include: contains all header files and files that define macros
Related works
The TCP processor [4]
As stated above, if the source machine does not receive the acknowledgement of a TCP packet, it retransmits the packet automatically. The TCP Processor in [4] uses this retransmission mechanism to reorder out-of-sequence packets: it simply drops all out-of-sequence packets. Because the destination machine never receives those packets, it does not acknowledge them, so the source machine retransmits the missed packet and all subsequent packets, regardless of which flow control mechanism is in use. In this way, the flow control mechanism is effectively forced to Go-Back-N.
The advantages of this approach are its simplicity and its memory savings, but it loads the network traffic heavily and prevents the destination terminal from acknowledging efficiently. The authors chose this approach because a statistical result in [7] shows that only about 5% of TCP packets are out-of-sequence. However, although the fraction of out-of-sequence connections is small, these connections are usually long, so the number of retransmitted packets can be very large.
Out-of-order TCP stream scanning [3]
In this research, the authors design a TCP stream scanning engine that does not require packets to be re-ordered. Two schemes are introduced in the paper, the Two-edge buffering scheme and the One-edge buffering scheme. In the Two-edge buffering scheme, the system stores l-1 data bytes at both the starting edge and the ending edge of each TCP fragment, where l is the length of the longest pattern. If the preceding packet of the fragment arrives, the matching engine scans the packet and then the buffered starting edge of the fragment; if the succeeding packet of the fragment arrives, it scans the buffered ending edge of the fragment and then the packet. In the One-edge buffering scheme, the system stores l-1 data bytes at the starting edge of each TCP fragment together with the final state of the Finite State Machine (FSM) of the matching engine. As in the Two-edge scheme, the preceding packet of a fragment is scanned first and then the starting edge of the fragment, but when the succeeding packet of the fragment arrives, the system only restores the FSM and then scans the packet. This technique can only be applied to static scanning engines, because the maximum length of a pattern must be known a priori; in practice, many applications use both static patterns and regular expressions (REs), and the length of a string that matches an RE cannot be known in advance, so the technique is not applicable to REs. The advantage of this method is that the system does not need to re-order packets. The system also solves the problem of packet normalization, which prevents inconsistent retransmissions of packets.
Robust TCP reassembly for backbone traffic [1]
In another approach [1], the authors use a buffer for each out-of-sequence connection. The size of the buffer is fixed, and every out-of-sequence connection has exactly one buffer. If an out-of-sequence TCP packet arrives, its sequence number is used to compute the offset from the start of the buffer at which the packet is stored. This method is not efficient, because a large packet may not fit in the buffer while a tiny packet wastes a lot of buffer space. The method may therefore require a lot of memory and does not support a large number of concurrent out-of-sequence connections. For example, if the system has to hold just 64 thousand connections simultaneously, 5% of the connections are out-of-sequence and the recommended buffer size is 64 KB, then the total memory needed is about 205 MB.
TCP reassembly for Sachet IDS [5]
Using a similar buffering approach, the TCP reassembly engine in [5] uses a linked list to store out-of-sequence packets, with the control information of each packet stored in SRAM. The data structure of the reassembly memory is a linked list of separate packets. This structure is not memory-efficient, because the system has to reserve a memory block large enough to store the largest packet, in this case a 1500-byte Ethernet packet, and each such block is reserved for only one packet, so two or more small packets cannot share the same buffer. The system therefore supports rather few simultaneous connections: it requires 1 MB of SRAM and 93.75 MB of DRAM to hold 64K packets, so the number of supported connections can be even smaller.
Robust TCP stream reassembly of Sarang Dharmapurikar and Vern Paxson [2]
Sarang Dharmapurikar and Vern Paxson in [2] limit the number of holes in a connection to one; if a packet arrives and would create another hole in the same connection, it is dropped. The system also uses linked lists to store out-of-sequence packets, which are all kept in DRAM. The memory is divided into blocks; a packet can be stored in two or more blocks and a block can contain two or more packets, so memory utilization is more efficient. The decision to support only one hole is based on their study showing that more than 95% of out-of-sequence connections contain only one hole. However, newer studies in [1] show that 2-hole connections account for 7.3% of out-of-sequence connections in the CAIDA_10G trace, and even 17.8% in WA_1G. In this design, the buffer information is compressed into the connection record, which makes it difficult to scale the system up, for example to support multi-hole connections. This work also addresses packet normalization, which is one of the things that makes the system robust.
Method of TCP reassembly
Method for re-ordering TCP packets
To buffer out-of-sequence packets, our system also uses a linked list of memory blocks to store the payloads of packets belonging to the same segment. Several small packets can be stored in a single block, and a very large packet can be stored in more than one block linked together. Because all packets in a segment are locally ordered, the payload of each packet can be stored contiguously after the payload of the previous packet, as illustrated in Figure 4-1.
Figure 4-1 A segment of packet S, packet S+1, packet S+2 in a linked list
We use an array, called the segment array, to manage the segments; each element of the array contains the information of the linked list corresponding to one segment. Each out-of-sequence connection is allocated exactly one segment array, and normal connections have no segment array. The resulting structure is described in Figure 4-2. In this way, we can exploit the DRAM, because read and write operations on DRAM are performed in bursts. In our system, we decided to support more than one concurrent hole in a single connection; the maximum number of concurrent holes in a connection equals the array size. A practical array size is 4, so the system can support up to 4 concurrent holes per connection; studies [1] show that about 99% of out-of-sequence connections have fewer than 4 concurrent holes. If a connection has too many holes, buffering all out-of-sequence packets is not as efficient as dropping the packets so that the source machine retransmits them and fills some of the holes. Therefore, limiting the number of holes to a fixed maximum is more practical than supporting an unlimited number of concurrent holes per connection. In Figure 4-2, a 4-hole out-of-sequence connection is stored in the reassembly memory: the first linked list, which includes Blk.3, stores the first segment of the connection, the second linked list, which includes Blk.2, stores the second segment, and so on.
Figure 4-2 Data structure of reassembly memory with a 4-hole out-of-sequence connection
Each segment array element contains the following information:
Start seq is the sequence number of the first byte of the segment.
Next seq is the next expected sequence number after the segment; for example, if the start sequence number is 10 and the length of the segment is 20, the next sequence number is 30.
Head is the address of the first byte of the segment in DRAM.
Tail is the address of the last byte of the segment in DRAM.
For fast retrieval of each segment, we construct the segment array in a special way: all sequence fields are stored consecutively and all address fields are stored consecutively, so the sequence fields and the address fields of one segment are not adjacent, as described in Figure 4-3. With this data structure, all the sequence fields, Start seq.0 to Start seq.3 and Next seq.0 to Next seq.3, can be read in one DRAM access, so the sequence number of an arriving packet can be compared quickly with these fields to determine the segment into which the packet should be inserted or the segment to be read out. Moreover, this data structure lets the system operate on the sequence fields and the address fields separately.
Figure 4-3 Structure of the segment array: the sequence fields (Start Seq.0, Next Seq.0, ..., Start Seq.3, Next Seq.3) are stored together, followed by the address fields (Head 0, Tail 0, ..., Head 3, Tail 3); each element is thus split and stored in two different places
When a packet arrives, the first stage of the system accesses the sequence fields quickly, determines the action to be taken, and requests the second stage to carry out that action. This can be done in one DRAM access. The real operation on the packet payload, which can take much longer, is carried out by the second stage independently. Because of this independence, the reassembly operation is pipelined and does not impact the throughput of the system.
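A software model of the segment array might look like the C sketch below. The field widths are illustrative assumptions and do not add up to the 32-byte segment array used on the board (Chapter 5); the two sub-arrays mirror the interleaved layout of Figure 4-3, so all sequence fields can be fetched in one burst and all address fields in another.

#include <stdint.h>

#define MAX_HOLES 4     /* segment array size: up to 4 concurrent holes */

/* Sequence information of all segments, stored consecutively so that one
 * DRAM burst returns every Start/Next sequence number at once. */
struct seg_seq {
    uint32_t start_seq;  /* sequence number of the first byte of the segment */
    uint32_t next_seq;   /* next expected sequence number after the segment  */
};

/* Address information of all segments, stored consecutively after the
 * sequence fields. */
struct seg_addr {
    uint32_t head;       /* DRAM address of the first byte of the segment */
    uint32_t tail;       /* DRAM address of the last byte of the segment  */
};

/* One segment array: all sequence fields first, then all address fields,
 * as in Figure 4-3, so the two halves of one logical element are not
 * adjacent in memory. */
struct segment_array {
    struct seg_seq  seq[MAX_HOLES];
    struct seg_addr addr[MAX_HOLES];
};

/* Return the index of the segment that an arriving packet would extend
 * (its sequence number equals that segment's next_seq), or -1 if none.
 * Empty-slot bookkeeping is omitted for brevity. */
int find_append_target(const struct segment_array *sa, uint32_t pkt_seq)
{
    for (int i = 0; i < MAX_HOLES; i++)
        if (sa->seq[i].next_seq == pkt_seq)
            return i;
    return -1;
}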
In order to manage the data of a segment efficiently, we divide the memory space into blocks; the structure of a block is simple, as shown in Figure 4-4:
Data len is the number of valid data bytes stored in the block.
Next ptr is the address of the next block in the same linked list. The head address of the linked list and the Next ptr are not necessarily aligned to a block boundary; this situation occurs when the payload of a packet is pre-pended to an existing linked list.
The payload of each segment is stored in a linked list of blocks. If the payload of a packet is larger than the block size, it is stored in two or more blocks, which are linked together by the Next ptr of each block pointing to the next block.
Figure 4-4 Structure of a memory block. If a packet does not use a whole block, other packets of the same segment can fill the rest of the block
If the payload of a packet is smaller than the block size, the next packet of the same segment can fill the remainder of that block.
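A minimal C sketch of a memory block follows, assuming the 1 KB block size given in Chapter 5; the control-field widths are illustrative, and a fixed C layout cannot capture the hardware's ability to let a list head or Next ptr point into the middle of a block when data is pre-pended.

#include <stdint.h>

#define BLOCK_SIZE    1024                 /* 1 KB blocks (Chapter 5)          */
#define BLOCK_PAYLOAD (BLOCK_SIZE - 8)     /* bytes left for payload data      */

/* One block of reassembly memory.  Blocks of the same segment are chained
 * through next_ptr to form a linked list. */
struct mem_block {
    uint16_t data_len;                     /* number of valid payload bytes    */
    uint16_t reserved;
    uint32_t next_ptr;                     /* address of the next block, or 0  */
    uint8_t  payload[BLOCK_PAYLOAD];       /* packet payload bytes             */
};

/* Free space remaining in a block: the next small packet of the same
 * segment can be written here before a new block is allocated. */
static inline uint32_t block_space_left(const struct mem_block *b)
{
    return BLOCK_PAYLOAD - b->data_len;
}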
One of the main functions of our system is to reassemble TCP packets of the same connection; therefore it needs to manage the status of each connection. To fulfill this function, the system maintains a connection record for each connection.
The connection record is used to track the status of a connection. When a packet arrives, the system reads the corresponding connection record and compares the sequence number of the packet with the currently expected sequence number of the connection to determine whether the packet is out-of-sequence. Moreover, each connection record must also store the address of its out-of-sequence buffer and the state of the application FSM. We pack all the necessary information of a connection into a 32-byte record, called the connection record, as shown in Figure 4-5. The fields currently used total 24 bytes, so 8 bytes remain reserved for future use.
Source IP, Dest IP, Source Port and Dest Port are the IP addresses and TCP ports of the connection. These 4 fields are used as the identifier of a connection; however, their total length is 96 bits, which cannot be used directly as a memory address, so a hashing technique is used to calculate the address of the connection record. When a packet arrives, an 18-bit hash value is computed over these 4 fields, and this hash value is used as the address to access the connection record in DRAM.
Figure 4-5 Structure of a connection record
Because of hash collisions, the 4 fields of the packet are then compared with the 4 fields of the connection record to verify that the record belongs to the current connection.
The Sequence field holds the next expected in-sequence sequence number of the connection.
For example, suppose packet #0 and packet #1 arrive, packet #2 does not arrive due to an error, and then packet #3 arrives. The sequence number of packet #0 is 1 and its length is 10; the sequence number of packet #1 is 11 and its length is 20; the sequence number of packet #3 is 51 and its length is 20. In this case, the Sequence field stored in the connection record is 31, the next expected number after packet #1.
The Flags field contains the EST, SYN and ACK flags. EST indicates whether the record is valid; SYN indicates that the source machine has sent its SYN packet; and ACK indicates that the ACK packet of the 3-way handshake has been seen. When the first SYN packet of a new connection arrives and the connection record is available (EST is 0), the EST and SYN flags are set to 1 and the 4 identifier fields of the packet are written into the connection record. If a packet with the ACK flag arrives and the EST flag of the connection record is 1, the ACK flag in the record is set to 1. If a SYN packet arrives and a hash collision occurs, a further condition is checked: if the connection that occupies the record has not finished the 3-way handshake, the new connection takes over the record; otherwise, the packet of the new connection is dropped.
Buffer address is the address of a segment array, as described above. Because only out-of-sequence packets need to be buffered in the reassembly memory, the Buffer address field is set to null if a connection is not out-of-sequence. When the first out-of-sequence packet of a connection arrives, the system allocates a new segment array and writes its address into Buffer address. When the segment array of a connection no longer stores any data, it is released and the Buffer address is set back to null.
App FSM is the saved state of the application circuit. When the application circuit switches from one connection to another (a context switch), its FSM state is stored into the App FSM field of the current connection record and restored from the App FSM field of the new connection's record.
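Putting the fields above together, a software model of the 32-byte connection record could look like the sketch below. The widths of the flags, buffer address and App FSM fields are assumptions chosen only so that the used fields total 24 bytes, as stated in the text; the real on-board layout may differ.

#include <stdint.h>

/* Flags stored in the connection record. */
#define FLAG_EST 0x01    /* record is valid                          */
#define FLAG_SYN 0x02    /* SYN of the connection has been seen      */
#define FLAG_ACK 0x04    /* handshake ACK has been seen              */

/* 32-byte connection record: 24 bytes of used fields plus 8 reserved
 * bytes.  buffer_addr and app_fsm widths are illustrative. */
struct conn_record {
    uint32_t src_ip;        /* \                                        */
    uint32_t dst_ip;        /*  } connection identifiers, compared with */
    uint16_t src_port;      /*  } the packet to resolve hash collisions */
    uint16_t dst_port;      /* /                                        */
    uint32_t sequence;      /* next expected in-sequence number         */
    uint8_t  flags;         /* FLAG_EST | FLAG_SYN | FLAG_ACK           */
    uint8_t  pad;
    uint16_t buffer_addr;   /* segment array address, 0 = null          */
    uint32_t app_fsm;       /* saved FSM state of the application       */
    uint8_t  reserved[8];   /* unused, kept for future use              */
};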
Method for reassembling TCP flow
Reassembling a TCP flow means making the packets of the same TCP flow logically consecutive, as described in Chapter 2. This is necessary because attack patterns can span multiple packets. If the NIDS uses an algorithm such as Aho-Corasick, which implements an explicit FSM, the system only needs to store the FSM state at the end of a packet and restore it when the next packet arrives.
Figure 4-9 Releasing a segment array
But if the NIDS does not implement an explicit FSM, like the NIDS at HCMUT (see Chapter 2), the system has to apply another technique to reassemble TCP flows. In this thesis, TCP flow reassembly is carried out by combining TCP packet re-ordering with a modified Two-edge scheme [3]. The original Two-edge scheme requires storing both the first and the last l-1 bytes of the payload, as shown in Figure 4-10.
However, if the TCP packets are already ordered, the situation in Figure 4-10 b) never occurs, so the first l-1 bytes of the payload do not need to be stored. In the system presented in this thesis, all TCP packets passing through the system have been ordered, so only the last l-1 bytes of the payload need to be stored. The buffering and loading of edge data is described in Figure 4-11.
Figure 4-10 The original two-edge buffering scheme, l = 6: a) the succeeding packet arrives; b) the preceding packet arrives
Figure 4-11 Modified Two-edge buffering scheme for ordered packets
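A minimal software sketch of the modified scheme for an already-ordered stream follows. After each in-sequence packet is handed to the matching engine, the last l-1 bytes of the stream are saved per connection and prefixed to the next packet of the same flow. Here l = 32 is taken from the maximum edge length supported by the current implementation (Chapter 6); the function name and buffer handling are illustrative assumptions, not the hardware data path.

#include <stdint.h>
#include <string.h>

#define EDGE_LEN 32                     /* l; the last l-1 bytes are kept      */

struct edge_buf {
    uint8_t  data[EDGE_LEN - 1];        /* tail of the stream seen so far      */
    uint32_t len;                       /* number of valid edge bytes          */
};  /* must be zero-initialized for the first packet of a flow */

/* Called for every in-sequence packet of a flow.  'out' must hold at least
 * (EDGE_LEN - 1 + payload_len) bytes.  Returns the number of bytes handed
 * to the matching engine: the saved edge data followed by the packet. */
uint32_t emit_with_edge(struct edge_buf *eb,
                        const uint8_t *payload, uint32_t payload_len,
                        uint8_t *out)
{
    /* 1. prepend the stored edge of the previous data */
    memcpy(out, eb->data, eb->len);
    memcpy(out + eb->len, payload, payload_len);
    uint32_t total = eb->len + payload_len;

    /* 2. remember the last l-1 bytes of the stream for the next packet
     *    (taken from the combined buffer, so short packets are handled) */
    uint32_t keep = total < EDGE_LEN - 1 ? total : EDGE_LEN - 1;
    memmove(eb->data, out + total - keep, keep);
    eb->len = keep;
    return total;
}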
System implementation
The Input Controller module
Figure 5-1 Block diagram of the Preprocessor
The Input controller acts as an extractor: it receives packets from the network and forwards them to the Packet manager, where they are buffered while other modules check the reassembly buffer. Concurrently, the Input controller extracts information from the packet header, as illustrated in Figure 5-1. It first checks whether the packet is an IP packet; if so, it sets three signals to indicate the validity of the protocol, source IP address and destination IP address. It then checks for TCP and UDP packets; if the packet is UDP or TCP, further signals are set to indicate the validity of the source and destination ports, and if the packet is a TCP packet, the flag, sequence and acknowledgement validity signals are set as well. The Input controller also calculates the TCP checksum of the packet and reports any checksum error to the Flow controller.
When implemented on the NetFPGA board, the Input controller actually receives packets from a pre-designed receive FIFO. This FIFO is connected to a Tri-mode Ethernet MAC module which handles the interface with the Ethernet ports. The interface of the FIFO is the following:
rx_ll_data: 8-bit data
rx_ll_src_rdy_n: Active-low signal, this signal indicates the availability of rx_ll_data
rx_ll_dst_rdy_n: Active-low signal, this signal informs the FIFO to output the next data
rx_ll_sof_n: Active-low signal, it indicates the start-of-frame flag
rx_ll_eof_n: Active-low signal, it indicates the end-of-frame flag
The Packet Manager module
The Packet manager essentially functions as a FIFO that buffers packets from the Input controller and sends them to the Output controller, as shown in Figure 5-3. However, it must also be able to remove a whole packet on request from the Output controller. To achieve this, the Packet manager does not use a normal FIFO; instead it implements the packet FIFO with a simple dual-port block RAM and two pointers that act as the head and tail of a circular buffer. Incoming data is written to the block RAM at the address pointed to by the tail, and the tail is incremented; outgoing data is read from the address pointed to by the head, and the head is incremented. A second FIFO stores the length of each packet held in the packet FIFO. When the Packet manager is requested to drop a packet, it reads a length from the length FIFO and simply advances the head pointer by that length.
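The behaviour described above can be modelled in software as the C sketch below (a model, not the Verilog implementation): a circular byte buffer with head/tail pointers plus a second FIFO of packet lengths, so a drop request simply advances the head by one stored length. Buffer sizes are illustrative and overflow/underflow checks are omitted.

#include <stdint.h>

#define BUF_SIZE 65536                 /* power of two, so wrap-around is a mask */
#define MAX_PKTS 256

struct packet_fifo {
    uint8_t  buf[BUF_SIZE];            /* models the dual-port block RAM        */
    uint32_t head, tail;               /* byte pointers into buf                */
    uint16_t len_fifo[MAX_PKTS];       /* length of each queued packet          */
    uint32_t len_head, len_tail;
};

/* Write one received byte at the tail (the write port of the block RAM). */
void push_byte(struct packet_fifo *f, uint8_t b) {
    f->buf[f->tail] = b;
    f->tail = (f->tail + 1) & (BUF_SIZE - 1);
}

/* Record the length of a completely received packet. */
void push_length(struct packet_fifo *f, uint16_t len) {
    f->len_fifo[f->len_tail] = len;
    f->len_tail = (f->len_tail + 1) % MAX_PKTS;
}

/* Drop the packet at the head: skip its bytes by advancing the head pointer. */
void drop_packet(struct packet_fifo *f) {
    uint16_t len = f->len_fifo[f->len_head];
    f->len_head = (f->len_head + 1) % MAX_PKTS;
    f->head = (f->head + len) & (BUF_SIZE - 1);
}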
The Flow Controller module
Figure 5-4 The Flow Controller
The function of the Flow Controller is to manage the connection records and to send instructions to the Reassembler, which operates on the reassembly memory. It receives the 4 header fields (source IP, destination IP, source port, destination port) from the Input controller, calculates a hash value from them and uses this value as the address to access the connection record via the Memory controller. If a hash collision occurs, it behaves as described in Chapter 4. If the 4 fields of the packet match the 4 fields of the connection record, it compares the sequence numbers to decide the appropriate action for the packet. There are 4 types of actions: dropping the packet, forwarding the packet, buffering the packet and sending the packet to the application.
If the sequence number of the packet equals the sequence number stored in the connection record, the packet is judged in-sequence, and the Flow controller requests the Reassembler to send it to the application circuit. It also checks the sequence fields of the segment array to determine whether any segment's start sequence number equals the next expected sequence number after the packet; if some segment is consecutive with the packet, the Flow controller sends an extra parameter to the Reassembler indicating which segment to read out.
If the sequence number of the packet is smaller than the sequence number in the connection record, the packet is judged a retransmission, and the Flow controller requests the Reassembler to forward it to the network.
If the sequence number of the packet is larger than the sequence number in the connection record, it is considered an out-of-sequence packet. If the Buffer address field of the connection record is null, the Flow controller allocates a new segment array. The Flow controller then reads the segment array to check whether the sequence number of the packet is consecutive with any segment; if so, it requests the Reassembler to insert the payload of the packet into that segment, with an extra parameter indicating which segment to extend. If no segment matches the packet, the first empty segment is chosen to hold the packet payload, and if there is no room at all, the packet is dropped.
If the checksum result indicates an error in the packet, the packet is dropped, because the destination will drop it anyway and the source machine will retransmit it.
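The decision logic described above can be summarized by the following C sketch. It reuses struct conn_record, struct segment_array and MAX_HOLES from the sketches in Chapter 4; the function name is hypothetical, sequence-number wrap-around is ignored and segment-slot management is simplified.

enum action { DROP, FORWARD_ONLY, SEND_TO_APP, BUFFER };

/* Decide what to do with a TCP packet of an established connection. */
enum action decide(const struct conn_record *cr,
                   const struct segment_array *sa,
                   uint32_t pkt_seq, uint32_t pkt_len, int checksum_ok,
                   int *segment_index)   /* out: buffered segment to use, or -1 */
{
    uint32_t pkt_end = pkt_seq + pkt_len;
    *segment_index = -1;

    if (!checksum_ok)
        return DROP;                        /* the receiver will drop it too     */

    if (pkt_seq == cr->sequence) {          /* in-sequence packet                */
        /* a buffered segment starting right after this packet can now be
         * read out to the application as well                                  */
        if (sa)
            for (int i = 0; i < MAX_HOLES; i++)
                if (sa->seq[i].start_seq == pkt_end) { *segment_index = i; break; }
        return SEND_TO_APP;
    }

    if (pkt_seq < cr->sequence)             /* old data: a retransmission        */
        return FORWARD_ONLY;

    /* out-of-sequence: attach to a segment it touches; otherwise it will
     * occupy a free slot or be dropped when no slot is left                     */
    if (sa)
        for (int i = 0; i < MAX_HOLES; i++)
            if (sa->seq[i].next_seq == pkt_seq || sa->seq[i].start_seq == pkt_end) {
                *segment_index = i;
                break;
            }
    return BUFFER;
}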
The Reassembler module
The Reassembler manages the reassembly memory, whose organization is described in Chapter 4. It receives commands from the Flow controller and requests the Output controller to read out or remove a packet. If it receives a command to send a packet to the application circuit, the extra parameters indicate whether a segment should also be read out from DRAM.
If it receives a command to buffer a packet, the extra parameters indicate the position at which to insert the packet. The Reassembler uses one FIFO (Rsm.FIFO) to hold packets waiting to be buffered and another FIFO (Dsm.FIFO) to hold packets read out from memory; a third FIFO (App.FIFO) holds packets coming from the Output controller. When a packet has been completely read out of the packet FIFO and sent to the application, the Dsm.FIFO is read if it contains any data. The Reassembler can also request the Output controller to drop a packet if that packet would create more than 2 holes in the corresponding connection or if it does not pass the checksum.
The Memory controller module
The main function of the Memory controller is to arbitrate the read and write requests from the Flow Controller and the Reassembler; it also controls the allocation and release of memory blocks. The packet payload is stored in memory blocks of 1 KB, and each segment array occupies 32 bytes. The Memory controller maintains two FIFOs that store the addresses of free memory blocks and free segment arrays, respectively: one FIFO is used for allocating and releasing memory blocks, the other for allocating and releasing segment arrays. When a memory block or a segment array is requested, the Memory controller reads the corresponding FIFO and returns the address that was read out; when a block or segment array is released, its address is written back to the corresponding FIFO. In the future, if
we change to another platform with more memory, we can implement a FIFO in DRAM and use a FIFO in block RAM as a cache for it.
To interface with the DDR2 SDRAM on the NetFPGA board, the Memory controller uses a pre-designed module, the ddr_controller from the reference design of the NetFPGA package.
The Output controller module
The Output controller requests the Packet manager to remove a packet or to read out a packet. A packet can be sent to both the network and the Reassembler, or to the network only: any in-sequence packet is sent to both the network and the Reassembler, out-of-sequence packets are sent to the Reassembler only, and retransmitted packets are sent to the network only. For any packet that does not pass the checksum, the Output controller requests the Packet manager to remove that packet from the buffer.
Evaluation of the TCP Reassembly Engine
Deployment model
Figure 6-1 describes the deployment model of the full system, in which the preprocessor and the NIDS are integrated on one NetFPGA board.
Figure 6-1 Deployment model of the preprocessor and NIDS
This model requires a full-system test, which will only be possible after all components of the system have been finished.
However, the Preprocessor can be tested individually with a simpler model, as shown in Figure 6-2.
Using the testing model in Figure 6-2, the system can re-order out-of-sequence packets and store and load the edge data. For batch testing, we also simulate our design with many data patterns. Figure 6-3 shows that the last 32 bytes of the incoming packet are stored, and Figure 6-4 shows that the next in-sequence packet is pre-pended with these 32 bytes; this exercises the edge-storing path. The post-route simulation gives correct results for the many data patterns we simulated.
Figure 6-3 The incoming packet; rx_ll_data holds the data
Figure 6-2 Individual test for the Preprocessor
Figure 6-4 The payload of the output packet is prefixed with the last 32 bytes of the previous packet
Experimental results
Based on our experiments, we compare our system with other systems. In this section we do not compare with the system in [3], because that system uses out-of-order matching and thus does not need to reorder out-of-sequence packets. We assume the network traffic conforms to the CAIDA_10G trace in [1]; these statistics were recorded by the Cooperative Association for Internet Data Analysis in 2009.
Table 6-1 shows that our system can support 96.9% of out-of-sequence connections, compared to 89.6% for the system in [2]. Moreover, our system can easily be scaled up to support 4-hole connections, in which case the ratio of supported out-of-sequence connections exceeds 98.8%. In theory, the fixed-length buffer method [1] and the simple linked list method [5] can support more than 4 concurrent holes in a single connection; however, the number of connections with more than 4 holes is very small, so dropping packets that create more than 4 holes in a connection is more practical.
Table 6-1 Percentage of supported connection types of the TCP Reassembly Engine and other systems
Table 6-2 Memory utilization of the TCP Reassembly Engine and other systems for single-hole connections only
In Table 6-2, we compare the reassembly memory utilization of our system with that of the other systems. Because the design in [2] supports only single-hole connections, we compare the memory utilization for single-hole connections only; the first column is the number of out-of-sequence packets in a single-hole connection. Based on the statistical data in [1], the mean packet size is 441 bytes, so the memory utilization of each system is calculated for this mean packet size. In the fixed-length buffer method [1], each out-of-sequence connection has a fixed-length buffer; based on the authors' experimental results, the minimum buffer size is 16 KB, so buffering 64K connections requires 1024 MB of memory. In the 1-hole linked list method [2], the page size is 2 KB, so the memory requirement is calculated as in Table 6-2; if this system used a page size of 1 KB, its memory requirement would be a little smaller than ours, by about 2 MB, but it cannot handle connections with more than one hole. In the simple linked list method [5], the system uses a linked list of packets, so it has to reserve a space large enough to store the largest packet; with a maximum packet size of 1500 B, the memory requirement is calculated as in Table 6-2. Table 6-2 shows that our system uses memory more efficiently than the others.
Figure 6-5 Maximum throughput of the system when the percentage of out-of-sequence packets is 0%
Figure 6-6 Throughput of the system with clock rate = 125 MHz when the percentage of out-of-sequence packets is 0%
Figure 6-7 Maximum throughput of the system when the percentage of out-of-sequence packets is 5%
Figure 6-8 Throughput of the system with clock rate = 125 MHz when the percentage of out-of-sequence packets is 5%
Figure 6-9 Maximum throughput of the system when the percentage of out-of-sequence packets is 10%
Figure 6-10 Throughput of the system with clock rate = 125 MHz when the percentage of out-of-sequence packets is 10%
(each chart plots throughput for edge lengths of 8, 16 and 32 bytes)
The throughput of the TCP Reassembly Engine depends on the average packet length, the length of the edge data, the percentage of out-of-sequence packets, and the ability of the application circuit to receive multiple bytes per clock cycle. At present, the targeted NIDS application, a system developed by another group at the University of Technology, can receive only 1 byte per clock cycle, so it is the bottleneck of the system. For short packets, the ratio of the edge length to the packet length is large, which means the throughput varies over a wider range, so the system is tested with more short packet lengths than long ones: the payload lengths used in these charts are 10, 20, 50, 100, 500, 1000 and 1400 bytes. The edge length is chosen from 8, 16 and 32 bytes, because the current system supports at most 32 bytes of edge data; we are considering increasing this limit when we move to a new platform with more memory. The percentage of out-of-sequence packets (P) is typically smaller than 10%. Currently, the throughput is measured by implementing two counters.
The first counter counts the number of clock ticks needed to receive all packets, and the second counts the number of clock ticks needed to transmit all packets to the network. The throughput is then calculated as T = counter0 * 1000 / counter1.
This is a rough figure; the exact number will be measured by sending packets to the full system and increasing the speed until the system starts to drop packets. Figures 6-5 to 6-10 display the maximum throughput and the on-board throughput of the system at P = 0%, 5% and 10%. They show that the longer the packets, the higher the throughput the system can sustain; likewise, the shorter the edge data, the higher the throughput.
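As a worked illustration of this formula, with hypothetical counter values and reading the result in Mbps under the assumption that the receive side runs at the 1 Gbps line rate: if counter0 = 600,000 clock ticks are needed to receive all packets and counter1 = 750,000 ticks to transmit them, then T = 600,000 * 1000 / 750,000 = 800, i.e. roughly 800 Mbps of sustained throughput.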
Figure 6-11 Number of rules with different lengths
Figure 6-11 shows the number of rules of each length in the rule set supported by the NIDS at HCMUT, which currently uses a subset of the Snort rule set. The figure shows that rules with a length of less than 32 make up more than 80% of the rule set. Moreover, the throughput figures above show that an edge length of 32 does not affect the throughput very much. However, the final edge length should be decided with the full-system test, which will be carried out as soon as the full system is finished.
Conclusion and future work
In this thesis, we have presented a technique for TCP reassembly. We focus on a multi-linked-list method to manage the memory for out-of-sequence packets, which is efficient and easy to scale up in the future. Our system can hold about 256K concurrent connections and 46K out-of-sequence connections with only 64 MB of DRAM. Our architecture supports connections with multiple concurrent holes and can handle up to 99% of out-of-sequence connections. Moreover, if we need the system to support more concurrent holes per connection, we only need to change the size of the segment array.
In the future, we plan to implement DoS prevention functions and integrate them into our TCP preprocessor. We will also improve the mechanism for dealing with hash collisions in the Flow Controller. Besides, we intend to implement a function that avoids inconsistent retransmissions, to make our system robust against attackers.
[1] Ruan Yuan, Yang Weibing, Chen Mingyu, Zhao Xiaofang, Fan Jianping – Robust TCP Reassembly with a Hardware-based Solution for Backbone Traffic, Fifth IEEE International Conference on Networking, Architecture and Storage, 2010, pp. 439-447.
[2] S. Dharmapurikar and V. Paxson – Robust TCP Reassembly in the Presence of Adversaries, Proceedings of the 14th USENIX Security Symposium, Vol. 14, 2005, pp. 65-80.
[3] Y. Sugawara, M. Inaba, K. Hiraki – High-speed and Memory Efficient TCP Stream Scanning Using FPGA, International Conference on Field Programmable Logic and Applications, 2005, pp. 45-50.
[4] David V. Schuehler – Techniques for Processing TCP/IP Flow Content in Network Switches at Gigabit Line Rates, Doctoral Dissertation, December 2004.
[5] Palak Agarwal – TCP Stream Reassembly and Web-based GUI for Sachet IDS, Master Thesis, Department of Computer Science and Engineering, Indian Institute of Technology Kanpur, India, 2007.
[6] Hao Chen, Yu Chen, Douglas H. Summerville – A Survey on the Application of FPGAs for Network Infrastructure Security, IEEE Communications Surveys and Tutorials, 2010, pp. 1-21.
[7] S. Jaiswal, G. Iannaccone, C. Diot, J. Kurose, D. Towsley – Measurement and Classification of Out-of-Sequence Packets in a Tier-1 IP Backbone, Technical Report 02-17, CS Dept., UMass, May 2002, pp. 54-66.
[8] T.N. Thinh and S. Kittitornkun – Massively Parallel Cuckoo Pattern Matching Applied for NIDS/NIPS, Fifth IEEE International Symposium on Electronic Design, Test and Application, 2010, pp. 217-221.
[9] T.T. Hieu, L.H. Long, V.T. Tai – Research, Design and Implementation of a Regular Expression Processing System on FPGA for the Network Intrusion Detection System (NIDS), Master Thesis.
[10] http://www.interpol.int/Crime-areas/Cybercrime/Cybercrime
[11] http://en.wikipedia.org/wiki/Computer_crime
[12] http://www.real-time.com/linuxsolutions/nids.html
[13] http://ciscosecurity.org.ua/1587051672/ch10lev1sec2.html
[14] http://www.sourcefire.com/security-technologies/snort
1. … on Information and Communication Technology 2011 (SoICT 2011), Hanoi, Vietnam, October 2011.
2. Tran Huy Vu, Tran Ngoc Thinh, Nguyen Quoc Tuan, Nguyen Tran Huu Nguyen, "An Efficient TCP Reassembly Technique on FPGA", International Conference on Advanced Computing and Applications 2011 (ACOMP 2011), Ho Chi Minh City, Vietnam, 2011.
3. Tran Huy Vu, Tran Ngoc Thinh, Nguyen Quoc Tuan, Nguyen Tran Huu Nguyen, "An Efficient TCP Reassembly Technique on FPGA", Journal of Science and Technology, Vietnamese Academy of Science and Technology, ISSN 0866-708X, Vol. 49, No. 4A, 2011, pp. …
Name: Tran Huy Vu    Date and Place of Birth: 09 Dec. 1986
Dept of Computer Engineering, Faculty of Computer Science & Engineering, Ho Chi Minh City University of Technology (HCMUT)
Title of position held Lecturer
Address: Block A3, 268 Ly Thuong Kiet Street, District 10, Hochiminh City, Vietnam
Tel: 84-8-3856489 ext. 5843    Fax:    E-mail: vutran@cse.hcmut.edu.vn
2. Educational Qualifications
2-1 Academic Qualification
Degree: Bachelor of Engineering Year: 2009 Field: Computer Engineering Name of Institution: Ho Chi Minh city University of Technology (HCMUT)
Number of years of experience in the field related to the project: 3 years Field of specialization: chip design