
Understanding Linux Network Internals


Structure

  • Table of Contents

  • Preface

    • The Audience for This Book

    • Background Information

    • Organization of the Material

      • What Is Not Covered

    • Conventions Used in This Book

    • Using Code Examples

    • We’d Like to Hear from You

    • Safari Enabled

    • Acknowledgments

  • Part I

  • Introduction

    • Basic Terminology

    • Common Coding Patterns

      • Memory Caches

      • Caching and Hash Tables

      • Reference Counts

      • Garbage Collection

      • Function Pointers and Virtual Function Tables (VFTs)

      • goto Statements

      • Vector Definitions

      • Conditional Directives (#ifdef and family)

      • Compile-Time Optimization for Condition Checks

      • Mutual Exclusion

      • Conversions Between Host and Network Order

      • Catching Bugs

      • Statistics

      • Measuring Time

    • User-Space Tools

    • Browsing the Source Code

      • Dead Code

    • When a Feature Is Offered as a Patch

  • Critical Data Structures

    • The Socket Buffer: sk_buff Structure

      • Networking Options and Kernel Structures

      • Layout Fields

      • General Fields

      • Feature-Specific Fields

      • Management Functions

        • Allocating memory: alloc_skb and dev_alloc_skb

        • Freeing memory: kfree_skb and dev_kfree_skb

        • Data reservation and alignment: skb_reserve, skb_put, skb_push, and skb_pull

        • The skb_shared_info structure and the skb_shinfo function

        • Cloning and copying buffers

        • List management functions

    • net_device Structure

      • Identifiers

      • Configuration

        • Interface types and ports

        • Promiscuous mode

      • Statistics

      • Device Status

      • List Management

      • Link Layer Multicast

      • Traffic Management

      • Feature Specific

      • Generic

      • Function Pointers

    • Files Mentioned in This Chapter

  • User-Space-to-Kernel Interface

    • Overview

    • procfs Versus sysctl

      • procfs

      • sysctl: Directory /proc/sys

        • Examples of ctl_table initialization

        • Registering a file in /proc/sys

        • Core networking files and directories

    • ioctl

    • Netlink

    • Serializing Configuration Changes

  • Part II

  • Notification Chains

    • Reasons for Notification Chains

    • Overview

    • Defining a Chain

    • Registering with a Chain

    • Notifying Events on a Chain

    • Notification Chains for the Networking Subsystems

      • Wrappers

      • Examples

    • Tuning via /proc Filesystem

    • Functions and Variables Featured in This Chapter

    • Files and Directories Featured in This Chapter

  • Network Device Initialization

    • System Initialization Overview

    • Device Registration and Initialization

    • Basic Goals of NIC Initialization

    • Interaction Between Devices and Kernel

      • Hardware Interrupts

        • Interrupt types

        • Interrupt sharing

        • Organization of IRQs to handler mappings

    • Initialization Options

    • Module Options

    • Initializing the Device Handling Layer: net_dev_init

      • Legacy Code

    • User-Space Helpers

      • kmod

      • Hotplug

        • /sbin/hotplug

    • Virtual Devices

      • Examples of Virtual Devices

      • Interaction with the Kernel Network Stack

    • Tuning via /proc Filesystem

    • Functions and Variables Featured in This Chapter

    • Files and Directories Featured in This Chapter

  • The PCI Layer and Network Interface Cards

    • Data Structures Featured in This Chapter

    • Registering a PCI NIC Device Driver

    • Power Management and Wake-on-LAN

    • Example of PCI NIC Driver Registration

    • The Big Picture

    • Tuning via /proc Filesystem

    • Functions and Variables Featured in This Chapter

    • Files and Directories Featured in This Chapter

  • Kernel Infrastructure for Component Initialization

    • Boot-Time Kernel Options

      • Registering a Keyword

      • Two-Pass Parsing

      • .init.setup Memory Section

      • Use of Boot Options to Configure Network Devices

    • Module Initialization Code

      • Old Model: Conditional Code

      • New Model: Macro-Based Tagging

    • Optimized Macro-Based Tagging

      • Initialization Macros for Device Initialization Routines

    • Boot-Time Initialization Routines

      • xxx_initcall Macros

        • Example of __initcall and __exitcall routines: modules

        • Example of dependency between initialization routines

        • Legacy code

    • Memory Optimizations

      • __init and __exit Macros

      • xxx_initcall and __exitcall Sections

      • Other Optimizations

      • Dynamic Macros’ Definition

    • Tuning via /proc Filesystem

    • Functions and Variables Featured in This Chapter

    • Files and Directories Featured in This Chapter

  • Device Registration and Initialization

    • When a Device Is Registered

    • When a Device Is Unregistered

    • Allocating net_device Structures

    • Skeleton of NIC Registration and Unregistration

    • Device Initialization

      • Device Driver Initializations

      • Device Type Initialization: xxx_setup Functions

      • Optional Initializations and Special Cases

    • Organization of net_device Structures

      • Lookups

    • Device State

      • Queuing Discipline State

      • Registration State

    • Registering and Unregistering Devices

      • Split Operations: netdev_run_todo

      • Device Registration Status Notification

        • netdev_chain notification chain

        • RTnetlink link notifications

    • Device Registration

      • register_netdevice Function

    • Device Unregistration

      • unregister_netdevice Function

      • Reference Counts

        • Function netdev_wait_allrefs

    • Enabling and Disabling a Network Device

    • Updating the Device Queuing Discipline State

      • Interactions with Power Management

        • Suspending a device

        • Resuming a device

      • Link State Change Detection

        • Scheduling and processing link state change events

        • Linkwatch flags

    • Configuring Device-Related Information from User Space

      • Ethtool

        • Drivers that do not support ethtool

      • Media Independent Interface (MII)

    • Virtual Devices

    • Locking

    • Tuning via /proc Filesystem

    • Functions and Variables Featured in This Chapter

    • Files and Directories Featured in This Chapter

  • Part III

  • Interrupts and Network Drivers

    • Decisions and Traffic Direction

    • Notifying Drivers When Frames Are Received

      • Polling

      • Interrupts

      • Processing Multiple Frames During an Interrupt

      • Timer-Driven Interrupts

      • Combinations

      • Example

    • Interrupt Handlers

      • Reasons for Bottom Half Handlers

      • Bottom Halves Solutions

      • Concurrency and Locking

      • Preemption

      • Bottom-Half Handlers

        • Bottom-half handlers in kernel 2.2

        • Bottom-half handlers in kernel 2.4 and above: the introduction of the softirq

      • Tasklets

      • Softirq Initialization

      • Pending softirq Handling

        • __do_softirq function

      • Per-Architecture Processing of softirq

      • ksoftirqd Kernel Threads

        • Starting the threads

      • Tasklet Processing

      • How the Networking Code Uses softirqs

    • softnet_data Structure

      • Fields of softnet_data

      • Initialization of softnet_data

  • Frame Reception

    • Interactions with Other Features

    • Enabling and Disabling a Device

    • Queues

    • Notifying the Kernel of Frame Reception: NAPI and netif_rx

      • Introduction to the New API (NAPI)

      • net_device Fields Used by NAPI

      • net_rx_action and NAPI

      • Old Versus New Driver Interfaces

      • Manipulating poll_list

    • Old Interface Between Device Drivers and Kernel: First Part of netif_rx

      • Initial Tasks of netif_rx

      • Managing Queues and Scheduling the Bottom Half

    • Congestion Management

      • Congestion Management in netif_rx

      • Average Queue Length and Congestion-Level Computation

    • Processing the NET_RX_SOFTIRQ: net_rx_action

      • Backlog Processing: The process_backlog Poll Virtual Function

      • Ingress Frame Processing

        • Handling special features

  • Frame Transmission

    • Enabling and Disabling Transmissions

      • Scheduling a Device for Transmission

      • Queuing Discipline Interface

        • qdisc_restart function

      • dev_queue_xmit Function

        • Queueful devices

        • Queueless devices

      • Processing the NET_TX_SOFTIRQ: net_tx_action

        • Watchdog timer

  • General and Reference Material About Interrupts

    • Statistics

    • Tuning via /proc and sysfs Filesystems

    • Functions and Variables Featured in This Part of the Book

    • Files and Directories Featured in This Part of the Book

  • Protocol Handlers

    • Overview of Network Stack

      • The Big Picture

      • Link Layer Choices for Ethernet (LLC and SNAP)

      • How the Network Stack Operates

    • Executing the Right Protocol Handler

      • Special Media Encapsulation

    • Protocol Handler Organization

    • Protocol Handler Registration

    • Ethernet Versus IEEE 802.3 Frames

      • Setting the Packet Type

      • Setting the Ethernet Protocol and Length

      • Logical Link Control (LLC)

        • The IPX case

        • Linux’s LLC implementation

        • Processing ingress LLC frames

      • Subnetwork Access Protocol (SNAP)

    • Tuning via /proc Filesystem

    • Functions and Variables Featured in This Chapter

    • Files and Directories Featured in This Chapter

  • Part IV

  • Bridging: Concepts

    • Repeaters, Bridges, and Routers

    • Bridges Versus Switches

    • Hosts

    • Merging LANs with Bridges

    • Bridging Different LAN Technologies

    • Address Learning

      • Broadcast and Multicast Addresses

      • Aging

    • Multiple Bridges

      • Bridging Loops

      • Loop-Free Topologies

      • Defining a Loop-Free Topology

  • Bridging: The Spanning Tree Protocol

    • Basic Terminology

    • Example of Hierarchical Switched L2 Topology

    • Basic Elements of the Spanning Tree Protocol

      • Root Bridge

      • Designated Bridges

      • Spanning Tree Ports

        • Port states

        • Port roles

    • Bridge and Port IDs

    • Bridge Protocol Data Units (BPDUs)

      • Configuration BPDU

      • Priority Vector

      • When to Transmit Configuration BPDUs

      • BPDU Aging

    • Defining the Active Topology

      • Root Bridge Selection

      • Root Port Selection

      • Designated Port Selection

      • Examples of STP in Action

    • Timers

      • Avoiding Temporary Loops

    • Topology Changes

      • Short Aging Timer

      • Letting All Bridges Know About a Topology Change

      • Example of a Topology Change

    • BPDU Encapsulation

    • Transmitting Configuration BPDUs

    • Processing Ingress Frames

      • Ingress BPDUs

      • Ingress Configuration BPDUs

    • Convergence Time

    • Overview of Newer Spanning Tree Protocols

      • Rapid Spanning Tree Protocol (RSTP)

      • Multiple Spanning Tree Protocol (MSTP)

  • Bridging: Linux Implementation

    • Bridge Device Abstraction

    • Important Data Structures

    • Initialization of Bridging Code

    • Creating Bridge Devices and Bridge Ports

    • Creating a New Bridge Device

    • Bridge Device Setup Routine

    • Deleting a Bridge

    • Adding Ports to a Bridge

      • Deleting a Bridge Port

    • Enabling and Disabling a Bridge Device

    • Enabling and Disabling a Bridge Port

    • Changing State on a Bridge Port

    • The Big Picture

    • Forwarding Database

      • Lookups

      • Reference Counts

      • Adding, Updating, and Removing Entries

      • Aging

    • Handling Ingress Traffic

      • Data Frames Versus BPDUs

      • Processing Data Frames

    • Transmitting on a Bridge Device

    • Spanning Tree Protocol (STP)

      • Key Spanning Tree Routines

      • Bridge IDs and Port IDs

      • Enabling the Spanning Tree Protocol on a Bridge Device

      • Processing Ingress BPDUs

      • Transmitting BPDUs

      • Configuration Updates

      • Root Bridge Selection

        • Becoming the root bridge

        • Giving up the root bridge role

      • Timers

      • Handling Topology Changes

    • netdevice Notification Chain

  • Bridging: Miscellaneous Topics

    • User-Space Configuration Tools

      • Handling Configuration Changes

      • Old Interface Versus New Interface

      • Creating Bridge Devices and Bridge Ports

      • Configuring Bridge Devices and Ports

    • Tuning via /proc Filesystem

    • Tuning via /sys Filesystem

    • Statistics

    • Data Structures Featured in This Part of the Book

      • bridge_id Structure

      • net_bridge_fdb_entry Structure

      • net_bridge_port Structure

      • net_bridge Structure

    • Functions and Variables Featured in This Part of the Book

    • Files and Directories Featured in This Part of the Book

  • Part V

  • Internet Protocol Version 4 (IPv4): Concepts

    • IP Protocol: The Big Picture

    • IP Header

    • IP Options

      • “End of Option List” and “No Operation” Options

      • Source Route Option

      • Record Route Option

      • Timestamp Option

      • Router Alert Option

    • Packet Fragmentation/Defragmentation

      • Effect of Fragmentation on Higher Layers

      • IP Header Fields Used by Fragmentation/Defragmentation

      • Examples of Problems with Fragmentation/Defragmentation

        • Retransmissions

        • Associating fragments with their IP packets

        • Example of IP ID generation

        • Example of unsolvable defragmentation problem: NAT

      • Path MTU Discovery

    • Checksums

      • APIs for Checksum Computation

      • Changes to the L4 Checksum

  • Internet Protocol Version 4 (IPv4): Linux Foundations and Features

    • Main IPv4 Data Structures

      • Checksum-Related Fields from sk_buff and net_device Structures

        • net_device structure

        • sk_buff structure

    • General Packet Handling

      • Protocol Initialization

      • Interaction with Netfilter

      • Interaction with the Routing Subsystem

      • Processing Input IP Packets

      • The ip_rcv_finish Function

    • IP Options

      • Option Processing

      • Option Parsing

        • Option: strict and loose Source Routing

        • Option: Record Route

        • Option: Timestamp

        • Option: Router Alert

        • Handling parsing errors

  • Internet Protocol Version 4 (IPv4): Forwarding and Local Delivery

    • Forwarding

      • ICMP Redirect

      • ip_forward Function

      • ip_forward_finish Function

      • dst_output Function

    • Local Delivery

  • Internet Protocol Version 4 (IPv4): Transmission

    • Key Functions That Perform Transmission

      • Multicast Traffic

      • Relevant Socket Data Structures for Local Traffic

      • The ip_queue_xmit Function

        • Setting the route

        • Building the IP header

      • The ip_append_data Function

        • Basic memory allocation and buffer organization for ip_append_data

        • Memory allocation and buffer organization for ip_append_data with Scatter/Gather I/O

        • Key routines for handling fragmented buffers

        • Further handling of the buffers

        • Setting the context

        • Getting ready for fragment generation

        • Copying data into the fragments: getfrag

        • Buffer allocation

        • Main loop

        • L4 checksum

      • The ip_append_page Function

      • The ip_push_pending_frames Function

      • Putting Together the Transmission Functions

      • Raw Sockets

    • Interface to the Neighboring Subsystem

  • Internet Protocol Version 4 (IPv4): Handling Fragmentation

    • IP Fragmentation

      • Functions Involved with IP Fragmentation

      • The ip_fragment Function

      • Slow Fragmentation

      • Fast Fragmentation

    • IP Defragmentation

      • Organization of the IP Fragments Hash Table

      • Key Issues in Defragmentation

      • Functions Involved with Defragmentation

      • New ipq Instance Initialization

      • The ip_defrag Function

      • The ip_frag_queue Function

        • Handling overlaps

        • L4 checksum

      • Garbage Collection

      • Hash Table Reorganization

  • Internet Protocol Version 4 (IPv4): Miscellaneous Topics

    • Long-Living IP Peer Information

      • Initialization

      • Lookups

      • How the IP Layer Uses inet_peer Structures

      • Garbage Collection

    • Selecting the IP Header’s ID Field

    • IP Statistics

    • IP Configuration

      • Main Functions That Manipulate IP Addresses and Configuration

      • Change Notification: rtmsg_ifa

      • inetaddr_chain Notification Chain

      • IP Configuration via ip

      • IP Configuration via ifconfig

    • IP-over-IP

    • IPv4: What’s Wrong with It?

    • Tuning via /proc Filesystem

    • Data Structures Featured in This Part of the Book

      • iphdr Structure

      • ip_options Structure

      • ipcm_cookie Structure

      • ipq Structure

      • inet_peer Structure

      • ipstats_mib Structure

      • in_device Structure

      • in_ifaddr Structure

      • ipv4_devconf Structure

      • ipv4_config Structure

      • cork Structure

      • skb_frag_t Structure

    • Functions and Variables Featured in This Part of the Book

    • Files and Directories Featured in This Part of the Book

  • Layer Four Protocol and Raw IP Handling

    • Available L4 Protocols

    • L4 Protocol Registration

      • Registration: inet_add_protocol and inet_del_protocol

    • L3 to L4 Delivery: ip_local_deliver_finish

      • Raw Sockets and Raw IP

      • Delivering Raw Input Datagrams to the Recipient Application

      • IPsec

    • IPv4 Versus IPv6

    • Tuning via /proc Filesystem

    • Functions and Variables Featured in This Chapter

    • Files and Directories Featured in This Chapter

  • Internet Control Message Protocol (ICMPv4)

    • ICMP Header

    • ICMP Payload

    • ICMP Types

      • ICMP_ECHO and ICMP_ECHOREPLY

      • ICMP_DEST_UNREACH

      • ICMP_SOURCE_QUENCH

      • ICMP_REDIRECT

      • ICMP_TIME_EXCEEDED

      • ICMP_PARAMETERPROB

      • ICMP_TIMESTAMP and ICMP_TIMESTAMPREPLY

      • ICMP_INFO_REQUEST and ICMP_INFO_REPLY

      • ICMP_ADDRESS and ICMP_ADDRESSREPLY

    • Applications of the ICMP Protocol

      • ping

      • traceroute

    • The Big Picture

    • Protocol Initialization

    • Data Structures Featured in This Chapter

      • icmphdr Structure

      • icmp_control Structure

      • icmp_bxm Structure

    • Transmitting ICMP Messages

      • Transmitting ICMP Error Messages

      • Replying to Ingress ICMP Messages

      • Rate Limiting

      • Implementation of Rate Limiting

    • Receiving ICMP Messages

      • Processing ICMP_ECHO and ICMP_ECHOREPLY Messages

      • Processing the Common ICMP Messages

      • Processing ICMP_REDIRECT Messages

      • Processing ICMP_TIMESTAMP and ICMP_TIMESTAMPREPLY Messages

      • Processing ICMP_ADDRESS and ICMP_ADDRESSREPLY Messages

    • ICMP Statistics

    • Passing Error Notifications to the Transport Layer

    • Tuning via /proc Filesystem

    • Functions and Variables Featured in This Chapter

    • Files and Directories Featured in This Chapter

  • Part VI

  • Neighboring Subsystem: Concepts

    • What Is a Neighbor?

    • Reasons That Neighboring Protocols Are Needed

      • When L3 Addresses Need to Be Translated to L2 Addresses

      • Shared Medium

      • Why Static Assignment of Addresses Is Not Sufficient

      • Special Cases

      • Solicitation Requests and Replies

    • Linux Implementation

      • Neighboring Protocols

    • Proxying the Neighboring Protocol

      • Conditions Required by the Proxy

    • When Solicitation Requests Are Transmitted and Processed

    • Neighbor States and Network Unreachability Detection (NUD)

      • Reachability

      • Transitions Between NUD States

        • Basic states

        • Derived states

        • Initial state

      • Reachability Confirmation

  • Neighboring Subsystem: Infrastructure

    • Main Data Structures

    • Common Interface Between L3 Protocols and Neighboring Protocols

      • Initialization of neigh->ops

      • Initialization of neigh->output and neigh->nud_state

        • Common state changes: neigh_connect and neigh_suspect

        • Routines used for neigh->output

      • Updating a Neighbor’s Information: neigh_update

        • neigh_update optimization

        • Initial neigh_update operations

        • Changes of link layer address

        • Notifications to arpd

    • General Tasks of the Neighboring Infrastructure

      • Caching

      • Timers

    • Reference Counts on neighbour Structures

    • Creating a neighbour Entry

      • The neigh_create Function’s Parameters

      • Neighbor Initialization

    • Neighbor Deletion

      • Garbage Collection

        • Synchronous cleanup: the neigh_forced_gc function

        • Asynchronous cleanup: the neigh_periodic_timer function

    • Acting As a Proxy

      • Delayed Processing of Solicitation Requests

      • Per-Device Proxying and Per-Destination Proxying

    • L2 Header Caching

      • Methods Provided by the Device Driver

      • Link Between Routing and L2 Header Caching

      • Cache Invalidation and Updating

    • Protocol Initialization and Cleanup

    • Interaction with Other Subsystems

      • Events Generated by the Neighboring Layer

      • Events Received by the Neighboring Layer

        • Updates via neigh_ifdown

        • Updates via neigh_changeaddr (netdevice notification chain)

    • Interaction Between Neighboring Protocols and L3 Transmission Functions

    • Queuing

      • Ingress Queuing

      • Egress Queuing

  • Neighboring Subsystem: Address Resolution Protocol (ARP)

    • ARP Packet Format

      • Destination Address Types for ARP Packets

    • Example of an ARP Transaction

    • Gratuitous ARP

      • Change of L2 Address

      • Duplicate Address Detection

      • Virtual IP

    • Responding from Multiple Interfaces

    • Tunable ARP Options

      • Compile-Time Options

      • /proc Options

        • ARP_ANNOUNCE

        • ARP_IGNORE

        • ARP_FILTER

        • Medium ID

    • ARP Protocol Initialization

      • The arp_tbl Table

    • Initialization of a neighbour Structure

      • Basic Initialization Sequence

      • Virtual Functions in the ops Field

      • Start of the arp_constructor Function

      • Devices That Do Not Need ARP

      • Devices That Need ARP

    • Transmitting and Receiving ARP Packets

      • Transmitting ARP Packets: Introduction to arp_send

      • Solicitations

        • ARP_ANNOUNCE and selection of source IP address

    • Processing Ingress ARP Packets

      • Initial Common Processing

      • Processing ARPOP_REQUEST Packets

        • Passive learning and ARP optimization

        • Requests with zero addresses

      • Processing ARPOP_REPLY Packets

      • Final Common Processing

    • Proxy ARP

      • Destination NAT (DNAT)

      • Proxy ARP Server as Router

    • Examples

    • External Events

      • Received Events

      • Generated Events

      • Wake-on-LAN Events

    • ARPD

      • Kernel Side

      • User-Space Side

    • Reverse Address Resolution Protocol (RARP)

    • Improvements in ND (IPv6) over ARP (IPv4)

  • Neighboring Subsystem: Miscellaneous Topics

    • System Administration of Neighbors

      • Common Routines

      • New-Generation Tool: IPROUTE2’s ip Command

      • Old-Generation Tool: net-tools’s arp Command

    • Tuning via /proc Filesystem

      • The /proc/sys/net/ipv4/neigh Directory

        • Initialization of global and per-device directories

        • Directory creation

      • The /proc/sys/net/ipv4/conf Directory

    • Data Structures Featured in This Part of the Book

      • neighbour Structure

      • neigh_table Structure

      • neigh_parms Structure

      • neigh_ops Structure

      • hh_cache Structure

      • neigh_statistics Structure

    • Functions and Variables Featured in This Part of the Book

    • Files and Directories Featured in This Part of the Book

  • Part VII

  • Routing: Concepts

    • Routers, Routes, and Routing Tables

      • Nonrouting Multihomed Hosts

      • Varieties of Routing Configurations

      • Questions Answered in This Part of the Book

    • Essential Elements of Routing

      • Scope

        • Use of the scope

      • Default Gateway

      • Directed Broadcasts

      • Primary and Secondary Addresses

        • Old-generation configuration: aliasing interfaces

        • Relationship between aliasing devices and primary/secondary status

    • Routing Table

      • Special Routes

      • Route Types and Actions

      • Routing Cache

      • Routing Table Versus Routing Cache

      • Routing Cache Garbage Collection

        • Examples of events that can expire cache entries

        • Examples of eligible cache victims

    • Lookups

      • Longest Prefix Match

    • Packet Reception Versus Packet Transmission

  • Routing: Advanced

    • Concepts Behind Policy Routing

      • Lookup with Policy Routing

      • Routing Table Selection

    • Concepts Behind Multipath Routing

      • Next Hop Selection

      • Cache Support for Multipath

        • Weighted random algorithm

        • Device round-robin algorithm

      • Per-Flow, Per-Connection, and Per-Packet Distribution

        • Equalizer algorithm

    • Interactions with Other Kernel Subsystems

      • Routing Table Based Classifier

        • Configuring policy realms

        • Configuring route realms

        • Computing the routing tag

      • Policy Routing and Firewall-Based Classifier

    • Routing Protocol Daemons

    • Verbose Monitoring

    • ICMP_REDIRECT Messages

      • Shared Media

      • Transmitting ICMP_REDIRECT Messages

      • Processing Ingress ICMP_REDIRECT Messages

    • Reverse Path Filtering

  • Routing: Linux Implementation

    • Kernel Options

      • Basic Options

      • Advanced Options

      • Recently Dropped Options

    • Main Data Structures

      • Lists and Hash Tables

    • Route and Address Scopes

      • Route Scopes

      • Address Scopes

      • Relationship Between Route and Next-Hop Scopes

    • Primary and Secondary IP Addresses

    • Generic Helper Routines and Macros

    • Global Locks

    • Routing Subsystem Initialization

    • External Events

      • Helper Routines

      • Changes in IP Configuration

        • Adding an IP address

        • Removing an IP address

      • Changes in Device Status

        • Impacts on the routing tables

        • Impacts on the policy database

        • Impacts on the IP configuration

    • Interactions with Other Subsystems

      • Netlink Notifications

      • Policy Routing and Firewall-Based Classifier

      • Routing Protocol Daemons

  • Routing: The Routing Cache

    • Routing Cache Initialization

    • Hash Table Organization

    • Major Cache Operations

      • Cache Locking

      • Cache Entry Allocation and Reference Counts

      • Adding Elements to the Cache

      • Binding the Route Cache to the ARP Cache

      • Cache Lookup

        • Ingress lookup

        • Egress lookup

    • Multipath Caching

      • Registering a Caching Algorithm

      • Interface Between the Routing Cache and Multipath

      • Helper Routines

      • Common Elements Between Algorithms

      • Random Algorithm

      • Weighted Random Algorithm

      • Round-Robin Algorithm

      • Device Round-Robin Algorithm

    • Interface Between the DST and Calling Protocols

      • IPsec Transformations and the Use of dst_entry

      • External Events

    • Flushing the Routing Cache

    • Garbage Collection

      • Synchronous Cleanup

      • rt_garbage_collect Function

      • Asynchronous Cleanup

      • Expiration Criteria

      • Deleting DST Entries

      • Variables That Tune and Control Garbage Collection

    • Egress ICMP REDIRECT Rate Limiting

  • Routing: Routing Tables

    • Organization of Routing Hash Tables

      • Organization of Per-Netmask Tables

        • Basic structures for hash table organization

        • Dynamic resizing of per-netmask hash tables

      • Organization of fib_info Structures

        • Dynamic resizing of global hash tables

      • Organization of Next-Hop Router Structures

      • The Two Default Routing Tables: ip_fib_main_table and ip_fib_local_table

    • Routing Table Initialization

    • Adding and Removing Routes

      • Adding a Route

      • Deleting a Route

      • Garbage Collection

    • Policy Routing and Its Effects on Routing Table Definitions

      • Variable and Structure Definitions

      • Double Definitions for Functions

  • Routing: Lookups

    • High-Level View of Lookup Functions

    • Helper Routines

    • The Table Lookup: fn_hash_lookup

      • Semantic Matching on Subsidiary Criteria

        • Criteria for rejecting routes

        • Return value from fib_semantic_match

    • fib_lookup Function

    • Setting Functions for Reception and Transmission

      • Initialization of Function Pointers for Ingress Traffic

      • Initialization of Function Pointers for Egress Traffic

      • Special Cases

    • General Structure of the Input and Output Routing Routines

    • Input Routing

      • Creation of a Cache Entry

      • Preferred Source Address Selection

      • Local Delivery

      • Forwarding

      • Routing Failure

    • Output Routing

      • Search Key Initialization

      • Selecting the Source IP Address

      • Local Delivery

      • Transmission to Other Hosts

      • Interaction Between Multipath and Default Gateway Selection

      • Default Gateway Selection

      • fn_hash_select_default Function

    • Effects of Multipath on Next Hop Selection

      • Multipath Caching

    • Policy Routing

      • fib_lookup with Policy Routing

      • Default Gateway Selection with Policy Routing

    • Source Routing

    • Policy Routing and Routing Table Based Classifier

      • Storing the Realms

      • Helper Routines

      • Computing the Routing Tag

  • Routing: Miscellaneous Topics

    • User-Space Configuration Tools

      • Configuring Routing with IPROUTE2

        • Correspondence between IPROUTE2 user commands and kernel functions

        • inet_rtm_newroute and inet_rtm_delroute functions

      • Configuring Routing with net-tools

      • Change Notifications

      • Routes Inserted by the Kernel: The fib_magic Function

    • Statistics

    • Tuning via /proc Filesystem

      • The /proc/sys/net/ipv4 Directory

      • The /proc/sys/net/ipv4/route Directory

      • The /proc/sys/net/ipv4/conf Directory

        • Special subdirectories

        • Use of the special subdirectories

        • File descriptions

      • The /proc/net and /proc/net/stat Directories

    • Enabling and Disabling Forwarding

    • Data Structures Featured in This Part of the Book

      • fib_table Structure

      • fn_zone Structure

      • fib_node Structure

      • fib_alias Structure

      • fib_info Structure

      • fib_nh Structure

      • fib_rule Structure

      • fib_result Structure

      • rtable Structure

      • dst_entry Structure

      • dst_ops Structure

      • flowi Structure

      • rt_cache_stat Structure

      • ip_mp_alg_ops Structure

    • Functions and Variables Featured in This Part of the Book

    • Files and Directories Featured in This Part of the Book

  • Index

Content

www.it-ebooks.info

Understanding LINUX NETWORK INTERNALS

Other Linux resources from O’Reilly

Related titles: Linux in a Nutshell; Linux Network Administrator’s Guide; Running Linux; Linux Device Drivers; Understanding the Linux Kernel; Building Secure Servers with Linux; LPI Linux Certification in a Nutshell; Learning Red Hat Linux; Linux Server Hacks™; Linux Security Cookbook; Managing RAID on Linux; Linux Web Server CD Bookshelf; Building Embedded Linux Systems

Linux Books Resource Center: linux.oreilly.com is a complete catalog of O’Reilly’s books on Linux and Unix and related technologies, including sample chapters and code examples. ONLamp.com is the premier site for the open source web platform: Linux, Apache, MySQL, and either Perl, Python, or PHP.

Conferences: O’Reilly brings diverse innovators together to nurture the ideas that spark revolutionary industries. We specialize in documenting the latest tools and systems, translating the innovator’s knowledge into useful skills for those in the trenches. Visit conferences.oreilly.com for our upcoming events.

Safari Bookshelf (safari.oreilly.com) is the premier online reference library for programmers and IT professionals. Conduct searches across more than 1,000 books. Subscribers can zero in on answers to time-critical questions in a matter of seconds. Read the books on your Bookshelf from cover to cover or simply flip to the page you need. Try it today with a free trial.

Understanding LINUX NETWORK INTERNALS
Christian Benvenuti
Beijing • Cambridge • Farnham • Köln • Paris • Sebastopol • Taipei • Tokyo

Understanding Linux Network Internals by Christian Benvenuti

Copyright © 2006 O’Reilly Media, Inc. All rights reserved. Printed in the United States of America. Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (safari.oreilly.com). For more information, contact our corporate/institutional sales department: (800) 998-9938 or corporate@oreilly.com.

Editor: Andy Oram
Production Editor: Philip Dangler
Cover Designer: Karen Montgomery
Interior Designer: David Futato

Printing History: December 2005: First Edition.

Nutshell Handbook, the Nutshell Handbook logo, and the O’Reilly logo are registered trademarks of O’Reilly Media, Inc. The Linux series designations, Understanding Linux Network Internals, images of the American West, and related trade dress are trademarks of O’Reilly Media, Inc. Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and O’Reilly Media, Inc. was aware of a trademark claim, the designations have been printed in caps or initial caps. While every precaution has been taken in the preparation of this book, the publisher and author assume no responsibility for errors or omissions, or for damages resulting from the use of the information contained herein.

[M] ISBN: 978-0-596-00255-8 [5/08]

Table of Contents

  • Preface
  • Part I. General Background
    • 1. Introduction
      • Basic Terminology
      • Common Coding Patterns
      • User-Space Tools
      • Browsing the Source Code
      • When a Feature Is Offered as a Patch
    • 2. Critical Data Structures
      • The Socket Buffer: sk_buff Structure
      • net_device Structure
      • Files Mentioned in This Chapter
    • 3. User-Space-to-Kernel Interface
      • Overview
      • procfs Versus sysctl
      • ioctl
      • Netlink
      • Serializing Configuration Changes
  • Part II. System Initialization
    • 4. Notification Chains
      • Reasons for Notification Chains
      • Overview
      • Defining a Chain
      • Registering with a Chain
      • Notifying Events on a Chain
      • Notification Chains for the Networking Subsystems
      • Tuning via /proc Filesystem
      • Functions and Variables Featured in This Chapter
      • Files and Directories Featured in This Chapter
    • 5. Network Device Initialization
      • System Initialization Overview
      • Device Registration and Initialization
      • Basic Goals of NIC Initialization
      • Interaction Between Devices and Kernel
      • Initialization Options
      • Module Options
      • Initializing the Device Handling Layer: net_dev_init
      • User-Space Helpers
      • Virtual Devices
      • Tuning via /proc Filesystem
      • Functions and Variables Featured in This Chapter
      • Files and Directories Featured in This Chapter
    • 6. The PCI Layer and Network Interface Cards
      • Data Structures Featured in This Chapter
      • Registering a PCI NIC Device Driver
      • Power Management and Wake-on-LAN
      • Example of PCI NIC Driver Registration
      • The Big Picture
      • Tuning via /proc Filesystem
      • Functions and Variables Featured in This Chapter
      • Files and Directories Featured in This Chapter
    • 7. Kernel Infrastructure for Component Initialization
      • Boot-Time Kernel Options
      • Module Initialization Code
      • Optimized Macro-Based Tagging
      • Boot-Time Initialization Routines
      • Memory Optimizations
      • Tuning via /proc Filesystem
      • Functions and Variables Featured in This Chapter
      • Files and Directories Featured in This Chapter
    • 8. Device Registration and Initialization
      • When a Device Is Registered
      • When a Device Is Unregistered
      • Allocating net_device Structures
      • Skeleton of NIC Registration and Unregistration
      • Device Initialization
      • Organization of net_device Structures
      • Device State
      • Registering and Unregistering Devices
      • Device Registration
      • Device Unregistration
      • Enabling and Disabling a Network Device
      • Updating the Device Queuing Discipline State
      • Configuring Device-Related Information from User Space
      • Virtual Devices
      • Locking
      • Tuning via /proc Filesystem
      • Functions and Variables Featured in This Chapter
      • Files and Directories Featured in This Chapter
  • Part III. Transmission and Reception
    • 9. Interrupts and Network Drivers
      • Decisions and Traffic Direction
      • Notifying Drivers When Frames Are Received
      • Interrupt Handlers
      • softnet_data Structure
    • 10. Frame Reception
      • Interactions with Other Features
      • Enabling and Disabling a Device
      • Queues
      • Notifying the Kernel of Frame Reception: NAPI and netif_rx
      • Old Interface Between Device Drivers and Kernel: First Part of netif_rx
      • Congestion Management
      • Processing the NET_RX_SOFTIRQ: net_rx_action
    • 11. Frame Transmission
      • Enabling and Disabling Transmissions
    • 12. General and Reference Material About Interrupts
      • Statistics
      • Tuning via /proc and sysfs Filesystems
      • Functions and Variables Featured in This Part of the Book
      • Files and Directories Featured in This Part of the Book
    • 13. Protocol Handlers
      • Overview of Network Stack
      • Executing the Right Protocol Handler
      • Protocol Handler Organization
      • Protocol Handler Registration
      • Ethernet Versus IEEE 802.3 Frames
      • Tuning via /proc Filesystem
      • Functions and Variables Featured in This Chapter
      • Files and Directories Featured in This Chapter
  • Part IV. Bridging
    • 14. Bridging: Concepts
      • Repeaters, Bridges, and Routers
      • Bridges Versus Switches
      • Hosts
      • Merging LANs with Bridges
      • Bridging Different LAN Technologies
      • Address Learning
      • Multiple Bridges

[...]

... first introduced to the beautiful world of networking, I started playing with the tools available on Linux. I also had the fortune to work for a UNESCO center in Italy where I helped develop their networking courses, based entirely on Linux boxes. That gave me access to a good lab equipped with all sorts of network devices and documentation, plus plenty of Linux enthusiasts to learn from and to collaborate...

... most of the networking features I cover. At the Netfilter home page, http://www.netfilter.org, you can find some interesting documentation about its kernel internals. Network filesystems: Several network filesystems are implemented in the kernel, among them NFS (versions 2, 3, and 4), SMB, Coda, and Andrew. You can read a detailed description of the Virtual File System layer in Understanding the Linux Kernel,...

... lack of space, I had to select a subset of the Linux networking features to cover. No selection would make everyone happy, but I think I covered the core of the networking code, and with the knowledge you can gain with this book, you will find it easier to study on your own any other networking feature of the kernel. In this book, I decided to focus on the networking code, from the interface between device...

... documentation about the networking code of the Linux kernel and the availability of good books for other parts of the kernel, I decided to try filling in the gap, or at least part of it. I hope this book will give you the starting documentation that I would have loved to have had years ago. I believe that this book, together with O’Reilly’s other two kernel books (Understanding the Linux Kernel and Linux Device...

... and kernel preemption. This makes the networking code of the Linux kernel a very good gym in which to train and keep your networking knowledge in shape. Moreover, if you are like me and want to learn everything, you will find enough details in this book to keep you satisfied for quite a while. Background Information: Some knowledge of operating systems would help. The networking code, like any other component...

... more than ever before, networking is a hot topic. Any electronic gadget in its latest generation embeds some kind of networking capability. The Internet continues to broaden in its population and opportunities. It should not come as a surprise that a robust, freely available, and feature-rich operating system like Linux is well accepted by many producers of embedded devices. Its networking capabilities...

... implemented in the Linux kernel. Besides the two well-known ones, UDP and TCP, Linux has the newer Stream Control Transmission Protocol (SCTP). A good description of the implementation of those protocols would require a new book of this size, all on its own. Traffic Control: This is the Quality of Service (QoS) layer of Linux, another interesting and powerful component of the kernel’s networking code. Traffic...

... between dedicated hardware and general-purpose CPUs. However, Linux can definitely compete with low-end commercial products that are entirely software-based. Of course, simple extensions to the Linux kernel allow vendors to use Linux on hybrid systems as well (software and hardware); it is only a matter of writing the necessary device drivers. Linux is also often used as the operating system of choice for...

... The code samples are covered by a dual BSD/GPL license. We appreciate, but do not require, attribution. An attribution usually includes the title, author, publisher, and ISBN. For example: “Understanding Linux Network Internals, by Christian Benvenuti. Copyright 2006 O’Reilly Media, Inc., 0-596-00255-6.”

This is the Title of the Book, eMatter Edition. Copyright © 2008 O’Reilly...

... see your contribution to the Linux kernel being used by potentially millions of users? There is only one drawback: if your contribution is really appreciated, you may not be able to cope with the numerous emails of thanks or requests for help. The momentum for Linux has been growing continually over the past years, and apparently it can only keep growing. I first encountered Linux at the University of...

Date posted: 22/02/2014, 09:20
