diff options
| author | Jacob McDonnell <jacob@jacobmcdonnell.com> | 2026-04-25 19:59:05 -0400 |
|---|---|---|
| committer | Jacob McDonnell <jacob@jacobmcdonnell.com> | 2026-04-25 19:59:05 -0400 |
| commit | 1f19f33e45791ea59aed048796fc68672c6723a5 (patch) | |
| tree | 54625fba89e91d1c2177801ec635e8528bba937f /static/freebsd/man4/netmap.4 3.html | |
| parent | ac5e55f5f2af5b92794c2aded46c6bae85b5f5ed (diff) | |
docs: Removed Precompiled HTML
Diffstat (limited to 'static/freebsd/man4/netmap.4 3.html')
| -rw-r--r-- | static/freebsd/man4/netmap.4 3.html | 1015 |
1 file changed, 0 insertions, 1015 deletions
diff --git a/static/freebsd/man4/netmap.4 3.html b/static/freebsd/man4/netmap.4 3.html deleted file mode 100644 index e0cbff0e..00000000 --- a/static/freebsd/man4/netmap.4 3.html +++ /dev/null @@ -1,1015 +0,0 @@ -<table class="head"> - <tr> - <td class="head-ltitle">NETMAP(4)</td> - <td class="head-vol">Device Drivers Manual</td> - <td class="head-rtitle">NETMAP(4)</td> - </tr> -</table> -<div class="manual-text"> -<section class="Sh"> -<h1 class="Sh" id="NAME"><a class="permalink" href="#NAME">NAME</a></h1> -<p class="Pp"><code class="Nm">netmap</code> — <span class="Nd">a - framework for fast packet I/O</span></p> -</section> -<section class="Sh"> -<h1 class="Sh" id="SYNOPSIS"><a class="permalink" href="#SYNOPSIS">SYNOPSIS</a></h1> -<p class="Pp"><code class="Cd">device netmap</code></p> -</section> -<section class="Sh"> -<h1 class="Sh" id="DESCRIPTION"><a class="permalink" href="#DESCRIPTION">DESCRIPTION</a></h1> -<p class="Pp"><code class="Nm">netmap</code> is a framework for extremely fast - and efficient packet I/O for userspace and kernel clients, and for Virtual - Machines. 
It runs on <span class="Ux">FreeBSD</span>, Linux and some - versions of Windows, and supports a variety of <code class="Nm">netmap - ports</code>, including</p> -<dl class="Bl-tag"> - <dt><code class="Nm">physical NIC ports</code></dt> - <dd>to access individual queues of network interfaces;</dd> - <dt><code class="Nm">host ports</code></dt> - <dd>to inject packets into the host stack;</dd> - <dt><code class="Nm">VALE ports</code></dt> - <dd>implementing a very fast and modular in-kernel software - switch/dataplane;</dd> - <dt><code class="Nm">netmap pipes</code></dt> - <dd>a shared memory packet transport channel;</dd> - <dt><code class="Nm">netmap monitors</code></dt> - <dd>a mechanism similar to <a class="Xr">bpf(4)</a> to capture traffic</dd> -</dl> -<p class="Pp">All these <code class="Nm">netmap ports</code> are accessed - interchangeably with the same API, and are at least one order of magnitude - faster than standard OS mechanisms (sockets, bpf, tun/tap interfaces, native - switches, pipes). With suitably fast hardware (NICs, PCIe buses, CPUs), - packet I/O using <code class="Nm">netmap</code> on supported NICs reaches - 14.88 million packets per second (Mpps) with much less than one core on 10 - Gbit/s NICs; 35-40 Mpps on 40 Gbit/s NICs (limited by the hardware); about - 20 Mpps per core for VALE ports; and over 100 Mpps for - <code class="Nm">netmap pipes</code>. NICs without native - <code class="Nm">netmap</code> support can still use the API in emulated - mode, which uses unmodified device drivers and is 3-5 times faster than - <a class="Xr">bpf(4)</a> or raw sockets.</p> -<p class="Pp">Userspace clients can dynamically switch NICs into - <code class="Nm">netmap</code> mode and send and receive raw packets through - memory mapped buffers. 
Similarly, <code class="Nm">VALE</code> switch - instances and ports, <code class="Nm">netmap pipes</code> and - <code class="Nm">netmap monitors</code> can be created dynamically, - providing high speed packet I/O between processes, virtual machines, NICs - and the host stack.</p> -<p class="Pp"><code class="Nm">netmap</code> supports both non-blocking I/O - through <a class="Xr">ioctl(2)</a>, synchronization and blocking I/O through - a file descriptor and standard OS mechanisms such as - <a class="Xr">select(2)</a>, <a class="Xr">poll(2)</a>, - <a class="Xr">kqueue(2)</a> and <a class="Xr">epoll(7)</a>. All types of - <code class="Nm">netmap ports</code> and the <code class="Nm">VALE - switch</code> are implemented by a single kernel module, which also emulates - the <code class="Nm">netmap</code> API over standard drivers. For best - performance, <code class="Nm">netmap</code> requires native support in - device drivers. A list of such devices is at the end of this document.</p> -<p class="Pp">In the rest of this (long) manual page we document various aspects - of the <code class="Nm">netmap</code> and <code class="Nm">VALE</code> - architecture, features and usage.</p> -</section> -<section class="Sh"> -<h1 class="Sh" id="ARCHITECTURE"><a class="permalink" href="#ARCHITECTURE">ARCHITECTURE</a></h1> -<p class="Pp"><code class="Nm">netmap</code> supports raw packet I/O through a - <a class="permalink" href="#port"><i class="Em" id="port">port</i></a>, - which can be connected to a physical interface - (<a class="permalink" href="#NIC"><i class="Em" id="NIC">NIC</i></a>), to - the host stack, or to a <code class="Nm">VALE</code> switch. Ports use - preallocated circular queues of buffers - (<a class="permalink" href="#rings"><i class="Em" id="rings">rings</i></a>) - residing in an mmapped region. There is one ring for each transmit/receive - queue of a NIC or virtual port. 
An additional ring pair connects to the host - stack.</p> -<p class="Pp">After binding a file descriptor to a port, a - <code class="Nm">netmap</code> client can send or receive packets in batches - through the rings, and possibly implement zero-copy forwarding between - ports.</p> -<p class="Pp">All NICs operating in <code class="Nm">netmap</code> mode use the - same memory region, accessible to all processes who own - <span class="Pa">/dev/netmap</span> file descriptors bound to NICs. - Independent <code class="Nm">VALE</code> and <code class="Nm">netmap - pipe</code> ports by default use separate memory regions, but can be - independently configured to share memory.</p> -</section> -<section class="Sh"> -<h1 class="Sh" id="ENTERING_AND_EXITING_NETMAP_MODE"><a class="permalink" href="#ENTERING_AND_EXITING_NETMAP_MODE">ENTERING - AND EXITING NETMAP MODE</a></h1> -<p class="Pp">The following section describes the system calls to create and - control <code class="Nm">netmap</code> ports (including - <code class="Nm">VALE</code> and <code class="Nm">netmap pipe</code> ports). - Simpler, higher level functions are described in the - <a class="Sx" href="#LIBRARIES">LIBRARIES</a> section.</p> -<p class="Pp">Ports and rings are created and controlled through a file - descriptor, created by opening a special device</p> -<div class="Bd Bd-indent"><code class="Li">fd = - open("/dev/netmap");</code></div> -and then bound to a specific port with an -<div class="Bd Bd-indent"><code class="Li">ioctl(fd, NIOCREGIF, (struct nmreq - *)arg);</code></div> -<p class="Pp"><code class="Nm">netmap</code> has multiple modes of operation - controlled by the <var class="Vt">struct nmreq</var> argument. 
- <var class="Va">arg.nr_name</var> specifies the netmap port name, as - follows:</p> -<dl class="Bl-tag"> - <dt id="OS"><a class="permalink" href="#OS"><code class="Dv">OS network - interface name (e.g., 'em0', 'eth1', ...</code></a>)</dt> - <dd>the data path of the NIC is disconnected from the host stack, and the file - descriptor is bound to the NIC (one or all queues), or to the host - stack;</dd> - <dt id="valeSSS:PPP"><a class="permalink" href="#valeSSS:PPP"><code class="Dv">valeSSS:PPP</code></a></dt> - <dd>the file descriptor is bound to port PPP of VALE switch SSS. Switch - instances and ports are dynamically created if necessary. - <p class="Pp">Both SSS and PPP have the form [0-9a-zA-Z_]+, the string - cannot exceed IFNAMSIZ characters, and PPP cannot be the name of any - existing OS network interface.</p> - </dd> -</dl> -<p class="Pp">On return, <var class="Va">arg</var> indicates the size of the - shared memory region, and the number, size and location of all the - <code class="Nm">netmap</code> data structures, which can be accessed by - mmapping the memory</p> -<div class="Bd Bd-indent"><code class="Li">char *mem = mmap(0, arg.nr_memsize, - fd);</code></div> -<p class="Pp">Non-blocking I/O is done with special <a class="Xr">ioctl(2)</a>; - <a class="Xr">select(2)</a> and <a class="Xr">poll(2)</a> on the file - descriptor permit blocking I/O.</p> -<p class="Pp">While a NIC is in <code class="Nm">netmap</code> mode, the OS will - still believe the interface is up and running. OS-generated packets for that - NIC end up in a <code class="Nm">netmap</code> ring, and another ring is - used to send packets into the OS network stack. 
A <a class="Xr">close(2)</a> - on the file descriptor removes the binding, and returns the NIC to normal - mode (reconnecting the data path to the host stack), or destroys the virtual - port.</p> -</section> -<section class="Sh"> -<h1 class="Sh" id="DATA_STRUCTURES"><a class="permalink" href="#DATA_STRUCTURES">DATA - STRUCTURES</a></h1> -<p class="Pp">The data structures in the mmapped memory region are detailed in - <code class="In"><<a class="In">sys/net/netmap.h</a>></code>, which is - the ultimate reference for the <code class="Nm">netmap</code> API. The main - structures and fields are indicated below:</p> -<dl class="Bl-tag"> - <dt id="struct"><a class="permalink" href="#struct"><code class="Dv">struct - netmap_if (one per interface</code></a>)</dt> - <dd> - <div class="Bd Pp Li"> - <pre>struct netmap_if { - ... - const uint32_t ni_flags; /* properties */ - ... - const uint32_t ni_tx_rings; /* NIC tx rings */ - const uint32_t ni_rx_rings; /* NIC rx rings */ - uint32_t ni_bufs_head; /* head of extra bufs list */ - ... -};</pre> - </div> - <p class="Pp">Indicates the number of available rings - (<span class="Pa">struct netmap_rings</span>) and their position in the - mmapped region. The number of tx and rx rings - (<span class="Pa">ni_tx_rings</span>, - <span class="Pa">ni_rx_rings</span>) normally depends on the hardware. - NICs also have an extra tx/rx ring pair connected to the host stack. - <i class="Em">NIOCREGIF</i> can also request additional unbound buffers - in the same memory space, to be used as temporary storage for packets. - The number of extra buffers is specified in the - <var class="Va">arg.nr_arg3</var> field. On success, the kernel writes - back to <var class="Va">arg.nr_arg3</var> the number of extra buffers - actually allocated (they may be less than the amount requested if the - memory space ran out of buffers). 
<span class="Pa">ni_bufs_head</span> - contains the index of the first of these extra buffers, which are - connected in a list (the first uint32_t of each buffer being the index - of the next buffer in the list). A <code class="Dv">0</code> indicates - the end of the list. The application is free to modify this list and use - the buffers (i.e., binding them to the slots of a netmap ring). When - closing the netmap file descriptor, the kernel frees the buffers - contained in the list pointed by <span class="Pa">ni_bufs_head</span> , - irrespectively of the buffers originally provided by the kernel on - <i class="Em">NIOCREGIF</i>.</p> - </dd> - <dt id="struct~2"><a class="permalink" href="#struct~2"><code class="Dv">struct - netmap_ring (one per ring</code></a>)</dt> - <dd> - <div class="Bd Pp Li"> - <pre>struct netmap_ring { - ... - const uint32_t num_slots; /* slots in each ring */ - const uint32_t nr_buf_size; /* size of each buffer */ - ... - uint32_t head; /* (u) first buf owned by user */ - uint32_t cur; /* (u) wakeup position */ - const uint32_t tail; /* (k) first buf owned by kernel */ - ... - uint32_t flags; - struct timeval ts; /* (k) time of last rxsync() */ - ... - struct netmap_slot slot[0]; /* array of slots */ -}</pre> - </div> - <p class="Pp" id="slots">Implements transmit and receive rings, with - read/write pointers, metadata and an array of - <a class="permalink" href="#slots"><i class="Em">slots</i></a> - describing the buffers.</p> - </dd> - <dt id="struct~3"><a class="permalink" href="#struct~3"><code class="Dv">struct - netmap_slot (one per buffer</code></a>)</dt> - <dd> - <div class="Bd Pp Li"> - <pre>struct netmap_slot { - uint32_t buf_idx; /* buffer index */ - uint16_t len; /* packet length */ - uint16_t flags; /* buf changed, etc. 
*/ - uint64_t ptr; /* address for indirect buffers */ -};</pre> - </div> - <p class="Pp">Describes a packet buffer, which normally is identified by an - index and resides in the mmapped region.</p> - </dd> - <dt id="packet"><a class="permalink" href="#packet"><code class="Dv">packet - buffers</code></a></dt> - <dd>Fixed size (normally 2 KB) packet buffers allocated by the kernel.</dd> -</dl> -<p class="Pp">The offset of the <span class="Pa">struct netmap_if</span> in the - mmapped region is indicated by the <span class="Pa">nr_offset</span> field - in the structure returned by <code class="Dv">NIOCREGIF</code>. From there, - all other objects are reachable through relative references (offsets or - indexes). Macros and functions in - <code class="In"><<a class="In">net/netmap_user.h</a>></code> help - converting them into actual pointers:</p> -<p class="Pp"></p> -<div class="Bd Bd-indent"><code class="Li">struct netmap_if *nifp = - NETMAP_IF(mem, arg.nr_offset);</code></div> -<div class="Bd Bd-indent"><code class="Li">struct netmap_ring *txr = - NETMAP_TXRING(nifp, ring_index);</code></div> -<div class="Bd Bd-indent"><code class="Li">struct netmap_ring *rxr = - NETMAP_RXRING(nifp, ring_index);</code></div> -<p class="Pp"></p> -<div class="Bd Bd-indent"><code class="Li">char *buf = NETMAP_BUF(ring, - buffer_index);</code></div> -</section> -<section class="Sh"> -<h1 class="Sh" id="RINGS,_BUFFERS_AND_DATA_I/O"><a class="permalink" href="#RINGS,_BUFFERS_AND_DATA_I/O">RINGS, - BUFFERS AND DATA I/O</a></h1> -<p class="Pp"><var class="Va">Rings</var> are circular queues of packets with - three indexes/pointers (<var class="Va">head</var>, - <var class="Va">cur</var>, <var class="Va">tail</var>); one slot is always - kept empty. 
The ring size (<var class="Va">num_slots</var>) should not be - assumed to be a power of two.</p> -<p class="Pp"><var class="Va">head</var> is the first slot available to - userspace;</p> -<p class="Pp"><var class="Va">cur</var> is the wakeup point: select/poll will - unblock when <var class="Va">tail</var> passes - <var class="Va">cur</var>;</p> -<p class="Pp"><var class="Va">tail</var> is the first slot reserved to the - kernel.</p> -<p class="Pp">Slot indexes <i class="Em">must</i> only move forward; for - convenience, the function</p> -<div class="Bd Bd-indent"><code class="Li">nm_ring_next(ring, - index)</code></div> -returns the next index modulo the ring size. -<p class="Pp"><var class="Va">head</var> and <var class="Va">cur</var> are only - modified by the user program; <var class="Va">tail</var> is only modified by - the kernel. The kernel only reads/writes the <var class="Vt">struct - netmap_ring</var> slots and buffers during the execution of a netmap-related - system call. The only exception are slots (and buffers) in the range - <var class="Va">tail </var>... <var class="Va">head-1</var>, that are - explicitly assigned to the kernel.</p> -<section class="Ss"> -<h2 class="Ss" id="TRANSMIT_RINGS"><a class="permalink" href="#TRANSMIT_RINGS">TRANSMIT - RINGS</a></h2> -<p class="Pp">On transmit rings, after a <code class="Nm">netmap</code> system - call, slots in the range <var class="Va">head </var>... - <var class="Va">tail-1</var> are available for transmission. User code - should fill the slots sequentially and advance <var class="Va">head</var> - and <var class="Va">cur</var> past slots ready to transmit. 
- <var class="Va">cur</var> may be moved further ahead if the user code needs - more slots before further transmissions (see - <a class="Sx" href="#SCATTER_GATHER_I/O">SCATTER GATHER I/O</a>).</p> -<p class="Pp">At the next NIOCTXSYNC/select()/poll(), slots up to - <var class="Va">head-1</var> are pushed to the port, and - <var class="Va">tail</var> may advance if further slots have become - available. Below is an example of the evolution of a TX ring:</p> -<div class="Bd Pp Li"> -<pre> after the syscall, slots between cur and tail are (a)vailable - head=cur tail - | | - v v - TX [.....aaaaaaaaaaa.............] - - user creates new packets to (T)ransmit - head=cur tail - | | - v v - TX [.....TTTTTaaaaaa.............] - - NIOCTXSYNC/poll()/select() sends packets and reports new slots - head=cur tail - | | - v v - TX [..........aaaaaaaaaaa........]</pre> -</div> -<p class="Pp" id="select"><a class="permalink" href="#select"><code class="Fn">select</code></a>() - and - <a class="permalink" href="#poll"><code class="Fn" id="poll">poll</code></a>() - will block if there is no space in the ring, i.e.,</p> -<div class="Bd Bd-indent"><code class="Li">ring->cur == - ring->tail</code></div> -and return when new slots have become available. -<p class="Pp">High speed applications may want to amortize the cost of system - calls by preparing as many packets as possible before issuing them.</p> -<p class="Pp">A transmit ring with pending transmissions has</p> -<div class="Bd Bd-indent"><code class="Li">ring->head != ring->tail + 1 - (modulo the ring size).</code></div> -The function <var class="Va">int nm_tx_pending(ring)</var> implements this test. -</section> -<section class="Ss"> -<h2 class="Ss" id="RECEIVE_RINGS"><a class="permalink" href="#RECEIVE_RINGS">RECEIVE - RINGS</a></h2> -<p class="Pp">On receive rings, after a <code class="Nm">netmap</code> system - call, the slots in the range <var class="Va">head</var>... - <var class="Va">tail-1</var> contain received packets. 
User code should - process them and advance <var class="Va">head</var> and - <var class="Va">cur</var> past slots it wants to return to the kernel. - <var class="Va">cur</var> may be moved further ahead if the user code wants - to wait for more packets without returning all the previous slots to the - kernel.</p> -<p class="Pp">At the next NIOCRXSYNC/select()/poll(), slots up to - <var class="Va">head-1</var> are returned to the kernel for further - receives, and <var class="Va">tail</var> may advance to report new incoming - packets.</p> -<p class="Pp">Below is an example of the evolution of an RX ring:</p> -<div class="Bd Pp Li"> -<pre> after the syscall, there are some (h)eld and some (R)eceived slots - head cur tail - | | | - v v v - RX [..hhhhhhRRRRRRRR..........] - - user advances head and cur, releasing some slots and holding others - head cur tail - | | | - v v v - RX [..*****hhhRRRRRR...........] - - NIOCRXSYNC/poll()/select() recovers slots and reports new packets - head cur tail - | | | - v v v - RX [.......hhhRRRRRRRRRRRR....]</pre> -</div> -</section> -</section> -<section class="Sh"> -<h1 class="Sh" id="SLOTS_AND_PACKET_BUFFERS"><a class="permalink" href="#SLOTS_AND_PACKET_BUFFERS">SLOTS - AND PACKET BUFFERS</a></h1> -<p class="Pp">Normally, packets should be stored in the netmap-allocated buffers - assigned to slots when ports are bound to a file descriptor. One packet is - fully contained in a single buffer.</p> -<p class="Pp">The following flags affect slot and buffer processing:</p> -<dl class="Bl-tag"> - <dt id="must">NS_BUF_CHANGED</dt> - <dd><a class="permalink" href="#must"><i class="Em">must</i></a> be used when - the <var class="Va">buf_idx</var> in the slot is changed. This can be used - to implement zero-copy forwarding, see - <a class="Sx" href="#ZERO_COPY_FORWARDING">ZERO-COPY FORWARDING</a>.</dd> - <dt>NS_REPORT</dt> - <dd>reports when this buffer has been transmitted. 
Normally, - <code class="Nm">netmap</code> notifies transmit completions in batches, - hence signals can be delayed indefinitely. This flag helps detect when - packets have been sent and a file descriptor can be closed.</dd> - <dt>NS_FORWARD</dt> - <dd>When a ring is in 'transparent' mode, packets marked with this flag by the - user application are forwarded to the other endpoint at the next system - call, thus restoring (in a selective way) the connection between a NIC and - the host stack.</dd> - <dt>NS_NO_LEARN</dt> - <dd>tells the forwarding code that the source MAC address for this packet must - not be used in the learning bridge code.</dd> - <dt>NS_INDIRECT</dt> - <dd>indicates that the packet's payload is in a user-supplied buffer whose - user virtual address is in the 'ptr' field of the slot. The size can reach - 65535 bytes. - <p class="Pp">This is only supported on the transmit ring of - <code class="Nm">VALE</code> ports, and it helps reduce data copies in - the interconnection of virtual machines.</p> - </dd> - <dt>NS_MOREFRAG</dt> - <dd>indicates that the packet continues with subsequent buffers; the last - buffer in a packet must have the flag clear.</dd> -</dl> -</section> -<section class="Sh"> -<h1 class="Sh" id="SCATTER_GATHER_I/O"><a class="permalink" href="#SCATTER_GATHER_I/O">SCATTER - GATHER I/O</a></h1> -<p class="Pp">Packets can span multiple slots if the - <var class="Va">NS_MOREFRAG</var> flag is set in all but the last slot. The - maximum length of a chain is 64 buffers. 
This is normally used with - <code class="Nm">VALE</code> ports when connecting virtual machines, as they - generate large TSO segments that are not split unless they reach a physical - device.</p> -<p class="Pp">NOTE: The length field always refers to the individual fragment; - no field carries the total length of a packet.</p> -<p class="Pp">On receive rings, the macro <var class="Va">NS_RFRAGS(slot)</var> - indicates the remaining number of slots for this packet, including the - current one. Slots with a value greater than 1 also have NS_MOREFRAG - set.</p> -</section> -<section class="Sh"> -<h1 class="Sh" id="IOCTLS"><a class="permalink" href="#IOCTLS">IOCTLS</a></h1> -<p class="Pp"><code class="Nm">netmap</code> uses two ioctls (NIOCTXSYNC, - NIOCRXSYNC) for non-blocking I/O. They take no argument. Two more ioctls - (NIOCGINFO, NIOCREGIF) are used to query and configure ports, with the - following argument:</p> -<div class="Bd Pp Li"> -<pre>struct nmreq { - char nr_name[IFNAMSIZ]; /* (i) port name */ - uint32_t nr_version; /* (i) API version */ - uint32_t nr_offset; /* (o) nifp offset in mmap region */ - uint32_t nr_memsize; /* (o) size of the mmap region */ - uint32_t nr_tx_slots; /* (i/o) slots in tx rings */ - uint32_t nr_rx_slots; /* (i/o) slots in rx rings */ - uint16_t nr_tx_rings; /* (i/o) number of tx rings */ - uint16_t nr_rx_rings; /* (i/o) number of rx rings */ - uint16_t nr_ringid; /* (i/o) ring(s) we care about */ - uint16_t nr_cmd; /* (i) special command */ - uint16_t nr_arg1; /* (i/o) extra arguments */ - uint16_t nr_arg2; /* (i/o) extra arguments */ - uint32_t nr_arg3; /* (i/o) extra arguments */ - uint32_t nr_flags; /* (i/o) open mode */ - ... 
-};</pre> -</div> -<p class="Pp">A file descriptor obtained through - <span class="Pa">/dev/netmap</span> also supports the ioctl supported by - network devices, see <a class="Xr">netintro(4)</a>.</p> -<dl class="Bl-tag"> - <dt id="NIOCGINFO"><a class="permalink" href="#NIOCGINFO"><code class="Dv">NIOCGINFO</code></a></dt> - <dd>returns EINVAL if the named port does not support netmap. Otherwise, it - returns 0 and (advisory) information about the port. Note that all the - information below can change before the interface is actually put in - netmap mode. - <dl class="Bl-tag"> - <dt><span class="Pa">nr_memsize</span></dt> - <dd>indicates the size of the <code class="Nm">netmap</code> memory - region. NICs in <code class="Nm">netmap</code> mode all share the same - memory region, whereas <code class="Nm">VALE</code> ports have - independent regions for each port.</dd> - <dt><span class="Pa">nr_tx_slots</span>, - <span class="Pa">nr_rx_slots</span></dt> - <dd>indicate the size of transmit and receive rings.</dd> - <dt><span class="Pa">nr_tx_rings</span>, - <span class="Pa">nr_rx_rings</span></dt> - <dd>indicate the number of transmit and receive rings. Both ring number - and sizes may be configured at runtime using interface-specific - functions (e.g., <a class="Xr">ethtool(8)</a> ).</dd> - </dl> - </dd> - <dt id="NIOCREGIF"><a class="permalink" href="#NIOCREGIF"><code class="Dv">NIOCREGIF</code></a></dt> - <dd>binds the port named in <var class="Va">nr_name</var> to the file - descriptor. For a physical device this also switches it into - <code class="Nm">netmap</code> mode, disconnecting it from the host stack. - Multiple file descriptors can be bound to the same port, with proper - synchronization left to the user. - <p class="Pp">The recommended way to bind a file descriptor to a port is to - use function <var class="Va">nm_open(..)</var> (see - <a class="Sx" href="#LIBRARIES">LIBRARIES</a>) which parses names to - access specific port types and enable features. 
In the following we - document the main features.</p> - <p class="Pp" id="netmap"><code class="Dv">NIOCREGIF can also bind a file - descriptor to one endpoint of a</code> - <a class="permalink" href="#netmap"><i class="Em">netmap pipe</i></a>, - consisting of two netmap ports with a crossover connection. A netmap - pipe shares the same memory space as the parent port, and is meant to - enable configurations where a master process acts as a dispatcher towards - slave processes.</p> - <p class="Pp">To enable this function, the <span class="Pa">nr_arg1</span> - field of the structure can be used as a hint to the kernel to indicate - how many pipes we expect to use, and to reserve extra space in the memory - region.</p> - <p class="Pp">On return, it gives the same info as NIOCGINFO, with - <span class="Pa">nr_ringid</span> and <span class="Pa">nr_flags</span> - indicating the identity of the rings controlled through the file - descriptor.</p> - <p class="Pp"><var class="Va">nr_flags</var> and <var class="Va">nr_ringid</var> - select which rings are controlled through this file descriptor. - Possible values of <span class="Pa">nr_flags</span> are indicated below, - together with the naming schemes that application libraries (such as the - <code class="Nm">nm_open</code> indicated below) can use to indicate the - specific set of rings. 
In the example below, "netmap:foo" is - any valid netmap port name.</p> - <dl class="Bl-tag"> - <dt>NR_REG_ALL_NIC netmap:foo</dt> - <dd>(default) all hardware ring pairs</dd> - <dt>NR_REG_SW netmap:foo^</dt> - <dd>the ``host rings'', connecting to the host stack.</dd> - <dt>NR_REG_NIC_SW netmap:foo*</dt> - <dd>all hardware rings and the host rings</dd> - <dt>NR_REG_ONE_NIC netmap:foo-i</dt> - <dd>only the i-th hardware ring pair, where the number is in - <span class="Pa">nr_ringid</span>;</dd> - <dt>NR_REG_PIPE_MASTER netmap:foo{i</dt> - <dd>the master side of the netmap pipe whose identifier (i) is in - <span class="Pa">nr_ringid</span>;</dd> - <dt>NR_REG_PIPE_SLAVE netmap:foo}i</dt> - <dd>the slave side of the netmap pipe whose identifier (i) is in - <span class="Pa">nr_ringid</span>. - <p class="Pp">The identifier of a pipe must be thought as part of the - pipe name, and does not need to be sequential. On return the pipe - will only have a single ring pair with index 0, irrespective of the - value of <var class="Va">i</var>.</p> - </dd> - </dl> - <p class="Pp">By default, a <a class="Xr">poll(2)</a> or - <a class="Xr">select(2)</a> call pushes out any pending packets on the - transmit ring, even if no write events are specified. The feature can be - disabled by or-ing <var class="Va">NETMAP_NO_TX_POLL</var> to the value - written to <var class="Va">nr_ringid</var>. 
When this feature is used, - packets are transmitted only when <var class="Va">ioctl(NIOCTXSYNC)</var> - is issued, or when <var class="Va">select()</var> / <var class="Va">poll()</var> are - called with a write event (POLLOUT/wfdset) or with a full ring.</p> - <p class="Pp">When registering a virtual interface that is dynamically - created on a <code class="Nm">VALE</code> switch, we can specify the - desired number of rings (1 by default, and currently up to 16) using the - nr_tx_rings and nr_rx_rings fields.</p> - </dd> - <dt id="NIOCTXSYNC"><a class="permalink" href="#NIOCTXSYNC"><code class="Dv">NIOCTXSYNC</code></a></dt> - <dd>tells the hardware of new packets to transmit, and updates the number of - slots available for transmission.</dd> - <dt id="NIOCRXSYNC"><a class="permalink" href="#NIOCRXSYNC"><code class="Dv">NIOCRXSYNC</code></a></dt> - <dd>tells the hardware of consumed packets, and asks for newly available - packets.</dd> -</dl> -</section> -<section class="Sh"> -<h1 class="Sh" id="SELECT,_POLL,_EPOLL,_KQUEUE"><a class="permalink" href="#SELECT,_POLL,_EPOLL,_KQUEUE">SELECT, - POLL, EPOLL, KQUEUE</a></h1> -<p class="Pp"><a class="Xr">select(2)</a> and <a class="Xr">poll(2)</a> on a - <code class="Nm">netmap</code> file descriptor process rings as indicated in - <a class="Sx" href="#TRANSMIT_RINGS">TRANSMIT RINGS</a> and - <a class="Sx" href="#RECEIVE_RINGS">RECEIVE RINGS</a>, respectively when - write (POLLOUT) and read (POLLIN) events are requested. Both block if no - slots are available in the ring (<var class="Va">ring->cur == - ring->tail</var>). Depending on the platform, <a class="Xr">epoll(7)</a> - and <a class="Xr">kqueue(2)</a> are supported too.</p> -<p class="Pp">Packets in transmit rings are normally pushed out (and buffers - reclaimed) even without requesting write events. Passing the - <code class="Dv">NETMAP_NO_TX_POLL</code> flag to - <i class="Em">NIOCREGIF</i> disables this feature. 
By default, receive rings - are processed only if read events are requested. Passing the - <code class="Dv">NETMAP_DO_RX_POLL</code> flag to <i class="Em">NIOCREGIF - updates receive rings even without read events.</i> Note that on - <a class="Xr">epoll(7)</a> and <a class="Xr">kqueue(2)</a>, - <code class="Dv">NETMAP_NO_TX_POLL</code> and - <code class="Dv">NETMAP_DO_RX_POLL</code> only have an effect when some - event is posted for the file descriptor.</p> -</section> -<section class="Sh"> -<h1 class="Sh" id="LIBRARIES"><a class="permalink" href="#LIBRARIES">LIBRARIES</a></h1> -<p class="Pp">The <code class="Nm">netmap</code> API is supposed to be used - directly, both because of its simplicity and for efficient integration with - applications.</p> -<p class="Pp">For convenience, the - <code class="In"><<a class="In">net/netmap_user.h</a>></code> header - provides a few macros and functions to ease creating a file descriptor and - doing I/O with a <code class="Nm">netmap</code> port. These are loosely - modeled after the <a class="Xr">pcap(3)</a> API, to ease porting of - libpcap-based applications to <code class="Nm">netmap</code>. To use these - extra functions, programs should</p> -<div class="Bd Bd-indent"><code class="Li">#define NETMAP_WITH_LIBS</code></div> -before -<div class="Bd Bd-indent"><code class="Li">#include - <net/netmap_user.h></code></div> -<p class="Pp">The following functions are available:</p> -<dl class="Bl-tag"> - <dt id="struct~4"><var class="Va">struct nm_desc * nm_open(const char *ifname, - const struct nmreq *req, uint64_t flags, const struct nm_desc - *arg</var>)</dt> - <dd>similar to <a class="Xr">pcap_open_live(3)</a>, binds a file descriptor to - a port. 
- <dl class="Bl-tag"> - <dt id="ifname"><var class="Va">ifname</var></dt> - <dd>is a port name, in the form "netmap:PPP" for a NIC and - "valeSSS:PPP" for a <code class="Nm">VALE</code> port.</dd> - <dt id="req"><var class="Va">req</var></dt> - <dd>provides the initial values for the argument to the NIOCREGIF ioctl. - The nm_flags and nm_ringid values are overwritten by parsing ifname - and flags, and other fields can be overridden through the other two - arguments.</dd> - <dt id="arg"><var class="Va">arg</var></dt> - <dd>points to a struct nm_desc containing arguments (e.g., from a - previously opened file descriptor) that should override the defaults. - The fields are used as described below.</dd> - <dt id="flags"><var class="Va">flags</var></dt> - <dd>can be set to a combination of the following flags: - <var class="Va">NETMAP_NO_TX_POLL</var>, - <var class="Va">NETMAP_DO_RX_POLL</var> (copied into nr_ringid); - <var class="Va">NM_OPEN_NO_MMAP</var> (if arg points to the same - memory region, avoids the mmap and uses the values from it); - <var class="Va">NM_OPEN_IFNAME</var> (ignores ifname and uses the - values in arg); <var class="Va">NM_OPEN_ARG1</var>, - <var class="Va">NM_OPEN_ARG2</var>, <var class="Va">NM_OPEN_ARG3</var> - (uses the fields from arg); <var class="Va">NM_OPEN_RING_CFG</var> - (uses the ring number and sizes from arg).</dd> - </dl> - </dd> - <dt id="int"><var class="Va">int nm_close(struct nm_desc *d</var>)</dt> - <dd>closes the file descriptor, unmaps memory, frees resources.</dd> - <dt id="int~2"><var class="Va">int nm_inject(struct nm_desc *d, const void - *buf, size_t size</var>)</dt> - <dd>similar to <var class="Va">pcap_inject()</var>, pushes a packet to a ring, - returns the size of the packet if successful, or 0 on error;</dd> - <dt id="int~3"><var class="Va">int nm_dispatch(struct nm_desc *d, int cnt, - nm_cb_t cb, u_char *arg</var>)</dt> - <dd>similar to <var class="Va">pcap_dispatch()</var>, applies a callback to - incoming 
packets.</dd>
- <dt id="u_char"><var class="Va">u_char * nm_nextpkt(struct nm_desc *d, struct
- nm_pkthdr *hdr</var>)</dt>
- <dd>similar to <var class="Va">pcap_next()</var>, fetches the next
- packet.</dd>
-</dl>
-</section>
-<section class="Sh">
-<h1 class="Sh" id="SUPPORTED_DEVICES"><a class="permalink" href="#SUPPORTED_DEVICES">SUPPORTED
- DEVICES</a></h1>
-<p class="Pp"><code class="Nm">netmap</code> natively supports the following
- devices:</p>
-<p class="Pp">On <span class="Ux">FreeBSD</span>: <a class="Xr">cxgbe(4)</a>,
- <a class="Xr">em(4)</a>, <a class="Xr">iflib(4)</a> (providing
- <a class="Xr">igb(4)</a> and <a class="Xr">em(4)</a>),
- <a class="Xr">ix(4)</a>, <a class="Xr">ixl(4)</a>, <a class="Xr">re(4)</a>,
- <a class="Xr">vtnet(4)</a>.</p>
-<p class="Pp">On Linux: e1000, e1000e, i40e, igb, ixgbe, ixgbevf, r8169,
- virtio_net, vmxnet3.</p>
-<p class="Pp">NICs without native support can still be used in
- <code class="Nm">netmap</code> mode through emulation. Performance is
- inferior to native netmap mode but still significantly higher than various
- raw socket types (bpf, PF_PACKET, etc.). Note that for slow devices (such as
- 1 Gbit/s and slower NICs, or several 10 Gbit/s NICs whose hardware is unable
- to sustain line rate), emulated and native mode will likely achieve similar
- or the same throughput.</p>
-<p class="Pp">When emulation is in use, packet sniffer programs such as tcpdump
- could see received packets before they are diverted by netmap. This
- behaviour is not intentional; it is an artifact of the way emulation
- is implemented.
Note that if the netmap application subsequently moves
- packets received from the emulated adapter onto the host RX ring, the
- sniffer will intercept those packets again, since the packets are injected
- into the host stack as if they had been received by the network interface.</p>
-<p class="Pp">Emulation is also available for devices with native netmap
- support; this can be useful for testing or performance comparison. The sysctl
- variable <var class="Va">dev.netmap.admode</var> globally controls how
- netmap mode is implemented.</p>
-</section>
-<section class="Sh">
-<h1 class="Sh" id="SYSCTL_VARIABLES_AND_MODULE_PARAMETERS"><a class="permalink" href="#SYSCTL_VARIABLES_AND_MODULE_PARAMETERS">SYSCTL
- VARIABLES AND MODULE PARAMETERS</a></h1>
-<p class="Pp">Some aspects of the operation of <code class="Nm">netmap</code>
- and <code class="Nm">VALE</code> are controlled through sysctl variables on
- <span class="Ux">FreeBSD</span>
- (<a class="permalink" href="#dev.netmap.*"><i class="Em" id="dev.netmap.*">dev.netmap.*</i></a>)
- and module parameters on Linux
- (<a class="permalink" href="#/sys/module/netmap/parameters/*"><i class="Em" id="/sys/module/netmap/parameters/*">/sys/module/netmap/parameters/*</i></a>):</p>
-<dl class="Bl-tag">
- <dt id="dev.netmap.admode:"><var class="Va">dev.netmap.admode: 0</var></dt>
- <dd>Controls the use of native or emulated adapter mode.
- <p class="Pp">0 uses the best available option;</p>
- <p class="Pp">1 forces native mode and fails if not available;</p>
- <p class="Pp">2 forces emulated mode and hence never fails.</p>
- </dd>
- <dt id="dev.netmap.generic_rings:"><var class="Va">dev.netmap.generic_rings:
- 1</var></dt>
- <dd>Number of rings used for emulated netmap mode.</dd>
- <dt id="dev.netmap.generic_ringsize:"><var class="Va">dev.netmap.generic_ringsize:
- 1024</var></dt>
- <dd>Ring size used for emulated netmap mode.</dd>
- <dt id="dev.netmap.generic_mit:"><var class="Va">dev.netmap.generic_mit:
- 100000</var></dt>
- <dd>Controls interrupt moderation for emulated mode.</dd>
- <dt id="dev.netmap.fwd:"><var class="Va">dev.netmap.fwd: 0</var></dt>
- <dd>Forces NS_FORWARD mode.</dd>
- <dt id="dev.netmap.txsync_retry:"><var class="Va">dev.netmap.txsync_retry:
- 2</var></dt>
- <dd>Number of txsync loops in the <code class="Nm">VALE</code> flush
- function.</dd>
- <dt id="dev.netmap.no_pendintr:"><var class="Va">dev.netmap.no_pendintr:
- 1</var></dt>
- <dd>Forces recovery of transmit buffers on system calls.</dd>
- <dt id="dev.netmap.no_timestamp:"><var class="Va">dev.netmap.no_timestamp:
- 0</var></dt>
- <dd>Disables the update of the timestamp in the netmap ring.</dd>
- <dt id="dev.netmap.verbose:"><var class="Va">dev.netmap.verbose: 0</var></dt>
- <dd>Enables verbose kernel messages.</dd>
- <dt id="dev.netmap.buf_num:"><var class="Va">dev.netmap.buf_num:
- 163840</var></dt>
- <dd style="width: auto;"> </dd>
- <dt id="dev.netmap.buf_size:"><var class="Va">dev.netmap.buf_size:
- 2048</var></dt>
- <dd style="width: auto;"> </dd>
- <dt id="dev.netmap.ring_num:"><var class="Va">dev.netmap.ring_num:
- 200</var></dt>
- <dd style="width: auto;"> </dd>
- <dt id="dev.netmap.ring_size:"><var class="Va">dev.netmap.ring_size:
- 36864</var></dt>
- <dd style="width: auto;"> </dd>
- <dt id="dev.netmap.if_num:"><var class="Va">dev.netmap.if_num: 100</var></dt>
- <dd style="width: auto;"> </dd>
- <dt id="dev.netmap.if_size:"><var
class="Va">dev.netmap.if_size: - 1024</var></dt> - <dd>Sizes and number of objects (netmap_if, netmap_ring, buffers) for the - global memory region. The only parameter worth modifying is - <var class="Va">dev.netmap.buf_num</var> as it impacts the total amount of - memory used by netmap.</dd> - <dt id="dev.netmap.buf_curr_num:"><var class="Va">dev.netmap.buf_curr_num: - 0</var></dt> - <dd style="width: auto;"> </dd> - <dt id="dev.netmap.buf_curr_size:"><var class="Va">dev.netmap.buf_curr_size: - 0</var></dt> - <dd style="width: auto;"> </dd> - <dt id="dev.netmap.ring_curr_num:"><var class="Va">dev.netmap.ring_curr_num: - 0</var></dt> - <dd style="width: auto;"> </dd> - <dt id="dev.netmap.ring_curr_size:"><var class="Va">dev.netmap.ring_curr_size: - 0</var></dt> - <dd style="width: auto;"> </dd> - <dt id="dev.netmap.if_curr_num:"><var class="Va">dev.netmap.if_curr_num: - 0</var></dt> - <dd style="width: auto;"> </dd> - <dt id="dev.netmap.if_curr_size:"><var class="Va">dev.netmap.if_curr_size: - 0</var></dt> - <dd>Actual values in use.</dd> - <dt id="dev.netmap.priv_buf_num:"><var class="Va">dev.netmap.priv_buf_num: - 4098</var></dt> - <dd style="width: auto;"> </dd> - <dt id="dev.netmap.priv_buf_size:"><var class="Va">dev.netmap.priv_buf_size: - 2048</var></dt> - <dd style="width: auto;"> </dd> - <dt id="dev.netmap.priv_ring_num:"><var class="Va">dev.netmap.priv_ring_num: - 4</var></dt> - <dd style="width: auto;"> </dd> - <dt id="dev.netmap.priv_ring_size:"><var class="Va">dev.netmap.priv_ring_size: - 20480</var></dt> - <dd style="width: auto;"> </dd> - <dt id="dev.netmap.priv_if_num:"><var class="Va">dev.netmap.priv_if_num: - 2</var></dt> - <dd style="width: auto;"> </dd> - <dt id="dev.netmap.priv_if_size:"><var class="Va">dev.netmap.priv_if_size: - 1024</var></dt> - <dd>Sizes and number of objects (netmap_if, netmap_ring, buffers) for private - memory regions. 
A separate memory region is used for each - <code class="Nm">VALE</code> port and each pair of <code class="Nm">netmap - pipes</code>.</dd> - <dt id="dev.netmap.bridge_batch:"><var class="Va">dev.netmap.bridge_batch: - 1024</var></dt> - <dd>Batch size used when moving packets across a <code class="Nm">VALE</code> - switch. Values above 64 generally guarantee good performance.</dd> - <dt id="dev.netmap.max_bridges:"><var class="Va">dev.netmap.max_bridges: - 8</var></dt> - <dd>Max number of <code class="Nm">VALE</code> switches that can be created. - This tunable can be specified at loader time.</dd> - <dt id="dev.netmap.ptnet_vnet_hdr:"><var class="Va">dev.netmap.ptnet_vnet_hdr: - 1</var></dt> - <dd>Allow ptnet devices to use virtio-net headers</dd> - <dt id="dev.netmap.port_numa_affinity:"><var class="Va">dev.netmap.port_numa_affinity: - 0</var></dt> - <dd>On <a class="Xr">numa(4)</a> systems, allocate memory for netmap ports - from the local NUMA domain when possible. This can improve performance by - reducing the number of remote memory accesses. However, when forwarding - packets between ports attached to different NUMA domains, this will - prevent zero-copy forwarding optimizations and thus may hurt performance. - Note that this setting must be specified as a loader tunable at boot - time.</dd> -</dl> -</section> -<section class="Sh"> -<h1 class="Sh" id="SYSTEM_CALLS"><a class="permalink" href="#SYSTEM_CALLS">SYSTEM - CALLS</a></h1> -<p class="Pp"><code class="Nm">netmap</code> uses <a class="Xr">select(2)</a>, - <a class="Xr">poll(2)</a>, <a class="Xr">epoll(7)</a> and - <a class="Xr">kqueue(2)</a> to wake up processes when significant events - occur, and <a class="Xr">mmap(2)</a> to map memory. 
- <a class="Xr">ioctl(2)</a> is used to configure ports and
- <code class="Nm">VALE switches</code>.</p>
-<p class="Pp">Applications may need to create threads and bind them to specific
- cores to improve performance, using standard OS primitives; see
- <a class="Xr">pthread(3)</a>. In particular,
- <a class="Xr">pthread_setaffinity_np(3)</a> may be of use.</p>
-</section>
-<section class="Sh">
-<h1 class="Sh" id="EXAMPLES"><a class="permalink" href="#EXAMPLES">EXAMPLES</a></h1>
-<section class="Ss">
-<h2 class="Ss" id="TEST_PROGRAMS"><a class="permalink" href="#TEST_PROGRAMS">TEST
- PROGRAMS</a></h2>
-<p class="Pp"><code class="Nm">netmap</code> comes with a few programs that can
- be used for testing or simple applications. See the
- <span class="Pa">examples/</span> directory in
- <code class="Nm">netmap</code> distributions, or the
- <span class="Pa">tools/tools/netmap/</span> directory in
- <span class="Ux">FreeBSD</span> distributions.</p>
-<p class="Pp"><a class="Xr">pkt-gen(8)</a> is a general-purpose traffic
- source/sink.</p>
-<p class="Pp">For example,</p>
-<div class="Bd Bd-indent"><code class="Li">pkt-gen -i ix0 -f tx -l
- 60</code></div>
-can generate an infinite stream of minimum-size packets, and
-<div class="Bd Bd-indent"><code class="Li">pkt-gen -i ix0 -f rx</code></div>
-is a traffic sink. Both print traffic statistics to help monitor how the system
- performs.
-<p class="Pp"><a class="Xr">pkt-gen(8)</a> has many options that can be used to
- set packet sizes, addresses, and rates, and to use multiple send/receive
- threads and cores.</p>
-<p class="Pp"><a class="Xr">bridge(8)</a> is another test program, which
- interconnects two <code class="Nm">netmap</code> ports. It can be used for
It can be used for - transparent forwarding between interfaces, as in</p> -<div class="Bd Bd-indent"><code class="Li">bridge -i netmap:ix0 -i - netmap:ix1</code></div> -or even connect the NIC to the host stack using netmap -<div class="Bd Bd-indent"><code class="Li">bridge -i netmap:ix0</code></div> -</section> -<section class="Ss"> -<h2 class="Ss" id="USING_THE_NATIVE_API"><a class="permalink" href="#USING_THE_NATIVE_API">USING - THE NATIVE API</a></h2> -<p class="Pp">The following code implements a traffic generator:</p> -<p class="Pp"></p> -<div class="Bd Li"> -<pre>#include <net/netmap_user.h> -... -void sender(void) -{ - struct netmap_if *nifp; - struct netmap_ring *ring; - struct nmreq nmr; - struct pollfd fds; - - fd = open("/dev/netmap", O_RDWR); - bzero(&nmr, sizeof(nmr)); - strcpy(nmr.nr_name, "ix0"); - nmr.nm_version = NETMAP_API; - ioctl(fd, NIOCREGIF, &nmr); - p = mmap(0, nmr.nr_memsize, fd); - nifp = NETMAP_IF(p, nmr.nr_offset); - ring = NETMAP_TXRING(nifp, 0); - fds.fd = fd; - fds.events = POLLOUT; - for (;;) { - poll(&fds, 1, -1); - while (!nm_ring_empty(ring)) { - i = ring->cur; - buf = NETMAP_BUF(ring, ring->slot[i].buf_index); - ... prepare packet in buf ... - ring->slot[i].len = ... packet length ... - ring->head = ring->cur = nm_ring_next(ring, i); - } - } -}</pre> -</div> -</section> -<section class="Ss"> -<h2 class="Ss" id="HELPER_FUNCTIONS"><a class="permalink" href="#HELPER_FUNCTIONS">HELPER - FUNCTIONS</a></h2> -<p class="Pp">A simple receiver can be implemented using the helper - functions:</p> -<p class="Pp"></p> -<div class="Bd Li"> -<pre>#define NETMAP_WITH_LIBS -#include <net/netmap_user.h> -... -void receiver(void) -{ - struct nm_desc *d; - struct pollfd fds; - u_char *buf; - struct nm_pkthdr h; - ... 
- d = nm_open("netmap:ix0", NULL, 0, 0);
- fds.fd = NETMAP_FD(d);
- fds.events = POLLIN;
- for (;;) {
- poll(&fds, 1, -1);
- while ( (buf = nm_nextpkt(d, &h)) )
- consume_pkt(buf, h.len);
- }
- nm_close(d);
-}</pre>
-</div>
-</section>
-<section class="Ss">
-<h2 class="Ss" id="ZERO-COPY_FORWARDING"><a class="permalink" href="#ZERO-COPY_FORWARDING">ZERO-COPY
- FORWARDING</a></h2>
-<p class="Pp">Since physical interfaces share the same memory region, it is
- possible to do packet forwarding between ports by swapping buffers. The buffer
- from the transmit ring is used to replenish the receive ring:</p>
-<p class="Pp"></p>
-<div class="Bd Li">
-<pre> uint32_t tmp;
- struct netmap_slot *src, *dst;
- ...
- src = &rxr->slot[rxr->cur];
- dst = &txr->slot[txr->cur];
- tmp = dst->buf_idx;
- dst->buf_idx = src->buf_idx;
- dst->len = src->len;
- dst->flags = NS_BUF_CHANGED;
- src->buf_idx = tmp;
- src->flags = NS_BUF_CHANGED;
- rxr->head = rxr->cur = nm_ring_next(rxr, rxr->cur);
- txr->head = txr->cur = nm_ring_next(txr, txr->cur);
- ...</pre>
-</div>
-</section>
-<section class="Ss">
-<h2 class="Ss" id="ACCESSING_THE_HOST_STACK"><a class="permalink" href="#ACCESSING_THE_HOST_STACK">ACCESSING
- THE HOST STACK</a></h2>
-<p class="Pp">The host stack is for all practical purposes just a regular ring
- pair, which you can access with the netmap API, e.g., with</p>
-<div class="Bd Bd-indent"><code class="Li">nm_open("netmap:eth0^",
- ...)</code></div>
-All packets that the host would send to an interface in
- <code class="Nm">netmap</code> mode end up in the RX ring, whereas all
- packets queued to the TX ring are sent up to the host stack.
-</section> -<section class="Ss"> -<h2 class="Ss" id="VALE_SWITCH"><a class="permalink" href="#VALE_SWITCH">VALE - SWITCH</a></h2> -<p class="Pp">A simple way to test the performance of a - <code class="Nm">VALE</code> switch is to attach a sender and a receiver to - it, e.g., running the following in two different terminals:</p> -<div class="Bd Bd-indent"><code class="Li">pkt-gen -i vale1:a -f rx # - receiver</code></div> -<div class="Bd Bd-indent"><code class="Li">pkt-gen -i vale1:b -f tx # - sender</code></div> -The same example can be used to test netmap pipes, by simply changing port - names, e.g., -<div class="Bd Bd-indent"><code class="Li">pkt-gen -i vale2:x{3 -f rx # receiver - on the master side</code></div> -<div class="Bd Bd-indent"><code class="Li">pkt-gen -i vale2:x}3 -f tx # sender - on the slave side</code></div> -<p class="Pp">The following command attaches an interface and the host stack to - a switch:</p> -<div class="Bd Bd-indent"><code class="Li">valectl -h vale2:em0</code></div> -Other <code class="Nm">netmap</code> clients attached to the same switch can now - communicate with the network card or the host. 
-</section>
-</section>
-<section class="Sh">
-<h1 class="Sh" id="SEE_ALSO"><a class="permalink" href="#SEE_ALSO">SEE
- ALSO</a></h1>
-<p class="Pp"><a class="Xr">vale(4)</a>, <a class="Xr">bridge(8)</a>,
- <a class="Xr">lb(8)</a>, <a class="Xr">nmreplay(8)</a>,
- <a class="Xr">pkt-gen(8)</a>, <a class="Xr">valectl(8)</a></p>
-<p class="Pp"><span class="Pa">http://info.iet.unipi.it/~luigi/netmap/</span></p>
-<p class="Pp">Luigi Rizzo, Revisiting network I/O APIs: the netmap framework,
- Communications of the ACM, 55 (3), pp. 45-51, March 2012</p>
-<p class="Pp">Luigi Rizzo, netmap: a novel framework for fast packet I/O, Usenix
- ATC'12, June 2012, Boston</p>
-<p class="Pp">Luigi Rizzo, Giuseppe Lettieri, VALE, a switched ethernet for
- virtual machines, ACM CoNEXT'12, December 2012, Nice</p>
-<p class="Pp">Luigi Rizzo, Giuseppe Lettieri, Vincenzo Maffione, Speeding up
- packet I/O in virtual machines, ACM/IEEE ANCS'13, October 2013, San Jose</p>
-</section>
-<section class="Sh">
-<h1 class="Sh" id="AUTHORS"><a class="permalink" href="#AUTHORS">AUTHORS</a></h1>
-<p class="Pp">The <code class="Nm">netmap</code> framework was originally
- designed and implemented at the Università di Pisa in 2011 by
- <span class="An">Luigi Rizzo</span>, and further extended with help from
- <span class="An">Matteo Landi</span>, <span class="An">Gaetano
- Catalli</span>, <span class="An">Giuseppe Lettieri</span>, and
- <span class="An">Vincenzo Maffione</span>.</p>
-<p class="Pp"><code class="Nm">netmap</code> and <code class="Nm">VALE</code>
- have been funded by the European Commission within FP7 Projects CHANGE
- (257422) and OPENLAB (287581).</p>
-</section>
-<section class="Sh">
-<h1 class="Sh" id="CAVEATS"><a class="permalink" href="#CAVEATS">CAVEATS</a></h1>
-<p class="Pp">No matter how fast the CPU and OS are, achieving line rate on 10G
- and faster interfaces requires hardware with sufficient performance.
Several
- NICs are unable to sustain line rate with small packet sizes. Insufficient
- PCIe or memory bandwidth can also cause reduced performance.</p>
-<p class="Pp">Another frequent reason for low performance is the use of flow
- control on the link: a slow receiver can limit the transmit speed. Be sure
- to disable flow control when running high-speed experiments.</p>
-<section class="Ss">
-<h2 class="Ss" id="SPECIAL_NIC_FEATURES"><a class="permalink" href="#SPECIAL_NIC_FEATURES">SPECIAL
- NIC FEATURES</a></h2>
-<p class="Pp"><code class="Nm">netmap</code> is orthogonal to some NIC features,
- such as multiqueue, schedulers, and packet filters.</p>
-<p class="Pp">Multiple transmit and receive rings are supported natively and can
- be configured with ordinary OS tools, such as <a class="Xr">ethtool(8)</a>
- or device-specific sysctl variables. The same goes for Receive Packet
- Steering (RPS) and filtering of incoming traffic.</p>
-<p class="Pp" id="does"><code class="Nm">netmap</code>
- <a class="permalink" href="#does"><i class="Em">does not use</i></a>
- features such as
- <a class="permalink" href="#checksum"><i class="Em" id="checksum">checksum
- offloading</i></a>,
- <a class="permalink" href="#TCP"><i class="Em" id="TCP">TCP segmentation
- offloading</i></a>,
- <a class="permalink" href="#encryption"><i class="Em" id="encryption">encryption</i></a>,
- <a class="permalink" href="#VLAN"><i class="Em" id="VLAN">VLAN
- encapsulation/decapsulation</i></a>, etc. When using netmap to exchange
- packets with the host stack, make sure to disable these features.</p>
-</section>
-</section>
-</div>
-<table class="foot">
- <tr>
- <td class="foot-date">October 10, 2024</td>
- <td class="foot-os">FreeBSD 15.0</td>
- </tr>
-</table>