<table class="head">
  <tr>
    <td class="head-ltitle">NETMAP(4)</td>
    <td class="head-vol">Device Drivers Manual</td>
    <td class="head-rtitle">NETMAP(4)</td>
  </tr>
</table>
<div class="manual-text">
<section class="Sh">
<h1 class="Sh" id="NAME"><a class="permalink" href="#NAME">NAME</a></h1>
<p class="Pp"><code class="Nm">netmap</code> — <span class="Nd">a
  framework for fast packet I/O</span></p>
</section>
<section class="Sh">
<h1 class="Sh" id="SYNOPSIS"><a class="permalink" href="#SYNOPSIS">SYNOPSIS</a></h1>
<p class="Pp"><code class="Cd">device netmap</code></p>
</section>
<section class="Sh">
<h1 class="Sh" id="DESCRIPTION"><a class="permalink" href="#DESCRIPTION">DESCRIPTION</a></h1>
<p class="Pp"><code class="Nm">netmap</code> is a framework for extremely fast
  and efficient packet I/O for userspace and kernel clients, and for Virtual
  Machines.
It runs on <span class="Ux">FreeBSD</span>, Linux and some
  versions of Windows, and supports a variety of <code class="Nm">netmap
  ports</code>, including</p>
<dl class="Bl-tag">
  <dt><code class="Nm">physical NIC ports</code></dt>
  <dd>to access individual queues of network interfaces;</dd>
  <dt><code class="Nm">host ports</code></dt>
  <dd>to inject packets into the host stack;</dd>
  <dt><code class="Nm">VALE ports</code></dt>
  <dd>implementing a very fast and modular in-kernel software
    switch/dataplane;</dd>
  <dt><code class="Nm">netmap pipes</code></dt>
  <dd>a shared memory packet transport channel;</dd>
  <dt><code class="Nm">netmap monitors</code></dt>
  <dd>a mechanism similar to <a class="Xr">bpf(4)</a> to capture traffic.</dd>
</dl>
<p class="Pp">All these <code class="Nm">netmap ports</code> are accessed
  interchangeably with the same API, and are at least one order of magnitude
  faster than standard OS mechanisms (sockets, bpf, tun/tap interfaces, native
  switches, pipes). With suitably fast hardware (NICs, PCIe buses, CPUs),
  packet I/O using <code class="Nm">netmap</code> on supported NICs reaches
  14.88 million packets per second (Mpps) with much less than one core on 10
  Gbit/s NICs; 35-40 Mpps on 40 Gbit/s NICs (limited by the hardware); about
  20 Mpps per core for VALE ports; and over 100 Mpps for
  <code class="Nm">netmap pipes</code>. NICs without native
  <code class="Nm">netmap</code> support can still use the API in emulated
  mode, which uses unmodified device drivers and is 3-5 times faster than
  <a class="Xr">bpf(4)</a> or raw sockets.</p>
<p class="Pp">Userspace clients can dynamically switch NICs into
  <code class="Nm">netmap</code> mode and send and receive raw packets through
  memory mapped buffers.
Similarly, <code class="Nm">VALE</code> switch
  instances and ports, <code class="Nm">netmap pipes</code> and
  <code class="Nm">netmap monitors</code> can be created dynamically,
  providing high speed packet I/O between processes, virtual machines, NICs
  and the host stack.</p>
<p class="Pp"><code class="Nm">netmap</code> supports both non-blocking I/O,
  through <a class="Xr">ioctl(2)</a>, and synchronization and blocking I/O,
  through a file descriptor and standard OS mechanisms such as
  <a class="Xr">select(2)</a>, <a class="Xr">poll(2)</a>,
  <a class="Xr">kqueue(2)</a> and <a class="Xr">epoll(7)</a>. All types of
  <code class="Nm">netmap ports</code> and the <code class="Nm">VALE
  switch</code> are implemented by a single kernel module, which also emulates
  the <code class="Nm">netmap</code> API over standard drivers. For best
  performance, <code class="Nm">netmap</code> requires native support in
  device drivers. A list of such devices is at the end of this document.</p>
<p class="Pp">In the rest of this (long) manual page we document various aspects
  of the <code class="Nm">netmap</code> and <code class="Nm">VALE</code>
  architecture, features and usage.</p>
</section>
<section class="Sh">
<h1 class="Sh" id="ARCHITECTURE"><a class="permalink" href="#ARCHITECTURE">ARCHITECTURE</a></h1>
<p class="Pp"><code class="Nm">netmap</code> supports raw packet I/O through a
  <a class="permalink" href="#port"><i class="Em" id="port">port</i></a>,
  which can be connected to a physical interface
  (<a class="permalink" href="#NIC"><i class="Em" id="NIC">NIC</i></a>), to
  the host stack, or to a <code class="Nm">VALE</code> switch. Ports use
  preallocated circular queues of buffers
  (<a class="permalink" href="#rings"><i class="Em" id="rings">rings</i></a>)
  residing in an mmapped region. There is one ring for each transmit/receive
  queue of a NIC or virtual port.
An additional ring pair connects to the host
  stack.</p>
<p class="Pp">After binding a file descriptor to a port, a
  <code class="Nm">netmap</code> client can send or receive packets in batches
  through the rings, and possibly implement zero-copy forwarding between
  ports.</p>
<p class="Pp">All NICs operating in <code class="Nm">netmap</code> mode use the
  same memory region, accessible to all processes that own
  <span class="Pa">/dev/netmap</span> file descriptors bound to NICs.
  Independent <code class="Nm">VALE</code> and <code class="Nm">netmap
  pipe</code> ports by default use separate memory regions, but can be
  independently configured to share memory.</p>
</section>
<section class="Sh">
<h1 class="Sh" id="ENTERING_AND_EXITING_NETMAP_MODE"><a class="permalink" href="#ENTERING_AND_EXITING_NETMAP_MODE">ENTERING
  AND EXITING NETMAP MODE</a></h1>
<p class="Pp">The following section describes the system calls to create and
  control <code class="Nm">netmap</code> ports (including
  <code class="Nm">VALE</code> and <code class="Nm">netmap pipe</code> ports).
  Simpler, higher level functions are described in the
  <a class="Sx" href="#LIBRARIES">LIBRARIES</a> section.</p>
<p class="Pp">Ports and rings are created and controlled through a file
  descriptor, created by opening a special device</p>
<div class="Bd Bd-indent"><code class="Li">fd =
  open("/dev/netmap");</code></div>
and then bound to a specific port with an
<div class="Bd Bd-indent"><code class="Li">ioctl(fd, NIOCREGIF, (struct nmreq
  *)arg);</code></div>
<p class="Pp"><code class="Nm">netmap</code> has multiple modes of operation
  controlled by the <var class="Vt">struct nmreq</var> argument.
<var class="Va">arg.nr_name</var> specifies the netmap port name, as
  follows:</p>
<dl class="Bl-tag">
  <dt id="OS"><a class="permalink" href="#OS"><code class="Dv">OS network
  interface name (e.g., 'em0', 'eth1', ...</code></a>)</dt>
  <dd>the data path of the NIC is disconnected from the host stack, and the file
    descriptor is bound to the NIC (one or all queues), or to the host
    stack;</dd>
  <dt id="valeSSS:PPP"><a class="permalink" href="#valeSSS:PPP"><code class="Dv">valeSSS:PPP</code></a></dt>
  <dd>the file descriptor is bound to port PPP of VALE switch SSS. Switch
    instances and ports are dynamically created if necessary.
    <p class="Pp">Both SSS and PPP have the form [0-9a-zA-Z_]+, the string
      cannot exceed IFNAMSIZ characters, and PPP cannot be the name of any
      existing OS network interface.</p>
  </dd>
</dl>
<p class="Pp">On return, <var class="Va">arg</var> indicates the size of the
  shared memory region, and the number, size and location of all the
  <code class="Nm">netmap</code> data structures, which can be accessed by
  mmapping the memory</p>
<div class="Bd Bd-indent"><code class="Li">char *mem = mmap(0, arg.nr_memsize,
  fd);</code></div>
<p class="Pp">Non-blocking I/O is done with special <a class="Xr">ioctl(2)</a>
  calls; <a class="Xr">select(2)</a> and <a class="Xr">poll(2)</a> on the file
  descriptor permit blocking I/O.</p>
<p class="Pp">While a NIC is in <code class="Nm">netmap</code> mode, the OS will
  still believe the interface is up and running. OS-generated packets for that
  NIC end up into a <code class="Nm">netmap</code> ring, and another ring is
  used to send packets into the OS network stack.
A <a class="Xr">close(2)</a>
  on the file descriptor removes the binding, and returns the NIC to normal
  mode (reconnecting the data path to the host stack), or destroys the virtual
  port.</p>
</section>
<section class="Sh">
<h1 class="Sh" id="DATA_STRUCTURES"><a class="permalink" href="#DATA_STRUCTURES">DATA
  STRUCTURES</a></h1>
<p class="Pp">The data structures in the mmapped memory region are detailed in
  <code class="In"><<a class="In">sys/net/netmap.h</a>></code>, which is
  the ultimate reference for the <code class="Nm">netmap</code> API. The main
  structures and fields are indicated below:</p>
<dl class="Bl-tag">
  <dt id="struct"><a class="permalink" href="#struct"><code class="Dv">struct
  netmap_if (one per interface</code></a>)</dt>
  <dd>
    <div class="Bd Pp Li">
    <pre>struct netmap_if {
    ...
    const uint32_t ni_flags;      /* properties              */
    ...
    const uint32_t ni_tx_rings;   /* NIC tx rings            */
    const uint32_t ni_rx_rings;   /* NIC rx rings            */
    uint32_t       ni_bufs_head;  /* head of extra bufs list */
    ...
};</pre>
    </div>
    <p class="Pp">Indicates the number of available rings
      (<span class="Pa">struct netmap_rings</span>) and their position in the
      mmapped region. The number of tx and rx rings
      (<span class="Pa">ni_tx_rings</span>,
      <span class="Pa">ni_rx_rings</span>) normally depends on the hardware.
      NICs also have an extra tx/rx ring pair connected to the host stack.
      <i class="Em">NIOCREGIF</i> can also request additional unbound buffers
      in the same memory space, to be used as temporary storage for packets.
      The number of extra buffers is specified in the
      <var class="Va">arg.nr_arg3</var> field. On success, the kernel writes
      back to <var class="Va">arg.nr_arg3</var> the number of extra buffers
      actually allocated (this may be fewer than requested if the
      memory space ran out of buffers).
      <span class="Pa">ni_bufs_head</span>
      contains the index of the first of these extra buffers, which are
      connected in a list (the first uint32_t of each buffer being the index
      of the next buffer in the list). A <code class="Dv">0</code> indicates
      the end of the list. The application is free to modify this list and use
      the buffers (i.e., binding them to the slots of a netmap ring). When
      closing the netmap file descriptor, the kernel frees the buffers
      contained in the list pointed to by <span class="Pa">ni_bufs_head</span>,
      irrespective of the buffers originally provided by the kernel on
      <i class="Em">NIOCREGIF</i>.</p>
  </dd>
  <dt id="struct~2"><a class="permalink" href="#struct~2"><code class="Dv">struct
  netmap_ring (one per ring</code></a>)</dt>
  <dd>
    <div class="Bd Pp Li">
    <pre>struct netmap_ring {
    ...
    const uint32_t num_slots;   /* slots in each ring            */
    const uint32_t nr_buf_size; /* size of each buffer           */
    ...
    uint32_t       head;        /* (u) first buf owned by user   */
    uint32_t       cur;         /* (u) wakeup position           */
    const uint32_t tail;        /* (k) first buf owned by kernel */
    ...
    uint32_t       flags;
    struct timeval ts;          /* (k) time of last rxsync()     */
    ...
    struct netmap_slot slot[0]; /* array of slots                */
};</pre>
    </div>
    <p class="Pp" id="slots">Implements transmit and receive rings, with
      read/write pointers, metadata and an array of
      <a class="permalink" href="#slots"><i class="Em">slots</i></a>
      describing the buffers.</p>
  </dd>
  <dt id="struct~3"><a class="permalink" href="#struct~3"><code class="Dv">struct
  netmap_slot (one per buffer</code></a>)</dt>
  <dd>
    <div class="Bd Pp Li">
    <pre>struct netmap_slot {
    uint32_t buf_idx;           /* buffer index                  */
    uint16_t len;               /* packet length                 */
    uint16_t flags;             /* buf changed, etc.             */
    uint64_t ptr;               /* address for indirect buffers  */
};</pre>
    </div>
    <p class="Pp">Describes a packet buffer, which normally is identified by an
      index and resides in the mmapped region.</p>
  </dd>
  <dt id="packet"><a class="permalink" href="#packet"><code class="Dv">packet
  buffers</code></a></dt>
  <dd>Fixed size (normally 2 KB) packet buffers allocated by the kernel.</dd>
</dl>
<p class="Pp">The offset of the <span class="Pa">struct netmap_if</span> in the
  mmapped region is indicated by the <span class="Pa">nr_offset</span> field
  in the structure returned by <code class="Dv">NIOCREGIF</code>. From there,
  all other objects are reachable through relative references (offsets or
  indexes). Macros and functions in
  <code class="In"><<a class="In">net/netmap_user.h</a>></code> help
  convert them into actual pointers:</p>
<p class="Pp"></p>
<div class="Bd Bd-indent"><code class="Li">struct netmap_if *nifp =
  NETMAP_IF(mem, arg.nr_offset);</code></div>
<div class="Bd Bd-indent"><code class="Li">struct netmap_ring *txr =
  NETMAP_TXRING(nifp, ring_index);</code></div>
<div class="Bd Bd-indent"><code class="Li">struct netmap_ring *rxr =
  NETMAP_RXRING(nifp, ring_index);</code></div>
<p class="Pp"></p>
<div class="Bd Bd-indent"><code class="Li">char *buf = NETMAP_BUF(ring,
  buffer_index);</code></div>
</section>
<section class="Sh">
<h1 class="Sh" id="RINGS,_BUFFERS_AND_DATA_I/O"><a class="permalink" href="#RINGS,_BUFFERS_AND_DATA_I/O">RINGS,
  BUFFERS AND DATA I/O</a></h1>
<p class="Pp"><var class="Va">Rings</var> are circular queues of packets with
  three indexes/pointers (<var class="Va">head</var>,
  <var class="Va">cur</var>, <var class="Va">tail</var>); one slot is always
  kept empty.
The ring size (<var class="Va">num_slots</var>) should not be
  assumed to be a power of two.</p>
<p class="Pp"><var class="Va">head</var> is the first slot available to
  userspace;</p>
<p class="Pp"><var class="Va">cur</var> is the wakeup point: select/poll will
  unblock when <var class="Va">tail</var> passes
  <var class="Va">cur</var>;</p>
<p class="Pp"><var class="Va">tail</var> is the first slot reserved to the
  kernel.</p>
<p class="Pp">Slot indexes <i class="Em">must</i> only move forward; for
  convenience, the function</p>
<div class="Bd Bd-indent"><code class="Li">nm_ring_next(ring,
  index)</code></div>
returns the next index modulo the ring size.
<p class="Pp"><var class="Va">head</var> and <var class="Va">cur</var> are only
  modified by the user program; <var class="Va">tail</var> is only modified by
  the kernel. The kernel only reads/writes the <var class="Vt">struct
  netmap_ring</var> slots and buffers during the execution of a netmap-related
  system call. The only exceptions are slots (and buffers) in the range
  <var class="Va">tail </var>... <var class="Va">head-1</var>, which are
  explicitly assigned to the kernel.</p>
<section class="Ss">
<h2 class="Ss" id="TRANSMIT_RINGS"><a class="permalink" href="#TRANSMIT_RINGS">TRANSMIT
  RINGS</a></h2>
<p class="Pp">On transmit rings, after a <code class="Nm">netmap</code> system
  call, slots in the range <var class="Va">head </var>...
  <var class="Va">tail-1</var> are available for transmission. User code
  should fill the slots sequentially and advance <var class="Va">head</var>
  and <var class="Va">cur</var> past slots ready to transmit.
<var class="Va">cur</var> may be moved further ahead if the user code needs
  more slots before further transmissions (see
  <a class="Sx" href="#SCATTER_GATHER_I/O">SCATTER GATHER I/O</a>).</p>
<p class="Pp">At the next NIOCTXSYNC/select()/poll(), slots up to
  <var class="Va">head-1</var> are pushed to the port, and
  <var class="Va">tail</var> may advance if further slots have become
  available. Below is an example of the evolution of a TX ring:</p>
<div class="Bd Pp Li">
<pre>
    after the syscall, slots between cur and tail are (a)vailable
         head=cur     tail
             |          |
             v          v
   TX  [.....aaaaaaaaaaa.............]

    user creates new packets to (T)ransmit
            head=cur  tail
                  |     |
                  v     v
   TX  [.....TTTTTaaaaaa.............]

    NIOCTXSYNC/poll()/select() sends packets and reports new slots
            head=cur       tail
                  |          |
                  v          v
   TX  [..........aaaaaaaaaaa........]</pre>
</div>
<p class="Pp" id="select"><a class="permalink" href="#select"><code class="Fn">select</code></a>()
  and
  <a class="permalink" href="#poll"><code class="Fn" id="poll">poll</code></a>()
  will block if there is no space in the ring, i.e.,</p>
<div class="Bd Bd-indent"><code class="Li">ring->cur ==
  ring->tail</code></div>
and return when new slots have become available.
<p class="Pp">High speed applications may want to amortize the cost of system
  calls by preparing as many packets as possible before issuing them.</p>
<p class="Pp">A transmit ring with pending transmissions has</p>
<div class="Bd Bd-indent"><code class="Li">ring->head != ring->tail + 1
  (modulo the ring size).</code></div>
The function <var class="Va">int nm_tx_pending(ring)</var> implements this test.
</section>
<section class="Ss">
<h2 class="Ss" id="RECEIVE_RINGS"><a class="permalink" href="#RECEIVE_RINGS">RECEIVE
  RINGS</a></h2>
<p class="Pp">On receive rings, after a <code class="Nm">netmap</code> system
  call, the slots in the range <var class="Va">head</var>...
  <var class="Va">tail-1</var> contain received packets.
User code should
  process them and advance <var class="Va">head</var> and
  <var class="Va">cur</var> past slots it wants to return to the kernel.
  <var class="Va">cur</var> may be moved further ahead if the user code wants
  to wait for more packets without returning all the previous slots to the
  kernel.</p>
<p class="Pp">At the next NIOCRXSYNC/select()/poll(), slots up to
  <var class="Va">head-1</var> are returned to the kernel for further
  receives, and <var class="Va">tail</var> may advance to report new incoming
  packets.</p>
<p class="Pp">Below is an example of the evolution of an RX ring:</p>
<div class="Bd Pp Li">
<pre>
    after the syscall, there are some (h)eld and some (R)eceived slots
        head  cur     tail
          |     |       |
          v     v       v
   RX  [..hhhhhhRRRRRRRR..........]

    user advances head and cur, releasing some slots and holding others
            head cur  tail
               v  v     v
   RX  [..*****hhhRRRRRR...........]

    NIOCRXSYNC/poll()/select() recovers slots and reports new packets
            head cur        tail
               v  v           v
   RX  [.......hhhRRRRRRRRRRRR....]</pre>
</div>
</section>
</section>
<section class="Sh">
<h1 class="Sh" id="SLOTS_AND_PACKET_BUFFERS"><a class="permalink" href="#SLOTS_AND_PACKET_BUFFERS">SLOTS
  AND PACKET BUFFERS</a></h1>
<p class="Pp">Normally, packets should be stored in the netmap-allocated buffers
  assigned to slots when ports are bound to a file descriptor. One packet is
  fully contained in a single buffer.</p>
<p class="Pp">The following flags affect slot and buffer processing:</p>
<dl class="Bl-tag">
  <dt id="must">NS_BUF_CHANGED</dt>
  <dd><a class="permalink" href="#must"><i class="Em">must</i></a> be used when
    the <var class="Va">buf_idx</var> in the slot is changed. This can be used
    to implement zero-copy forwarding, see
    <a class="Sx" href="#ZERO_COPY_FORWARDING">ZERO-COPY FORWARDING</a>.</dd>
  <dt>NS_REPORT</dt>
  <dd>reports when this buffer has been transmitted.
Normally,
    <code class="Nm">netmap</code> notifies transmit completions in batches,
    hence signals can be delayed indefinitely. This flag helps detect when
    packets have been sent and a file descriptor can be closed.</dd>
  <dt>NS_FORWARD</dt>
  <dd>When a ring is in 'transparent' mode, packets marked with this flag by the
    user application are forwarded to the other endpoint at the next system
    call, thus restoring (in a selective way) the connection between a NIC and
    the host stack.</dd>
  <dt>NS_NO_LEARN</dt>
  <dd>tells the forwarding code that the source MAC address for this packet must
    not be used in the learning bridge code.</dd>
  <dt>NS_INDIRECT</dt>
  <dd>indicates that the packet's payload is in a user-supplied buffer whose
    user virtual address is in the 'ptr' field of the slot. The size can reach
    65535 bytes.
    <p class="Pp">This is only supported on the transmit ring of
      <code class="Nm">VALE</code> ports, and it helps reduce data copies in
      the interconnection of virtual machines.</p>
  </dd>
  <dt>NS_MOREFRAG</dt>
  <dd>indicates that the packet continues with subsequent buffers; the last
    buffer in a packet must have the flag clear.</dd>
</dl>
</section>
<section class="Sh">
<h1 class="Sh" id="SCATTER_GATHER_I/O"><a class="permalink" href="#SCATTER_GATHER_I/O">SCATTER
  GATHER I/O</a></h1>
<p class="Pp">Packets can span multiple slots if the
  <var class="Va">NS_MOREFRAG</var> flag is set in all but the last slot. The
  maximum length of a chain is 64 buffers.
This is normally used with
  <code class="Nm">VALE</code> ports when connecting virtual machines, as they
  generate large TSO segments that are not split unless they reach a physical
  device.</p>
<p class="Pp">NOTE: The length field always refers to the individual fragment;
  the total length of a packet is not stored anywhere.</p>
<p class="Pp">On receive rings the macro <var class="Va">NS_RFRAGS(slot)</var>
  indicates the remaining number of slots for this packet, including the
  current one. Slots with a value greater than 1 also have NS_MOREFRAG
  set.</p>
</section>
<section class="Sh">
<h1 class="Sh" id="IOCTLS"><a class="permalink" href="#IOCTLS">IOCTLS</a></h1>
<p class="Pp"><code class="Nm">netmap</code> uses two ioctls (NIOCTXSYNC,
  NIOCRXSYNC) for non-blocking I/O. They take no argument. Two more ioctls
  (NIOCGINFO, NIOCREGIF) are used to query and configure ports, with the
  following argument:</p>
<div class="Bd Pp Li">
<pre>struct nmreq {
    char     nr_name[IFNAMSIZ]; /* (i)   port name                  */
    uint32_t nr_version;        /* (i)   API version                */
    uint32_t nr_offset;         /* (o)   nifp offset in mmap region */
    uint32_t nr_memsize;        /* (o)   size of the mmap region    */
    uint32_t nr_tx_slots;       /* (i/o) slots in tx rings          */
    uint32_t nr_rx_slots;       /* (i/o) slots in rx rings          */
    uint16_t nr_tx_rings;       /* (i/o) number of tx rings         */
    uint16_t nr_rx_rings;       /* (i/o) number of rx rings         */
    uint16_t nr_ringid;         /* (i/o) ring(s) we care about      */
    uint16_t nr_cmd;            /* (i)   special command            */
    uint16_t nr_arg1;           /* (i/o) extra arguments            */
    uint16_t nr_arg2;           /* (i/o) extra arguments            */
    uint32_t nr_arg3;           /* (i/o) extra arguments            */
    uint32_t nr_flags;          /* (i/o) open mode                  */
    ...
};</pre>
</div>
<p class="Pp">A file descriptor obtained through
  <span class="Pa">/dev/netmap</span> also supports the ioctls supported by
  network devices, see <a class="Xr">netintro(4)</a>.</p>
<dl class="Bl-tag">
  <dt id="NIOCGINFO"><a class="permalink" href="#NIOCGINFO"><code class="Dv">NIOCGINFO</code></a></dt>
  <dd>returns EINVAL if the named port does not support netmap. Otherwise, it
    returns 0 and (advisory) information about the port. Note that all the
    information below can change before the interface is actually put in
    netmap mode.
    <dl class="Bl-tag">
      <dt><span class="Pa">nr_memsize</span></dt>
      <dd>indicates the size of the <code class="Nm">netmap</code> memory
        region. NICs in <code class="Nm">netmap</code> mode all share the same
        memory region, whereas <code class="Nm">VALE</code> ports have
        independent regions for each port.</dd>
      <dt><span class="Pa">nr_tx_slots</span>,
        <span class="Pa">nr_rx_slots</span></dt>
      <dd>indicate the size of transmit and receive rings.</dd>
      <dt><span class="Pa">nr_tx_rings</span>,
        <span class="Pa">nr_rx_rings</span></dt>
      <dd>indicate the number of transmit and receive rings. Both the number
        and the size of the rings may be configured at runtime using
        interface-specific functions (e.g.,
        <a class="Xr">ethtool(8)</a>).</dd>
    </dl>
  </dd>
  <dt id="NIOCREGIF"><a class="permalink" href="#NIOCREGIF"><code class="Dv">NIOCREGIF</code></a></dt>
  <dd>binds the port named in <var class="Va">nr_name</var> to the file
    descriptor. For a physical device this also switches it into
    <code class="Nm">netmap</code> mode, disconnecting it from the host stack.
    Multiple file descriptors can be bound to the same port, with proper
    synchronization left to the user.
    <p class="Pp">The recommended way to bind a file descriptor to a port is to
      use the function <var class="Va">nm_open(..)</var> (see
      <a class="Sx" href="#LIBRARIES">LIBRARIES</a>), which parses names to
      access specific port types and enable features.
In the following we
      document the main features.</p>
    <p class="Pp" id="netmap"><code class="Dv">NIOCREGIF</code> can also bind a
      file descriptor to one endpoint of a
      <a class="permalink" href="#netmap"><i class="Em">netmap pipe</i></a>,
      consisting of two netmap ports with a crossover connection. A netmap
      pipe shares the same memory space as the parent port, and is meant to
      enable configurations where a master process acts as a dispatcher towards
      slave processes.</p>
    <p class="Pp">To enable this function, the <span class="Pa">nr_arg1</span>
      field of the structure can be used as a hint to the kernel to indicate
      how many pipes we expect to use, and reserve extra space in the memory
      region.</p>
    <p class="Pp">On return, it gives the same info as NIOCGINFO, with
      <span class="Pa">nr_ringid</span> and <span class="Pa">nr_flags</span>
      indicating the identity of the rings controlled through the file
      descriptor.</p>
    <p class="Pp"><var class="Va">nr_flags</var> and
      <var class="Va">nr_ringid</var> select which rings are controlled through
      this file descriptor. Possible values of
      <span class="Pa">nr_flags</span> are indicated below, together with the
      naming schemes that application libraries (such as the
      <code class="Nm">nm_open</code> indicated below) can use to indicate the
      specific set of rings.
In the example below, "netmap:foo" is
      any valid netmap port name.</p>
    <dl class="Bl-tag">
      <dt>NR_REG_ALL_NIC netmap:foo</dt>
      <dd>(default) all hardware ring pairs;</dd>
      <dt>NR_REG_SW netmap:foo^</dt>
      <dd>the ``host rings'', connecting to the host stack;</dd>
      <dt>NR_REG_NIC_SW netmap:foo*</dt>
      <dd>all hardware rings and the host rings;</dd>
      <dt>NR_REG_ONE_NIC netmap:foo-i</dt>
      <dd>only the i-th hardware ring pair, where the number is in
        <span class="Pa">nr_ringid</span>;</dd>
      <dt>NR_REG_PIPE_MASTER netmap:foo{i</dt>
      <dd>the master side of the netmap pipe whose identifier (i) is in
        <span class="Pa">nr_ringid</span>;</dd>
      <dt>NR_REG_PIPE_SLAVE netmap:foo}i</dt>
      <dd>the slave side of the netmap pipe whose identifier (i) is in
        <span class="Pa">nr_ringid</span>.
        <p class="Pp">The identifier of a pipe must be thought of as part of
          the pipe name, and does not need to be sequential. On return the pipe
          will only have a single ring pair with index 0, irrespective of the
          value of <var class="Va">i</var>.</p>
      </dd>
    </dl>
    <p class="Pp">By default, a <a class="Xr">poll(2)</a> or
      <a class="Xr">select(2)</a> call pushes out any pending packets on the
      transmit ring, even if no write events are specified. The feature can be
      disabled by or-ing <var class="Va">NETMAP_NO_TX_POLL</var> into the value
      written to <var class="Va">nr_ringid</var>.
When this feature is used,
      packets are transmitted only when
      <var class="Va">ioctl(NIOCTXSYNC)</var> is called, or when
      <var class="Va">select()</var> / <var class="Va">poll()</var> are
      called with a write event (POLLOUT/wfdset) or a full ring.</p>
    <p class="Pp">When registering with a <code class="Nm">VALE</code> switch a
      virtual interface that is created dynamically, we can specify the
      desired number of rings (1 by default, and currently up to 16) using
      the nr_tx_rings and nr_rx_rings fields.</p>
  </dd>
  <dt id="NIOCTXSYNC"><a class="permalink" href="#NIOCTXSYNC"><code class="Dv">NIOCTXSYNC</code></a></dt>
  <dd>tells the hardware of new packets to transmit, and updates the number of
    slots available for transmission.</dd>
  <dt id="NIOCRXSYNC"><a class="permalink" href="#NIOCRXSYNC"><code class="Dv">NIOCRXSYNC</code></a></dt>
  <dd>tells the hardware of consumed packets, and asks for newly available
    packets.</dd>
</dl>
</section>
<section class="Sh">
<h1 class="Sh" id="SELECT,_POLL,_EPOLL,_KQUEUE"><a class="permalink" href="#SELECT,_POLL,_EPOLL,_KQUEUE">SELECT,
  POLL, EPOLL, KQUEUE</a></h1>
<p class="Pp"><a class="Xr">select(2)</a> and <a class="Xr">poll(2)</a> on a
  <code class="Nm">netmap</code> file descriptor process rings as indicated in
  <a class="Sx" href="#TRANSMIT_RINGS">TRANSMIT RINGS</a> and
  <a class="Sx" href="#RECEIVE_RINGS">RECEIVE RINGS</a>, respectively when
  write (POLLOUT) and read (POLLIN) events are requested. Both block if no
  slots are available in the ring (<var class="Va">ring->cur ==
  ring->tail</var>). Depending on the platform, <a class="Xr">epoll(7)</a>
  and <a class="Xr">kqueue(2)</a> are supported too.</p>
<p class="Pp">Packets in transmit rings are normally pushed out (and buffers
  reclaimed) even without requesting write events. Passing the
  <code class="Dv">NETMAP_NO_TX_POLL</code> flag to
  <i class="Em">NIOCREGIF</i> disables this feature.
By default, receive rings
  are processed only if read events are requested. Passing the
  <code class="Dv">NETMAP_DO_RX_POLL</code> flag to
  <i class="Em">NIOCREGIF</i> updates receive rings even without read
  events. Note that on
  <a class="Xr">epoll(7)</a> and <a class="Xr">kqueue(2)</a>,
  <code class="Dv">NETMAP_NO_TX_POLL</code> and
  <code class="Dv">NETMAP_DO_RX_POLL</code> only have an effect when some
  event is posted for the file descriptor.</p>
</section>
<section class="Sh">
<h1 class="Sh" id="LIBRARIES"><a class="permalink" href="#LIBRARIES">LIBRARIES</a></h1>
<p class="Pp">The <code class="Nm">netmap</code> API is meant to be used
  directly, both because of its simplicity and for efficient integration with
  applications.</p>
<p class="Pp">For convenience, the
  <code class="In"><<a class="In">net/netmap_user.h</a>></code> header
  provides a few macros and functions to ease creating a file descriptor and
  doing I/O with a <code class="Nm">netmap</code> port. These are loosely
  modeled after the <a class="Xr">pcap(3)</a> API, to ease porting of
  libpcap-based applications to <code class="Nm">netmap</code>. To use these
  extra functions, programs should</p>
<div class="Bd Bd-indent"><code class="Li">#define NETMAP_WITH_LIBS</code></div>
before
<div class="Bd Bd-indent"><code class="Li">#include
  <net/netmap_user.h></code></div>
<p class="Pp">The following functions are available:</p>
<dl class="Bl-tag">
  <dt id="struct~4"><var class="Va">struct nm_desc * nm_open(const char *ifname,
    const struct nmreq *req, uint64_t flags, const struct nm_desc
    *arg</var>)</dt>
  <dd>similar to <a class="Xr">pcap_open_live(3)</a>, binds a file descriptor to
    a port.
    <dl class="Bl-tag">
      <dt id="ifname"><var class="Va">ifname</var></dt>
      <dd>is a port name, in the form "netmap:PPP" for a NIC and
        "valeSSS:PPP" for a <code class="Nm">VALE</code> port.</dd>
      <dt id="req"><var class="Va">req</var></dt>
      <dd>provides the initial values for the argument to the NIOCREGIF ioctl.
        The nm_flags and nm_ringid values are overwritten by parsing ifname
        and flags, and other fields can be overridden through the other two
        arguments.</dd>
      <dt id="arg"><var class="Va">arg</var></dt>
      <dd>points to a struct nm_desc containing arguments (e.g., from a
        previously open file descriptor) that should override the defaults.
        The fields are used as described below.</dd>
      <dt id="flags"><var class="Va">flags</var></dt>
      <dd>can be set to a combination of the following flags:
        <var class="Va">NETMAP_NO_TX_POLL</var>,
        <var class="Va">NETMAP_DO_RX_POLL</var> (copied into nr_ringid);
        <var class="Va">NM_OPEN_NO_MMAP</var> (if arg points to the same
        memory region, avoids the mmap and uses the values from it);
        <var class="Va">NM_OPEN_IFNAME</var> (ignores ifname and uses the
        values in arg); <var class="Va">NM_OPEN_ARG1</var>,
        <var class="Va">NM_OPEN_ARG2</var>, <var class="Va">NM_OPEN_ARG3</var>
        (uses the fields from arg); <var class="Va">NM_OPEN_RING_CFG</var>
        (uses the ring number and sizes from arg).</dd>
    </dl>
  </dd>
  <dt id="int"><var class="Va">int nm_close(struct nm_desc *d</var>)</dt>
  <dd>closes the file descriptor, unmaps memory, frees resources.</dd>
  <dt id="int~2"><var class="Va">int nm_inject(struct nm_desc *d, const void
    *buf, size_t size</var>)</dt>
  <dd>similar to <var class="Va">pcap_inject()</var>, pushes a packet to a
    ring; returns the size of the packet if successful, or 0 on error;</dd>
  <dt id="int~3"><var class="Va">int nm_dispatch(struct nm_desc *d, int cnt,
    nm_cb_t cb, u_char *arg</var>)</dt>
  <dd>similar to <var class="Va">pcap_dispatch()</var>, applies a callback to
    incoming
packets</dd> + <dt id="u_char"><var class="Va">u_char * nm_nextpkt(struct nm_desc *d, struct + nm_pkthdr *hdr</var>)</dt> + <dd>similar to <var class="Va">pcap_next()</var>, fetches the next packet</dd> +</dl> +</section> +<section class="Sh"> +<h1 class="Sh" id="SUPPORTED_DEVICES"><a class="permalink" href="#SUPPORTED_DEVICES">SUPPORTED + DEVICES</a></h1> +<p class="Pp"><code class="Nm">netmap</code> natively supports the following + devices:</p> +<p class="Pp">On <span class="Ux">FreeBSD</span>: <a class="Xr">cxgbe(4)</a>, + <a class="Xr">em(4)</a>, <a class="Xr">iflib(4)</a> (providing + <a class="Xr">igb(4)</a> and <a class="Xr">em(4)</a>), + <a class="Xr">ix(4)</a>, <a class="Xr">ixl(4)</a>, <a class="Xr">re(4)</a>, + <a class="Xr">vtnet(4)</a>.</p> +<p class="Pp">On Linux e1000, e1000e, i40e, igb, ixgbe, ixgbevf, r8169, + virtio_net, vmxnet3.</p> +<p class="Pp">NICs without native support can still be used in + <code class="Nm">netmap</code> mode through emulation. Performance is + inferior to native netmap mode but still significantly higher than various + raw socket types (bpf, PF_PACKET, etc.). Note that for slow devices (such as + 1 Gbit/s and slower NICs, or several 10 Gbit/s NICs whose hardware is unable + to sustain line rate), emulated and native mode will likely have similar or + same throughput.</p> +<p class="Pp">When emulation is in use, packet sniffer programs such as tcpdump + could see received packets before they are diverted by netmap. This + behaviour is not intentional, being just an artifact of the implementation + of emulation. 
Note that in case the netmap application subsequently moves + packets received from the emulated adapter onto the host RX ring, the + sniffer will intercept those packets again, since the packets are injected + to the host stack as they were received by the network interface.</p> +<p class="Pp">Emulation is also available for devices with native netmap + support, which can be used for testing or performance comparison. The sysctl + variable <var class="Va">dev.netmap.admode</var> globally controls how + netmap mode is implemented.</p> +</section> +<section class="Sh"> +<h1 class="Sh" id="SYSCTL_VARIABLES_AND_MODULE_PARAMETERS"><a class="permalink" href="#SYSCTL_VARIABLES_AND_MODULE_PARAMETERS">SYSCTL + VARIABLES AND MODULE PARAMETERS</a></h1> +<p class="Pp">Some aspects of the operation of <code class="Nm">netmap</code> + and <code class="Nm">VALE</code> are controlled through sysctl variables on + <span class="Ux">FreeBSD</span> + (<a class="permalink" href="#dev.netmap.*"><i class="Em" id="dev.netmap.*">dev.netmap.*</i></a>) + and module parameters on Linux + (<a class="permalink" href="#/sys/module/netmap/parameters/*"><i class="Em" id="/sys/module/netmap/parameters/*">/sys/module/netmap/parameters/*</i></a>):</p> +<dl class="Bl-tag"> + <dt id="dev.netmap.admode:"><var class="Va">dev.netmap.admode: 0</var></dt> + <dd>Controls the use of native or emulated adapter mode. 
+ <p class="Pp">0 uses the best available option;</p> + <p class="Pp">1 forces native mode and fails if not available;</p> + <p class="Pp">2 forces emulated hence never fails.</p> + </dd> + <dt id="dev.netmap.generic_rings:"><var class="Va">dev.netmap.generic_rings: + 1</var></dt> + <dd>Number of rings used for emulated netmap mode</dd> + <dt id="dev.netmap.generic_ringsize:"><var class="Va">dev.netmap.generic_ringsize: + 1024</var></dt> + <dd>Ring size used for emulated netmap mode</dd> + <dt id="dev.netmap.generic_mit:"><var class="Va">dev.netmap.generic_mit: + 100000</var></dt> + <dd>Controls interrupt moderation for emulated mode</dd> + <dt id="dev.netmap.fwd:"><var class="Va">dev.netmap.fwd: 0</var></dt> + <dd>Forces NS_FORWARD mode</dd> + <dt id="dev.netmap.txsync_retry:"><var class="Va">dev.netmap.txsync_retry: + 2</var></dt> + <dd>Number of txsync loops in the <code class="Nm">VALE</code> flush + function</dd> + <dt id="dev.netmap.no_pendintr:"><var class="Va">dev.netmap.no_pendintr: + 1</var></dt> + <dd>Forces recovery of transmit buffers on system calls</dd> + <dt id="dev.netmap.no_timestamp:"><var class="Va">dev.netmap.no_timestamp: + 0</var></dt> + <dd>Disables the update of the timestamp in the netmap ring</dd> + <dt id="dev.netmap.verbose:"><var class="Va">dev.netmap.verbose: 0</var></dt> + <dd>Verbose kernel messages</dd> + <dt id="dev.netmap.buf_num:"><var class="Va">dev.netmap.buf_num: + 163840</var></dt> + <dd style="width: auto;"> </dd> + <dt id="dev.netmap.buf_size:"><var class="Va">dev.netmap.buf_size: + 2048</var></dt> + <dd style="width: auto;"> </dd> + <dt id="dev.netmap.ring_num:"><var class="Va">dev.netmap.ring_num: + 200</var></dt> + <dd style="width: auto;"> </dd> + <dt id="dev.netmap.ring_size:"><var class="Va">dev.netmap.ring_size: + 36864</var></dt> + <dd style="width: auto;"> </dd> + <dt id="dev.netmap.if_num:"><var class="Va">dev.netmap.if_num: 100</var></dt> + <dd style="width: auto;"> </dd> + <dt id="dev.netmap.if_size:"><var 
class="Va">dev.netmap.if_size: + 1024</var></dt> + <dd>Sizes and number of objects (netmap_if, netmap_ring, buffers) for the + global memory region. The only parameter worth modifying is + <var class="Va">dev.netmap.buf_num</var> as it impacts the total amount of + memory used by netmap.</dd> + <dt id="dev.netmap.buf_curr_num:"><var class="Va">dev.netmap.buf_curr_num: + 0</var></dt> + <dd style="width: auto;"> </dd> + <dt id="dev.netmap.buf_curr_size:"><var class="Va">dev.netmap.buf_curr_size: + 0</var></dt> + <dd style="width: auto;"> </dd> + <dt id="dev.netmap.ring_curr_num:"><var class="Va">dev.netmap.ring_curr_num: + 0</var></dt> + <dd style="width: auto;"> </dd> + <dt id="dev.netmap.ring_curr_size:"><var class="Va">dev.netmap.ring_curr_size: + 0</var></dt> + <dd style="width: auto;"> </dd> + <dt id="dev.netmap.if_curr_num:"><var class="Va">dev.netmap.if_curr_num: + 0</var></dt> + <dd style="width: auto;"> </dd> + <dt id="dev.netmap.if_curr_size:"><var class="Va">dev.netmap.if_curr_size: + 0</var></dt> + <dd>Actual values in use.</dd> + <dt id="dev.netmap.priv_buf_num:"><var class="Va">dev.netmap.priv_buf_num: + 4098</var></dt> + <dd style="width: auto;"> </dd> + <dt id="dev.netmap.priv_buf_size:"><var class="Va">dev.netmap.priv_buf_size: + 2048</var></dt> + <dd style="width: auto;"> </dd> + <dt id="dev.netmap.priv_ring_num:"><var class="Va">dev.netmap.priv_ring_num: + 4</var></dt> + <dd style="width: auto;"> </dd> + <dt id="dev.netmap.priv_ring_size:"><var class="Va">dev.netmap.priv_ring_size: + 20480</var></dt> + <dd style="width: auto;"> </dd> + <dt id="dev.netmap.priv_if_num:"><var class="Va">dev.netmap.priv_if_num: + 2</var></dt> + <dd style="width: auto;"> </dd> + <dt id="dev.netmap.priv_if_size:"><var class="Va">dev.netmap.priv_if_size: + 1024</var></dt> + <dd>Sizes and number of objects (netmap_if, netmap_ring, buffers) for private + memory regions. 
A separate memory region is used for each + <code class="Nm">VALE</code> port and each pair of <code class="Nm">netmap + pipes</code>.</dd> + <dt id="dev.netmap.bridge_batch:"><var class="Va">dev.netmap.bridge_batch: + 1024</var></dt> + <dd>Batch size used when moving packets across a <code class="Nm">VALE</code> + switch. Values above 64 generally guarantee good performance.</dd> + <dt id="dev.netmap.max_bridges:"><var class="Va">dev.netmap.max_bridges: + 8</var></dt> + <dd>Max number of <code class="Nm">VALE</code> switches that can be created. + This tunable can be specified at loader time.</dd> + <dt id="dev.netmap.ptnet_vnet_hdr:"><var class="Va">dev.netmap.ptnet_vnet_hdr: + 1</var></dt> + <dd>Allow ptnet devices to use virtio-net headers</dd> + <dt id="dev.netmap.port_numa_affinity:"><var class="Va">dev.netmap.port_numa_affinity: + 0</var></dt> + <dd>On <a class="Xr">numa(4)</a> systems, allocate memory for netmap ports + from the local NUMA domain when possible. This can improve performance by + reducing the number of remote memory accesses. However, when forwarding + packets between ports attached to different NUMA domains, this will + prevent zero-copy forwarding optimizations and thus may hurt performance. + Note that this setting must be specified as a loader tunable at boot + time.</dd> +</dl> +</section> +<section class="Sh"> +<h1 class="Sh" id="SYSTEM_CALLS"><a class="permalink" href="#SYSTEM_CALLS">SYSTEM + CALLS</a></h1> +<p class="Pp"><code class="Nm">netmap</code> uses <a class="Xr">select(2)</a>, + <a class="Xr">poll(2)</a>, <a class="Xr">epoll(7)</a> and + <a class="Xr">kqueue(2)</a> to wake up processes when significant events + occur, and <a class="Xr">mmap(2)</a> to map memory. 
+ <a class="Xr">ioctl(2)</a> is used to configure ports and
+ <code class="Nm">VALE switches</code>.</p>
+<p class="Pp">Applications may need to create threads and bind them to specific
+ cores to improve performance, using standard OS primitives, see
+ <a class="Xr">pthread(3)</a>. In particular,
+ <a class="Xr">pthread_setaffinity_np(3)</a> may be of use.</p>
+</section>
+<section class="Sh">
+<h1 class="Sh" id="EXAMPLES"><a class="permalink" href="#EXAMPLES">EXAMPLES</a></h1>
+<section class="Ss">
+<h2 class="Ss" id="TEST_PROGRAMS"><a class="permalink" href="#TEST_PROGRAMS">TEST
+ PROGRAMS</a></h2>
+<p class="Pp"><code class="Nm">netmap</code> comes with a few programs that can
+ be used for testing or simple applications. See the
+ <span class="Pa">examples/</span> directory in
+ <code class="Nm">netmap</code> distributions, or the
+ <span class="Pa">tools/tools/netmap/</span> directory in
+ <span class="Ux">FreeBSD</span> distributions.</p>
+<p class="Pp"><a class="Xr">pkt-gen(8)</a> is a general purpose traffic
+ source/sink.</p>
+<p class="Pp">As an example</p>
+<div class="Bd Bd-indent"><code class="Li">pkt-gen -i ix0 -f tx -l
+ 60</code></div>
+can generate an infinite stream of minimum size packets, and
+<div class="Bd Bd-indent"><code class="Li">pkt-gen -i ix0 -f rx</code></div>
+is a traffic sink. Both print traffic statistics, to help monitor how the system
+ performs.
+<p class="Pp"><a class="Xr">pkt-gen(8)</a> has many options that can be used to
+ set packet sizes, addresses, rates, and to use multiple send/receive threads
+ and cores.</p>
+<p class="Pp"><a class="Xr">bridge(8)</a> is another test program which
+ interconnects two <code class="Nm">netmap</code> ports.
It can be used for
+ transparent forwarding between interfaces, as in</p>
+<div class="Bd Bd-indent"><code class="Li">bridge -i netmap:ix0 -i
+ netmap:ix1</code></div>
+or even to connect the NIC to the host stack using netmap
+<div class="Bd Bd-indent"><code class="Li">bridge -i netmap:ix0</code></div>
+</section>
+<section class="Ss">
+<h2 class="Ss" id="USING_THE_NATIVE_API"><a class="permalink" href="#USING_THE_NATIVE_API">USING
+ THE NATIVE API</a></h2>
+<p class="Pp">The following code implements a traffic generator:</p>
+<p class="Pp"></p>
+<div class="Bd Li">
+<pre>#include <net/netmap_user.h>
+...
+void sender(void)
+{
+	struct netmap_if *nifp;
+	struct netmap_ring *ring;
+	struct nmreq nmr;
+	struct pollfd fds;
+	void *p;
+	char *buf;
+	int fd;
+	unsigned int i;
+
+	fd = open("/dev/netmap", O_RDWR);
+	bzero(&nmr, sizeof(nmr));
+	strcpy(nmr.nr_name, "ix0");
+	nmr.nr_version = NETMAP_API;
+	ioctl(fd, NIOCREGIF, &nmr);
+	p = mmap(0, nmr.nr_memsize, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
+	nifp = NETMAP_IF(p, nmr.nr_offset);
+	ring = NETMAP_TXRING(nifp, 0);
+	fds.fd = fd;
+	fds.events = POLLOUT;
+	for (;;) {
+		poll(&fds, 1, -1);
+		while (!nm_ring_empty(ring)) {
+			i = ring->cur;
+			buf = NETMAP_BUF(ring, ring->slot[i].buf_idx);
+			... prepare packet in buf ...
+			ring->slot[i].len = ... packet length ...
+			ring->head = ring->cur = nm_ring_next(ring, i);
+		}
+	}
+}</pre>
+</div>
+</section>
+<section class="Ss">
+<h2 class="Ss" id="HELPER_FUNCTIONS"><a class="permalink" href="#HELPER_FUNCTIONS">HELPER
+ FUNCTIONS</a></h2>
+<p class="Pp">A simple receiver can be implemented using the helper
+ functions:</p>
+<p class="Pp"></p>
+<div class="Bd Li">
+<pre>#define NETMAP_WITH_LIBS
+#include <net/netmap_user.h>
+...
+void receiver(void)
+{
+	struct nm_desc *d;
+	struct pollfd fds;
+	u_char *buf;
+	struct nm_pkthdr h;
+	...
+	d = nm_open("netmap:ix0", NULL, 0, 0);
+	fds.fd = NETMAP_FD(d);
+	fds.events = POLLIN;
+	for (;;) {
+		poll(&fds, 1, -1);
+		while ( (buf = nm_nextpkt(d, &h)) )
+			consume_pkt(buf, h.len);
+	}
+	nm_close(d);
+}</pre>
+</div>
+</section>
+<section class="Ss">
+<h2 class="Ss" id="ZERO-COPY_FORWARDING"><a class="permalink" href="#ZERO-COPY_FORWARDING">ZERO-COPY
+ FORWARDING</a></h2>
+<p class="Pp">Since physical interfaces share the same memory region, it is
+ possible to do packet forwarding between ports by swapping buffers. The
+ buffer from the transmit ring is used to replenish the receive ring:</p>
+<p class="Pp"></p>
+<div class="Bd Li">
+<pre>	uint32_t tmp;
+	struct netmap_slot *src, *dst;
+	...
+	src = &rxr->slot[rxr->cur];
+	dst = &txr->slot[txr->cur];
+	tmp = dst->buf_idx;
+	dst->buf_idx = src->buf_idx;
+	dst->len = src->len;
+	dst->flags = NS_BUF_CHANGED;
+	src->buf_idx = tmp;
+	src->flags = NS_BUF_CHANGED;
+	rxr->head = rxr->cur = nm_ring_next(rxr, rxr->cur);
+	txr->head = txr->cur = nm_ring_next(txr, txr->cur);
+	...</pre>
+</div>
+</section>
+<section class="Ss">
+<h2 class="Ss" id="ACCESSING_THE_HOST_STACK"><a class="permalink" href="#ACCESSING_THE_HOST_STACK">ACCESSING
+ THE HOST STACK</a></h2>
+<p class="Pp">The host stack is for all practical purposes just a regular ring
+ pair, which you can access with the netmap API (e.g., with</p>
+<div class="Bd Bd-indent"><code class="Li">nm_open("netmap:eth0^",
+ ...</code></div>
+); All packets that the host would send to an interface in
+ <code class="Nm">netmap</code> mode end up into the RX ring, whereas all
+ packets queued to the TX ring are sent up to the host stack.
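<p class="Pp">Whether a ring belongs to a NIC, to the host stack, or to a
 <code class="Nm">VALE</code> port, the index arithmetic used in the examples
 above is the same: slot indices advance modulo the ring size, and the slots
 between cur and tail belong to the application. The following sketch
 reproduces the semantics of the nm_ring_next(), nm_ring_empty() and
 nm_ring_space() helpers; struct mock_ring and the lowercase function names
 are illustrative stand-ins, not the real struct netmap_ring layout from
 <net/netmap.h>.</p>

```c
#include <assert.h>
#include <stdint.h>

/*
 * Mock of the three ring indices and the ring size. The real
 * struct netmap_ring (declared in <net/netmap.h>) has more fields;
 * only the index arithmetic is reproduced here.
 */
struct mock_ring {
	uint32_t head;      /* first slot owned by the application */
	uint32_t cur;       /* wakeup point for poll()/select() */
	uint32_t tail;      /* first slot owned by the kernel */
	uint32_t num_slots; /* ring size */
};

/* Equivalent of nm_ring_next(): advance an index, wrapping at the end. */
static uint32_t
ring_next(const struct mock_ring *r, uint32_t i)
{
	return (i + 1 == r->num_slots) ? 0 : i + 1;
}

/* Equivalent of nm_ring_empty(): no slots available to the application. */
static int
ring_empty(const struct mock_ring *r)
{
	return (r->cur == r->tail);
}

/* Equivalent of nm_ring_space(): number of slots available to the
 * application, taking wrap-around into account. */
static uint32_t
ring_space(const struct mock_ring *r)
{
	int64_t space = (int64_t)r->tail - (int64_t)r->cur;

	if (space < 0)
		space += r->num_slots;
	return ((uint32_t)space);
}
```

For instance, on a 4-slot ring with cur = 3 and tail = 1 two slots (3 and 0)
are available, and advancing past the last slot wraps back to slot 0.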
+</section> +<section class="Ss"> +<h2 class="Ss" id="VALE_SWITCH"><a class="permalink" href="#VALE_SWITCH">VALE + SWITCH</a></h2> +<p class="Pp">A simple way to test the performance of a + <code class="Nm">VALE</code> switch is to attach a sender and a receiver to + it, e.g., running the following in two different terminals:</p> +<div class="Bd Bd-indent"><code class="Li">pkt-gen -i vale1:a -f rx # + receiver</code></div> +<div class="Bd Bd-indent"><code class="Li">pkt-gen -i vale1:b -f tx # + sender</code></div> +The same example can be used to test netmap pipes, by simply changing port + names, e.g., +<div class="Bd Bd-indent"><code class="Li">pkt-gen -i vale2:x{3 -f rx # receiver + on the master side</code></div> +<div class="Bd Bd-indent"><code class="Li">pkt-gen -i vale2:x}3 -f tx # sender + on the slave side</code></div> +<p class="Pp">The following command attaches an interface and the host stack to + a switch:</p> +<div class="Bd Bd-indent"><code class="Li">valectl -h vale2:em0</code></div> +Other <code class="Nm">netmap</code> clients attached to the same switch can now + communicate with the network card or the host. 
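<p class="Pp">The buffer swap used in the ZERO-COPY FORWARDING example above
 can be exercised in isolation. The sketch below is a stand-alone
 illustration: struct mock_slot and zc_swap() are hypothetical names that
 mirror only the buf_idx, len and flags fields of the real
 struct netmap_slot from <net/netmap.h>.</p>

```c
#include <assert.h>
#include <stdint.h>

#define NS_BUF_CHANGED 0x0001 /* slot flag, as in <net/netmap.h> */

/*
 * Mock of the slot fields involved in the swap; the real
 * struct netmap_slot carries an additional pointer field.
 */
struct mock_slot {
	uint32_t buf_idx; /* index of the attached packet buffer */
	uint16_t len;     /* packet length */
	uint16_t flags;
};

/*
 * Forward one packet without copying it: exchange the buffer indices
 * of an RX slot and a TX slot, and flag both slots so the kernel
 * reloads the buffer addresses on the next txsync/rxsync.
 */
static void
zc_swap(struct mock_slot *rx, struct mock_slot *tx)
{
	uint32_t tmp = tx->buf_idx;

	tx->buf_idx = rx->buf_idx;
	tx->len = rx->len;
	tx->flags = NS_BUF_CHANGED;
	rx->buf_idx = tmp;
	rx->flags = NS_BUF_CHANGED;
}
```

After the swap the TX slot owns the freshly received buffer, while the spare
buffer that was sitting in the TX slot replenishes the receive ring; no
payload bytes are copied.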
+</section> +</section> +<section class="Sh"> +<h1 class="Sh" id="SEE_ALSO"><a class="permalink" href="#SEE_ALSO">SEE + ALSO</a></h1> +<p class="Pp"><a class="Xr">vale(4)</a>, <a class="Xr">bridge(8)</a>, + <a class="Xr">lb(8)</a>, <a class="Xr">nmreplay(8)</a>, + <a class="Xr">pkt-gen(8)</a>, <a class="Xr">valectl(8)</a></p> +<p class="Pp"><span class="Pa">http://info.iet.unipi.it/~luigi/netmap/</span></p> +<p class="Pp">Luigi Rizzo, Revisiting network I/O APIs: the netmap framework, + Communications of the ACM, 55 (3), pp.45-51, March 2012</p> +<p class="Pp">Luigi Rizzo, netmap: a novel framework for fast packet I/O, Usenix + ATC'12, June 2012, Boston</p> +<p class="Pp">Luigi Rizzo, Giuseppe Lettieri, VALE, a switched ethernet for + virtual machines, ACM CoNEXT'12, December 2012, Nice</p> +<p class="Pp">Luigi Rizzo, Giuseppe Lettieri, Vincenzo Maffione, Speeding up + packet I/O in virtual machines, ACM/IEEE ANCS'13, October 2013, San Jose</p> +</section> +<section class="Sh"> +<h1 class="Sh" id="AUTHORS"><a class="permalink" href="#AUTHORS">AUTHORS</a></h1> +<p class="Pp">The <code class="Nm">netmap</code> framework has been originally + designed and implemented at the Universita` di Pisa in 2011 by + <span class="An">Luigi Rizzo</span>, and further extended with help from + <span class="An">Matteo Landi</span>, <span class="An">Gaetano + Catalli</span>, <span class="An">Giuseppe Lettieri</span>, and + <span class="An">Vincenzo Maffione</span>.</p> +<p class="Pp"><code class="Nm">netmap</code> and <code class="Nm">VALE</code> + have been funded by the European Commission within FP7 Projects CHANGE + (257422) and OPENLAB (287581).</p> +</section> +<section class="Sh"> +<h1 class="Sh" id="CAVEATS"><a class="permalink" href="#CAVEATS">CAVEATS</a></h1> +<p class="Pp">No matter how fast the CPU and OS are, achieving line rate on 10G + and faster interfaces requires hardware with sufficient performance. 
Several + NICs are unable to sustain line rate with small packet sizes. Insufficient + PCIe or memory bandwidth can also cause reduced performance.</p> +<p class="Pp">Another frequent reason for low performance is the use of flow + control on the link: a slow receiver can limit the transmit speed. Be sure + to disable flow control when running high speed experiments.</p> +<section class="Ss"> +<h2 class="Ss" id="SPECIAL_NIC_FEATURES"><a class="permalink" href="#SPECIAL_NIC_FEATURES">SPECIAL + NIC FEATURES</a></h2> +<p class="Pp"><code class="Nm">netmap</code> is orthogonal to some NIC features + such as multiqueue, schedulers, packet filters.</p> +<p class="Pp">Multiple transmit and receive rings are supported natively and can + be configured with ordinary OS tools, such as <a class="Xr">ethtool(8)</a> + or device-specific sysctl variables. The same goes for Receive Packet + Steering (RPS) and filtering of incoming traffic.</p> +<p class="Pp" id="does"><code class="Nm">netmap</code> + <a class="permalink" href="#does"><i class="Em">does not use</i></a> + features such as + <a class="permalink" href="#checksum"><i class="Em" id="checksum">checksum + offloading</i></a>, + <a class="permalink" href="#TCP"><i class="Em" id="TCP">TCP segmentation + offloading</i></a>, + <a class="permalink" href="#encryption"><i class="Em" id="encryption">encryption</i></a>, + <a class="permalink" href="#VLAN"><i class="Em" id="VLAN">VLAN + encapsulation/decapsulation</i></a>, etc. When using netmap to exchange + packets with the host stack, make sure to disable these features.</p> +</section> +</section> +</div> +<table class="foot"> + <tr> + <td class="foot-date">October 10, 2024</td> + <td class="foot-os">FreeBSD 15.0</td> + </tr> +</table> |
