Diffstat (limited to 'static/freebsd/man4/netmap.4 3.html')
-rw-r--r--  static/freebsd/man4/netmap.4 3.html  1015
1 file changed, 1015 insertions, 0 deletions
diff --git a/static/freebsd/man4/netmap.4 3.html b/static/freebsd/man4/netmap.4 3.html
new file mode 100644
index 00000000..e0cbff0e
--- /dev/null
+++ b/static/freebsd/man4/netmap.4 3.html
@@ -0,0 +1,1015 @@
+<table class="head">
+ <tr>
+ <td class="head-ltitle">NETMAP(4)</td>
+ <td class="head-vol">Device Drivers Manual</td>
+ <td class="head-rtitle">NETMAP(4)</td>
+ </tr>
+</table>
+<div class="manual-text">
+<section class="Sh">
+<h1 class="Sh" id="NAME"><a class="permalink" href="#NAME">NAME</a></h1>
+<p class="Pp"><code class="Nm">netmap</code> &#x2014; <span class="Nd">a
+ framework for fast packet I/O</span></p>
+</section>
+<section class="Sh">
+<h1 class="Sh" id="SYNOPSIS"><a class="permalink" href="#SYNOPSIS">SYNOPSIS</a></h1>
+<p class="Pp"><code class="Cd">device netmap</code></p>
+</section>
+<section class="Sh">
+<h1 class="Sh" id="DESCRIPTION"><a class="permalink" href="#DESCRIPTION">DESCRIPTION</a></h1>
+<p class="Pp"><code class="Nm">netmap</code> is a framework for extremely fast
+ and efficient packet I/O for userspace and kernel clients, and for Virtual
+ Machines. It runs on <span class="Ux">FreeBSD</span>, Linux and some
+ versions of Windows, and supports a variety of <code class="Nm">netmap
+ ports</code>, including</p>
+<dl class="Bl-tag">
+ <dt><code class="Nm">physical NIC ports</code></dt>
+ <dd>to access individual queues of network interfaces;</dd>
+ <dt><code class="Nm">host ports</code></dt>
+ <dd>to inject packets into the host stack;</dd>
+ <dt><code class="Nm">VALE ports</code></dt>
+ <dd>implementing a very fast and modular in-kernel software
+ switch/dataplane;</dd>
+ <dt><code class="Nm">netmap pipes</code></dt>
+ <dd>a shared memory packet transport channel;</dd>
+ <dt><code class="Nm">netmap monitors</code></dt>
+  <dd>a mechanism similar to <a class="Xr">bpf(4)</a> to capture traffic.</dd>
+</dl>
+<p class="Pp">All these <code class="Nm">netmap ports</code> are accessed
+ interchangeably with the same API, and are at least one order of magnitude
+ faster than standard OS mechanisms (sockets, bpf, tun/tap interfaces, native
+ switches, pipes). With suitably fast hardware (NICs, PCIe buses, CPUs),
+ packet I/O using <code class="Nm">netmap</code> on supported NICs reaches
+ 14.88 million packets per second (Mpps) with much less than one core on 10
+ Gbit/s NICs; 35-40 Mpps on 40 Gbit/s NICs (limited by the hardware); about
+ 20 Mpps per core for VALE ports; and over 100 Mpps for
+ <code class="Nm">netmap pipes</code>. NICs without native
+ <code class="Nm">netmap</code> support can still use the API in emulated
+ mode, which uses unmodified device drivers and is 3-5 times faster than
+ <a class="Xr">bpf(4)</a> or raw sockets.</p>
+<p class="Pp">Userspace clients can dynamically switch NICs into
+ <code class="Nm">netmap</code> mode and send and receive raw packets through
+ memory mapped buffers. Similarly, <code class="Nm">VALE</code> switch
+ instances and ports, <code class="Nm">netmap pipes</code> and
+ <code class="Nm">netmap monitors</code> can be created dynamically,
+ providing high speed packet I/O between processes, virtual machines, NICs
+ and the host stack.</p>
+<p class="Pp"><code class="Nm">netmap</code> supports both non-blocking I/O
+  through <a class="Xr">ioctl(2)</a>, and synchronization and blocking I/O
+  through a file descriptor and standard OS mechanisms such as
+ <a class="Xr">select(2)</a>, <a class="Xr">poll(2)</a>,
+ <a class="Xr">kqueue(2)</a> and <a class="Xr">epoll(7)</a>. All types of
+ <code class="Nm">netmap ports</code> and the <code class="Nm">VALE
+ switch</code> are implemented by a single kernel module, which also emulates
+ the <code class="Nm">netmap</code> API over standard drivers. For best
+ performance, <code class="Nm">netmap</code> requires native support in
+ device drivers. A list of such devices is at the end of this document.</p>
+<p class="Pp">In the rest of this (long) manual page we document various aspects
+ of the <code class="Nm">netmap</code> and <code class="Nm">VALE</code>
+ architecture, features and usage.</p>
+</section>
+<section class="Sh">
+<h1 class="Sh" id="ARCHITECTURE"><a class="permalink" href="#ARCHITECTURE">ARCHITECTURE</a></h1>
+<p class="Pp"><code class="Nm">netmap</code> supports raw packet I/O through a
+ <a class="permalink" href="#port"><i class="Em" id="port">port</i></a>,
+ which can be connected to a physical interface
+ (<a class="permalink" href="#NIC"><i class="Em" id="NIC">NIC</i></a>), to
+ the host stack, or to a <code class="Nm">VALE</code> switch. Ports use
+ preallocated circular queues of buffers
+ (<a class="permalink" href="#rings"><i class="Em" id="rings">rings</i></a>)
+ residing in an mmapped region. There is one ring for each transmit/receive
+ queue of a NIC or virtual port. An additional ring pair connects to the host
+ stack.</p>
+<p class="Pp">After binding a file descriptor to a port, a
+ <code class="Nm">netmap</code> client can send or receive packets in batches
+ through the rings, and possibly implement zero-copy forwarding between
+ ports.</p>
+<p class="Pp">All NICs operating in <code class="Nm">netmap</code> mode use the
+  same memory region, accessible to all processes that own
+ <span class="Pa">/dev/netmap</span> file descriptors bound to NICs.
+ Independent <code class="Nm">VALE</code> and <code class="Nm">netmap
+ pipe</code> ports by default use separate memory regions, but can be
+ independently configured to share memory.</p>
+</section>
+<section class="Sh">
+<h1 class="Sh" id="ENTERING_AND_EXITING_NETMAP_MODE"><a class="permalink" href="#ENTERING_AND_EXITING_NETMAP_MODE">ENTERING
+ AND EXITING NETMAP MODE</a></h1>
+<p class="Pp">The following section describes the system calls to create and
+ control <code class="Nm">netmap</code> ports (including
+ <code class="Nm">VALE</code> and <code class="Nm">netmap pipe</code> ports).
+ Simpler, higher level functions are described in the
+ <a class="Sx" href="#LIBRARIES">LIBRARIES</a> section.</p>
+<p class="Pp">Ports and rings are created and controlled through a file
+ descriptor, created by opening a special device</p>
+<div class="Bd Bd-indent"><code class="Li">fd =
+  open(&quot;/dev/netmap&quot;, O_RDWR);</code></div>
+and then bound to a specific port with an
+<div class="Bd Bd-indent"><code class="Li">ioctl(fd, NIOCREGIF, (struct nmreq
+ *)arg);</code></div>
+<p class="Pp"><code class="Nm">netmap</code> has multiple modes of operation
+ controlled by the <var class="Vt">struct nmreq</var> argument.
+ <var class="Va">arg.nr_name</var> specifies the netmap port name, as
+ follows:</p>
+<dl class="Bl-tag">
+ <dt id="OS"><a class="permalink" href="#OS"><code class="Dv">OS network
+ interface name (e.g., 'em0', 'eth1', ...</code></a>)</dt>
+ <dd>the data path of the NIC is disconnected from the host stack, and the file
+ descriptor is bound to the NIC (one or all queues), or to the host
+ stack;</dd>
+ <dt id="valeSSS:PPP"><a class="permalink" href="#valeSSS:PPP"><code class="Dv">valeSSS:PPP</code></a></dt>
+ <dd>the file descriptor is bound to port PPP of VALE switch SSS. Switch
+ instances and ports are dynamically created if necessary.
+    <p class="Pp">Both SSS and PPP have the form [0-9a-zA-Z_]+, the string
+ cannot exceed IFNAMSIZ characters, and PPP cannot be the name of any
+ existing OS network interface.</p>
+ </dd>
+</dl>
+<p class="Pp">On return, <var class="Va">arg</var> indicates the size of the
+ shared memory region, and the number, size and location of all the
+ <code class="Nm">netmap</code> data structures, which can be accessed by
+ mmapping the memory</p>
+<div class="Bd Bd-indent"><code class="Li">char *mem = mmap(0, arg.nr_memsize,
+  PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);</code></div>
+<p class="Pp">Non-blocking I/O is done with special <a class="Xr">ioctl(2)</a>
+  calls; <a class="Xr">select(2)</a> and <a class="Xr">poll(2)</a> on the file
+  descriptor permit blocking I/O.</p>
+<p class="Pp">While a NIC is in <code class="Nm">netmap</code> mode, the OS will
+ still believe the interface is up and running. OS-generated packets for that
+ NIC end up into a <code class="Nm">netmap</code> ring, and another ring is
+ used to send packets into the OS network stack. A <a class="Xr">close(2)</a>
+ on the file descriptor removes the binding, and returns the NIC to normal
+ mode (reconnecting the data path to the host stack), or destroys the virtual
+ port.</p>
+</section>
+<section class="Sh">
+<h1 class="Sh" id="DATA_STRUCTURES"><a class="permalink" href="#DATA_STRUCTURES">DATA
+ STRUCTURES</a></h1>
+<p class="Pp">The data structures in the mmapped memory region are detailed in
+ <code class="In">&lt;<a class="In">sys/net/netmap.h</a>&gt;</code>, which is
+ the ultimate reference for the <code class="Nm">netmap</code> API. The main
+ structures and fields are indicated below:</p>
+<dl class="Bl-tag">
+ <dt id="struct"><a class="permalink" href="#struct"><code class="Dv">struct
+ netmap_if (one per interface</code></a>)</dt>
+ <dd>
+ <div class="Bd Pp Li">
+ <pre>struct netmap_if {
+ ...
+ const uint32_t ni_flags; /* properties */
+ ...
+ const uint32_t ni_tx_rings; /* NIC tx rings */
+ const uint32_t ni_rx_rings; /* NIC rx rings */
+ uint32_t ni_bufs_head; /* head of extra bufs list */
+ ...
+};</pre>
+ </div>
+ <p class="Pp">Indicates the number of available rings
+    (<span class="Pa">struct netmap_ring</span>) and their position in the
+ mmapped region. The number of tx and rx rings
+ (<span class="Pa">ni_tx_rings</span>,
+ <span class="Pa">ni_rx_rings</span>) normally depends on the hardware.
+ NICs also have an extra tx/rx ring pair connected to the host stack.
+ <i class="Em">NIOCREGIF</i> can also request additional unbound buffers
+ in the same memory space, to be used as temporary storage for packets.
+ The number of extra buffers is specified in the
+ <var class="Va">arg.nr_arg3</var> field. On success, the kernel writes
+ back to <var class="Va">arg.nr_arg3</var> the number of extra buffers
+    actually allocated (which may be fewer than the number requested if the
+    memory space runs out of buffers). <span class="Pa">ni_bufs_head</span>
+ contains the index of the first of these extra buffers, which are
+ connected in a list (the first uint32_t of each buffer being the index
+ of the next buffer in the list). A <code class="Dv">0</code> indicates
+ the end of the list. The application is free to modify this list and use
+ the buffers (i.e., binding them to the slots of a netmap ring). When
+ closing the netmap file descriptor, the kernel frees the buffers
+    contained in the list pointed to by <span class="Pa">ni_bufs_head</span>,
+    irrespective of the buffers originally provided by the kernel on
+ <i class="Em">NIOCREGIF</i>.</p>
+ </dd>
+ <dt id="struct~2"><a class="permalink" href="#struct~2"><code class="Dv">struct
+ netmap_ring (one per ring</code></a>)</dt>
+ <dd>
+ <div class="Bd Pp Li">
+ <pre>struct netmap_ring {
+ ...
+ const uint32_t num_slots; /* slots in each ring */
+ const uint32_t nr_buf_size; /* size of each buffer */
+ ...
+ uint32_t head; /* (u) first buf owned by user */
+ uint32_t cur; /* (u) wakeup position */
+ const uint32_t tail; /* (k) first buf owned by kernel */
+ ...
+ uint32_t flags;
+ struct timeval ts; /* (k) time of last rxsync() */
+ ...
+ struct netmap_slot slot[0]; /* array of slots */
+};</pre>
+ </div>
+ <p class="Pp" id="slots">Implements transmit and receive rings, with
+ read/write pointers, metadata and an array of
+ <a class="permalink" href="#slots"><i class="Em">slots</i></a>
+ describing the buffers.</p>
+ </dd>
+ <dt id="struct~3"><a class="permalink" href="#struct~3"><code class="Dv">struct
+ netmap_slot (one per buffer</code></a>)</dt>
+ <dd>
+ <div class="Bd Pp Li">
+ <pre>struct netmap_slot {
+ uint32_t buf_idx; /* buffer index */
+ uint16_t len; /* packet length */
+ uint16_t flags; /* buf changed, etc. */
+ uint64_t ptr; /* address for indirect buffers */
+};</pre>
+ </div>
+ <p class="Pp">Describes a packet buffer, which normally is identified by an
+ index and resides in the mmapped region.</p>
+ </dd>
+ <dt id="packet"><a class="permalink" href="#packet"><code class="Dv">packet
+ buffers</code></a></dt>
+ <dd>Fixed size (normally 2 KB) packet buffers allocated by the kernel.</dd>
+</dl>
+<p class="Pp">The offset of the <span class="Pa">struct netmap_if</span> in the
+ mmapped region is indicated by the <span class="Pa">nr_offset</span> field
+ in the structure returned by <code class="Dv">NIOCREGIF</code>. From there,
+ all other objects are reachable through relative references (offsets or
+ indexes). Macros and functions in
+ <code class="In">&lt;<a class="In">net/netmap_user.h</a>&gt;</code> help
+ converting them into actual pointers:</p>
+<p class="Pp"></p>
+<div class="Bd Bd-indent"><code class="Li">struct netmap_if *nifp =
+ NETMAP_IF(mem, arg.nr_offset);</code></div>
+<div class="Bd Bd-indent"><code class="Li">struct netmap_ring *txr =
+ NETMAP_TXRING(nifp, ring_index);</code></div>
+<div class="Bd Bd-indent"><code class="Li">struct netmap_ring *rxr =
+ NETMAP_RXRING(nifp, ring_index);</code></div>
+<p class="Pp"></p>
+<div class="Bd Bd-indent"><code class="Li">char *buf = NETMAP_BUF(ring,
+ buffer_index);</code></div>
+</section>
+<section class="Sh">
+<h1 class="Sh" id="RINGS,_BUFFERS_AND_DATA_I/O"><a class="permalink" href="#RINGS,_BUFFERS_AND_DATA_I/O">RINGS,
+ BUFFERS AND DATA I/O</a></h1>
+<p class="Pp"><var class="Va">Rings</var> are circular queues of packets with
+ three indexes/pointers (<var class="Va">head</var>,
+ <var class="Va">cur</var>, <var class="Va">tail</var>); one slot is always
+ kept empty. The ring size (<var class="Va">num_slots</var>) should not be
+ assumed to be a power of two.</p>
+<p class="Pp"><var class="Va">head</var> is the first slot available to
+ userspace;</p>
+<p class="Pp"><var class="Va">cur</var> is the wakeup point: select/poll will
+ unblock when <var class="Va">tail</var> passes
+ <var class="Va">cur</var>;</p>
+<p class="Pp"><var class="Va">tail</var> is the first slot reserved to the
+ kernel.</p>
+<p class="Pp">Slot indexes <i class="Em">must</i> only move forward; for
+ convenience, the function</p>
+<div class="Bd Bd-indent"><code class="Li">nm_ring_next(ring,
+ index)</code></div>
+returns the next index modulo the ring size.
+<p class="Pp"><var class="Va">head</var> and <var class="Va">cur</var> are only
+ modified by the user program; <var class="Va">tail</var> is only modified by
+ the kernel. The kernel only reads/writes the <var class="Vt">struct
+ netmap_ring</var> slots and buffers during the execution of a netmap-related
+  system call. The only exception is the slots (and buffers) in the range
+  <var class="Va">tail&#x00A0;</var>... <var class="Va">head-1</var>, which are
+  explicitly assigned to the kernel.</p>
+<section class="Ss">
+<h2 class="Ss" id="TRANSMIT_RINGS"><a class="permalink" href="#TRANSMIT_RINGS">TRANSMIT
+ RINGS</a></h2>
+<p class="Pp">On transmit rings, after a <code class="Nm">netmap</code> system
+ call, slots in the range <var class="Va">head&#x00A0;</var>...
+ <var class="Va">tail-1</var> are available for transmission. User code
+ should fill the slots sequentially and advance <var class="Va">head</var>
+ and <var class="Va">cur</var> past slots ready to transmit.
+ <var class="Va">cur</var> may be moved further ahead if the user code needs
+ more slots before further transmissions (see
+ <a class="Sx" href="#SCATTER_GATHER_I/O">SCATTER GATHER I/O</a>).</p>
+<p class="Pp">At the next NIOCTXSYNC/select()/poll(), slots up to
+ <var class="Va">head-1</var> are pushed to the port, and
+ <var class="Va">tail</var> may advance if further slots have become
+ available. Below is an example of the evolution of a TX ring:</p>
+<div class="Bd Pp Li">
+<pre> after the syscall, slots between cur and tail are (a)vailable
+ head=cur tail
+ | |
+ v v
+ TX [.....aaaaaaaaaaa.............]
+
+ user creates new packets to (T)ransmit
+ head=cur tail
+ | |
+ v v
+ TX [.....TTTTTaaaaaa.............]
+
+ NIOCTXSYNC/poll()/select() sends packets and reports new slots
+ head=cur tail
+ | |
+ v v
+ TX [..........aaaaaaaaaaa........]</pre>
+</div>
+<p class="Pp" id="select"><a class="permalink" href="#select"><code class="Fn">select</code></a>()
+ and
+ <a class="permalink" href="#poll"><code class="Fn" id="poll">poll</code></a>()
+ will block if there is no space in the ring, i.e.,</p>
+<div class="Bd Bd-indent"><code class="Li">ring-&gt;cur ==
+ ring-&gt;tail</code></div>
+and return when new slots have become available.
+<p class="Pp">High speed applications may want to amortize the cost of system
+ calls by preparing as many packets as possible before issuing them.</p>
+<p class="Pp">A transmit ring with pending transmissions has</p>
+<div class="Bd Bd-indent"><code class="Li">ring-&gt;head != ring-&gt;tail + 1
+ (modulo the ring size).</code></div>
+The function <var class="Va">int nm_tx_pending(ring)</var> implements this test.
+</section>
+<section class="Ss">
+<h2 class="Ss" id="RECEIVE_RINGS"><a class="permalink" href="#RECEIVE_RINGS">RECEIVE
+ RINGS</a></h2>
+<p class="Pp">On receive rings, after a <code class="Nm">netmap</code> system
+ call, the slots in the range <var class="Va">head</var>...
+ <var class="Va">tail-1</var> contain received packets. User code should
+ process them and advance <var class="Va">head</var> and
+ <var class="Va">cur</var> past slots it wants to return to the kernel.
+ <var class="Va">cur</var> may be moved further ahead if the user code wants
+ to wait for more packets without returning all the previous slots to the
+ kernel.</p>
+<p class="Pp">At the next NIOCRXSYNC/select()/poll(), slots up to
+ <var class="Va">head-1</var> are returned to the kernel for further
+ receives, and <var class="Va">tail</var> may advance to report new incoming
+ packets.</p>
+<p class="Pp">Below is an example of the evolution of an RX ring:</p>
+<div class="Bd Pp Li">
+<pre> after the syscall, there are some (h)eld and some (R)eceived slots
+ head cur tail
+ | | |
+ v v v
+ RX [..hhhhhhRRRRRRRR..........]
+
+ user advances head and cur, releasing some slots and holding others
+ head cur tail
+ | | |
+ v v v
+ RX [..*****hhhRRRRRR...........]
+
+ NIOCRXSYNC/poll()/select() recovers slots and reports new packets
+ head cur tail
+ | | |
+ v v v
+ RX [.......hhhRRRRRRRRRRRR....]</pre>
+</div>
+</section>
+</section>
+<section class="Sh">
+<h1 class="Sh" id="SLOTS_AND_PACKET_BUFFERS"><a class="permalink" href="#SLOTS_AND_PACKET_BUFFERS">SLOTS
+ AND PACKET BUFFERS</a></h1>
+<p class="Pp">Normally, packets should be stored in the netmap-allocated buffers
+ assigned to slots when ports are bound to a file descriptor. One packet is
+ fully contained in a single buffer.</p>
+<p class="Pp">The following flags affect slot and buffer processing:</p>
+<dl class="Bl-tag">
+ <dt id="must">NS_BUF_CHANGED</dt>
+ <dd><a class="permalink" href="#must"><i class="Em">must</i></a> be used when
+ the <var class="Va">buf_idx</var> in the slot is changed. This can be used
+ to implement zero-copy forwarding, see
+ <a class="Sx" href="#ZERO_COPY_FORWARDING">ZERO-COPY FORWARDING</a>.</dd>
+ <dt>NS_REPORT</dt>
+ <dd>reports when this buffer has been transmitted. Normally,
+ <code class="Nm">netmap</code> notifies transmit completions in batches,
+ hence signals can be delayed indefinitely. This flag helps detect when
+ packets have been sent and a file descriptor can be closed.</dd>
+ <dt>NS_FORWARD</dt>
+ <dd>When a ring is in 'transparent' mode, packets marked with this flag by the
+ user application are forwarded to the other endpoint at the next system
+ call, thus restoring (in a selective way) the connection between a NIC and
+ the host stack.</dd>
+ <dt>NS_NO_LEARN</dt>
+ <dd>tells the forwarding code that the source MAC address for this packet must
+ not be used in the learning bridge code.</dd>
+ <dt>NS_INDIRECT</dt>
+ <dd>indicates that the packet's payload is in a user-supplied buffer whose
+ user virtual address is in the 'ptr' field of the slot. The size can reach
+ 65535 bytes.
+ <p class="Pp">This is only supported on the transmit ring of
+    <code class="Nm">VALE</code> ports, and it helps reduce data copies in
+ the interconnection of virtual machines.</p>
+ </dd>
+ <dt>NS_MOREFRAG</dt>
+ <dd>indicates that the packet continues with subsequent buffers; the last
+ buffer in a packet must have the flag clear.</dd>
+</dl>
+</section>
+<section class="Sh">
+<h1 class="Sh" id="SCATTER_GATHER_I/O"><a class="permalink" href="#SCATTER_GATHER_I/O">SCATTER
+ GATHER I/O</a></h1>
+<p class="Pp">Packets can span multiple slots if the
+ <var class="Va">NS_MOREFRAG</var> flag is set in all but the last slot. The
+ maximum length of a chain is 64 buffers. This is normally used with
+ <code class="Nm">VALE</code> ports when connecting virtual machines, as they
+ generate large TSO segments that are not split unless they reach a physical
+ device.</p>
+<p class="Pp">NOTE: The length field always refers to the individual fragment;
+  there is no field holding the total length of the packet.</p>
+<p class="Pp">On receive rings the macro <var class="Va">NS_RFRAGS(slot)</var>
+ indicates the remaining number of slots for this packet, including the
+ current one. Slots with a value greater than 1 also have NS_MOREFRAG
+ set.</p>
+</section>
+<section class="Sh">
+<h1 class="Sh" id="IOCTLS"><a class="permalink" href="#IOCTLS">IOCTLS</a></h1>
+<p class="Pp"><code class="Nm">netmap</code> uses two ioctls (NIOCTXSYNC,
+ NIOCRXSYNC) for non-blocking I/O. They take no argument. Two more ioctls
+ (NIOCGINFO, NIOCREGIF) are used to query and configure ports, with the
+ following argument:</p>
+<div class="Bd Pp Li">
+<pre>struct nmreq {
+ char nr_name[IFNAMSIZ]; /* (i) port name */
+ uint32_t nr_version; /* (i) API version */
+ uint32_t nr_offset; /* (o) nifp offset in mmap region */
+ uint32_t nr_memsize; /* (o) size of the mmap region */
+ uint32_t nr_tx_slots; /* (i/o) slots in tx rings */
+ uint32_t nr_rx_slots; /* (i/o) slots in rx rings */
+ uint16_t nr_tx_rings; /* (i/o) number of tx rings */
+ uint16_t nr_rx_rings; /* (i/o) number of rx rings */
+ uint16_t nr_ringid; /* (i/o) ring(s) we care about */
+ uint16_t nr_cmd; /* (i) special command */
+ uint16_t nr_arg1; /* (i/o) extra arguments */
+ uint16_t nr_arg2; /* (i/o) extra arguments */
+ uint32_t nr_arg3; /* (i/o) extra arguments */
+	uint32_t	nr_flags;	/* (i/o) open mode */
+ ...
+};</pre>
+</div>
+<p class="Pp">A file descriptor obtained through
+  <span class="Pa">/dev/netmap</span> also supports the ioctls supported by
+  network devices; see <a class="Xr">netintro(4)</a>.</p>
+<dl class="Bl-tag">
+ <dt id="NIOCGINFO"><a class="permalink" href="#NIOCGINFO"><code class="Dv">NIOCGINFO</code></a></dt>
+ <dd>returns EINVAL if the named port does not support netmap. Otherwise, it
+ returns 0 and (advisory) information about the port. Note that all the
+ information below can change before the interface is actually put in
+ netmap mode.
+ <dl class="Bl-tag">
+ <dt><span class="Pa">nr_memsize</span></dt>
+ <dd>indicates the size of the <code class="Nm">netmap</code> memory
+ region. NICs in <code class="Nm">netmap</code> mode all share the same
+ memory region, whereas <code class="Nm">VALE</code> ports have
+ independent regions for each port.</dd>
+ <dt><span class="Pa">nr_tx_slots</span>,
+ <span class="Pa">nr_rx_slots</span></dt>
+ <dd>indicate the size of transmit and receive rings.</dd>
+ <dt><span class="Pa">nr_tx_rings</span>,
+ <span class="Pa">nr_rx_rings</span></dt>
+ <dd>indicate the number of transmit and receive rings. Both ring number
+ and sizes may be configured at runtime using interface-specific
+ functions (e.g., <a class="Xr">ethtool(8)</a> ).</dd>
+ </dl>
+ </dd>
+ <dt id="NIOCREGIF"><a class="permalink" href="#NIOCREGIF"><code class="Dv">NIOCREGIF</code></a></dt>
+ <dd>binds the port named in <var class="Va">nr_name</var> to the file
+ descriptor. For a physical device this also switches it into
+ <code class="Nm">netmap</code> mode, disconnecting it from the host stack.
+ Multiple file descriptors can be bound to the same port, with proper
+ synchronization left to the user.
+ <p class="Pp">The recommended way to bind a file descriptor to a port is to
+ use function <var class="Va">nm_open(..)</var> (see
+ <a class="Sx" href="#LIBRARIES">LIBRARIES</a>) which parses names to
+ access specific port types and enable features. In the following we
+ document the main features.</p>
+    <p class="Pp" id="netmap"><code class="Dv">NIOCREGIF</code> can also bind
+      a file descriptor to one endpoint of a
+      <a class="permalink" href="#netmap"><i class="Em">netmap pipe</i></a>,
+      consisting of two netmap ports with a crossover connection. A netmap
+      pipe shares the same memory space as the parent port, and is meant to
+      enable configurations where a master process acts as a dispatcher
+      towards slave processes.</p>
+ <p class="Pp">To enable this function, the <span class="Pa">nr_arg1</span>
+ field of the structure can be used as a hint to the kernel to indicate
+ how many pipes we expect to use, and reserve extra space in the memory
+ region.</p>
+ <p class="Pp">On return, it gives the same info as NIOCGINFO, with
+ <span class="Pa">nr_ringid</span> and <span class="Pa">nr_flags</span>
+ indicating the identity of the rings controlled through the file
+ descriptor.</p>
+    <p class="Pp"><var class="Va">nr_flags</var> and
+      <var class="Va">nr_ringid</var> together select which rings are
+      controlled through this file descriptor.
+      Possible values of <span class="Pa">nr_flags</span> are indicated below,
+      together with the naming schemes that application libraries (such as the
+      <code class="Nm">nm_open</code> indicated below) can use to indicate the
+      specific set of rings. In the examples below, &quot;netmap:foo&quot; is
+      any valid netmap port name.</p>
+ <dl class="Bl-tag">
+ <dt>NR_REG_ALL_NIC netmap:foo</dt>
+ <dd>(default) all hardware ring pairs</dd>
+ <dt>NR_REG_SW netmap:foo^</dt>
+ <dd>the ``host rings'', connecting to the host stack.</dd>
+ <dt>NR_REG_NIC_SW netmap:foo*</dt>
+ <dd>all hardware rings and the host rings</dd>
+ <dt>NR_REG_ONE_NIC netmap:foo-i</dt>
+ <dd>only the i-th hardware ring pair, where the number is in
+ <span class="Pa">nr_ringid</span>;</dd>
+ <dt>NR_REG_PIPE_MASTER netmap:foo{i</dt>
+ <dd>the master side of the netmap pipe whose identifier (i) is in
+ <span class="Pa">nr_ringid</span>;</dd>
+ <dt>NR_REG_PIPE_SLAVE netmap:foo}i</dt>
+ <dd>the slave side of the netmap pipe whose identifier (i) is in
+ <span class="Pa">nr_ringid</span>.
+      <p class="Pp">The identifier of a pipe must be thought of as part of
+        the pipe name, and does not need to be sequential. On return the pipe
+        will only have a single ring pair with index 0, irrespective of the
+        value of <var class="Va">i</var>.</p>
+ </dd>
+ </dl>
+    <p class="Pp">By default, a <a class="Xr">poll(2)</a> or
+      <a class="Xr">select(2)</a> call pushes out any pending packets on the
+      transmit ring, even if no write events are specified. The feature can be
+      disabled by or-ing <var class="Va">NETMAP_NO_TX_POLL</var> into the
+      value written to <var class="Va">nr_ringid</var>. When this feature is
+      used, packets are transmitted only when
+      <var class="Va">ioctl(NIOCTXSYNC)</var> is issued, or when
+      <var class="Va">select()</var> / <var class="Va">poll()</var> are called
+      with a write event (POLLOUT/wfdset) or on a full ring.</p>
+    <p class="Pp">When registering a virtual interface that is dynamically
+      created and attached to a <code class="Nm">VALE</code> switch, we can
+      specify the desired number of rings (1 by default, and currently up to
+      16) using the nr_tx_rings and nr_rx_rings fields.</p>
+ </dd>
+ <dt id="NIOCTXSYNC"><a class="permalink" href="#NIOCTXSYNC"><code class="Dv">NIOCTXSYNC</code></a></dt>
+ <dd>tells the hardware of new packets to transmit, and updates the number of
+ slots available for transmission.</dd>
+ <dt id="NIOCRXSYNC"><a class="permalink" href="#NIOCRXSYNC"><code class="Dv">NIOCRXSYNC</code></a></dt>
+ <dd>tells the hardware of consumed packets, and asks for newly available
+ packets.</dd>
+</dl>
+</section>
+<section class="Sh">
+<h1 class="Sh" id="SELECT,_POLL,_EPOLL,_KQUEUE"><a class="permalink" href="#SELECT,_POLL,_EPOLL,_KQUEUE">SELECT,
+ POLL, EPOLL, KQUEUE</a></h1>
+<p class="Pp"><a class="Xr">select(2)</a> and <a class="Xr">poll(2)</a> on a
+ <code class="Nm">netmap</code> file descriptor process rings as indicated in
+ <a class="Sx" href="#TRANSMIT_RINGS">TRANSMIT RINGS</a> and
+ <a class="Sx" href="#RECEIVE_RINGS">RECEIVE RINGS</a>, respectively when
+ write (POLLOUT) and read (POLLIN) events are requested. Both block if no
+ slots are available in the ring (<var class="Va">ring-&gt;cur ==
+ ring-&gt;tail</var>). Depending on the platform, <a class="Xr">epoll(7)</a>
+ and <a class="Xr">kqueue(2)</a> are supported too.</p>
+<p class="Pp">Packets in transmit rings are normally pushed out (and buffers
+ reclaimed) even without requesting write events. Passing the
+ <code class="Dv">NETMAP_NO_TX_POLL</code> flag to
+ <i class="Em">NIOCREGIF</i> disables this feature. By default, receive rings
+ are processed only if read events are requested. Passing the
+  <code class="Dv">NETMAP_DO_RX_POLL</code> flag to
+  <i class="Em">NIOCREGIF</i> updates receive rings even without read
+  events. Note that on
+ <a class="Xr">epoll(7)</a> and <a class="Xr">kqueue(2)</a>,
+ <code class="Dv">NETMAP_NO_TX_POLL</code> and
+ <code class="Dv">NETMAP_DO_RX_POLL</code> only have an effect when some
+ event is posted for the file descriptor.</p>
+</section>
+<section class="Sh">
+<h1 class="Sh" id="LIBRARIES"><a class="permalink" href="#LIBRARIES">LIBRARIES</a></h1>
+<p class="Pp">The <code class="Nm">netmap</code> API is supposed to be used
+ directly, both because of its simplicity and for efficient integration with
+ applications.</p>
+<p class="Pp">For convenience, the
+ <code class="In">&lt;<a class="In">net/netmap_user.h</a>&gt;</code> header
+ provides a few macros and functions to ease creating a file descriptor and
+ doing I/O with a <code class="Nm">netmap</code> port. These are loosely
+ modeled after the <a class="Xr">pcap(3)</a> API, to ease porting of
+ libpcap-based applications to <code class="Nm">netmap</code>. To use these
+ extra functions, programs should</p>
+<div class="Bd Bd-indent"><code class="Li">#define NETMAP_WITH_LIBS</code></div>
+before
+<div class="Bd Bd-indent"><code class="Li">#include
+ &lt;net/netmap_user.h&gt;</code></div>
+<p class="Pp">The following functions are available:</p>
+<dl class="Bl-tag">
+ <dt id="struct~4"><var class="Va">struct nm_desc * nm_open(const char *ifname,
+ const struct nmreq *req, uint64_t flags, const struct nm_desc
+ *arg</var>)</dt>
+ <dd>similar to <a class="Xr">pcap_open_live(3)</a>, binds a file descriptor to
+ a port.
+ <dl class="Bl-tag">
+ <dt id="ifname"><var class="Va">ifname</var></dt>
+ <dd>is a port name, in the form &quot;netmap:PPP&quot; for a NIC and
+ &quot;valeSSS:PPP&quot; for a <code class="Nm">VALE</code> port.</dd>
+ <dt id="req"><var class="Va">req</var></dt>
+ <dd>provides the initial values for the argument to the NIOCREGIF ioctl.
+ The nm_flags and nm_ringid values are overwritten by parsing ifname
+ and flags, and other fields can be overridden through the other two
+ arguments.</dd>
+ <dt id="arg"><var class="Va">arg</var></dt>
+ <dd>points to a struct nm_desc containing arguments (e.g., from a
+      previously opened file descriptor) that should override the defaults.
+      The fields are used as described below.</dd>
+ <dt id="flags"><var class="Va">flags</var></dt>
+ <dd>can be set to a combination of the following flags:
+ <var class="Va">NETMAP_NO_TX_POLL</var>,
+ <var class="Va">NETMAP_DO_RX_POLL</var> (copied into nr_ringid);
+ <var class="Va">NM_OPEN_NO_MMAP</var> (if arg points to the same
+ memory region, avoids the mmap and uses the values from it);
+ <var class="Va">NM_OPEN_IFNAME</var> (ignores ifname and uses the
+ values in arg); <var class="Va">NM_OPEN_ARG1</var>,
+ <var class="Va">NM_OPEN_ARG2</var>, <var class="Va">NM_OPEN_ARG3</var>
+ (uses the fields from arg); <var class="Va">NM_OPEN_RING_CFG</var>
+ (uses the ring number and sizes from arg).</dd>
+ </dl>
+ </dd>
+ <dt id="int"><var class="Va">int nm_close(struct nm_desc *d</var>)</dt>
+ <dd>closes the file descriptor, unmaps memory, frees resources.</dd>
+ <dt id="int~2"><var class="Va">int nm_inject(struct nm_desc *d, const void
+ *buf, size_t size</var>)</dt>
+ <dd>similar to <var class="Va">pcap_inject()</var>, pushes a packet to a ring,
+    returns the size of the packet if successful, or 0 on error;</dd>
+ <dt id="int~3"><var class="Va">int nm_dispatch(struct nm_desc *d, int cnt,
+ nm_cb_t cb, u_char *arg</var>)</dt>
+ <dd>similar to <var class="Va">pcap_dispatch()</var>, applies a callback to
+    incoming packets.</dd>
+ <dt id="u_char"><var class="Va">u_char * nm_nextpkt(struct nm_desc *d, struct
+ nm_pkthdr *hdr</var>)</dt>
+  <dd>similar to <var class="Va">pcap_next()</var>, fetches the next
+    packet.</dd>
+</dl>
+</section>
+<section class="Sh">
+<h1 class="Sh" id="SUPPORTED_DEVICES"><a class="permalink" href="#SUPPORTED_DEVICES">SUPPORTED
+ DEVICES</a></h1>
+<p class="Pp"><code class="Nm">netmap</code> natively supports the following
+ devices:</p>
+<p class="Pp">On <span class="Ux">FreeBSD</span>: <a class="Xr">cxgbe(4)</a>,
+ <a class="Xr">em(4)</a>, <a class="Xr">iflib(4)</a> (providing
+ <a class="Xr">igb(4)</a> and <a class="Xr">em(4)</a>),
+ <a class="Xr">ix(4)</a>, <a class="Xr">ixl(4)</a>, <a class="Xr">re(4)</a>,
+ <a class="Xr">vtnet(4)</a>.</p>
+<p class="Pp">On Linux: e1000, e1000e, i40e, igb, ixgbe, ixgbevf, r8169,
+ virtio_net, vmxnet3.</p>
+<p class="Pp">NICs without native support can still be used in
+ <code class="Nm">netmap</code> mode through emulation. Performance is
+ inferior to native netmap mode but still significantly higher than various
+ raw socket types (bpf, PF_PACKET, etc.). Note that for slow devices (such as
+ 1 Gbit/s and slower NICs, or several 10 Gbit/s NICs whose hardware is unable
+  to sustain line rate), emulated and native mode will likely have similar
+  or identical throughput.</p>
+<p class="Pp">When emulation is in use, packet sniffer programs such as tcpdump
+  could see received packets before they are diverted by netmap. This
+  behaviour is not intentional, being just an artifact of the implementation
+  of emulation. Note that if the netmap application subsequently moves
+  packets received from the emulated adapter onto the host RX ring, the
+  sniffer will intercept those packets again, since the packets are injected
+  into the host stack as if they were received by the network interface.</p>
+<p class="Pp">Emulation is also available for devices with native netmap
+ support, which can be used for testing or performance comparison. The sysctl
+ variable <var class="Va">dev.netmap.admode</var> globally controls how
+ netmap mode is implemented.</p>
+</section>
+<section class="Sh">
+<h1 class="Sh" id="SYSCTL_VARIABLES_AND_MODULE_PARAMETERS"><a class="permalink" href="#SYSCTL_VARIABLES_AND_MODULE_PARAMETERS">SYSCTL
+ VARIABLES AND MODULE PARAMETERS</a></h1>
+<p class="Pp">Some aspects of the operation of <code class="Nm">netmap</code>
+ and <code class="Nm">VALE</code> are controlled through sysctl variables on
+ <span class="Ux">FreeBSD</span>
+ (<a class="permalink" href="#dev.netmap.*"><i class="Em" id="dev.netmap.*">dev.netmap.*</i></a>)
+ and module parameters on Linux
+ (<a class="permalink" href="#/sys/module/netmap/parameters/*"><i class="Em" id="/sys/module/netmap/parameters/*">/sys/module/netmap/parameters/*</i></a>):</p>
+<dl class="Bl-tag">
+ <dt id="dev.netmap.admode:"><var class="Va">dev.netmap.admode: 0</var></dt>
+ <dd>Controls the use of native or emulated adapter mode.
+ <p class="Pp">0 uses the best available option;</p>
+ <p class="Pp">1 forces native mode and fails if not available;</p>
+    <p class="Pp">2 forces emulated mode, hence never fails.</p>
+ </dd>
+ <dt id="dev.netmap.generic_rings:"><var class="Va">dev.netmap.generic_rings:
+ 1</var></dt>
+ <dd>Number of rings used for emulated netmap mode</dd>
+ <dt id="dev.netmap.generic_ringsize:"><var class="Va">dev.netmap.generic_ringsize:
+ 1024</var></dt>
+ <dd>Ring size used for emulated netmap mode</dd>
+ <dt id="dev.netmap.generic_mit:"><var class="Va">dev.netmap.generic_mit:
+ 100000</var></dt>
+ <dd>Controls interrupt moderation for emulated mode</dd>
+ <dt id="dev.netmap.fwd:"><var class="Va">dev.netmap.fwd: 0</var></dt>
+ <dd>Forces NS_FORWARD mode</dd>
+ <dt id="dev.netmap.txsync_retry:"><var class="Va">dev.netmap.txsync_retry:
+ 2</var></dt>
+ <dd>Number of txsync loops in the <code class="Nm">VALE</code> flush
+ function</dd>
+ <dt id="dev.netmap.no_pendintr:"><var class="Va">dev.netmap.no_pendintr:
+ 1</var></dt>
+ <dd>Forces recovery of transmit buffers on system calls</dd>
+ <dt id="dev.netmap.no_timestamp:"><var class="Va">dev.netmap.no_timestamp:
+ 0</var></dt>
+ <dd>Disables the update of the timestamp in the netmap ring</dd>
+ <dt id="dev.netmap.verbose:"><var class="Va">dev.netmap.verbose: 0</var></dt>
+ <dd>Verbose kernel messages</dd>
+ <dt id="dev.netmap.buf_num:"><var class="Va">dev.netmap.buf_num:
+ 163840</var></dt>
+ <dd style="width: auto;">&#x00A0;</dd>
+ <dt id="dev.netmap.buf_size:"><var class="Va">dev.netmap.buf_size:
+ 2048</var></dt>
+ <dd style="width: auto;">&#x00A0;</dd>
+ <dt id="dev.netmap.ring_num:"><var class="Va">dev.netmap.ring_num:
+ 200</var></dt>
+ <dd style="width: auto;">&#x00A0;</dd>
+ <dt id="dev.netmap.ring_size:"><var class="Va">dev.netmap.ring_size:
+ 36864</var></dt>
+ <dd style="width: auto;">&#x00A0;</dd>
+ <dt id="dev.netmap.if_num:"><var class="Va">dev.netmap.if_num: 100</var></dt>
+ <dd style="width: auto;">&#x00A0;</dd>
+ <dt id="dev.netmap.if_size:"><var class="Va">dev.netmap.if_size:
+ 1024</var></dt>
+ <dd>Sizes and number of objects (netmap_if, netmap_ring, buffers) for the
+ global memory region. The only parameter worth modifying is
+ <var class="Va">dev.netmap.buf_num</var> as it impacts the total amount of
+ memory used by netmap.</dd>
+ <dt id="dev.netmap.buf_curr_num:"><var class="Va">dev.netmap.buf_curr_num:
+ 0</var></dt>
+ <dd style="width: auto;">&#x00A0;</dd>
+ <dt id="dev.netmap.buf_curr_size:"><var class="Va">dev.netmap.buf_curr_size:
+ 0</var></dt>
+ <dd style="width: auto;">&#x00A0;</dd>
+ <dt id="dev.netmap.ring_curr_num:"><var class="Va">dev.netmap.ring_curr_num:
+ 0</var></dt>
+ <dd style="width: auto;">&#x00A0;</dd>
+ <dt id="dev.netmap.ring_curr_size:"><var class="Va">dev.netmap.ring_curr_size:
+ 0</var></dt>
+ <dd style="width: auto;">&#x00A0;</dd>
+ <dt id="dev.netmap.if_curr_num:"><var class="Va">dev.netmap.if_curr_num:
+ 0</var></dt>
+ <dd style="width: auto;">&#x00A0;</dd>
+ <dt id="dev.netmap.if_curr_size:"><var class="Va">dev.netmap.if_curr_size:
+ 0</var></dt>
+ <dd>Actual values in use.</dd>
+ <dt id="dev.netmap.priv_buf_num:"><var class="Va">dev.netmap.priv_buf_num:
+ 4098</var></dt>
+ <dd style="width: auto;">&#x00A0;</dd>
+ <dt id="dev.netmap.priv_buf_size:"><var class="Va">dev.netmap.priv_buf_size:
+ 2048</var></dt>
+ <dd style="width: auto;">&#x00A0;</dd>
+ <dt id="dev.netmap.priv_ring_num:"><var class="Va">dev.netmap.priv_ring_num:
+ 4</var></dt>
+ <dd style="width: auto;">&#x00A0;</dd>
+ <dt id="dev.netmap.priv_ring_size:"><var class="Va">dev.netmap.priv_ring_size:
+ 20480</var></dt>
+ <dd style="width: auto;">&#x00A0;</dd>
+ <dt id="dev.netmap.priv_if_num:"><var class="Va">dev.netmap.priv_if_num:
+ 2</var></dt>
+ <dd style="width: auto;">&#x00A0;</dd>
+ <dt id="dev.netmap.priv_if_size:"><var class="Va">dev.netmap.priv_if_size:
+ 1024</var></dt>
+ <dd>Sizes and number of objects (netmap_if, netmap_ring, buffers) for private
+ memory regions. A separate memory region is used for each
+ <code class="Nm">VALE</code> port and each pair of <code class="Nm">netmap
+ pipes</code>.</dd>
+ <dt id="dev.netmap.bridge_batch:"><var class="Va">dev.netmap.bridge_batch:
+ 1024</var></dt>
+ <dd>Batch size used when moving packets across a <code class="Nm">VALE</code>
+ switch. Values above 64 generally guarantee good performance.</dd>
+ <dt id="dev.netmap.max_bridges:"><var class="Va">dev.netmap.max_bridges:
+ 8</var></dt>
+ <dd>Max number of <code class="Nm">VALE</code> switches that can be created.
+ This tunable can be specified at loader time.</dd>
+ <dt id="dev.netmap.ptnet_vnet_hdr:"><var class="Va">dev.netmap.ptnet_vnet_hdr:
+ 1</var></dt>
+ <dd>Allow ptnet devices to use virtio-net headers</dd>
+ <dt id="dev.netmap.port_numa_affinity:"><var class="Va">dev.netmap.port_numa_affinity:
+ 0</var></dt>
+ <dd>On <a class="Xr">numa(4)</a> systems, allocate memory for netmap ports
+ from the local NUMA domain when possible. This can improve performance by
+ reducing the number of remote memory accesses. However, when forwarding
+ packets between ports attached to different NUMA domains, this will
+ prevent zero-copy forwarding optimizations and thus may hurt performance.
+ Note that this setting must be specified as a loader tunable at boot
+ time.</dd>
+</dl>
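<p class="Pp">As noted above, <var class="Va">dev.netmap.max_bridges</var> and
  <var class="Va">dev.netmap.port_numa_affinity</var> must be set at loader
  time. A sketch of the corresponding
  <span class="Pa">/boot/loader.conf</span> lines on
  <span class="Ux">FreeBSD</span> (the values shown are illustrative, not
  recommendations):</p>

```
# /boot/loader.conf -- netmap tunables read before the module initializes
dev.netmap.max_bridges="16"         # allow up to 16 VALE switches
dev.netmap.port_numa_affinity="1"   # allocate port memory from the local NUMA domain
```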
+</section>
+<section class="Sh">
+<h1 class="Sh" id="SYSTEM_CALLS"><a class="permalink" href="#SYSTEM_CALLS">SYSTEM
+ CALLS</a></h1>
+<p class="Pp"><code class="Nm">netmap</code> uses <a class="Xr">select(2)</a>,
+ <a class="Xr">poll(2)</a>, <a class="Xr">epoll(7)</a> and
+ <a class="Xr">kqueue(2)</a> to wake up processes when significant events
+ occur, and <a class="Xr">mmap(2)</a> to map memory.
+ <a class="Xr">ioctl(2)</a> is used to configure ports and
+ <code class="Nm">VALE switches</code>.</p>
+<p class="Pp">Applications may need to create threads and bind them to specific
+  cores to improve performance, using standard OS primitives; see
+ <a class="Xr">pthread(3)</a>. In particular,
+ <a class="Xr">pthread_setaffinity_np(3)</a> may be of use.</p>
+</section>
+<section class="Sh">
+<h1 class="Sh" id="EXAMPLES"><a class="permalink" href="#EXAMPLES">EXAMPLES</a></h1>
+<section class="Ss">
+<h2 class="Ss" id="TEST_PROGRAMS"><a class="permalink" href="#TEST_PROGRAMS">TEST
+ PROGRAMS</a></h2>
+<p class="Pp"><code class="Nm">netmap</code> comes with a few programs that can
+ be used for testing or simple applications. See the
+ <span class="Pa">examples/</span> directory in
+ <code class="Nm">netmap</code> distributions, or
+ <span class="Pa">tools/tools/netmap/</span> directory in
+ <span class="Ux">FreeBSD</span> distributions.</p>
+<p class="Pp"><a class="Xr">pkt-gen(8)</a> is a general purpose traffic
+ source/sink.</p>
+<p class="Pp">As an example</p>
+<div class="Bd Bd-indent"><code class="Li">pkt-gen -i ix0 -f tx -l
+ 60</code></div>
+can generate an infinite stream of minimum size packets, and
+<div class="Bd Bd-indent"><code class="Li">pkt-gen -i ix0 -f rx</code></div>
+is a traffic sink. Both print traffic statistics, to help monitor how the system
+ performs.
+<p class="Pp"><a class="Xr">pkt-gen(8)</a> has many options that can be used
+  to set packet sizes, addresses, rates, and to use multiple send/receive
+  threads and cores.</p>
+<p class="Pp"><a class="Xr">bridge(8)</a> is another test program which
+ interconnects two <code class="Nm">netmap</code> ports. It can be used for
+ transparent forwarding between interfaces, as in</p>
+<div class="Bd Bd-indent"><code class="Li">bridge -i netmap:ix0 -i
+ netmap:ix1</code></div>
+or even to connect the NIC to the host stack using netmap:
+<div class="Bd Bd-indent"><code class="Li">bridge -i netmap:ix0</code></div>
+</section>
+<section class="Ss">
+<h2 class="Ss" id="USING_THE_NATIVE_API"><a class="permalink" href="#USING_THE_NATIVE_API">USING
+ THE NATIVE API</a></h2>
+<p class="Pp">The following code implements a traffic generator:</p>
+<p class="Pp"></p>
+<div class="Bd Li">
+<pre>#include &lt;net/netmap_user.h&gt;
+...
+void sender(void)
+{
+	struct netmap_if *nifp;
+	struct netmap_ring *ring;
+	struct nmreq nmr;
+	struct pollfd fds;
+	void *p;
+	char *buf;
+	uint32_t i;
+	int fd;
+
+	fd = open(&quot;/dev/netmap&quot;, O_RDWR);
+	bzero(&amp;nmr, sizeof(nmr));
+	strcpy(nmr.nr_name, &quot;ix0&quot;);
+	nmr.nr_version = NETMAP_API;
+	ioctl(fd, NIOCREGIF, &amp;nmr);
+	p = mmap(0, nmr.nr_memsize, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
+	nifp = NETMAP_IF(p, nmr.nr_offset);
+	ring = NETMAP_TXRING(nifp, 0);
+	fds.fd = fd;
+	fds.events = POLLOUT;
+	for (;;) {
+		poll(&amp;fds, 1, -1);
+		while (!nm_ring_empty(ring)) {
+			i = ring-&gt;cur;
+			buf = NETMAP_BUF(ring, ring-&gt;slot[i].buf_idx);
+			... prepare packet in buf ...
+			ring-&gt;slot[i].len = ... packet length ...
+			ring-&gt;head = ring-&gt;cur = nm_ring_next(ring, i);
+		}
+	}
+}</pre>
+</div>
+</section>
+<section class="Ss">
+<h2 class="Ss" id="HELPER_FUNCTIONS"><a class="permalink" href="#HELPER_FUNCTIONS">HELPER
+ FUNCTIONS</a></h2>
+<p class="Pp">A simple receiver can be implemented using the helper
+ functions:</p>
+<p class="Pp"></p>
+<div class="Bd Li">
+<pre>#define NETMAP_WITH_LIBS
+#include &lt;net/netmap_user.h&gt;
+...
+void receiver(void)
+{
+ struct nm_desc *d;
+ struct pollfd fds;
+ u_char *buf;
+ struct nm_pkthdr h;
+ ...
+ d = nm_open(&quot;netmap:ix0&quot;, NULL, 0, 0);
+ fds.fd = NETMAP_FD(d);
+ fds.events = POLLIN;
+ for (;;) {
+ poll(&amp;fds, 1, -1);
+ while ( (buf = nm_nextpkt(d, &amp;h)) )
+ consume_pkt(buf, h.len);
+ }
+ nm_close(d);
+}</pre>
+</div>
+</section>
+<section class="Ss">
+<h2 class="Ss" id="ZERO-COPY_FORWARDING"><a class="permalink" href="#ZERO-COPY_FORWARDING">ZERO-COPY
+ FORWARDING</a></h2>
+<p class="Pp">Since physical interfaces share the same memory region, it is
+  possible to do packet forwarding between ports by swapping buffers. The
+  buffer from the transmit ring is used to replenish the receive ring:</p>
+<p class="Pp"></p>
+<div class="Bd Li">
+<pre> uint32_t tmp;
+ struct netmap_slot *src, *dst;
+ ...
+	src = &amp;rxr-&gt;slot[rxr-&gt;cur];
+	dst = &amp;txr-&gt;slot[txr-&gt;cur];
+ tmp = dst-&gt;buf_idx;
+ dst-&gt;buf_idx = src-&gt;buf_idx;
+ dst-&gt;len = src-&gt;len;
+ dst-&gt;flags = NS_BUF_CHANGED;
+ src-&gt;buf_idx = tmp;
+ src-&gt;flags = NS_BUF_CHANGED;
+ rxr-&gt;head = rxr-&gt;cur = nm_ring_next(rxr, rxr-&gt;cur);
+ txr-&gt;head = txr-&gt;cur = nm_ring_next(txr, txr-&gt;cur);
+ ...</pre>
+</div>
+</section>
+<section class="Ss">
+<h2 class="Ss" id="ACCESSING_THE_HOST_STACK"><a class="permalink" href="#ACCESSING_THE_HOST_STACK">ACCESSING
+ THE HOST STACK</a></h2>
+<p class="Pp">The host stack is, for all practical purposes, just a regular
+  ring pair which you can access with the netmap API, e.g., with</p>
+<div class="Bd Bd-indent"><code class="Li">nm_open(&quot;netmap:eth0^&quot;,
+  ...)</code></div>
+All packets that the host would send to an interface in
+  <code class="Nm">netmap</code> mode end up in the RX ring, whereas all
+  packets queued to the TX ring are sent up to the host stack.
+</section>
+<section class="Ss">
+<h2 class="Ss" id="VALE_SWITCH"><a class="permalink" href="#VALE_SWITCH">VALE
+ SWITCH</a></h2>
+<p class="Pp">A simple way to test the performance of a
+ <code class="Nm">VALE</code> switch is to attach a sender and a receiver to
+ it, e.g., running the following in two different terminals:</p>
+<div class="Bd Bd-indent"><code class="Li">pkt-gen -i vale1:a -f rx #
+ receiver</code></div>
+<div class="Bd Bd-indent"><code class="Li">pkt-gen -i vale1:b -f tx #
+ sender</code></div>
+The same example can be used to test netmap pipes, by simply changing port
+ names, e.g.,
+<div class="Bd Bd-indent"><code class="Li">pkt-gen -i vale2:x{3 -f rx # receiver
+ on the master side</code></div>
+<div class="Bd Bd-indent"><code class="Li">pkt-gen -i vale2:x}3 -f tx # sender
+ on the slave side</code></div>
+<p class="Pp">The following command attaches an interface and the host stack to
+ a switch:</p>
+<div class="Bd Bd-indent"><code class="Li">valectl -h vale2:em0</code></div>
+Other <code class="Nm">netmap</code> clients attached to the same switch can now
+ communicate with the network card or the host.
+</section>
+</section>
+<section class="Sh">
+<h1 class="Sh" id="SEE_ALSO"><a class="permalink" href="#SEE_ALSO">SEE
+ ALSO</a></h1>
+<p class="Pp"><a class="Xr">vale(4)</a>, <a class="Xr">bridge(8)</a>,
+ <a class="Xr">lb(8)</a>, <a class="Xr">nmreplay(8)</a>,
+ <a class="Xr">pkt-gen(8)</a>, <a class="Xr">valectl(8)</a></p>
+<p class="Pp"><span class="Pa">http://info.iet.unipi.it/~luigi/netmap/</span></p>
+<p class="Pp">Luigi Rizzo, Revisiting network I/O APIs: the netmap framework,
+ Communications of the ACM, 55 (3), pp.45-51, March 2012</p>
+<p class="Pp">Luigi Rizzo, netmap: a novel framework for fast packet I/O, Usenix
+ ATC'12, June 2012, Boston</p>
+<p class="Pp">Luigi Rizzo, Giuseppe Lettieri, VALE, a switched ethernet for
+ virtual machines, ACM CoNEXT'12, December 2012, Nice</p>
+<p class="Pp">Luigi Rizzo, Giuseppe Lettieri, Vincenzo Maffione, Speeding up
+ packet I/O in virtual machines, ACM/IEEE ANCS'13, October 2013, San Jose</p>
+</section>
+<section class="Sh">
+<h1 class="Sh" id="AUTHORS"><a class="permalink" href="#AUTHORS">AUTHORS</a></h1>
+<p class="Pp">The <code class="Nm">netmap</code> framework was originally
+  designed and implemented at the Universit&#x00E0; di Pisa in 2011 by
+ <span class="An">Luigi Rizzo</span>, and further extended with help from
+ <span class="An">Matteo Landi</span>, <span class="An">Gaetano
+ Catalli</span>, <span class="An">Giuseppe Lettieri</span>, and
+ <span class="An">Vincenzo Maffione</span>.</p>
+<p class="Pp"><code class="Nm">netmap</code> and <code class="Nm">VALE</code>
+ have been funded by the European Commission within FP7 Projects CHANGE
+ (257422) and OPENLAB (287581).</p>
+</section>
+<section class="Sh">
+<h1 class="Sh" id="CAVEATS"><a class="permalink" href="#CAVEATS">CAVEATS</a></h1>
+<p class="Pp">No matter how fast the CPU and OS are, achieving line rate on 10G
+ and faster interfaces requires hardware with sufficient performance. Several
+ NICs are unable to sustain line rate with small packet sizes. Insufficient
+ PCIe or memory bandwidth can also cause reduced performance.</p>
+<p class="Pp">Another frequent reason for low performance is the use of flow
+ control on the link: a slow receiver can limit the transmit speed. Be sure
+ to disable flow control when running high speed experiments.</p>
+<section class="Ss">
+<h2 class="Ss" id="SPECIAL_NIC_FEATURES"><a class="permalink" href="#SPECIAL_NIC_FEATURES">SPECIAL
+ NIC FEATURES</a></h2>
+<p class="Pp"><code class="Nm">netmap</code> is orthogonal to some NIC features
+ such as multiqueue, schedulers, packet filters.</p>
+<p class="Pp">Multiple transmit and receive rings are supported natively and can
+ be configured with ordinary OS tools, such as <a class="Xr">ethtool(8)</a>
+ or device-specific sysctl variables. The same goes for Receive Packet
+ Steering (RPS) and filtering of incoming traffic.</p>
+<p class="Pp" id="does"><code class="Nm">netmap</code>
+ <a class="permalink" href="#does"><i class="Em">does not use</i></a>
+ features such as
+ <a class="permalink" href="#checksum"><i class="Em" id="checksum">checksum
+ offloading</i></a>,
+ <a class="permalink" href="#TCP"><i class="Em" id="TCP">TCP segmentation
+ offloading</i></a>,
+ <a class="permalink" href="#encryption"><i class="Em" id="encryption">encryption</i></a>,
+ <a class="permalink" href="#VLAN"><i class="Em" id="VLAN">VLAN
+ encapsulation/decapsulation</i></a>, etc. When using netmap to exchange
+ packets with the host stack, make sure to disable these features.</p>
+</section>
+</section>
+</div>
+<table class="foot">
+ <tr>
+ <td class="foot-date">October 10, 2024</td>
+ <td class="foot-os">FreeBSD 15.0</td>
+ </tr>
+</table>