path: root/static/freebsd/man4/netmap.4 3.html
Diffstat (limited to 'static/freebsd/man4/netmap.4 3.html')
-rw-r--r--	static/freebsd/man4/netmap.4 3.html	1015
1 file changed, 0 insertions(+), 1015 deletions(-)
diff --git a/static/freebsd/man4/netmap.4 3.html b/static/freebsd/man4/netmap.4 3.html
deleted file mode 100644
index e0cbff0e..00000000
--- a/static/freebsd/man4/netmap.4 3.html
+++ /dev/null
@@ -1,1015 +0,0 @@
-<table class="head">
- <tr>
- <td class="head-ltitle">NETMAP(4)</td>
- <td class="head-vol">Device Drivers Manual</td>
- <td class="head-rtitle">NETMAP(4)</td>
- </tr>
-</table>
-<div class="manual-text">
-<section class="Sh">
-<h1 class="Sh" id="NAME"><a class="permalink" href="#NAME">NAME</a></h1>
-<p class="Pp"><code class="Nm">netmap</code> &#x2014; <span class="Nd">a
- framework for fast packet I/O</span></p>
-</section>
-<section class="Sh">
-<h1 class="Sh" id="SYNOPSIS"><a class="permalink" href="#SYNOPSIS">SYNOPSIS</a></h1>
-<p class="Pp"><code class="Cd">device netmap</code></p>
-</section>
-<section class="Sh">
-<h1 class="Sh" id="DESCRIPTION"><a class="permalink" href="#DESCRIPTION">DESCRIPTION</a></h1>
-<p class="Pp"><code class="Nm">netmap</code> is a framework for extremely fast
- and efficient packet I/O for userspace and kernel clients, and for Virtual
- Machines. It runs on <span class="Ux">FreeBSD</span>, Linux and some
- versions of Windows, and supports a variety of <code class="Nm">netmap
- ports</code>, including</p>
-<dl class="Bl-tag">
- <dt><code class="Nm">physical NIC ports</code></dt>
- <dd>to access individual queues of network interfaces;</dd>
- <dt><code class="Nm">host ports</code></dt>
- <dd>to inject packets into the host stack;</dd>
- <dt><code class="Nm">VALE ports</code></dt>
- <dd>implementing a very fast and modular in-kernel software
- switch/dataplane;</dd>
- <dt><code class="Nm">netmap pipes</code></dt>
- <dd>a shared memory packet transport channel;</dd>
- <dt><code class="Nm">netmap monitors</code></dt>
- <dd>a mechanism similar to <a class="Xr">bpf(4)</a> to capture traffic</dd>
-</dl>
-<p class="Pp">All these <code class="Nm">netmap ports</code> are accessed
- interchangeably with the same API, and are at least one order of magnitude
- faster than standard OS mechanisms (sockets, bpf, tun/tap interfaces, native
- switches, pipes). With suitably fast hardware (NICs, PCIe buses, CPUs),
- packet I/O using <code class="Nm">netmap</code> on supported NICs reaches
- 14.88 million packets per second (Mpps) with much less than one core on 10
- Gbit/s NICs; 35-40 Mpps on 40 Gbit/s NICs (limited by the hardware); about
- 20 Mpps per core for VALE ports; and over 100 Mpps for
- <code class="Nm">netmap pipes</code>. NICs without native
- <code class="Nm">netmap</code> support can still use the API in emulated
- mode, which uses unmodified device drivers and is 3-5 times faster than
- <a class="Xr">bpf(4)</a> or raw sockets.</p>
-<p class="Pp">Userspace clients can dynamically switch NICs into
- <code class="Nm">netmap</code> mode and send and receive raw packets through
- memory mapped buffers. Similarly, <code class="Nm">VALE</code> switch
- instances and ports, <code class="Nm">netmap pipes</code> and
- <code class="Nm">netmap monitors</code> can be created dynamically,
- providing high speed packet I/O between processes, virtual machines, NICs
- and the host stack.</p>
-<p class="Pp"><code class="Nm">netmap</code> supports both non-blocking I/O
- through <a class="Xr">ioctl(2)</a>, synchronization and blocking I/O through
- a file descriptor and standard OS mechanisms such as
- <a class="Xr">select(2)</a>, <a class="Xr">poll(2)</a>,
- <a class="Xr">kqueue(2)</a> and <a class="Xr">epoll(7)</a>. All types of
- <code class="Nm">netmap ports</code> and the <code class="Nm">VALE
- switch</code> are implemented by a single kernel module, which also emulates
- the <code class="Nm">netmap</code> API over standard drivers. For best
- performance, <code class="Nm">netmap</code> requires native support in
- device drivers. A list of such devices is at the end of this document.</p>
-<p class="Pp">In the rest of this (long) manual page we document various aspects
- of the <code class="Nm">netmap</code> and <code class="Nm">VALE</code>
- architecture, features and usage.</p>
-</section>
-<section class="Sh">
-<h1 class="Sh" id="ARCHITECTURE"><a class="permalink" href="#ARCHITECTURE">ARCHITECTURE</a></h1>
-<p class="Pp"><code class="Nm">netmap</code> supports raw packet I/O through a
- <a class="permalink" href="#port"><i class="Em" id="port">port</i></a>,
- which can be connected to a physical interface
- (<a class="permalink" href="#NIC"><i class="Em" id="NIC">NIC</i></a>), to
- the host stack, or to a <code class="Nm">VALE</code> switch. Ports use
- preallocated circular queues of buffers
- (<a class="permalink" href="#rings"><i class="Em" id="rings">rings</i></a>)
- residing in an mmapped region. There is one ring for each transmit/receive
- queue of a NIC or virtual port. An additional ring pair connects to the host
- stack.</p>
-<p class="Pp">After binding a file descriptor to a port, a
- <code class="Nm">netmap</code> client can send or receive packets in batches
- through the rings, and possibly implement zero-copy forwarding between
- ports.</p>
-<p class="Pp">All NICs operating in <code class="Nm">netmap</code> mode use the
-  same memory region, accessible to all processes that own
- <span class="Pa">/dev/netmap</span> file descriptors bound to NICs.
- Independent <code class="Nm">VALE</code> and <code class="Nm">netmap
- pipe</code> ports by default use separate memory regions, but can be
- independently configured to share memory.</p>
-</section>
-<section class="Sh">
-<h1 class="Sh" id="ENTERING_AND_EXITING_NETMAP_MODE"><a class="permalink" href="#ENTERING_AND_EXITING_NETMAP_MODE">ENTERING
- AND EXITING NETMAP MODE</a></h1>
-<p class="Pp">The following section describes the system calls to create and
- control <code class="Nm">netmap</code> ports (including
- <code class="Nm">VALE</code> and <code class="Nm">netmap pipe</code> ports).
- Simpler, higher level functions are described in the
- <a class="Sx" href="#LIBRARIES">LIBRARIES</a> section.</p>
-<p class="Pp">Ports and rings are created and controlled through a file
- descriptor, created by opening a special device</p>
-<div class="Bd Bd-indent"><code class="Li">fd = open(&quot;/dev/netmap&quot;,
-  O_RDWR);</code></div>
-and then bound to a specific port with an
-<div class="Bd Bd-indent"><code class="Li">ioctl(fd, NIOCREGIF, (struct nmreq
- *)arg);</code></div>
-<p class="Pp"><code class="Nm">netmap</code> has multiple modes of operation
- controlled by the <var class="Vt">struct nmreq</var> argument.
- <var class="Va">arg.nr_name</var> specifies the netmap port name, as
- follows:</p>
-<dl class="Bl-tag">
- <dt id="OS"><a class="permalink" href="#OS"><code class="Dv">OS network
- interface name (e.g., 'em0', 'eth1', ...</code></a>)</dt>
- <dd>the data path of the NIC is disconnected from the host stack, and the file
- descriptor is bound to the NIC (one or all queues), or to the host
- stack;</dd>
- <dt id="valeSSS:PPP"><a class="permalink" href="#valeSSS:PPP"><code class="Dv">valeSSS:PPP</code></a></dt>
- <dd>the file descriptor is bound to port PPP of VALE switch SSS. Switch
- instances and ports are dynamically created if necessary.
-    <p class="Pp">Both SSS and PPP have the form [0-9a-zA-Z_]+, the string
- cannot exceed IFNAMSIZ characters, and PPP cannot be the name of any
- existing OS network interface.</p>
- </dd>
-</dl>
-<p class="Pp">On return, <var class="Va">arg</var> indicates the size of the
- shared memory region, and the number, size and location of all the
- <code class="Nm">netmap</code> data structures, which can be accessed by
- mmapping the memory</p>
-<div class="Bd Bd-indent"><code class="Li">char *mem = mmap(0, arg.nr_memsize,
-  PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);</code></div>
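<p class="Pp">The three calls above can be sketched end to end. The sketch
  below is illustrative and self-contained: the <code class="Dv">struct
  nmreq</code> layout is abridged from the
  <a class="Sx" href="#IOCTLS">IOCTLS</a> section, while
  <code class="Dv">NETMAP_API</code> and the <code class="Dv">NIOCREGIF</code>
  request code are hard-coded assumptions so that it compiles without
  <code class="In">&lt;net/netmap.h&gt;</code>; real programs must include
  that header instead.</p>

```c
/* Illustrative sketch only: local stand-ins for <net/netmap.h>. */
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef IFNAMSIZ
#define IFNAMSIZ 16
#endif

struct nmreq {                        /* abridged mirror; see IOCTLS */
    char     nr_name[IFNAMSIZ];       /* (i) port name */
    uint32_t nr_version;              /* (i) API version */
    uint32_t nr_offset;               /* (o) nifp offset in mmap region */
    uint32_t nr_memsize;              /* (o) size of the mmap region */
    uint32_t nr_tx_slots, nr_rx_slots;
    uint16_t nr_tx_rings, nr_rx_rings;
    uint16_t nr_ringid, nr_cmd, nr_arg1, nr_arg2;
    uint32_t nr_arg3, nr_flags;
};
#define NETMAP_API 11                            /* assumed version */
#define NIOCREGIF  _IOWR('i', 146, struct nmreq) /* assumed request code */

/* Prepare a NIOCREGIF request for the given port name. */
static void nmreq_init(struct nmreq *req, const char *port)
{
    memset(req, 0, sizeof(*req));
    req->nr_version = NETMAP_API;
    strncpy(req->nr_name, port, IFNAMSIZ - 1);
}

/* Bind `port` to a fresh /dev/netmap descriptor and map the shared
 * region; returns the fd, or -1 on failure. */
static int netmap_bind(const char *port, struct nmreq *req, char **mem)
{
    int fd = open("/dev/netmap", O_RDWR);
    if (fd < 0)
        return -1;
    nmreq_init(req, port);
    if (ioctl(fd, NIOCREGIF, req) < 0) {
        close(fd);
        return -1;
    }
    *mem = mmap(NULL, req->nr_memsize, PROT_READ | PROT_WRITE,
                MAP_SHARED, fd, 0);
    if (*mem == MAP_FAILED) {
        close(fd);
        return -1;
    }
    return fd;
}
```

<p class="Pp">A later <code class="Dv">close(fd)</code> removes the binding, as
  described below.</p>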
-<p class="Pp">Non-blocking I/O is done with special <a class="Xr">ioctl(2)</a>
-  calls; <a class="Xr">select(2)</a> and <a class="Xr">poll(2)</a> on the file
-  descriptor permit blocking I/O.</p>
-<p class="Pp">While a NIC is in <code class="Nm">netmap</code> mode, the OS will
- still believe the interface is up and running. OS-generated packets for that
- NIC end up into a <code class="Nm">netmap</code> ring, and another ring is
- used to send packets into the OS network stack. A <a class="Xr">close(2)</a>
- on the file descriptor removes the binding, and returns the NIC to normal
- mode (reconnecting the data path to the host stack), or destroys the virtual
- port.</p>
-</section>
-<section class="Sh">
-<h1 class="Sh" id="DATA_STRUCTURES"><a class="permalink" href="#DATA_STRUCTURES">DATA
- STRUCTURES</a></h1>
-<p class="Pp">The data structures in the mmapped memory region are detailed in
- <code class="In">&lt;<a class="In">sys/net/netmap.h</a>&gt;</code>, which is
- the ultimate reference for the <code class="Nm">netmap</code> API. The main
- structures and fields are indicated below:</p>
-<dl class="Bl-tag">
- <dt id="struct"><a class="permalink" href="#struct"><code class="Dv">struct
- netmap_if (one per interface</code></a>)</dt>
- <dd>
- <div class="Bd Pp Li">
- <pre>struct netmap_if {
- ...
- const uint32_t ni_flags; /* properties */
- ...
- const uint32_t ni_tx_rings; /* NIC tx rings */
- const uint32_t ni_rx_rings; /* NIC rx rings */
- uint32_t ni_bufs_head; /* head of extra bufs list */
- ...
-};</pre>
- </div>
- <p class="Pp">Indicates the number of available rings
-    (<span class="Pa">struct netmap_ring</span>) and their position in the
- mmapped region. The number of tx and rx rings
- (<span class="Pa">ni_tx_rings</span>,
- <span class="Pa">ni_rx_rings</span>) normally depends on the hardware.
- NICs also have an extra tx/rx ring pair connected to the host stack.
- <i class="Em">NIOCREGIF</i> can also request additional unbound buffers
- in the same memory space, to be used as temporary storage for packets.
- The number of extra buffers is specified in the
- <var class="Va">arg.nr_arg3</var> field. On success, the kernel writes
- back to <var class="Va">arg.nr_arg3</var> the number of extra buffers
-     actually allocated (this may be fewer than requested if the memory
-     space runs out of buffers). <span class="Pa">ni_bufs_head</span>
- contains the index of the first of these extra buffers, which are
- connected in a list (the first uint32_t of each buffer being the index
- of the next buffer in the list). A <code class="Dv">0</code> indicates
- the end of the list. The application is free to modify this list and use
- the buffers (i.e., binding them to the slots of a netmap ring). When
-     closing the netmap file descriptor, the kernel frees the buffers
-     contained in the list pointed to by <span class="Pa">ni_bufs_head</span>,
-     irrespective of whether they are the buffers originally provided by the
-     kernel on <i class="Em">NIOCREGIF</i>.</p>
- </dd>
- <dt id="struct~2"><a class="permalink" href="#struct~2"><code class="Dv">struct
- netmap_ring (one per ring</code></a>)</dt>
- <dd>
- <div class="Bd Pp Li">
- <pre>struct netmap_ring {
- ...
- const uint32_t num_slots; /* slots in each ring */
- const uint32_t nr_buf_size; /* size of each buffer */
- ...
- uint32_t head; /* (u) first buf owned by user */
- uint32_t cur; /* (u) wakeup position */
- const uint32_t tail; /* (k) first buf owned by kernel */
- ...
- uint32_t flags;
- struct timeval ts; /* (k) time of last rxsync() */
- ...
- struct netmap_slot slot[0]; /* array of slots */
-}</pre>
- </div>
- <p class="Pp" id="slots">Implements transmit and receive rings, with
- read/write pointers, metadata and an array of
- <a class="permalink" href="#slots"><i class="Em">slots</i></a>
- describing the buffers.</p>
- </dd>
- <dt id="struct~3"><a class="permalink" href="#struct~3"><code class="Dv">struct
- netmap_slot (one per buffer</code></a>)</dt>
- <dd>
- <div class="Bd Pp Li">
- <pre>struct netmap_slot {
- uint32_t buf_idx; /* buffer index */
- uint16_t len; /* packet length */
- uint16_t flags; /* buf changed, etc. */
- uint64_t ptr; /* address for indirect buffers */
-};</pre>
- </div>
- <p class="Pp">Describes a packet buffer, which normally is identified by an
- index and resides in the mmapped region.</p>
- </dd>
- <dt id="packet"><a class="permalink" href="#packet"><code class="Dv">packet
- buffers</code></a></dt>
- <dd>Fixed size (normally 2 KB) packet buffers allocated by the kernel.</dd>
-</dl>
-<p class="Pp">The offset of the <span class="Pa">struct netmap_if</span> in the
- mmapped region is indicated by the <span class="Pa">nr_offset</span> field
- in the structure returned by <code class="Dv">NIOCREGIF</code>. From there,
- all other objects are reachable through relative references (offsets or
- indexes). Macros and functions in
- <code class="In">&lt;<a class="In">net/netmap_user.h</a>&gt;</code> help
-  convert them into actual pointers:</p>
-<p class="Pp"></p>
-<div class="Bd Bd-indent"><code class="Li">struct netmap_if *nifp =
- NETMAP_IF(mem, arg.nr_offset);</code></div>
-<div class="Bd Bd-indent"><code class="Li">struct netmap_ring *txr =
- NETMAP_TXRING(nifp, ring_index);</code></div>
-<div class="Bd Bd-indent"><code class="Li">struct netmap_ring *rxr =
- NETMAP_RXRING(nifp, ring_index);</code></div>
-<p class="Pp"></p>
-<div class="Bd Bd-indent"><code class="Li">char *buf = NETMAP_BUF(ring,
- buffer_index);</code></div>
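<p class="Pp">These macros do plain offset arithmetic: the interface structure
  lives <span class="Pa">nr_offset</span> bytes into the mapped region, and
  each ring is found through an offset stored in the interface structure,
  relative to <span class="Pa">nifp</span> itself. The toy layout below
  mimics that resolution; the <code class="Dv">toy_</code> names are
  illustrative stand-ins, not the real netmap layouts:</p>

```c
#include <stdint.h>

/* Toy stand-ins for struct netmap_if / struct netmap_ring. */
struct toy_if   { uint32_t ring_ofs[2]; };  /* ring offsets, from nifp */
struct toy_ring { uint32_t num_slots; };

/* Analogue of NETMAP_IF(mem, nr_offset): the interface structure is
 * nr_offset bytes into the mapped region. */
static struct toy_if *toy_if_at(char *mem, uint32_t nr_offset)
{
    return (struct toy_if *)(mem + nr_offset);
}

/* Analogue of NETMAP_TXRING(nifp, i): rings are reached via offsets
 * stored in the interface structure, relative to nifp itself. */
static struct toy_ring *toy_ring_at(struct toy_if *nifp, int i)
{
    return (struct toy_ring *)((char *)nifp + nifp->ring_ofs[i]);
}
```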
-</section>
-<section class="Sh">
-<h1 class="Sh" id="RINGS,_BUFFERS_AND_DATA_I/O"><a class="permalink" href="#RINGS,_BUFFERS_AND_DATA_I/O">RINGS,
- BUFFERS AND DATA I/O</a></h1>
-<p class="Pp"><var class="Va">Rings</var> are circular queues of packets with
- three indexes/pointers (<var class="Va">head</var>,
- <var class="Va">cur</var>, <var class="Va">tail</var>); one slot is always
- kept empty. The ring size (<var class="Va">num_slots</var>) should not be
- assumed to be a power of two.</p>
-<p class="Pp"><var class="Va">head</var> is the first slot available to
- userspace;</p>
-<p class="Pp"><var class="Va">cur</var> is the wakeup point: select/poll will
- unblock when <var class="Va">tail</var> passes
- <var class="Va">cur</var>;</p>
-<p class="Pp"><var class="Va">tail</var> is the first slot reserved to the
- kernel.</p>
-<p class="Pp">Slot indexes <i class="Em">must</i> only move forward; for
- convenience, the function</p>
-<div class="Bd Bd-indent"><code class="Li">nm_ring_next(ring,
- index)</code></div>
-returns the next index modulo the ring size.
-<p class="Pp"><var class="Va">head</var> and <var class="Va">cur</var> are only
- modified by the user program; <var class="Va">tail</var> is only modified by
- the kernel. The kernel only reads/writes the <var class="Vt">struct
- netmap_ring</var> slots and buffers during the execution of a netmap-related
-  system call. The only exception is the slots (and buffers) in the range
-  <var class="Va">tail&#x00A0;</var>... <var class="Va">head-1</var>, which
-  are explicitly assigned to the kernel.</p>
-<section class="Ss">
-<h2 class="Ss" id="TRANSMIT_RINGS"><a class="permalink" href="#TRANSMIT_RINGS">TRANSMIT
- RINGS</a></h2>
-<p class="Pp">On transmit rings, after a <code class="Nm">netmap</code> system
- call, slots in the range <var class="Va">head&#x00A0;</var>...
- <var class="Va">tail-1</var> are available for transmission. User code
- should fill the slots sequentially and advance <var class="Va">head</var>
- and <var class="Va">cur</var> past slots ready to transmit.
- <var class="Va">cur</var> may be moved further ahead if the user code needs
- more slots before further transmissions (see
- <a class="Sx" href="#SCATTER_GATHER_I/O">SCATTER GATHER I/O</a>).</p>
-<p class="Pp">At the next NIOCTXSYNC/select()/poll(), slots up to
- <var class="Va">head-1</var> are pushed to the port, and
- <var class="Va">tail</var> may advance if further slots have become
- available. Below is an example of the evolution of a TX ring:</p>
-<div class="Bd Pp Li">
-<pre> after the syscall, slots between cur and tail are (a)vailable
- head=cur tail
- | |
- v v
- TX [.....aaaaaaaaaaa.............]
-
- user creates new packets to (T)ransmit
- head=cur tail
- | |
- v v
- TX [.....TTTTTaaaaaa.............]
-
- NIOCTXSYNC/poll()/select() sends packets and reports new slots
- head=cur tail
- | |
- v v
- TX [..........aaaaaaaaaaa........]</pre>
-</div>
-<p class="Pp" id="select"><a class="permalink" href="#select"><code class="Fn">select</code></a>()
- and
- <a class="permalink" href="#poll"><code class="Fn" id="poll">poll</code></a>()
- will block if there is no space in the ring, i.e.,</p>
-<div class="Bd Bd-indent"><code class="Li">ring-&gt;cur ==
- ring-&gt;tail</code></div>
-and return when new slots have become available.
-<p class="Pp">High speed applications may want to amortize the cost of system
- calls by preparing as many packets as possible before issuing them.</p>
-<p class="Pp">A transmit ring with pending transmissions has</p>
-<div class="Bd Bd-indent"><code class="Li">ring-&gt;head != ring-&gt;tail + 1
- (modulo the ring size).</code></div>
-The function <var class="Va">int nm_tx_pending(ring)</var> implements this test.
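<p class="Pp">The transmit-side rules can be condensed into a short helper.
  The structures below are simplified local mirrors of <code class="Dv">struct
  netmap_ring</code> and <code class="Dv">struct netmap_slot</code> so that
  only the index arithmetic is shown; real code operates on the mmapped rings
  and also copies each payload into the slot's buffer:</p>

```c
#include <stdint.h>

#define TOY_SLOTS 8

struct toy_slot { uint32_t buf_idx; uint16_t len; uint16_t flags; };
struct toy_txring {
    uint32_t num_slots;       /* const in the real struct */
    uint32_t head, cur;       /* owned by the user */
    uint32_t tail;            /* owned by the kernel */
    struct toy_slot slot[TOY_SLOTS];
};

/* Next index modulo the ring size, like nm_ring_next(). */
static uint32_t toy_next(const struct toy_txring *r, uint32_t i)
{
    return (i + 1 == r->num_slots) ? 0 : i + 1;
}

/* Slots available to the user: (tail - cur) mod num_slots, as
 * nm_ring_space() in <net/netmap_user.h> computes. */
static uint32_t toy_space(const struct toy_txring *r)
{
    int32_t n = (int32_t)(r->tail - r->cur);
    return n < 0 ? (uint32_t)(n + (int32_t)r->num_slots) : (uint32_t)n;
}

/* Queue up to `want` packets of `len` bytes each: fill slots starting
 * at head, then advance head and cur past them so the next
 * NIOCTXSYNC/poll() pushes them out.  Returns the number queued. */
static uint32_t toy_tx_fill(struct toy_txring *r, uint32_t want, uint16_t len)
{
    uint32_t done = 0, i = r->head;
    while (done < want && i != r->tail) {    /* head..tail-1 are free */
        r->slot[i].len = len;                /* payload copy omitted */
        i = toy_next(r, i);
        done++;
    }
    r->head = r->cur = i;
    return done;
}
```

<p class="Pp">After the helper returns, a NIOCTXSYNC or
  <code class="Fn">poll</code>() pushes the queued slots out.</p>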
-</section>
-<section class="Ss">
-<h2 class="Ss" id="RECEIVE_RINGS"><a class="permalink" href="#RECEIVE_RINGS">RECEIVE
- RINGS</a></h2>
-<p class="Pp">On receive rings, after a <code class="Nm">netmap</code> system
- call, the slots in the range <var class="Va">head</var>...
- <var class="Va">tail-1</var> contain received packets. User code should
- process them and advance <var class="Va">head</var> and
- <var class="Va">cur</var> past slots it wants to return to the kernel.
- <var class="Va">cur</var> may be moved further ahead if the user code wants
- to wait for more packets without returning all the previous slots to the
- kernel.</p>
-<p class="Pp">At the next NIOCRXSYNC/select()/poll(), slots up to
- <var class="Va">head-1</var> are returned to the kernel for further
- receives, and <var class="Va">tail</var> may advance to report new incoming
- packets.</p>
-<p class="Pp">Below is an example of the evolution of an RX ring:</p>
-<div class="Bd Pp Li">
-<pre> after the syscall, there are some (h)eld and some (R)eceived slots
- head cur tail
- | | |
- v v v
- RX [..hhhhhhRRRRRRRR..........]
-
- user advances head and cur, releasing some slots and holding others
- head cur tail
- | | |
- v v v
- RX [..*****hhhRRRRRR...........]
-
-    NIOCRXSYNC/poll()/select() recovers slots and reports new packets
- head cur tail
- | | |
- v v v
- RX [.......hhhRRRRRRRRRRRR....]</pre>
-</div>
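<p class="Pp">The receive-side counterpart, again on simplified local mirrors
  of the ring structures; real code would read each payload (via
  <code class="Dv">NETMAP_BUF</code>) before releasing its slot:</p>

```c
#include <stdint.h>

#define TOY_SLOTS 8

struct toy_slot { uint32_t buf_idx; uint16_t len; uint16_t flags; };
struct toy_rxring {
    uint32_t num_slots;
    uint32_t head, cur;       /* owned by the user */
    uint32_t tail;            /* owned by the kernel */
    struct toy_slot slot[TOY_SLOTS];
};

static uint32_t toy_next(const struct toy_rxring *r, uint32_t i)
{
    return (i + 1 == r->num_slots) ? 0 : i + 1;
}

/* Consume every received slot (head..tail-1).  Advancing head and cur
 * returns the slots to the kernel at the next NIOCRXSYNC/poll().
 * Returns the number of packets seen; *bytes gets the byte total. */
static uint32_t toy_rx_drain(struct toy_rxring *r, uint32_t *bytes)
{
    uint32_t n = 0;
    *bytes = 0;
    while (r->head != r->tail) {
        *bytes += r->slot[r->head].len;   /* payload inspection omitted */
        r->head = toy_next(r, r->head);
        n++;
    }
    r->cur = r->head;         /* wake up only when new packets arrive */
    return n;
}
```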
-</section>
-</section>
-<section class="Sh">
-<h1 class="Sh" id="SLOTS_AND_PACKET_BUFFERS"><a class="permalink" href="#SLOTS_AND_PACKET_BUFFERS">SLOTS
- AND PACKET BUFFERS</a></h1>
-<p class="Pp">Normally, packets should be stored in the netmap-allocated buffers
- assigned to slots when ports are bound to a file descriptor. One packet is
- fully contained in a single buffer.</p>
-<p class="Pp">The following flags affect slot and buffer processing:</p>
-<dl class="Bl-tag">
- <dt id="must">NS_BUF_CHANGED</dt>
- <dd><a class="permalink" href="#must"><i class="Em">must</i></a> be used when
- the <var class="Va">buf_idx</var> in the slot is changed. This can be used
- to implement zero-copy forwarding, see
- <a class="Sx" href="#ZERO_COPY_FORWARDING">ZERO-COPY FORWARDING</a>.</dd>
- <dt>NS_REPORT</dt>
- <dd>reports when this buffer has been transmitted. Normally,
- <code class="Nm">netmap</code> notifies transmit completions in batches,
- hence signals can be delayed indefinitely. This flag helps detect when
- packets have been sent and a file descriptor can be closed.</dd>
- <dt>NS_FORWARD</dt>
- <dd>When a ring is in 'transparent' mode, packets marked with this flag by the
- user application are forwarded to the other endpoint at the next system
- call, thus restoring (in a selective way) the connection between a NIC and
- the host stack.</dd>
- <dt>NS_NO_LEARN</dt>
- <dd>tells the forwarding code that the source MAC address for this packet must
- not be used in the learning bridge code.</dd>
- <dt>NS_INDIRECT</dt>
- <dd>indicates that the packet's payload is in a user-supplied buffer whose
- user virtual address is in the 'ptr' field of the slot. The size can reach
- 65535 bytes.
- <p class="Pp">This is only supported on the transmit ring of
-      <code class="Nm">VALE</code> ports, and it helps reduce data copies when
-      interconnecting virtual machines.</p>
- </dd>
- <dt>NS_MOREFRAG</dt>
- <dd>indicates that the packet continues with subsequent buffers; the last
- buffer in a packet must have the flag clear.</dd>
-</dl>
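<p class="Pp">NS_BUF_CHANGED is the key to zero-copy forwarding: instead of
  copying the payload, the receive slot's buffer index is swapped with that of
  a free transmit slot, and both slots are flagged so the kernel reloads the
  buffer addresses. A minimal sketch on stand-in slot structures; the flag
  value below is an assumption copied from the header at the time of writing,
  and real code must take it from
  <code class="In">&lt;net/netmap.h&gt;</code>:</p>

```c
#include <stdint.h>

#define NS_BUF_CHANGED 0x0001   /* assumed value; use <net/netmap.h> */

struct fwd_slot { uint32_t buf_idx; uint16_t len; uint16_t flags; };

/* Zero-copy forward: move the received buffer to the transmit slot by
 * swapping buffer indexes; both slots are flagged so the kernel knows
 * the buffer addresses in its rings must be reloaded. */
static void zcopy_fwd(struct fwd_slot *rx, struct fwd_slot *tx)
{
    uint32_t idx = tx->buf_idx;
    tx->buf_idx = rx->buf_idx;
    rx->buf_idx = idx;
    tx->len     = rx->len;
    tx->flags  |= NS_BUF_CHANGED;
    rx->flags  |= NS_BUF_CHANGED;
}
```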
-</section>
-<section class="Sh">
-<h1 class="Sh" id="SCATTER_GATHER_I/O"><a class="permalink" href="#SCATTER_GATHER_I/O">SCATTER
- GATHER I/O</a></h1>
-<p class="Pp">Packets can span multiple slots if the
- <var class="Va">NS_MOREFRAG</var> flag is set in all but the last slot. The
- maximum length of a chain is 64 buffers. This is normally used with
- <code class="Nm">VALE</code> ports when connecting virtual machines, as they
- generate large TSO segments that are not split unless they reach a physical
- device.</p>
-<p class="Pp">NOTE: The length field always refers to the individual fragment;
-  no field holds the total length of the packet.</p>
-<p class="Pp">On receive rings the macro <var class="Va">NS_RFRAGS(slot)</var>
- indicates the remaining number of slots for this packet, including the
- current one. Slots with a value greater than 1 also have NS_MOREFRAG
- set.</p>
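<p class="Pp">Reassembling the total length of a multi-fragment packet
  therefore means walking the slots until the flag is clear. A sketch on
  stand-in slots (the <code class="Dv">NS_MOREFRAG</code> value is an
  assumption; real code takes it from
  <code class="In">&lt;net/netmap.h&gt;</code>):</p>

```c
#include <stdint.h>

#define NS_MOREFRAG 0x0020   /* assumed value; use <net/netmap.h> */

struct sg_slot { uint16_t len; uint16_t flags; };

/* Sum a packet that may span several slots: each fragment's len field
 * covers only that fragment, and the last fragment has NS_MOREFRAG
 * clear.  Returns the byte total and stores the fragment count. */
static uint32_t pkt_total_len(const struct sg_slot *slot, uint32_t nslots,
                              uint32_t first, uint32_t *nfrags)
{
    uint32_t total = 0, i = first, n = 0;
    for (;;) {
        total += slot[i].len;
        n++;
        if (!(slot[i].flags & NS_MOREFRAG))
            break;                            /* last fragment reached */
        i = (i + 1 == nslots) ? 0 : i + 1;    /* wrap like nm_ring_next() */
    }
    *nfrags = n;
    return total;
}
```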
-</section>
-<section class="Sh">
-<h1 class="Sh" id="IOCTLS"><a class="permalink" href="#IOCTLS">IOCTLS</a></h1>
-<p class="Pp"><code class="Nm">netmap</code> uses two ioctls (NIOCTXSYNC,
- NIOCRXSYNC) for non-blocking I/O. They take no argument. Two more ioctls
- (NIOCGINFO, NIOCREGIF) are used to query and configure ports, with the
- following argument:</p>
-<div class="Bd Pp Li">
-<pre>struct nmreq {
- char nr_name[IFNAMSIZ]; /* (i) port name */
- uint32_t nr_version; /* (i) API version */
- uint32_t nr_offset; /* (o) nifp offset in mmap region */
- uint32_t nr_memsize; /* (o) size of the mmap region */
- uint32_t nr_tx_slots; /* (i/o) slots in tx rings */
- uint32_t nr_rx_slots; /* (i/o) slots in rx rings */
- uint16_t nr_tx_rings; /* (i/o) number of tx rings */
- uint16_t nr_rx_rings; /* (i/o) number of rx rings */
- uint16_t nr_ringid; /* (i/o) ring(s) we care about */
- uint16_t nr_cmd; /* (i) special command */
- uint16_t nr_arg1; /* (i/o) extra arguments */
- uint16_t nr_arg2; /* (i/o) extra arguments */
- uint32_t nr_arg3; /* (i/o) extra arguments */
-  uint32_t nr_flags; /* (i/o) open mode */
- ...
-};</pre>
-</div>
-<p class="Pp">A file descriptor obtained through
-  <span class="Pa">/dev/netmap</span> also supports the ioctls supported by
-  network devices; see <a class="Xr">netintro(4)</a>.</p>
-<dl class="Bl-tag">
- <dt id="NIOCGINFO"><a class="permalink" href="#NIOCGINFO"><code class="Dv">NIOCGINFO</code></a></dt>
- <dd>returns EINVAL if the named port does not support netmap. Otherwise, it
- returns 0 and (advisory) information about the port. Note that all the
- information below can change before the interface is actually put in
- netmap mode.
- <dl class="Bl-tag">
- <dt><span class="Pa">nr_memsize</span></dt>
- <dd>indicates the size of the <code class="Nm">netmap</code> memory
- region. NICs in <code class="Nm">netmap</code> mode all share the same
- memory region, whereas <code class="Nm">VALE</code> ports have
- independent regions for each port.</dd>
- <dt><span class="Pa">nr_tx_slots</span>,
- <span class="Pa">nr_rx_slots</span></dt>
- <dd>indicate the size of transmit and receive rings.</dd>
- <dt><span class="Pa">nr_tx_rings</span>,
- <span class="Pa">nr_rx_rings</span></dt>
- <dd>indicate the number of transmit and receive rings. Both ring number
- and sizes may be configured at runtime using interface-specific
- functions (e.g., <a class="Xr">ethtool(8)</a> ).</dd>
- </dl>
- </dd>
- <dt id="NIOCREGIF"><a class="permalink" href="#NIOCREGIF"><code class="Dv">NIOCREGIF</code></a></dt>
- <dd>binds the port named in <var class="Va">nr_name</var> to the file
- descriptor. For a physical device this also switches it into
- <code class="Nm">netmap</code> mode, disconnecting it from the host stack.
- Multiple file descriptors can be bound to the same port, with proper
- synchronization left to the user.
- <p class="Pp">The recommended way to bind a file descriptor to a port is to
- use function <var class="Va">nm_open(..)</var> (see
- <a class="Sx" href="#LIBRARIES">LIBRARIES</a>) which parses names to
- access specific port types and enable features. In the following we
- document the main features.</p>
-    <p class="Pp" id="netmap"><code class="Dv">NIOCREGIF</code> can also bind
-      a file descriptor to one endpoint of a
-      <a class="permalink" href="#netmap"><i class="Em">netmap pipe</i></a>,
-      consisting of two netmap ports with a crossover connection. A netmap
-      pipe shares the same memory space as the parent port, and is meant to
-      enable configurations where a master process acts as a dispatcher
-      towards slave processes.</p>
-    <p class="Pp">To enable this function, the <span class="Pa">nr_arg1</span>
-      field of the structure can be used as a hint to the kernel, indicating
-      how many pipes we expect to use so that it can reserve extra space in
-      the memory region.</p>
- <p class="Pp">On return, it gives the same info as NIOCGINFO, with
- <span class="Pa">nr_ringid</span> and <span class="Pa">nr_flags</span>
- indicating the identity of the rings controlled through the file
- descriptor.</p>
-    <p class="Pp"><var class="Va">nr_flags</var> and
-      <var class="Va">nr_ringid</var> select which rings are controlled
-      through this file descriptor.
- Possible values of <span class="Pa">nr_flags</span> are indicated below,
- together with the naming schemes that application libraries (such as the
- <code class="Nm">nm_open</code> indicated below) can use to indicate the
- specific set of rings. In the example below, &quot;netmap:foo&quot; is
- any valid netmap port name.</p>
- <dl class="Bl-tag">
- <dt>NR_REG_ALL_NIC netmap:foo</dt>
- <dd>(default) all hardware ring pairs</dd>
- <dt>NR_REG_SW netmap:foo^</dt>
- <dd>the ``host rings'', connecting to the host stack.</dd>
- <dt>NR_REG_NIC_SW netmap:foo*</dt>
- <dd>all hardware rings and the host rings</dd>
- <dt>NR_REG_ONE_NIC netmap:foo-i</dt>
- <dd>only the i-th hardware ring pair, where the number is in
- <span class="Pa">nr_ringid</span>;</dd>
- <dt>NR_REG_PIPE_MASTER netmap:foo{i</dt>
- <dd>the master side of the netmap pipe whose identifier (i) is in
- <span class="Pa">nr_ringid</span>;</dd>
- <dt>NR_REG_PIPE_SLAVE netmap:foo}i</dt>
- <dd>the slave side of the netmap pipe whose identifier (i) is in
- <span class="Pa">nr_ringid</span>.
-      <p class="Pp">The identifier of a pipe must be thought of as part of
-        the pipe name, and does not need to be sequential. On return the pipe
- will only have a single ring pair with index 0, irrespective of the
- value of <var class="Va">i</var>.</p>
- </dd>
- </dl>
- <p class="Pp">By default, a <a class="Xr">poll(2)</a> or
- <a class="Xr">select(2)</a> call pushes out any pending packets on the
- transmit ring, even if no write events are specified. The feature can be
- disabled by or-ing <var class="Va">NETMAP_NO_TX_POLL</var> to the value
- written to <var class="Va">nr_ringid</var>. When this feature is used,
-      packets are transmitted only when <var class="Va">ioctl(NIOCTXSYNC)</var>
-      is issued, when <var class="Va">select()</var> /
-      <var class="Va">poll()</var> are called with a write event
-      (POLLOUT/wfdset), or when the ring is full.</p>
-    <p class="Pp">When registering a virtual interface that is dynamically
-      created and attached to a <code class="Nm">VALE</code> switch, the
-      desired number of rings (1 by default, and currently up to 16) can be
-      specified using the nr_tx_rings and nr_rx_rings fields.</p>
- </dd>
- <dt id="NIOCTXSYNC"><a class="permalink" href="#NIOCTXSYNC"><code class="Dv">NIOCTXSYNC</code></a></dt>
- <dd>tells the hardware of new packets to transmit, and updates the number of
- slots available for transmission.</dd>
- <dt id="NIOCRXSYNC"><a class="permalink" href="#NIOCRXSYNC"><code class="Dv">NIOCRXSYNC</code></a></dt>
- <dd>tells the hardware of consumed packets, and asks for newly available
- packets.</dd>
-</dl>
-</section>
-<section class="Sh">
-<h1 class="Sh" id="SELECT,_POLL,_EPOLL,_KQUEUE"><a class="permalink" href="#SELECT,_POLL,_EPOLL,_KQUEUE">SELECT,
- POLL, EPOLL, KQUEUE</a></h1>
-<p class="Pp"><a class="Xr">select(2)</a> and <a class="Xr">poll(2)</a> on a
- <code class="Nm">netmap</code> file descriptor process rings as indicated in
- <a class="Sx" href="#TRANSMIT_RINGS">TRANSMIT RINGS</a> and
- <a class="Sx" href="#RECEIVE_RINGS">RECEIVE RINGS</a>, respectively when
- write (POLLOUT) and read (POLLIN) events are requested. Both block if no
- slots are available in the ring (<var class="Va">ring-&gt;cur ==
- ring-&gt;tail</var>). Depending on the platform, <a class="Xr">epoll(7)</a>
- and <a class="Xr">kqueue(2)</a> are supported too.</p>
-<p class="Pp">Packets in transmit rings are normally pushed out (and buffers
- reclaimed) even without requesting write events. Passing the
- <code class="Dv">NETMAP_NO_TX_POLL</code> flag to
- <i class="Em">NIOCREGIF</i> disables this feature. By default, receive rings
- are processed only if read events are requested. Passing the
-  <code class="Dv">NETMAP_DO_RX_POLL</code> flag to
-  <i class="Em">NIOCREGIF</i> updates receive rings even without read
-  events. Note that on
- <a class="Xr">epoll(7)</a> and <a class="Xr">kqueue(2)</a>,
- <code class="Dv">NETMAP_NO_TX_POLL</code> and
- <code class="Dv">NETMAP_DO_RX_POLL</code> only have an effect when some
- event is posted for the file descriptor.</p>
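<p class="Pp">The resulting client structure is an ordinary event loop. The
  sketch below shows only the <a class="Xr">poll(2)</a> control flow, with a
  pipe standing in for the <code class="Nm">netmap</code> file descriptor so
  that it can run anywhere:</p>

```c
/* Control-flow sketch of a poll(2)-driven netmap client; any readable
 * file descriptor (here: a pipe) can stand in for the netmap fd. */
#include <poll.h>
#include <unistd.h>

/* Wait until fd is readable or timeout_ms expires.  For a netmap fd,
 * POLLIN means new slots appeared between head and tail.  Returns 1 on
 * a read event, 0 on timeout. */
static int wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int n = poll(&pfd, 1, timeout_ms);
    return (n == 1 && (pfd.revents & POLLIN)) ? 1 : 0;
}
```

<p class="Pp">In a real client, a successful wait is followed by draining the
  receive rings as described above.</p>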
-</section>
-<section class="Sh">
-<h1 class="Sh" id="LIBRARIES"><a class="permalink" href="#LIBRARIES">LIBRARIES</a></h1>
-<p class="Pp">The <code class="Nm">netmap</code> API is supposed to be used
- directly, both because of its simplicity and for efficient integration with
- applications.</p>
-<p class="Pp">For convenience, the
- <code class="In">&lt;<a class="In">net/netmap_user.h</a>&gt;</code> header
- provides a few macros and functions to ease creating a file descriptor and
- doing I/O with a <code class="Nm">netmap</code> port. These are loosely
- modeled after the <a class="Xr">pcap(3)</a> API, to ease porting of
- libpcap-based applications to <code class="Nm">netmap</code>. To use these
- extra functions, programs should</p>
-<div class="Bd Bd-indent"><code class="Li">#define NETMAP_WITH_LIBS</code></div>
-before
-<div class="Bd Bd-indent"><code class="Li">#include
- &lt;net/netmap_user.h&gt;</code></div>
-<p class="Pp">The following functions are available:</p>
-<dl class="Bl-tag">
- <dt id="struct~4"><var class="Va">struct nm_desc * nm_open(const char *ifname,
- const struct nmreq *req, uint64_t flags, const struct nm_desc
- *arg</var>)</dt>
- <dd>similar to <a class="Xr">pcap_open_live(3)</a>, binds a file descriptor to
- a port.
- <dl class="Bl-tag">
- <dt id="ifname"><var class="Va">ifname</var></dt>
- <dd>is a port name, in the form &quot;netmap:PPP&quot; for a NIC and
- &quot;valeSSS:PPP&quot; for a <code class="Nm">VALE</code> port.</dd>
- <dt id="req"><var class="Va">req</var></dt>
- <dd>provides the initial values for the argument to the NIOCREGIF ioctl.
- The nm_flags and nm_ringid values are overwritten by parsing ifname
- and flags, and other fields can be overridden through the other two
- arguments.</dd>
- <dt id="arg"><var class="Va">arg</var></dt>
- <dd>points to a struct nm_desc containing arguments (e.g., from a
- previously open file descriptor) that should override the defaults.
- The fields are used as described below</dd>
- <dt id="flags"><var class="Va">flags</var></dt>
- <dd>can be set to a combination of the following flags:
- <var class="Va">NETMAP_NO_TX_POLL</var>,
- <var class="Va">NETMAP_DO_RX_POLL</var> (copied into nr_ringid);
- <var class="Va">NM_OPEN_NO_MMAP</var> (if arg points to the same
- memory region, avoids the mmap and uses the values from it);
- <var class="Va">NM_OPEN_IFNAME</var> (ignores ifname and uses the
- values in arg); <var class="Va">NM_OPEN_ARG1</var>,
- <var class="Va">NM_OPEN_ARG2</var>, <var class="Va">NM_OPEN_ARG3</var>
- (uses the fields from arg); <var class="Va">NM_OPEN_RING_CFG</var>
- (uses the ring number and sizes from arg).</dd>
- </dl>
- </dd>
- <dt id="int"><var class="Va">int nm_close(struct nm_desc *d</var>)</dt>
- <dd>closes the file descriptor, unmaps memory, frees resources.</dd>
- <dt id="int~2"><var class="Va">int nm_inject(struct nm_desc *d, const void
- *buf, size_t size</var>)</dt>
-  <dd>similar to <var class="Va">pcap_inject()</var>, pushes a packet to a ring,
-    returns the size of the packet if successful, or 0 on error;</dd>
- <dt id="int~3"><var class="Va">int nm_dispatch(struct nm_desc *d, int cnt,
- nm_cb_t cb, u_char *arg</var>)</dt>
- <dd>similar to <var class="Va">pcap_dispatch()</var>, applies a callback to
- incoming packets</dd>
- <dt id="u_char"><var class="Va">u_char * nm_nextpkt(struct nm_desc *d, struct
- nm_pkthdr *hdr</var>)</dt>
- <dd>similar to <var class="Va">pcap_next()</var>, fetches the next packet</dd>
-</dl>
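The control flow of nm_dispatch() mirrors pcap_dispatch(): a user callback is invoked once per received packet, up to cnt packets. A minimal self-contained model of this pattern, using a hypothetical in-memory packet queue in place of a real netmap port (struct names and fields here are illustrative stand-ins, not the real API):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct nm_pkthdr and the nm_cb_t callback. */
struct pkthdr { size_t len; };
typedef void (*cb_t)(unsigned char *arg, const struct pkthdr *h,
                     const unsigned char *buf);

/* Fake "port": an array of queued packets with a read cursor. */
struct fake_port {
	const unsigned char *pkts[8];
	size_t lens[8];
	int count, cur;
};

/* Model of nm_dispatch(): deliver up to cnt packets to the callback and
 * return how many were actually delivered; cnt < 0 means "all pending". */
static int dispatch(struct fake_port *p, int cnt, cb_t cb, unsigned char *arg)
{
	int n = 0;

	while (p->cur < p->count && (cnt < 0 || n < cnt)) {
		struct pkthdr h = { p->lens[p->cur] };

		cb(arg, &h, p->pkts[p->cur]);
		p->cur++;
		n++;
	}
	return n;
}

/* Example callback: accumulate the total number of bytes seen. */
static void count_bytes(unsigned char *arg, const struct pkthdr *h,
                        const unsigned char *buf)
{
	(void)buf;
	*(size_t *)arg += h->len;
}
```

A real receiver would pass a callback like count_bytes to nm_dispatch(d, -1, cb, arg) after poll() reports POLLIN.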
-</section>
-<section class="Sh">
-<h1 class="Sh" id="SUPPORTED_DEVICES"><a class="permalink" href="#SUPPORTED_DEVICES">SUPPORTED
- DEVICES</a></h1>
-<p class="Pp"><code class="Nm">netmap</code> natively supports the following
- devices:</p>
-<p class="Pp">On <span class="Ux">FreeBSD</span>: <a class="Xr">cxgbe(4)</a>,
- <a class="Xr">em(4)</a>, <a class="Xr">iflib(4)</a> (providing
- <a class="Xr">igb(4)</a> and <a class="Xr">em(4)</a>),
- <a class="Xr">ix(4)</a>, <a class="Xr">ixl(4)</a>, <a class="Xr">re(4)</a>,
- <a class="Xr">vtnet(4)</a>.</p>
-<p class="Pp">On Linux: e1000, e1000e, i40e, igb, ixgbe, ixgbevf, r8169,
-  virtio_net, vmxnet3.</p>
-<p class="Pp">NICs without native support can still be used in
- <code class="Nm">netmap</code> mode through emulation. Performance is
- inferior to native netmap mode but still significantly higher than various
-  raw socket types (bpf, PF_PACKET, etc.). Note that for slow devices (such as
-  1 Gbit/s and slower NICs, or several 10 Gbit/s NICs whose hardware is unable
-  to sustain line rate), emulated and native mode will likely achieve similar
-  or even identical throughput.</p>
-<p class="Pp">When emulation is in use, packet sniffer programs such as tcpdump
-  may see received packets before they are diverted by netmap. This
-  behaviour is not intentional, being just an artifact of how emulation is
-  implemented. Note that if the netmap application subsequently moves
-  packets received from the emulated adapter onto the host RX ring, the
-  sniffer will intercept those packets again, since the packets are injected
-  into the host stack as if they had been received by the network
-  interface.</p>
-<p class="Pp">Emulation is also available for devices with native netmap
- support, which can be used for testing or performance comparison. The sysctl
- variable <var class="Va">dev.netmap.admode</var> globally controls how
- netmap mode is implemented.</p>
-</section>
-<section class="Sh">
-<h1 class="Sh" id="SYSCTL_VARIABLES_AND_MODULE_PARAMETERS"><a class="permalink" href="#SYSCTL_VARIABLES_AND_MODULE_PARAMETERS">SYSCTL
- VARIABLES AND MODULE PARAMETERS</a></h1>
-<p class="Pp">Some aspects of the operation of <code class="Nm">netmap</code>
- and <code class="Nm">VALE</code> are controlled through sysctl variables on
- <span class="Ux">FreeBSD</span>
- (<a class="permalink" href="#dev.netmap.*"><i class="Em" id="dev.netmap.*">dev.netmap.*</i></a>)
- and module parameters on Linux
- (<a class="permalink" href="#/sys/module/netmap/parameters/*"><i class="Em" id="/sys/module/netmap/parameters/*">/sys/module/netmap/parameters/*</i></a>):</p>
-<dl class="Bl-tag">
- <dt id="dev.netmap.admode:"><var class="Va">dev.netmap.admode: 0</var></dt>
- <dd>Controls the use of native or emulated adapter mode.
- <p class="Pp">0 uses the best available option;</p>
- <p class="Pp">1 forces native mode and fails if not available;</p>
-    <p class="Pp">2 forces emulated mode and hence never fails.</p>
- </dd>
- <dt id="dev.netmap.generic_rings:"><var class="Va">dev.netmap.generic_rings:
- 1</var></dt>
- <dd>Number of rings used for emulated netmap mode</dd>
- <dt id="dev.netmap.generic_ringsize:"><var class="Va">dev.netmap.generic_ringsize:
- 1024</var></dt>
- <dd>Ring size used for emulated netmap mode</dd>
- <dt id="dev.netmap.generic_mit:"><var class="Va">dev.netmap.generic_mit:
- 100000</var></dt>
- <dd>Controls interrupt moderation for emulated mode</dd>
- <dt id="dev.netmap.fwd:"><var class="Va">dev.netmap.fwd: 0</var></dt>
- <dd>Forces NS_FORWARD mode</dd>
- <dt id="dev.netmap.txsync_retry:"><var class="Va">dev.netmap.txsync_retry:
- 2</var></dt>
- <dd>Number of txsync loops in the <code class="Nm">VALE</code> flush
- function</dd>
- <dt id="dev.netmap.no_pendintr:"><var class="Va">dev.netmap.no_pendintr:
- 1</var></dt>
- <dd>Forces recovery of transmit buffers on system calls</dd>
- <dt id="dev.netmap.no_timestamp:"><var class="Va">dev.netmap.no_timestamp:
- 0</var></dt>
- <dd>Disables the update of the timestamp in the netmap ring</dd>
- <dt id="dev.netmap.verbose:"><var class="Va">dev.netmap.verbose: 0</var></dt>
- <dd>Verbose kernel messages</dd>
- <dt id="dev.netmap.buf_num:"><var class="Va">dev.netmap.buf_num:
- 163840</var></dt>
- <dd style="width: auto;">&#x00A0;</dd>
- <dt id="dev.netmap.buf_size:"><var class="Va">dev.netmap.buf_size:
- 2048</var></dt>
- <dd style="width: auto;">&#x00A0;</dd>
- <dt id="dev.netmap.ring_num:"><var class="Va">dev.netmap.ring_num:
- 200</var></dt>
- <dd style="width: auto;">&#x00A0;</dd>
- <dt id="dev.netmap.ring_size:"><var class="Va">dev.netmap.ring_size:
- 36864</var></dt>
- <dd style="width: auto;">&#x00A0;</dd>
- <dt id="dev.netmap.if_num:"><var class="Va">dev.netmap.if_num: 100</var></dt>
- <dd style="width: auto;">&#x00A0;</dd>
- <dt id="dev.netmap.if_size:"><var class="Va">dev.netmap.if_size:
- 1024</var></dt>
- <dd>Sizes and number of objects (netmap_if, netmap_ring, buffers) for the
- global memory region. The only parameter worth modifying is
- <var class="Va">dev.netmap.buf_num</var> as it impacts the total amount of
- memory used by netmap.</dd>
- <dt id="dev.netmap.buf_curr_num:"><var class="Va">dev.netmap.buf_curr_num:
- 0</var></dt>
- <dd style="width: auto;">&#x00A0;</dd>
- <dt id="dev.netmap.buf_curr_size:"><var class="Va">dev.netmap.buf_curr_size:
- 0</var></dt>
- <dd style="width: auto;">&#x00A0;</dd>
- <dt id="dev.netmap.ring_curr_num:"><var class="Va">dev.netmap.ring_curr_num:
- 0</var></dt>
- <dd style="width: auto;">&#x00A0;</dd>
- <dt id="dev.netmap.ring_curr_size:"><var class="Va">dev.netmap.ring_curr_size:
- 0</var></dt>
- <dd style="width: auto;">&#x00A0;</dd>
- <dt id="dev.netmap.if_curr_num:"><var class="Va">dev.netmap.if_curr_num:
- 0</var></dt>
- <dd style="width: auto;">&#x00A0;</dd>
- <dt id="dev.netmap.if_curr_size:"><var class="Va">dev.netmap.if_curr_size:
- 0</var></dt>
- <dd>Actual values in use.</dd>
- <dt id="dev.netmap.priv_buf_num:"><var class="Va">dev.netmap.priv_buf_num:
- 4098</var></dt>
- <dd style="width: auto;">&#x00A0;</dd>
- <dt id="dev.netmap.priv_buf_size:"><var class="Va">dev.netmap.priv_buf_size:
- 2048</var></dt>
- <dd style="width: auto;">&#x00A0;</dd>
- <dt id="dev.netmap.priv_ring_num:"><var class="Va">dev.netmap.priv_ring_num:
- 4</var></dt>
- <dd style="width: auto;">&#x00A0;</dd>
- <dt id="dev.netmap.priv_ring_size:"><var class="Va">dev.netmap.priv_ring_size:
- 20480</var></dt>
- <dd style="width: auto;">&#x00A0;</dd>
- <dt id="dev.netmap.priv_if_num:"><var class="Va">dev.netmap.priv_if_num:
- 2</var></dt>
- <dd style="width: auto;">&#x00A0;</dd>
- <dt id="dev.netmap.priv_if_size:"><var class="Va">dev.netmap.priv_if_size:
- 1024</var></dt>
- <dd>Sizes and number of objects (netmap_if, netmap_ring, buffers) for private
- memory regions. A separate memory region is used for each
- <code class="Nm">VALE</code> port and each pair of <code class="Nm">netmap
- pipes</code>.</dd>
- <dt id="dev.netmap.bridge_batch:"><var class="Va">dev.netmap.bridge_batch:
- 1024</var></dt>
- <dd>Batch size used when moving packets across a <code class="Nm">VALE</code>
- switch. Values above 64 generally guarantee good performance.</dd>
- <dt id="dev.netmap.max_bridges:"><var class="Va">dev.netmap.max_bridges:
- 8</var></dt>
- <dd>Max number of <code class="Nm">VALE</code> switches that can be created.
- This tunable can be specified at loader time.</dd>
- <dt id="dev.netmap.ptnet_vnet_hdr:"><var class="Va">dev.netmap.ptnet_vnet_hdr:
- 1</var></dt>
- <dd>Allow ptnet devices to use virtio-net headers</dd>
- <dt id="dev.netmap.port_numa_affinity:"><var class="Va">dev.netmap.port_numa_affinity:
- 0</var></dt>
- <dd>On <a class="Xr">numa(4)</a> systems, allocate memory for netmap ports
- from the local NUMA domain when possible. This can improve performance by
- reducing the number of remote memory accesses. However, when forwarding
- packets between ports attached to different NUMA domains, this will
- prevent zero-copy forwarding optimizations and thus may hurt performance.
- Note that this setting must be specified as a loader tunable at boot
- time.</dd>
-</dl>
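For instance, the two loader-time tunables above can be set from /boot/loader.conf; a sketch (the values shown are illustrative, not recommendations):

```
# /boot/loader.conf
dev.netmap.max_bridges="16"        # allow more VALE switches
dev.netmap.port_numa_affinity="1"  # prefer NUMA-local allocations
```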
-</section>
-<section class="Sh">
-<h1 class="Sh" id="SYSTEM_CALLS"><a class="permalink" href="#SYSTEM_CALLS">SYSTEM
- CALLS</a></h1>
-<p class="Pp"><code class="Nm">netmap</code> uses <a class="Xr">select(2)</a>,
- <a class="Xr">poll(2)</a>, <a class="Xr">epoll(7)</a> and
- <a class="Xr">kqueue(2)</a> to wake up processes when significant events
- occur, and <a class="Xr">mmap(2)</a> to map memory.
- <a class="Xr">ioctl(2)</a> is used to configure ports and
- <code class="Nm">VALE switches</code>.</p>
-<p class="Pp">Applications may need to create threads and bind them to specific
- cores to improve performance, using standard OS primitives, see
- <a class="Xr">pthread(3)</a>. In particular,
- <a class="Xr">pthread_setaffinity_np(3)</a> may be of use.</p>
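A sketch of the affinity step (Linux flavor shown, using the non-portable pthread_setaffinity_np(); FreeBSD declares a similar function in &lt;pthread_np.h&gt; taking a cpuset_t):

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

/* Pin the calling thread to a single CPU core; returns 0 on success. */
static int pin_to_core(int core)
{
	cpu_set_t set;

	CPU_ZERO(&set);
	CPU_SET(core, &set);
	return pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}
```

A typical netmap application calls something like pin_to_core(n) at the top of each send/receive thread, one core per thread, so that ring accesses stay on a fixed CPU.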
-</section>
-<section class="Sh">
-<h1 class="Sh" id="EXAMPLES"><a class="permalink" href="#EXAMPLES">EXAMPLES</a></h1>
-<section class="Ss">
-<h2 class="Ss" id="TEST_PROGRAMS"><a class="permalink" href="#TEST_PROGRAMS">TEST
- PROGRAMS</a></h2>
-<p class="Pp"><code class="Nm">netmap</code> comes with a few programs that can
- be used for testing or simple applications. See the
- <span class="Pa">examples/</span> directory in
- <code class="Nm">netmap</code> distributions, or
- <span class="Pa">tools/tools/netmap/</span> directory in
- <span class="Ux">FreeBSD</span> distributions.</p>
-<p class="Pp"><a class="Xr">pkt-gen(8)</a> is a general purpose traffic
- source/sink.</p>
-<p class="Pp">As an example</p>
-<div class="Bd Bd-indent"><code class="Li">pkt-gen -i ix0 -f tx -l
- 60</code></div>
-can generate an infinite stream of minimum size packets, and
-<div class="Bd Bd-indent"><code class="Li">pkt-gen -i ix0 -f rx</code></div>
-is a traffic sink. Both print traffic statistics, to help monitor how the system
- performs.
-<p class="Pp"><a class="Xr">pkt-gen(8)</a> has many options that can be used to
-  set packet sizes, addresses, and rates, and to use multiple send/receive
-  threads and cores.</p>
-<p class="Pp"><a class="Xr">bridge(4)</a> is another test program which
- interconnects two <code class="Nm">netmap</code> ports. It can be used for
- transparent forwarding between interfaces, as in</p>
-<div class="Bd Bd-indent"><code class="Li">bridge -i netmap:ix0 -i
- netmap:ix1</code></div>
-or even connect the NIC to the host stack using netmap
-<div class="Bd Bd-indent"><code class="Li">bridge -i netmap:ix0</code></div>
-</section>
-<section class="Ss">
-<h2 class="Ss" id="USING_THE_NATIVE_API"><a class="permalink" href="#USING_THE_NATIVE_API">USING
- THE NATIVE API</a></h2>
-<p class="Pp">The following code implements a traffic generator:</p>
-<p class="Pp"></p>
-<div class="Bd Li">
-<pre>#include &lt;net/netmap_user.h&gt;
-...
-void sender(void)
-{
-	struct netmap_if *nifp;
-	struct netmap_ring *ring;
-	struct nmreq nmr;
-	struct pollfd fds;
-	void *p;
-	char *buf;
-	int fd, i;
-
-	fd = open(&quot;/dev/netmap&quot;, O_RDWR);
-	bzero(&amp;nmr, sizeof(nmr));
-	strcpy(nmr.nr_name, &quot;ix0&quot;);
-	nmr.nr_version = NETMAP_API;
-	ioctl(fd, NIOCREGIF, &amp;nmr);
-	p = mmap(NULL, nmr.nr_memsize, PROT_READ | PROT_WRITE,
-	    MAP_SHARED, fd, 0);
-	nifp = NETMAP_IF(p, nmr.nr_offset);
-	ring = NETMAP_TXRING(nifp, 0);
-	fds.fd = fd;
-	fds.events = POLLOUT;
-	for (;;) {
-		poll(&amp;fds, 1, -1);
-		while (!nm_ring_empty(ring)) {
-			i = ring-&gt;cur;
-			buf = NETMAP_BUF(ring, ring-&gt;slot[i].buf_idx);
-			... prepare packet in buf ...
-			ring-&gt;slot[i].len = ... packet length ...
-			ring-&gt;head = ring-&gt;cur = nm_ring_next(ring, i);
-		}
-	}
-}</pre>
-</div>
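The ring accounting used above reduces to simple circular-index arithmetic. A self-contained model of nm_ring_next(), nm_ring_empty() and nm_ring_space(), using a hypothetical struct with just the relevant netmap_ring fields (num_slots, cur, tail):

```c
#include <assert.h>
#include <stdint.h>

/* Minimal stand-in for the fields of struct netmap_ring used here. */
struct ring {
	uint32_t num_slots;  /* ring size */
	uint32_t cur;        /* next slot the application will use */
	uint32_t tail;       /* first slot owned by the kernel */
};

/* Advance a slot index, wrapping at the end of the ring. */
static uint32_t ring_next(const struct ring *r, uint32_t i)
{
	return (i + 1 == r->num_slots) ? 0 : i + 1;
}

/* The ring is "empty" for the application when cur catches up with
 * tail: no free TX slots to fill, or no RX packets left to read. */
static int ring_empty(const struct ring *r)
{
	return r->cur == r->tail;
}

/* Number of slots currently available to the application. */
static uint32_t ring_space(const struct ring *r)
{
	int n = (int)r->tail - (int)r->cur;

	return n < 0 ? (uint32_t)(n + (int)r->num_slots) : (uint32_t)n;
}
```

The sender loop above is exactly: while the ring is not empty, fill the slot at cur, then advance head and cur with the wrap-around increment.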
-</section>
-<section class="Ss">
-<h2 class="Ss" id="HELPER_FUNCTIONS"><a class="permalink" href="#HELPER_FUNCTIONS">HELPER
- FUNCTIONS</a></h2>
-<p class="Pp">A simple receiver can be implemented using the helper
- functions:</p>
-<p class="Pp"></p>
-<div class="Bd Li">
-<pre>#define NETMAP_WITH_LIBS
-#include &lt;net/netmap_user.h&gt;
-...
-void receiver(void)
-{
-	struct nm_desc *d;
-	struct pollfd fds;
-	u_char *buf;
-	struct nm_pkthdr h;
-	...
-	d = nm_open(&quot;netmap:ix0&quot;, NULL, 0, 0);
-	fds.fd = NETMAP_FD(d);
-	fds.events = POLLIN;
-	for (;;) {
-		poll(&amp;fds, 1, -1);
-		while ( (buf = nm_nextpkt(d, &amp;h)) )
-			consume_pkt(buf, h.len);
-	}
-	nm_close(d);
-}</pre>
-</div>
-</section>
-<section class="Ss">
-<h2 class="Ss" id="ZERO-COPY_FORWARDING"><a class="permalink" href="#ZERO-COPY_FORWARDING">ZERO-COPY
- FORWARDING</a></h2>
-<p class="Pp">Since physical interfaces share the same memory region, it is
-  possible to forward packets between ports by swapping buffers. The buffer
-  from the transmit ring is used to replenish the receive ring:</p>
-<p class="Pp"></p>
-<div class="Bd Li">
-<pre>	uint32_t tmp;
-	struct netmap_slot *src, *dst;
-	...
-	src = &amp;rxr-&gt;slot[rxr-&gt;cur];
-	dst = &amp;txr-&gt;slot[txr-&gt;cur];
-	tmp = dst-&gt;buf_idx;
-	dst-&gt;buf_idx = src-&gt;buf_idx;
-	dst-&gt;len = src-&gt;len;
-	dst-&gt;flags = NS_BUF_CHANGED;
-	src-&gt;buf_idx = tmp;
-	src-&gt;flags = NS_BUF_CHANGED;
-	rxr-&gt;head = rxr-&gt;cur = nm_ring_next(rxr, rxr-&gt;cur);
-	txr-&gt;head = txr-&gt;cur = nm_ring_next(txr, txr-&gt;cur);
-	...</pre>
-</div>
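The buffer swap can be checked in isolation. A self-contained model with a minimal slot struct whose buf_idx, len, and flags fields mirror struct netmap_slot:

```c
#include <assert.h>
#include <stdint.h>

#define NS_BUF_CHANGED 0x0001  /* slot's buf_idx changed */

/* Minimal stand-in for struct netmap_slot. */
struct slot {
	uint32_t buf_idx;
	uint16_t len;
	uint16_t flags;
};

/* Zero-copy forward: hand src's full buffer to dst, give dst's spare
 * buffer back to src, and flag both slots so the kernel reloads the
 * buffer addresses on the next txsync/rxsync. */
static void swap_buffers(struct slot *src, struct slot *dst)
{
	uint32_t tmp = dst->buf_idx;

	dst->buf_idx = src->buf_idx;
	dst->len = src->len;
	dst->flags = NS_BUF_CHANGED;
	src->buf_idx = tmp;
	src->flags = NS_BUF_CHANGED;
}
```

Setting NS_BUF_CHANGED on both slots is essential: without it, the kernel keeps using the previously cached buffer addresses and the swap is silently ignored.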
-</section>
-<section class="Ss">
-<h2 class="Ss" id="ACCESSING_THE_HOST_STACK"><a class="permalink" href="#ACCESSING_THE_HOST_STACK">ACCESSING
- THE HOST STACK</a></h2>
-<p class="Pp">The host stack is, for all practical purposes, just a regular
-  ring pair, which you can access with the netmap API (e.g., with</p>
-<div class="Bd Bd-indent"><code class="Li">nm_open(&quot;netmap:eth0^&quot;,
-  ...</code></div>
-). All packets that the host would send to an interface in
-  <code class="Nm">netmap</code> mode end up in the RX ring, whereas all
-  packets queued to the TX ring are sent up to the host stack.
-</section>
-<section class="Ss">
-<h2 class="Ss" id="VALE_SWITCH"><a class="permalink" href="#VALE_SWITCH">VALE
- SWITCH</a></h2>
-<p class="Pp">A simple way to test the performance of a
- <code class="Nm">VALE</code> switch is to attach a sender and a receiver to
- it, e.g., running the following in two different terminals:</p>
-<div class="Bd Bd-indent"><code class="Li">pkt-gen -i vale1:a -f rx #
- receiver</code></div>
-<div class="Bd Bd-indent"><code class="Li">pkt-gen -i vale1:b -f tx #
- sender</code></div>
-The same example can be used to test netmap pipes, by simply changing port
- names, e.g.,
-<div class="Bd Bd-indent"><code class="Li">pkt-gen -i vale2:x{3 -f rx # receiver
- on the master side</code></div>
-<div class="Bd Bd-indent"><code class="Li">pkt-gen -i vale2:x}3 -f tx # sender
- on the slave side</code></div>
-<p class="Pp">The following command attaches an interface and the host stack to
- a switch:</p>
-<div class="Bd Bd-indent"><code class="Li">valectl -h vale2:em0</code></div>
-Other <code class="Nm">netmap</code> clients attached to the same switch can now
- communicate with the network card or the host.
-</section>
-</section>
-<section class="Sh">
-<h1 class="Sh" id="SEE_ALSO"><a class="permalink" href="#SEE_ALSO">SEE
- ALSO</a></h1>
-<p class="Pp"><a class="Xr">vale(4)</a>, <a class="Xr">bridge(8)</a>,
- <a class="Xr">lb(8)</a>, <a class="Xr">nmreplay(8)</a>,
- <a class="Xr">pkt-gen(8)</a>, <a class="Xr">valectl(8)</a></p>
-<p class="Pp"><span class="Pa">http://info.iet.unipi.it/~luigi/netmap/</span></p>
-<p class="Pp">Luigi Rizzo, Revisiting network I/O APIs: the netmap framework,
- Communications of the ACM, 55 (3), pp.45-51, March 2012</p>
-<p class="Pp">Luigi Rizzo, netmap: a novel framework for fast packet I/O, Usenix
- ATC'12, June 2012, Boston</p>
-<p class="Pp">Luigi Rizzo, Giuseppe Lettieri, VALE, a switched ethernet for
- virtual machines, ACM CoNEXT'12, December 2012, Nice</p>
-<p class="Pp">Luigi Rizzo, Giuseppe Lettieri, Vincenzo Maffione, Speeding up
- packet I/O in virtual machines, ACM/IEEE ANCS'13, October 2013, San Jose</p>
-</section>
-<section class="Sh">
-<h1 class="Sh" id="AUTHORS"><a class="permalink" href="#AUTHORS">AUTHORS</a></h1>
-<p class="Pp">The <code class="Nm">netmap</code> framework was originally
-  designed and implemented at the Universit&#x00E0; di Pisa in 2011 by
-  <span class="An">Luigi Rizzo</span>, and further extended with help from
-  <span class="An">Matteo Landi</span>, <span class="An">Gaetano
-  Catalli</span>, <span class="An">Giuseppe Lettieri</span>, and
-  <span class="An">Vincenzo Maffione</span>.</p>
-<p class="Pp"><code class="Nm">netmap</code> and <code class="Nm">VALE</code>
- have been funded by the European Commission within FP7 Projects CHANGE
- (257422) and OPENLAB (287581).</p>
-</section>
-<section class="Sh">
-<h1 class="Sh" id="CAVEATS"><a class="permalink" href="#CAVEATS">CAVEATS</a></h1>
-<p class="Pp">No matter how fast the CPU and OS are, achieving line rate on 10G
- and faster interfaces requires hardware with sufficient performance. Several
- NICs are unable to sustain line rate with small packet sizes. Insufficient
- PCIe or memory bandwidth can also cause reduced performance.</p>
-<p class="Pp">Another frequent reason for low performance is the use of flow
- control on the link: a slow receiver can limit the transmit speed. Be sure
- to disable flow control when running high speed experiments.</p>
-<section class="Ss">
-<h2 class="Ss" id="SPECIAL_NIC_FEATURES"><a class="permalink" href="#SPECIAL_NIC_FEATURES">SPECIAL
- NIC FEATURES</a></h2>
-<p class="Pp"><code class="Nm">netmap</code> is orthogonal to some NIC features
-  such as multiqueue, schedulers, and packet filters.</p>
-<p class="Pp">Multiple transmit and receive rings are supported natively and can
- be configured with ordinary OS tools, such as <a class="Xr">ethtool(8)</a>
- or device-specific sysctl variables. The same goes for Receive Packet
- Steering (RPS) and filtering of incoming traffic.</p>
-<p class="Pp" id="does"><code class="Nm">netmap</code>
- <a class="permalink" href="#does"><i class="Em">does not use</i></a>
- features such as
- <a class="permalink" href="#checksum"><i class="Em" id="checksum">checksum
- offloading</i></a>,
- <a class="permalink" href="#TCP"><i class="Em" id="TCP">TCP segmentation
- offloading</i></a>,
- <a class="permalink" href="#encryption"><i class="Em" id="encryption">encryption</i></a>,
- <a class="permalink" href="#VLAN"><i class="Em" id="VLAN">VLAN
- encapsulation/decapsulation</i></a>, etc. When using netmap to exchange
- packets with the host stack, make sure to disable these features.</p>
-</section>
-</section>
-</div>
-<table class="foot">
- <tr>
- <td class="foot-date">October 10, 2024</td>
- <td class="foot-os">FreeBSD 15.0</td>
- </tr>
-</table>