Diffstat (limited to 'static/freebsd/man4/multicast.4 3.html')
| -rw-r--r-- | static/freebsd/man4/multicast.4 3.html | 750 |
1 files changed, 750 insertions, 0 deletions
diff --git a/static/freebsd/man4/multicast.4 3.html b/static/freebsd/man4/multicast.4 3.html new file mode 100644 index 00000000..b47b6143 --- /dev/null +++ b/static/freebsd/man4/multicast.4 3.html @@ -0,0 +1,750 @@ +<table class="head"> + <tr> + <td class="head-ltitle">MULTICAST(4)</td> + <td class="head-vol">Device Drivers Manual</td> + <td class="head-rtitle">MULTICAST(4)</td> + </tr> +</table> +<div class="manual-text"> +<section class="Sh"> +<h1 class="Sh" id="NAME"><a class="permalink" href="#NAME">NAME</a></h1> +<p class="Pp"><code class="Nm">multicast</code> — + <span class="Nd">Multicast Routing</span></p> +</section> +<section class="Sh"> +<h1 class="Sh" id="SYNOPSIS"><a class="permalink" href="#SYNOPSIS">SYNOPSIS</a></h1> +<p class="Pp"><code class="Cd">options MROUTING</code></p> +<p class="Pp"> + <br/> + <code class="In">#include <<a class="In">sys/types.h</a>></code> + <br/> + <code class="In">#include <<a class="In">sys/socket.h</a>></code> + <br/> + <code class="In">#include <<a class="In">netinet/in.h</a>></code> + <br/> + <code class="In">#include <<a class="In">netinet/ip_mroute.h</a>></code> + <br/> + <code class="In">#include + <<a class="In">netinet6/ip6_mroute.h</a>></code></p> +<p class="Pp"><var class="Ft">int</var> + <br/> + <code class="Fn">getsockopt</code>(<var class="Fa" style="white-space: nowrap;">int + s</var>, <var class="Fa" style="white-space: nowrap;">IPPROTO_IP</var>, + <var class="Fa" style="white-space: nowrap;">MRT_INIT</var>, + <var class="Fa" style="white-space: nowrap;">void *optval</var>, + <var class="Fa" style="white-space: nowrap;">socklen_t *optlen</var>);</p> +<p class="Pp"><var class="Ft">int</var> + <br/> + <code class="Fn">setsockopt</code>(<var class="Fa" style="white-space: nowrap;">int + s</var>, <var class="Fa" style="white-space: nowrap;">IPPROTO_IP</var>, + <var class="Fa" style="white-space: nowrap;">MRT_INIT</var>, + <var class="Fa" style="white-space: nowrap;">const void *optval</var>, + <var class="Fa" 
style="white-space: nowrap;">socklen_t optlen</var>);</p> +<p class="Pp"><var class="Ft">int</var> + <br/> + <code class="Fn">getsockopt</code>(<var class="Fa" style="white-space: nowrap;">int + s</var>, <var class="Fa" style="white-space: nowrap;">IPPROTO_IPV6</var>, + <var class="Fa" style="white-space: nowrap;">MRT6_INIT</var>, + <var class="Fa" style="white-space: nowrap;">void *optval</var>, + <var class="Fa" style="white-space: nowrap;">socklen_t *optlen</var>);</p> +<p class="Pp"><var class="Ft">int</var> + <br/> + <code class="Fn">setsockopt</code>(<var class="Fa" style="white-space: nowrap;">int + s</var>, <var class="Fa" style="white-space: nowrap;">IPPROTO_IPV6</var>, + <var class="Fa" style="white-space: nowrap;">MRT6_INIT</var>, + <var class="Fa" style="white-space: nowrap;">const void *optval</var>, + <var class="Fa" style="white-space: nowrap;">socklen_t optlen</var>);</p> +</section> +<section class="Sh"> +<h1 class="Sh" id="DESCRIPTION"><a class="permalink" href="#DESCRIPTION">DESCRIPTION</a></h1> +<p class="Pp">Multicast routing is used to efficiently propagate data packets to + a set of multicast listeners in multipoint networks. If unicast is used to + replicate the data to all listeners, then some of the network links may + carry multiple copies of the same data packets. With multicast routing, the + overhead is reduced to one copy (at most) per network link.</p> +<p class="Pp">All multicast-capable routers must run a common multicast routing + protocol. It is recommended that either Protocol Independent Multicast - + Sparse Mode (PIM-SM), or Protocol Independent Multicast - Dense Mode + (PIM-DM) are used, as these are now the generally accepted protocols in the + Internet community. 
The <a class="Sx" href="#HISTORY">HISTORY</a> section + discusses previous multicast routing protocols.</p> +<p class="Pp">To start multicast routing, the user must enable multicast + forwarding in the kernel (see <a class="Sx" href="#SYNOPSIS">SYNOPSIS</a> + about the kernel configuration options), and must run a multicast routing + capable user-level process. From developer's point of view, the programming + guide described in the <a class="Sx" href="#Programming_Guide">Programming + Guide</a> section should be used to control the multicast forwarding in the + kernel.</p> +<section class="Ss"> +<h2 class="Ss" id="Programming_Guide"><a class="permalink" href="#Programming_Guide">Programming + Guide</a></h2> +<p class="Pp">This section provides information about the basic multicast + routing API. The so-called “advanced multicast API” is + described in the + <a class="Sx" href="#Advanced_Multicast_API_Programming_Guide">Advanced + Multicast API Programming Guide</a> section.</p> +<p class="Pp">First, a multicast routing socket must be open. That socket would + be used to control the multicast forwarding in the kernel. Note that most + operations below require certain privilege (i.e., root privilege):</p> +<div class="Bd Pp Li"> +<pre>/* IPv4 */ +int mrouter_s4; +mrouter_s4 = socket(AF_INET, SOCK_RAW, IPPROTO_IGMP);</pre> +</div> +<div class="Bd Pp Li"> +<pre>int mrouter_s6; +mrouter_s6 = socket(AF_INET6, SOCK_RAW, IPPROTO_ICMPV6);</pre> +</div> +<p class="Pp">Note that if the router needs to open an IGMP or ICMPv6 socket (in + case of IPv4 and IPv6 respectively) for sending or receiving of IGMP or MLD + multicast group membership messages, then the same + <var class="Va">mrouter_s4</var> or <var class="Va">mrouter_s6</var> sockets + should be used for sending and receiving respectively IGMP or MLD messages. + In case of <span class="Ux">BSD</span>-derived kernel, it may be possible to + open separate sockets for IGMP or MLD messages only. 
However, some other + kernels (e.g., Linux) require that the multicast routing socket must be used + for sending and receiving of IGMP or MLD messages. Therefore, for + portability reason the multicast routing socket should be reused for IGMP + and MLD messages as well.</p> +<p class="Pp">After the multicast routing socket is open, it can be used to + enable multicast forwarding in the kernel:</p> +<div class="Bd Pp Li"> +<pre>/* IPv4 */ +int v = 1; +setsockopt(mrouter_s4, IPPROTO_IP, MRT_INIT, (void *)&v, sizeof(v));</pre> +</div> +<div class="Bd Pp Li"> +<pre>/* IPv6 */ +int v = 1; +setsockopt(mrouter_s6, IPPROTO_IPV6, MRT6_INIT, (void *)&v, sizeof(v)); +... +/* If necessary, filter all ICMPv6 messages */ +struct icmp6_filter filter; +ICMP6_FILTER_SETBLOCKALL(&filter); +setsockopt(mrouter_s6, IPPROTO_ICMPV6, ICMP6_FILTER, (void *)&filter, + sizeof(filter));</pre> +</div> +<p class="Pp">When applied to the multicast routing socket, the + <code class="Dv">MRT_DONE</code> and <code class="Dv">MRT6_DONE</code> + socket options disable multicast forwarding in the kernel:</p> +<div class="Bd Pp Li"> +<pre>/* IPv4 */ +int v = 1; +setsockopt(mrouter_s4, IPPROTO_IP, MRT_DONE, (void *)&v, sizeof(v));</pre> +</div> +<div class="Bd Pp Li"> +<pre>/* IPv6 */ +int v = 1; +setsockopt(mrouter_s6, IPPROTO_IPV6, MRT6_DONE, (void *)&v, sizeof(v));</pre> +</div> +<p class="Pp">Closing the socket has the same effect.</p> +<p class="Pp">After multicast forwarding is enabled, the multicast routing + socket can be used to enable PIM processing in the kernel if we are running + PIM-SM or PIM-DM (see <a class="Xr">pim(4)</a>).</p> +<p class="Pp">For each network interface (e.g., physical or a virtual tunnel) + that would be used for multicast forwarding, a corresponding multicast + interface must be added to the kernel:</p> +<div class="Bd Pp Li"> +<pre>/* IPv4 */ +struct vifctl vc; +memset(&vc, 0, sizeof(vc)); +/* Assign all vifctl fields as appropriate */ +vc.vifc_vifi = vif_index; 
+vc.vifc_flags = vif_flags; +vc.vifc_threshold = min_ttl_threshold; +vc.vifc_rate_limit = 0; +memcpy(&vc.vifc_lcl_addr, &vif_local_address, sizeof(vc.vifc_lcl_addr)); +setsockopt(mrouter_s4, IPPROTO_IP, MRT_ADD_VIF, (void *)&vc, + sizeof(vc));</pre> +</div> +<p class="Pp">The <var class="Va">vif_index</var> must be unique per vif. The + <var class="Va">vif_flags</var> contains the <code class="Dv">VIFF_*</code> + flags as defined in + <code class="In"><<a class="In">netinet/ip_mroute.h</a>></code>. The + <code class="Dv">VIFF_TUNNEL</code> flag is no longer supported by + <span class="Ux">FreeBSD</span>. Users who wish to forward multicast + datagrams over a tunnel should consider configuring a + <a class="Xr">gif(4)</a> or <a class="Xr">gre(4)</a> tunnel and using it as + a physical interface.</p> +<p class="Pp">The <var class="Va">min_ttl_threshold</var> contains the minimum + TTL a multicast data packet must have to be forwarded on that vif. + Typically, it would have value of 1.</p> +<p class="Pp">The <var class="Va">max_rate_limit</var> argument is no longer + supported in <span class="Ux">FreeBSD</span> and should be set to 0. Users + who wish to rate-limit multicast datagrams should consider the use of + <a class="Xr">dummynet(4)</a> or <a class="Xr">altq(4)</a>.</p> +<p class="Pp">The <var class="Va">vif_local_address</var> contains the local IP + address of the corresponding local interface. The + <var class="Va">vif_remote_address</var> contains the remote IP address in + case of DVMRP multicast tunnels.</p> +<div class="Bd Pp Li"> +<pre>/* IPv6 */ +struct mif6ctl mc; +memset(&mc, 0, sizeof(mc)); +/* Assign all mif6ctl fields as appropriate */ +mc.mif6c_mifi = mif_index; +mc.mif6c_flags = mif_flags; +mc.mif6c_pifi = pif_index; +setsockopt(mrouter_s6, IPPROTO_IPV6, MRT6_ADD_MIF, (void *)&mc, + sizeof(mc));</pre> +</div> +<p class="Pp">The <var class="Va">mif_index</var> must be unique per vif. 
The + <var class="Va">mif_flags</var> contains the <code class="Dv">MIFF_*</code> + flags as defined in + <code class="In"><<a class="In">netinet6/ip6_mroute.h</a>></code>. The + <var class="Va">pif_index</var> is the physical interface index of the + corresponding local interface.</p> +<p class="Pp">A multicast interface is deleted by:</p> +<div class="Bd Pp Li"> +<pre>/* IPv4 */ +vifi_t vifi = vif_index; +setsockopt(mrouter_s4, IPPROTO_IP, MRT_DEL_VIF, (void *)&vifi, + sizeof(vifi));</pre> +</div> +<div class="Bd Pp Li"> +<pre>/* IPv6 */ +mifi_t mifi = mif_index; +setsockopt(mrouter_s6, IPPROTO_IPV6, MRT6_DEL_MIF, (void *)&mifi, + sizeof(mifi));</pre> +</div> +<p class="Pp">After the multicast forwarding is enabled, and the multicast + virtual interfaces are added, the kernel may deliver upcall messages (also + called signals later in this text) on the multicast routing socket that was + open earlier with <code class="Dv">MRT_INIT</code> or + <code class="Dv">MRT6_INIT</code>. The IPv4 upcalls have + <var class="Vt">struct igmpmsg</var> header (see + <code class="In"><<a class="In">netinet/ip_mroute.h</a>></code>) with + field <var class="Va">im_mbz</var> set to zero. Note that this header + follows the structure of <var class="Vt">struct ip</var> with the protocol + field <var class="Va">ip_p</var> set to zero. The IPv6 upcalls have + <var class="Vt">struct mrt6msg</var> header (see + <code class="In"><<a class="In">netinet6/ip6_mroute.h</a>></code>) + with field <var class="Va">im6_mbz</var> set to zero. Note that this header + follows the structure of <var class="Vt">struct ip6_hdr</var> with the next + header field <var class="Va">ip6_nxt</var> set to zero.</p> +<p class="Pp">The upcall header contains field <var class="Va">im_msgtype</var> + and <var class="Va">im6_msgtype</var> with the type of the upcall + <code class="Dv">IGMPMSG_*</code> and <code class="Dv">MRT6MSG_*</code> for + IPv4 and IPv6 respectively. 
The values of the rest of the upcall header + fields and the body of the upcall message depend on the particular upcall + type.</p> +<p class="Pp">If the upcall message type is + <code class="Dv">IGMPMSG_NOCACHE</code> or + <code class="Dv">MRT6MSG_NOCACHE</code>, this is an indication that a + multicast packet has reached the multicast router, but the router has no + forwarding state for that packet. Typically, the upcall would be a signal + for the multicast routing user-level process to install the appropriate + Multicast Forwarding Cache (MFC) entry in the kernel.</p> +<p class="Pp">An MFC entry is added by:</p> +<div class="Bd Pp Li"> +<pre>/* IPv4 */ +struct mfcctl mc; +memset(&mc, 0, sizeof(mc)); +memcpy(&mc.mfcc_origin, &source_addr, sizeof(mc.mfcc_origin)); +memcpy(&mc.mfcc_mcastgrp, &group_addr, sizeof(mc.mfcc_mcastgrp)); +mc.mfcc_parent = iif_index; +for (i = 0; i < maxvifs; i++) + mc.mfcc_ttls[i] = oifs_ttl[i]; +setsockopt(mrouter_s4, IPPROTO_IP, MRT_ADD_MFC, + (void *)&mc, sizeof(mc));</pre> +</div> +<div class="Bd Pp Li"> +<pre>/* IPv6 */ +struct mf6cctl mc; +memset(&mc, 0, sizeof(mc)); +memcpy(&mc.mf6cc_origin, &source_addr, sizeof(mc.mf6cc_origin)); +memcpy(&mc.mf6cc_mcastgrp, &group_addr, sizeof(mc.mf6cc_mcastgrp)); +mc.mf6cc_parent = iif_index; +for (i = 0; i < maxvifs; i++) + if (oifs_ttl[i] > 0) + IF_SET(i, &mc.mf6cc_ifset); +setsockopt(mrouter_s6, IPPROTO_IPV6, MRT6_ADD_MFC, + (void *)&mc, sizeof(mc));</pre> +</div> +<p class="Pp">The <var class="Va">source_addr</var> and + <var class="Va">group_addr</var> are the source and group addresses of the + multicast packet (as set in the upcall message). The + <var class="Va">iif_index</var> is the virtual interface index of the + multicast interface the multicast packets for this specific source and group + address should be received on. The <var class="Va">oifs_ttl[]</var> array + contains the minimum TTL (per interface) a multicast packet should have to + be forwarded on an outgoing interface. 
If the TTL value is zero, the + corresponding interface is not included in the set of outgoing interfaces. + Note that in case of IPv6 only the set of outgoing interfaces can be + specified.</p> +<p class="Pp">An MFC entry is deleted by:</p> +<div class="Bd Pp Li"> +<pre>/* IPv4 */ +struct mfcctl mc; +memset(&mc, 0, sizeof(mc)); +memcpy(&mc.mfcc_origin, &source_addr, sizeof(mc.mfcc_origin)); +memcpy(&mc.mfcc_mcastgrp, &group_addr, sizeof(mc.mfcc_mcastgrp)); +setsockopt(mrouter_s4, IPPROTO_IP, MRT_DEL_MFC, + (void *)&mc, sizeof(mc));</pre> +</div> +<div class="Bd Pp Li"> +<pre>/* IPv6 */ +struct mf6cctl mc; +memset(&mc, 0, sizeof(mc)); +memcpy(&mc.mf6cc_origin, &source_addr, sizeof(mc.mf6cc_origin)); +memcpy(&mc.mf6cc_mcastgrp, &group_addr, sizeof(mc.mf6cc_mcastgrp)); +setsockopt(mrouter_s6, IPPROTO_IPV6, MRT6_DEL_MFC, + (void *)&mc, sizeof(mc));</pre> +</div> +<p class="Pp">The following method can be used to get various statistics per + installed MFC entry in the kernel (e.g., the number of forwarded packets per + source and group address):</p> +<div class="Bd Pp Li"> +<pre>/* IPv4 */ +struct sioc_sg_req sgreq; +memset(&sgreq, 0, sizeof(sgreq)); +memcpy(&sgreq.src, &source_addr, sizeof(sgreq.src)); +memcpy(&sgreq.grp, &group_addr, sizeof(sgreq.grp)); +ioctl(mrouter_s4, SIOCGETSGCNT, &sgreq);</pre> +</div> +<div class="Bd Pp Li"> +<pre>/* IPv6 */ +struct sioc_sg_req6 sgreq; +memset(&sgreq, 0, sizeof(sgreq)); +memcpy(&sgreq.src, &source_addr, sizeof(sgreq.src)); +memcpy(&sgreq.grp, &group_addr, sizeof(sgreq.grp)); +ioctl(mrouter_s6, SIOCGETSGCNT_IN6, &sgreq);</pre> +</div> +<p class="Pp">The following method can be used to get various statistics per + multicast virtual interface in the kernel (e.g., the number of forwarded + packets per interface):</p> +<div class="Bd Pp Li"> +<pre>/* IPv4 */ +struct sioc_vif_req vreq; +memset(&vreq, 0, sizeof(vreq)); +vreq.vifi = vif_index; +ioctl(mrouter_s4, SIOCGETVIFCNT, &vreq);</pre> +</div> +<div class="Bd Pp Li"> +<pre>/* IPv6 
*/ +struct sioc_mif_req6 mreq; +memset(&mreq, 0, sizeof(mreq)); +mreq.mifi = vif_index; +ioctl(mrouter_s6, SIOCGETMIFCNT_IN6, &mreq);</pre> +</div> +</section> +<section class="Ss"> +<h2 class="Ss" id="Advanced_Multicast_API_Programming_Guide"><a class="permalink" href="#Advanced_Multicast_API_Programming_Guide">Advanced + Multicast API Programming Guide</a></h2> +<p class="Pp">If we want to add new features in the kernel, it becomes difficult + to preserve backward compatibility (binary and API), and at the same time to + allow user-level processes to take advantage of the new features (if the + kernel supports them).</p> +<p class="Pp">One of the mechanisms that allows us to preserve the backward + compatibility is a sort of negotiation between the user-level process and + the kernel:</p> +<ol class="Bl-enum"> + <li>The user-level process tries to enable in the kernel the set of new + features (and the corresponding API) it would like to use.</li> + <li>The kernel returns the (sub)set of features it knows about and is willing + to be enabled.</li> + <li>The user-level process uses only that set of features the kernel has + agreed on.</li> +</ol> +<p class="Pp">To support backward compatibility, if the user-level process does + not ask for any new features, the kernel defaults to the basic multicast API + (see the <a class="Sx" href="#Programming_Guide">Programming Guide</a> + section). Currently, the advanced multicast API exists only for IPv4; in the + future there will be IPv6 support as well.</p> +<p class="Pp">Below is a summary of the expandable API solution. 
Note that all + new options and structures are defined in + <code class="In"><<a class="In">netinet/ip_mroute.h</a>></code> and + <code class="In"><<a class="In">netinet6/ip6_mroute.h</a>></code>, + unless stated otherwise.</p> +<p class="Pp" id="getsockopt">The user-level process uses new + <a class="permalink" href="#getsockopt"><code class="Fn">getsockopt</code></a>()/<code class="Fn">setsockopt</code>() + options to perform the API features negotiation with the kernel. This + negotiation must be performed right after the multicast routing socket is + opened. The set of desired/allowed features is stored in a bitset (currently, + in <var class="Vt">uint32_t</var>; i.e., maximum of 32 new features). The + new + <code class="Fn">getsockopt</code>()/<code class="Fn">setsockopt</code>() + options are <code class="Dv">MRT_API_SUPPORT</code> and + <code class="Dv">MRT_API_CONFIG</code>. Example:</p> +<div class="Bd Pp Li"> +<pre>uint32_t v; +socklen_t len = sizeof(v); /* getsockopt() takes a pointer to the length */ +getsockopt(sock, IPPROTO_IP, MRT_API_SUPPORT, (void *)&v, &len);</pre> +</div> +<p class="Pp" id="getsockopt~2">would set in <var class="Va">v</var> the + pre-defined bits that the kernel API supports. The eight least significant + bits in <var class="Vt">uint32_t</var> are the same as the eight possible flags + <code class="Dv">MRT_MFC_FLAGS_*</code> that can be used in + <var class="Va">mfcc_flags</var> as part of the new definition of + <var class="Vt">struct mfcctl</var> (see below about those flags), which + leaves 24 flags for other new features. 
The value returned by + <a class="permalink" href="#getsockopt~2"><code class="Fn">getsockopt</code></a>(<var class="Fa">MRT_API_SUPPORT</var>) + is read-only; in other words, + <code class="Fn">setsockopt</code>(<var class="Fa">MRT_API_SUPPORT</var>) + would fail.</p> +<p class="Pp">To modify the API and set a specific feature in the + kernel:</p> +<div class="Bd Pp Li"> +<pre>uint32_t v = MRT_MFC_FLAGS_DISABLE_WRONGVIF; +if (setsockopt(sock, IPPROTO_IP, MRT_API_CONFIG, (void *)&v, sizeof(v)) + != 0) { + return (ERROR); +} +if (v & MRT_MFC_FLAGS_DISABLE_WRONGVIF) + return (OK); /* Success */ +else + return (ERROR);</pre> +</div> +<p class="Pp" id="setsockopt">In other words, when + <a class="permalink" href="#setsockopt"><code class="Fn">setsockopt</code></a>(<var class="Fa">MRT_API_CONFIG</var>) + is called, the argument to it specifies the desired set of features to be + enabled in the API and the kernel. The return value in + <var class="Va">v</var> is the actual (sub)set of features that were enabled + in the kernel. To obtain the set of enabled features again + later:</p> +<div class="Bd Pp Li"> +<pre>socklen_t len = sizeof(v); /* getsockopt() takes a pointer to the length */ +getsockopt(sock, IPPROTO_IP, MRT_API_CONFIG, (void *)&v, &len);</pre> +</div> +<p class="Pp" id="setsockopt~2">The set of enabled features is global. 
In other + words, + <a class="permalink" href="#setsockopt~2"><code class="Fn">setsockopt</code></a>(<var class="Fa">MRT_API_CONFIG</var>) + should be called right after + <code class="Fn">setsockopt</code>(<var class="Fa">MRT_INIT</var>).</p> +<p class="Pp">Currently, the following set of new features is defined:</p> +<div class="Bd Pp Li"> +<pre>#define MRT_MFC_FLAGS_DISABLE_WRONGVIF (1 << 0) /* disable WRONGVIF signals */ +#define MRT_MFC_FLAGS_BORDER_VIF (1 << 1) /* border vif */ +#define MRT_MFC_RP (1 << 8) /* enable RP address */ +#define MRT_MFC_BW_UPCALL (1 << 9) /* enable bw upcalls */</pre> +</div> +<p class="Pp">The advanced multicast API uses a newly defined + <var class="Vt">struct mfcctl2</var> instead of the traditional + <var class="Vt">struct mfcctl</var>. The original <var class="Vt">struct + mfcctl</var> is kept as is. The new <var class="Vt">struct mfcctl2</var> + is:</p> +<div class="Bd Pp Li"> +<pre>/* + * The new argument structure for MRT_ADD_MFC and MRT_DEL_MFC overlays + * and extends the old struct mfcctl. + */ +struct mfcctl2 { + /* the mfcctl fields */ + struct in_addr mfcc_origin; /* ip origin of mcasts */ + struct in_addr mfcc_mcastgrp; /* multicast group associated*/ + vifi_t mfcc_parent; /* incoming vif */ + u_char mfcc_ttls[MAXVIFS];/* forwarding ttls on vifs */ + + /* extension fields */ + uint8_t mfcc_flags[MAXVIFS];/* the MRT_MFC_FLAGS_* flags*/ + struct in_addr mfcc_rp; /* the RP address */ +};</pre> +</div> +<p class="Pp">The new fields are <var class="Va">mfcc_flags[MAXVIFS]</var> and + <var class="Va">mfcc_rp</var>. Note that for compatibility reasons they are + added at the end.</p> +<p class="Pp">The <var class="Va">mfcc_flags[MAXVIFS]</var> field is used to set + various flags per interface per (S,G) entry. 
Currently, the defined flags + are:</p> +<div class="Bd Pp Li"> +<pre>#define MRT_MFC_FLAGS_DISABLE_WRONGVIF (1 << 0) /* disable WRONGVIF signals */ +#define MRT_MFC_FLAGS_BORDER_VIF (1 << 1) /* border vif */</pre> +</div> +<p class="Pp">The <code class="Dv">MRT_MFC_FLAGS_DISABLE_WRONGVIF</code> flag is + used to explicitly disable the <code class="Dv">IGMPMSG_WRONGVIF</code> + kernel signal at the (S,G) granularity if a multicast data packet arrives on + the wrong interface. Usually, this signal is used to complete the + shortest-path switch in case of PIM-SM multicast routing, or to trigger a + PIM assert message. However, it should not be delivered for interfaces that + are not in the outgoing interface set, and that are not expecting to become + an incoming interface. Hence, if the + <code class="Dv">MRT_MFC_FLAGS_DISABLE_WRONGVIF</code> flag is set for some + of the interfaces, then a data packet that arrives on that interface for + that MFC entry will NOT trigger a WRONGVIF signal. If that flag is not set, + then a signal is triggered (the default action).</p> +<p class="Pp">The <code class="Dv">MRT_MFC_FLAGS_BORDER_VIF</code> flag is used + to specify whether the Border-bit in PIM Register messages should be set (in + case when the Register encapsulation is performed inside the kernel). If it + is set for the special PIM Register kernel virtual interface (see + <a class="Xr">pim(4)</a>), the Border-bit in the Register messages sent to + the RP will be set.</p> +<p class="Pp">The remaining six bits are reserved for future usage.</p> +<p class="Pp" id="setsockopt~3">The <var class="Va">mfcc_rp</var> field is used + to specify the RP address (in case of PIM-SM multicast routing) for a + multicast group G if we want to perform kernel-level PIM Register + encapsulation. 
The <var class="Va">mfcc_rp</var> field is used only if the + <code class="Dv">MRT_MFC_RP</code> advanced API flag/capability has been + successfully set by + <a class="permalink" href="#setsockopt~3"><code class="Fn">setsockopt</code></a>(<var class="Fa">MRT_API_CONFIG</var>).</p> +<p class="Pp" id="setsockopt~4">If the <code class="Dv">MRT_MFC_RP</code> flag + was successfully set by + <a class="permalink" href="#setsockopt~4"><code class="Fn">setsockopt</code></a>(<var class="Fa">MRT_API_CONFIG</var>), + then the kernel will attempt to perform the PIM Register encapsulation + itself instead of sending the multicast data packets to user level (inside + <code class="Dv">IGMPMSG_WHOLEPKT</code> upcalls) for user-level + encapsulation. The RP address would be taken from the + <var class="Va">mfcc_rp</var> field inside the new <var class="Vt">struct + mfcctl2</var>. However, even if the <code class="Dv">MRT_MFC_RP</code> flag + was successfully set, if the <var class="Va">mfcc_rp</var> field was set to + <code class="Dv">INADDR_ANY</code>, then the kernel will still deliver an + <code class="Dv">IGMPMSG_WHOLEPKT</code> upcall with the multicast data + packet to the user-level process.</p> +<p class="Pp">In addition, if the multicast data packet is too large to fit + within a single IP packet after the PIM Register encapsulation (e.g., if its + size was on the order of 65500 bytes), the data packet will be fragmented, + and then each of the fragments will be encapsulated separately. Note that + typically a multicast data packet can be that large only if it was + originated locally on the same host that performs the encapsulation; + otherwise the transmission of the multicast data packet over Ethernet, for + example, would have fragmented it into much smaller pieces.</p> +<p class="Pp">Typically, a multicast routing user-level process would need to + know the forwarding bandwidth for some data flow. 
For example, the multicast + routing process may want to timeout idle MFC entries, or in case of PIM-SM + it can initiate (S,G) shortest-path switch if the bandwidth rate is above a + threshold for example.</p> +<p class="Pp">The original solution for measuring the bandwidth of a dataflow + was that a user-level process would periodically query the kernel about the + number of forwarded packets/bytes per (S,G), and then based on those numbers + it would estimate whether a source has been idle, or whether the source's + transmission bandwidth is above a threshold. That solution is far from being + scalable, hence the need for a new mechanism for bandwidth monitoring.</p> +<p class="Pp">Below is a description of the bandwidth monitoring mechanism.</p> +<ul class="Bl-bullet"> + <li>If the bandwidth of a data flow satisfies some pre-defined filter, the + kernel delivers an upcall on the multicast routing socket to the multicast + routing process that has installed that filter.</li> + <li>The bandwidth-upcall filters are installed per (S,G). There can be more + than one filter per (S,G).</li> + <li>Instead of supporting all possible comparison operations (i.e., < <= + == != > >= ), there is support only for the <= and >= + operations, because this makes the kernel-level implementation simpler, + and because practically we need only those two. Further, the missing + operations can be simulated by secondary user-level filtering of those + <= and >= filters. 
For example, to simulate !=, then we need to + install filter “bw <= 0xffffffff”, and after an upcall is + received, we need to check whether “measured_bw != + expected_bw”.</li> + <li id="setsockopt~5">The bandwidth-upcall mechanism is enabled by + <a class="permalink" href="#setsockopt~5"><code class="Fn">setsockopt</code></a>(<var class="Fa">MRT_API_CONFIG</var>) + for the <code class="Dv">MRT_MFC_BW_UPCALL</code> flag.</li> + <li>The bandwidth-upcall filters are added/deleted by the new + <code class="Fn">setsockopt</code>(<var class="Fa">MRT_ADD_BW_UPCALL</var>) + and + <code class="Fn">setsockopt</code>(<var class="Fa">MRT_DEL_BW_UPCALL</var>) + respectively (with the appropriate <var class="Vt">struct bw_upcall</var> + argument of course).</li> +</ul> +<p class="Pp">From application point of view, a developer needs to know about + the following:</p> +<div class="Bd Pp Li"> +<pre>/* + * Structure for installing or delivering an upcall if the + * measured bandwidth is above or below a threshold. + * + * User programs (e.g. daemons) may have a need to know when the + * bandwidth used by some data flow is above or below some threshold. + * This interface allows the userland to specify the threshold (in + * bytes and/or packets) and the measurement interval. Flows are + * all packet with the same source and destination IP address. + * At the moment the code is only used for multicast destinations + * but there is nothing that prevents its use for unicast. + * + * The measurement interval cannot be shorter than some Tmin (currently, 3s). + * The threshold is set in packets and/or bytes per_interval. + * + * Measurement works as follows: + * + * For >= measurements: + * The first packet marks the start of a measurement interval. + * During an interval we count packets and bytes, and when we + * pass the threshold we deliver an upcall and we are done. + * The first packet after the end of the interval resets the + * count and restarts the measurement. 
+ * + * For <= measurement: + * We start a timer to fire at the end of the interval, and + * then for each incoming packet we count packets and bytes. + * When the timer fires, we compare the value with the threshold, + * schedule an upcall if we are below, and restart the measurement + * (reschedule timer and zero counters). + */ + +struct bw_data { + struct timeval b_time; + uint64_t b_packets; + uint64_t b_bytes; +}; + +struct bw_upcall { + struct in_addr bu_src; /* source address */ + struct in_addr bu_dst; /* destination address */ + uint32_t bu_flags; /* misc flags (see below) */ +#define BW_UPCALL_UNIT_PACKETS (1 << 0) /* threshold (in packets) */ +#define BW_UPCALL_UNIT_BYTES (1 << 1) /* threshold (in bytes) */ +#define BW_UPCALL_GEQ (1 << 2) /* upcall if bw >= threshold */ +#define BW_UPCALL_LEQ (1 << 3) /* upcall if bw <= threshold */ +#define BW_UPCALL_DELETE_ALL (1 << 4) /* delete all upcalls for s,d*/ + struct bw_data bu_threshold; /* the bw threshold */ + struct bw_data bu_measured; /* the measured bw */ +}; + +/* max. number of upcalls to deliver together */ +#define BW_UPCALLS_MAX 128 +/* min. threshold time interval for bandwidth measurement */ +#define BW_UPCALL_THRESHOLD_INTERVAL_MIN_SEC 3 +#define BW_UPCALL_THRESHOLD_INTERVAL_MIN_USEC 0</pre> +</div> +<p class="Pp" id="setsockopt~6">The <var class="Vt">bw_upcall</var> structure is + used as an argument to + <a class="permalink" href="#setsockopt~6"><code class="Fn">setsockopt</code></a>(<var class="Fa">MRT_ADD_BW_UPCALL</var>) + and + <code class="Fn">setsockopt</code>(<var class="Fa">MRT_DEL_BW_UPCALL</var>). 
+ Each + <code class="Fn">setsockopt</code>(<var class="Fa">MRT_ADD_BW_UPCALL</var>) + installs a filter in the kernel for the source and destination address in + the <var class="Vt">bw_upcall</var> argument, and that filter will trigger + an upcall according to the following pseudo-algorithm:</p> +<div class="Bd Pp Li"> +<pre> if (bw_upcall_oper IS ">=") { + if (((bw_upcall_unit & PACKETS == PACKETS) && + (measured_packets >= threshold_packets)) || + ((bw_upcall_unit & BYTES == BYTES) && + (measured_bytes >= threshold_bytes))) + SEND_UPCALL("measured bandwidth is >= threshold"); + } + if (bw_upcall_oper IS "<=" && measured_interval >= threshold_interval) { + if (((bw_upcall_unit & PACKETS == PACKETS) && + (measured_packets <= threshold_packets)) || + ((bw_upcall_unit & BYTES == BYTES) && + (measured_bytes <= threshold_bytes))) + SEND_UPCALL("measured bandwidth is <= threshold"); + }</pre> +</div> +<p class="Pp">In the same <var class="Vt">bw_upcall</var> the unit can be + specified in both BYTES and PACKETS. However, the GEQ and LEQ flags are + mutually exclusive.</p> +<p class="Pp">Basically, an upcall is delivered if the measured bandwidth is + >= or <= the threshold bandwidth (within the specified measurement + interval). For practical reasons, the smallest value for the measurement + interval is 3 seconds. If smaller values are allowed, then the bandwidth + estimation may be less accurate, or the potentially very high frequency of + the generated upcalls may introduce too much overhead. For the >= + operation, the answer may be known before the end of + <var class="Va">threshold_interval</var>, therefore the upcall may be + delivered earlier. 
For the <= operation, however, we must wait until the + threshold interval has expired to know the answer.</p> +<p class="Pp">Example of usage:</p> +<div class="Bd Pp Li"> +<pre>struct bw_upcall bw_upcall; +/* Assign all bw_upcall fields as appropriate */ +memset(&bw_upcall, 0, sizeof(bw_upcall)); +memcpy(&bw_upcall.bu_src, &source, sizeof(bw_upcall.bu_src)); +memcpy(&bw_upcall.bu_dst, &group, sizeof(bw_upcall.bu_dst)); +bw_upcall.bu_threshold.b_time = threshold_interval; +bw_upcall.bu_threshold.b_packets = threshold_packets; +bw_upcall.bu_threshold.b_bytes = threshold_bytes; +if (is_threshold_in_packets) + bw_upcall.bu_flags |= BW_UPCALL_UNIT_PACKETS; +if (is_threshold_in_bytes) + bw_upcall.bu_flags |= BW_UPCALL_UNIT_BYTES; +do { + if (is_geq_upcall) { + bw_upcall.bu_flags |= BW_UPCALL_GEQ; + break; + } + if (is_leq_upcall) { + bw_upcall.bu_flags |= BW_UPCALL_LEQ; + break; + } + return (ERROR); +} while (0); +setsockopt(mrouter_s4, IPPROTO_IP, MRT_ADD_BW_UPCALL, + (void *)&bw_upcall, sizeof(bw_upcall));</pre> +</div> +<p class="Pp">To delete a single filter, use + <code class="Dv">MRT_DEL_BW_UPCALL</code>; the fields of bw_upcall must + be set exactly the same as when <code class="Dv">MRT_ADD_BW_UPCALL</code> was + called.</p> +<p class="Pp">To delete all bandwidth filters for a given (S,G), set only the + <var class="Va">bu_src</var> and <var class="Va">bu_dst</var> fields in + <var class="Vt">struct bw_upcall</var>, and set + only the <code class="Dv">BW_UPCALL_DELETE_ALL</code> flag in the + <var class="Va">bw_upcall.bu_flags</var> field.</p> +<p class="Pp">Bandwidth upcalls are aggregated and delivered in the new + upcall message:</p> +<div class="Bd Pp Li"> +<pre>#define IGMPMSG_BW_UPCALL 4 /* BW monitoring upcall */</pre> +</div> +<p class="Pp" id="ioctl">This message is an array of <var class="Vt">struct + bw_upcall</var> elements (up to <code class="Dv">BW_UPCALLS_MAX</code> = + 128). 
The upcalls are delivered when there are 128 pending upcalls, or when + 1 second has expired since the previous upcall (whichever comes first). In + a <var class="Vt">struct bw_upcall</var> element, the + <var class="Va">bu_measured</var> field is filled in to indicate the + particular measured values. However, because of the way the particular + intervals are measured, the user should be careful how + <var class="Va">bu_measured.b_time</var> is used. For example, if the filter + is installed to trigger an upcall if the number of packets is >= 1, then + <var class="Va">bu_measured</var> may have a value of zero in the upcalls + after the first one, because the measured interval for >= filters is + “clocked” by the forwarded packets. Hence, this upcall + mechanism should not be used for measuring the exact value of the bandwidth + of the forwarded data. To measure the exact bandwidth, the user would need + to get the forwarded packet statistics with the + <a class="permalink" href="#ioctl"><code class="Fn">ioctl</code></a>(<var class="Fa">SIOCGETSGCNT</var>) + mechanism (see the <a class="Sx" href="#Programming_Guide">Programming + Guide</a> section).</p> +<p class="Pp">Note that the upcalls for a filter are delivered until the + specific filter is deleted, but no more frequently than once per + <var class="Va">bu_threshold.b_time</var>. 
For example, if the filter is + specified to deliver an upcall if bw >= 1 packet, the first packet will + trigger an upcall, but the next upcall will be triggered no earlier than + <var class="Va">bu_threshold.b_time</var> after the previous upcall.</p> +</section> +</section> +<section class="Sh"> +<h1 class="Sh" id="SEE_ALSO"><a class="permalink" href="#SEE_ALSO">SEE + ALSO</a></h1> +<p class="Pp"><a class="Xr">getsockopt(2)</a>, <a class="Xr">recvfrom(2)</a>, + <a class="Xr">recvmsg(2)</a>, <a class="Xr">setsockopt(2)</a>, + <a class="Xr">socket(2)</a>, <a class="Xr">sourcefilter(3)</a>, + <a class="Xr">altq(4)</a>, <a class="Xr">dummynet(4)</a>, + <a class="Xr">gif(4)</a>, <a class="Xr">gre(4)</a>, + <a class="Xr">icmp6(4)</a>, <a class="Xr">igmp(4)</a>, + <a class="Xr">inet(4)</a>, <a class="Xr">inet6(4)</a>, + <a class="Xr">intro(4)</a>, <a class="Xr">ip(4)</a>, + <a class="Xr">ip6(4)</a>, <a class="Xr">mld(4)</a>, + <a class="Xr">pim(4)</a></p> +</section> +<section class="Sh"> +<h1 class="Sh" id="HISTORY"><a class="permalink" href="#HISTORY">HISTORY</a></h1> +<p class="Pp">The Distance Vector Multicast Routing Protocol (DVMRP) was the + first multicast routing protocol to be developed. Later, other protocols such as + Multicast Extensions to OSPF (MOSPF) and Core Based Trees (CBT) were + developed as well. Routers at autonomous system boundaries may now exchange + multicast routes with peers via the Border Gateway Protocol (BGP). Many + other routing protocols are able to redistribute multicast routes for use + with <code class="Dv">PIM-SM</code> and <code class="Dv">PIM-DM</code>.</p> +</section> +<section class="Sh"> +<h1 class="Sh" id="AUTHORS"><a class="permalink" href="#AUTHORS">AUTHORS</a></h1> +<p class="Pp">The original multicast code was written by <span class="An">David + Waitzman</span> (BBN Labs), and later modified by the following individuals: + <span class="An">Steve Deering</span> (Stanford), <span class="An">Mark J. 
+ Steiglitz</span> (Stanford), <span class="An">Van Jacobson</span> (LBL), + <span class="An">Ajit Thyagarajan</span> (PARC), <span class="An">Bill + Fenner</span> (PARC). The IPv6 multicast support was implemented by the KAME + project (<span class="Pa">https://www.kame.net</span>), and was based on the + IPv4 multicast code. The advanced multicast API and the multicast bandwidth + monitoring were implemented by <span class="An">Pavlin Radoslavov</span> + (ICSI) in collaboration with <span class="An">Chris Brown</span> (NextHop). + The IGMPv3 and MLDv2 multicast support was implemented by + <span class="An">Bruce Simpson</span>.</p> +<p class="Pp">This manual page was written by <span class="An">Pavlin + Radoslavov</span> (ICSI).</p> +</section> +</div> +<table class="foot"> + <tr> + <td class="foot-date">February 13, 2026</td> + <td class="foot-os">FreeBSD 15.0</td> + </tr> +</table> |
