| author | Jacob McDonnell <jacob@jacobmcdonnell.com> | 2026-04-25 19:55:15 -0400 |
|---|---|---|
| committer | Jacob McDonnell <jacob@jacobmcdonnell.com> | 2026-04-25 19:55:15 -0400 |
| commit | 253e67c8b3a72b3a4757fdbc5845297628db0a4a | |
| tree | adf53b66087aa30dfbf8bf391a1dadb044c3bf4d /static/netbsd/man8/raidctl.8 | |
| parent | a9157ce950dfe2fc30795d43b9d79b9d1bffc48b | |
docs: Added All NetBSD Manuals
Diffstat (limited to 'static/netbsd/man8/raidctl.8')
| -rw-r--r-- | static/netbsd/man8/raidctl.8 | 1654 |
1 file changed, 1654 insertions, 0 deletions
diff --git a/static/netbsd/man8/raidctl.8 b/static/netbsd/man8/raidctl.8 new file mode 100644 index 00000000..52878be8 --- /dev/null +++ b/static/netbsd/man8/raidctl.8 @@ -0,0 +1,1654 @@ +.\" $NetBSD: raidctl.8,v 1.82 2023/09/25 21:59:38 oster Exp $ +.\" +.\" Copyright (c) 1998, 2002 The NetBSD Foundation, Inc. +.\" All rights reserved. +.\" +.\" This code is derived from software contributed to The NetBSD Foundation +.\" by Greg Oster +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS +.\" ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED +.\" TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR +.\" PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS +.\" BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +.\" INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +.\" CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +.\" ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE +.\" POSSIBILITY OF SUCH DAMAGE. +.\" +.\" +.\" Copyright (c) 1995 Carnegie-Mellon University. +.\" All rights reserved. +.\" +.\" Author: Mark Holland +.\" +.\" Permission to use, copy, modify and distribute this software and +.\" its documentation is hereby granted, provided that both the copyright +.\" notice and this permission notice appear in all copies of the +.\" software, derivative works or modified versions, and any portions +.\" thereof, and that both notices appear in supporting documentation. +.\" +.\" CARNEGIE MELLON ALLOWS FREE USE OF THIS SOFTWARE IN ITS "AS IS" +.\" CONDITION. CARNEGIE MELLON DISCLAIMS ANY LIABILITY OF ANY KIND +.\" FOR ANY DAMAGES WHATSOEVER RESULTING FROM THE USE OF THIS SOFTWARE. +.\" +.\" Carnegie Mellon requests users of this software to return to +.\" +.\" Software Distribution Coordinator or Software.Distribution@CS.CMU.EDU +.\" School of Computer Science +.\" Carnegie Mellon University +.\" Pittsburgh PA 15213-3890 +.\" +.\" any improvements or extensions that they make and grant Carnegie the +.\" rights to redistribute these changes. +.\" +.Dd September 25, 2023 +.Dt RAIDCTL 8 +.Os +.Sh NAME +.Nm raidctl +.Nd configuration utility for the RAIDframe disk driver +.Sh SYNOPSIS +.Nm +.Ar dev +.Ar command +.Op Ar arg Op ... 
+.Nm +.Op Fl v +.Fl A Op yes | no | forceroot | softroot +.Ar dev +.Nm +.Op Fl v +.Fl a Ar component Ar dev +.Nm +.Op Fl v +.Fl C Ar config_file Ar dev +.Nm +.Op Fl v +.Fl c Ar config_file Ar dev +.Nm +.Op Fl v +.Fl F Ar component Ar dev +.Nm +.Op Fl v +.Fl f Ar component Ar dev +.Nm +.Op Fl v +.Fl G Ar dev +.Nm +.Op Fl v +.Fl g Ar component Ar dev +.Nm +.Op Fl v +.Fl I Ar serial_number Ar dev +.Nm +.Op Fl v +.Fl i Ar dev +.Nm +.Op Fl v +.Fl L Ar dev +.Nm +.Op Fl v +.Fl M +.Oo yes | no | set +.Ar params +.Oc +.Ar dev +.Nm +.Op Fl v +.Fl m Ar dev +.Nm +.Op Fl v +.Fl P Ar dev +.Nm +.Op Fl v +.Fl p Ar dev +.Nm +.Op Fl v +.Fl R Ar component Ar dev +.Nm +.Op Fl v +.Fl r Ar component Ar dev +.Nm +.Op Fl v +.Fl S Ar dev +.Nm +.Op Fl v +.Fl s Ar dev +.Nm +.Op Fl v +.Fl t Ar config_file +.Nm +.Op Fl v +.Fl U Ar unit Ar dev +.Nm +.Op Fl v +.Fl u Ar dev +.Sh DESCRIPTION +.Nm +is the user-land control program for +.Xr raid 4 , +the RAIDframe disk device. +.Nm +is primarily used to dynamically configure and unconfigure RAIDframe disk +devices. +For more information about the RAIDframe disk device, see +.Xr raid 4 . +.Pp +This document assumes the reader has at least rudimentary knowledge of +RAID and RAID concepts. +.Pp +The simplified command-line options for +.Nm +are as follows: +.Bl -tag -width indent +.It Ic create Ar level Ar component1 Ar component2 Ar ... +where +.Ar level +specifies the RAID level and is one of +.Ar 0 +, +.Ar 1 +(or +.Ar mirror +), or +.Ar 5 +and each of +.Ar componentN +specifies a device to be configured into the RAID set. +.El +.Pp +The advanced command-line options for +.Nm +are as follows: +.Bl -tag -width indent +.It Fl A Ic yes Ar dev +Make the RAID set auto-configurable. +The RAID set will be automatically configured at boot +.Ar before +the root file system is mounted. +Note that all components of the set must be of type +.Dv RAID +in the disklabel. +.It Fl A Ic no Ar dev +Turn off auto-configuration for the RAID set. +.It Fl A Ic forceroot Ar dev +Make the RAID set auto-configurable, and also mark the set as being +eligible to be the root partition. +A RAID set configured this way will +.Ar override +the use of the boot disk as the root device. +All components of the set must be of type +.Dv RAID +in the disklabel. +Note that only certain architectures +(currently arc, alpha, amd64, bebox, cobalt, emips, evbarm, i386, landisk, +ofppc, pmax, riscv, sandpoint, sgimips, sparc, sparc64, and vax) +support booting a kernel directly from a RAID set. +Please note that +.Ic forceroot +mode was referred to as +.Ic root +mode on earlier versions of +.Nx . +For compatibility reasons, +.Ic root +can be used as an alias for +.Ic forceroot . +.It Fl A Ic softroot Ar dev +Like +.Ic forceroot , +but only change the root device if the boot device is part of the RAID set. +.It Fl a Ar component Ar dev +Add +.Ar component +as a hot spare for the device +.Ar dev . +Component labels (which identify the location of a given +component within a particular RAID set) are automatically added to the +hot spare after it has been used and are not required for +.Ar component +before it is used. +.It Fl C Ar config_file Ar dev +As for +.Fl c , +but forces the configuration to take place. +Fatal errors due to uninitialized components are ignored. +This is required the first time a RAID set is configured. +.It Fl c Ar config_file Ar dev +Configure the RAIDframe device +.Ar dev +according to the configuration given in +.Ar config_file .
+A description of the contents of +.Ar config_file +is given later. +.It Fl F Ar component Ar dev +Fails the specified +.Ar component +of the device, and immediately begins a reconstruction of the failed +disk onto an available hot spare. +This is one of the mechanisms used to start +the reconstruction process if a component does have a hardware failure. +.It Fl f Ar component Ar dev +This marks the specified +.Ar component +as having failed, but does not initiate a reconstruction of that component. +.It Fl G Ar dev +Generate the configuration of the RAIDframe device in a format suitable for +use with the +.Fl c +or +.Fl C +options. +.It Fl g Ar component Ar dev +Get the component label for the specified component. +.It Fl I Ar serial_number Ar dev +Initialize the component labels on each component of the device. +.Ar serial_number +is used as one of the keys in determining whether a +particular set of components belong to the same RAID set. +While not strictly enforced, different serial numbers should be used for +different RAID sets. +This step +.Em MUST +be performed when a new RAID set is created. +.It Fl i Ar dev +Initialize the RAID device. +In particular, (re-)write the parity on the selected device. +This +.Em MUST +be done for +.Em all +RAID sets before the RAID device is labeled and before +file systems are created on the RAID device. +.It Fl L Ar dev +Rescan all devices on the system, looking for RAID sets that can be +auto-configured. The RAID device provided here has to be a valid +device, but does not need to be configured. (e.g. +.Bd -literal -offset indent +raidctl -L raid0 +.Ed +.Pp +is all that is needed to perform a rescan.) +.It Fl M Ic yes Ar dev +.\"XXX should there be a section with more info on the parity map feature? +Enable the use of a parity map on the RAID set; this is the default, +and greatly reduces the time taken to check parity after unclean +shutdowns at the cost of some very slight overhead during normal +operation. +Changes to this setting will take effect the next time the set is +configured. +Note that RAID-0 sets, having no parity, will not use a parity map in +any case. +.It Fl M Ic no Ar dev +Disable the use of a parity map on the RAID set; doing this is not +recommended. +This will take effect the next time the set is configured. +.It Fl M Ic set Ar cooldown Ar tickms Ar regions Ar dev +Alter the parameters of the parity map; parameters to leave unchanged +can be given as 0, and trailing zeroes may be omitted. +.\"XXX should this explanation be deferred to another section as well? +The RAID set is divided into +.Ar regions +regions; each region is marked dirty for at most +.Ar cooldown +intervals of +.Ar tickms +milliseconds each after a write to it, and at least +.Ar cooldown +\- 1 such intervals. +Changes to +.Ar regions +take effect the next time the set is configured, while changes to the other +parameters are applied immediately. +The default parameters are expected to be reasonable for most workloads. +.It Fl m Ar dev +Display status information about the parity map on the RAID set, if any. +If used with +.Fl v +then the current contents of the parity map will be output (in +hexadecimal format) as well. +.It Fl P Ar dev +Check the status of the parity on the RAID set, and initialize +(re-write) the parity if the parity is not known to be up-to-date. +This is normally used after a system crash (and before a +.Xr fsck 8 ) +to ensure the integrity of the parity. +.It Fl p Ar dev +Check the status of the parity on the RAID set.
+Displays a status message, +and returns successfully if the parity is up-to-date. +.It Fl R Ar component Ar dev +Fails the specified +.Ar component , +if necessary, and immediately begins a reconstruction back to +.Ar component . +This is useful for reconstructing back onto a component after +it has been replaced following a failure. +.It Fl r Ar component Ar dev +Remove the specified +.Ar component +from the RAID. The component must be in the failed, spare, or spared state +in order to be removed. +.It Fl S Ar dev +Check the status of parity re-writing and component reconstruction. +The output indicates the amount of progress +achieved in each of these areas. +.It Fl s Ar dev +Display the status of the RAIDframe device for each of the components +and spares. +.It Fl t Ar config_file +Read and parse the +.Ar config_file , +reporting any errors, then exit. +No RAIDframe operations are performed. +.It Fl U Ar unit Ar dev +Set the +.Dv last_unit +field in all of the components, so that the next time the RAID set +is autoconfigured it uses that +.Ar unit . +.It Fl u Ar dev +Unconfigure the RAIDframe device. +This does not remove any component labels or change any configuration +settings (e.g. auto-configuration settings) for the RAID set. +.It Fl v +Be more verbose, and provide a progress indicator for operations such +as reconstructions and parity re-writing. +.El +.Pp +The device used by +.Nm +is specified by +.Ar dev . +.Ar dev +may be either the full name of the device, e.g., +.Pa /dev/rraid0d , +for the i386 architecture, or +.Pa /dev/rraid0c +for many others, or just simply +.Pa raid0 +(for +.Pa /dev/rraid0[cd] ) . +It is recommended that the partitions used to represent the +RAID device are not used for file systems. +.Ss Simple RAID configuration +For simple RAID configurations using RAID levels 0 (simple striping), +1 (mirroring), or 5 (striping with distributed parity) +.Nm +supports command-line configuration of RAID setups without +the use of a configuration file. For example, +.Bd -literal -offset indent +raidctl raid0 create 0 /dev/wd0e /dev/wd1e /dev/wd2e +.Ed +.Pp +will create a RAID level 0 set on the device named +.Pa raid0 +using the components +.Pa /dev/wd0e , +.Pa /dev/wd1e , +and +.Pa /dev/wd2e . +Similarly, +.Bd -literal -offset indent +raidctl raid0 create mirror absent /dev/wd1e +.Ed +.Pp +will create a RAID level 1 (mirror) set with an absent first component +and +.Pa /dev/wd1e +as the second component. In all cases the resulting RAID device will +be marked as auto-configurable, will have a serial number set (based +on the current time), and parity will be initialized (if the RAID level +has parity and sufficient components are present). Reasonable +performance values are automatically used by default for other +parameters normally specified in the configuration file. +.Pp +.Ss Configuration file +The format of the configuration file is complex, and +only an abbreviated treatment is given here. +In the configuration files, a +.Sq # +indicates the beginning of a comment. +.Pp +There are 4 required sections of a configuration file, and 2 +optional sections. +Each section begins with a +.Sq START , +followed by the section name, +and the configuration parameters associated with that section. +The first section is the +.Sq array +section, and it specifies +the number of columns, and spare disks in the RAID set. +For example: +.Bd -literal -offset indent +START array +3 0 +.Ed +.Pp +indicates an array with 3 columns, and 0 spare disks.
+Old configurations specified a 3rd value in front of the +number of columns and spare disks. +This old value, if provided, must be specified as 1: +.Bd -literal -offset indent +START array +1 3 0 +.Ed +.Pp +The second section, the +.Sq disks +section, specifies the actual components of the device. +For example: +.Bd -literal -offset indent +START disks +/dev/sd0e +/dev/sd1e +/dev/sd2e +.Ed +.Pp +specifies the three component disks to be used in the RAID device. +Disk wedges may also be specified with the NAME=<wedge name> syntax. +If any of the specified drives cannot be found when the RAID device is +configured, then they will be marked as +.Sq failed , +and the system will operate in degraded mode. +Note that it is +.Em imperative +that the order of the components in the configuration file does not +change between configurations of a RAID device. +Changing the order of the components will result in data loss +if the set is configured with the +.Fl C +option. +In normal circumstances, the RAID set will not configure if only +.Fl c +is specified, and the components are out-of-order. +.Pp +The next section, which is the +.Sq spare +section, is optional, and, if present, specifies the devices to be used as +.Sq hot spares +\(em devices which are on-line, +but are not actively used by the RAID driver unless +one of the main components fail. +A simple +.Sq spare +section might be: +.Bd -literal -offset indent +START spare +/dev/sd3e +.Ed +.Pp +for a configuration with a single spare component. +If no spare drives are to be used in the configuration, then the +.Sq spare +section may be omitted. +.Pp +The next section is the +.Sq layout +section. +This section describes the general layout parameters for the RAID device, +and provides such information as +sectors per stripe unit, +stripe units per parity unit, +stripe units per reconstruction unit, +and the parity configuration to use. +This section might look like: +.Bd -literal -offset indent +START layout +# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level +32 1 1 5 +.Ed +.Pp +The sectors per stripe unit specifies, in blocks, the interleave +factor; i.e., the number of contiguous sectors to be written to each +component for a single stripe. +Appropriate selection of this value (32 in this example) +is the subject of much research in RAID architectures. +The stripe units per parity unit and +stripe units per reconstruction unit are normally each set to 1. +While certain values above 1 are permitted, a discussion of valid +values and the consequences of using anything other than 1 are outside +the scope of this document. +The last value in this section (5 in this example) +indicates the parity configuration desired. +Valid entries include: +.Bl -tag -width inde +.It 0 +RAID level 0. +No parity, only simple striping. +.It 1 +RAID level 1. +Mirroring. +The parity is the mirror. +.It 4 +RAID level 4. +Striping across components, with parity stored on the last component. +.It 5 +RAID level 5. +Striping across components, parity distributed across all components. +.El +.Pp +There are other valid entries here, including those for Even-Odd +parity, RAID level 5 with rotated sparing, Chained declustering, +and Interleaved declustering, but as of this writing the code for +those parity operations has not been tested with +.Nx . +.Pp +The next required section is the +.Sq queue +section. 
+This is most often specified as: +.Bd -literal -offset indent +START queue +fifo 100 +.Ed +.Pp +where the queuing method is specified as fifo (first-in, first-out), +and the size of the per-component queue is limited to 100 requests. +Other queuing methods may also be specified, but a discussion of them +is beyond the scope of this document. +.Pp +The final section, the +.Sq debug +section, is optional. +For more details on this the reader is referred to +the RAIDframe documentation discussed in the +.Sx HISTORY +section. +.Pp +Since +.Nx 10 +RAIDframe has been capable of autoconfiguration of components +originally configured on opposite endian systems. The current label +endianness will be retained. +.Pp +See +.Sx EXAMPLES +for a more complete configuration file example. +.Sh FILES +.Bl -tag -width /dev/XXrXraidX -compact +.It Pa /dev/{,r}raid* +.Cm raid +device special files. +.El +.Sh EXAMPLES +The examples given in this section are for more complex +setups than can be configured with the simplified command-line +configuration option described earlier. +.Pp +It is highly recommended that, before using the RAID driver for real +file systems, the system administrator(s) become quite familiar +with the use of +.Nm , +and that they understand how the component reconstruction process works. +The examples in this section will focus on configuring a +number of different RAID sets of varying degrees of redundancy. +By working through these examples, administrators should be able to +develop a good feel for how to configure a RAID set, and how to +initiate reconstruction of failed components. +.Pp +In the following examples +.Sq raid0 +will be used to denote the RAID device. +Depending on the architecture, +.Pa /dev/rraid0c +or +.Pa /dev/rraid0d +may be used in place of +.Pa raid0 . +.Ss Initialization and Configuration +The initial step in configuring a RAID set is to identify the components +that will be used in the RAID set. +All components should be the same size. +Each component should have a disklabel type of +.Dv FS_RAID , +and a typical disklabel entry for a RAID component might look like: +.Bd -literal -offset indent +f: 1800000 200495 RAID # (Cyl. 405*- 4041*) +.Ed +.Pp +While +.Dv FS_BSDFFS +will also work as the component type, the type +.Dv FS_RAID +is preferred for RAIDframe use, as it is required for features such as +auto-configuration. +As part of the initial configuration of each RAID set, +each component will be given a +.Sq component label . +A +.Sq component label +contains important information about the component, including a +user-specified serial number, the column of that component in +the RAID set, the redundancy level of the RAID set, a +.Sq modification counter , +and whether the parity information (if any) on that +component is known to be correct. +Component labels are an integral part of the RAID set, +since they are used to ensure that components +are configured in the correct order, and used to keep track of other +vital information about the RAID set. +Component labels are also required for the auto-detection +and auto-configuration of RAID sets at boot time. +For a component label to be considered valid, that +particular component label must be in agreement with the other +component labels in the set. +For example, the serial number, +.Sq modification counter , +and number of columns must all be in agreement. +If any of these are different, then the component is +not considered to be part of the set.
+See +.Xr raid 4 +for more information about component labels. +.Pp +Once the components have been identified, and the disks have +appropriate labels, +.Nm +is then used to configure the +.Xr raid 4 +device. +To configure the device, a configuration file which looks something like: +.Bd -literal -offset indent +START array +# numCol numSpare +3 1 + +START disks +/dev/sd1e +/dev/sd2e +/dev/sd3e + +START spare +/dev/sd4e + +START layout +# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_5 +32 1 1 5 + +START queue +fifo 100 +.Ed +.Pp +is created in a file. +The above configuration file specifies a RAID 5 +set consisting of the components +.Pa /dev/sd1e , +.Pa /dev/sd2e , +and +.Pa /dev/sd3e , +with +.Pa /dev/sd4e +available as a +.Sq hot spare +in case one of the three main drives should fail. +A RAID 0 set would be specified in a similar way: +.Bd -literal -offset indent +START array +# numCol numSpare +4 0 + +START disks +/dev/sd10e +/dev/sd11e +/dev/sd12e +/dev/sd13e + +START layout +# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_0 +64 1 1 0 + +START queue +fifo 100 +.Ed +.Pp +In this case, devices +.Pa /dev/sd10e , +.Pa /dev/sd11e , +.Pa /dev/sd12e , +and +.Pa /dev/sd13e +are the components that make up this RAID set. +Note that there are no hot spares for a RAID 0 set, +since there is no way to recover data if any of the components fail. +.Pp +For a RAID 1 (mirror) set, the following configuration might be used: +.Bd -literal -offset indent +START array +# numCol numSpare +2 0 + +START disks +/dev/sd20e +/dev/sd21e + +START layout +# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_1 +128 1 1 1 + +START queue +fifo 100 +.Ed +.Pp +In this case, +.Pa /dev/sd20e +and +.Pa /dev/sd21e +are the two components of the mirror set. +While no hot spares have been specified in this +configuration, they easily could be, just as they were specified in +the RAID 5 case above. +Note as well that RAID 1 sets are currently limited to only 2 components. +At present, n-way mirroring is not possible. +.Pp +The first time a RAID set is configured, the +.Fl C +option must be used: +.Bd -literal -offset indent +raidctl -C raid0.conf raid0 +.Ed +.Pp +where +.Pa raid0.conf +is the name of the RAID configuration file. +The +.Fl C +forces the configuration to succeed, even if any of the component +labels are incorrect. +The +.Fl C +option should not be used lightly in +situations other than initial configurations, as if +the system is refusing to configure a RAID set, there is probably a +very good reason for it. +After the initial configuration is done (and +appropriate component labels are added with the +.Fl I +option) then raid0 can be configured normally with: +.Bd -literal -offset indent +raidctl -c raid0.conf raid0 +.Ed +.Pp +When the RAID set is configured for the first time, it is +necessary to initialize the component labels, and to initialize the +parity on the RAID set. +Initializing the component labels is done with: +.Bd -literal -offset indent +raidctl -I 112341 raid0 +.Ed +.Pp +where +.Sq 112341 +is a user-specified serial number for the RAID set. +This initialization step is +.Em required +for all RAID sets. +As well, using different serial numbers between RAID sets is +.Em strongly encouraged , +as using the same serial number for all RAID sets will only serve to +decrease the usefulness of the component label checking. +.Pp +Initializing the RAID set is done via the +.Fl i +option. 
+This initialization +.Em MUST +be done for +.Em all +RAID sets, since among other things it verifies that the parity (if +any) on the RAID set is correct. +Since this initialization may be quite time-consuming, the +.Fl v +option may also be used in conjunction with +.Fl i : +.Bd -literal -offset indent +raidctl -iv raid0 +.Ed +.Pp +This will give more verbose output on the +status of the initialization: +.Bd -literal -offset indent +Initiating re-write of parity +Parity Re-write status: + 10% |**** | ETA: 06:03 / +.Ed +.Pp +The output provides a +.Sq Percent Complete +in both a numeric and graphical format, as well as an estimated time +to completion of the operation. +.Pp +Since it is the parity that provides the +.Sq redundancy +part of RAID, it is critical that the parity is correct as much as possible. +If the parity is not correct, then there is no +guarantee that data will not be lost if a component fails. +.Pp +Once the parity is known to be correct, it is then safe to perform +.Xr disklabel 8 , +.Xr newfs 8 , +or +.Xr fsck 8 +on the device or its file systems, and then to mount the file systems +for use. +.Pp +Under certain circumstances (e.g., the additional component has not +arrived, or data is being migrated off of a disk destined to become a +component) it may be desirable to configure a RAID 1 set with only +a single component. +This can be achieved by using the word +.Dq absent +to indicate that a particular component is not present. +In the following: +.Bd -literal -offset indent +START array +# numCol numSpare +2 0 + +START disks +absent +/dev/sd0e + +START layout +# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_1 +128 1 1 1 + +START queue +fifo 100 +.Ed +.Pp +.Pa /dev/sd0e +is the real component, and will be the second disk of a RAID 1 set. +The first component is simply marked as being absent. +Configuration (using +.Fl C +and +.Fl I Ar 12345 +as above) proceeds normally, but initialization of the RAID set will +have to wait until all physical components are present. +After configuration, this set can be used normally, but will be operating +in degraded mode. +Once a second physical component is obtained, it can be hot-added, +the existing data mirrored, and normal operation resumed. +.Pp +The size of the resulting RAID set will depend on the number of data +components in the set. +Space is automatically reserved for the component labels, and +the actual amount of space used +for data on a component will be rounded down to the largest possible +multiple of the sectors per stripe unit (sectPerSU) value. +Thus, the amount of space provided by the RAID set will be less +than the sum of the size of the components. +For example, in the RAID 5 status output shown below, each 1800000-sector +component provides 1799936 sectors of data space after this reservation +and rounding, and the two data columns of the three-column set thus +provide 3599872 sectors in total. +.Ss Maintenance of the RAID set +After the parity has been initialized for the first time, the command: +.Bd -literal -offset indent +raidctl -p raid0 +.Ed +.Pp +can be used to check the current status of the parity. +To check the parity and rebuild it if necessary (for example, +after an unclean shutdown) the command: +.Bd -literal -offset indent +raidctl -P raid0 +.Ed +.Pp +is used. +Note that re-writing the parity can be done while +other operations on the RAID set are taking place (e.g., while doing a +.Xr fsck 8 +on a file system on the RAID set). +However: for maximum effectiveness of the RAID set, the parity should be +known to be correct before any data on the set is modified.
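+.Pp +Note that when the parity map is enabled (the default; see the +.Fl M +and +.Fl m +options above), only those regions marked dirty at the time of an +unclean shutdown need to be checked, so checking the parity with +.Fl P +after a crash will typically complete much faster than a full +parity re-write. +The current state of the map can be inspected with: +.Bd -literal -offset indent +raidctl -m raid0 +.Ed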
+.Pp +To see how the RAID set is doing, the following command can be used to +show the RAID set's status: +.Bd -literal -offset indent +raidctl -s raid0 +.Ed +.Pp +The output will look something like: +.Bd -literal -offset indent +Components: + /dev/sd1e: optimal + /dev/sd2e: optimal + /dev/sd3e: optimal +Spares: + /dev/sd4e: spare +Component label for /dev/sd1e: + Row: 0 Column: 0 Num Rows: 1 Num Columns: 3 + Version: 2 Serial Number: 13432 Mod Counter: 65 + Clean: No Status: 0 + sectPerSU: 32 SUsPerPU: 1 SUsPerRU: 1 + RAID Level: 5 blocksize: 512 numBlocks: 1799936 + Autoconfig: No + Last configured as: raid0 +Component label for /dev/sd2e: + Row: 0 Column: 1 Num Rows: 1 Num Columns: 3 + Version: 2 Serial Number: 13432 Mod Counter: 65 + Clean: No Status: 0 + sectPerSU: 32 SUsPerPU: 1 SUsPerRU: 1 + RAID Level: 5 blocksize: 512 numBlocks: 1799936 + Autoconfig: No + Last configured as: raid0 +Component label for /dev/sd3e: + Row: 0 Column: 2 Num Rows: 1 Num Columns: 3 + Version: 2 Serial Number: 13432 Mod Counter: 65 + Clean: No Status: 0 + sectPerSU: 32 SUsPerPU: 1 SUsPerRU: 1 + RAID Level: 5 blocksize: 512 numBlocks: 1799936 + Autoconfig: No + Last configured as: raid0 +Parity status: clean +Reconstruction is 100% complete. +Parity Re-write is 100% complete. +.Ed +.Pp +This indicates that all is well with the RAID set. +Of importance here are the component lines which read +.Sq optimal , +and the +.Sq Parity status +line. +.Sq Parity status: clean +indicates that the parity is up-to-date for this RAID set, +whether or not the RAID set is in redundant or degraded mode. +.Sq Parity status: DIRTY +indicates that it is not known if the parity information is +consistent with the data, and that the parity information needs +to be checked. +Note that if there are file systems open on the RAID set, +the individual components will not be +.Sq clean +but the set as a whole can still be clean. +.Pp +To check the component label of +.Pa /dev/sd1e , +the following is used: +.Bd -literal -offset indent +raidctl -g /dev/sd1e raid0 +.Ed +.Pp +The output of this command will look something like: +.Bd -literal -offset indent +Component label for /dev/sd1e: + Row: 0 Column: 0 Num Rows: 1 Num Columns: 3 + Version: 2 Serial Number: 13432 Mod Counter: 65 + Clean: No Status: 0 + sectPerSU: 32 SUsPerPU: 1 SUsPerRU: 1 + RAID Level: 5 blocksize: 512 numBlocks: 1799936 + Autoconfig: No + Last configured as: raid0 +.Ed +.Ss Dealing with Component Failures +If for some reason +(perhaps to test reconstruction) it is necessary to pretend a drive +has failed, the following will perform that function: +.Bd -literal -offset indent +raidctl -f /dev/sd2e raid0 +.Ed +.Pp +The system will then be performing all operations in degraded mode, +where missing data is re-computed from existing data and the parity. +In this case, obtaining the status of raid0 will return (in part): +.Bd -literal -offset indent +Components: + /dev/sd1e: optimal + /dev/sd2e: failed + /dev/sd3e: optimal +Spares: + /dev/sd4e: spare +.Ed +.Pp +Note that with the use of +.Fl f +a reconstruction has not been started. +To both fail the disk and start a reconstruction, the +.Fl F +option must be used: +.Bd -literal -offset indent +raidctl -F /dev/sd2e raid0 +.Ed +.Pp +The +.Fl f +option may be used first, and then the +.Fl F +option used later, on the same disk, if desired. 
+Immediately after the reconstruction is started, the status will report: +.Bd -literal -offset indent +Components: + /dev/sd1e: optimal + /dev/sd2e: reconstructing + /dev/sd3e: optimal +Spares: + /dev/sd4e: used_spare +[...] +Parity status: clean +Reconstruction is 10% complete. +Parity Re-write is 100% complete. +.Ed +.Pp +This indicates that a reconstruction is in progress. +To find out how the reconstruction is progressing the +.Fl S +option may be used. +This will indicate the progress in terms of the +percentage of the reconstruction that is completed. +When the reconstruction is finished the +.Fl s +option will show: +.Bd -literal -offset indent +Components: + /dev/sd1e: optimal + /dev/sd4e: optimal + /dev/sd3e: optimal +No spares. +[...] +Parity status: clean +Reconstruction is 100% complete. +Parity Re-write is 100% complete. +.Ed +.Pp +as +.Pa /dev/sd2e +has been removed and replaced with +.Pa /dev/sd4e . +.Pp +If a component fails and there are no hot spares +available on-line, the status of the RAID set might (in part) look like: +.Bd -literal -offset indent +Components: + /dev/sd1e: optimal + /dev/sd2e: failed + /dev/sd3e: optimal +No spares. +.Ed +.Pp +In this case there are a number of options. +The first option is to add a hot spare using: +.Bd -literal -offset indent +raidctl -a /dev/sd4e raid0 +.Ed +.Pp +After the hot add, the status would then be: +.Bd -literal -offset indent +Components: + /dev/sd1e: optimal + /dev/sd2e: failed + /dev/sd3e: optimal +Spares: + /dev/sd4e: spare +.Ed +.Pp +Reconstruction could then take place using +.Fl F +as described above. +.Pp +A second option is to rebuild directly onto +.Pa /dev/sd2e . +Once the disk containing +.Pa /dev/sd2e +has been replaced, one can simply use: +.Bd -literal -offset indent +raidctl -R /dev/sd2e raid0 +.Ed +.Pp +to rebuild the +.Pa /dev/sd2e +component. +As the rebuilding is in progress, the status will be: +.Bd -literal -offset indent +Components: + /dev/sd1e: optimal + /dev/sd2e: reconstructing + /dev/sd3e: optimal +No spares. +.Ed +.Pp +and when completed, will be: +.Bd -literal -offset indent +Components: + /dev/sd1e: optimal + /dev/sd2e: optimal + /dev/sd3e: optimal +No spares. +.Ed +.Pp +In circumstances where a particular component is completely +unavailable after a reboot, a special component name will be used to +indicate the missing component. +For example: +.Bd -literal -offset indent +Components: + /dev/sd2e: optimal + component1: failed +No spares. +.Ed +.Pp +indicates that the second component of this RAID set was not detected +at all by the auto-configuration code. +The name +.Sq component1 +can be used anywhere a normal component name would be used. +For example, to add a hot spare to the above set, and rebuild to that hot +spare, the following could be done: +.Bd -literal -offset indent +raidctl -a /dev/sd3e raid0 +raidctl -F component1 raid0 +.Ed +.Pp +at which point the data missing from +.Sq component1 +would be reconstructed onto +.Pa /dev/sd3e . +.Pp +When more than one component is marked as +.Sq failed +due to a non-component hardware failure (e.g., loss of power to two +components, adapter problems, termination problems, or cabling issues) it +is quite possible to recover the data on the RAID set. +The first thing to be aware of is that the first disk to fail will +almost certainly be out-of-sync with the remainder of the array. 
+If any IO was performed between the time the first component is considered +.Sq failed +and when the second component is considered +.Sq failed , +then the first component to fail will +.Em not +contain correct data, and should be ignored. +When the second component is marked as failed, however, the RAID device will +(currently) panic the system. +At this point the data on the RAID set +(not including the first failed component) is still self consistent, +and will be in no worse state of repair than had the power gone out in +the middle of a write to a file system on a non-RAID device. +The problem, however, is that the component labels may now have 3 different +.Sq modification counters +(one value on the first component that failed, one value on the second +component that failed, and a third value on the remaining components). +In such a situation, the RAID set will not autoconfigure, +and can only be forcibly re-configured +with the +.Fl C +option. +To recover the RAID set, one must first remedy whatever physical +problem caused the multiple-component failure. +After that is done, the RAID set can be restored by forcibly +configuring the raid set +.Em without +the component that failed first. +For example, if +.Pa /dev/sd1e +and +.Pa /dev/sd2e +fail (in that order) in a RAID set of the following configuration: +.Bd -literal -offset indent +START array +4 0 + +START disks +/dev/sd1e +/dev/sd2e +/dev/sd3e +/dev/sd4e + +START layout +# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_5 +64 1 1 5 + +START queue +fifo 100 + +.Ed +.Pp +then the following configuration (say "recover_raid0.conf") +.Bd -literal -offset indent +START array +4 0 + +START disks +absent +/dev/sd2e +/dev/sd3e +/dev/sd4e + +START layout +# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_5 +64 1 1 5 + +START queue +fifo 100 +.Ed +.Pp +can be used with +.Bd -literal -offset indent +raidctl -C recover_raid0.conf raid0 +.Ed +.Pp +to force the configuration of raid0. +A +.Bd -literal -offset indent +raidctl -I 12345 raid0 +.Ed +.Pp +will be required in order to synchronize the component labels. +At this point the file systems on the RAID set can then be checked and +corrected. +To complete the re-construction of the RAID set, +.Pa /dev/sd1e +is simply hot-added back into the array, and reconstructed +as described earlier. +.Ss RAID on RAID +RAID sets can be layered to create more complex and much larger RAID sets. +A RAID 0 set, for example, could be constructed from four RAID 5 sets. +The following configuration file shows such a setup: +.Bd -literal -offset indent +START array +# numCol numSpare +4 0 + +START disks +/dev/raid1e +/dev/raid2e +/dev/raid3e +/dev/raid4e + +START layout +# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_0 +128 1 1 0 + +START queue +fifo 100 +.Ed +.Pp +A similar configuration file might be used for a RAID 0 set +constructed from components on RAID 1 sets. +In such a configuration, the mirroring provides a high degree +of redundancy, while the striping provides additional speed benefits. +.Ss Auto-configuration and Root on RAID +RAID sets can also be auto-configured at boot. +To make a set auto-configurable, +simply prepare the RAID set as above, and then do a: +.Bd -literal -offset indent +raidctl -A yes raid0 +.Ed +.Pp +to turn on auto-configuration for that set. +To turn off auto-configuration, use: +.Bd -literal -offset indent +raidctl -A no raid0 +.Ed +.Pp +RAID sets which are auto-configurable will be configured before the +root file system is mounted. 
+These RAID sets are thus available for +use as a root file system, or for any other file system. +A primary advantage of using the auto-configuration is that RAID components +become more independent of the disks they reside on. +For example, SCSI ID's can change, but auto-configured sets will always be +configured correctly, even if the SCSI ID's of the component disks +have become scrambled. +.Pp +Having a system's root file system +.Pq Pa / +on a RAID set is also allowed, with the +.Sq a +partition of such a RAID set being used for +.Pa / . +To use raid0a as the root file system, simply use: +.Bd -literal -offset indent +raidctl -A forceroot raid0 +.Ed +.Pp +To return raid0a to be just an auto-configuring set simply use the +.Fl A Ar yes +arguments. +.Pp +Note that kernels can only be directly read from RAID 1 components on +architectures that support that +(currently alpha, i386, pmax, sandpoint, sparc, sparc64, and vax). +On those architectures, the +.Dv FS_RAID +file system is recognized by the bootblocks, and will properly load the +kernel directly from a RAID 1 component. +For other architectures, or to support the root file system +on other RAID sets, some other mechanism must be used to get a kernel booting. +For example, a small partition containing only the secondary boot-blocks +and an alternate kernel (or two) could be used. +Once a kernel is booting however, and an auto-configuring RAID set is +found that is eligible to be root, then that RAID set will be +auto-configured and used as the root device. +If two or more RAID sets claim to be root devices, then the +user will be prompted to select the root device. +At this time, RAID 0, 1, 4, and 5 sets are all supported as root devices. +.Pp +A typical RAID 1 setup with root on RAID might be as follows: +.Bl -enum +.It +wd0a - a small partition, which contains a complete, bootable, basic +.Nx +installation. +.It +wd1a - also contains a complete, bootable, basic +.Nx +installation. +.It +wd0e and wd1e - a RAID 1 set, raid0, used for the root file system. +.It +wd0f and wd1f - a RAID 1 set, raid1, which will be used only for +swap space. +.It +wd0g and wd1g - a RAID 1 set, raid2, used for +.Pa /usr , +.Pa /home , +or other data, if desired. +.It +wd0h and wd1h - a RAID 1 set, raid3, if desired. +.El +.Pp +RAID sets raid0, raid1, and raid2 are all marked as auto-configurable. +raid0 is marked as being a root file system. +When new kernels are installed, the kernel is not only copied to +.Pa / , +but also to wd0a and wd1a. +The kernel on wd0a is required, since that +is the kernel the system boots from. +The kernel on wd1a is also +required, since that will be the kernel used should wd0 fail. +The important point here is to have redundant copies of the kernel +available, in the event that one of the drives fails. +An example of propagating a newly installed kernel to both of these +partitions is given below. +.Pp +There is no requirement that the root file system be on the same disk +as the kernel. +For example, obtaining the kernel from wd0a, and using +sd0e and sd1e for raid0 and the root file system, is fine. +It +.Em is +critical, however, that there be multiple kernels available, in the +event of media failure. +.Pp +Multi-layered RAID devices (such as a RAID 0 set made +up of RAID 1 sets) are +.Em not +supported as root devices or auto-configurable devices at this point. +(Multi-layered RAID devices +.Em are +supported in general, however, as mentioned earlier.)
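+.Pp +As an illustration of the kernel copies mentioned above (the mount +point, and the use of +.Pa /netbsd +as the kernel name, are examples only), a newly installed kernel +might be propagated to both boot partitions with: +.Bd -literal -offset indent +mount /dev/wd0a /mnt && cp /netbsd /mnt && umount /mnt +mount /dev/wd1a /mnt && cp /netbsd /mnt && umount /mnt +.Ed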
+Note that in order to enable component auto-detection and +auto-configuration of RAID devices, the line: +.Bd -literal -offset indent +options RAID_AUTOCONFIG +.Ed +.Pp +must be in the kernel configuration file. +See +.Xr raid 4 +for more details. +.Ss Swapping on RAID +A RAID device can be used as a swap device. +In order to ensure that a RAID device used as a swap device +is correctly unconfigured when the system is shut down or rebooted, +it is recommended that the line +.Bd -literal -offset indent +swapoff=YES +.Ed +.Pp +be added to +.Pa /etc/rc.conf . +.Ss Unconfiguration +The final operation performed by +.Nm +is to unconfigure a +.Xr raid 4 +device. +This is accomplished via a simple: +.Bd -literal -offset indent +raidctl -u raid0 +.Ed +.Pp +at which point the device is ready to be reconfigured. +.Ss Performance Tuning +Selection of the various parameter values which result in the best +performance can be quite tricky, and often requires a bit of +trial-and-error to get those values most appropriate for a given system. +A whole range of factors come into play, including: +.Bl -enum +.It +Types of components (e.g., SCSI vs. IDE) and their bandwidth +.It +Types of controller cards and their bandwidth +.It +Distribution of components among controllers +.It +IO bandwidth +.It +file system access patterns +.It +CPU speed +.El +.Pp +As with most performance tuning, benchmarking under real-life loads +may be the only way to measure expected performance. +Understanding some of the underlying technology is also useful in tuning. +The goal of this section is to provide pointers to those parameters which may +make significant differences in performance. +.Pp +For a RAID 1 set, a SectPerSU value of 64 or 128 is typically sufficient. +Since data in a RAID 1 set is arranged in a linear +fashion on each component, selecting an appropriate stripe size is +somewhat less critical than it is for a RAID 5 set. +However: a stripe size that is too small will cause large IO's to be +broken up into a number of smaller ones, hurting performance. +At the same time, a large stripe size may cause problems with +concurrent accesses to stripes, which may also affect performance. +Thus values in the range of 32 to 128 are often the most effective. +.Pp +Tuning RAID 5 sets is trickier. +In the best case, IO is presented to the RAID set one stripe at a time. +Since the entire stripe is available at the beginning of the IO, +the parity of that stripe can be calculated before the stripe is written, +and then the stripe data and parity can be written in parallel. +When the amount of data being written is less than a full stripe worth, the +.Sq small write +problem occurs. +Since a +.Sq small write +means only a portion of the stripe on the components is going to +change, the data (and parity) on the components must be updated +slightly differently. +First, the +.Sq old parity +and +.Sq old data +must be read from the components. +Then the new parity is constructed, +using the new data to be written, and the old data and old parity. +Finally, the new data and new parity are written. +All this extra data shuffling results in a serious loss of performance, +and is typically 2 to 4 times slower than a full stripe write (or read). +To combat this problem in the real world, it may be useful +to ensure that stripe sizes are small enough that a +.Sq large IO +from the system will use exactly one large stripe write. +As is seen later, there are some file system dependencies +which may come into play here as well.
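+.Pp +To make the cost concrete: a +.Sq small write +touching a single data block requires two reads (the old data and the +old parity) followed by two writes (the new data and the new parity), +and the writes cannot be issued until both reads have completed. +Four dependent component IOs in place of a single write is largely +where the 2 to 4 times slowdown noted above comes from.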
+.Pp +Since the size of a +.Sq large IO +is often (currently) only 32K or 64K, on a 5-drive RAID 5 set it may +be desirable to select a SectPerSU value of 16 blocks (8K) or 32 +blocks (16K). +Since there are 4 data sectors per stripe, the maximum +data per stripe is 64 blocks (32K) or 128 blocks (64K). +Again, empirical measurement will provide the best indicators of which +values will yield better performance. +.Pp +The parameters used for the file system are also critical to good performance. +For +.Xr newfs 8 , +for example, increasing the block size to 32K or 64K may improve +performance dramatically. +As well, changing the cylinders-per-group +parameter from 16 to 32 or higher is often not only necessary for +larger file systems, but may also have positive performance implications. +.Ss Summary +Despite the length of this man-page, configuring a RAID set is a +relatively straight-forward process. +All that needs to be done are the following steps: +.Bl -enum +.It +Use +.Xr disklabel 8 +to create the components (of type RAID). +.It +Construct a RAID configuration file: e.g., +.Pa raid0.conf +.It +Configure the RAID set with: +.Bd -literal -offset indent +raidctl -C raid0.conf raid0 +.Ed +.It +Initialize the component labels with: +.Bd -literal -offset indent +raidctl -I 123456 raid0 +.Ed +.It +Initialize other important parts of the set with: +.Bd -literal -offset indent +raidctl -i raid0 +.Ed +.It +Get the default label for the RAID set: +.Bd -literal -offset indent +disklabel raid0 > /tmp/label +.Ed +.It +Edit the label: +.Bd -literal -offset indent +vi /tmp/label +.Ed +.It +Put the new label on the RAID set: +.Bd -literal -offset indent +disklabel -R -r raid0 /tmp/label +.Ed +.It +Create the file system: +.Bd -literal -offset indent +newfs /dev/rraid0e +.Ed +.It +Mount the file system: +.Bd -literal -offset indent +mount /dev/raid0e /mnt +.Ed +.It +Use: +.Bd -literal -offset indent +raidctl -c raid0.conf raid0 +.Ed +.Pp +to re-configure the RAID set the next time it is needed, or put +.Pa raid0.conf +into +.Pa /etc +where it will automatically be started by the +.Pa /etc/rc.d +scripts. +.El +.Sh SEE ALSO +.Xr ccd 4 , +.Xr raid 4 , +.Xr rc 8 +.Sh HISTORY +RAIDframe is a framework for rapid prototyping of RAID structures +developed by the folks at the Parallel Data Laboratory at Carnegie +Mellon University (CMU). +A more complete description of the internals and functionality of +RAIDframe is found in the paper "RAIDframe: A Rapid Prototyping Tool +for RAID Systems", by William V. Courtright II, Garth Gibson, Mark +Holland, LeAnn Neal Reilly, and Jim Zelenka, and published by the +Parallel Data Laboratory of Carnegie Mellon University. +The +.Nm +command first appeared as a program in CMU's RAIDframe v1.1 distribution. +This version of +.Nm +is a complete re-write, and first appeared in +.Nx 1.4 . +.Sh COPYRIGHT +.Bd -literal +The RAIDframe Copyright is as follows: + +Copyright (c) 1994-1996 Carnegie-Mellon University. +All rights reserved. + +Permission to use, copy, modify and distribute this software and +its documentation is hereby granted, provided that both the copyright +notice and this permission notice appear in all copies of the +software, derivative works or modified versions, and any portions +thereof, and that both notices appear in supporting documentation. + +CARNEGIE MELLON ALLOWS FREE USE OF THIS SOFTWARE IN ITS "AS IS" +CONDITION. CARNEGIE MELLON DISCLAIMS ANY LIABILITY OF ANY KIND +FOR ANY DAMAGES WHATSOEVER RESULTING FROM THE USE OF THIS SOFTWARE.
+ +Carnegie Mellon requests users of this software to return to + + Software Distribution Coordinator or Software.Distribution@CS.CMU.EDU + School of Computer Science + Carnegie Mellon University + Pittsburgh PA 15213-3890 + +any improvements or extensions that they make and grant Carnegie the +rights to redistribute these changes. +.Ed +.Sh WARNINGS +Certain RAID levels (1, 4, 5, 6, and others) can protect against some +data loss due to component failure. +However, the loss of two components of a RAID 4 or 5 system, +or the loss of a single component of a RAID 0 system, will +result in the entire file system being lost. +RAID is +.Em NOT +a substitute for good backup practices. +.Pp +Recomputation of parity +.Em MUST +be performed whenever there is a chance that it may have been compromised. +This includes after system crashes, or before a RAID +device has been used for the first time. +Failure to keep parity correct will be catastrophic should a +component ever fail \(em it is better to use RAID 0 and get the +additional space and speed, than it is to use parity but +not keep the parity correct. +At least with RAID 0 there is no perception of increased data security. +.Pp +When replacing a failed component of a RAID set, it is a good +idea to zero out the first 64 blocks of the new component +(for example, using +.Xr dd 1 +with +.Pa /dev/zero ) +to ensure that the RAIDframe driver doesn't erroneously detect a component +label in the new component. +This is particularly true on +.Em RAID 1 +sets because there is at most one correct component label in a failed RAID +1 installation, and the RAIDframe driver picks the component label with the +highest serial number and modification value as the authoritative source +for the failed RAID set when choosing which component label to use to +configure the RAID set. +.Sh BUGS +Hot-spare removal is currently not available.
