<table class="head">
  <tr>
    <td class="head-ltitle">RAID(4)</td>
    <td class="head-vol">Device Drivers Manual</td>
    <td class="head-rtitle">RAID(4)</td>
  </tr>
</table>
<div class="manual-text">
<section class="Sh">
<h1 class="Sh" id="NAME"><a class="permalink" href="#NAME">NAME</a></h1>
<p class="Pp"><code class="Nm">raid</code> &#x2014; <span class="Nd">RAIDframe
    disk driver</span></p>
</section>
<section class="Sh">
<h1 class="Sh" id="SYNOPSIS"><a class="permalink" href="#SYNOPSIS">SYNOPSIS</a></h1>
<p class="Pp"><code class="Cd">options RAID_AUTOCONFIG</code>
  <br/>
  <code class="Cd">options RAID_DIAGNOSTIC</code>
  <br/>
  <code class="Cd">options RF_ACC_TRACE=n</code>
  <br/>
  <code class="Cd">options RF_DEBUG_MAP=n</code>
  <br/>
  <code class="Cd">options RF_DEBUG_PSS=n</code>
  <br/>
  <code class="Cd">options RF_DEBUG_QUEUE=n</code>
  <br/>
  <code class="Cd">options RF_DEBUG_QUIESCE=n</code>
  <br/>
  <code class="Cd">options RF_DEBUG_RECON=n</code>
  <br/>
  <code class="Cd">options RF_DEBUG_STRIPELOCK=n</code>
  <br/>
  <code class="Cd">options RF_DEBUG_VALIDATE_DAG=n</code>
  <br/>
  <code class="Cd">options RF_DEBUG_VERIFYPARITY=n</code>
  <br/>
  <code class="Cd">options RF_INCLUDE_CHAINDECLUSTER=n</code>
  <br/>
  <code class="Cd">options RF_INCLUDE_EVENODD=n</code>
  <br/>
  <code class="Cd">options RF_INCLUDE_INTERDECLUSTER=n</code>
  <br/>
  <code class="Cd">options RF_INCLUDE_PARITY_DECLUSTERING=n</code>
  <br/>
  <code class="Cd">options RF_INCLUDE_PARITY_DECLUSTERING_DS=n</code>
  <br/>
  <code class="Cd">options RF_INCLUDE_PARITYLOGGING=n</code>
  <br/>
  <code class="Cd">options RF_INCLUDE_RAID5_RS=n</code></p>
<p class="Pp">
  <br/>
  <code class="Cd">pseudo-device raid</code></p>
</section>
<section class="Sh">
<h1 class="Sh" id="DESCRIPTION"><a class="permalink" href="#DESCRIPTION">DESCRIPTION</a></h1>
<p class="Pp">The <code class="Nm">raid</code> driver provides RAID 0, 1, 4, and
    5 (and more!) capabilities to <span class="Ux">NetBSD</span>. This document
    assumes that the reader has at least some familiarity with RAID and RAID
    concepts. The reader is also assumed to know how to configure disks and
    pseudo-devices into kernels, how to generate kernels, and how to partition
    disks.</p>
<p class="Pp">RAIDframe provides a number of different RAID levels
  including:</p>
<dl class="Bl-tag">
  <dt>RAID 0</dt>
  <dd>provides simple data striping across the components.</dd>
  <dt>RAID 1</dt>
  <dd>provides mirroring.</dd>
  <dt>RAID 4</dt>
  <dd>provides data striping across the components, with parity stored on a
      dedicated drive (in this case, the last component).</dd>
  <dt>RAID 5</dt>
  <dd>provides data striping across the components, with parity distributed
      across all the components.</dd>
</dl>
<p class="Pp">RAIDframe supports a wide variety of other RAID levels. The
    configuration file options to enable them are briefly outlined at the end of
    this section.</p>
<p class="Pp">Depending on the parity level configured, the device driver can
    support the failure of component drives. The number of failures allowed
    depends on the parity level selected. If the driver is able to handle drive
    failures, and a drive does fail, then the system is operating in
    &quot;degraded mode&quot;. In this mode, all missing data must be
    reconstructed from the data and parity present on the other components. This
    results in much slower data accesses, but does mean that a failure need not
    bring the system to a complete halt.</p>
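<p class="Pp">As a rough illustration of why degraded-mode reads are slower,
    the sketch below rebuilds one missing stripe unit by XOR-ing the surviving
    data units with the parity unit. It is a simplified model of single-parity
    RAID (levels 4 and 5), not the driver's actual code path; the helper name
    <code class="Li">xor_blocks</code> and the sample data are invented for the
    example.</p>

```python
from functools import reduce

def xor_blocks(blocks):
    """Return the byte-wise XOR of a list of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three hypothetical data units in one stripe, plus their parity unit.
data = [b"NetB", b"SD r", b"aid4"]
parity = xor_blocks(data)

# Simulate losing the component holding data[1], then rebuild it from
# every surviving unit in the stripe (the other data units and parity).
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt)
```

<p class="Pp">Every read of a missing unit touches all surviving components in
    the stripe, which is where the extra cost in degraded mode comes from.</p>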
<p class="Pp">The RAID driver supports and enforces the use of &#x2018;component
    labels&#x2019;. A &#x2018;component label&#x2019; contains important
    information about the component, including a user-specified serial number,
    the row and column of that component in the RAID set, and whether the data
    (and parity) on the component is &#x2018;clean&#x2019;. The component label
    currently lives at the half-way point of the &#x2018;reserved
    section&#x2019; located at the beginning of each component. This
    &#x2018;reserved section&#x2019; is RF_PROTECTED_SECTORS in length (64
    blocks or 32Kbytes) and the component label is currently 1Kbyte in size.</p>
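<p class="Pp">The figures above can be checked with a little arithmetic. The
    sketch below assumes 512-byte blocks (which is what makes 64 blocks equal
    32Kbytes) and takes &#x2018;half-way point&#x2019; to mean the byte offset
    at the middle of the reserved section; the variable names other than
    RF_PROTECTED_SECTORS are made up for the example.</p>

```python
SECTOR_SIZE = 512            # bytes per block; assumed, consistent with 64 blocks = 32Kbytes
RF_PROTECTED_SECTORS = 64    # length of the reserved section, from the text
LABEL_SIZE = 1024            # component label size in bytes, from the text

reserved_bytes = RF_PROTECTED_SECTORS * SECTOR_SIZE
label_offset = reserved_bytes // 2       # half-way point of the reserved section
label_end = label_offset + LABEL_SIZE    # first byte past the label

print(reserved_bytes, label_offset, label_end)
```

<p class="Pp">Under these assumptions the label occupies bytes 16384 through
    17407 of each component, well inside the 32768-byte reserved section.</p>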
<p class="Pp">If the driver determines that the component labels are very
    inconsistent with respect to each other (e.g. two or more serial numbers do
    not match) or that the component label is not consistent with its assigned
    place in the set (e.g. the component label claims the component should be
    the 3rd one in a 6-disk set, but the RAID set has it as the 3rd component in
    a 5-disk set) then the device will fail to configure. If the driver
    determines that exactly one component label seems to be incorrect, and the
    RAID set is being configured as a set that supports a single failure, then
    the RAID set will be allowed to configure, but the incorrectly labeled
    component will be marked as &#x2018;failed&#x2019;, and the RAID set will
    begin operation in degraded mode. If all of the components are consistent
    among themselves, the RAID set will configure normally.</p>
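<p class="Pp">The rules in the preceding paragraph can be read as a small
    decision procedure. The sketch below is one possible interpretation, not
    the driver's implementation: the function name
    <code class="Li">classify</code>, the tuple layout, and the use of the most
    common serial number as the reference are all assumptions made for
    illustration, and the real driver examines more label fields than these.</p>

```python
from collections import Counter

def classify(labels, tolerates_single_failure):
    """Classify a set of component labels (hypothetical model).

    labels: one (serial, column, num_columns) tuple per slot, in slot order.
    Returns (state, bad_slots) where state is 'normal', 'degraded' or 'refused'.
    """
    # Assumption: the serial number seen most often is treated as correct.
    reference_serial = Counter(s for s, _, _ in labels).most_common(1)[0][0]
    bad = [
        slot
        for slot, (serial, column, ncols) in enumerate(labels)
        if serial != reference_serial or column != slot or ncols != len(labels)
    ]
    if not bad:
        return "normal", []          # all labels agree with each other
    if len(bad) == 1 and tolerates_single_failure:
        return "degraded", bad       # mark the odd one out as failed
    return "refused", bad            # too inconsistent to configure

good = [(42, 0, 3), (42, 1, 3), (42, 2, 3)]
one_bad = [(42, 0, 3), (7, 1, 3), (42, 2, 3)]
print(classify(good, True), classify(one_bad, True), classify(one_bad, False))
```

<p class="Pp">In this model a single mismatched label on a RAID 0 set
    (<code class="Li">tolerates_single_failure</code> false) refuses
    configuration, while the same mismatch on a RAID 5 set starts the set in
    degraded mode.</p>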
<p class="Pp">Component labels are also used to support the auto-detection and
    autoconfiguration of RAID sets. A RAID set can be flagged as
    autoconfigurable, in which case it will be configured automatically during
    the kernel boot process. RAID file systems which are automatically
    configured are also eligible to be the root file system. There is currently
    only limited support (alpha, amd64, i386, pmax, sparc, sparc64, and vax
    architectures) for booting a kernel directly from a RAID 1 set, and no
    support for booting from any other RAID sets. To use a RAID set as the root
    file system, a kernel is usually obtained from a small non-RAID partition,
    after which any autoconfiguring RAID set can be used for the root file
    system. See <a class="Xr">raidctl(8)</a> for more information on
    autoconfiguration of RAID sets. Note that with autoconfiguration of RAID
    sets, it is no longer necessary to hard-code SCSI IDs of drives. The
    autoconfiguration code will correctly configure a device even after any
    number of the components have had their device IDs changed or device names
    changed.</p>
<p class="Pp">The driver supports &#x2018;hot spares&#x2019;, disks which are
    on-line, but are not actively used in an existing file system. Should a disk
    fail, the driver is capable of reconstructing the failed disk onto a hot
    spare or back onto a replacement drive. If the components are hot swappable,
    the failed disk can then be removed, a new disk put in its place, and a
    copyback operation performed. The copyback operation, as its name indicates,
    will copy the reconstructed data from the hot spare to the previously failed
    (and now replaced) disk. Hot spares can also be hot-added using
    <a class="Xr">raidctl(8)</a>.</p>
<p class="Pp">If a component cannot be detected when the RAID device is
    configured, that component is simply marked as &#x2018;failed&#x2019;.</p>
<p class="Pp">The user-land utility for doing all <code class="Nm">raid</code>
    configuration and other operations is <a class="Xr">raidctl(8)</a>. Most
    importantly, <a class="Xr">raidctl(8)</a> must be used with the
    <code class="Fl">-i</code> option to initialize all RAID sets. In
    particular, this initialization includes rebuilding the parity data. Parity
    must also be rebuilt either a) when a new RAID device is brought up for the
    first time or b) after an unclean shutdown of a RAID device. By using the
    <code class="Fl">-P</code> option to
    <a class="Xr">raidctl(8)</a>, and performing this on-demand recomputation of
    all parity before doing a <a class="Xr">fsck(8)</a> or a
    <a class="Xr">newfs(8)</a>, file system integrity and parity integrity can
    be ensured. It bears repeating that parity recomputation is
    <var class="Ar">required</var> before any file systems are created or used
    on the RAID device. If the parity is not correct, then missing data cannot
    be correctly recovered.</p>
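<p class="Pp">To see why incorrect parity defeats recovery, consider the
    following simplified model of a single-parity stripe. It simulates an
    unclean shutdown in which a data unit was written but the matching parity
    update was lost, and then a later failure of a different component. The
    helper <code class="Li">xor_blocks</code> and the sample data are
    illustrative only and are not taken from the driver.</p>

```python
def xor_blocks(blocks):
    """Return the byte-wise XOR of a list of equally sized blocks."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)

data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Unclean shutdown: data[0] reached the disk, the parity update did not.
data[0] = b"ZZZZ"
stale_parity = parity
fresh_parity = xor_blocks(data)   # what a parity rewrite would produce

# Later the component holding data[1] fails and must be reconstructed.
rebuilt_stale = xor_blocks([data[0], data[2], stale_parity])
rebuilt_fresh = xor_blocks([data[0], data[2], fresh_parity])
print(rebuilt_stale == data[1], rebuilt_fresh == data[1])
```

<p class="Pp">With stale parity the reconstructed unit is silently wrong, even
    though the failed component itself was never involved in the interrupted
    write.</p>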
<p class="Pp">RAID levels may be combined in a hierarchical fashion. For
    example, a RAID 0 device can be constructed out of a number of RAID 5
    devices (which, in turn, may be constructed out of the physical disks, or of
    other RAID devices).</p>
<p class="Pp">The first step to using the <code class="Nm">raid</code> driver is
    to ensure that it is suitably configured in the kernel. This is done by
    adding a line similar to:</p>
<div class="Bd Pp Bd-indent">
<pre>pseudo-device   raid         # RAIDframe disk device</pre>
</div>
<p class="Pp">to the kernel configuration file. The RAIDframe drivers are
    configured dynamically as needed. To turn on component auto-detection and
    autoconfiguration of RAID sets, simply add:</p>
<div class="Bd Pp Bd-indent">
<pre>options RAID_AUTOCONFIG</pre>
</div>
<p class="Pp">to the kernel configuration file.</p>
<p class="Pp">All component partitions must be of the type
    <code class="Dv">FS_BSDFFS</code> (e.g. 4.2BSD) or
    <code class="Dv">FS_RAID</code>. The use of the latter is strongly
    encouraged, and is required if autoconfiguration of the RAID set is desired.
    Since RAIDframe leaves room for disklabels, RAID components can be simply
    raw disks, or partitions which use an entire disk.</p>
<p class="Pp">A more detailed treatment of actually using a
    <code class="Nm">raid</code> device is found in
    <a class="Xr">raidctl(8)</a>. It is highly recommended that the steps to
    reconstruct, copyback, and re-compute parity are well understood by the
    system administrator(s) <var class="Ar">before</var> a component failure.
    Doing the wrong thing when a component fails may result in data loss.</p>
<p class="Pp">Additional internal consistency checking can be enabled by
    specifying:</p>
<div class="Bd Pp Bd-indent">
<pre>options RAID_DIAGNOSTIC</pre>
</div>
<p class="Pp">These assertions are disabled by default in order to improve
    performance.</p>
<p class="Pp">RAIDframe supports an access tracing facility for tracking both
    requests made and performance of various parts of the RAID systems as the
    request is processed. To enable this tracing the following option may be
    specified:</p>
<div class="Bd Pp Bd-indent">
<pre>options RF_ACC_TRACE=1</pre>
</div>
<p class="Pp">For extensive debugging there are a number of kernel options which
    will aid in performing extra diagnosis of various parts of the RAIDframe
    sub-systems. Note that in order to make full use of these options it is
    often necessary to enable one or more debugging options as listed in
    <span class="Pa">src/sys/dev/raidframe/rf_options.h</span>. These options
    are typically useful only to people who wish to debug various parts of
    RAIDframe. The options include:</p>
<p class="Pp">For debugging the code which maps RAID addresses to physical
    addresses:</p>
<div class="Bd Pp Bd-indent">
<pre>options RF_DEBUG_MAP=1</pre>
</div>
<p class="Pp">Parity stripe status debugging is enabled with:</p>
<div class="Bd Pp Bd-indent">
<pre>options RF_DEBUG_PSS=1</pre>
</div>
<p class="Pp">Additional debugging for queuing is enabled with:</p>
<div class="Bd Pp Bd-indent">
<pre>options RF_DEBUG_QUEUE=1</pre>
</div>
<p class="Pp">Problems with non-quiescent file systems should be easier to debug
    if the following is enabled:</p>
<div class="Bd Pp Bd-indent">
<pre>options RF_DEBUG_QUIESCE=1</pre>
</div>
<p class="Pp">Stripelock debugging is enabled with:</p>
<div class="Bd Pp Bd-indent">
<pre>options RF_DEBUG_STRIPELOCK=1</pre>
</div>
<p class="Pp">Additional diagnostic checks during reconstruction are enabled
    with:</p>
<div class="Bd Pp Bd-indent">
<pre>options RF_DEBUG_RECON=1</pre>
</div>
<p class="Pp">Validation of the DAGs (Directed Acyclic Graphs) used to describe
    an I/O access can be performed when the following is enabled:</p>
<div class="Bd Pp Bd-indent">
<pre>options RF_DEBUG_VALIDATE_DAG=1</pre>
</div>
<p class="Pp">Additional diagnostics during parity verification are enabled
    with:</p>
<div class="Bd Pp Bd-indent">
<pre>options RF_DEBUG_VERIFYPARITY=1</pre>
</div>
<p class="Pp">There are a number of less commonly used RAID levels supported by
    RAIDframe. These additional RAID types should be considered experimental,
    and may not be ready for production use. The various types and the options
    to enable them are shown here:</p>
<p class="Pp">For Even-Odd parity:</p>
<div class="Bd Pp Bd-indent">
<pre>options RF_INCLUDE_EVENODD=1</pre>
</div>
<p class="Pp">For RAID level 5 with rotated sparing:</p>
<div class="Bd Pp Bd-indent">
<pre>options RF_INCLUDE_RAID5_RS=1</pre>
</div>
<p class="Pp">For Parity Logging (highly experimental):</p>
<div class="Bd Pp Bd-indent">
<pre>options RF_INCLUDE_PARITYLOGGING=1</pre>
</div>
<p class="Pp">For Chain Declustering:</p>
<div class="Bd Pp Bd-indent">
<pre>options RF_INCLUDE_CHAINDECLUSTER=1</pre>
</div>
<p class="Pp">For Interleaved Declustering:</p>
<div class="Bd Pp Bd-indent">
<pre>options RF_INCLUDE_INTERDECLUSTER=1</pre>
</div>
<p class="Pp">For Parity Declustering:</p>
<div class="Bd Pp Bd-indent">
<pre>options RF_INCLUDE_PARITY_DECLUSTERING=1</pre>
</div>
<p class="Pp">For Parity Declustering with Distributed Spares:</p>
<div class="Bd Pp Bd-indent">
<pre>options RF_INCLUDE_PARITY_DECLUSTERING_DS=1</pre>
</div>
<p class="Pp">The reader is referred to the RAIDframe documentation mentioned in
    the <a class="Sx" href="#HISTORY">HISTORY</a> section for more detail on
    these various RAID configurations.</p>
</section>
<section class="Sh">
<h1 class="Sh" id="WARNINGS"><a class="permalink" href="#WARNINGS">WARNINGS</a></h1>
<p class="Pp">Certain RAID levels (1, 4, 5, 6, and others) can protect against
    some data loss due to component failure. However, the loss of two components
    of a RAID 4 or 5 system, or the loss of a single component of a RAID 0
    system, will result in all file systems on that RAID device being
    lost. RAID is <var class="Ar">NOT</var> a substitute for good backup
    practices.</p>
<p class="Pp">Recomputation of parity <var class="Ar">MUST</var> be performed
    whenever there is a chance that it may have been compromised. This includes
    after system crashes, or before a RAID device has been used for the first
    time. Failure to keep parity correct will be catastrophic should a component
    ever fail &#x2014; it is better to use RAID 0 and get the additional space
    and speed, than it is to use parity, but not keep the parity correct. At
    least with RAID 0 there is no perception of increased data security.</p>
</section>
<section class="Sh">
<h1 class="Sh" id="FILES"><a class="permalink" href="#FILES">FILES</a></h1>
<dl class="Bl-tag Bl-compact">
  <dt><span class="Pa">/dev/{,r}raid*</span></dt>
  <dd><code class="Nm">raid</code> device special files.</dd>
</dl>
</section>
<section class="Sh">
<h1 class="Sh" id="SEE_ALSO"><a class="permalink" href="#SEE_ALSO">SEE
  ALSO</a></h1>
<p class="Pp"><a class="Xr">config(1)</a>, <a class="Xr">sd(4)</a>,
    <a class="Xr">fsck(8)</a>, <a class="Xr">MAKEDEV(8)</a>,
    <a class="Xr">mount(8)</a>, <a class="Xr">newfs(8)</a>,
    <a class="Xr">raidctl(8)</a></p>
</section>
<section class="Sh">
<h1 class="Sh" id="HISTORY"><a class="permalink" href="#HISTORY">HISTORY</a></h1>
<p class="Pp">The <code class="Nm">raid</code> driver in
    <span class="Ux">NetBSD</span> is a port of RAIDframe, a framework for rapid
    prototyping of RAID structures developed by the folks at the Parallel Data
    Laboratory at Carnegie Mellon University (CMU). RAIDframe, as originally
    distributed by CMU, provides a RAID simulator for a number of different
    architectures, and a user-level device driver and a kernel device driver for
    Digital Unix. The <code class="Nm">raid</code> driver is a kernelized
    version of RAIDframe v1.1.</p>
<p class="Pp">A more complete description of the internals and functionality of
    RAIDframe is found in the paper &quot;RAIDframe: A Rapid Prototyping Tool
    for RAID Systems&quot;, by William V. Courtright II, Garth Gibson, Mark
    Holland, LeAnn Neal Reilly, and Jim Zelenka, and published by the Parallel
    Data Laboratory of Carnegie Mellon University. The
    <code class="Nm">raid</code> driver first appeared in
    <span class="Ux">NetBSD 1.4</span>.</p>
<p class="Pp">RAIDframe was ported to <span class="Ux">NetBSD</span> by Greg
    Oster in 1998, who has maintained it since. In 1999, component labels,
    spares, automatic rebuilding of parity, and autoconfiguration of volumes
    were added. In 2000, root on RAID support was added (initially, with no
    support for loading kernels from RAID volumes, which has been added to many
    ports since). In 2009, support for a parity bitmap was added, reducing
    parity resync time after a crash. In 2010, support for devices larger than
    2TiB and for devices with non-512-byte sectors was added. In 2018, support
    for 32-bit userland compatibility was added. In 2021, support for
    autoconfiguration from other-endian RAID sets was added.</p>
<p class="Pp">Support for loading kernels from RAID 1 partitions was added for
    the pmax, alpha, i386, and vax ports in 2000, the sgimips port in 2001, the
    sparc64 and amd64 ports in 2002, the arc port in 2005, the sparc and
    landisk ports in 2006, the cobalt port in 2007, the ofppc port in 2008, the
    bebox port in 2010, the emips port in 2011, and the sandpoint port in
  2012.</p>
</section>
<section class="Sh">
<h1 class="Sh" id="COPYRIGHT"><a class="permalink" href="#COPYRIGHT">COPYRIGHT</a></h1>
<div class="Bd">
<pre>The RAIDframe Copyright is as follows:

Copyright (c) 1994-1996 Carnegie-Mellon University.
All rights reserved.

Permission to use, copy, modify and distribute this software and
its documentation is hereby granted, provided that both the copyright
notice and this permission notice appear in all copies of the
software, derivative works or modified versions, and any portions
thereof, and that both notices appear in supporting documentation.

CARNEGIE MELLON ALLOWS FREE USE OF THIS SOFTWARE IN ITS &quot;AS IS&quot;
CONDITION.  CARNEGIE MELLON DISCLAIMS ANY LIABILITY OF ANY KIND
FOR ANY DAMAGES WHATSOEVER RESULTING FROM THE USE OF THIS SOFTWARE.

Carnegie Mellon requests users of this software to return to

 Software Distribution Coordinator  or  Software.Distribution@CS.CMU.EDU
 School of Computer Science
 Carnegie Mellon University
 Pittsburgh PA 15213-3890

any improvements or extensions that they make and grant Carnegie the
rights to redistribute these changes.</pre>
</div>
</section>
</div>
<table class="foot">
  <tr>
    <td class="foot-date">May 26, 2021</td>
    <td class="foot-os">NetBSD 10.1</td>
  </tr>
</table>