.\" $NetBSD: raidctl.8,v 1.82 2023/09/25 21:59:38 oster Exp $
.\"
.\" Copyright (c) 1998, 2002 The NetBSD Foundation, Inc.
.\" All rights reserved.
.\"
.\" This code is derived from software contributed to The NetBSD Foundation
.\" by Greg Oster
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in the
.\" documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS
.\" ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
.\" TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
.\" PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS
.\" BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
.\" INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
.\" CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
.\" ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
.\" POSSIBILITY OF SUCH DAMAGE.
.\"
.\"
.\" Copyright (c) 1995 Carnegie-Mellon University.
.\" All rights reserved.
.\"
.\" Author: Mark Holland
.\"
.\" Permission to use, copy, modify and distribute this software and
.\" its documentation is hereby granted, provided that both the copyright
.\" notice and this permission notice appear in all copies of the
.\" software, derivative works or modified versions, and any portions
.\" thereof, and that both notices appear in supporting documentation.
.\"
.\" CARNEGIE MELLON ALLOWS FREE USE OF THIS SOFTWARE IN ITS "AS IS"
.\" CONDITION. CARNEGIE MELLON DISCLAIMS ANY LIABILITY OF ANY KIND
.\" FOR ANY DAMAGES WHATSOEVER RESULTING FROM THE USE OF THIS SOFTWARE.
.\"
.\" Carnegie Mellon requests users of this software to return to
.\"
.\" Software Distribution Coordinator or Software.Distribution@CS.CMU.EDU
.\" School of Computer Science
.\" Carnegie Mellon University
.\" Pittsburgh PA 15213-3890
.\"
.\" any improvements or extensions that they make and grant Carnegie the
.\" rights to redistribute these changes.
.\"
.Dd September 25, 2023
.Dt RAIDCTL 8
.Os
.Sh NAME
.Nm raidctl
.Nd configuration utility for the RAIDframe disk driver
.Sh SYNOPSIS
.Nm
.Ar dev
.Ar command
.Op Ar arg Op ...
.Nm
.Op Fl v
.Fl A Op yes | no | forceroot | softroot
.Ar dev
.Nm
.Op Fl v
.Fl a Ar component Ar dev
.Nm
.Op Fl v
.Fl C Ar config_file Ar dev
.Nm
.Op Fl v
.Fl c Ar config_file Ar dev
.Nm
.Op Fl v
.Fl F Ar component Ar dev
.Nm
.Op Fl v
.Fl f Ar component Ar dev
.Nm
.Op Fl v
.Fl G Ar dev
.Nm
.Op Fl v
.Fl g Ar component Ar dev
.Nm
.Op Fl v
.Fl I Ar serial_number Ar dev
.Nm
.Op Fl v
.Fl i Ar dev
.Nm
.Op Fl v
.Fl L Ar dev
.Nm
.Op Fl v
.Fl M
.Oo yes | no | set
.Ar params
.Oc
.Ar dev
.Nm
.Op Fl v
.Fl m Ar dev
.Nm
.Op Fl v
.Fl P Ar dev
.Nm
.Op Fl v
.Fl p Ar dev
.Nm
.Op Fl v
.Fl R Ar component Ar dev
.Nm
.Op Fl v
.Fl r Ar component Ar dev
.Nm
.Op Fl v
.Fl S Ar dev
.Nm
.Op Fl v
.Fl s Ar dev
.Nm
.Op Fl v
.Fl t Ar config_file
.Nm
.Op Fl v
.Fl U Ar unit Ar dev
.Nm
.Op Fl v
.Fl u Ar dev
.Sh DESCRIPTION
.Nm
is the user-land control program for
.Xr raid 4 ,
the RAIDframe disk device.
.Nm
is primarily used to dynamically configure and unconfigure RAIDframe disk
devices.
For more information about the RAIDframe disk device, see
.Xr raid 4 .
.Pp
This document assumes the reader has at least rudimentary knowledge of
RAID and RAID concepts.
.Pp
The simplified command-line options for
.Nm
are as follows:
.Bl -tag -width indent
.It Ic create Ar level Ar component1 Ar component2 Ar ...
where
.Ar level
specifies the RAID level and is one of
.Ar 0 ,
.Ar 1
(or
.Ar mirror ) ,
or
.Ar 5 ,
and each
.Ar componentN
specifies a device to be configured into the RAID set.
.El
.Pp
The advanced command-line options for
.Nm
are as follows:
.Bl -tag -width indent
.It Fl A Ic yes Ar dev
Make the RAID set auto-configurable.
The RAID set will be automatically configured at boot
.Ar before
the root file system is mounted.
Note that all components of the set must be of type
.Dv RAID
in the disklabel.
.It Fl A Ic no Ar dev
Turn off auto-configuration for the RAID set.
.It Fl A Ic forceroot Ar dev
Make the RAID set auto-configurable, and also mark the set as being
eligible to be the root partition.
A RAID set configured this way will
.Ar override
the use of the boot disk as the root device.
All components of the set must be of type
.Dv RAID
in the disklabel.
Note that only certain architectures
(currently arc, alpha, amd64, bebox, cobalt, emips, evbarm, i386, landisk,
ofppc, pmax, riscv, sandpoint, sgimips, sparc, sparc64, and vax)
support booting a kernel directly from a RAID set.
Please note that
.Ic forceroot
mode was referred to as
.Ic root
mode on earlier versions of
.Nx .
For compatibility reasons,
.Ic root
can be used as an alias for
.Ic forceroot .
.It Fl A Ic softroot Ar dev
Like
.Ic forceroot ,
but only change the root device if the boot device is part of the RAID set.
.It Fl a Ar component Ar dev
Add
.Ar component
as a hot spare for the device
.Ar dev .
Component labels (which identify the location of a given
component within a particular RAID set) are automatically added to the
hot spare after it has been used and are not required for
.Ar component
before it is used.
.It Fl C Ar config_file Ar dev
As for
.Fl c ,
but forces the configuration to take place.
Fatal errors due to uninitialized components are ignored.
This is required the first time a RAID set is configured.
.It Fl c Ar config_file Ar dev
Configure the RAIDframe device
.Ar dev
according to the configuration given in
.Ar config_file .
A description of the contents of
.Ar config_file
is given later.
.It Fl F Ar component Ar dev
Fails the specified
.Ar component
of the device, and immediately begins a reconstruction of the failed
disk onto an available hot spare.
This is one of the mechanisms used to start
the reconstruction process if a component does have a hardware failure.
.It Fl f Ar component Ar dev
This marks the specified
.Ar component
as having failed, but does not initiate a reconstruction of that component.
.It Fl G Ar dev
Generate the configuration of the RAIDframe device in a format suitable for
use with the
.Fl c
or
.Fl C
options.
.It Fl g Ar component Ar dev
Get the component label for the specified component.
.It Fl I Ar serial_number Ar dev
Initialize the component labels on each component of the device.
.Ar serial_number
is used as one of the keys in determining whether a
particular set of components belong to the same RAID set.
While not strictly enforced, different serial numbers should be used for
different RAID sets.
This step
.Em MUST
be performed when a new RAID set is created.
.It Fl i Ar dev
Initialize the RAID device.
In particular, (re-)write the parity on the selected device.
This
.Em MUST
be done for
.Em all
RAID sets before the RAID device is labeled and before
file systems are created on the RAID device.
.It Fl L Ar dev
Rescan all devices on the system, looking for RAID sets that can be
auto-configured.
The RAID device provided here has to be a valid device, but does not
need to be configured.
For example,
.Bd -literal -offset indent
raidctl -L raid0
.Ed
.Pp
is all that is needed to perform a rescan.
.It Fl M Ic yes Ar dev
.\"XXX should there be a section with more info on the parity map feature?
Enable the use of a parity map on the RAID set; this is the default,
and greatly reduces the time taken to check parity after unclean
shutdowns at the cost of some very slight overhead during normal
operation.
Changes to this setting will take effect the next time the set is
configured.
Note that RAID-0 sets, having no parity, will not use a parity map in
any case.
.It Fl M Ic no Ar dev
Disable the use of a parity map on the RAID set; doing this is not
recommended.
This will take effect the next time the set is configured.
.It Fl M Ic set Ar cooldown Ar tickms Ar regions Ar dev
Alter the parameters of the parity map; parameters to leave unchanged
can be given as 0, and trailing zeroes may be omitted.
.\"XXX should this explanation be deferred to another section as well?
The RAID set is divided into
.Ar regions
regions; each region is marked dirty for at most
.Ar cooldown
intervals of
.Ar tickms
milliseconds each after a write to it, and at least
.Ar cooldown
\- 1 such intervals.
Changes to
.Ar regions
take effect the next time the set is configured, while changes to the other
parameters are applied immediately.
The default parameters are expected to be reasonable for most workloads.
.It Fl m Ar dev
Display status information about the parity map on the RAID set, if any.
If used with
.Fl v
then the current contents of the parity map will be output (in
hexadecimal format) as well.
.It Fl P Ar dev
Check the status of the parity on the RAID set, and initialize
(re-write) the parity if the parity is not known to be up-to-date.
This is normally used after a system crash (and before a
.Xr fsck 8 )
to ensure the integrity of the parity.
.It Fl p Ar dev
Check the status of the parity on the RAID set.
Displays a status message,
and returns successfully if the parity is up-to-date.
.It Fl R Ar component Ar dev
Fails the specified
.Ar component ,
if necessary, and immediately begins a reconstruction back to
.Ar component .
This is useful for reconstructing back onto a component after
it has been replaced following a failure.
.It Fl r Ar component Ar dev
Remove the specified
.Ar component
from the RAID set.
The component must be in the failed, spare, or spared state
in order to be removed.
.It Fl S Ar dev
Check the status of parity re-writing and component reconstruction.
The output indicates the amount of progress
achieved in each of these areas.
.It Fl s Ar dev
Display the status of the RAIDframe device for each of the components
and spares.
.It Fl t Ar config_file
Read and parse the
.Ar config_file ,
reporting any errors, then exit.
No RAIDframe operations are performed.
.It Fl U Ar unit Ar dev
Set the
.Dv last_unit
field in all the RAID components, so that the next time the RAID set
is auto-configured it uses that
.Ar unit .
.It Fl u Ar dev
Unconfigure the RAIDframe device.
This does not remove any component labels or change any configuration
settings (e.g. auto-configuration settings) for the RAID set.
.It Fl v
Be more verbose, and provide a progress indicator for operations such
as reconstructions and parity re-writing.
.El
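.Pp
As a worked illustration of the parity map parameters, suppose (purely
as an assumption for this example) a
.Ar cooldown
of 8 and a
.Ar tickms
of 125.
A region would then stay marked dirty for at most 8 * 125 = 1000
milliseconds after a write, and for at least (8 \- 1) * 125 = 875
milliseconds.
A sketch of setting those two values while leaving
.Ar regions
unchanged (the trailing zero is omitted), and then displaying the result:
.Bd -literal -offset indent
# cooldown=8 and tickms=125 are illustrative values only
raidctl -M set 8 125 raid0
raidctl -m raid0
.Ed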
.Pp
The device used by
.Nm
is specified by
.Ar dev .
.Ar dev
may be either the full name of the device, e.g.,
.Pa /dev/rraid0d ,
for the i386 architecture, or
.Pa /dev/rraid0c
for many others, or just simply
.Pa raid0
(for
.Pa /dev/rraid0[cd] ) .
It is recommended that the partitions used to represent the
RAID device are not used for file systems.
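.Pp
For instance, the configuration of an already configured set could be
saved, the set unconfigured, and the saved file later used to configure
it again.
This is only a sketch: it assumes that
.Fl G
writes the configuration to standard output and that nothing on the
set is in use, and the file name
.Pa raid0.conf
is arbitrary.
.Bd -literal -offset indent
raidctl -G raid0 > raid0.conf   # save the current configuration
raidctl -u raid0                # unconfigure the device
raidctl -c raid0.conf raid0     # configure it again from the file
.Ed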
.Ss Simple RAID configuration
For simple RAID configurations using RAID levels 0 (simple striping),
1 (mirroring), or 5 (striping with distributed parity),
.Nm
supports command-line configuration of RAID setups without
the use of a configuration file.
For example,
.Bd -literal -offset indent
raidctl raid0 create 0 /dev/wd0e /dev/wd1e /dev/wd2e
.Ed
.Pp
will create a RAID level 0 set on the device named
.Pa raid0
using the components
.Pa /dev/wd0e ,
.Pa /dev/wd1e ,
and
.Pa /dev/wd2e .
Similarly,
.Bd -literal -offset indent
raidctl raid0 create mirror absent /dev/wd1e
.Ed
.Pp
will create a RAID level 1 (mirror) set with an absent first component
and
.Pa /dev/wd1e
as the second component.
In all cases the resulting RAID device will
be marked as auto-configurable, will have a serial number set (based
on the current time), and parity will be initialized (if the RAID level
has parity and sufficient components are present).
Reasonable performance values are automatically used by default for other
parameters normally specified in the configuration file.
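.Pp
A RAID level 5 set could be created the same way and its state then
inspected with
.Fl s .
The component names below are carried over from the examples above and
are placeholders for whatever devices are actually available:
.Bd -literal -offset indent
raidctl raid0 create 5 /dev/wd0e /dev/wd1e /dev/wd2e
raidctl -s raid0
.Ed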
.Pp
.Ss Configuration file
The format of the configuration file is complex, and
only an abbreviated treatment is given here.
In the configuration files, a
.Sq #
indicates the beginning of a comment.
.Pp
There are 4 required sections of a configuration file, and 2
optional sections.
Each section begins with a
.Sq START ,
followed by the section name,
and the configuration parameters associated with that section.
The first section is the
.Sq array
section, and it specifies
the number of columns, and spare disks in the RAID set.
For example:
.Bd -literal -offset indent
START array
3 0
.Ed
.Pp
indicates an array with 3 columns, and 0 spare disks.
Old configurations specified a 3rd value in front of the
number of columns and spare disks.
This old value, if provided, must be specified as 1:
.Bd -literal -offset indent
START array
1 3 0
.Ed
.Pp
The second section, the
.Sq disks
section, specifies the actual components of the device.
For example:
.Bd -literal -offset indent
START disks
/dev/sd0e
/dev/sd1e
/dev/sd2e
.Ed
.Pp
specifies the three component disks to be used in the RAID device.
Disk wedges may also be specified with the NAME=<wedge name> syntax.
If any of the specified drives cannot be found when the RAID device is
configured, then they will be marked as
.Sq failed ,
and the system will operate in degraded mode.
Note that it is
.Em imperative
that the order of the components in the configuration file does not
change between configurations of a RAID device.
Changing the order of the components will result in data loss
if the set is configured with the
.Fl C
option.
In normal circumstances, the RAID set will not configure if only
.Fl c
is specified, and the components are out-of-order.
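.Pp
As a sketch of the wedge syntax, a
.Sq disks
section that refers to its components by wedge name might look like the
following, where the wedge names
.Sq raidcomp0
through
.Sq raidcomp2
are hypothetical:
.Bd -literal -offset indent
START disks
NAME=raidcomp0
NAME=raidcomp1
NAME=raidcomp2
.Ed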
.Pp
The next section, which is the
.Sq spare
section, is optional, and, if present, specifies the devices to be used as
.Sq hot spares
\(em devices which are on-line,
but are not actively used by the RAID driver unless
one of the main components fails.
A simple
.Sq spare
section might be:
.Bd -literal -offset indent
START spare
/dev/sd3e
.Ed
.Pp
for a configuration with a single spare component.
If no spare drives are to be used in the configuration, then the
.Sq spare
section may be omitted.
.Pp
The next section is the
.Sq layout
section.
This section describes the general layout parameters for the RAID device,
and provides such information as
sectors per stripe unit,
stripe units per parity unit,
stripe units per reconstruction unit,
and the parity configuration to use.
This section might look like:
.Bd -literal -offset indent
START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level
32 1 1 5
.Ed
.Pp
The sectors per stripe unit specifies, in blocks, the interleave
factor; i.e., the number of contiguous sectors to be written to each
component for a single stripe.
Appropriate selection of this value (32 in this example)
is the subject of much research in RAID architectures.
The stripe units per parity unit and
stripe units per reconstruction unit are normally each set to 1.
While certain values above 1 are permitted, a discussion of valid
values and the consequences of using anything other than 1 is outside
the scope of this document.
The last value in this section (5 in this example)
indicates the parity configuration desired.
Valid entries include:
.Bl -tag -width inde
.It 0
RAID level 0.
No parity, only simple striping.
.It 1
RAID level 1.
Mirroring.
The parity is the mirror.
.It 4
RAID level 4.
Striping across components, with parity stored on the last component.
.It 5
RAID level 5.
Striping across components, parity distributed across all components.
.El
.Pp
There are other valid entries here, including those for Even-Odd
parity, RAID level 5 with rotated sparing, Chained declustering,
and Interleaved declustering, but as of this writing the code for
those parity operations has not been tested with
.Nx .
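.Pp
To put the layout example above in concrete terms, and assuming
512-byte sectors (an assumption, since the sector size depends on the
underlying devices), a stripe unit of 32 sectors holds
.Bd -literal -offset indent
32 sectors * 512 bytes = 16384 bytes (16 KiB)
.Ed
.Pp
on each component.
With a 3-column RAID level 5 array, each stripe
consists of two data stripe units and one parity stripe unit, so a full
stripe carries 2 * 16 KiB = 32 KiB of data.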
.Pp
The next required section is the
.Sq queue
section.
This is most often specified as:
.Bd -literal -offset indent
START queue
fifo 100
.Ed
.Pp
where the queuing method is specified as fifo (first-in, first-out),
and the size of the per-component queue is limited to 100 requests.
Other queuing methods may also be specified, but a discussion of them
is beyond the scope of this document.
.Pp
The final section, the
.Sq debug
section, is optional.
For more details on this the reader is referred to
the RAIDframe documentation discussed in the
.Sx HISTORY
section.
.Pp
Since
.Nx 10 ,
RAIDframe has been capable of auto-configuring components
originally configured on opposite-endian systems.
The current label endianness will be retained.
.Pp
See
.Sx EXAMPLES
for a more complete configuration file example.
.Sh FILES
.Bl -tag -width /dev/XXrXraidX -compact
.It Pa /dev/{,r}raid*
.Cm raid
device special files.
.El
.Sh EXAMPLES
The examples given in this section are for more complex
setups than can be configured with the simplified command-line
configuration option described earlier.
.Pp
It is highly recommended that, before using the RAID driver for real
file systems, the system administrator(s) become quite familiar
with the use of
.Nm ,
and that they understand how the component reconstruction process works.
The examples in this section will focus on configuring a
number of different RAID sets of varying degrees of redundancy.
By working through these examples, administrators should be able to
develop a good feel for how to configure a RAID set, and how to
initiate reconstruction of failed components.
.Pp
In the following examples
.Sq raid0
will be used to denote the RAID device.
Depending on the architecture,
.Pa /dev/rraid0c
or
.Pa /dev/rraid0d
may be used in place of
.Pa raid0 .
.Ss Initialization and Configuration
The initial step in configuring a RAID set is to identify the components
that will be used in the RAID set.
All components should be the same size.
Each component should have a disklabel type of
.Dv FS_RAID ,
and a typical disklabel entry for a RAID component might look like:
.Bd -literal -offset indent
f: 1800000 200495 RAID # (Cyl. 405*- 4041*)
.Ed
.Pp
While
.Dv FS_BSDFFS
will also work as the component type, the type
.Dv FS_RAID
is preferred for RAIDframe use, as it is required for features such as
auto-configuration.
As part of the initial configuration of each RAID set,
each component will be given a
.Sq component label .
A
.Sq component label
contains important information about the component, including a
user-specified serial number, the column of that component in
the RAID set, the redundancy level of the RAID set, a
.Sq modification counter ,
and whether the parity information (if any) on that
component is known to be correct.
Component labels are an integral part of the RAID set,
since they are used to ensure that components
are configured in the correct order, and used to keep track of other
vital information about the RAID set.
Component labels are also required for the auto-detection
and auto-configuration of RAID sets at boot time.
For a component label to be considered valid, that
particular component label must be in agreement with the other
component labels in the set.
For example, the serial number,
.Sq modification counter ,
and number of columns must all be in agreement.
If any of these are different, then the component is
not considered to be part of the set.
See
.Xr raid 4
for more information about component labels.
.Pp
Once the components have been identified, and the disks have
appropriate labels,
.Nm
is then used to configure the
.Xr raid 4
device.
To configure the device, a configuration file which looks something like:
.Bd -literal -offset indent
START array
# numCol numSpare
3 1
START disks
/dev/sd1e
/dev/sd2e
/dev/sd3e
START spare
/dev/sd4e
START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_5
32 1 1 5
START queue
fifo 100
.Ed
.Pp
is created in a file.
The above configuration file specifies a RAID 5
set consisting of the components
.Pa /dev/sd1e ,
.Pa /dev/sd2e ,
and
.Pa /dev/sd3e ,
with
.Pa /dev/sd4e
available as a
.Sq hot spare
in case one of the three main drives should fail.
A RAID 0 set would be specified in a similar way:
.Bd -literal -offset indent
START array
# numCol numSpare
4 0
START disks
/dev/sd10e
/dev/sd11e
/dev/sd12e
/dev/sd13e
START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_0
64 1 1 0
START queue
fifo 100
.Ed
.Pp
In this case, devices
.Pa /dev/sd10e ,
.Pa /dev/sd11e ,
.Pa /dev/sd12e ,
and
.Pa /dev/sd13e
are the components that make up this RAID set.
Note that there are no hot spares for a RAID 0 set,
since there is no way to recover data if any of the components fail.
.Pp
For a RAID 1 (mirror) set, the following configuration might be used:
.Bd -literal -offset indent
START array
# numCol numSpare
2 0
START disks
/dev/sd20e
/dev/sd21e
START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_1
128 1 1 1
START queue
fifo 100
.Ed
.Pp
In this case,
.Pa /dev/sd20e
and
.Pa /dev/sd21e
are the two components of the mirror set.
While no hot spares have been specified in this
configuration, they easily could be, just as they were specified in
the RAID 5 case above.
Note as well that RAID 1 sets are currently limited to only 2 components.
At present, n-way mirroring is not possible.
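.Pp
Before any of these files is used to configure a device, it can be
checked for syntax errors with
.Fl t ,
which parses the file without performing any RAIDframe operations.
Assuming the RAID 5 configuration above was saved as
.Pa raid0.conf :
.Bd -literal -offset indent
raidctl -t raid0.conf
.Ed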
.Pp
The first time a RAID set is configured, the
.Fl C
option must be used:
.Bd -literal -offset indent
raidctl -C raid0.conf raid0
.Ed
.Pp
where
.Pa raid0.conf
is the name of the RAID configuration file.
The
.Fl C
forces the configuration to succeed, even if any of the component
labels are incorrect.
The
.Fl C
option should not be used lightly in
situations other than initial configurations: if
the system is refusing to configure a RAID set, there is probably a
very good reason for it.
After the initial configuration is done (and
appropriate component labels are added with the
.Fl I
option) then raid0 can be configured normally with:
.Bd -literal -offset indent
raidctl -c raid0.conf raid0
.Ed
.Pp
When the RAID set is configured for the first time, it is
necessary to initialize the component labels, and to initialize the
parity on the RAID set.
Initializing the component labels is done with:
.Bd -literal -offset indent
raidctl -I 112341 raid0
.Ed
.Pp
where
.Sq 112341
is a user-specified serial number for the RAID set.
This initialization step is
.Em required
for all RAID sets.
In addition, using different serial numbers between RAID sets is
.Em strongly encouraged ,
as using the same serial number for all RAID sets will only serve to
decrease the usefulness of the component label checking.
.Pp
Initializing the RAID set is done via the
.Fl i
option.
This initialization
.Em MUST
be done for
.Em all
RAID sets, since among other things it verifies that the parity (if
any) on the RAID set is correct.
Since this initialization may be quite time-consuming, the
.Fl v
option may also be used in conjunction with
.Fl i :
.Bd -literal -offset indent
raidctl -iv raid0
.Ed
.Pp
This will give more verbose output on the
status of the initialization:
.Bd -literal -offset indent
Initiating re-write of parity
Parity Re-write status:
10% |**** | ETA: 06:03 /
.Ed
.Pp
The output provides a
.Sq Percent Complete
in both a numeric and graphical format, as well as an estimated time
to completion of the operation.
.Pp
Since it is the parity that provides the
.Sq redundancy
part of RAID, it is critical that the parity be kept correct whenever possible.
If the parity is not correct, then there is no
guarantee that data will not be lost if a component fails.
.Pp
Once the parity is known to be correct, it is then safe to perform
.Xr disklabel 8 ,
.Xr newfs 8 ,
or
.Xr fsck 8
on the device or its file systems, and then to mount the file systems
for use.
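.Pp
Taken together, the first-time setup described above might be sketched
as the following sequence.
The partition letter
.Sq e
and the mount point
.Pa /mnt
are assumptions made for illustration, and the
.Xr disklabel 8
step is indicated only by a comment because its details depend on
the system:
.Bd -literal -offset indent
raidctl -C raid0.conf raid0     # force the initial configuration
raidctl -I 112341 raid0         # initialize the component labels
raidctl -iv raid0               # initialize the parity
# ... label raid0 with disklabel(8) here ...
newfs /dev/rraid0e              # create a file system
mount /dev/raid0e /mnt          # and mount it
.Ed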
.Pp
Under certain circumstances (e.g., the additional component has not
arrived, or data is being migrated off of a disk destined to become a
component) it may be desirable to configure a RAID 1 set with only
a single component.
This can be achieved by using the word
.Dq absent
to indicate that a particular component is not present.
In the following:
.Bd -literal -offset indent
START array
# numCol numSpare
2 0
START disks
absent
/dev/sd0e
START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_1
128 1 1 1
START queue
fifo 100
.Ed
.Pp
.Pa /dev/sd0e
is the real component, and will be the second disk of a RAID 1 set.
The first component is simply marked as being absent.
Configuration (using
.Fl C
and
.Fl I Ar 12345
as above) proceeds normally, but initialization of the RAID set will
have to wait until all physical components are present.
After configuration, this set can be used normally, but will be operating
in degraded mode.
Once a second physical component is obtained, it can be hot-added,
the existing data mirrored, and normal operation resumed.
.Pp
The size of the resulting RAID set will depend on the number of data
components in the set.
Space is automatically reserved for the component labels, and
the actual amount of space used
for data on a component will be rounded down to the largest possible
multiple of the sectors per stripe unit (sectPerSU) value.
Thus, the amount of space provided by the RAID set will be less
than the sum of the size of the components.
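.Pp
The rounding described above can be sketched as a small
.Xr sh 1
function.
This is only an illustration: the function name and its arguments are
hypothetical, and the number of sectors reserved on each component is
passed in as a parameter rather than taken from the driver.
.Bd -literal -offset indent
# usable_sectors SIZE SECTPERSU RESERVED
# Print the number of sectors of a component that remain usable
# for data: the space left after the reserved area, rounded down
# to a multiple of sectPerSU.
usable_sectors() {
    echo $(( ($1 - $3) / $2 * $2 ))
}
.Ed
.Pp
For instance, for an assumed component of 1800000 sectors with 64
sectors reserved and a sectPerSU of 32:
.Bd -literal -offset indent
usable_sectors 1800000 32 64
.Ed
.Pp
prints 1799936.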
.Ss Maintenance of the RAID set
After the parity has been initialized for the first time, the command:
.Bd -literal -offset indent
raidctl -p raid0
.Ed
.Pp
can be used to check the current status of the parity.
To check the parity and rebuild it if necessary (for example,
after an unclean shutdown) the command:
.Bd -literal -offset indent
raidctl -P raid0
.Ed
.Pp
is used.
Note that re-writing the parity can be done while
other operations on the RAID set are taking place (e.g., while doing a
.Xr fsck 8
on a file system on the RAID set).
However, for maximum effectiveness of the RAID set, the parity should be
known to be correct before any data on the set is modified.
.Pp
To see how the RAID set is doing, the following command can be used to
show the RAID set's status:
.Bd -literal -offset indent
raidctl -s raid0
.Ed
.Pp
The output will look something like:
.Bd -literal -offset indent
Components:
/dev/sd1e: optimal
/dev/sd2e: optimal
/dev/sd3e: optimal
Spares:
/dev/sd4e: spare
Component label for /dev/sd1e:
Row: 0 Column: 0 Num Rows: 1 Num Columns: 3
Version: 2 Serial Number: 13432 Mod Counter: 65
Clean: No Status: 0
sectPerSU: 32 SUsPerPU: 1 SUsPerRU: 1
RAID Level: 5 blocksize: 512 numBlocks: 1799936
Autoconfig: No
Last configured as: raid0
Component label for /dev/sd2e:
Row: 0 Column: 1 Num Rows: 1 Num Columns: 3
Version: 2 Serial Number: 13432 Mod Counter: 65
Clean: No Status: 0
sectPerSU: 32 SUsPerPU: 1 SUsPerRU: 1
RAID Level: 5 blocksize: 512 numBlocks: 1799936
Autoconfig: No
Last configured as: raid0
Component label for /dev/sd3e:
Row: 0 Column: 2 Num Rows: 1 Num Columns: 3
Version: 2 Serial Number: 13432 Mod Counter: 65
Clean: No Status: 0
sectPerSU: 32 SUsPerPU: 1 SUsPerRU: 1
RAID Level: 5 blocksize: 512 numBlocks: 1799936
Autoconfig: No
Last configured as: raid0
Parity status: clean
Reconstruction is 100% complete.
Parity Re-write is 100% complete.
.Ed
.Pp
This indicates that all is well with the RAID set.
Of importance here are the component lines which read
.Sq optimal ,
and the
.Sq Parity status
line.
.Sq Parity status: clean
indicates that the parity is up-to-date for this RAID set,
whether or not the RAID set is in redundant or degraded mode.
.Sq Parity status: DIRTY
indicates that it is not known if the parity information is
consistent with the data, and that the parity information needs
to be checked.
Note that if there are file systems open on the RAID set,
the individual components will not be
.Sq clean
but the set as a whole can still be clean.
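.Pp
For scripted monitoring, the two items of interest can be picked out of
the status output with
.Xr awk 1 .
The following is a minimal sketch that assumes the output layout shown
above; the function names are hypothetical and are not part of
.Nm .
.Bd -literal -offset indent
# failed_components: read "raidctl -s" output on standard input
# and print the name of each component reported as failed.
failed_components() {
    awk '$2 == "failed" { sub(/:$/, "", $1); print $1 }'
}

# parity_status: read "raidctl -s" output on standard input and
# print the word that follows "Parity status:".
parity_status() {
    awk '$1 == "Parity" && $2 == "status:" { print $3 }'
}
.Ed
.Pp
These could be used as:
.Bd -literal -offset indent
raidctl -s raid0 | failed_components
raidctl -s raid0 | parity_status
.Ed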
.Pp
To check the component label of
.Pa /dev/sd1e ,
the following is used:
.Bd -literal -offset indent
raidctl -g /dev/sd1e raid0
.Ed
.Pp
The output of this command will look something like:
.Bd -literal -offset indent
Component label for /dev/sd1e:
Row: 0 Column: 0 Num Rows: 1 Num Columns: 3
Version: 2 Serial Number: 13432 Mod Counter: 65
Clean: No Status: 0
sectPerSU: 32 SUsPerPU: 1 SUsPerRU: 1
RAID Level: 5 blocksize: 512 numBlocks: 1799936
Autoconfig: No
Last configured as: raid0
.Ed
.Ss Dealing with Component Failures
If for some reason
(perhaps to test reconstruction) it is necessary to pretend a drive
has failed, the following will perform that function:
.Bd -literal -offset indent
raidctl -f /dev/sd2e raid0
.Ed
.Pp
The system will then be performing all operations in degraded mode,
where missing data is re-computed from existing data and the parity.
In this case, obtaining the status of raid0 will return (in part):
.Bd -literal -offset indent
Components:
/dev/sd1e: optimal
/dev/sd2e: failed
/dev/sd3e: optimal
Spares:
/dev/sd4e: spare
.Ed
.Pp
Note that with the use of
.Fl f
a reconstruction has not been started.
To both fail the disk and start a reconstruction, the
.Fl F
option must be used:
.Bd -literal -offset indent
raidctl -F /dev/sd2e raid0
.Ed
.Pp
The
.Fl f
option may be used first, and then the
.Fl F
option used later, on the same disk, if desired.
Immediately after the reconstruction is started, the status will report:
.Bd -literal -offset indent
Components:
/dev/sd1e: optimal
/dev/sd2e: reconstructing
/dev/sd3e: optimal
Spares:
/dev/sd4e: used_spare
[...]
Parity status: clean
Reconstruction is 10% complete.
Parity Re-write is 100% complete.
.Ed
.Pp
This indicates that a reconstruction is in progress.
To find out how the reconstruction is progressing the
.Fl S
option may be used.
This will indicate the progress in terms of the
percentage of the reconstruction that is completed.
When the reconstruction is finished the
.Fl s
option will show:
.Bd -literal -offset indent
Components:
/dev/sd1e: optimal
/dev/sd4e: optimal
/dev/sd3e: optimal
No spares.
[...]
Parity status: clean
Reconstruction is 100% complete.
Parity Re-write is 100% complete.
.Ed
.Pp
as
.Pa /dev/sd2e
has been removed and replaced with
.Pa /dev/sd4e .
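.Pp
A script that needs to wait for a reconstruction to finish could poll
the status output.
The sketch below assumes the
.Sq Reconstruction is N% complete.
line shown above; the function names and the 60 second polling
interval are arbitrary choices made for illustration.
.Bd -literal -offset indent
# recon_percent: read "raidctl -s" output on standard input and
# print the reconstruction percentage as a bare number.
recon_percent() {
    awk '$1 == "Reconstruction" { sub(/%$/, "", $3); print $3 }'
}

# wait_for_recon DEVICE: sleep until DEVICE reports that its
# reconstruction is 100% complete.  The loop also stops if no
# percentage can be read from the status output.
wait_for_recon() {
    while [ "$(raidctl -s "$1" | recon_percent)" -lt 100 ]; do
        sleep 60
    done
}
.Ed
.Pp
With these definitions,
.Ql wait_for_recon raid0
returns once the status of raid0 shows the reconstruction as finished.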
.Pp
If a component fails and there are no hot spares
available on-line, the status of the RAID set might (in part) look like:
.Bd -literal -offset indent
Components:
/dev/sd1e: optimal
/dev/sd2e: failed
/dev/sd3e: optimal
No spares.
.Ed
.Pp
In this case there are a number of options.
The first option is to add a hot spare using:
.Bd -literal -offset indent
raidctl -a /dev/sd4e raid0
.Ed
.Pp
After the hot add, the status would then be:
.Bd -literal -offset indent
Components:
/dev/sd1e: optimal
/dev/sd2e: failed
/dev/sd3e: optimal
Spares:
/dev/sd4e: spare
.Ed
.Pp
Reconstruction could then take place using
.Fl F
as described above.
.Pp
A second option is to rebuild directly onto
.Pa /dev/sd2e .
Once the disk containing
.Pa /dev/sd2e
has been replaced, one can simply use:
.Bd -literal -offset indent
raidctl -R /dev/sd2e raid0
.Ed
.Pp
to rebuild the
.Pa /dev/sd2e
component.
As the rebuilding is in progress, the status will be:
.Bd -literal -offset indent
Components:
/dev/sd1e: optimal
/dev/sd2e: reconstructing
/dev/sd3e: optimal
No spares.
.Ed
.Pp
and when completed, will be:
.Bd -literal -offset indent
Components:
/dev/sd1e: optimal
/dev/sd2e: optimal
/dev/sd3e: optimal
No spares.
.Ed
.Pp
In circumstances where a particular component is completely
unavailable after a reboot, a special component name will be used to
indicate the missing component.
For example:
.Bd -literal -offset indent
Components:
/dev/sd2e: optimal
component1: failed
No spares.
.Ed
.Pp
indicates that the second component of this RAID set was not detected
at all by the auto-configuration code.
The name
.Sq component1
can be used anywhere a normal component name would be used.
For example, to add a hot spare to the above set, and rebuild to that hot
spare, the following could be done:
.Bd -literal -offset indent
raidctl -a /dev/sd3e raid0
raidctl -F component1 raid0
.Ed
.Pp
at which point the data missing from
.Sq component1
would be reconstructed onto
.Pa /dev/sd3e .
.Pp
When more than one component is marked as
.Sq failed
due to a non-component hardware failure (e.g., loss of power to two
components, adapter problems, termination problems, or cabling issues) it
is quite possible to recover the data on the RAID set.
The first thing to be aware of is that the first disk to fail will
almost certainly be out-of-sync with the remainder of the array.
If any IO was performed between the time the first component is considered
.Sq failed
and when the second component is considered
.Sq failed ,
then the first component to fail will
.Em not
contain correct data, and should be ignored.
When the second component is marked as failed, however, the RAID device will
(currently) panic the system.
At this point the data on the RAID set
(not including the first failed component) is still self-consistent,
and will be in no worse state of repair than had the power gone out in
the middle of a write to a file system on a non-RAID device.
The problem, however, is that the component labels may now have 3 different
.Sq modification counters
(one value on the first component that failed, one value on the second
component that failed, and a third value on the remaining components).
In such a situation, the RAID set will not autoconfigure,
and can only be forcibly re-configured
with the
.Fl C
option.
To recover the RAID set, one must first remedy whatever physical
problem caused the multiple-component failure.
After that is done, the RAID set can be restored by forcibly
configuring the RAID set
.Em without
the component that failed first.
For example, if
.Pa /dev/sd1e
and
.Pa /dev/sd2e
fail (in that order) in a RAID set of the following configuration:
.Bd -literal -offset indent
START array
4 0
START disks
/dev/sd1e
/dev/sd2e
/dev/sd3e
/dev/sd4e
START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_5
64 1 1 5
START queue
fifo 100
.Ed
.Pp
then the following configuration (say,
.Pa recover_raid0.conf )
.Bd -literal -offset indent
START array
4 0
START disks
absent
/dev/sd2e
/dev/sd3e
/dev/sd4e
START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_5
64 1 1 5
START queue
fifo 100
.Ed
.Pp
can be used with
.Bd -literal -offset indent
raidctl -C recover_raid0.conf raid0
.Ed
.Pp
to force the configuration of raid0.
A
.Bd -literal -offset indent
raidctl -I 12345 raid0
.Ed
.Pp
will be required in order to synchronize the component labels.
At this point the file systems on the RAID set can then be checked and
corrected.
To complete the reconstruction of the RAID set,
.Pa /dev/sd1e
is simply hot-added back into the array, and reconstructed
as described earlier.
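.Pp
The recovery configuration differs from the original only in the line
for the component that failed first, so it can be derived mechanically.
The following sketch assumes that the original configuration is kept in
a file (here called
.Pa raid0.conf ,
a hypothetical name) and that each component appears on a line of its
own; the function name is likewise hypothetical.
.Bd -literal -offset indent
# absent_component NAME: copy a configuration file from standard
# input to standard output, replacing the component line NAME
# with the word "absent".
absent_component() {
    awk -v name="$1" '$1 == name { print "absent"; next } { print }'
}
.Ed
.Pp
For the example above this would be used as:
.Bd -literal -offset indent
absent_component /dev/sd1e < raid0.conf > recover_raid0.conf
.Ed
.Pp
The result should be inspected by hand before it is passed to
.Fl C .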
.Ss RAID on RAID
RAID sets can be layered to create more complex and much larger RAID sets.
A RAID 0 set, for example, could be constructed from four RAID 5 sets.
The following configuration file shows such a setup:
.Bd -literal -offset indent
START array
# numCol numSpare
4 0
START disks
/dev/raid1e
/dev/raid2e
/dev/raid3e
/dev/raid4e
START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_0
128 1 1 0
START queue
fifo 100
.Ed
.Pp
A similar configuration file might be used for a RAID 0 set
constructed from components on RAID 1 sets.
In such a configuration, the mirroring provides a high degree
of redundancy, while the striping provides additional speed benefits.
.Ss Auto-configuration and Root on RAID
RAID sets can also be auto-configured at boot.
To make a set auto-configurable,
simply prepare the RAID set as above, and then do a:
.Bd -literal -offset indent
raidctl -A yes raid0
.Ed
.Pp
to turn on auto-configuration for that set.
To turn off auto-configuration, use:
.Bd -literal -offset indent
raidctl -A no raid0
.Ed
.Pp
RAID sets which are auto-configurable will be configured before the
root file system is mounted.
These RAID sets are thus available for
use as a root file system, or for any other file system.
A primary advantage of using the auto-configuration is that RAID components
become more independent of the disks they reside on.
For example, SCSI IDs can change, but auto-configured sets will always be
configured correctly, even if the SCSI IDs of the component disks
have become scrambled.
.Pp
Having a system's root file system
.Pq Pa /
on a RAID set is also allowed, with the
.Sq a
partition of such a RAID set being used for
.Pa / .
To use raid0a as the root file system, simply use:
.Bd -literal -offset indent
raidctl -A forceroot raid0
.Ed
.Pp
To return raid0a to being just an auto-configuring set, simply use the
.Fl A Ar yes
arguments.
.Pp
Note that kernels can only be directly read from RAID 1 components on
architectures that support that
(currently alpha, i386, pmax, sandpoint, sparc, sparc64, and vax).
On those architectures, the
.Dv FS_RAID
file system is recognized by the bootblocks, and will properly load the
kernel directly from a RAID 1 component.
For other architectures, or to support the root file system
on other RAID sets, some other mechanism must be used to get a kernel booting.
For example, a small partition containing only the secondary boot-blocks
and an alternate kernel (or two) could be used.
Once a kernel is booting however, and an auto-configuring RAID set is
found that is eligible to be root, then that RAID set will be
auto-configured and used as the root device.
If two or more RAID sets claim to be root devices, then the
user will be prompted to select the root device.
At this time, RAID 0, 1, 4, and 5 sets are all supported as root devices.
.Pp
A typical RAID 1 setup with root on RAID might be as follows:
.Bl -enum
.It
wd0a - a small partition, which contains a complete, bootable, basic
.Nx
installation.
.It
wd1a - also contains a complete, bootable, basic
.Nx
installation.
.It
wd0e and wd1e - a RAID 1 set, raid0, used for the root file system.
.It
wd0f and wd1f - a RAID 1 set, raid1, which will be used only for
swap space.
.It
wd0g and wd1g - a RAID 1 set, raid2, used for
.Pa /usr ,
.Pa /home ,
or other data, if desired.
.It
wd0h and wd1h - a RAID 1 set, raid3, if desired.
.El
.Pp
RAID sets raid0, raid1, and raid2 are all marked as auto-configurable.
raid0 is marked as being a root file system.
When new kernels are installed, the kernel is not only copied to
.Pa / ,
but also to wd0a and wd1a.
The kernel on wd0a is required, since that
is the kernel the system boots from.
The kernel on wd1a is also
required, since that will be the kernel used should wd0 fail.
The important point here is to have redundant copies of the kernel
available, in the event that one of the drives fails.
.Pp
There is no requirement that the root file system be on the same disk
as the kernel.
For example, obtaining the kernel from wd0a, and using
sd0e and sd1e for raid0, and the root file system, is fine.
It
.Em is
critical, however, that there be multiple kernels available, in the
event of media failure.
.Pp
Multi-layered RAID devices (such as a RAID 0 set made
up of RAID 1 sets) are
.Em not
supported as root devices or auto-configurable devices at this point.
(Multi-layered RAID devices
.Em are
supported in general, however, as mentioned earlier.)
Note that in order to enable component auto-detection and
auto-configuration of RAID devices, the line:
.Bd -literal -offset indent
options RAID_AUTOCONFIG
.Ed
.Pp
must be in the kernel configuration file.
See
.Xr raid 4
for more details.
.Ss Swapping on RAID
A RAID device can be used as a swap device.
In order to ensure that a RAID device used as a swap device
is correctly unconfigured when the system is shut down or rebooted,
it is recommended that the line
.Bd -literal -offset indent
swapoff=YES
.Ed
.Pp
be added to
.Pa /etc/rc.conf .
.Ss Unconfiguration
The final operation performed by
.Nm
is to unconfigure a
.Xr raid 4
device.
This is accomplished via a simple:
.Bd -literal -offset indent
raidctl -u raid0
.Ed
.Pp
at which point the device is ready to be reconfigured.
.Ss Performance Tuning
Selection of the various parameter values which result in the best
performance can be quite tricky, and often requires a bit of
trial-and-error to get those values most appropriate for a given system.
A whole range of factors come into play, including:
.Bl -enum
.It
Types of components (e.g., SCSI vs. IDE) and their bandwidth
.It
Types of controller cards and their bandwidth
.It
Distribution of components among controllers
.It
IO bandwidth
.It
File system access patterns
.It
CPU speed
.El
.Pp
As with most performance tuning, benchmarking under real-life loads
may be the only way to measure expected performance.
Understanding some of the underlying technology is also useful in tuning.
The goal of this section is to provide pointers to those parameters which may
make significant differences in performance.
.Pp
For a RAID 1 set, a sectPerSU value of 64 or 128 is typically sufficient.
Since data in a RAID 1 set is arranged in a linear
fashion on each component, selecting an appropriate stripe size is
somewhat less critical than it is for a RAID 5 set.
However, a stripe size that is too small will cause large IOs to be
broken up into a number of smaller ones, hurting performance.
At the same time, a large stripe size may cause problems with
concurrent accesses to stripes, which may also affect performance.
Thus values in the range of 32 to 128 are often the most effective.
.Pp
Tuning RAID 5 sets is trickier.
In the best case, IO is presented to the RAID set one stripe at a time.
Since the entire stripe is available at the beginning of the IO,
the parity of that stripe can be calculated before the stripe is written,
and then the stripe data and parity can be written in parallel.
When the amount of data being written is less than a full stripe worth, the
.Sq small write
problem occurs.
Since a
.Sq small write
means only a portion of the stripe on the components is going to
change, the data (and parity) on the components must be updated
slightly differently.
First, the
.Sq old parity
and
.Sq old data
must be read from the components.
Then the new parity is constructed,
using the new data to be written, and the old data and old parity.
Finally, the new data and new parity are written.
All this extra data shuffling results in a serious loss of performance,
and is typically 2 to 4 times slower than a full stripe write (or read).
To combat this problem in the real world, it may be useful
to ensure that stripe sizes are small enough that a
.Sq large IO
from the system will use exactly one large stripe write.
As is seen later, there are some file system dependencies
which may come into play here as well.
.Pp
Since the size of a
.Sq large IO
is often (currently) only 32K or 64K, on a 5-drive RAID 5 set it may
be desirable to select a sectPerSU value of 16 blocks (8K) or 32
blocks (16K).
Since there are 4 data stripe units per stripe, the maximum
data per stripe is 64 blocks (32K) or 128 blocks (64K).
Again, empirical measurement will provide the best indicators of which
values will yield better performance.
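.Pp
The arithmetic behind these figures can be checked with a short
.Xr sh 1
function.
This is a sketch only: the function name is hypothetical, and it
assumes 512-byte sectors and a RAID 5 layout in which one stripe unit
of every stripe holds parity.
.Bd -literal -offset indent
# stripe_data_kb NCOMPONENTS SECTPERSU
# Print the amount of data carried by one full stripe of a
# RAID 5 set, in kilobytes.  One stripe unit per stripe holds
# parity, so NCOMPONENTS - 1 stripe units hold data.
stripe_data_kb() {
    echo $(( ($1 - 1) * $2 * 512 / 1024 ))
}
.Ed
.Pp
For a 5-drive set,
.Ql stripe_data_kb 5 16
prints 32 and
.Ql stripe_data_kb 5 32
prints 64, matching the 32K and 64K values given above.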
.Pp
The parameters used for the file system are also critical to good performance.
For
.Xr newfs 8 ,
for example, increasing the block size to 32K or 64K may improve
performance dramatically.
In addition, changing the cylinders-per-group
parameter from 16 to 32 or higher is often not only necessary for
larger file systems, but may also have positive performance implications.
.Ss Summary
Despite the length of this manual page, configuring a RAID set is a
relatively straightforward process.
Only the following steps are needed:
.Bl -enum
.It
Use
.Xr disklabel 8
to create the components (of type RAID).
.It
Construct a RAID configuration file: e.g.,
.Pa raid0.conf
.It
Configure the RAID set with:
.Bd -literal -offset indent
raidctl -C raid0.conf raid0
.Ed
.It
Initialize the component labels with:
.Bd -literal -offset indent
raidctl -I 123456 raid0
.Ed
.It
Initialize other important parts of the set with:
.Bd -literal -offset indent
raidctl -i raid0
.Ed
.It
Get the default label for the RAID set:
.Bd -literal -offset indent
disklabel raid0 > /tmp/label
.Ed
.It
Edit the label:
.Bd -literal -offset indent
vi /tmp/label
.Ed
.It
Put the new label on the RAID set:
.Bd -literal -offset indent
disklabel -R -r raid0 /tmp/label
.Ed
.It
Create the file system:
.Bd -literal -offset indent
newfs /dev/rraid0e
.Ed
.It
Mount the file system:
.Bd -literal -offset indent
mount /dev/raid0e /mnt
.Ed
.It
Use:
.Bd -literal -offset indent
raidctl -c raid0.conf raid0
.Ed
.Pp
to re-configure the RAID set the next time it is needed, or put
.Pa raid0.conf
into
.Pa /etc
where it will automatically be started by the
.Pa /etc/rc.d
scripts.
.El
.Sh SEE ALSO
.Xr ccd 4 ,
.Xr raid 4 ,
.Xr rc 8
.Sh HISTORY
RAIDframe is a framework for rapid prototyping of RAID structures
developed by the folks at the Parallel Data Laboratory at Carnegie
Mellon University (CMU).
A more complete description of the internals and functionality of
RAIDframe is found in the paper "RAIDframe: A Rapid Prototyping Tool
for RAID Systems", by William V. Courtright II, Garth Gibson, Mark
Holland, LeAnn Neal Reilly, and Jim Zelenka, and published by the
Parallel Data Laboratory of Carnegie Mellon University.
The
.Nm
command first appeared as a program in CMU's RAIDframe v1.1 distribution.
This version of
.Nm
is a complete re-write, and first appeared in
.Nx 1.4 .
.Sh COPYRIGHT
.Bd -literal
The RAIDframe Copyright is as follows:
Copyright (c) 1994-1996 Carnegie-Mellon University.
All rights reserved.
Permission to use, copy, modify and distribute this software and
its documentation is hereby granted, provided that both the copyright
notice and this permission notice appear in all copies of the
software, derivative works or modified versions, and any portions
thereof, and that both notices appear in supporting documentation.
CARNEGIE MELLON ALLOWS FREE USE OF THIS SOFTWARE IN ITS "AS IS"
CONDITION. CARNEGIE MELLON DISCLAIMS ANY LIABILITY OF ANY KIND
FOR ANY DAMAGES WHATSOEVER RESULTING FROM THE USE OF THIS SOFTWARE.
Carnegie Mellon requests users of this software to return to
Software Distribution Coordinator or Software.Distribution@CS.CMU.EDU
School of Computer Science
Carnegie Mellon University
Pittsburgh PA 15213-3890
any improvements or extensions that they make and grant Carnegie the
rights to redistribute these changes.
.Ed
.Sh WARNINGS
Certain RAID levels (1, 4, 5, 6, and others) can protect against some
data loss due to component failure.
However, the loss of two components of a RAID 4 or 5 system,
or the loss of a single component of a RAID 0 system will
result in the entire file system being lost.
RAID is
.Em NOT
a substitute for good backup practices.
.Pp
Recomputation of parity
.Em MUST
be performed whenever there is a chance that it may have been compromised.
This includes after system crashes, or before a RAID
device has been used for the first time.
Failure to keep parity correct will be catastrophic should a
component ever fail \(em it is better to use RAID 0 and get the
additional space and speed, than it is to use parity, but
not keep the parity correct.
At least with RAID 0 there is no perception of increased data security.
.Pp
When replacing a failed component of a RAID set, it is a good
idea to zero out the first 64 blocks of the new component to ensure that the
RAIDframe driver doesn't erroneously detect a component label in the
new component.
This is particularly true on
.Em RAID 1
sets because there is at most one correct component label in a failed RAID
1 installation, and the RAIDframe driver picks the component label with the
highest serial number and modification value as the authoritative source
for the failed RAID set when choosing which component label to use to
configure the RAID set.
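.Pp
One way to perform the zeroing described above is with
.Xr dd 1 .
The following is a sketch that assumes 512-byte blocks; the function
name is hypothetical, and the target must be chosen with great care,
since whatever is stored in those blocks is destroyed.
.Bd -literal -offset indent
# zero_label_area TARGET: overwrite the first 64 blocks of
# TARGET with zeros, assuming 512-byte blocks.  Anything stored
# in those blocks is lost; the rest of TARGET is left alone.
zero_label_area() {
    dd if=/dev/zero of="$1" bs=512 count=64 conv=notrunc
}
.Ed
.Pp
It would be invoked with the replacement component as its argument
before that component is added to the RAID set.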
.Sh BUGS
Hot-spare removal is currently not available.