diff options
| author | Jacob McDonnell <jacob@jacobmcdonnell.com> | 2026-04-25 21:07:28 -0400 |
|---|---|---|
| committer | Jacob McDonnell <jacob@jacobmcdonnell.com> | 2026-04-25 21:07:28 -0400 |
| commit | 711594636704defae873be1a355a292505585afd (patch) | |
| tree | 59ee13f863830d8beba6cfd02bbe813dd486c26f /static/v10/man8/crash.8 | |
| parent | 3258a063c1f189d7b019e40e525b46bef9b9a7b1 (diff) | |
docs: Added UNIX V10 Manuals
Diffstat (limited to 'static/v10/man8/crash.8')
| -rw-r--r-- | static/v10/man8/crash.8 | 268 |
1 files changed, 268 insertions, 0 deletions
diff --git a/static/v10/man8/crash.8 b/static/v10/man8/crash.8 new file mode 100644 index 00000000..e38a694c --- /dev/null +++ b/static/v10/man8/crash.8 @@ -0,0 +1,268 @@ +.TH CRASH 8 VAX-11 +.UC 4 +.SH NAME +crash \- what happens when the system crashes +.SH DESCRIPTION +This section explains what happens when the system crashes and how +you can get a crash dump for analysis of non-transient problems. +.PP +When the system crashes voluntarily it prints a message of the form +.IP +panic: why i gave up the ghost +.LP +on the console, and then invokes an automatic reboot procedure as +described in +.IR reboot (8). +If the auto-reboot switch is off on the console, then the processor +will simply halt at this point. +Otherwise the registers and the top few locations of the stack will +be printed on the console, and then the system will check the disks +and (unless some unexpected inconsistency is encountered), resume +multi-user operations. +.PP +The system has a large number of internal consistency checks; if one +of these fails, then it will panic with a very short message indicating +which one failed. In the absence of a dump, little can be done about +one of these. If the problem recurs, you should arrange to get a dump +for further analysis by running with auto-reboot disabled during normal +working hours and then following the procedure described below. +.PP +The most common cause of system failures is hardware failure, which +can reflect itself in different ways. Here are the messages which +you are likely to encounter, with some hints as to causes. +Left unstated in all cases is the possibility that hardware or software +error produced the message in some unexpected way. +.TP +IO err in push +.ns +.TP +hard IO err in swap +The system encountered an error trying to write to the paging device +or an error in reading critical information from a disk drive. +You should fix your disk if it is broken or unreliable. +.TP +Timeout table overflow +.ns +.TP +ran out of bdp's +.ns +.TP +ran out of uba map +These really shouldn't be panics, but until we fix up the data structures +involved, running out of entries causes a crash. If the timeout table +overflows, you should make it bigger. If you run out of bdp's or uba map +you probably have a buggy device driver in your system, allocating and +not releasing UNIBUS resources. +.TP +KSP not valid +.ns +.TP +SBI fault +.ns +.TP +Machine check +.ns +.TP +CHM? in kernel +These indicate either a serious bug in the system or, more often, +a glitch or failing hardware. For the machine check, the top part of +the resulting stack frame gives more information. You can refer to a +VAX 11/780 System Maintenance Guide for information on machine checks. +If machine checks or SBI faults recur, check out the hardware or call +field service. If the other faults recur, there is likely a bug somewhere +in the system, although these can be caused by a flakey processor. +Run processor microdiagnostics. +.TP +trap type %d, code=%d +A unexpected trap has occurred within the system; the trap types are: +.RS +.TP 10 +0 +reserved addressing mode +.br +.ns +.TP 10 +1 +privileged instruction +.br +.ns +.TP 10 +2 +BPT +.br +.ns +.TP 10 +3 +XFC +.br +.ns +.TP 10 +4 +reserved operand +.br +.ns +.TP 10 +5 +CHMK (system call) +.br +.ns +.TP 10 +6 +arithmetic trap +.br +.ns +.TP 10 +7 +reschedule trap (software level 3) +.br +.ns +.TP 10 +8 +segmentation fault +.br +.ns +.TP 10 +9 +protection fault +.br +.ns +.TP 10 +10 +trace pending (TP bit) +.RE +.IP +The favorite trap type in system crashes is trap type 9, indicating +a wild reference. The code is the referenced address. If you look +down the stack, just after the trap type and the code are the pc and +the ps of the processor when it trapped, showing you where in the +system the problem occurred. These problems tend to be easy to track +down if they are kernel bugs since the processor stops cold, but random +flakiness seems to cause this sometimes, e.g. we have trapped with +code 80000800 three times in six months as an instruction fetch went across +this page boundary in the kernel but have been unable to find any reason +for this to have happened. +.TP +init died +The system initialization process has exited. This is bad news, as no new +users will then be able to log in. Rebooting is the only fix, so the +system just does it right away. +.PP +That completes the list of panic types you are likely to see. +Now for the crash dump procedure: +.PP +At the moment a dump can be taken only on magnetic tape. +Before you do anything, be sure that a clean tape is mounted with a ring-in +on the tape drive if you plan to make a dump. +.PP +Write the date and time on the console log. +Use the console commands to examine the registers, program status long word, +and the top several locations on the stack. +A suggested command sequence, which is executed by the \*(lq@DUMP\*(rq +console command script, is: +.DS +.nf + E PSL<return> + E R0/NE:F<return> + E SP<return> + E/V @ /NE:40<return> +.fi +.DE +If hardware problems dictate a special set of commands be executed when +the system crashes, a sequence of commands can be saved using the console +command \*(lqLINK\*(rq to be reexecuted with \*(lqPERFORM\*(rq (which can be +abbreviated \*(lqP\*(rq). +If a dump is to be taken on magnetic tape (this is a good idea +in most any case where the cause of the crash is not immediately obvious) +then the following commands will (should) be executed: +.DS +.nf + D PSL 0<return> + D PC 80000200<return> + C<return> +.fi +.DE +These commands are actually part of the standard \*(lq@DUMP\*(rq script. +This should write a copy of all of memory +on the tape, followed by two EOF marks. +Caution: +Any error is taken to mean the end of memory has been reached. +This means that you must be sure the ring is in, +the tape is ready, and the tape is clean and new. +.PP +If there are not 40(hex) locations active on the kernel stack when the +procedure is begun, then the console may begin to print error diagnostics. +You can stop this by hitting \*(lq^C\*(rq (control-C), and then give the +last three commands above. +.PP +If the dump fails, you can try again, +but some of the registers will be lost. +See below for what to do with the tape. +.PP +To restart after a crash, follow the directions in +.IR reboot (8); +if the virtual memory subsystem is suspected as the cause of the crash, +then a version of the system other than \*(lqvmunix\*(rq should be booted +which will leave the paging areas temporarily intact +for use by the post-mortem analysis program +.I analyze. +After checking your root file system consistency with +.IR fsck (8), +you can read the core dump tape into the file /vmcore with +.IP +dd if=/dev/rmt0 of=/vmcore bs=20b +.LP +It does not work to use just +.IR cp (1), +as the tape is blocked. +With the system still in single-user mode, run the analysis program +.I analyze, +e.g.: +.IP +analyze \-s /dev/drum /vmcore /vmunix +.LP +and save the output. +Then boot up +\*(lqvmunix\*(rq +and let it do the automatic reboot, i.e. to boot multi-user from +an RM03/RM05/RP06 on the MASSBUS +.IP +>>> BOOT RPM +.PP +After rebooting, to analyze a dump you should execute +.I "ps \-alxk" +to print the process table at the time of the crash. +Use +.IR adb (1) +to examine +.IR /vmcore . +The location +.I dumpstack\-80000000 +is the bottom of a stack onto which were pushed the stack pointer +.BR sp , +.B PCBB +(containing the physical address of a +.IR u_area ), +.BR MAPEN , +.BR IPL , +and registers +.BR r13 \- r0 +(in that order). +.BR r13 (fp) +is the system frame pointer and the stack is used in standard +.B calls +format. Use +.IR adb (1) +to get a reverse calling order. +In most cases this procedure will give +an idea of what is wrong. +A more complete discussion +of system debugging is impossible here. +See, however, +.IR analyze (8) +for some more hints. +.SH "SEE ALSO" +analyze(8), reboot(8) +.br +.I "VAX 11/780 System Maintenance Guide" +for more information about machine checks. +.SH BUGS |
