[sf-lug] SF-LUG non-list stuff too: Fwd: BALUG VM was down for fair while earlier today. Has now been up again for over 7 hours now.
Michael Paoli
Michael.Paoli at cal.berkeley.edu
Sun Sep 2 21:08:46 PDT 2018
And, since the non-list SF-LUG stuff is also hosted on the BALUG
VM, that was down for that time too. The SF-LUG lists were
unimpacted, and DNS is of course amply redundant.
----- Forwarded message from Michael.Paoli at cal.berkeley.edu -----
Date: Sun, 02 Sep 2018 21:03:22 -0700
From: "Michael Paoli" <Michael.Paoli at cal.berkeley.edu>
Subject: BALUG VM was down for fair while earlier today. Has now
been up again for over 7 hours now.
To: BALUG-Admin <balug-admin at lists.balug.org>
The BALUG VM was down for a fair while earlier today.
It has now been up again for over 7 hours.
Looks like there was an I/O hiccup on the physical host,
which didn't particularly impact the physical host itself, but
was enough of an interruption (delay) that the BALUG VM kernel
panicked.
Did have testing, etc. of a 3rd hard drive going on the physical
host at the time ... it might've hit issues and possibly did a bus
reset? Who knows for sure. Anyway ...
Went down sometime after:
2018-09-02T01:27:36-07:00
and was brought back up around:
2018-09-02T13:39:30-07:00
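For a rough upper bound on the outage, the two timestamps above can
simply be subtracted. A minimal Python sketch follows; the variable
names are just illustrative, and since the VM went down "sometime
after" the first timestamp and came back "around" the second, the
result is only an approximate upper bound, not an exact figure:

```python
from datetime import datetime

# Timestamps copied from the message above (ISO 8601 with UTC offset).
last_seen_up = datetime.fromisoformat("2018-09-02T01:27:36-07:00")
back_up = datetime.fromisoformat("2018-09-02T13:39:30-07:00")

# The VM went down sometime *after* last_seen_up, so this difference
# is an upper bound on the outage, not its exact length.
outage = back_up - last_seen_up
hours, rem = divmod(int(outage.total_seconds()), 3600)
minutes, seconds = divmod(rem, 60)
print(f"outage of at most about {hours}h {minutes}m {seconds}s")
```

That works out to a bit over 12 hours at most.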
Various bits I noted in log:
$ curl -s --range 375155-378925 http://www.archive.balug.org/log.txt
2018-09-02 Michael Paoli
host crashed sometime after:
2018-09-02T01:27:36-07:00
but probably before about:
2018-09-02T01:35:00-07:00
on console, we got:
# [54894.969741] sd 0:0:0:0: [sda] tag#3 ABORT operation started
[54900.078084] sd 0:0:0:0: ABORT operation timed-out.
[54900.080312] sd 0:0:0:0: [sda] tag#2 ABORT operation started
[54905.198438] sd 0:0:0:0: ABORT operation timed-out.
[54905.200517] sd 0:0:0:0: [sda] tag#1 ABORT operation started
[54905.357128] Kernel panic - not syncing: assertion "i && sym_get_cam_status(cp->cmd) == DID_SOFT_ERROR" failed: file "/build/linux-AcJpTp/linux-4.9.110/drivers/scsi/sym53c8xx_2/sym_hipd.c", line 3399
[54905.357128]
[54905.367774] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.9.0-8-amd64 #1 Debian 4.9.110-3+deb9u4
[54905.370776] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[54905.372768] 0000000000000000 ffffffff84f31e54 ffff9e2f75d5a300 ffff9e2f7fc03e50
[54905.375471] ffffffff84d7f6ad 0000000000000020 ffff9e2f7fc03e60 ffff9e2f7fc03df8
[54905.378226] 3ea9db08406f9671 0000000100d04ae4 ffffffffc048a250 ffffffffc0489e80
[54905.380982] Call Trace:
[54905.381867] <IRQ> [54905.382541] [<ffffffff84f31e54>] ? dump_stack+0x5c/0x78
[54905.384428] [<ffffffff84d7f6ad>] ? panic+0xe4/0x23f
[54905.386164] [<ffffffffc048512e>] ? sym_interrupt+0x1c9e/0x1e80 [sym53c8xx]
[54905.388543] [<ffffffffc03aa010>] ? usb_hcd_poll_rh_status+0x170/0x170 [usbcore]
[54905.391102] [<ffffffffc03a9fc9>] ? usb_hcd_poll_rh_status+0x129/0x170 [usbcore]
[54905.393627] [<ffffffffc03aa010>] ? usb_hcd_poll_rh_status+0x170/0x170 [usbcore]
[54905.396144] [<ffffffff84ce7562>] ? call_timer_fn+0x32/0x120
[54905.398071] [<ffffffffc047ea4b>] ? sym53c8xx_intr+0x3b/0x70 [sym53c8xx]
[54905.400386] [<ffffffff84cd418e>] ? __handle_irq_event_percpu+0x7e/0x1a0
[54905.402673] [<ffffffff84cd42e0>] ? handle_irq_event_percpu+0x30/0x70
[54905.404898] [<ffffffff84cd4359>] ? handle_irq_event+0x39/0x60
[54905.406901] [<ffffffff84cd7870>] ? handle_fasteoi_irq+0xa0/0x170
[54905.409001] [<ffffffff84c27faf>] ? handle_irq+0x1f/0x30
[54905.410834] [<ffffffff852187ee>] ? do_IRQ+0x4e/0xe0
[54905.412528] [<ffffffff85216556>] ? common_interrupt+0x96/0x96
[54905.414523] <EOI> [54905.415216] [<ffffffff852151f0>] ? __sched_text_end+0x1/0x1
[54905.417231] [<ffffffff852154c2>] ? native_safe_halt+0x2/0x10
[54905.419235] [<ffffffff8521520a>] ? default_idle+0x1a/0xd0
[54905.421137] [<ffffffff84cbc7da>] ? cpu_startup_entry+0x1ca/0x240
[54905.423215] [<ffffffff8593df5e>] ? start_kernel+0x447/0x467
[54905.425186] [<ffffffff8593d120>] ? early_idt_handler_array+0x120/0x120
[54905.427438] [<ffffffff8593d408>] ? x86_64_start_kernel+0x14c/0x170
[54905.429842] Kernel Offset: 0x3c00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[54905.433484] ---[ end Kernel panic - not syncing: assertion "i && sym_get_cam_status(cp->cmd) == DID_SOFT_ERROR" failed: file "/build/linux-AcJpTp/linux-4.9.110/drivers/scsi/sym53c8xx_2/sym_hipd.c", line 3399
[54905.433484]
... also noted within that same timeframe, on physical host, there
were some storage-related events ... but no hard failures seen on that
physical host and no outages or failures or such observed on that
physical host:
Sep 2 01:29:04 vicki smartd[1093]: Device: /dev/sda [SAT], SMART Usage Attribute: 195 Hardware_ECC_Recovered changed from 63 to 69
Sep 2 01:29:04 vicki smartd[1093]: Device: /dev/sdb [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 69 to 70
Sep 2 01:29:04 vicki smartd[1093]: Device: /dev/sdb [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 31 to 30
Sep 2 01:29:04 vicki smartd[1093]: Device: /dev/sdb [SAT], SMART Usage Attribute: 195 Hardware_ECC_Recovered changed from 63 to 66
$
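For anyone wanting to pull that same excerpt without curl: the
--range 375155-378925 option above asks the server for an inclusive
byte range via the HTTP Range header. Below is a minimal Python
sketch of the same request. The function names are hypothetical, and
it assumes the server still honors range requests and that the log
has only been appended to since, so that those byte offsets still
point at the same entry:

```python
import urllib.request

# URL and byte offsets are taken from the curl command above.
LOG_URL = "http://www.archive.balug.org/log.txt"

def range_header(first: int, last: int) -> str:
    """Build an HTTP Range header value for an inclusive byte range."""
    if first < 0 or last < first:
        raise ValueError("need 0 <= first <= last")
    return f"bytes={first}-{last}"

def range_length(first: int, last: int) -> int:
    """Number of bytes covered by an inclusive byte range."""
    return last - first + 1

def fetch_range(url: str, first: int, last: int) -> bytes:
    """Fetch bytes first..last (inclusive). Needs network access."""
    req = urllib.request.Request(
        url, headers={"Range": range_header(first, last)}
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        # 206 Partial Content means the server honored the range;
        # a plain 200 would mean it sent the whole file instead.
        if resp.status != 206:
            raise RuntimeError(f"expected 206, got {resp.status}")
        return resp.read()

# Example use (not run here, since it needs the network):
#   data = fetch_range(LOG_URL, 375155, 378925)
#   print(data.decode("utf-8", errors="replace"))
```

Both ends of the range are inclusive, so that request covers
378925 - 375155 + 1 = 3771 bytes.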
----- End forwarded message -----