[sf-lug] disk is cheap, don't be stingy ... oops! ;-)

Michael Paoli Michael.Paoli at cal.berkeley.edu
Sat Mar 20 22:29:09 PDT 2021


So, routine maintenance (on the "vicki" host) and ...:

Setting up grub-pc (2.02+dfsg1-20+deb10u4) ...
Installing for i386-pc platform.
grub-install: warning: your core.img is unusually large.  It won't fit in the embedding area.
grub-install: error: embedding is not possible, but this is required for RAID and LVM install.
Configuring grub-pc
-------------------

GRUB failed to install to the following devices:

/dev/sda

So ... should'a set that up wee bit better.
That host has 2 matched Hard Disk Drives (HDDs), partitioned
identically, with md RAID-1 on all partitions, and MBR and GRUB
installed to both drives ... so it will boot off either drive and run
fine, even if one drive completely fails ... the host's BIOS/CMOS even
quite supports that ... both drives in the list of devices to attempt to
boot from, and yes, have even well tested that (though not particularly
recently).

So, time to do a wee bit more maintenance, as GRUB wasn't able to
update / reinstall itself onto the target drive - in addition to MBR,
GRUB wants to squirrel away a bunch of other stuff it needs ... and
enough so that it can load that up and then understand stuff like
filesystems and md RAID, etc., so it can then figure out where the /boot
filesystem is, and how to access that and load kernel and initrd and
such.  So, quite a bit more than can fit in just MBR.  Well, GRUB didn't
have enough space to put it in there this time around, so let's examine
our situation:

# sfdisk -uS -l /dev/sda
Disk /dev/sda: 232.9 GiB, 250059350016 bytes, 488397168 sectors
Disk model: ST3250620NS
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x000ec460

Device     Boot     Start       End   Sectors   Size Id Type
/dev/sda1              63    498014    497952 243.1M fd Linux raid autodetect
/dev/sda2          498015  35648234  35150220  16.8G fd Linux raid autodetect
...
# sfdisk -uS -l /dev/sdb
Disk /dev/sdb: 232.9 GiB, 250059350016 bytes, 488397168 sectors
Disk model: ST3250620NS
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x00004139

Device     Boot     Start       End   Sectors   Size Id Type
/dev/sdb1              63    498014    497952 243.1M fd Linux raid autodetect
/dev/sdb2          498015  35648234  35150220  16.8G fd Linux raid autodetect
...
#

So ... should've left more space before that first partition.
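
For a rough sense of scale, here's a minimal sketch of the arithmetic.  On a DOS-labelled disk, GRUB embeds its core.img in the gap between the MBR (sector 0) and the start of the first partition.  The helper name embed_bytes is made up for illustration, and the actual size of core.img on this host isn't shown above, so this only compares the room available with a start of 63 versus a start of 2048:

```shell
# embed_bytes START_SECTOR [SECTOR_SIZE]
# Bytes between the MBR (sector 0) and the first partition, i.e. the
# sectors 1 .. START_SECTOR-1.  Hypothetical helper, arithmetic only.
embed_bytes() {
    echo $(( ($1 - 1) * ${2:-512} ))
}

echo "start=63:   $(embed_bytes 63) bytes"     # old layout on sda/sdb
echo "start=2048: $(embed_bytes 2048) bytes"   # 1 MiB-aligned start
```

So the old layout leaves 62 sectors (31 KiB) for embedding, and a start at sector 2048 leaves just under 1 MiB.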

So, what have we on /dev/sd[ab]1 ...

# blkid /dev/sd[ab]1
/dev/sda1: UUID="414a9238-eb3d-bf42-9a18-401709e2bed3"  
UUID_SUB="b1cb72e0-3fc4-1b5b-2c33-d3339ec67989" LABEL="vicki:1"  
TYPE="linux_raid_member" PARTUUID="000ec460-01"
/dev/sdb1: UUID="414a9238-eb3d-bf42-9a18-401709e2bed3"  
UUID_SUB="8745d148-0a84-0afd-922e-6eddae26f2b4" LABEL="vicki:1"  
TYPE="linux_raid_member" PARTUUID="00004139-01"
#

Note in the above, both identify as md RAID, and both show the same UUID.

# mdadm -Q /dev/sd[ab]1
/dev/sda1: is not an md array
/dev/sda1: device 1 in 2 device active raid1 /dev/md1.  Use mdadm  
--examine for more detail.
/dev/sdb1: is not an md array
/dev/sdb1: device 0 in 2 device active raid1 /dev/md1.  Use mdadm  
--examine for more detail.
# mdadm -E /dev/sd[ab]1
/dev/sda1:
...
         Version : 1.2
...
      Array UUID : 414a9238:eb3dbf42:9a184017:09e2bed3
            Name : vicki:1  (local to host vicki)
...
      Raid Level : raid1
    Raid Devices : 2

  Avail Dev Size : 497664 (243.00 MiB 254.80 MB)
      Array Size : 248832 (243.00 MiB 254.80 MB)
     Data Offset : 288 sectors
    Super Offset : 8 sectors
    Unused Space : before=200 sectors, after=0 sectors
           State : clean
     Device UUID : b1cb72e0:3fc41b5b:2c33d333:9ec67989
...
    Device Role : Active device 1
    Array State : AA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdb1:
...
         Version : 1.2
...
      Array UUID : 414a9238:eb3dbf42:9a184017:09e2bed3
            Name : vicki:1  (local to host vicki)
...
      Raid Level : raid1
    Raid Devices : 2

  Avail Dev Size : 497664 (243.00 MiB 254.80 MB)
      Array Size : 248832 (243.00 MiB 254.80 MB)
     Data Offset : 288 sectors
    Super Offset : 8 sectors
    Unused Space : before=200 sectors, after=0 sectors
           State : clean
     Device UUID : 8745d148:0a840afd:922e6edd:ae26f2b4
...
    Device Role : Active device 0
    Array State : AA ('A' == active, '.' == missing, 'R' == replacing)
#
$ mount | fgrep md1
/dev/md1 on /boot type ext3 (ro,nosuid,nodev,relatime)
$

To fix it, and probably mostly leave ourselves still bootable as we
do so, let's proceed approximately as follows:
o unmount our /boot filesystem
o break the mirror, removing sdb1
o Let's make it slightly more challenging, as we do have backups,
   and we can also fall back to using/copying what we need from sda1
o shrink and relocate our filesystem on sdb1
o redo our sdb1 partition
o turn sdb1 back into our md1 md device, RAID-1, but initially missing
   its mirror device.
o reinstall GRUB onto sdb
o redo our sda1 partition
o add sda1 to our md1 RAID-1
o reinstall GRUB onto sda
o well/thoroughly test - notably shutdown, disconnect sda, boot,
   check/inspect, shutdown, reconnect sda and disconnect sdb, boot
   and check/inspect - this should be the sda drive, then power down,
   reconnect sdb, and boot from sda again.  Follow through to completion
   of remirroring.

First, let's figure out sizing and placement.  If we get rid of
partition 1 and use fdisk to recreate it with default placement and
size, that should give us ample empty space before the partition, and
good alignment.  Then we put a fresh md RAID-1 atop that, and see
exactly what that leaves us within the md device: where the filesystem
data starts, and exactly how much room it has.

We'll use a sparse flat file and loopback device to figure out our
sizing/placement.

$ df -h /tmp
Filesystem      Size  Used Avail Use% Mounted on
tmpfs           997M     0  997M   0% /tmp
$ swapon -s
Filename                                Type            Size    Used    Priority
/dev/dm-1                               partition       524284  36620   -2
/dev/dm-5                               partition       524284  0       -3
/dev/dm-7                               partition       524284  0       -4
/dev/dm-6                               partition       524284  0       -5
$

We have ample space on /tmp, and free swap to back that tmpfs space,
so let's use /tmp for that.

# mktemp -d
/tmp/tmp.imrUC9Rm2U
# cd /tmp/tmp.imrUC9Rm2U
# >sdc
#
$ grep . /sys/block/sd[ab]/size
/sys/block/sda/size:488397168
/sys/block/sdb/size:488397168
$ expr 488397168 \* 512
250059350016
$
# truncate -s 250059350016 sdc
# losetup -f --show /tmp/tmp.imrUC9Rm2U/sdc
/dev/loop0
#
$ cat /sys/block/loop0/size
488397168
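
A small sketch of why the sparse file is cheap: its apparent size matches the real drive, but almost nothing is actually allocated until something gets written.  This assumes GNU (or busybox) truncate, stat -c and du, as on the Debian host above; the scratch directory and file name are throwaway:

```shell
sectors=488397168                 # from /sys/block/sdb/size above
bytes=$((sectors * 512))          # same product expr gave: 250059350016

dir=$(mktemp -d)
truncate -s "$bytes" "$dir/img"   # sets the length, writes no data

apparent=$(stat -c %s "$dir/img")             # apparent size in bytes
allocated_kib=$(du -k "$dir/img" | cut -f1)   # space really in use
echo "apparent=$apparent bytes, allocated=$allocated_kib KiB"

rm -r "$dir"
```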
# sfdisk -uS -d /dev/sdb
label: dos
label-id: 0x00004139
device: /dev/sdb
unit: sectors

/dev/sdb1 : start=          63, size=      497952, type=fd
/dev/sdb2 : start=      498015, size=    35150220, type=fd
...
# sfdisk -uS -d /dev/sdb | sed -e '/^label-id:/d;/^\/dev\/sdb1 :/d
> s/sdb/loop0p/' | sfdisk -uS /dev/loop0
Checking that no-one is using this disk right now ... OK

Disk /dev/loop0: 232.9 GiB, 250059350016 bytes, 488397168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

>>> Script header accepted.
>>> Script header accepted.
>>> Script header accepted.
>>> Created a new DOS disklabel with disk identifier 0x82e85af2.
/dev/loop0p1: Created a new partition 2 of type 'Linux raid  
autodetect' and of size 16.8 GiB.
...
/dev/loop0p14: Done.

New situation:
Disklabel type: dos
Disk identifier: 0x82e85af2

Device        Boot     Start       End   Sectors   Size Id Type
/dev/loop0p2          498015  35648234  35150220  16.8G fd Linux raid  
autodetect
...
The partition table has been altered.
Calling ioctl() to re-read partition table.
Re-reading the partition table failed.: Invalid argument
The kernel still uses the old table. The new table will be used at the  
next reboot or after you run partprobe(8) or kpartx(8).
Syncing disks.
# fdisk /dev/loop0

Welcome to fdisk (util-linux 2.33.1).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.


Command (m for help): n
Partition type
    p   primary (1 primary, 1 extended, 2 free)
    l   logical (numbered from 5)
Select (default p):

Using default response p.
Partition number (1,4, default 1):
First sector (2048-488397167, default 2048):
Last sector, +/-sectors or +/-size{K,M,G,T,P} (2048-498014, default 498014):

Created a new partition 1 of type 'Linux' and of size 242.2 MiB.

Command (m for help): t
Partition number (1-3,5-13, default 13): 1
Hex code (type L to list all codes): fd

Changed type of partition 'Linux' to 'Linux raid autodetect'.

Command (m for help): w
The partition table has been altered.
Calling ioctl() to re-read partition table.
Re-reading the partition table failed.: Invalid argument

The kernel still uses the old table. The new table will be used at the  
next reboot or after you run partprobe(8) or kpartx(8).

# sfdisk -uS -l /dev/loop0
Disk /dev/loop0: 232.9 GiB, 250059350016 bytes, 488397168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x82e85af2

Device        Boot     Start       End   Sectors   Size Id Type
/dev/loop0p1            2048    498014    495967 242.2M fd Linux raid  
autodetect
/dev/loop0p2          498015  35648234  35150220  16.8G fd Linux raid  
autodetect
...
# partx -a /dev/loop0
# (cd /dev && echo $(ls -d loop0p* | sort -tp -k 3,3n))
loop0p1 loop0p2 loop0p3 loop0p5 loop0p6 loop0p7 loop0p8 loop0p9  
loop0p10 loop0p11 loop0p12 loop0p13
#

Create RAID-1 in non-degraded mode with only one drive:
# mdadm --create --level=raid1 --force --raid-devices=1 /dev/md101 --metadata=1.2 /dev/loop0p1
mdadm: array /dev/md101 started.
#
# mdadm -E /dev/loop0p1
/dev/loop0p1:
...
         Version : 1.2
...
      Array UUID : 1aea00d9:9f9371c1:b983ab82:5c2bfd46
            Name : vicki:101  (local to host vicki)
...
      Raid Level : raid1
    Raid Devices : 1

  Avail Dev Size : 493919 (241.17 MiB 252.89 MB)
      Array Size : 246912 (241.13 MiB 252.84 MB)
   Used Dev Size : 493824 (241.13 MiB 252.84 MB)
     Data Offset : 2048 sectors
    Super Offset : 8 sectors
    Unused Space : before=1968 sectors, after=95 sectors
           State : clean
     Device UUID : 40830792:43f3f13c:843806fa:de3294c8
...
    Device Role : Active device 0
    Array State : A ('A' == active, '.' == missing, 'R' == replacing)
#

Now, presumably Array Size is the total space we have for our
filesystem within our md device: 246912 KiB.  Let's also check the
math and see if it agrees with what mdadm is telling us:
$ echo '246912/1024;246912*1024/1000/1000' | bc -l
241.12500000000000000000
252.83788800000000000000
$
And, rounded to two decimal places that mdadm reported, those match the
MiB and MB numbers mdadm gave us.
Let's also check /sys/block:
$ cat /sys/block/md101/size
493824
$ expr 493824 / 2
246912
$
So, that is our exact size in KiB, or 493824 in 512 byte blocks.
And, our offset and unused space at end ... mdadm tells us:
     Data Offset : 2048 sectors
    Super Offset : 8 sectors
    Unused Space : before=1968 sectors, after=95 sectors
But mdadm gives us data in both 512 byte blocks, and 1 KiB blocks,
so, which is it for that?  Well, we should be able to figure that
out.  For partition 1 we have, from sfdisk -uS -l:
Device        Boot     Start       End   Sectors   Size Id Type
/dev/loop0p1            2048    498014    495967 242.2M fd Linux raid  
autodetect
That's 495967 512 byte blocks.
We should also be able to check that via /sys/block:
$ cat /sys/block/loop0/loop0p1/size
495967
$
for md101 we have:
493824
So:
$ expr 495967 - 493824
2143
$ expr 2048 + 95
2143
$
So, when we take the Data Offset and the unused "after" sectors mdadm
gives us and add them up, they account for the difference between the
sizes of partition 1 and md101, in 512 byte blocks, so those sectors
mdadm is reporting are 512 byte blocks.
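
The same bookkeeping as one check, as a sketch using only the numbers reported above (variable names are just for illustration): data offset, plus usable md sectors, plus the unused tail should add up to the whole partition.

```shell
part_sectors=495967   # /sys/block/loop0/loop0p1/size
md_sectors=493824     # /sys/block/md101/size
data_offset=2048      # mdadm -E: Data Offset
unused_after=95       # mdadm -E: Unused Space, after=

sum=$((data_offset + md_sectors + unused_after))
if [ "$sum" -eq "$part_sectors" ]; then
    echo "OK: $data_offset + $md_sectors + $unused_after = $part_sectors"
else
    echo "MISMATCH: $sum != $part_sectors"
fi
```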
So, we know the offset, and size, so we should then know exactly
where the bounds are where we can safely have filesystem data, and
not have it damaged or lost by the partition and mdadm manipulations
we'll do, so let's also check ourselves on that.
Let's write a marker block at the very start and very end of that space,
and then see that it exists exactly where we expect it to be regarding
offsets and length.
# yes 'START ' | tr -d '\012' | dd bs=512 count=1 of=/dev/md101
1+0 records in
1+0 records out
512 bytes copied, 0.00101608 s, 504 kB/s
#
So, again, space for filesystem:
493824
#
For the end, we omit the count, so dd will help reconfirm that
we place this at the very last 512 byte block, and thus should
only be able to write exactly 1 such block:
# yes 'END ' | tr -d '\012' | dd bs=512 seek=493823 of=/dev/md101
dd: error writing '/dev/md101': No space left on device
2+0 records in
1+0 records out
512 bytes copied, 0.00106452 s, 481 kB/s
#
The 1+0 records out and 512 bytes copied confirms we positioned and
wrote exactly where we expected.
And, let's check those marker blocks we wrote, are exactly where we
expect them, at our offsets on loop0/sdc and loop0p1/sdc1:
Device        Boot     Start       End   Sectors   Size Id Type
/dev/loop0p1            2048    498014    495967 242.2M fd Linux raid  
autodetect
     Data Offset : 2048 sectors
# dd bs=512 if=/dev/loop0p1 skip=2048 count=1 | cut -c-72
1+0 records in
1+0 records out
START START START START START START START START START START START START
512 bytes copied, 0.000258839 s, 2.0 MB/s
#
$ expr 2048 + 2048
4096
$
# dd bs=512 if=/dev/loop0 skip=4096 count=1 | cut -c-72
1+0 records in
1+0 records out
START START START START START START START START START START START START
512 bytes copied, 0.000123994 s, 4.1 MB/s
#
$ expr 493823 + 2048
495871
$ expr 493823 + 2048 + 2048
497919
$
# dd bs=512 if=/dev/loop0p1 skip=495871 count=1 | cut -c-72
1+0 records in
1+0 records out
END END END END END END END END END END END END END END END END END END
512 bytes copied, 0.000336053 s, 1.5 MB/s
# dd bs=512 if=/dev/loop0 skip=497919 count=1 | cut -c-72
1+0 records in
1+0 records out
END END END END END END END END END END END END END END END END END END
512 bytes copied, 0.000113067 s, 4.5 MB/s
#
Cool - we know exactly where we can safely have the filesystem data.
We're done with testing, let's undo those bits, and then proceed to
do it "for real".  On the "for real", we'll do a bit more with UUIDs,
notably so we reuse same UUID on our replacement md device, and change
or wipe/zero it on our old, that will also make it a bit more
convenient, as other configuration bits such as our
/etc/mdadm/mdadm.conf and /etc/fstab, etc. files can remain exactly the
same.
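
The offset translation checked with the marker blocks above boils down to two additions.  A minimal sketch, with made-up helper names, all values in 512 byte sectors:

```shell
# md_to_part MD_SECTOR DATA_OFFSET
# Sector within the partition that holds a given sector of the md device.
md_to_part() { echo $(( $1 + $2 )); }

# md_to_disk MD_SECTOR DATA_OFFSET PART_START
# Same, but relative to the start of the whole drive.
md_to_disk() { echo $(( $1 + $2 + $3 )); }

md_to_part 0      2048        # START marker, relative to loop0p1
md_to_disk 0      2048 2048   # START marker, relative to loop0
md_to_part 493823 2048        # END marker, relative to loop0p1
md_to_disk 493823 2048 2048   # END marker, relative to loop0
```

Those print 2048, 4096, 495871 and 497919, the same skip= values used with dd above.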

Anyway, undo the test bits, backing it out recursively bottom-up:
# mdadm -S /dev/md101
mdadm: stopped /dev/md101
# partx -d /dev/loop0
# losetup -d /dev/loop0
# rm sdc
# cd
# rmdir /tmp/tmp.imrUC9Rm2U/
#

o unmount our /boot filesystem
# umount /boot
#

o break the mirror, removing sdb1
# mdadm /dev/md1 --fail /dev/sdb1 --remove /dev/sdb1
mdadm: set /dev/sdb1 faulty in /dev/md1
mdadm: hot removed /dev/sdb1 from /dev/md1
# mdadm --grow /dev/md1 --force --raid-devices=1
raid_disks for /dev/md1 set to 1
#
$ sed -ne '/^md1 : /,/^ *$/{/^ *$/q;p}' /proc/mdstat
md1 : active raid1 sda1[1]
       248832 blocks super 1.2 [1/1] [U]
$

o Let's make it slightly more challenging, as we do have backups,
   and we can also fall back to using/copying what we need from sda1
o shrink and relocate our filesystem on sdb1
 From our earlier mdadm -E /dev/sd[ab]1 we have for sdb1:
  Avail Dev Size : 497664 (243.00 MiB 254.80 MB)
     Data Offset : 288 sectors
    Unused Space : before=200 sectors, after=0 sectors
And let's also check:
$ cat /sys/block/md1/size
497664
$
So, let's set up a loop device to access that space and only that space
within sdb, and not relative to sdb1, as we'll be removing and recreating
sdb1.
 From our earlier sfdisk -uS -l /dev/sdb
we also have:
Device     Boot     Start       End   Sectors   Size Id Type
/dev/sdb1              63    498014    497952 243.1M fd Linux raid autodetect
so, relative to sdb for offset for existing filesystem:
$ expr 63 + 288
351
$ expr 351 \* 512
179712
$ expr 497664 \* 512
254803968
$
# losetup -o 179712 --sizelimit 254803968 --show -f /dev/sdb
/dev/loop0
#
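
For what it's worth, the byte offset and size limit handed to losetup above can be derived in one go.  This sketch only prints the arguments and doesn't touch any device; loop_window is a hypothetical helper name:

```shell
# loop_window PART_START DATA_OFFSET DATA_SECTORS [SECTOR_SIZE]
# Print losetup -o / --sizelimit arguments (in bytes) for a window that
# starts PART_START + DATA_OFFSET sectors into the drive.
loop_window() {
    lw_ss=${4:-512}
    echo "-o $(( ($1 + $2) * lw_ss )) --sizelimit $(( $3 * lw_ss ))"
}

loop_window 63 288 497664
```

That prints "-o 179712 --sizelimit 254803968", matching the command above.  With a data offset of 0 and the full partition length, the same helper gives a window over the whole old partition.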
We should now have our filesystem there, but we still also have exact
matching filesystem on md1 with same UUID ... we don't want two
filesystems with the same UUID, so let's first check, then change UUID
on md1/sda1.
# blkid /dev/loop0 /dev/md1
/dev/loop0: LABEL="boot" UUID="1ec4b295-7f05-472f-9509-b7b415ab83a0"  
SEC_TYPE="ext2" TYPE="ext3"
/dev/md1: LABEL="boot" UUID="1ec4b295-7f05-472f-9509-b7b415ab83a0"  
SEC_TYPE="ext2" TYPE="ext3"
# cmp /dev/loop0 /dev/md1 && echo exact match
exact match
# tune2fs -U random /dev/md1
tune2fs 1.44.5 (15-Dec-2018)
# blkid /dev/md1 /dev/loop0
/dev/md1: LABEL="boot" UUID="3d889778-5d69-4118-a73c-7ceda9eebdb5"  
SEC_TYPE="ext2" TYPE="ext3"
/dev/loop0: LABEL="boot" UUID="1ec4b295-7f05-472f-9509-b7b415ab83a0"  
SEC_TYPE="ext2" TYPE="ext3"
#
At this point our system isn't quite bootable, as md data on sdb1 says
sdb1 is faulty/failed (even though the data itself is fine, as we marked
it as failed to be able to remove it from the RAID-1 array), and our
filesystem on md1 no longer has the UUID expected for /boot.
But if we were to crash and reboot now, we could easily enough
select our device manually to boot from and all would come up fine ...
well, except the mount of /boot would also fail (UUID mismatch), but
likewise we could manually work around that too.  But we'll be fixin'
all that up again soon anyway, and hopefully we don't crash in the
meantime, but no biggie if we do ... would just be inconvenient.
So, let's shrink our filesystem ... but it'll want an fsck first
anyway, so we'll do that first:
# fsck -t ext3 /dev/loop0 -f -y
fsck from util-linux 2.33.1
e2fsck 1.44.5 (15-Dec-2018)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
boot: 340/62248 files (24.7% non-contiguous), 63619/248832 blocks
# resize2fs -M /dev/loop0
resize2fs 1.44.5 (15-Dec-2018)
Resizing the filesystem on /dev/loop0 to 56516 (1k) blocks.
The filesystem on /dev/loop0 is now 56516 (1k) blocks long.

#
Our filesystem is now reduced in size by much more than enough (we'll
grow it later) to fit into the new target location.
However, we need to reposition it to a bit further along on the device.
For our old partition 1 and md offset, we have:
Device     Boot     Start       End   Sectors   Size Id Type
/dev/sdb1              63    498014    497952 243.1M fd Linux raid autodetect
     Data Offset : 288 sectors
For our new situation, we want to move that to:
Device        Boot     Start       End   Sectors   Size Id Type
/dev/loop0p1            2048    498014    495967 242.2M fd Linux raid  
autodetect
     Data Offset : 2048 sectors
... except that will be on and relative to sdb
So, before, our offset from start of drive:
$ expr 63 + 288
351
$
And after, we want:
$ expr 2048 + 2048
4096
$
A difference of:
$ expr 4096 - 351
3745
$
Hmmm, ... tempting, could do it the hard way: copy block-by-block from
old position to new, in reverse order.  That would be a way to shuffle
it to the later position in place, without the copy clobbering data it
hasn't read yet.  But the filesystem was relatively small to start
with, and after being reduced in size it's even smaller, and we have
more than ample space on /tmp for a temporary copy of it.  So, let's
just be lazy/efficient and do it that way, and maybe show block-by-block
reverse order copying later (which would be a necessity, or much faster,
if we didn't otherwise have the space).  And /tmp is tmpfs, so it'll be
quite sufficiently fast and efficient, even with the additional
write to and read from /tmp filesystem.
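
For the curious, here's a minimal sketch of what the reverse order copy could look like.  This is not what was run on the host; shift_forward is a made-up helper, and the demo works on a scratch file with 1 byte "blocks" rather than on a drive.  Copying the last block first means no source block gets overwritten before it has been read, even though the source and destination ranges overlap:

```shell
# shift_forward FILE SRC DST COUNT [BS]
# Move COUNT blocks of BS bytes from block SRC to block DST (DST > SRC)
# within FILE, last block first.  conv=notrunc keeps dd from truncating
# the file it is also reading from.
shift_forward() {
    sf_file=$1 sf_src=$2 sf_dst=$3 sf_count=$4 sf_bs=${5:-512}
    sf_i=$sf_count
    while [ "$sf_i" -gt 0 ]; do
        sf_i=$((sf_i - 1))
        dd if="$sf_file" of="$sf_file" bs="$sf_bs" count=1 \
            skip=$((sf_src + sf_i)) seek=$((sf_dst + sf_i)) \
            conv=notrunc 2>/dev/null || return 1
    done
}

# Demo on throwaway data: move bytes 0-4 to positions 3-7 (they overlap).
demo=$(mktemp)
printf '0123456789' > "$demo"
shift_forward "$demo" 0 3 5 1
cat "$demo"; echo
rm -f "$demo"
```

The demo prints 0120123489: bytes 3-7 now hold the original 01234.  One dd per block is slow, so on a real device one would want much larger blocks, but the ordering idea is the same.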

So, let's proceed to use /tmp for intermediary temporary file,
and reposition our filesystem via such copy.
# mktemp
/tmp/tmp.85PdeINcp5
#
 From earlier, we have:
The filesystem on /dev/loop0 is now 56516 (1k) blocks long.
So:
# dd if=/dev/loop0 bs=1024 count=56516 of=/tmp/tmp.85PdeINcp5
56516+0 records in
56516+0 records out
57872384 bytes (58 MB, 55 MiB) copied, 0.223318 s, 259 MB/s
#
No longer need our loop device, so:
# losetup -d /dev/loop0
#
And now, to write it to exactly where we want it to be repositioned,
essentially shifting the filesystem data a bit further along on the
device:
$ expr 56516 \* 2
113032
$
# dd if=/tmp/tmp.85PdeINcp5 bs=512 count=113032 seek=4096 of=/dev/sdb
113032+0 records in
113032+0 records out
57872384 bytes (58 MB, 55 MiB) copied, 0.300071 s, 193 MB/s
#
No longer need that temporary file, so:
# rm /tmp/tmp.85PdeINcp5
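
A sketch of the sanity checks behind that dd, using only numbers from above (variable names are for illustration).  It confirms the shrunken filesystem fits in the new md device, that the write stays clear of sdb2, and that the old and new locations overlap, which is why a plain front-to-back copy in place wouldn't have been safe:

```shell
fs_kib=56516                  # resize2fs -M result, 1 KiB blocks
fs_sectors=$((fs_kib * 2))    # in 512 byte sectors
old_start=351                 # 63 + 288
new_start=4096                # 2048 + 2048
md_sectors=493824             # capacity of the new md device
next_part_start=498015        # first sector of sdb2

old_end=$((old_start + fs_sectors - 1))
new_end=$((new_start + fs_sectors - 1))

[ "$fs_sectors" -le "$md_sectors" ] && echo "fits in new md device"
[ "$new_end" -lt "$next_part_start" ] && echo "ends at $new_end, before sdb2"
[ "$new_start" -le "$old_end" ] && echo "old and new ranges overlap"
```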

o redo our sdb1 partition
# fdisk /dev/sdb

Welcome to fdisk (util-linux 2.33.1).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.


Command (m for help): d
Partition number (1-3,5-13, default 13): 1

Partition 1 has been deleted.

Command (m for help): n
Partition type
    p   primary (1 primary, 1 extended, 2 free)
    l   logical (numbered from 5)
Select (default p):

Using default response p.
Partition number (1,4, default 1):
First sector (2048-488397167, default 2048):
Last sector, +/-sectors or +/-size{K,M,G,T,P} (2048-498014, default 498014):

Created a new partition 1 of type 'Linux' and of size 242.2 MiB.

Command (m for help): t
Partition number (1-3,5-13, default 13): 1
Hex code (type L to list all codes): fd

Changed type of partition 'Linux' to 'Linux raid autodetect'.

Command (m for help): w
The partition table has been altered.
Syncing disks.

#
Check that it's exactly where we want and expect it:
# sfdisk -uS -l /dev/sdb | grep -e '^Device' -e 'sdb1 '
Device     Boot     Start       End   Sectors   Size Id Type
/dev/sdb1            2048    498014    495967 242.2M fd Linux raid autodetect
#
And yes, that exactly matches what we got earlier in our testing:
/dev/loop0p1            2048    498014    495967 242.2M fd Linux raid  
autodetect

o turn sdb1 back into our md1 md device, RAID-1, but initially missing
   its mirror device.
First we get rid of our existing md1:
# mdadm -S /dev/md1
mdadm: stopped /dev/md1
# mdadm --zero-superblock /dev/sda1
#
We also zapped its superblock for good measure, so nothing will pick up
its old UUID.
For good measure, let's do that also where sdb1 was ... but we already
changed the partitioning ... but no problem, we can set up and access
the old start location (and length) via a loop device.
 From earlier:
# sfdisk -uS -l /dev/sdb
Device     Boot     Start       End   Sectors   Size Id Type
/dev/sdb1              63    498014    497952 243.1M fd Linux raid autodetect
So ...
$ expr 63 \* 512; expr 497952 \* 512
32256
254951424
$
# losetup -o 32256 --sizelimit 254951424 --show -f /dev/sdb
/dev/loop0
# mdadm --zero-superblock /dev/loop0
mdadm: Couldn't open /dev/loop0 for write - not zeroing
#
Ah, buggers, it automagically detected and started md1 on loop0,
let's stop that:
$ sed -ne '/^md1 : /,/^ *$/{/^ *$/q;p}' /proc/mdstat
md1 : active (auto-read-only) raid1 loop0[0]
       248832 blocks super 1.2 [2/1] [U_]
$
# mdadm -S /dev/md1
mdadm: stopped /dev/md1
# mdadm --zero-superblock /dev/loop0
#
Yup, we would've been better off zeroing that before we changed sdb1.
In any case, it's now done, and thus again no longer need loop device:
# losetup -d /dev/loop0
#
Now to create our new md1 device - with same UUID as before:
# mdadm --create --level=raid1 --force --raid-devices=1 \
> --uuid=414a9238-eb3d-bf42-9a18-401709e2bed3 /dev/md1 --metadata=1.2 \
> /dev/sdb1
mdadm: array /dev/md1 started.
#
$ sed -ne '/^md1 : /,/^ *$/{/^ *$/q;p}' /proc/mdstat
md1 : active raid1 sdb1[0]
       246912 blocks super 1.2 [1/1] [U]
$
# blkid /dev/md1
/dev/md1: LABEL="boot" UUID="1ec4b295-7f05-472f-9509-b7b415ab83a0"  
SEC_TYPE="ext2" TYPE="ext3"
#
And we see in the above, we find our filesystem exactly where we expect
to, as we positioned it at the start of where the md device data would
be.  Let's also check md UUID and offset are exactly as we expect:
# mdadm -E /dev/sdb1 | fgrep -e 'Array UUID' -e 'Data Offset'
      Array UUID : 414a9238:eb3dbf42:9a184017:09e2bed3
     Data Offset : 2048 sectors
#
Both filesystem and md UUIDs match as we expect, and offset is also as
we expect.
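
Note that blkid and mdadm print the same md array UUID in different groupings (dashes as 8-4-4-4-12, versus four colon-separated groups of 8), so comparing by eye is a bit error prone.  A small sketch for converting the one form to the other; uuid_to_mdadm is a hypothetical helper name:

```shell
# uuid_to_mdadm UUID
# Strip dashes/colons and regroup the 32 hex digits the way mdadm -E
# prints them.
uuid_to_mdadm() {
    printf '%s\n' "$1" | tr -d ':-' |
        sed 's/^\(.\{8\}\)\(.\{8\}\)\(.\{8\}\)\(.\{8\}\)$/\1:\2:\3:\4/'
}

uuid_to_mdadm 414a9238-eb3d-bf42-9a18-401709e2bed3
```

That prints 414a9238:eb3dbf42:9a184017:09e2bed3, the Array UUID mdadm -E reports.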

o reinstall GRUB onto sdb
First we mount /boot:
# mount /boot
# mount | fgrep /boot
# mount /boot && echo $?

# mount | fgrep /boot
# ls -A /boot
# ls -A /mnt
# fgrep /boot /etc/fstab
UUID=1ec4b295-7f05-472f-9509-b7b415ab83a0 /boot           ext3     
ro,nosuid,nodev        0       2 # /dev/md1
# blkid /dev/md1
/dev/md1: LABEL="boot" UUID="1ec4b295-7f05-472f-9509-b7b415ab83a0"  
SEC_TYPE="ext2" TYPE="ext3"
# mount -t ext3 -o ro,nosuid,nodev /dev/md1 /boot
# ls -A /boot
# mount | fgrep /boot
# mount -t ext3 -o ro,nosuid,nodev /dev/md1 /mnt
# mount | fgrep /mnt
/dev/md1 on /mnt type ext3 (ro,nosuid,nodev,relatime)
#
Well, something funky is going on with /boot ... filesystem itself all
seems fine and we can mount it, but when we try to mount it on /boot,
it gives no errors, yet fails to mount there.
# (ls -ld /boot && rmdir /boot && umask 022 && mkdir /boot && ls -ld /boot)
# mount /boot && { fgrep /boot /proc/mounts; echo $?; }
/dev/md1 /boot ext3 ro,nosuid,nodev,relatime 0 0

# umount /mnt
# umount /boot
umount: /boot: not mounted.
# df -h /mnt /boot
Filesystem              Size  Used Avail Use% Mounted on
/dev/mapper/vicki-root  922M  380M  495M  44% /
/dev/mapper/vicki-root  922M  380M  495M  44% /
# fgrep /boot /etc/fstab
UUID=1ec4b295-7f05-472f-9509-b7b415ab83a0 /boot           ext3     
ro,nosuid,nodev        0       2 # /dev/md1
# blkid | fgrep 1ec4b295-7f05-472f-9509-b7b415ab83a0
/dev/md1: LABEL="boot" UUID="1ec4b295-7f05-472f-9509-b7b415ab83a0"  
SEC_TYPE="ext2" TYPE="ext3"
# ls -A /boot
# mount /boot
# ls -A /boot
# mount -t ext3 -o ro,nosuid,nodev /dev/md1 /mnt
# df -h /mnt
Filesystem      Size  Used Avail Use% Mounted on
/dev/md1         50M   50M     0 100% /mnt
# mount -t ext3 -o ro,nosuid,nodev /dev/md1 /boot
# df -h /boot
Filesystem              Size  Used Avail Use% Mounted on
/dev/mapper/vicki-root  922M  380M  495M  44% /
# fgrep /boot /proc/mounts
# fgrep md1 /proc/mounts
/dev/md1 /mnt ext3 ro,nosuid,nodev,relatime 0 0
#
Well, still something peculiar happening with /boot, though the
filesystem itself seems fine.
Let's see if we can just work around that for now, so we can
continue our progress ... and presumably it will all automagically
look and work fine and sane again after a reboot ... once we're
ready for that.
# df /mnt
Filesystem     1K-blocks  Used Available Use% Mounted on
/dev/md1           50640 50639         0 100% /mnt
# mount | fgrep /mnt
/dev/md1 on /mnt type ext3 (ro,nosuid,nodev,relatime)
# mount -o remount,rw /mnt && mount | fgrep /mnt
/dev/md1 on /mnt type ext3 (rw,nosuid,nodev,relatime)
# mount -o bind -t none /mnt /boot
# df -h /boot
Filesystem              Size  Used Avail Use% Mounted on
/dev/mapper/vicki-root  922M  380M  495M  44% /
# ls -A /boot
#
Bizarre - we can't even do a bind mount atop /boot
# rmdir /boot
# (cd / && ln -s /mnt /boot && ls -ld /boot && cd /boot && pwd -P && df -h .)
lrwxrwxrwx 1 root root 4 Mar 20 16:18 /boot -> /mnt
/mnt
Filesystem      Size  Used Avail Use% Mounted on
/dev/md1         50M   50M     0 100% /mnt
# df /dev/md1
Filesystem     1K-blocks  Used Available Use% Mounted on
/dev/md1           50640 50639         0 100% /mnt
#
Well, that should be enough to work around it for now, continuing ...
# resize2fs /dev/md1
resize2fs 1.44.5 (15-Dec-2018)
Filesystem at /dev/md1 is mounted on /mnt; on-line resizing required
old_desc_blocks = 1, new_desc_blocks = 1
The filesystem on /dev/md1 is now 246912 (1k) blocks long.

# grub-install /dev/sdb
Installing for i386-pc platform.
Installation finished. No error reported.
#
At this point, once we remove /boot and recreate it as directory,
we should now be able to boot fine from sdb, but before doing that,
let's proceed to do our fixes/changes to sda.

o redo our sda1 partition
# fdisk /dev/sda

Welcome to fdisk (util-linux 2.33.1).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.


Command (m for help): d
Partition number (1-3,5-13, default 13): 1

Partition 1 has been deleted.

Command (m for help): n
Partition type
    p   primary (1 primary, 1 extended, 2 free)
    l   logical (numbered from 5)
Select (default p):

Using default response p.
Partition number (1,4, default 1):
First sector (2048-488397167, default 2048):
Last sector, +/-sectors or +/-size{K,M,G,T,P} (2048-498014, default 498014):

Created a new partition 1 of type 'Linux' and of size 242.2 MiB.

Command (m for help): t
Partition number (1-3,5-13, default 13): 1
Hex code (type L to list all codes): fd

Changed type of partition 'Linux' to 'Linux raid autodetect'.

Command (m for help): w
The partition table has been altered.
Syncing disks.

#
Our /dev/sd[ab]1 partitions should now exactly match:
# sfdisk -uS -l /dev/sd[ab] | grep -e '^Device' -e 'sd[ab]1 ' | sort -u -t/ -k 3,3
Device     Boot     Start       End   Sectors   Size Id Type
/dev/sda1            2048    498014    495967 242.2M fd Linux raid autodetect
/dev/sdb1            2048    498014    495967 242.2M fd Linux raid autodetect
#

o add sda1 to our md1 RAID-1
# mdadm --grow /dev/md1 --add /dev/sda1 --raid-devices=2
mdadm: added /dev/sda1
raid_disks for /dev/md1 set to 2
#

o reinstall GRUB onto sda
# grub-install /dev/sda
Installing for i386-pc platform.
Installation finished. No error reported.
#

And now:
# rm /boot && (umask 022 && mkdir /boot && ls -ld /boot)
drwxr-xr-x 2 root root 4096 Mar 20 16:35 /boot
#

Check our remirror of md1 has completed:
$ sed -ne '/^md1 : /,/^ *$/{/^ *$/q;p}' /proc/mdstat
md1 : active raid1 sda1[1] sdb1[0]
       246912 blocks super 1.2 [2/2] [UU]
$

We should be all good to reboot now, with full redundancy, and
being able to (re)boot or continue with either drive totally
failing.  Let's just do a regular reboot first and make sure
all is well (and particularly the strange funkiness we hit with the
/boot directory and mount) - maybe it's some systemd insanity,
it likes to muck about with managing mounts 'n such.
# cd / && shutdown -r now
...
and boots fine and /boot directory seems fine and sane and mount
mounts there just fine all automagically:
$ df -h /boot && mount | fgrep /boot && fgrep /boot /proc/mounts
Filesystem      Size  Used Avail Use% Mounted on
/dev/md1        230M   51M  170M  23% /boot
/dev/md1 on /boot type ext3 (ro,nosuid,nodev,relatime)
/dev/md1 /boot ext3 ro,nosuid,nodev,relatime 0 0
$
/proc/mdstat shows all good and everything fully mirrored,
now on to the "acid" tests.

o well/thoroughly test - notably shutdown, disconnect sda, boot,
   check/inspect, shutdown, reconnect sda and disconnect sdb, boot
   and check/inspect - this should be the sda drive, then power down,
   reconnect sdb, and boot from sda again.  Follow through to completion
   of remirroring.

And, all tested out fine, and just about done.  Only remnant bit,
since both sda, and sdb, each ran separately and independently,
and had some rw filesystems / md RAID, and may have had some changes
there, md is reasonably and rightfully slightly confused and a bit
cautious, as for some of the devices, both have independent changes
after they were last together in the same RAID-1, so to not default to
clobbering changes on one with changes on the other, it doesn't do the
remirroring on those.  So, we have in relevant part:
$ cat /proc/mdstat
...
md1 : active raid1 sdb1[0]
       246912 blocks super 1.2 [2/1] [U_]

md2 : active raid1 sdb2[0]
       17558720 blocks super 1.2 [2/1] [U_]

md9 : active raid1 sdb9[0]
       28265984 blocks super 1.2 [2/1] [U_]
...
$
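
With more than a handful of arrays, picking out the degraded ones by eye gets tedious.  A sketch of one way to list them, keying on an underscore in the [UU] status field; degraded_arrays is a made-up helper, and on a live system it would be fed from /proc/mdstat.  The sample input is md1 as shown above, plus a healthy md2 line invented for contrast:

```shell
# degraded_arrays
# Read mdstat-format text on stdin and print the name of each array whose
# status field (e.g. [U_]) shows a missing member.
degraded_arrays() {
    awk '
        /^md[0-9]+ :/      { name = $1 }
        /\[[U_]*_[U_]*\]/  { print name }
    '
}

degraded_arrays <<'EOF'
md1 : active raid1 sdb1[0]
       246912 blocks super 1.2 [2/1] [U_]

md2 : active raid1 sda2[1] sdb2[0]
       17558720 blocks super 1.2 [2/2] [UU]
EOF
```

For that sample it prints just md1.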

Those require teensy bit 'o manual intervention to get remirrored.
# mdadm -D /dev/md[129] | sed -ne '/Major/,/removed/p'
     Number   Major   Minor   RaidDevice State
        0       8       17        0      active sync   /dev/sdb1
        -       0        0        1      removed
     Number   Major   Minor   RaidDevice State
        0       8       18        0      active sync   /dev/sdb2
        -       0        0        1      removed
     Number   Major   Minor   RaidDevice State
        0       8       25        0      active sync   /dev/sdb9
        -       0        0        1      removed
# mdadm /dev/md1 --add /dev/sda1
mdadm: added /dev/sda1
# mdadm /dev/md2 --add /dev/sda2
mdadm: added /dev/sda2
# mdadm /dev/md9 --add /dev/sda9
mdadm: added /dev/sda9
# mdadm -D /dev/md[129] | sed -ne '/Rebuild Status/p;/Major/{N;N;p}'
     Number   Major   Minor   RaidDevice State
        0       8       17        0      active sync   /dev/sdb1
        2       8        1        1      active sync   /dev/sda1
     Rebuild Status : 54% complete
     Number   Major   Minor   RaidDevice State
        0       8       18        0      active sync   /dev/sdb2
        2       8        2        1      spare rebuilding   /dev/sda2
     Number   Major   Minor   RaidDevice State
        0       8       25        0      active sync   /dev/sdb9
        2       8        9        1      spare rebuilding   /dev/sda9
#
...
And all mirrors fully synced up again.  So we're done.
If I'd only disconnected and reconnected a drive, md would've likely
just seen that as a stale mirror, and updated.  But since both got
separately updated since they got split, once md sees both again and
as part of what should be same RAID-1, md is like, "Uh, wait a minute.
How do I know you would want me to clobber the changes on one drive
when there are changes on both, not just one of 'em being stale?  So,
yeah, not gonna automagically remirror in this case, but will remove
the device (and continue to warn about the missing mirror), and you can
take a wee bit 'o manual steps to rectify the situation (perhaps there
are independent changes on the removed mirror, that you want to save
and merge in before remirroring?)"
Ah well, would've been better to have set that spare space up to begin
with, and avoided need to later change the start location of partition
1.  Then again, not exactly an everyday activity.  This physical machine
is over 13 years old, and I believe only once before got any significant
repartitioning done ... perhaps not even at all since its very
first install ... so, ... something like that about once every 6 or 7
years or more isn't too bad ... but even better is not having need to
change that at all.  So ... about 1 MiB more offset to the first
partition earlier, and would've avoided the need to do these changes.

TLDR / Moral of the story is ...
Disk is cheap, don't be (too) stingy (or you'll generally pay for it
later, one way or another).



