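Ever since I started using ZFS I have wondered how to swap out a disk once it fails, and today I finally got the chance.
I installed PVE on ZFS, using a mirror. For a while one of the disks kept reporting READ errors, and I never took it seriously. I just cleared them with zpool clear each time; after all they were only READ errors, no WRITE errors, and the count stayed around 10.
Then today a flood of errors showed up at once, and I knew this disk could not be saved. Both disks were someone else's castoffs to begin with (crying in poverty).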
|
root@pve:~# zpool status
pool: rpool
state: DEGRADED
status: One or more devices has experienced an unrecoverable error. An
attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-9P
scan: scrub repaired 6.81M in 02:41:15 with 0 errors on Sun Mar 13 03:05:17 2022
config:
NAME STATE READ WRITE CKSUM
rpool DEGRADED 0 0 0
mirror-0 DEGRADED 0 0 0
ata-WDC_WD5000BPKT-75PK4T0_WD-WX11A43M7165-part3 DEGRADED 857 0 367 too many errors
ata-WDC_WD5000BPKT-75PK4T0_WD-WX11A43M7336-part3 ONLINE 0 0 0
errors: No known data errors
|
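Rather than eyeballing the counters each time, the failing device can be pulled out of the status text with a small filter. This is only a sketch: `vdevs_with_errors` is a name made up for this example, and it assumes the NAME / STATE / READ / WRITE / CKSUM column layout shown above.

```shell
#!/bin/sh
# Print every vdev line from `zpool status` output whose READ, WRITE or
# CKSUM counter is not zero. Reads the status text from standard input.
# vdevs_with_errors is a hypothetical helper, not part of ZFS.
vdevs_with_errors() {
    awk '
        # The config table starts after the header row "NAME STATE READ WRITE CKSUM".
        $1 == "NAME" && $2 == "STATE" { in_table = 1; next }
        # The table ends at the "errors:" summary line.
        $1 == "errors:" { in_table = 0 }
        in_table && NF >= 5 && ($3 != "0" || $4 != "0" || $5 != "0") {
            printf "%s read=%s write=%s cksum=%s\n", $1, $3, $4, $5
        }
    '
}

# Usage (needs a host with ZFS): zpool status rpool | vdevs_with_errors
```

Run against the output above, it should print only the line for the disk with 857 read and 367 checksum errors.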
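SATA supports hot-plugging, but to be safe I shut the machine down anyway. I forgot to offline the bad disk before shutting down, but that caused no problems.
I removed the old disk, installed the new one, and booted.
ZFS now reports the old disk as unavailable, so take it OFFLINE: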
zpool offline rpool 10409275789507660143
|
root@pve:~# zpool status
pool: rpool
state: DEGRADED
status: One or more devices has been taken offline by the administrator.
Sufficient replicas exist for the pool to continue functioning in a
degraded state.
action: Online the device using 'zpool online' or replace the device with
'zpool replace'.
scan: scrub repaired 6.81M in 02:41:15 with 0 errors on Sun Mar 13 03:05:17 2022
config:
NAME STATE READ WRITE CKSUM
rpool DEGRADED 0 0 0
mirror-0 DEGRADED 0 0 0
10409275789507660143 OFFLINE 0 0 0 was /dev/disk/by-id/ata-WDC_WD5000BPKT-75PK4T0_WD-WX11A43M7165-part3
ata-WDC_WD5000BPKT-75PK4T0_WD-WX11A43M7336-part3 ONLINE 0 0 0
errors: No known data errors
|
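The new disk had been used before, so it needs to be wiped first.
In the PVE web interface, select pve -> Disks, find the new disk, and choose Wipe Disk.
A better way is to copy the partition table from the healthy disk: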
|
sgdisk /dev/sdb -R /dev/sdc ## copy the partition table of sdb to sdc
sgdisk -G /dev/sdc ## randomize the disk and partition GUIDs on sdc
|
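Referring to disks by names like /dev/sda is not very safe, because the names can change when the cables are moved to different ports; using the by-id names is more reliable.
Open a shell on the PVE host, run ls /dev/disk/by-id, and find the ID of the new disk.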
|
root@pve:/dev/disk/by-id# ls
ata-ST500LX012-1LM162-SSHD_W3N179MS ata-WDC_WD5000BPKT-75PK4T0_WD-WX11A43M7336 wwn-0x50014ee00410d1af wwn-0x50014ee658bf9463-part1
ata-WDC_WD30PURX-64P6ZY0_WD-WMC4N0L041D9 ata-WDC_WD5000BPKT-75PK4T0_WD-WX11A43M7336-part1 wwn-0x50014ee00410d1af-part1 wwn-0x50014ee658bf9463-part2
ata-WDC_WD30PURX-64P6ZY0_WD-WMC4N0L041D9-part1 ata-WDC_WD5000BPKT-75PK4T0_WD-WX11A43M7336-part2 wwn-0x50014ee00410d1af-part2 wwn-0x50014ee658bf9463-part3
ata-WDC_WD30PURX-64P6ZY0_WD-WMC4N0L041D9-part2 ata-WDC_WD5000BPKT-75PK4T0_WD-WX11A43M7336-part3 wwn-0x50014ee00410d1af-part3
ata-WDC_WD30PURX-64P6ZY0_WD-WMC4N0L041D9-part3 wwn-0x5000c50082b85d27 wwn-0x50014ee658bf9463
|
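Since the sgdisk commands take kernel names while zpool takes by-id names, it helps to see which is which. A minimal sketch, assuming the by-id entries are symlinks to the kernel devices (as they are under udev) and that `readlink -f` is available; `list_disks_by_id` is a made-up helper name.

```shell
#!/bin/sh
# Print "by-id name -> kernel device" for every whole-disk ATA entry.
# The directory defaults to /dev/disk/by-id; the optional argument exists
# only so the function can be tried against a test directory.
# list_disks_by_id is a hypothetical helper name.
list_disks_by_id() {
    dir="${1:-/dev/disk/by-id}"
    for link in "$dir"/ata-*; do
        [ -e "$link" ] || continue
        case "$link" in
            *-part[0-9]*) continue ;;   # skip partition entries
        esac
        printf '%s -> %s\n' "$(basename "$link")" "$(readlink -f "$link")"
    done
}

# Usage: list_disks_by_id
```

Each output line pairs a by-id name with the /dev/sdX node it currently points to, which makes it harder to run sgdisk against the wrong disk.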
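zpool replace -f rpool 10409275789507660143 ata-HGST_HUS726T4TALE6L4_V1G7DWXC-part3 ## the old device comes first (the failed disk, given here by its GUID), then the third partition of the new disk (substitute the by-id name of your own disk); to add a disk rather than replace one, use attach instead of replace, followed by the pool's existing partition and the new partition
-f: force
The command above prints nothing, so check the status again: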
|
root@pve:/dev/disk/by-id# zpool status
pool: rpool
state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
scan: resilver in progress since Sun Mar 13 23:33:50 2022
88.1G scanned at 534M/s, 2.38G issued at 14.4M/s, 187G total
2.45G resilvered, 1.28% done, 03:37:59 to go
config:
NAME STATE READ WRITE CKSUM
rpool DEGRADED 0 0 0
mirror-0 DEGRADED 0 0 0
replacing-0 DEGRADED 0 0 0
10409275789507660143 OFFLINE 0 0 0 was /dev/disk/by-id/ata-WDC_WD5000BPKT-75PK4T0_WD-WX11A43M7165-part3
ata-ST500LX012-1LM162-SSHD_W3N179MS ONLINE 0 0 0 (resilvering)
ata-WDC_WD5000BPKT-75PK4T0_WD-WX11A43M7336-part3 ONLINE 0 0 0
errors: No known data errors
|
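The mirror is now being rebuilt (resilvered). This takes a few hours or so, depending on the size of the disk.
When the resilver finishes, the new disk should come online automatically; if it does not, online it manually.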
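Instead of rerunning zpool status by hand, the wait can be scripted. The sketch below assumes the progress line has the form seen above ("... resilvered, 1.28% done, ..."); the function names and the 60-second polling interval are arbitrary choices for this example.

```shell
#!/bin/sh
# Extract the completion percentage from `zpool status` text on stdin,
# e.g. the "1.28%" in "2.45G resilvered, 1.28% done, 03:37:59 to go".
# Prints nothing when no progress line is present.
resilver_percent() {
    sed -n 's/.*[ ,]\([0-9.]*%\) done.*/\1/p'
}

# Poll until the pool no longer reports a resilver in progress.
# The 60-second interval is an arbitrary choice.
wait_for_resilver() {
    pool="$1"
    while zpool status "$pool" | grep -q 'resilver in progress'; do
        echo "resilver: $(zpool status "$pool" | resilver_percent) done"
        sleep 60
    done
    echo "resilver finished"
}

# Usage (needs a host with ZFS): wait_for_resilver rpool
```

On OpenZFS 2.0 or later, `zpool wait -t resilver rpool` should block until the resilver completes without any polling.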
|
root@pve:~# zpool status
pool: rpool
state: ONLINE
status: Some supported and requested features are not enabled on the pool.
The pool can still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
the pool may no longer be accessible by software that does not support
the features. See zpool-features(7) for details.
scan: resilvered 190G in 02:59:24 with 0 errors on Mon Mar 14 02:33:14 2022
config:
NAME STATE READ WRITE CKSUM
rpool ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
ata-ST500LX012-1LM162-SSHD_W3N179MS ONLINE 0 0 0
ata-WDC_WD5000BPKT-75PK4T0_WD-WX11A43M7336-part3 ONLINE 0 0 0
errors: No known data errors
|
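Do not reboot while the new disk is still resilvering.
The new disk also needs its ESP (partition 2) set up so that the system can boot from it: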
|
proxmox-boot-tool format /dev/sdc2
proxmox-boot-tool init /dev/sdc2 ## if this fails, run apt install systemd-boot
proxmox-boot-tool refresh
|
If the replacement disk is larger than the old one, the partition and the pool need to be resized:
|
# resize partition 3 of sdb to end at 50% of the disk (partition 3 is the ZFS partition)
parted /dev/sdb resizepart 3 50%
# expand ZFS on that disk to use the whole resized partition
zpool online -e rpool ata-HGST_HUS726T4TALE6L4_V1G7DWXC-part3
# resize partition 3 of sdc to end at 50% of the disk (partition 3 is the ZFS partition)
parted /dev/sdc resizepart 3 50%
# expand ZFS on that disk to use the whole resized partition
zpool online -e rpool ata-ST4000NM0115-1YZ107_ZC123Y73-part3
|
-e: expand the device to use all of the available space
smartctl -a /dev/sda
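The full smartctl report is long, so a filter on the attribute table can make a dying disk easier to spot before ZFS starts counting errors. This is a rough sketch, assuming an ATA disk whose attribute table uses the usual ten-column layout with the raw value last; the choice of three attributes and the name `smart_bad_sectors` are my own for this example.

```shell
#!/bin/sh
# Read `smartctl -A` output on stdin and print the sector-health attributes
# whose raw value (column 10) is greater than zero.
# smart_bad_sectors is a hypothetical helper name.
smart_bad_sectors() {
    awk '
        $2 == "Reallocated_Sector_Ct" ||
        $2 == "Current_Pending_Sector" ||
        $2 == "Offline_Uncorrectable" {
            if ($10 + 0 > 0) printf "%s raw=%s\n", $2, $10
        }
    '
}

# Usage (needs smartmontools): smartctl -A /dev/sda | smart_bad_sectors
```

No output means none of the three counters has moved; any line printed is a reason to keep a spare disk ready.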