ZFS, surprise resilvering, dumb tricks
Once upon a time I got a 5-bay QNAP, and all was well until it got sulky and one of its SATA connectors decided to give up on life. Drafted in its stead was a ZFS NAS setup that I've had for ages–going back to Fall of 2019. 8 x 4TB drives, a Norco ITX-S8 case, 32GB of RAM, a Ryzen 2400G, nothing too fancy. NixOS, ZFS with raidz2, great.
I ran a handy little tool I'd slopped up and, lo and behold, all the drives were nearly at five and a half years of online time; this is solidly in their golden years. I'd been meaning to upgrade the array to larger disks, since a good amount of space goes (cheerfully!) to error correction and parity, and so had begun collecting replacement disks in anticipation. Rebuilding the array would be time-consuming and annoying, and so I'd been putting it off.
Well, the week before Christmas, finally hit with inspiration, I started reorganizing all of my different Linux ISOs by source and genre and album and whatnot, and also checking through old backups. Out of curiosity, I ran my health script again, and this time SMART errors cropped up. Worse, ZFS had decided that one of the disks was degraded, and so the whole zpool had begun its swan song.
It appeared that it was now time to do something.
Interlude: ZFS? Is that some kinky BSD thing?
ZFS is one of the projects that escaped from the lab over at Sun Microsystems. It’s probably the last filesystem you’ll ever need unless you’re doing embedded stuff or networked filesystem stuff. While I am not an expert in all the things it can do, the things that it can do that I care about are:
- Data integrity and checksumming. Everything in the filesystem is monitored for bitrot, and even if you can’t fix the problem you at least can find out that it happened.
- Super-easy JBOD. By default, you can create a pool of storage out of whatever disks you have lying around. More advanced stuff requires forethought, but if you just want to make a JBOD this is great.
- Built-in RAID features. If you’re willing to do a tiny bit of work and sacrifice some storage (which I emphatically am, for reasons that will become obvious as this tale continues), you can easily configure disk mirroring and striping and parity, and use something called RAID-Z.
- Snapshotting. You can take easy snapshots of the filesystem almost instantly at any point, and fall back to them. Further, you can trivially explore those as read-only directories without any mounting shenanigans.
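For a taste of that last one, here's roughly what taking, browsing, and rolling back a snapshot looks like (the tank/media names here are just for illustration, not my actual layout):

# zfs snapshot tank/media@before-reorg
# ls /tank/media/.zfs/snapshot/before-reorg
# zfs rollback tank/media@before-reorg

The hidden .zfs/snapshot directory is the "no mounting shenanigans" part: you can cd in and read old versions directly, and rollback is the big hammer for when a reorganization goes sideways.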
The organization of ZFS is:
- You have some pile of physical disks.
- You combine one or more of the physical disks into a vdev (virtual device). This is where you do things like raid striping and mirroring.
- You then combine vdevs into pools.
- Pools can then be chopped up into one or more datasets or ZVOLs (ZFS volumes, which are block devices that can be mounted as though they were physical disks, with another FS layered over them…or used as the backing store for VMs). Either of those is where we apply encryption, compression, and so forth.
- Finally, datasets are mounted and act as normal directories.
So, you can think of it as disk -> vdev -> pool -> dataset (or zvol).
To add to the fun, you can further divide datasets into sub-datasets (and sub-sub-datasets, ad infinitum), which can have different properties (keys, compression, max sizes, etc.).
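To make that stack concrete, here's a rough sketch of how the layers get built. The by-id paths are placeholders (all eight would really go on that line), and the dataset names and options are illustrative rather than my actual setup:

# zpool create tank raidz2 /dev/disk/by-id/ata-DISK1 /dev/disk/by-id/ata-DISK2 /dev/disk/by-id/ata-DISK3 /dev/disk/by-id/ata-DISK4
# zfs create -o compression=lz4 tank/media
# zfs create tank/media/isos
# zfs create -o encryption=on -o keyformat=passphrase tank/private

The zpool create line is where the vdev happens: raidz2 plus its member disks makes one vdev, and that vdev makes up the pool. Everything after that is datasets hanging off tank, each with its own properties (and sub-datasets like tank/media/isos inherit from their parents unless you override them).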
The “Doing Something”
I had the drives (8 new 8TB drives, accumulated over a few years and waiting–boxed–for their moment). I had the NAS (only one disk failing!). I had Claude to spot-check me on some things, and manuals for the rest. It was time to go.
Step zero was to set up a constant watch of the array status via watch -n 1 zpool status.
The first step was to yank the failing drive, swap a new one into its caddy, and reinsert. Two choices by Past Chris here were very helpful:
- Deciding to purchase a case that used hard-drive caddies/sleds instead of needing to open the case and unscrew things (so, I got to leave the NAS in situ).
- Deciding–and this was a lifesaver–to print out little labels with the leading parts of the serial numbers and place them next to the drive slots. Without labels, figuring out which physical drive to pull would've been an annoying guessing game, with real potential for trouble.
The second step was to run zpool replace tank ata-INITECH_IV8000 /dev/disk/by-id/ata-INITECH_VI9876 (ignore the silly serials). Theoretically we could've/should've run zpool offline tank ata-INITECH_IV8000 first, but yanking the disk out and replacing it left ZFS going "oh, um, guess we're replacing disks!".
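For completeness, the polite, by-the-book version of that loop (same made-up serials) looks roughly like:

# zpool offline tank ata-INITECH_IV8000
# zpool replace tank ata-INITECH_IV8000 /dev/disk/by-id/ata-INITECH_VI9876
# zpool status -v tank

with the physical yank-and-swap happening between the offline and the replace, and zpool status (or the watch from step zero) reporting how the resilver is coming along.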
The thing to note about this whole process is that each disk, being 4TB, took a long time to replace (resilver, in ZFS terminology). Hours and hours–pop one in last thing in the wee hours of the night, and maybe it'd be done by the afternoon. This was annoying to schedule around (especially as my holiday trip drew near) but otherwise uneventful.
Uneventful until it’s not
Except. Except.
The resilvering process basically involves reading from all the other disks in the array and writing the reconstructed data to the new one, generating a tremendous amount of I/O traffic. Because of how hard disks work, all of that activity also generates a lot of heat. The case I'd chosen did not have good air circulation.
And so, about 97% of the way through my third disk swap, another disk started throwing errors.
I checked SMART, and sure enough that disk had registered a max temp of 91°C. Disk drives are not meant to run at 91°C–you really want them in the 30s, or worst-case the 40s. The poor disk was cooking, and thanks to that cooking it was racking up reallocated sectors (a number that kept climbing every time I checked, which was terrifying). That sector reallocation also slowed the whole process down, and so I watched as the array crawled across the finish line, sweating the entire time.
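The grim reading came from something along these lines (same made-up serial as before; exact SMART attribute names vary a bit from vendor to vendor):

# smartctl -A /dev/disk/by-id/ata-INITECH_VI9876 | grep -E 'Temperature_Celsius|Reallocated_Sector_Ct'

On many drives the Temperature_Celsius raw value includes the lifetime min/max, and a Reallocated_Sector_Ct raw value that keeps climbing is the drive politely telling you it's dying.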
Investigating the cooling problems, I discovered that both fans on the back had died and stopped turning at some point in the past, and so I swiped my partner’s office fan to blow on the array while doing all the work. I’d later swap the fans out for Noctuas and then pin them at max, keeping thermals reasonable.
At the very end, I had a single error reported on one of the new drives (according to ZFS; SMART thought it was fine), and cleared it out via zpool clear tank. It's been quiet ever since.
Expanding the pool
So, at this point, I'd gotten all the drives swapped over. Since the new drives are all 8TB instead of 4TB, I needed to expand the pool to use the new capacity. This is a relatively recent feature in ZFS, and I'd been waiting on it for years. This, finally, was the time to do it.
You might expect that this would be complicated. However, you see, ZFS is magic:
# zpool set autoexpand=on tank
Then a zpool scrub tank for good measure, and we're all set. The expanded pool shows 12.8TB used and 28.4TB available. Great success!
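For posterity, the sanity checks look something like:

# zpool list -v tank
# zfs list tank

zpool list shows the raw pool and vdev sizes, while zfs list shows what's actually usable after parity. If the extra capacity hadn't appeared on its own, zpool online -e tank <device>, run for each device, is the manual way of telling ZFS to claim the larger disks.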
Conclusion
ZFS is awesome, and the extra parity of raidz2 (or higher) is good for your stress levels even while your array is literally cooking itself.