Disk Maintenance
This page collects all sorts of information about hard disks and file systems, and how to maintain them.
Glossary
- CHS
- Cylinder-head-sector. A low-level addressing scheme that locates blocks of data on a disk by assigning them an address that is modelled after the physical layout of the disk. The head identifies the platter surface in the vertical stack of platters, the cylinder identifies the circular track on that surface, and the sector identifies the sector within that track. The main difficulty of the CHS addressing scheme is that it depends on the geometry of the disk, and thus is highly disk/vendor-specific. The CHS addressing scheme has been superseded by the simpler LBA scheme.
- Block
- File systems organize data in blocks. These are different from LBA blocks, both in terms of address (because a file system can start in the middle of a disk) and size.
- Degraded
- A RAID array is degraded if it is operational but some of its devices are down.
- Disk geometry
- Describes the physical properties of a disk, e.g. how many heads the disk has. The disk geometry is important when using the CHS addressing scheme.
- GPT
- GUID Partition Table. A standard for the layout of the partition table on a physical hard disk, using globally unique identifiers (GUID). This partition table format is more modern and overcomes the limitations of the MBR partition table format because it uses 64 bits for storing addresses and size values.
- inode
- A data structure that represents a filesystem object, such as a file or a directory. The inode stores information about the filesystem object, not its content; it can thus be seen as storing the metadata of the filesystem object. Every inode has a unique inode number. inodes are exposed by various UNIX commands, such as ls -i /path/to/file. inodes and filesystem blocks are not the same; inodes are at a higher abstraction level.
- Hot spare disk
- A hard disk connected to a RAID system that is acting as a reserve disk. A hot spare disk does not take part in normal RAID activities; it waits on standby until one of the active disks fails, at which time the hot spare disk takes over for the failed disk. Takeover can happen automatically or on manual request, depending on the configuration of the RAID system.
- LBA
- Logical Block Addressing. A simple addressing scheme that linearly assigns integer indexes to blocks of data on a disk, with the first block being LBA 0, the second LBA 1, etc. This addressing scheme no longer requires any knowledge about the geometry of a disk. The Wikipedia article has information on how to calculate an LBA address from a CHS address. There is no standard size for LBA blocks.
- LUKS
- Linux Unified Key Setup. A disk encryption specification. The spec describes the platform-independent on-disk format.
- MBR
- Master Boot Record. A special type of boot sector at the very beginning of a hard disk. Among other things, the MBR also contains the disk's partition table in a format that is different from the more modern GPT format. For disks with 512-byte sectors, an MBR partition table is limited to a size of 2 TB because it uses 32 bits for storing addresses and size values.
- Reallocation
- Disk drives have a relatively small spare area that is not accessible under normal circumstances. If a bad sector is encountered, the disk drive reallocates the bad sector to a good sector from the spare area; from then on, whenever someone tries to read the bad sector, the drive transparently re-routes the request to the good sector in the spare area. Although reallocation keeps the drive in a usable state, it also has a serious drawback: reading from / writing to reallocated sectors slows down the drive's read/write performance. How does reallocation occur? When a disk drive encounters a read error while reading a sector, that sector is added to a list of sectors that are pending reallocation. The number of sectors pending reallocation is reported in SMART attribute 197. Additional read errors for the same sector won't change anything, but if another read of the same sector succeeds, the reallocation occurs, with the successfully read data being stored in the good sector from the spare area. If a write error occurs for a sector, that sector is immediately reallocated, regardless of whether it is currently pending reallocation or not. An example command for checking these counters follows the glossary.
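To see the counters in practice, smartctl (from the smartmontools package, described further down this page) can print the drive's SMART attribute table. A minimal sketch; the attribute names follow smartctl's standard ATA attribute table:
# attribute 5 counts sectors already reallocated,
# attribute 197 counts sectors pending reallocation
smartctl -A /dev/sda | grep -e Reallocated_Sector_Ct -e Current_Pending_Sector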
Disk partitioning
TODO: Write about partition table formats (specifically GPT and MBR) and the two utilities fdisk and parted. Also investigate LVM.
File system
TODO: Write about ext2, ext3, ext4. Provide details about file system checks and how to reboot the system into a state where these can be performed.
This command performs a non-destructive bad-blocks file system check (if -c is specified only once, the check is performed read-only). The option -k specifies that the existing list of bad blocks on the file system should be preserved and any newly found bad blocks should be added; -v simply makes the output verbose.
e2fsck -c -c -k -v /dev/sda8
This command lists the bad blocks of a file system:
dumpe2fs -b /dev/sda8
This command lists detailed information about the file system, such as its UUID, inode count/size and block count/size:
tune2fs -l /dev/sda8
Utilities
hdparm
hdparm is a utility that bypasses the filesystem and directly interacts with the disk via kernel interfaces. Its normal usage is to get/set SATA/IDE device parameters.
Display detailed information about the disk:
# hdparm -I /dev/sda

/dev/sda:

ATA device, with non-removable media
	Model Number:       Hitachi HTS545050B9SA02
	Serial Number:      091213PBL400Q7JVN3ZV
	Firmware Revision:  PB4AC60Q
	Transport:          Serial, ATA8-AST, SATA 1.0a, SATA II Extensions, SATA Rev 2.5, SATA Rev 2.6; Revision: ATA8-AST T13 Project D1697 Revision 0b
Standards:
	Used: unknown (minor revision code 0x0028)
	Supported: 8 7 6 5
	Likely used: 8
Configuration:
	Logical		max	current
	cylinders	16383	16383
	heads		16	16
	sectors/track	63	63
	--
	CHS current addressable sectors:   16514064
	LBA    user addressable sectors:  268435455
	LBA48  user addressable sectors:  976773168
	Logical/Physical Sector size:           512 bytes
	device size with M = 1024*1024:      476940 MBytes
	device size with M = 1000*1000:      500107 MBytes (500 GB)
	cache/buffer size  = 7208 KBytes (type=DualPortCache)
	Form Factor: 2.5 inch
	Nominal Media Rotation Rate: 5400
[...]
Read the contents of a sector by issuing a low-level read. The sector number is specified as a base-10 value. The data is printed in hex format.
# hdparm --read-sector 170418806 /dev/sda

/dev/sda:
reading sector 170418806: succeeded
0c36 3139 312e 3132 312e 2e33 3639 ca08
[...]
Overwrite the contents of a sector by issuing a low-level write. The sector number is specified as a base-10 value. The data written consists of zero-value bytes. WARNING: EXTREMELY DANGEROUS!!!
# hdparm --write-sector 170418806 /dev/sda
dd
TODO: Write about the low-level utility dd.
blkid
The utility blkid is useful to list block devices and their UUIDs. A device's UUID can then be used to add a stable entry to /etc/fstab, i.e. an entry which does not need to be changed even if the block device is a removable disk of some sort whose device node might change each time it is plugged in.
Note: blkid does not list dm-crypt devices; the UUIDs for these devices must be found by a filesystem-specific utility which examines the device node under /dev/mapper. Example: btrfs filesystem show /dev/mapper/encrypted-backup.
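As an illustration, an fstab entry keyed on such a UUID might look like this (the mount point, filesystem type and mount options are assumptions; substitute the UUID that blkid actually reports):
# /etc/fstab - mount by filesystem UUID instead of by device node
UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx  /mnt/backup  ext4  defaults,noauto  0  2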
SMART
SMART (Self-Monitoring, Analysis and Reporting Technology) is a low-level monitoring technology built into most modern hard disk drives. Debian provides two packages with utilities that can be used to control the SMART features of a hard disk:
- smartmontools
- libatasmart-bin
I have only tried out smartmontools, so here's some information about the package. Installing the package provides
- A continuously running daemon smartd
- The command-line utility smartctl
Print SMART information
With the 500 GB Hitachi (IBM) drives in my MacMini I can output all sorts of interesting information with this command. Using -x also prints out non-SMART information about the drive (e.g. temperature statistics); if you only want SMART data then use -a instead of -x.
smartctl -x /dev/sda
Drive detection fails
For some drives smartctl fails to autodetect the device type. Example:
$ sudo smartctl -a /dev/sdd
[...]
/dev/sdd: Unknown USB bridge [0x1058:0x2620 (0x1020)]
Please specify device type with the -d option.
[...]
In such a case you must manually specify the device type using the -d option. Read the man page for details on what values can be specified for the -d option.
If you don't know anything about the device type, but you see a filesystem-specific device file (e.g. /dev/sdd1 instead of just /dev/sdd), then you can try to specify this specific device file to smartctl to obtain some basic initial information about the drive:
$ sudo smartctl -a /dev/sdd1
[...]
=== START OF INFORMATION SECTION ===
Vendor:               WD
Product:              Elements 2620
[...]
SMART support is:     Unavailable - device lacks SMART capability.
[...]
Equipped with this information you may be able to find out more about the drive via Internet research. A typical device type to try is sat, for SCSI to ATA Translation (SAT). This can be specified as follows:
smartctl -a -d sat /dev/sdd
or, even more specifically, by forcing a byte length for the "ATA PASS THROUGH SCSI command":
smartctl -a -d sat,16 /dev/sdd
Testing a drive
SMART offers a number of different types of tests that you can perform:
- Offline test. A relatively quick test (ca. 10 minutes on one of my Hitachi drives). I don't know exactly what checks this test performs.
- Short self-test: Another relatively quick test (ca. 2 minutes on one of my Hitachi drives) that checks the electrical and mechanical performance as well as the read performance of the disk.
- Long self-test: A long test (ca. 2.5 hours on one of my Hitachi drives) that performs more thorough checks than the short self-test.
- Selective self-test: With this test you can specify up to five ranges of LBAs to check instead of the entire disk. The duration of this test obviously varies with the number of LBAs that need to be checked. I am not sure if the checks performed are the same as in the short or in the long self-test, but I assume it's the latter.
- For more tests see "man smartctl"
All tests are initiated with the -t option. Use -c to check whether a test is still running. Once complete, the results of a test can be viewed with the -l option, followed by the appropriate log type. Only one test can be running at a time; a running test can be aborted with -X. Here are the commands:
# Run offline test
smartctl -t offline /dev/sda
# Check whether the test is still running
smartctl -c /dev/sda
# View test results once the test is complete
smartctl -l xerror /dev/sda

# Run short self-test, check whether it's still running, then view final results
smartctl -t short /dev/sda
smartctl -c /dev/sda
smartctl -l xselftest /dev/sda

# Run long self-test, check whether it's still running, then view final results
smartctl -t long /dev/sda
smartctl -c /dev/sda
smartctl -l xselftest /dev/sda

# Run selective self-test, check whether it's still running, then view final results
smartctl -t select,154135527-154135527 /dev/sda
smartctl -c /dev/sda
smartctl -l xselftest /dev/sda
# View details of the five LBA spans
smartctl -l selective /dev/sda
Here's the example output after I ran a couple of self-tests:
$ smartctl -l xselftest /dev/sda
=== START OF READ SMART DATA SECTION ===
SMART Extended Self-test Log Version: 1 (1 sectors)
Num  Test_Description    Status                    Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       50%        43263        153908424
# 2  Selective offline   Completed without error       00%        43262        -
# 3  Short offline       Completed without error       00%        43262        -
md
Summary
md stands for "multiple device" and is usually used to run a software-level RAID system. mdadm is the command-line utility used to configure and maintain the RAID system.
md assembles same-size partitions on multiple physical devices into a single logical md device. The conventional physical device is a SATA hard disk, but the md system is flexible enough to allow any block device to be used to form an md device - even USB disks or network devices. However, if a device is prone to becoming unresponsive even for short times (e.g. a USB hard disk that is spinning down) then it is probably unsuitable for md, because the md system might decide that the device has failed.
md devices
md devices are exposed by the Linux kernel as /dev/mdX block devices. The devices are also available as /dev/md/X, which are symlinks pointing to the actual device nodes. Example:
root@pelargir:~# ls -l /dev/md
total 0
lrwxrwxrwx 1 root root 6 Jun  2 03:28 0 -> ../md0
lrwxrwxrwx 1 root root 6 Jun  2 03:28 1 -> ../md1
lrwxrwxrwx 1 root root 6 Jun  2 03:28 2 -> ../md2
root@pelargir:~# ls -l /dev/md?
brw-rw---- 1 root disk 9, 0 Jun  2 03:28 /dev/md0
brw-rw---- 1 root disk 9, 1 Jun  2 03:28 /dev/md1
brw-rw---- 1 root disk 9, 2 Jun  2 03:28 /dev/md2
/proc/mdstat
The Linux kernel exposes the status of all md devices in the file /proc/mdstat. However, if an md device is stopped it is not listed there. Example:
root@pelargir:~# cat /proc/mdstat
Personalities : [raid1]
md2 : active raid1 sda6[0](F) sdb6[1]
      482394112 blocks super 1.2 [2/1] [_U]
      bitmap: 3/4 pages [12KB], 65536KB chunk

md1 : active raid1 sda5[0] sdb5[1]
      3903488 blocks super 1.2 [2/2] [UU]

md0 : active raid1 sda1[0] sdb1[1]
      1950720 blocks super 1.2 [2/2] [UU]

unused devices: <none>
Discussion (taken from the Linux RAID wiki):
- The "Personalities" line states which RAID levels the Linux kernel currently supports. Kernel modules add support for more RAID levels.
- The details for each md device are given on multiple lines. The first line is prefixed with the name of the md device (e.g. "md2 :").
- Line 1
  - The first line states whether the device is "active" or "inactive". An inactive device is usually faulty.
  - The first line also states the device's RAID level.
  - Finally, the first line lists all physical devices that are part of the md device. Each physical device has a suffix "[n]" which indicates the physical device's role in the RAID system. For instance, in a RAID system that consists of 2 devices:
    - The roles [0] and [1] denote the first and the second disk of the RAID system
    - Any roles with higher numbers (e.g. [2]) denote spare disks
  - If a physical device has failed, the indicator "(F)" appears after the physical device's RAID role suffix. This can be observed in the example above for the md2 device.
- Line 2
  - The second line provides information about the md device's size (e.g. "482394112 blocks") and the superblock format that is in use (e.g. "super 1.2").
  - The "[n/m]" indicator on the second line states that the RAID system consists of "n" devices, of which "m" devices are currently active. If "m" is lower than "n" the RAID system is in trouble!
  - The "[UU]" indicator lists the status of each physical device: "U" = the device is up, "_" = the device is down.
- Line 3
  - This line appears only if the md device has a so-called "write-intent bitmap". I won't pretend to understand what this is, so read it up yourself.
Replacing a disk
If the md system detects a problem with a physical device, it marks that device as "faulty" and stops using it. The device is said to have failed.
Depending on the RAID level and the current situation, the md device as a whole may still be able to function. If not, then data recovery of a whole different level is in order; this is not described here.
Now that we have established that the md device still works, the following situations can be distinguished:
- If the md device has spare disks, the md system automatically starts to use the first available spare disk and begins reconstruction of the data that was on the failed disk.
- If the md device has no spare disks, the md device is now running in "degraded" mode. This is a dangerous situation because there is no redundancy left in the RAID system. If another disk fails, the entire md device will fail!
Regardless of whether there are spare disks, the faulty disk must now be replaced. The procedure works like this:
- The faulty disk must first be removed from the md device. Example:
mdadm /dev/md2 --remove /dev/sda6
- Now you can physically remove the faulty disk from the system and replace it with a good one
- TODO: Do I need to power down the system to do this? Apparently the SATA interface was designed with hotplug support, but not all chipsets may support it.
- TODO: What do we need to do with the new disk?
- Boot sector?
- Partitioning?
- UUIDs in /etc/mdadm/mdadm.conf?
- Add the new device. This triggers a rebuild of the data onto the new device:
mdadm /dev/md2 --add /dev/sda6
One might decide that a physical device needs to be proactively replaced (e.g. because SMART reports indicate that the device has a problem or nears its end of life). In such a case the device needs to be manually marked as faulty. After that, the same procedure as described above applies. The following example command marks the physical device sdc2 as faulty on the md1 logical device:
mdadm --manage --set-faulty /dev/md1 /dev/sdc2
Monitoring
mdadm can be run in automated monitoring mode. On a Debian system this is enabled in /etc/default/mdadm by setting
START_DAEMON=true
On Debian you also have the option to periodically run redundancy checks over md devices using the utility checkarray. This is enabled in /etc/default/mdadm by setting
AUTOCHECK=true
(the actual check is by default run on the first Sunday of each month; see /etc/cron.d/mdadm)
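To run such a redundancy check manually, the same script that the cron job invokes can be called directly. A sketch, assuming the usual Debian location of the script:
# check a single array now, or all arrays (as the monthly cron job does)
/usr/share/mdadm/checkarray /dev/md2
/usr/share/mdadm/checkarray --all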
What happens if I forget to remove a disk from the RAID array before it is replaced?
The situation
In the section "Replacing a disk" I wrote that you need to remove a disk from the md device before you can replace it. But what happens if you "forget" to perform this removal?
This is the situation:
- md has marked a disk as faulty and the RAID array is in a degraded state
- You don't remove the disk from the RAID array; instead you simply shut down the system
- You replace the faulty disk with a new, good one and reboot the system
The result is that the system reboots fine, but neither the old, faulty disk nor the new, good disk is listed in /proc/mdstat:
root@pelargir:~# cat /proc/mdstat
Personalities : [raid1]
md2 : active raid1 sda6[1]
      482394112 blocks super 1.2 [2/1] [_U]
      bitmap: 4/4 pages [16KB], 65536KB chunk

md1 : active (auto-read-only) raid1 sda5[1]
      3903488 blocks super 1.2 [2/1] [_U]

md0 : active raid1 sda1[1]
      1950720 blocks super 1.2 [2/1] [_U]

unused devices: <none>
mdadm --detail reports the array as degraded and lists the missing device with state "removed":
root@pelargir:~# mdadm --detail /dev/md2
/dev/md2:
        Version : 1.2
  Creation Time : Wed Jun  1 11:44:45 2016
     Raid Level : raid1
     Array Size : 482394112 (460.05 GiB 493.97 GB)
  Used Dev Size : 482394112 (460.05 GiB 493.97 GB)
   Raid Devices : 2
  Total Devices : 1
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Thu Nov 10 20:44:50 2016
          State : clean, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

           Name : GDP-LIN-228:2
           UUID : ab9e3d36:c8fd57b7:99789424:8e585f00
         Events : 577151

    Number   Major   Minor   RaidDevice State
       0       0        0        0      removed
       1       8        6        1      active sync   /dev/sda6
The new, good disk is not formatted, nor even partitioned correctly:
root@pelargir:~# fdisk -l /dev/sdb
Disk /dev/sdb: 465.8 GiB, 500107862016 bytes, 976773168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x00071581

Device     Boot Start       End   Sectors   Size Id Type
/dev/sdb1        2048 976769023 976766976 465.8G  7 HPFS/NTFS/exFAT
The solution, part 1
First, remove the missing device from the RAID array. The problem here is that the block device is no longer present, so you can't use the normal command, because that would reference a non-existent device:
mdadm /dev/md2 --remove /dev/sdb6
Instead you can issue the following command, which removes all devices that are no longer connected to the system:
mdadm /dev/md0 --remove detached
Kudos: Found this here: http://www.linuxquestions.org/questions/linux-newbie-8/mdadm-cannot-remove-failed-drive-drive-name-changed-782827/
Note that the keyword "detached" can also be used with "mdadm --fail", to mark disconnected devices as failed so that they can then be removed.
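A sketch combining both, based on the example in the mdadm man page (using the md2 device from above):
# mark all disconnected devices as failed, then remove them
mdadm /dev/md2 --fail detached --remove detached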
The solution, part 2
Now you can partition the disk using fdisk to match the partitions of the other disks. The following command lists the partitioning of the disk that is still good:
root@pelargir:~# fdisk -l /dev/sda
Disk /dev/sda: 465.8 GiB, 500107862016 bytes, 976773168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x3ba6d316

Device     Boot    Start       End   Sectors   Size Id Type
/dev/sda1  *        2048   3905535   3903488   1.9G fd Linux raid autodetect
/dev/sda2        3907582 976771071 972863490 463.9G  5 Extended
/dev/sda5        3907584  11718655   7811072   3.7G fd Linux raid autodetect
/dev/sda6       11720704 976771071 965050368 460.2G fd Linux raid autodetect
All you have to do now is manually re-create this partitioning in fdisk's interactive mode, and then format the partitions with a filesystem. This is left as an exercise for the reader.
Note: I don't know if there is an automated way to clone the partition table from one disk to another. I read something about a utility called sfdisk, but when I read that even the partition UUIDs are cloned I became hesitant, and in the end I decided to perform the partitioning manually (including setting the partition type "fd" = Linux raid autodetect, and marking the first partition as bootable).
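For reference, the sfdisk idiom I read about looks like this; it is untested here, and with an MBR disk label it also copies the disk identifier:
# dump the good disk's partition table and replay it onto the new disk
sfdisk -d /dev/sda | sfdisk /dev/sdb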
The solution, part 3
Finally, add the newly partitioned and formatted block devices to the RAID array:
mdadm /dev/md2 --add /dev/sdb6
Note that when you do this, the RAID array immediately starts to rebuild. You can watch the progress with this command:
root@pelargir:~# mdadm --detail /dev/md0
/dev/md0:
        Version : 1.2
  Creation Time : Wed Jun  1 11:43:31 2016
     Raid Level : raid1
     Array Size : 482394112 (460.05 GiB 493.97 GB)
  Used Dev Size : 482394112 (460.05 GiB 493.97 GB)
   Raid Devices : 2
  Total Devices : 2
    Persistence : Superblock is persistent

    Update Time : Thu Nov 10 21:31:30 2016
          State : clean, degraded, recovering
 Active Devices : 1
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 1

 Rebuild Status : 30% complete

           Name : GDP-LIN-228:2
           UUID : ab9e3d36:c8fd57b7:99789424:8e585f00
         Events : 577151

    Number   Major   Minor   RaidDevice State
       2       8       22        0      spare rebuilding   /dev/sdb6
       1       8        6        1      active sync        /dev/sda6
A final note: the configuration file /etc/mdadm/mdadm.conf lists md devices with a UUID. When you run blkid, this utility also prints out UUIDs for the md devices. I noticed that the two sets of UUIDs do not match, but this is OK: mdadm.conf records the array UUID stored in the md superblock, while blkid reports the UUID of the filesystem inside the array. In any case, I had no problem running such a setup for over a year.
Hard disk encryption with LUKS
References
- HOWTO article that shows step-by-step how to configure a LUKS/dm-crypt partition using cryptsetup. Almost all of the information in the following sections comes from this article.
- Block-layer Encryption (archive.org link). An in-depth article about LUKS/dm-crypt, starting with theory and ending with many practical examples about its usage.
- man cryptsetup
Glossary
- LUKS
- Linux Unified Key Setup. A specification that describes a platform-independent on-disk format. The format consists of three parts:
- A standardized header at the start of the device
- A key-slot area directly behind the header
- And the bulk data area behind that.
- LUKS container
- The set consisting of three parts (header, key-slot area, data area) described by the LUKS specification.
- LUKS device
- The physical device that a LUKS container resides on.
- dm-crypt
- A disk encryption backend in the Linux kernel. dm-crypt can be used to create a partition that conforms to the LUKS specification, or it can be used raw/standalone, in which case the dm-crypt partition does not use the LUKS format.
- cryptsetup
- The user front-end tool to configure a LUKS/dm-crypt partition.
Packages
The Debian package to install is cryptsetup.
An optional package to monitor the progress of dd is pv.
Initialize a LUKS partition
The following command initializes an existing partition to use the LUKS format. Warning: This will destroy all data on the partition!!!
cryptsetup --verify-passphrase --verbose luksFormat /dev/sda1
Discussion:
- You must enter the passphrase interactively. Because of --verify-passphrase, the command asks for the passphrase twice. The passphrase you enter here cannot be recovered later, so make sure you write it down somewhere secure.
- Alternatively you can use the --key-file option to specify a file that contains the passphrase. The file name "-" causes the command to read the passphrase from stdin.
- The encryption key that is protected by the passphrase is assigned to key-slot 0. You can use the --key-slot option to override this. The LUKS partition has a total of 8 key-slots.
- The cipher used for encryption is the LUKS default cipher (aes-xts-plain64 at the time of writing). Use the --cipher option to specify a different cipher (a combined example follows this list).
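A hypothetical variant that combines some of the options discussed above (the key-slot number and cipher are just examples, not recommendations):
cryptsetup --verify-passphrase --key-slot 1 --cipher aes-xts-plain64 luksFormat /dev/sda1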
Open / close a LUKS partition
The next step is to "open" the LUKS partition, making it accessible as a normal block device under a user-friendly name ("encrypted-backup" in this example). Note: You have to enter the passphrase interactively in order to open the LUKS partition. Use the --key-file option to specify a file that contains the passphrase.
cryptsetup luksOpen /dev/sda1 encrypted-backup
The LUKS partition is now accessible to any filesystem-related commands (typically mount) as
/dev/mapper/encrypted-backup
When you no longer need the LUKS partition you can close it:
cryptsetup luksClose encrypted-backup
Behind the scenes, opening a LUKS partition generates a dm-crypt device and adds a mapping from that device to the requested user-friendly name. If you want to know the dm-crypt device name you can have a look at the mapping like this:
pi@raspberrypi1:~$ ls -l /dev/mapper/encrypted-backup
lrwxrwxrwx 1 root root 7 Dec 29 18:01 /dev/mapper/encrypted-backup -> ../dm-0
Inspect a LUKS partition
To see the status of a LUKS partition, including the cipher used for encryption, you use this command:
cryptsetup status encrypted-backup
The next command dumps the LUKS header information. Warning: From my limited understanding of the cryptographic background, this information may contain sensitive material!
cryptsetup luksDump /dev/sda1
Managing passphrases
A LUKS container can have up to 8 passphrases. Each passphrase is assigned to a key-slot, numbered in the range from 0 to 7.
The following command changes the passphrase for key-slot 0. Use the --key-slot option to specify a different key-slot. Obviously, the old passphrase must be entered to authorize the change.
cryptsetup luksChangeKey /dev/sda1
Warning: A media failure during this operation can cause the overwrite to fail after the old passphrase has been wiped and make the LUKS container permanently inaccessible! For this reason it is better/safer to first add a new passphrase, and then remove the old passphrase after the new one was added successfully. See the next two commands.
The following command adds a new passphrase to the first empty key-slot. Use the --key-slot option to specify a different key-slot. One of the passphrases already present on the LUKS device must be entered to authorize the change.
cryptsetup luksAddKey /dev/sda1
The following command removes a passphrase from the LUKS device. Whichever passphrase is entered is removed.
cryptsetup luksRemoveKey /dev/sda1
Warning: Removing the last passphrase makes a LUKS container permanently inaccessible! If you enter the passphrase interactively or supply it via the --key-file option, the command will warn you about the impending disaster (but you can still go ahead and ignore the warning). If you supply the passphrase via stdin, the command will give no warning because it works in batch mode.
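If you know which key-slot you want to wipe (see luksDump above), luksKillSlot removes a slot by number instead of by matching passphrase; a passphrase from one of the remaining slots must be entered to authorize this. The slot number 1 here is just an example:
cryptsetup luksKillSlot /dev/sda1 1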
Create a filesystem on a LUKS partition
In the previous sections we have seen how to
- Initialize a LUKS device and create a LUKS container on the device (luksFormat)
- Open the LUKS container (luksOpen)
We are now ready to create a filesystem in the data area of the LUKS container. Example for btrfs:
mkfs.btrfs -f /dev/mapper/encrypted-backup
After that we can mount the filesystem as usual:
mount /dev/mapper/encrypted-backup /mnt/encrypted-backup
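For completeness, the teardown mirrors the setup in reverse order (using the names from the examples above):
umount /mnt/encrypted-backup
cryptsetup luksClose encrypted-backup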
Randomize data on LUKS container
Before we create a filesystem on a LUKS container, we should make sure that anyone looking at the raw device will only see random data.
The simplest way to achieve this is to write zeroes into the data area using the dd tool:
dd if=/dev/zero of=/dev/mapper/encrypted-backup
Note that we are writing to the dm-crypt device, not the raw device! This ensures that the stream of zeroes is first encrypted before it is written to the physical disk. Also, the LUKS header and key-slot areas remain untouched.
The process of randomizing the LUKS container can take a very long time. If you want to have a progress report you can send the SIGUSR1 signal to the dd process, or you can use this alternative command line:
pv -tpreb /dev/zero | dd of=/dev/mapper/encrypted-backup bs=128M
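For the SIGUSR1 variant, a minimal sketch that assumes exactly one dd process is running (GNU dd prints its I/O statistics to stderr when it receives the signal):
kill -USR1 $(pgrep -x dd)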
Recipes
Map a physical sector to a filesystem block
If the disk drive encounters a read/write error, it will report the number of the bad sector to the kernel and thus to the syslog. If the filesystem does not also report the filesystem block, it may be necessary to find out which filesystem block contains the physical sector.
TODO: Write how to do this.
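In the meantime, here is the calculation that is commonly cited (e.g. in the smartmontools bad-block HOWTO), filled in with the numbers from this page's examples; the 4096-byte filesystem block size is an assumption (check with tune2fs -l):
# fs_block = (bad_sector - partition_start_sector) * 512 / fs_block_size
# bad sector 170418806 lies on /dev/sda6, which starts at sector 11720704
echo $(( (170418806 - 11720704) * 512 / 4096 ))   # prints 19837262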
Force reallocation of a bad sector
The glossary explains that reallocation only happens if a write error occurs, so in order to force reallocation we must write data to the bad sector. Let's look at a syslog fragment:
May 15 06:28:17 pelargir kernel: [6849466.884776] ata1.00: exception Emask 0x0 SAct 0x7000003f SErr 0x0 action 0x0
May 15 06:28:17 pelargir kernel: [6849466.885148] ata1.00: irq_stat 0x40000008
May 15 06:28:17 pelargir kernel: [6849466.885738] ata1.00: failed command: READ FPDMA QUEUED
May 15 06:28:17 pelargir kernel: [6849466.886572] ata1.00: cmd 60/08:e0:6f:62:28/00:00:0a:00:00/40 tag 28 ncq 4096 in
May 15 06:28:17 pelargir kernel: [6849466.886572]          res 51/40:08:76:62:28/00:00:0a:00:00/40 Emask 0x409 (media error) <F>
May 15 06:28:17 pelargir kernel: [6849466.888280] ata1.00: status: { DRDY ERR }
May 15 06:28:17 pelargir kernel: [6849466.889113] ata1.00: error: { UNC }
May 15 06:28:17 pelargir kernel: [6849466.892172] ata1.00: configured for UDMA/133
May 15 06:28:17 pelargir kernel: [6849466.892197] sd 0:0:0:0: [sda] tag#28 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
May 15 06:28:17 pelargir kernel: [6849466.892203] sd 0:0:0:0: [sda] tag#28 Sense Key : Medium Error [current] [descriptor]
May 15 06:28:17 pelargir kernel: [6849466.892207] sd 0:0:0:0: [sda] tag#28 Add. Sense: Unrecovered read error - auto reallocate failed
May 15 06:28:17 pelargir kernel: [6849466.892212] sd 0:0:0:0: [sda] tag#28 CDB: Read(10) 28 00 0a 28 62 6f 00 00 08 00
May 15 06:28:17 pelargir kernel: [6849466.892216] blk_update_request: I/O error, dev sda, sector 170418806
First we see that "auto reallocate failed". We also see that the bad sector is 170418806. To force the reallocation we must write to the sector - this will fail and thus trigger the drive's reallocation process.
Before we begin, let's make sure that the sector is actually bad by reading its content.
# hdparm --read-sector 170418806 /dev/sda

/dev/sda:
reading sector 170418806: FAILED: Input/output error
Only if the read failed do we continue with the write. IMPORTANT: Writing to the sector causes data loss!!!
hdparm --write-sector 170418806 /dev/sda
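Afterwards the reallocation can be verified: the low-level read should now succeed, and SMART attribute 5 (Reallocated_Sector_Ct) should have increased while attribute 197 decreases:
hdparm --read-sector 170418806 /dev/sda
smartctl -A /dev/sda | grep -e Reallocated_Sector_Ct -e Current_Pending_Sector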
Mark a filesystem block as bad
Marking a filesystem block as bad will prevent the filesystem from accessing the physical sectors in that block. This has the following effects:
- Sectors that are pending reallocation will remain in that state because neither reads nor writes will occur anymore for these sectors. This is good because it does not waste the disk's spare area for unnecessary reallocations. This can also be annoying because SMART monitoring tools may unnecessarily warn about the disk's perpetual state of pending reallocations (SMART attribute 197).
- Sectors that are already reallocated will no longer be accessed, thus the drive's performance will not degrade.
TODO: Write how to do this.
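One possible approach for ext2/3/4, untested by me: e2fsck's -l option adds the block numbers listed in a file to the filesystem's bad-block inode. The block number below is the hypothetical one computed in the previous recipe, and the filesystem must not be mounted:
echo 19837262 > /tmp/bad-block-list
e2fsck -l /tmp/bad-block-list /dev/sda6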