Disk Maintenance


This page collects all sorts of information about hard disks and file systems, and how to maintain them.


Glossary

CHS
Cylinder-head-sector. A low-level addressing scheme that locates blocks of data on a disk by assigning them an address that is modelled after the physical layout of the disk. The head identifies the read/write head (i.e. the platter surface) in the vertical stack of platters, the cylinder identifies the circular track on that surface, and the sector identifies the sector within that track. The main difficulty of the CHS addressing scheme is that it depends on the geometry of the disk, and thus is highly disk/vendor-specific. The CHS addressing scheme has been superseded by the simpler LBA scheme.
Block
File systems organize data in blocks. These are different from LBA blocks, both in terms of address (because a file system can start in the middle of a disk) and size.
Degraded
A RAID array is degraded if it is operational but some of its devices are down.
Disk geometry
Describes the physical properties of a disk, e.g. how many heads the disk has. The disk geometry is important when using the CHS addressing scheme.
GPT
GUID Partition Table. A standard for the layout of the partition table on a physical hard disk, using globally unique identifiers (GUID). This partition table format is more modern and overcomes the limitations of the MBR partition table format because it uses 64 bits for storing addresses and size values.
inode
A data structure that represents a filesystem object, such as a file or a directory. The inode stores information about the filesystem object, not its content, so it can be seen as storing the metadata of the filesystem object. Every inode has a unique inode number. inodes are exposed by various UNIX commands, such as ls -i /path/to/file. inodes and filesystem blocks are not the same thing; inodes are on a higher abstraction level.
Hot spare disk
A hard disk connected to a RAID system that is acting as a reserve disk. A hot spare disk does not take part in normal RAID activities, it waits on standby until one of the active disks fails, at which time the hot spare disk takes over for the failed disk. Takeover can happen automatically or on manual request, depending on the configuration of the RAID system.
LBA
Logical Block Addressing. A simple addressing scheme that linearly assigns integer indexes to blocks of data on a disk, with the first block being LBA 0, the second LBA 1, etc. This addressing scheme no longer requires any knowledge about the geometry of a disk. The Wikipedia article has information on how to calculate the LBA address from a CHS address (a sketch of the conversion follows at the end of this glossary). There is no standard size for LBA blocks.
LUKS
Linux Unified Key Setup. A disk encryption specification. The spec describes the platform-independent on-disk format.
MBR
Master Boot Record. A special type of boot sector at the very beginning of a hard disk. Among other things, the MBR also contains the disk's partition table in a format that is different from the more modern GPT format. For disks with 512-byte sectors, an MBR partition table is limited to a size of 2 TB because it uses 32 bits for storing addresses and size values.
Reallocation
Disk drives have a relatively small spare area that is not accessible under normal circumstances. If a bad sector is encountered, the disk drive reallocates the bad sector to a good sector from the spare area - from then on, whenever someone tries to read the bad sector, the drive transparently re-routes the request to the good sector in the spare area. Although reallocation keeps the drive in a usable state, it also has a serious drawback: reading from / writing to reallocated sectors slows down the drive's read/write performance. How does reallocation occur? When a disk drive encounters a read error while reading a sector, that sector is added to a list of sectors that are pending reallocation. The number of sectors pending reallocation is reported in SMART attribute 197. Additional read errors for the same sector won't change anything, but if another read of the same sector succeeds, the reallocation occurs, with the successfully read data being stored in the good sector from the spare area. If a write error occurs for a sector, that sector is immediately reallocated, regardless of whether it is currently pending reallocation or not.
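
The LBA glossary entry above mentions converting a CHS address into an LBA address. As a hedged sketch (this is the commonly cited formula; the geometry values below are taken from the hdparm example further down this page):

LBA = (C * heads_per_cylinder + H) * sectors_per_track + (S - 1)

For example, with 16 heads per cylinder and 63 sectors per track, the CHS address C=2, H=3, S=4 becomes (2*16 + 3)*63 + (4-1) = 2208.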


Disk partitioning

TODO: Write about partition table formats (specifically GPT and MBR) and the two utilities fdisk and parted. Also investigate LVM.
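
Until that section is written, here is a minimal sketch of how the two utilities mentioned above can be used to inspect an existing partition table (the device name is just an example):

# List the partition table with fdisk (handles both MBR and GPT disks)
fdisk -l /dev/sda
# List the partition table with parted, including the partition table type (msdos or gpt)
parted /dev/sda print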


File system

TODO: Write about ext2, ext3, ext4. Provide details about file system checks and how to reboot the system into a state where these can be performed.
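
Until then, a hedged sketch of two common ways to get the root filesystem checked at the next boot on a Debian-like system (a check of the root filesystem cannot be done while it is mounted read-write):

# Classic sysvinit approach: create the flag file, then reboot
touch /forcefsck && reboot
# systemd approach: boot once with the kernel parameter fsck.mode=force
# (e.g. by editing the boot entry in the GRUB menu)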

The following command performs a non-destructive read-write bad blocks check (because -c is specified twice; if -c is specified only once, the check is performed read-only). The option -k specifies that the existing list of bad blocks on the file system should be preserved and that any newly found bad blocks should be added to it.

e2fsck -c -c -k -v /dev/sda8

This command lists the bad blocks of a file system:

dumpe2fs -b /dev/sda8

This command lists detailed information about the file system, such as its UUID, inode count/size and block count/size:

tune2fs -l /dev/sda8


Utilities

hdparm

hdparm is a utility that bypasses the filesystem and interacts directly with the disk via kernel interfaces. Its normal usage is to get/set SATA/IDE device parameters.

Display a large amount of information about the disk:

# hdparm -I /dev/sda

/dev/sda:

ATA device, with non-removable media
	Model Number:       Hitachi HTS545050B9SA02                 
	Serial Number:      091213PBL400Q7JVN3ZV
	Firmware Revision:  PB4AC60Q
	Transport:          Serial, ATA8-AST, SATA 1.0a, SATA II Extensions, SATA Rev 2.5, SATA Rev 2.6; Revision: ATA8-AST T13 Project D1697 Revision 0b
Standards:
	Used: unknown (minor revision code 0x0028) 
	Supported: 8 7 6 5 
	Likely used: 8
Configuration:
	Logical		max	current
	cylinders	16383	16383
	heads		16	16
	sectors/track	63	63
	--
	CHS current addressable sectors:   16514064
	LBA    user addressable sectors:  268435455
	LBA48  user addressable sectors:  976773168
	Logical/Physical Sector size:           512 bytes
	device size with M = 1024*1024:      476940 MBytes
	device size with M = 1000*1000:      500107 MBytes (500 GB)
	cache/buffer size  = 7208 KBytes (type=DualPortCache)
	Form Factor: 2.5 inch
	Nominal Media Rotation Rate: 5400
[...]

Read the contents of a sector by issuing a low-level read. The sector number is specified as a base10 value. The data is printed in hex format.

# hdparm --read-sector 170418806 /dev/sda

/dev/sda:
reading sector 170418806: succeeded
0c36 3139 312e 3132 312e 2e33 3639 ca08
[...]

Overwrite the contents of a sector by issuing a low-level write. The sector number is specified as a base10 value. The data written consists of zero-value bytes. WARNING: EXTREMELY DANGEROUS !!!

# hdparm --write-sector 170418806 /dev/sda


dd

TODO: Write about the low-level utility dd.
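
Until that section is written, a minimal hedged example (the device names are placeholders - dd overwrites data without asking, so double-check them):

# Clone one disk to another; status=progress requires GNU coreutils dd
dd if=/dev/sda of=/dev/sdb bs=4M status=progress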


blkid

The utility blkid is useful to list block devices and their UUIDs. A device's UUID can then be used to add a stable entry to /etc/fstab, i.e. an entry which does not need to be changed even if the block device is a removable disk of some sort whose device node might change each time it is plugged in.

Note: blkid does not list dm-crypt devices; the UUIDs for these devices must be found with a filesystem-specific utility that examines the device node under /dev/mapper. Example: btrfs filesystem show /dev/mapper/encrypted-backup.
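
A hedged example of the workflow (the UUID and the mount point are made up for illustration):

# List block devices and their UUIDs
blkid
#   /dev/sdd1: UUID="0a1b2c3d-1111-2222-3333-444455556666" TYPE="ext4" ...
# Stable /etc/fstab entry that refers to the UUID instead of the device node
UUID=0a1b2c3d-1111-2222-3333-444455556666  /mnt/backup  ext4  defaults,noauto  0  2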


SMART

SMART (Self-Monitoring, Analysis and Reporting Technology) is a low-level monitoring technology built into most modern hard disk drives. Debian provides two packages with utilities that can be used to control the SMART features of a hard disk:

  • smartmontools
  • libatasmart-bin


I have only tried out smartmontools, so here's some information about the package. Installing the package provides

  • A continuously running daemon smartd
  • The command-line utility smartctl


Print SMART information

With the 500 GB Hitachi (IBM) drives in my MacMini I can output all sorts of interesting information with this command. Using -x also prints non-SMART information about the drive (e.g. temperature statistics); if you only want SMART data, use -a instead of -x.

smartctl -x /dev/sda


Drive detection fails

For some drives smartctl fails to autodetect the device type. Example:

$ sudo smartctl -a  /dev/sdd
[...]

/dev/sdd: Unknown USB bridge [0x1058:0x2620 (0x1020)]
Please specify device type with the -d option.
[...]

In such a case you must manually specify the device type using the -d option. Read the man page for details on what values can be specified for the -d option.

If you don't know anything about the device type, but you see a filesystem-specific device file (e.g. /dev/sdd1 instead of just /dev/sdd), then you can try to specify this specific device file to smartctl to obtain some basic initial information about the drive:

$ sudo smartctl -a  /dev/sdd1
[...]

=== START OF INFORMATION SECTION ===
Vendor:               WD
Product:              Elements 2620
[...]
SMART support is:     Unavailable - device lacks SMART capability.
[...]

Equipped with this information you may be able to find out more about the drive via Internet research. A typical device type to try is sat (for SCSI to ATA Translation (SAT)). This can be specified as follows:

smartctl -a -d sat /dev/sdd

or, even more specifically, by forcing a byte length for the SCSI "ATA PASS THROUGH" command:

smartctl -a -d sat,16 /dev/sdd


Testing a drive

SMART offers a number of different types of tests that you can perform:

  • Offline test: A relatively quick test (ca. 10 minutes on one of my Hitachi drives). I don't know exactly what checks this test performs.
  • Short self-test: Another relatively quick test (ca. 2 minutes on one of my Hitachi drives) that checks the electrical and mechanical performance as well as the read performance of the entire disk.
  • Long self-test: A long test (ca. 2.5 hours on one of my Hitachi drives) that performs more thorough checks than the short self-test.
  • Selective self-test: With this test you can specify up to five ranges of LBAs to check instead of the entire disk. The duration of this test obviously varies with the number of LBAs that need to be checked. I am not sure if the checks performed are the same as in the short or in the long self-test, but I assume it's the latter.
  • For more tests see "man smartctl"


All tests are initiated with the -t option. Use -c to check whether a test is still running. Once complete, the results of a test can be viewed with the -l option, followed by the appropriate log type. Only one test can be running at a time; a running test can be aborted with -X. Here are the commands:

# Run offline test
smartctl -t offline /dev/sda
# Check whether the test is still running
smartctl -c /dev/sda
# View test results once the test is complete
smartctl -l xerror /dev/sda

# Run short self-test, check whether it's still running, then view final results
smartctl -t short /dev/sda
smartctl -c /dev/sda
smartctl -l xselftest /dev/sda

# Run long self-test, check whether it's still running, then view final results
smartctl -t long /dev/sda
smartctl -c /dev/sda
smartctl -l xselftest /dev/sda

# Run selective self-test, check whether it's still running, then view final results
smartctl -t select,154135527-154135527 /dev/sda
smartctl -c /dev/sda
smartctl -l xselftest /dev/sda
# View details of the five LBA spans
smartctl -l selective /dev/sda

Here's the example output after I ran a couple of self-tests:

$ smartctl -l xselftest /dev/sda

=== START OF READ SMART DATA SECTION ===
SMART Extended Self-test Log Version: 1 (1 sectors)
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       50%     43263         153908424
# 2  Selective offline   Completed without error       00%     43262         -
# 3  Short offline       Completed without error       00%     43262         -


md



Summary

md stands for "multiple device" and is usually used to run a software-level RAID system. mdadm is the command line utility used to configure and maintain the RAID system.

md assembles same-size partitions on multiple physical devices into a single logical md device. The conventional physical device is a SATA hard disk, but the md system is flexible enough to allow any block device to be used to form an md device - even USB disks or network devices. However, if a device is prone to becoming unresponsive even for short periods (e.g. a USB hard disk that is spinning down), then it is probably unsuitable for md because the md system might decide that the device has failed.
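
For context, a hedged sketch of how such a logical device is created in the first place (a RAID1 array from two same-size partitions; the device names are examples):

# Create a RAID1 md device from two partitions
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1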


md devices

md devices are exposed by the Linux kernel as /dev/mdX block devices. The devices are also available as /dev/md/X, which are symlinks pointing to the actual device nodes. Example:

root@pelargir:~# ls -l /dev/md
total 0
lrwxrwxrwx 1 root root 6 Jun  2 03:28 0 -> ../md0
lrwxrwxrwx 1 root root 6 Jun  2 03:28 1 -> ../md1
lrwxrwxrwx 1 root root 6 Jun  2 03:28 2 -> ../md2
root@pelargir:~# ls -l /dev/md?
brw-rw---- 1 root disk 9, 0 Jun  2 03:28 /dev/md0
brw-rw---- 1 root disk 9, 1 Jun  2 03:28 /dev/md1
brw-rw---- 1 root disk 9, 2 Jun  2 03:28 /dev/md2


/proc/mdstat

The Linux kernel exposes the status of all md devices in the file /proc/mdstat. However, if an md device is stopped, it is not listed there. Example:

root@pelargir:~# cat /proc/mdstat
Personalities : [raid1] 
md2 : active raid1 sda6[0](F) sdb6[1]
      482394112 blocks super 1.2 [2/1] [_U]
      bitmap: 3/4 pages [12KB], 65536KB chunk

md1 : active raid1 sda5[0] sdb5[1]
      3903488 blocks super 1.2 [2/2] [UU]
      
md0 : active raid1 sda1[0] sdb1[1]
      1950720 blocks super 1.2 [2/2] [UU]
      
unused devices: <none>

Discussion (taken from the Linux RAID wiki):

  • The "Personalities" line states which RAID levels the Linux kernel currently supports. Kernel modules add support for more RAID levels.
  • The details for each md device are given on multiple lines. The first line is prefixed with the name of the md device (e.g. "md2:").
  • Line 1
    • The first line states whether the device is "active" or "inactive". An inactive device is usually faulty.
    • The first line also states the device's RAID level
    • Finally, the first line lists all physical devices that are part of the md device. Each physical device has a suffix "[n]" which indicates the physical device's role in the RAID system. For instance, in a RAID system that consists of 2 devices:
      • The roles [0] and [1] denote the first and the second disk of the RAID system
      • Any roles with higher numbers (e.g. [2]) denote spare disks
    • If a physical device has failed, the indicator "(F)" appears after the physical device's RAID role suffix. This can be observed in the example above for the md2 device.
  • Line 2
    • The second line provides information about the md device's size (e.g. "482394112 blocks") and the superblock format that is in use (e.g. "super 1.2")
    • The "[n/m]" indicator on the second line states that the RAID system consists of "n" devices, of which "m" devices are currently active. If "m" is lower than "n" the RAID system is in trouble!
    • The "[UU]" indicator lists the status of each physical device. "U" = the device is up, and "_" = the device is down.
  • Line 3
    • This line appears only if the md device has a so-called "write-intent bitmap". I won't pretend to understand what this is, so read up on it yourself.


Replacing a disk

If the md system detects a problem with a physical device, it marks that device as "faulty" and stops using the device. The device is said to have failed.

Depending on the RAID level and the current situation, the md device as a whole may still be able to function. If not, then data recovery on a whole different level is in order. This is not described here.


Now that we have established that the md device still works, the following situations can be distinguished:

  • If the md device has spare disks, the md system automatically starts to use the first available spare disk and begins reconstruction of the data that was on the failed disk.
  • If the md device has no spare disks, the md device is now running in "degraded" mode. This is a dangerous situation because there is now no redundancy left in the RAID system. If another disk fails, the entire md device will fail!


Regardless of whether there are spare disks, the faulty disk must now be replaced. The procedure works like this:

  • The faulty disk must first be removed from the md device. Example:
mdadm /dev/md2 --remove /dev/sda6
  • Now you can physically remove the faulty disk from the system and replace it with a good one
    • TODO: Do I need to power down the system to do this? Apparently the SATA interface was designed with hotplug support, but not all chipsets may support it.
  • TODO: What do we need to do with the new disk?
    • Boot sector?
    • Partitioning?
    • UUIDs in /etc/mdadm/mdadm.conf?
  • Add the new device. This triggers a rebuild of the data onto the new device:
mdadm /dev/md2 --add /dev/sda6


One might decide that a physical device needs to be proactively replaced (e.g. because SMART reports indicate that the device has a problem or nears end-of-life). In such a case the device needs to be manually marked as faulty. After that, the same procedure as described above applies. The following example command marks the physical device sdc2 as faulty on the md1 logical device:

mdadm --manage --set-faulty /dev/md1 /dev/sdc2


Monitoring

mdadm can be run in automated monitoring mode. On a Debian system this is enabled in /etc/default/mdadm by setting

START_DAEMON=true

On Debian you also have the option to periodically run redundancy checks over md devices using the utility checkarray. This is enabled in /etc/default/mdadm by setting

AUTOCHECK=true

(the actual check is run by default on the first Sunday each month, see /etc/cron.d/mdadm)
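
A redundancy check can also be triggered by hand. A hedged example that uses the kernel's sysfs interface directly (the md device name is an example):

# Start a redundancy check on md2
echo check > /sys/block/md2/md/sync_action
# Watch the progress
cat /proc/mdstat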


What happens if I forget to remove a disk from the RAID array before it is replaced?

The situation

In the section "Replacing a disk" I wrote that you need to remove a disk from the md device before you can replace it. But what happens if you "forget" to perform this removal?


This is the situation:

  • md has marked a disk faulty and the RAID array is in a degraded state
  • You don't remove the disk from the RAID array, instead you simply shut down the system
  • You replace the faulty disk with a new, good one and reboot the system


The result is that the system reboots fine, but neither the old, faulty disk nor the new, good disk is listed in /proc/mdstat:

root@pelargir:~# cat /proc/mdstat
Personalities : [raid1] 
md2 : active raid1 sda6[1]
      482394112 blocks super 1.2 [2/1] [_U]
      bitmap: 4/4 pages [16KB], 65536KB chunk

md1 : active (auto-read-only) raid1 sda5[1]
      3903488 blocks super 1.2 [2/1] [_U]
      
md0 : active raid1 sda1[1]
      1950720 blocks super 1.2 [2/1] [_U]
      
unused devices: <none>


mdadm --detail reports the array as degraded and lists the missing device with state "removed":

root@pelargir:~# mdadm --detail /dev/md2
/dev/md2:
        Version : 1.2
  Creation Time : Wed Jun  1 11:44:45 2016
     Raid Level : raid1
     Array Size : 482394112 (460.05 GiB 493.97 GB)
  Used Dev Size : 482394112 (460.05 GiB 493.97 GB)
   Raid Devices : 2
  Total Devices : 1
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Thu Nov 10 20:44:50 2016
          State : clean, degraded 
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

           Name : GDP-LIN-228:2
           UUID : ab9e3d36:c8fd57b7:99789424:8e585f00
         Events : 577151

    Number   Major   Minor   RaidDevice State
       0       0        0        0      removed
       1       8        6        1      active sync   /dev/sda6


The new, good disk is not formatted, nor even partitioned correctly:

root@pelargir:~# fdisk -l /dev/sdb

Disk /dev/sdb: 465.8 GiB, 500107862016 bytes, 976773168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x00071581

Device     Boot Start       End   Sectors   Size Id Type
/dev/sdb1        2048 976769023 976766976 465.8G  7 HPFS/NTFS/exFAT


The solution, part 1

First, remove the missing device from the RAID array. The problem here is that the block device is no longer present, so you can't use the normal command, because that would reference a non-existing device:

mdadm /dev/md2 --remove /dev/sdb6


Instead you can issue the following command, which removes all devices that are no longer connected to the system:

mdadm /dev/md0 --remove detached

Kudos: Found this here: http://www.linuxquestions.org/questions/linux-newbie-8/mdadm-cannot-remove-failed-drive-drive-name-changed-782827/

Note that the keyword "detached" can also be used with "mdadm --fail", to mark disconnected devices as failed so that they can then be removed.
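
A hedged example that combines the two (first mark everything that is detached as failed, then remove it):

mdadm /dev/md2 --fail detached
mdadm /dev/md2 --remove detached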


The solution, part 2

Now you can partition the disk using fdisk to match the partitions of the other disks. The following command lists the partitioning of the disk that is still good:

root@pelargir:~# fdisk -l /dev/sda

Disk /dev/sda: 465.8 GiB, 500107862016 bytes, 976773168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x3ba6d316

Device     Boot    Start       End   Sectors   Size Id Type
/dev/sda1  *        2048   3905535   3903488   1.9G fd Linux raid autodetect
/dev/sda2        3907582 976771071 972863490 463.9G  5 Extended
/dev/sda5        3907584  11718655   7811072   3.7G fd Linux raid autodetect
/dev/sda6       11720704 976771071 965050368 460.2G fd Linux raid autodetect

All you have to do now is to manually re-create this partitioning in fdisk's interactive mode, and then to format the partitions with a filesystem. This is left as an exercise to the reader.


Note: I don't know if there is an automated way to clone the partition table from one disk to another. I read something about a utility called sfdisk, but when I read that even the partition UUIDs are cloned I became hesitant and in the end decided to perform the partitioning manually (including the partition type "fd" = Linux raid autodetect, and marking the first partition as bootable).
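
For reference, a hedged sketch of what the sfdisk-based cloning mentioned above would look like for an MBR disk (as noted, this also copies identifiers such as the disk ID, which you may want to regenerate afterwards):

# Dump the partition table of the good disk, then write it to the new disk
sfdisk -d /dev/sda > sda-partition-table.dump
sfdisk /dev/sdb < sda-partition-table.dump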


The solution, part 3

Finally, add the newly partitioned and formatted block devices to the RAID array:

mdadm /dev/md2 --add /dev/sdb6


Note that when you do this, the RAID array immediately starts to rebuild. You can watch the progress with this command:

root@pelargir:~# mdadm --detail /dev/md0
/dev/md0:
        Version : 1.2
  Creation Time : Wed Jun  1 11:43:31 2016
     Raid Level : raid1
     Array Size : 482394112 (460.05 GiB 493.97 GB)
  Used Dev Size : 482394112 (460.05 GiB 493.97 GB)
   Raid Devices : 2
  Total Devices : 2
    Persistence : Superblock is persistent

    Update Time : Thu Nov 10 21:31:30 2016
          State : clean, degraded, recovering 
 Active Devices : 1
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 1

 Rebuild Status : 30% complete

           Name : GDP-LIN-228:2
           UUID : ab9e3d36:c8fd57b7:99789424:8e585f00
         Events : 577151

    Number   Major   Minor   RaidDevice State
       2       8       22        0      spare rebuilding   /dev/sdb6
       1       8        6        1      active sync   /dev/sda6


A final note: The configuration file /etc/mdadm/mdadm.conf lists md devices with a UUID. When you run blkid, this utility also prints out UUIDs for the md devices. I noticed that the two sets of UUIDs do not match, but apparently this is OK - at least I had no problem running such a setup for over a year.
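
For reference, a hedged example of how the ARRAY entries in /etc/mdadm/mdadm.conf are typically (re)generated - the output can be reviewed and then appended to the configuration file:

mdadm --detail --scan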


Hard disk encryption with LUKS

References

  • HOWTO article that shows step-by-step how to configure a LUKS/dm-crypt partition using cryptsetup. Almost all of the information in the following sections comes from this article.
  • Block-layer Encryption (archive.org link). An in-depth article about LUKS/dm-crypt, starting with theory and ending with many practical examples about its usage.
  • man cryptsetup


Glossary

LUKS
Linux Unified Key Setup. A specification that describes a platform-independent on-disk format. The format consists of three parts:
  1. A standardized header at the start of the device
  2. A key-slot area directly behind the header
  3. The bulk data area behind that.
LUKS container
The set consisting of three parts (header, key-slot area, data area) described by the LUKS specification.
LUKS device
The physical device that a LUKS container resides on.
dm-crypt
A disk encryption backend in the Linux kernel. dm-crypt can be used to create a partition that conforms to the LUKS specification, or it can be used raw/standalone in which case the dm-crypt partition does not use the LUKS format.
cryptsetup
The user front-end tool for configuring a LUKS/dm-crypt partition.


Packages

The Debian package to install is

cryptsetup

An optional package that can be used to monitor the progress of dd is

pv


Initialize a LUKS partition

The following command initializes an existing partition to use the LUKS format. Warning: This will destroy all data on the partition!!!

cryptsetup --verify-passphrase --verbose luksFormat /dev/sda1

Discussion:

  • You must enter the passphrase interactively. Because of --verify-passphrase, the command asks for the passphrase twice. The passphrase you enter here cannot be recovered later, so make sure you write it down somewhere secure.
  • Alternatively you can use the --key-file option to specify a file that contains the passphrase. The file name "-" causes the command to read the passphrase from stdin.
  • The encryption key that is protected by the passphrase is assigned to key-slot 0. You can use the --key-slot option to override this. The LUKS partition has a total of 8 key-slots.
  • The cipher used for encryption is the LUKS default cipher (aes-xts-plain64 at the time of writing). Use the --cipher option to specify a different cipher.
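
A hedged example that combines the options discussed above (key-slot 1 and an explicitly specified cipher; the partition name is an example):

cryptsetup --verify-passphrase --verbose --key-slot 1 --cipher aes-xts-plain64 luksFormat /dev/sda1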


Open / close a LUKS partition

The next step is to "open" the LUKS partition, making it accessible as a normal block device under a user-friendly name ("encrypted-backup" in this example). Note: You have to enter the passphrase interactively in order to open the LUKS partition. Use the --key-file option to specify a file that contains the passphrase.

cryptsetup luksOpen /dev/sda1 encrypted-backup

The LUKS partition is now accessible to any filesystem-related commands (typically mount) as

/dev/mapper/encrypted-backup

When you no longer need the LUKS partition you can close it:

cryptsetup luksClose encrypted-backup


Behind the scenes, opening a LUKS partition generates a dm-crypt device and adds a mapping from that device to the requested user-friendly name. If you want to know the dm-crypt device name, you can simply have a look at the mapping:

pi@raspberrypi1:~$ ls -l /dev/mapper/encrypted-backup 
lrwxrwxrwx 1 root root 7 Dec 29 18:01 /dev/mapper/encrypted-backup -> ../dm-0


Inspect a LUKS partition

To see the status of a LUKS partition, including the cipher used for encryption, you use this command:

cryptsetup status encrypted-backup

The next command dumps the LUKS header information. Warning: From my limited understanding of the cryptographic background, this information may contain sensitive material!

cryptsetup luksDump /dev/sda1


Managing passphrases

A LUKS container can have up to 8 passphrases. Each passphrase is assigned to a key-slot numbered in the range from 0 to 7.


The following command changes the passphrase for key-slot 0. Use the --key-slot option to specify a different key-slot. Obviously, the old passphrase must be entered to authorize the change.

cryptsetup luksChangeKey /dev/sda1

Warning: A media failure during this operation can cause the overwrite to fail after the old passphrase has been wiped and make the LUKS container permanently inaccessible! For this reason it is better/safer to first add a new passphrase, and then remove the old passphrase after the new one was added successfully. See the next two commands.


The following command adds a new passphrase to the first empty key-slot. Use the --key-slot option to specify a different key-slot. One of the passphrases already present on the LUKS device must be entered to authorize the change.

cryptsetup luksAddKey /dev/sda1

The following command removes a passphrase from the LUKS device. Whichever passphrase is entered is removed.

cryptsetup luksRemoveKey /dev/sda1

Warning: Removing the last passphrase makes a LUKS container permanently inaccessible! If you enter the passphrase interactively or supply it via the --key-file option, the command will warn you about the impending disaster (but you can still go ahead and ignore the warning). If you supply the passphrase via stdin, the command will give no warning because it works in batch mode.


Create a filesystem on a LUKS partition

In the previous sections we have seen how to

  1. Initialize a LUKS device and create a LUKS container on the device (luksFormat)
  2. Open the LUKS container (luksOpen)


We are now ready to create a filesystem in the data area on the LUKS container. Example for btrfs:

mkfs.btrfs -f /dev/mapper/encrypted-backup


After that we can mount the filesystem as usual:

mount /dev/mapper/encrypted-backup /mnt/encrypted-backup


Randomize data on LUKS container

Before we create a filesystem on a LUKS container, we should make sure that anyone looking at the raw device will only see random data.

The simplest way to achieve this is to write zeroes into the data area using the dd tool:

dd if=/dev/zero of=/dev/mapper/encrypted-backup

Note that we are writing to the dm-crypt device, not the raw device! This ensures that the stream of zeroes is first encrypted before it is written to the physical disk. Also, the LUKS header and key-slot areas remain untouched.


The process of randomizing the LUKS container can take a very long time. If you want to have a progress report you can send the SIGUSR1 signal to the dd process, or you can use this alternative command line:

pv -tpreb /dev/zero | dd of=/dev/mapper/encrypted-backup bs=128M


Recipes

Map a physical sector to a filesystem block

If the disk drive encounters a read/write error, it will report the number of the bad sector to the kernel and thus to the syslog. If the filesystem does not also report the filesystem block, it may be necessary to find out which filesystem block contains the physical sector.

TODO: Write how to do this.
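
Until this is written up properly, here is a hedged sketch of the usual approach for an ext2/3/4 filesystem that sits directly on the partition (md or LUKS layers add their own offsets; the block and inode values below are placeholders):

# 1. Find the start sector of the partition that contains the bad sector
fdisk -l /dev/sda
# 2. Find the filesystem block size (typically 4096 bytes)
tune2fs -l /dev/sda8 | grep "Block size"
# 3. Compute the filesystem block (integer division):
#      block = (bad_sector - partition_start_sector) * 512 / block_size
# 4. Find the inode that uses the block, then the file that uses the inode
debugfs -R "icheck <block>" /dev/sda8
debugfs -R "ncheck <inode>" /dev/sda8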


Force reallocation of a bad sector

The glossary explains that a sector is immediately reallocated when a write to it fails, so in order to force reallocation we must write data to the bad sector. Let's look at a syslog fragment:

May 15 06:28:17 pelargir kernel: [6849466.884776] ata1.00: exception Emask 0x0 SAct 0x7000003f SErr 0x0 action 0x0
May 15 06:28:17 pelargir kernel: [6849466.885148] ata1.00: irq_stat 0x40000008
May 15 06:28:17 pelargir kernel: [6849466.885738] ata1.00: failed command: READ FPDMA QUEUED
May 15 06:28:17 pelargir kernel: [6849466.886572] ata1.00: cmd 60/08:e0:6f:62:28/00:00:0a:00:00/40 tag 28 ncq 4096 in
May 15 06:28:17 pelargir kernel: [6849466.886572]          res 51/40:08:76:62:28/00:00:0a:00:00/40 Emask 0x409 (media error) <F>
May 15 06:28:17 pelargir kernel: [6849466.888280] ata1.00: status: { DRDY ERR }
May 15 06:28:17 pelargir kernel: [6849466.889113] ata1.00: error: { UNC }
May 15 06:28:17 pelargir kernel: [6849466.892172] ata1.00: configured for UDMA/133
May 15 06:28:17 pelargir kernel: [6849466.892197] sd 0:0:0:0: [sda] tag#28 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
May 15 06:28:17 pelargir kernel: [6849466.892203] sd 0:0:0:0: [sda] tag#28 Sense Key : Medium Error [current] [descriptor]
May 15 06:28:17 pelargir kernel: [6849466.892207] sd 0:0:0:0: [sda] tag#28 Add. Sense: Unrecovered read error - auto reallocate failed
May 15 06:28:17 pelargir kernel: [6849466.892212] sd 0:0:0:0: [sda] tag#28 CDB: Read(10) 28 00 0a 28 62 6f 00 00 08 00
May 15 06:28:17 pelargir kernel: [6849466.892216] blk_update_request: I/O error, dev sda, sector 170418806

First we see that "auto reallocate failed". We also see that the bad sector is 170418806. To force the reallocation we must write to the sector - this will fail and thus trigger the drive's reallocation process.

Before we begin, let's make sure that the sector is actually bad by reading its content.

# hdparm --read-sector 170418806 /dev/sda

/dev/sda:
reading sector 170418806: FAILED: Input/output error

Only if the read failed do we continue with the write. IMPORTANT: Writing to the sector causes data loss !!!

hdparm --write-sector 170418806 /dev/sda


Mark a filesystem block as bad

Marking a filesystem block as bad will prevent the filesystem from accessing the physical sectors in that block. This has the following effects:

  • Sectors that are pending reallocation will remain in that state because neither reads nor writes will occur anymore for these sectors. This is good because it does not waste the disk's spare area for unnecessary reallocations. This can also be annoying because SMART monitoring tools may unnecessarily warn about the disk's perpetual state of pending reallocations (SMART attribute 197).
  • Sectors that are already reallocated will no longer be accessed, thus the drive's performance will not degrade.

TODO: Write how to do this.
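
Until this is written up, a hedged sketch for ext2/3/4 filesystems: e2fsck can add specific blocks to the filesystem's bad block list via its -l option (the block number is a placeholder; the filesystem should not be mounted while doing this):

# Write the filesystem block number (not the physical sector number) into a file
echo 52350 > /tmp/badblocks.txt
# Add the listed blocks to the filesystem's bad block inode
e2fsck -l /tmp/badblocks.txt /dev/sda8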