Filesystem recovery examples with ltfsck

In addition to the filesystem implementation, the Linear Tape File System (LTFS) software ships with two core utilities, “mkltfs” and “ltfsck”. mkltfs (pronounced “make LTFS”) formats LTO cartridges with the LTFS Format. ltfsck checks an LTFS Volume and, if necessary, recovers a partially corrupted volume to a consistent and usable state.

In this post I describe some inconsistent states a volume may wind up in, the scenarios that may lead to these states (often power-loss), and how ltfsck recovers to a consistent state.

I’ve described the layout of a consistent LTFS Volume here and here. A consistent LTFS Volume is illustrated below. An LTFS Volume must be consistent at mount time; otherwise the LTFS software will reject the volume and instruct the user to run the ltfsck utility to recover the volume to a consistent state.

Logical layout of an LTFS volume showing historical Indexes with back-pointers and multiple files written and edited in-place.

Notice that this LTFS Volume matches the definition of a consistent state: the current Index is written at the end of both partitions on the media.

Power-loss after writing DP Index

If power is lost while the LTFS Volume is being unmounted, the volume may end up in the state illustrated below.

Logical layout of an LTFS volume showing historical Indexes with back-pointers and multiple files written and edited in-place after power-loss during IP Index write.

In the example above, the LTFS cartridge was unmounted after writing File A and B to the volume. This unmount resulted in Index2. The cartridge was then mounted again, File A was opened for writing, and some new data was written to it. This new data is stored in the purple extent labeled “File A3”. The cartridge was then unmounted. The unmount processing wrote the current Index to the Data Partition (DP) as Index3, switched to the Index Partition (IP), and started to write the current Index there. This unmount processing follows the order listed here. Before the Index write completed, the system crashed or lost power, which prevented completion of the Index write to the IP.

If this cartridge is mounted again, the LTFS software will identify that the volume is inconsistent and recommend that ltfsck be run against the volume. ltfsck will identify the partial write of the Index to the IP and attempt to read Index3 stored on the DP. If the Index at the end of the DP can be read successfully, ltfsck will over-write the partial Index on the IP with a duplicate of Index3 from the DP. After this write the cartridge will be consistent and in the state shown in the first diagram on this page.

A similar situation occurs if the power loss occurs after the current Index is written to the DP but before the current Index is written to the IP. The resultant volume structure is shown below.

Logical layout of an LTFS volume showing historical Indexes with back-pointers and multiple files written and edited in-place after power-loss during partition change.

In the example above, a power loss while the tape drive was changing partitions has left a correct DP with the current Index as Index3, but the IP still holds the old Index2. In this scenario, the LTFS software will identify that Index2 on the IP is out of date based on the values stored in the cartridge MAM parameters and reject the LTFS Volume. ltfsck will identify, based on the MAM parameters, that Index3 at the end of the DP is the most current Index and will over-write Index2 on the IP with a copy of Index3.

This recovery is equivalent to the previous recovery of the partially written Index. In both cases, there is no loss of user data.
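The decision common to both recoveries can be sketched in a few lines of Python. This is only a simplified model of the reasoning described above, not the ltfsck implementation: the function name and the idea of passing in bare generation numbers are assumptions made for illustration, and a partially written Index is represented as None.

```python
from typing import Optional


def plan_index_recovery(ip_generation: Optional[int],
                        dp_generation: Optional[int]) -> str:
    """Decide how to make the IP agree with the DP (illustrative model only).

    Each argument is the generation of the last readable Index at the end
    of that partition, or None if the Index there is partial or unreadable.
    """
    if dp_generation is None:
        # This sketch only covers the cases where the DP Index is intact.
        raise ValueError("no readable Index at the end of the DP")
    if ip_generation == dp_generation:
        return "consistent"
    if ip_generation is None or ip_generation < dp_generation:
        # Partial IP Index, or stale IP Index: duplicate the DP Index.
        return f"copy Index{dp_generation} from DP to IP"
    raise ValueError("IP Index newer than DP Index; not covered here")


print(plan_index_recovery(None, 3))  # power loss during the IP Index write
print(plan_index_recovery(2, 3))     # power loss during the partition change
print(plan_index_recovery(3, 3))     # normal, consistent volume
```

Both failure cases produce the same plan, which is why neither loses user data: the DP already holds everything the IP needs.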

Power-loss before writing DP Index

In the event of a power loss while writing file data to the LTFS Volume the volume may end up in the state illustrated below.

Logical layout of an LTFS volume showing historical Indexes with back-pointers and multiple files written and edited in-place after power-loss during write of new data file.

In the example above, the file data write for “File ?” was in progress when the system crashed or lost power. Immediately before the loss of power the filename for this file existed in the in-memory Index, but the LTFS software was waiting for the write operation to complete before laying down the current Index as “Index4”. Due to the power loss, the updated Index was never written to the media, and there is no way of knowing whether the data stored in the black extent labeled “File ?” is complete or just a partial write.

In this example, the LTFS software will refuse to mount the volume and suggest that ltfsck be run against the volume. When ltfsck is run the user has a few different courses of action. By default the ltfsck utility will identify the “unexpected” data at the end of the DP and perform recovery actions to:

  1. read Index3 from the IP,
  2. create a directory at the root of the LTFS filesystem named “_ltfs_lostandfound” if the directory doesn’t already exist,
  3. update Index3 to include the “File ?” data as the contents of one or more files with generated filenames. Each separate file contains the data written to a single block on the media. The files are stored in “_ltfs_lostandfound”,
  4. write the updated Index out as Index4 to the DP, and
  5. perform normal unmount processing to write Index4 to the IP.
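As a rough illustration of steps 1 to 4, the sketch below models an Index as a plain Python dictionary and folds the trailing blocks into “_ltfs_lostandfound”. The dictionary layout, the block numbers, and the generated filename pattern are all hypothetical; the filenames that ltfsck actually generates may differ.

```python
import copy

LOST_AND_FOUND = "_ltfs_lostandfound"


def recover_trailing_blocks(index: dict, trailing_blocks: list) -> dict:
    """Return a new Index that exposes each trailing block as its own file.

    `index` is a toy model: {"generation": int,
                             "dirs": {dirname: {filename: [block numbers]}}}.
    """
    recovered = copy.deepcopy(index)          # leave the old Index untouched
    recovered["generation"] = index["generation"] + 1
    # Create the lost-and-found directory only if it does not exist yet.
    lost = recovered["dirs"].setdefault(LOST_AND_FOUND, {})
    for block in trailing_blocks:
        # One recovered file per block, with a made-up generated name.
        lost[f"recovered_block_{block}"] = [block]
    return recovered


# Toy version of Index3; block numbers are invented for the example.
index3 = {"generation": 3,
          "dirs": {"/": {"File A": [10, 11, 14], "File B": [12]}}}
index4 = recover_trailing_blocks(index3, trailing_blocks=[16, 17])
print(index4["generation"])
print(sorted(index4["dirs"][LOST_AND_FOUND]))
```

The existing files are carried over unchanged; only the generation number and the contents of the lost-and-found directory differ between the two Indexes.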

The layout of the LTFS Volume after these recovery steps have been performed by ltfsck is illustrated below.

Logical layout of an LTFS volume showing historical Indexes with back-pointers and multiple files written and edited in-place after power-loss during write of new data file, followed by full data recovery by ltfsck.

In the illustration above, the LTFS Volume is consistent but the data blocks shown in black have generated filenames rather than the user-specified filename. Additionally, these recovered data blocks are most likely to contain only partially written data. Recovering the blocks to the volume as recovered files provides the ability for the user to copy these blocks off the volume and re-construct the original data if no other copy exists. In most scenarios the user will still have a copy of the original data elsewhere because the file write to LTFS had not completed before the power loss.

After the user has finished working with the recovered files on the LTFS Volume there is probably no need to leave the blocks on the LTFS Volume. Rather than deleting the recovered files, the user can use ltfsck to roll back the LTFS Volume to the Index3 snapshot. Using ltfsck to roll the volume back, rather than deleting the recovered files, means that the space occupied by the partial write (shown in black) will be reclaimed and the LTFS Volume will be returned to the state shown in the first illustration at the top of this page.
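The difference between deleting and rolling back can be shown with a toy model that treats the DP as an append-only list of records. The record names loosely follow the illustrations, and both functions are hypothetical stand-ins for the behaviour described above rather than anything ltfsck exposes.

```python
def delete_recovered_files(dp: list) -> list:
    """Deleting files only appends a newer Index; old blocks stay on tape."""
    return dp + ["Index5"]


def roll_back(dp: list, index_name: str) -> list:
    """Rolling back discards everything written after the chosen Index."""
    return dp[: dp.index(index_name) + 1]


# Data Partition after the full recovery described earlier (simplified).
dp = ["Index1", "File A1", "File B", "Index2",
      "File A3", "Index3", "File ?", "Index4"]

print(delete_recovered_files(dp))
print(roll_back(dp, "Index3"))
```

After a delete, the “File ?” blocks are still present and simply unreferenced by the newest Index. After a rollback, the partition ends at Index3 and the space beyond it is available for new writes.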

If the user has encountered a power loss during file write as described above and the user is not interested in recovering the partially written data, ltfsck provides a command-line option to automate the recovery of the LTFS Volume. With this automated recovery, rather than generating Index4 (described above), ltfsck simply erases the “File ?” data thereby returning the volume to the consistent state shown in the first illustration at the top of this page.


2 Responses to Filesystem recovery examples with ltfsck

  1. Youhena says:

    hi Michael, thanks your nice article
    if want write a program that write and read files to LTFS format on Tape
    (in .Net )
    Do you know of an example for this?   Or know how to implement it?
    thanks

    • Michael says:

      Since LTFS is a POSIX filesystem you would use the same code as used to write a file that is stored on a hard drive.

      For example, the following .Net example code will write text to a file using C#.

      If you change the file path to a file on the LTFS volume the code will create a file on the LTFS volume.
