Recovering the NTFS MFT from MFTMirr

2020-03-09

I was given a hard drive with a Windows partition that wouldn't boot and asked if I could extract the files. Although the drive was failing and some crucial sectors were unreadable, there was enough redundancy to recover the filesystem metadata. Here's how I went about doing that.

Trying the obvious

The disk layout is simple:

# lsblk -o NAME,FSTYPE,LABEL,SIZE,TYPE /dev/sdb
NAME   FSTYPE LABEL         SIZE TYPE
sdb                       931.5G disk
├─sdb1 ntfs   SYSTEM        100M part
├─sdb2                    913.7G part
└─sdb3 ntfs   HP_RECOVERY  17.7G part

Two partitions are recognized correctly, but we don't care about them. The one we want to bring back to life is sdb2, whose filesystem type isn't even detected.

If we try to mount it, we see

# mount /dev/sdb2 /mnt
mount: /mnt: wrong fs type, bad option, bad superblock on /dev/sdb2, missing codepage or helper program, or other error.

In the dmesg output, we find

sd 6:0:0:0: [sdb] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
sd 6:0:0:0: [sdb] tag#0 Sense Key : Medium Error [current]
sd 6:0:0:0: [sdb] tag#0 Add. Sense: Unrecovered read error
sd 6:0:0:0: [sdb] tag#0 CDB: Read(10) 28 00 00 63 28 00 00 00 08 00
print_req_error: critical medium error, dev sdb, sector 6498304
sd 6:0:0:0: [sdb] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
sd 6:0:0:0: [sdb] tag#0 Sense Key : Medium Error [current]
sd 6:0:0:0: [sdb] tag#0 Add. Sense: Unrecovered read error
sd 6:0:0:0: [sdb] tag#0 CDB: Read(10) 28 00 00 63 28 00 00 00 08 00
print_req_error: critical medium error, dev sdb, sector 6498304
Buffer I/O error on dev sdb2, logical block 786432, async page read

Clearly, there's at least one unreadable sector that prevents the filesystem from mounting.
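The numbers in this log agree with each other, which is a useful sanity check before going further. A Read(10) CDB stores the starting sector big-endian in bytes 2 to 5 and the sector count in bytes 7 and 8, so the failed request covers 8 sectors starting at 6498304. Subtracting the start of sdb2 (206848 sectors, as fdisk reports further down) and assuming that the "logical block" in the last line counts 4 KiB blocks gives back 786432. The helpers below are hypothetical, written only to show the arithmetic:

```shell
# Hypothetical helpers for cross-checking the kernel log.

# Print the starting sector of a SCSI Read(10) CDB given as hex bytes,
# e.g. "28 00 00 63 28 00 00 00 08 00" as printed by the sd driver.
cdb_lba() {
    set -- $1    # split the ten CDB bytes into $1..$10
    # Bytes 2-5 of the CDB ($3..$6 here) hold the big-endian LBA.
    echo $(( 0x$3 << 24 | 0x$4 << 16 | 0x$5 << 8 | 0x$6 ))
}

# Convert absolute 512-byte sector $1 into a block number relative to a
# partition that starts at sector $2, assuming 4 KiB (8-sector) blocks.
part_block() {
    echo $(( ($1 - $2) / 8 ))
}

# Byte offset of sector $1 inside a partition starting at sector $2, in hex.
part_byte_offset() {
    printf '0x%x\n' $(( ($1 - $2) * 512 ))
}
```

Run as `cdb_lba '28 00 00 63 28 00 00 00 08 00'`, `part_block 6498304 206848` and `part_byte_offset 6498304 206848`, these print 6498304, 786432 and 0xc0000000. That last offset turns out to be the very start of the MFT.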

Because we expect this to be an NTFS partition, we can skip filesystem type detection (note the empty FSTYPE column for sdb2 above) and try mounting it directly with NTFS-3G. This gets us a new error message, but no success:

# mount -t ntfs /dev/sdb2 /mnt
Error reading $MFT: Input/output error
Failed to load $MFT: Input/output error
Failed to mount '/dev/sdb2': Input/output error
NTFS is either inconsistent, or there is a hardware fault, or it's a
SoftRAID/FakeRAID hardware. [...]

Oh, it's definitely a hardware fault. We need to get as much data as possible from the drive before the failure gets worse.

Making a disk image

I won't go into how to use ddrescue to make a disk image, since there are plenty of tutorials already, but I do want to point out that the manual is rather good and it's worth giving it a skim while you're waiting for the disk to be imaged. I did have some luck with the following options:

--idirect: After several initial runs without it, I eventually turned this on to get raw access to the disk.
--try-again, --retrim, --reverse: The first two make ddrescue go back over blocks it has already attempted (--retrim even retries the ones marked as failed), and --reverse runs the passes in the opposite direction. Approaching the bad areas differently can make some sectors read successfully.
--input-position, --size: These narrow a pass down to a small region, for example if you want to focus on something important like filesystem metadata. Conveniently, this doesn't discard what the mapfile already records about the rest of the drive, so it's very easy to jump around or go back to trying the entire drive.
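To make the last two options concrete, here is a minimal sketch of working out an absolute byte range for ddrescue when you only care about one area of a partition. The partition start (206848 sectors) and the MFT location (0xc0000000 bytes into the partition) are figures established later in this post, the 1 MiB length is arbitrary, and rescue_range is a made-up name:

```shell
# Hypothetical helper: print --input-position/--size options that limit
# a ddrescue pass to a region given relative to a partition.
#   $1  partition start, in 512-byte sectors
#   $2  offset of the region inside the partition, in bytes
#   $3  length of the region, in bytes
rescue_range() {
    echo "--input-position=$(( $1 * 512 + $2 )) --size=$3"
}

# Example (not run here): concentrate on the first MiB of the MFT,
# reusing the same mapfile so that earlier progress is kept.
#   ddrescue --idirect $(rescue_range 206848 $((0xC0000000)) 1048576) \
#       /dev/sdb disk.img disk.log
```

For these arguments it prints `--input-position=3327131648 --size=1048576`.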

In hindsight, it would probably have been slightly better to only image the partition I needed, but I chose to do the entire disk. I decided to stop when it wasn't making any progress on a handful of sectors that just refused to be read. The process resulted in disk.img (the disk image itself) and disk.log (the ddrescue mapfile).

Poking around

With as many sectors rescued from the disk as possible, we can now put it aside and work only with the image. We'll make a read-only loop device backed by this image:

# losetup -Pr --show -f disk.img
/dev/loop0

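Making it read-only (-r) ensures that no matter what we do (like muck about with the filesystem metadata), we won't alter the precious disk image, which we might not be able to obtain again. The -P option makes the kernel scan the image's partition table so that each partition gets its own device, and the result looks familiar: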

# lsblk -o NAME,FSTYPE,LABEL,SIZE,TYPE /dev/loop0
NAME      FSTYPE LABEL         SIZE TYPE
loop0                        931.5G loop
├─loop0p1 ntfs   SYSTEM        100M part
├─loop0p2                    913.7G part
└─loop0p3 ntfs   HP_RECOVERY  17.7G part

If we try to mount it now, we get a slightly different error:

# mount -t ntfs /dev/loop0p2 /mnt
ntfs_mst_post_read_fixup_warn: magic: 0x00000000  size: 1024   usa_ofs: 0  usa_count: 0: Invalid argument
Record 0 has no FILE magic (0x0)
Failed to load $MFT: Input/output error
Failed to mount '/dev/loop0p2': Input/output error
NTFS is either inconsistent, or there is a hardware fault, or it's a
SoftRAID/FakeRAID hardware. [...]

Rather than getting a hardware error, the NTFS driver now reads garbage data (likely uninitialized zeros in the image file) for the Master File Table (MFT). NTFS-3G comes with an ntfsfix utility, so we do a dry run:

# ntfsfix -n /dev/loop0p2
Mounting volume... ntfs_mst_post_read_fixup_warn: magic: 0x00000000  size: 1024   usa_ofs: 0  usa_count: 0: Invalid argument
Record 0 has no FILE magic (0x0)
Failed to load $MFT: Input/output error
FAILED
Attempting to correct errors... ntfs_mst_post_read_fixup_warn: magic: 0x00000000  size: 1024   usa_ofs: 0  usa_count: 0: Invalid argument
Record 0 has no FILE magic (0x0)
Failed to load $MFT: Input/output error
FAILED
Failed to startup volume: Input/output error
Checking for self-located MFT segment... ntfs_mst_post_read_fixup_warn: magic: 0x00000000  size: 1024   usa_ofs: 0  usa_count: 0: Invalid argument
OK
Unrecoverable error
Volume is corrupt. You should run chkdsk.
No change made

Not looking good. I don't have access to chkdsk, so that's not an option.
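Every MFT record begins with the four ASCII bytes FILE, and that signature is what NTFS-3G failed to find in record 0. A check along these lines (a hypothetical helper, not part of any of the tools used here) can be pointed at any offset; the start of the MFT, 0xc0000000 bytes into the partition, is worked out in the next section:

```shell
# Hypothetical helper: succeed if the MFT record at byte offset $2 of
# image or device $1 starts with the "FILE" signature.
has_file_magic() {
    # Read four bytes; dropping NULs lets a zeroed record compare cleanly.
    magic=$(dd if="$1" bs=1 skip="$2" count=4 2>/dev/null | tr -d '\000')
    [ "$magic" = "FILE" ]
}

# Example against the rescued image (not run here):
#   has_file_magic /dev/loop0p2 $((0xC0000000)) || echo "record 0 is damaged"
```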

The ddrutility suite includes the ddru_ntfsfindbad utility, which is used to identify the files that are affected by the unrecovered sectors in the ddrescue mapfile. We would normally run it as

# ddru_ntfsfindbad -V /dev/loop0p2 disk.log

but this is far too slow. Instead, we'll trust that ddru_ntfsfindbad won't alter the image file and pass it in directly. To do this, we need to know where the second partition starts inside the image:

# fdisk -lu /dev/loop0 | grep -e Units -e Start -e loop0p2
Units: sectors of 1 * 512 = 512 bytes
Device       Boot      Start        End    Sectors   Size Id Type
/dev/loop0p2          206848 1916358655 1916151808 913.7G  7 HPFS/NTFS/exFAT

It starts at an offset of 206848 sectors of 512 bytes each, so we run

# ddru_ntfsfindbad -V -i "$(dc -e '206848 512*p')" disk.img disk.log
ddru_ntfsfindbad 1.5 20150109
Reading the logfile into memory...
processed 3217 lines out of 3225 with 0 errors
Reading partition boot sector...
Reading mft inode...
There was an error in reading or processing the main mft record.
Attempting to read the mft mirror...
total mft fragments=3
total mft size=492306432 bytes
total inodes=480768
processing inode 480768 of 480768
MFT hard errors=8
processing error record 7 of 7
ddru_ntfsfindbad took 42.380874 seconds to complete

This produces a file called ntfsfindbad.log containing various items from the filesystem. It appears that the MFT is broken (which we already knew), but the MFT mirror is sufficient to peer inside the filesystem. It's a little frustrating that NTFS-3G doesn't seem to make use of this, but perhaps it's erring on the side of caution.

Locating the MFT and MFTMirr

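Let's see if we can find the MFT and MFTMirr ourselves. The NTFS boot sector at the very start of the partition records where both of them live, along with the sector and cluster sizes needed to turn those locations into byte offsets. The data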

# xxd -s 0x0b -l 2 -e -g 2 /dev/loop0p2 | cut -d' ' -f1-2
0000000b: 0200
# xxd -s 0x0d -l 1 /dev/loop0p2 | cut -d' ' -f1-2
0000000d: 08
# xxd -s 0x30 -l 8 -e -g 8 /dev/loop0p2 | cut -d' ' -f1-2
00000030: 00000000000c0000
# xxd -s 0x38 -l 8 -e -g 8 /dev/loop0p2 | cut -d' ' -f1-2
00000038: 0000000000000002

mean that

Offset	Field	Value
0x0b	Bytes per sector	0x200 = 512
0x0d	Sectors per cluster	0x8 = 8
0x30	MFT location	cluster 0xc0000
0x38	MFTMirr location	cluster 0x2

Thus, a single cluster is 512 × 8 = 4096 bytes (4 KB, or 0x1000), which puts the MFT at 0xc0000 × 0x1000 = 0xc0000000 bytes (3 GiB) into the partition and MFTMirr at 0x2 × 0x1000 = 0x2000 bytes.
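The same lookup can be scripted. This sketch reads the four fields with od rather than xxd and then does the cluster arithmetic. read_le and ntfs_layout are hypothetical names, and the sketch only handles the simple encoding in which the sectors-per-cluster byte holds the count directly:

```shell
# Print the little-endian unsigned integer that is $3 bytes long and
# sits at byte offset $2 of file or device $1.
read_le() {
    value=0
    bits=0
    # od prints the bytes in file order, least significant byte first.
    for byte in $(od -An -tu1 -j "$2" -N "$3" "$1"); do
        value=$(( value | byte << bits ))
        bits=$(( bits + 8 ))
    done
    echo "$value"
}

# Report the cluster size and the byte offsets of $MFT and $MFTMirr,
# using the boot-sector fields at 0x0b, 0x0d, 0x30 and 0x38.
ntfs_layout() {
    bps=$(read_le "$1" $((0x0b)) 2)    # bytes per sector
    spc=$(read_le "$1" $((0x0d)) 1)    # sectors per cluster
    mft=$(read_le "$1" $((0x30)) 8)    # cluster number of $MFT
    mirr=$(read_le "$1" $((0x38)) 8)   # cluster number of $MFTMirr
    cluster=$(( bps * spc ))
    printf 'cluster=0x%x mft=0x%x mftmirr=0x%x\n' \
        "$cluster" "$(( mft * cluster ))" "$(( mirr * cluster ))"
}
```

Given the boot sector values shown above (for example via `ntfs_layout /dev/loop0p2`), it prints `cluster=0x1000 mft=0xc0000000 mftmirr=0x2000`.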

If we take a peek at the start of the MFT, we do indeed just see zeros:

# xxd -s 0xc0000000 -l 0x10 /dev/loop0p2
c0000000: 0000 0000 0000 0000 0000 0000 0000 0000  ................

That's not good! On the other hand, the MFTMirr appears to contain useful information:

# xxd -s 0x2000 -l 0x100 /dev/loop0p2
00002000: 4649 4c45 3000 0300 14ae 1c4c 8300 0000  FILE0......L....
00002010: 0100 0100 3800 0100 b001 0000 0004 0000  ....8...........
00002020: 0000 0000 0000 0000 0600 0000 0000 0000  ................
00002030: 6007 84a6 0000 0000 1000 0000 6000 0000  `...........`...
00002040: 0000 1800 0000 0000 4800 0000 1800 0000  ........H.......
00002050: f195 a6bc 651e cd01 f195 a6bc 651e cd01  ....e.......e...
00002060: f195 a6bc 651e cd01 f195 a6bc 651e cd01  ....e.......e...
00002070: 0600 0000 0000 0000 0000 0000 0000 0000  ................
00002080: 0000 0000 0001 0000 0000 0000 0000 0000  ................
00002090: 0000 0000 0000 0000 3000 0000 6800 0000  ........0...h...
000020a0: 0000 1800 0000 0300 4a00 0000 1800 0100  ........J.......
000020b0: 0500 0000 0000 0500 f195 a6bc 651e cd01  ............e...
000020c0: f195 a6bc 651e cd01 f195 a6bc 651e cd01  ....e.......e...
000020d0: f195 a6bc 651e cd01 0040 0000 0000 0000  ....e....@......
000020e0: 0040 0000 0000 0000 0600 0000 0000 0000  .@..............
000020f0: 0403 2400 4d00 4600 5400 0000 0000 0000  ..$.M.F.T.......

We appear to have found a file record, in which we can see two attributes:

$STANDARD_INFORMATION at 0x2038, and
$FILE_NAME at 0x2098.

The latter attribute contains the filename "$MFT", suggesting that we've found the right spot. As expected, this MFTMirr contains the first four MFT records:

# xxd -s 0x20f2 -l 0x8 /dev/loop0p2
000020f2: 2400 4d00 4600 5400                      $.M.F.T.
# xxd -s 0x24f2 -l 0x10 /dev/loop0p2
000024f2: 2400 4d00 4600 5400 4d00 6900 7200 7200  $.M.F.T.M.i.r.r.
# xxd -s 0x28f2 -l 0x10 /dev/loop0p2
000028f2: 2400 4c00 6f00 6700 4600 6900 6c00 6500  $.L.o.g.F.i.l.e.
# xxd -s 0x2cda -l 0xe /dev/loop0p2
00002cda: 2400 5600 6f00 6c00 7500 6d00 6500       $.V.o.l.u.m.e.
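Rather than hunting for each name by eye, one can follow the $FILE_NAME layout: within the attribute's content, the byte at 0x40 gives the name length in UTF-16 code units and the name itself starts at 0x42. For record 0 the content begins at 0x20b0 (the attribute at 0x2098 plus the content offset of 0x18 stored in its header), which is why the name shows up at 0x20f2. The helper below is hypothetical and simply strips the NUL high bytes, so it only handles ASCII names like those of these system files:

```shell
# Hypothetical helper: print the name stored in a $FILE_NAME attribute
# whose content starts at byte offset $2 of image or device $1.
filename_at() {
    # The name length, in UTF-16 code units, lives at content offset 0x40.
    len=$(od -An -tu1 -j $(( $2 + 0x40 )) -N 1 "$1" | tr -d ' ')
    # The UTF-16LE name follows at 0x42; deleting the NUL high bytes is
    # only correct for ASCII names.
    dd if="$1" bs=1 skip=$(( $2 + 0x42 )) count=$(( len * 2 )) 2>/dev/null |
        tr -d '\000'
    echo
}

# Example (not run here); should print $MFT for the mirror's first record:
#   filename_at /dev/loop0p2 $((0x20b0))
```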

These four records are 0x400 bytes each, filling up the 0x1000-byte cluster from beginning to end. These are followed by an index record, so there's apparently nothing more to the MFTMirr:

# xxd -s 0x3000 -l 0x10 /dev/loop0p2
00003000: 494e 4458 2800 0900 8588 623e 0000 0000  INDX(.....b>....

Thus, we only have enough data to recover the first cluster of the MFT. Thankfully, after the first cluster, the MFT seems to be intact:

# xxd -s 0xc0000000 -l 0x1100 -a /dev/loop0p2
c0000000: 0000 0000 0000 0000 0000 0000 0000 0000  ................
*
c00007f0: 0000 0000 0000 0000 0000 0000 0000 6007  ..............`.
c0000800: 0000 0000 0000 0000 0000 0000 0000 0000  ................
*
c0001000: 4649 4c45 3000 0300 9aa1 59c7 0000 0000  FILE0.....Y.....
c0001010: 0400 0100 3800 0100 c001 0000 0004 0000  ....8...........
c0001020: 0000 0000 0000 0000 0500 0000 0400 0000  ................
c0001030: c1b3 0000 0000 0000 1000 0000 4800 0000  ............H...
c0001040: 0000 1800 0000 0000 3000 0000 1800 0000  ........0.......
c0001050: f195 a6bc 651e cd01 f195 a6bc 651e cd01  ....e.......e...
c0001060: f195 a6bc 651e cd01 f195 a6bc 651e cd01  ....e.......e...
c0001070: 0600 0000 0000 0000 0000 0000 0000 0000  ................
c0001080: 3000 0000 7000 0000 0000 1800 0000 0200  0...p...........
c0001090: 5200 0000 1800 0100 0500 0000 0000 0500  R...............
c00010a0: f195 a6bc 651e cd01 f195 a6bc 651e cd01  ....e.......e...
c00010b0: f195 a6bc 651e cd01 f195 a6bc 651e cd01  ....e.......e...
c00010c0: 0090 0000 0000 0000 a08c 0000 0000 0000  ................
c00010d0: 0600 0000 0000 0000 0803 2400 4100 7400  ..........$.A.t.
c00010e0: 7400 7200 4400 6500 6600 0000 0000 0000  t.r.D.e.f.......
c00010f0: 5000 0000 8000 0000 0000 1800 0000 0300  P...............

The second cluster begins with MFT record 4 (0x1000 / 0x400 = 4), which is the "$AttrDef" file record, just what we hoped to see.

Overlaying a COW

Now that we've oriented ourselves and determined that we should have enough information to proceed, we can copy the MFTMirr into the first cluster of the MFT. Recalling that our loop device is read-only, we make a copy-on-write overlay using device-mapper. Someone has already made a useful script, so we'll make use of that.

First, we create the backing storage for the changes we'll be making:

# dd if=/dev/zero of=cowfile bs=1 count=0 seek=1073741824

This makes a sparse file which is 1 GB large, but only takes up as much space on disk as necessary to contain its data:

# hexdump cowfile
0000000 0000 0000 0000 0000 0000 0000 0000 0000
*
40000000
# du --block-size=1 --apparent-size cowfile
1073741824  cowfile
# du --block-size=1 cowfile
0   cowfile
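The dd trick above is one way to get a sparse file. GNU coreutils' truncate does the same, and GNU stat can report the apparent size and the allocated space in one go. Here is a sketch with a made-up helper name. The allocated figure depends on the filesystem, but where sparse files are supported it should be close to zero for a freshly created file:

```shell
# Hypothetical helper: create (or extend) sparse file $1 to $2 bytes and
# print its apparent size next to the space actually allocated.
make_sparse() {
    truncate -s "$2" "$1"
    # %s: apparent size in bytes; %b: allocated blocks of %B bytes each.
    set -- "$(stat -c %s "$1")" "$(stat -c %b "$1")" "$(stat -c %B "$1")"
    echo "apparent=$1 allocated=$(( $2 * $3 ))"
}

# Example:
#   make_sparse cowfile 1073741824
```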

We already have a loop device for the image file, and its size (in units of 512-byte sectors) is

# blockdev --getsz /dev/loop0p2
1916151808

Next, we need one for the cowfile:

# losetup --show -f cowfile
/dev/loop1

It functions as if it really were 1 GB:

# lsblk -o NAME,SIZE,TYPE /dev/loop1
NAME  SIZE TYPE
loop1   1G loop

Finally, we use a black magic incantation to merge the two loop devices into an unholy union called loopcow:

# echo 0 1916151808 snapshot /dev/loop0p2 /dev/loop1 p 8 | dmsetup create loopcow

What could this possibly mean? The dmsetup manpage tells us that tables have the format

logical_start_sector num_sectors target_type target_args

so the new device spans 1916151808 sectors starting at sector 0, covering the entirety of /dev/loop0p2 with a snapshot target. The manpage doesn't say what the arguments of a snapshot target are, but according to the snapshot page of the Linux kernel documentation, device-mapper expects the following format for a snapshot table:

snapshot <origin> <COW device> <persistent?> <chunksize>

That is, we have asked for /dev/loop1 to act as a COW device on top of /dev/loop0p2, for the changes to be persistent (p), and for the granularity of changes to be 8 sectors (4 KB).
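Rather than pasting in the sector count by hand, the table line can be built from blockdev's answer. Here is a minimal sketch with a hypothetical helper, using a persistent store and 8-sector chunks by default (the kernel documentation writes the persistence flag as P):

```shell
# Hypothetical helper: print a device-mapper table that maps all $3
# sectors of origin $1 onto a snapshot whose changes go to COW device $2.
#   P = persistent exception store; $4 = chunk size in sectors (default 8)
snapshot_table() {
    echo "0 $3 snapshot $1 $2 P ${4:-8}"
}

# Example (not run here):
#   snapshot_table /dev/loop0p2 /dev/loop1 "$(blockdev --getsz /dev/loop0p2)" |
#       dmsetup create loopcow
```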

We have ended up with a rather peculiar structure:

# lsblk -o NAME,FSTYPE,LABEL,SIZE,TYPE /dev/loop{0,1} /dev/mapper/loopcow
NAME        FSTYPE LABEL         SIZE TYPE
loop0                          931.5G loop
├─loop0p1   ntfs   SYSTEM        100M part
├─loop0p2                      913.7G part
│ └─loopcow                    913.7G dm
└─loop0p3   ntfs   HP_RECOVERY  17.7G part
loop1                              1G loop
└─loopcow                      913.7G dm
loopcow                        913.7G dm

We've not yet written anything, but 2 chunks are already occupied:

# du --block-size=1 cowfile
8192    cowfile
# dmsetup status loopcow
0 1916151808 snapshot 16/2097152 16

The kernel documentation tells us that the last bits of the status string are

<sectors_allocated>/<total_sectors> <metadata_sectors>

Since all 16 allocated sectors are metadata sectors, both 8-sector chunks are being used for bookkeeping purposes.
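Reading these numbers in chunks rather than sectors makes them easier to follow. Here is a hypothetical helper that splits the status line, with the chunk size (8 sectors in this setup) passed in:

```shell
# Hypothetical helper: summarise a snapshot status line such as
# "0 1916151808 snapshot 16/2097152 16" in terms of chunks.
#   $1  the status line, $2  chunk size in sectors
cow_usage() {
    chunk=$2
    set -- $1               # $4 = allocated/total, $5 = metadata sectors
    alloc=${4%/*}
    total=${4#*/}
    meta=$5
    echo "data=$(( (alloc - meta) / chunk )) metadata=$(( meta / chunk )) cow_chunks=$(( total / chunk ))"
}

# Example (not run here):
#   cow_usage "$(dmsetup status loopcow)" 8
```

For the status line above it prints `data=0 metadata=2 cow_chunks=262144`; each data chunk written later adds 8 to the allocated count and 1 to the data figure.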

We can write to this overlay without changing the underlying image:

# xxd -s 0xc0000000 -l 0x10 /dev/mapper/loopcow
c0000000: 0000 0000 0000 0000 0000 0000 0000 0000  ................
# echo 'c0000000: 1234 5678' | xxd -r - /dev/mapper/loopcow
# xxd -s 0xc0000000 -l 0x10 /dev/mapper/loopcow
c0000000: 1234 5678 0000 0000 0000 0000 0000 0000  .4Vx............
# xxd -s 0xc0000000 -l 0x10 /dev/loop0p2
c0000000: 0000 0000 0000 0000 0000 0000 0000 0000  ................

Doing so allocates our first data chunk:

# dmsetup status loopcow
0 1916151808 snapshot 24/2097152 16

Copying the mirror

If we try to mount /dev/mapper/loopcow, we naturally get the same errors as before, since we haven't fixed anything yet. But all the investigation and setup were hopefully worth it: now that we can safely tinker with its contents without worrying about destroying our priceless disk image, we can copy the MFTMirr cluster into the first MFT cluster. The offset calculation needs care. The -o option makes xxd add 0xc0000000 - 0x2000 to every address it prints, so that xxd -r will write the dumped bytes at the start of the MFT instead of back over the mirror. We save the patch to a file, giving us a chance to review it:

# xxd -s 0x2000 -l 0x1000 -o "$(dc -e '16i C0000000 2000-p')" /dev/mapper/loopcow > mftmirr_patch
# head -n1 mftmirr_patch
c0000000: 4649 4c45 3000 0300 14ae 1c4c 8300 0000  FILE0......L....
# tail -n1 mftmirr_patch
c0000ff0: 0000 0000 0000 0000 0000 0000 0000 6007  ..............`.

This looks acceptable, so we apply the patch:

# xxd -r mftmirr_patch /dev/mapper/loopcow

Because this is a single cluster (0x1000 B = 4 KB), it fits exactly into one of our 8-sector chunks. That chunk was already copied into the COW device by our earlier test write at 0xc0000000, so the COW usage is unchanged:

# dmsetup status loopcow
0 1916151808 snapshot 24/2097152 16

However, the change was definitely applied, because we can now

# mount /dev/mapper/loopcow /mnt

without any complaints! The filesystem contents are all there:

# file /mnt/Windows/System32/notepad.exe
/mnt/Windows/System32/notepad.exe: PE32+ executable (GUI) x86-64, for MS Windows

Presumably something like

# rsync -av --progress /mnt/ recovered_data

is now in order.
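As an aside, the cluster copy could also be done with dd, skipping the hexdump round trip at the cost of not being able to review the bytes before they're written. In this sketch (copy_cluster is a hypothetical name) clusters are 4 KiB, the MFTMirr is cluster 2 and the MFT starts at cluster 0xc0000 = 786432. The target must be the COW overlay, since the loop device is read-only:

```shell
# Hypothetical helper: copy cluster $2 onto cluster $3 within the same
# image or device $1, using a cluster size of $4 bytes.
copy_cluster() {
    # conv=notrunc keeps dd from truncating a regular image file; it is
    # harmless on block devices.
    dd if="$1" of="$1" bs="$4" skip="$2" seek="$3" count=1 conv=notrunc 2>/dev/null
}

# Equivalent of the xxd patch (not run here):
#   copy_cluster /dev/mapper/loopcow 2 786432 4096
```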

Tidying up

Once the filesystem contents are no longer needed, it's easy to dismantle the scaffolding that we have constructed:

# umount /mnt
# dmsetup remove loopcow
# losetup -d /dev/loop{0,1}

The cowfile retains the patch we have made, so it may be reused to mount the filesystem again. Alternatively, we could apply the patch directly to the disk image. Since the image I'm working with is for the entire drive, the offsets will need to be adjusted.
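The dc expressions that follow are compact but hard to read, so here is the same arithmetic in plain shell. The partition starts 206848 × 512 = 105906176 = 0x6500000 bytes into the disk, and every partition-relative offset gains that amount. The displacement passed to xxd -o is the destination minus the source, so it stays 0xc0000000 - 0x2000 for the whole-disk patch: the partition start cancels out. Both helper names are hypothetical:

```shell
# Hypothetical helper: translate partition-relative byte offset $2 into an
# offset within the whole-disk image, for a partition starting at
# 512-byte sector $1.
to_image_offset() {
    printf '0x%x\n' $(( $1 * 512 + $2 ))
}

# Displacement to pass to xxd -o so that a dump taken at offset $1 is
# written back at offset $2 by xxd -r.
patch_shift() {
    echo $(( $2 - $1 ))
}

# Examples for this disk:
#   to_image_offset 206848 $((0xC0000000))     # start of the MFT
#   to_image_offset 206848 $((0x2000))         # start of the MFTMirr
#   patch_shift $((0x2000)) $((0xC0000000))    # the -o value
```

These print 0xc6500000, 0x6502000 and 3221217280 respectively.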

The cluster at 0xc6500000 is essentially empty, so even if we mistakenly write some garbage here, we shouldn't lose any data:

# xxd -a -s "$(dc -e '206848 512* 16i C0000000+p')" -l 0x1000 disk.img
c6500000: 0000 0000 0000 0000 0000 0000 0000 0000  ................
*
c65007f0: 0000 0000 0000 0000 0000 0000 0000 6007  ..............`.
c6500800: 0000 0000 0000 0000 0000 0000 0000 0000  ................
*
c6500ff0: 0000 0000 0000 0000 0000 0000 0000 0000  ................

We make a patch using

# xxd -s "$(dc -e '206848 512* 16i 2000+p')" -l 0x1000 -o "$(dc -e '16i C0000000 2000-p')" disk.img > mftmirr_patch_offset

and see that the offsets and the data look correct:

# head -n1 mftmirr_patch_offset
c6500000: 4649 4c45 3000 0300 14ae 1c4c 8300 0000  FILE0......L....
# tail -n1 mftmirr_patch_offset
c6500ff0: 0000 0000 0000 0000 0000 0000 0000 6007  ..............`.

Now it's just a matter of applying the patch. Having triple-checked its contents, we run

# xxd -r mftmirr_patch_offset disk.img

This allows us to

# losetup -Pr --show -f disk.img
/dev/loop0

and

# mount -o ro /dev/loop0p2 /mnt

without needing the COW overlay.