Some thoughts about file carving
File carving is the process of reassembling computer files from fragments in the absence of filesystem metadata.
This practice allows files and other objects to be located by content rather than by metadata. It is useful, for example, for recovering files and file fragments when directory entries are corrupt or missing, as can happen with old deleted files or when analyzing damaged media.
File carving is frequently used during a digital investigation when the unallocated file system space is analyzed to extract files.
How it works
All filesystems contain metadata that describes the filesystem structure, for example the hierarchy of folders and files, the name of each, and the physical address on the disk where each file is stored.
File carving is the process of trying to recover files without this filesystem structure.
This is done by analyzing the raw data, usually looking for specific sequences of bytes in file headers or footers: the bytes used for file identification are called “Magic Numbers”.
The Magic Numbers
The term magic number has several meanings; here we are focusing on files, where a magic number is a signature used to identify a file format.
Detecting such constants is a simple way of distinguishing between file formats. Many formats begin with a fixed header, and some also end with a fixed footer: for example, a PDF file starts with “%PDF” and ends with “%%EOF”, a JPEG image file begins with the bytes FF D8 and ends with FF D9, and a Java class file has the hexadecimal value CA FE BA BE as its first four bytes.
A very complete list of signatures is maintained by Gary Kessler: https://www.garykessler.net/library/file_sigs.html
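As a minimal sketch of signature matching, the following Python snippet checks a buffer against the three formats mentioned above. The `SIGNATURES` table and the `identify` function are hypothetical names introduced only for illustration; a real tool would use a much larger table, such as the list linked above.

```python
# Illustrative table: (format name, header bytes, footer bytes or None).
# Only the three formats mentioned in the text are covered.
SIGNATURES = [
    ("pdf", b"%PDF", b"%%EOF"),
    ("jpeg", b"\xff\xd8", b"\xff\xd9"),
    ("java-class", b"\xca\xfe\xba\xbe", None),
]


def identify(data: bytes):
    """Return (name, footer_found) for the first matching header, else None.

    footer_found is None when the table defines no footer for the format.
    """
    for name, header, footer in SIGNATURES:
        if not data.startswith(header):
            continue
        if footer is None:
            return name, None
        # Ignore trailing whitespace, e.g. the newline that often follows %%EOF.
        return name, data.rstrip().endswith(footer)
    return None


print(identify(b"%PDF-1.7\n...\n%%EOF\n"))
print(identify(b"\xca\xfe\xba\xbe\x00\x00\x00\x34"))
```

A header match alone is only a hint: short signatures such as FF D8 can also appear by chance inside unrelated data.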
From a forensic perspective it is important to know something about filesystem fragmentation, which typically occurs when data is not stored contiguously, because of low free space or the deletion or truncation of files.
A related concept is slack space, shown in this figure, which represents an example of a typical file allocation:
Slack space is the difference between the physical file size and logical file size.
For example, a 5000-byte file that is given 2 clusters of 4096 bytes each (8192 bytes in total) has a file slack of 8192 - 5000 = 3192 bytes.
The file slack should always be less than 1 cluster (4096 bytes).
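The arithmetic above can be sketched in a few lines of Python. The function name `file_slack` and the default cluster size of 4096 bytes are assumptions taken from the example, not properties of any particular filesystem.

```python
def file_slack(logical_size: int, cluster_size: int = 4096) -> int:
    """Return the slack in bytes: allocated (physical) size minus logical size."""
    if logical_size < 0 or cluster_size <= 0:
        raise ValueError("sizes must be non-negative and cluster_size positive")
    # Ceiling division: number of whole clusters needed to hold the file.
    clusters = -(-logical_size // cluster_size)
    return clusters * cluster_size - logical_size


print(file_slack(5000))  # 2 clusters * 4096 - 5000 = 3192
```

Because the file always fills every cluster except possibly the last one, the result is always smaller than `cluster_size`.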
For more information about slack space please refer to this article about FAT filesystem.
It can be assumed that large hard drives are less likely to have fragmented files than the smaller ones of the past.
High fragmentation may still be seen in large files such as email archive files.
Newer operating systems try to prevent fragmentation by avoiding the reuse of space from deleted files as much as possible.
The carving process
Data carving can be classified as basic or advanced.
Basic data carving can be used when:
- the beginning of file is not overwritten
- the file is not fragmented
- the file is not compressed (e.g. NTFS compression)
Basically, this type of carving is done by analyzing headers and footers.
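A minimal sketch of basic header/footer carving in Python might look like the following. The function name `carve`, the `max_size` limit and the three-byte JPEG header FF D8 FF are assumptions made for this example (the extra FF, which is the first byte of the next JPEG marker, reduces false positives). Real carving tools apply many more checks.

```python
def carve(raw: bytes,
          header: bytes = b"\xff\xd8\xff",
          footer: bytes = b"\xff\xd9",
          max_size: int = 10 * 1024 * 1024):
    """Yield (offset, data) for each header...footer span found in raw.

    Assumes contiguous, uncompressed files whose beginning is intact,
    i.e. the conditions listed above for basic carving.
    """
    pos = 0
    while True:
        start = raw.find(header, pos)
        if start == -1:
            return
        # Look for the first footer after the header, within max_size bytes.
        end = raw.find(footer, start + len(header), start + max_size)
        if end == -1:
            pos = start + 1  # no footer in range: skip this candidate
            continue
        end += len(footer)
        yield start, raw[start:end]
        pos = end


image = b"\x00" * 10 + b"\xff\xd8\xff\xe0AAAA\xff\xd9" + b"junk"
for offset, data in carve(image):
    print(offset, len(data))
```

Stopping at the first footer is a simplification: a JPEG with an embedded thumbnail contains an earlier FF D9, so the carved data would be truncated in that case.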
Advanced data carving can also be applied to fragmented files, where the fragments are:
- not sequential
- out of order
This technique performs carving based on an analysis of the contents of the proposed files.
However, there are a lot of tools that automate the carving process.
For my tools shortlist, please refer to this article: Four tools for File Carving in forensic analysis
References and further reading
- Some thoughts about FAT Filesystem
- Four tools for File Carving in forensic analysis
- File Carving | CERTSI
- Gary Kessler’s list of file signatures
- SANS Digital Forensics and Incident Response
- Kevin Ripa on Twitter