Friday, 23 July 2010

Python

As regular readers will know, here in the Sausage Factory our primary forensics tool is Encase. From time to time, however, we need to try out other tools to validate our results. Recently I wanted to utilise two Python scripts widely discussed elsewhere and, as a result, had to figure out the mechanics of getting them to run on a forensic workstation running Windows 7. I thought I'd share the process with you. Now some of you are highly geeky programmer types who write and run scripts for breakfast - if that's you, turn away now. This blog post is in no way definitive and is intended for Python newbies who want to run Python scripts in their forensicating but until now didn't know how.

First off we need to install and configure Python

  • Download Python - I downloaded the Python 2.7 Windows X86-64 installer for my Windows 7 64 bit box
  • Run the installer
  • Right click on the Computer icon, select properties, select Advanced system settings and click on the Environment Variables button.
  • In the System Variables pane you will have a variable entitled Path, select it and click on edit
  • Add to the entries already there ;C:\Python27 (assuming you installed Python 2.7 to the default location)
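
Once the Path variable has been edited, a quick sanity check can confirm that the Python folder really is listed. The following is only a minimal sketch: the helper name dir_on_path is my own invention, and it assumes the default C:\Python27 location mentioned above.

```python
import ntpath


def dir_on_path(path_value, directory):
    """Return True if `directory` is one of the entries in a Windows Path string."""
    wanted = ntpath.normcase(ntpath.normpath(directory))
    for entry in path_value.split(";"):
        entry = entry.strip().strip('"')
        # Compare case-insensitively and ignore a trailing backslash.
        if entry and ntpath.normcase(ntpath.normpath(entry)) == wanted:
            return True
    return False


if __name__ == "__main__":
    import os
    # C:\Python27 is the default install location assumed in the steps above.
    print(dir_on_path(os.environ.get("PATH", ""), r"C:\Python27"))
```

Run from a freshly opened command prompt it should print True if the edit took effect; an already open prompt keeps the old Path value.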

The two scripts I wanted to run were David Kovar's analyzeMFT and the $USNJRNL parser written by Seth Nazzaro. They are designed to parse MFTs and USN Change Journals respectively which can be copied out of an image or made available via VFS or PDE. More about analyzeMFT can be found at the author's blog. Detailing how I ran these scripts will give a clear indication of how to run these, and many other python scripts, and utilise their output.

analyzeMFT
Download the script by visiting http://www.integriography.com/, right clicking on the Downloaded Here link in the Downloads section (for the source code) and saving the download as a text file. Once downloaded, change the file extension to .py.

Save it somewhere and then run IDLE (installed with Python) and open the analyzeMFT.py script. Locate the words noGUI = False and edit to read noGUI = True and save.

To run

  • open command prompt
  • at prompt type Python C:\Path_to_the_script\analyzeMFT.py -f U:\Path_to_your_extracted_or_mounted_MFT\$MFT -o $MFT_parsed
  • The above command runs the script against your extracted or mounted $MFT and outputs the results to a file $MFT_parsed
  • Open $MFT_parsed using the text import wizard in Excel selecting the text format for each column.
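
If you would rather stay in Python than open Excel, the same "treat every column as text" idea can be sketched with the standard csv module. This is a hedged example only: it assumes the parsed output is comma separated with a header row, it is written for Python 3 rather than the 2.7 install described above, and the function name and the sample column names are hypothetical.

```python
import csv
import io


def read_rows_as_text(handle):
    """Return every row as a dict of strings keyed by the header row."""
    return list(csv.DictReader(handle))


# Made-up sample with hypothetical column names, standing in for real output.
SAMPLE = "Record,Path\n000042,/Users/test/report.doc\n"
rows = read_rows_as_text(io.StringIO(SAMPLE))
print(rows[0]["Record"])  # nothing is converted, so leading zeros survive
```

For a real file the handle would come from open("$MFT_parsed", newline="") instead of the in-memory sample.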

Thanks to David Kovar for making this script available.

$USNJRNL•$J Parser
This script can be downloaded at http://code.google.com/p/parser-usnjrnl/.

To run

  • open command prompt
  • at prompt type Python C:\Path_to_the_script\UsnJrnl.py -f U:\Path_to_your_extracted_or_mounted_USNJRNL•$J\USNJRNL•$ -o Output_file -c
  • The above command runs the script against your extracted or mounted $USNJRNL•$J and outputs the results to Output_file.csv

Notes
Typing Python path_to_script.py at the command prompt will give some help about a script's options. For example, Python UsnJrnl.py results in the output

Usage: UsnJrnl.py [options]
Options:
  -h, --help            show this help message and exit
  -f INFILENAME, --infile=INFILENAME
                        input file name
  -o OUTFILENAME, --outfile=OUTFILENAME
                        output file name (no extension)
  -c, --csv             create Comma-Separated Values Output File
  -t, --tsv             create Tab-Separated Values Output File
  -s, --std             write to stdout
I have installed Python 2.7. There are other (and later) versions available including some that are not completely open source. It is also possible to install Python modules to provide a GUI. I have not installed these - takes the fun out of running scripts!


Monday, 19 July 2010

Gatherer Transaction Log Files - a Windows Search artefact

A recurring theme in many examinations is the prevalence of evidence in unallocated clusters. Reinstallation of the OS is often to blame and a recent case where XP was installed on a drive where the previous OS was Vista further complicated matters. All relevant data had been created during Vista's reign and the challenge was to determine what files and folders existed under this OS. The Encase Recover Folders feature assisted to an extent as did Digital Detective's Hstex 3. Loading the output of Hstex 3 into NetAnalysis allowed me to identify the download of a number of suspect files and some local file access to files within the Downloads folder.

The next step was to carry out a keyword search utilising the suspect file names as keywords. This is always a good technique and results in the identification of useful evidence in a variety of artefacts (e.g. index.dats, link files, registry entries, NTFS file system artefacts etc.) but because in this case everything was unallocated, identifying all the artefacts was a little tricky. A considerable number of the search hits were clearly within some structured data, but the data was not an artefact I was familiar with.

I have highlighted Record Entry Headers to draw attention to the structured nature of the data. This screen shot is of test data where the file names/path are stored as unicode as opposed to ASCII in the case I was investigating.

A bit of googling led me to page 42 of Forensic Implications of Windows Vista - Barrie Stewart which identified the structured data I had located as being part of Gatherer Transaction Log files created by the search indexer process of Windows Search. These files have a filename in the format SystemIndex.NtfyX.gthr where the X is replaced by a decimal number and on a live Vista system can be found at the path

C:\ProgramData\Microsoft\Search\Data\Applications\Windows\Projects\SystemIndex\
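
When sifting through a file listing, the naming convention described above can be tested with a regular expression. This is a small sketch; the function name is my own and the only assumption is the SystemIndex.NtfyX.gthr pattern already described, with X a decimal number.

```python
import re

# SystemIndex.NtfyX.gthr where X is a decimal number.
GTHR_NAME = re.compile(r"^SystemIndex\.Ntfy(\d+)\.gthr$", re.IGNORECASE)


def gthr_number(filename):
    """Return the decimal number from a gatherer log file name, or None."""
    match = GTHR_NAME.match(filename)
    return int(match.group(1)) if match else None


print(gthr_number("SystemIndex.Ntfy12.gthr"))
```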


These files have the words Microsoft Search Gatherer Transaction Log. Format Version 4.9 as a file header. The files are a transaction log of entries committed to the Windows search database indexing queues. The SearchIndexer process monitors the USN Change Journal which is part of the NTFS file system used to track changes to a volume. When a change is detected (by the creation of a new file for example) the SearchIndexer is notified and the file (providing it is in an indexable location - mainly User folders) is added to the queue to be indexed. The USN Change Journal is also something that may contain evidentially useful information and I will look at it in more depth in a later blog post.

Sometimes artefacts are only of academic interest but it was fairly apparent that this data could have some evidential value. Each file or folder has a record entry, parts of which had been deconstructed by Stewart. I was able to identify two additional pieces of information within each record - the length of the Filename block and a value that is possibly a sequence or index number or used to denote priority. I also observed some variations in some parts of the record that had been constant in Stewart's test data.

  • Record Header 0x4D444D44 [4 bytes]
  • Unknown variable data [12 bytes]
  • FILETIME Entry [8 bytes]
  • FILETIME Entry [8 bytes] or a value of 0x[0100]00000000000000
  • Unknown variable data [12 bytes]
  • Length of file path following plus 1 byte (or plus 2 bytes if file path stored as unicode) [4 bytes] stored as 32 bit integer
  • Name and full path of file/folder (ASCII or Unicode - version dependent) [variable length]
  • 0x000000000000000000FFFFFFFF [13 bytes]
  • FILETIME Entry [8 bytes] or a value of 0x[0100]00000000000000
  • 0xFFFFFFFF [4 bytes]
  • FILETIME Entry [8 bytes] or a value of 0x[0100]00000000000000
  • Unknown variable data [4 bytes]
  • Sequence or index number? [1 byte] stored as 8 bit integer
  • Unknown variable data [15 bytes]
  • FILETIME Entry [8 bytes] or a value of 0x[0100]00000000000000
  • FILETIME Entry [8 bytes] or a value of 0x[0100]00000000000000
  • Unknown variable data [20 bytes]
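
The layout above can be walked in Python with the struct module. Treat this as a sketch under stated assumptions rather than a definitive parser: I am assuming the header bytes appear on disk as 4D 44 4D 44, that integers are little-endian, that the length field is a byte count which includes the terminator, and that the 00/01 placeholder values mean "no timestamp". It is written for Python 3, and the function names are my own.

```python
import struct
from datetime import datetime, timedelta

RECORD_HEADER = b"\x4D\x44\x4D\x44"    # assumed on-disk byte order of 0x4D444D44
FILETIME_EPOCH = datetime(1601, 1, 1)  # FILETIME counts 100 ns ticks from here (UTC)


def filetime_to_datetime(value):
    """Convert a 64-bit FILETIME to a naive UTC datetime, or None."""
    if value in (0, 1):
        # Assumption: the 00.../01... placeholder values mean "no timestamp".
        return None
    try:
        return FILETIME_EPOCH + timedelta(microseconds=value // 10)
    except OverflowError:
        return None  # not a plausible date


def parse_record(data, offset=0, is_unicode=True):
    """Parse one record starting at `offset` and return its fields as a dict."""
    if data[offset:offset + 4] != RECORD_HEADER:
        raise ValueError("no record header at offset %d" % offset)
    pos = offset + 4 + 12                    # header + 12 unknown bytes
    ft1, ft2 = struct.unpack_from("<QQ", data, pos)
    pos += 16 + 12                           # 2 FILETIMEs + 12 unknown bytes
    (path_len,) = struct.unpack_from("<I", data, pos)
    pos += 4
    raw_path = data[pos:pos + path_len]      # assumed to include the terminator
    pos += path_len
    path = raw_path.decode("utf-16-le" if is_unicode else "ascii", "replace")
    # 13 constant bytes, FILETIME, 0xFFFFFFFF, FILETIME, 4 unknown bytes,
    # 1-byte number, 15 unknown bytes, FILETIME, FILETIME
    tail = "<13xQ4xQ4xB15xQQ"
    ft3, ft4, number, ft5, ft6 = struct.unpack_from(tail, data, pos)
    pos += struct.calcsize(tail) + 20        # plus the final 20 unknown bytes
    stamps = (ft1, ft2, ft3, ft4, ft5, ft6)
    return {
        "path": path.rstrip("\x00"),
        "number": number,
        "timestamps": [filetime_to_datetime(v) for v in stamps],
        "size": pos - offset,
    }
```

The returned size is the total record length under these assumptions (137 bytes plus the path block), which makes it easy to check a parsed record against the next header found in the data.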

Microsoft do not seem to have publicly documented the record structure. To establish how useful this data can be I came to the conclusion that I needed to recover all of these records from unallocated. I needed an enscript and Oliver Smith over at Cy4or kindly wrote one for me. I wanted the enscript to parse out the file and path information, sequence or index number, the six time stamps and a hex representation of each unknown range of data into a spreadsheet. The script searches for and parses individual records (from the live systemindex file and unallocated) as opposed to entire files. I was astonished at just how much information the script parsed out - email me if you want a copy. Setting the spreadsheet to use a fixed width font (Courier New) lines up the extracted hex very well should anyone want to reverse engineer these records further. As it stands the file paths and timestamps can provide some useful evidential information, particularly when the recovered records have been recovered from unallocated clusters and relate to a file system older than the current one.
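
The enscript itself is not reproduced here, but the basic idea of hunting for individual records rather than whole files can be illustrated in a few lines of Python run against a raw dump of unallocated clusters. This is a simplified sketch of my own, assuming the record header appears on disk as the bytes 4D 44 4D 44; every hit is only a candidate and still needs validating against the record layout.

```python
RECORD_HEADER = b"\x4D\x44\x4D\x44"  # assumed on-disk form of the record header


def find_record_offsets(blob, signature=RECORD_HEADER):
    """Return the offset of every occurrence of the record header in `blob`."""
    offsets = []
    pos = blob.find(signature)
    while pos != -1:
        offsets.append(pos)
        # Step forward one byte so overlapping matches are not missed.
        pos = blob.find(signature, pos + 1)
    return offsets


# Tiny made-up buffer standing in for a chunk of unallocated space.
demo = b"\x00" * 10 + RECORD_HEADER + b"x" * 50 + RECORD_HEADER
print(find_record_offsets(demo))
```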

Timestamps
Obviously, once you have run this enscript or manually examined the records, the first question that arises is what the timestamps represent. Establishing this has not been as easy as it could be and hopefully a little bit of crowd sourcing will sort this out for all of us. Post a comment if you can help in this regard. One approach is to use the hex 64 bit FILETIME value as a keyword and see where you get hits. A hit in another timestamp indicates that the two timestamps are identical down to the 100-nanosecond resolution of a FILETIME. Carrying out this process will result in hits in OS system files and fragments of them. I have found on the limited test data set I have used that Timestamp 3 matched the File Modified (File Altered) date within the MFT for the file concerned and the timestamp for the same file in the USN Change Journal. The timestamp in the USN Change Journal record is the absolute system time that the change journal event was logged (1). It is worth reminding readers who are Encase users that Encase uses different terminology for the time stamps within the MFT - file modified is referred to as Last Written. I think it likely that timestamps 1 and 2 are linked to the indexing function (e.g. time submitted for indexing) given the journalling nature of the file, but I can neither prove this by testing nor confirm it within Microsoft documentation. I can say that in testing, sorting on Timestamp 1 gave a clear timeline of the file system activity I had provoked within User accessible folders.
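
To build a keyword from a timestamp, the 64 bit value has to be written out as the eight bytes that actually sit on disk. A minimal sketch follows, assuming the value is stored little-endian (as FILETIMEs normally are on Windows) and that the datetime passed in is UTC; the function names are my own.

```python
import struct
from datetime import datetime

FILETIME_EPOCH = datetime(1601, 1, 1)


def datetime_to_filetime(moment):
    """Convert a naive UTC datetime to a FILETIME (100 ns ticks since 1601)."""
    delta = moment - FILETIME_EPOCH
    return (delta.days * 86400 + delta.seconds) * 10 ** 7 + delta.microseconds * 10


def filetime_keyword(value):
    """Return the eight on-disk bytes of a FILETIME as a hex search keyword."""
    return " ".join("%02X" % byte for byte in bytearray(struct.pack("<Q", value)))


print(filetime_keyword(datetime_to_filetime(datetime(1970, 1, 1))))
```

A datetime only carries microseconds, so when an exact match matters it is safer to feed filetime_keyword the raw integer read from the record rather than a converted date.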



Example CSV output of Enscript

References
Good citizenship when developing background services for Windows Vista - Microsoft
Forensic Implications of Windows Vista - Barrie Stewart
Forensic Artefacts Present in Microsoft Windows Desktop Search - John Douglas MSc Thesis
Indexing Process in Windows Search - Microsoft MSDN
(1) http://msdn.microsoft.com/en-us/library/cc232038%28PROT.10%29.aspx