Monday 19 July 2010

Gatherer Transaction Log Files - a Windows Search artefact

A recurring theme in many examinations is the prevalence of evidence in unallocated clusters. Reinstallation of the OS is often to blame, and a recent case where XP had been installed on a drive that previously ran Vista complicated matters further. All relevant data had been created during Vista's reign, and the challenge was to determine what files and folders had existed under that OS. The Encase Recover Folders feature assisted to an extent, as did Digital Detective's Hstex 3. Loading the output of Hstex 3 into NetAnalysis allowed me to identify the download of a number of suspect files and some local access to files within the Downloads folder.

The next step was to carry out a keyword search using the suspect file names as keywords. This is always a good technique and results in the identification of useful evidence in a variety of artefacts (e.g. index.dat files, link files, registry entries and NTFS file system artefacts), but because in this case everything was unallocated, identifying all the artefacts was a little tricky. A considerable number of the search hits were clearly within some structured data, but the data was not an artefact I was familiar with.

I have highlighted the Record Entry Headers to draw attention to the structured nature of the data. This screenshot is of test data, where the file names and paths are stored as Unicode; in the case I was investigating they were stored as ASCII.

A bit of googling led me to page 42 of Forensic Implications of Windows Vista by Barrie Stewart, which identified the structured data I had located as being part of Gatherer Transaction Log files created by the search indexer process of Windows Search. These files have a filename in the format SystemIndex.NtfyX.gthr, where X is replaced by a decimal number, and on a live Vista system can be found at the path:

C:\ProgramData\Microsoft\Search\Data\Applications\Windows\Projects\SystemIndex\


These files have the words Microsoft Search Gatherer Transaction Log. Format Version 4.9 as a file header. The files are a transaction log of entries committed to the Windows Search database indexing queues. The SearchIndexer process monitors the USN Change Journal, the part of the NTFS file system used to track changes to a volume. When a change is detected (by the creation of a new file, for example) the SearchIndexer is notified and the file, providing it is in an indexable location (mainly User folders), is added to the queue to be indexed. The USN Change Journal may itself contain evidentially useful information and I will look at it in more depth in a later blog post.
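
The naming convention and header text described above could be checked with something like the following Python sketch. The function names are hypothetical, and two things are assumptions rather than findings: that the header text may be stored as either ASCII or UTF-16LE, and that it sits within the first 512 bytes of the file.

```python
import re

# Filename pattern from the post: SystemIndex.NtfyX.gthr, where X is a decimal number.
GTHR_NAME = re.compile(r"^SystemIndex\.Ntfy(\d+)\.gthr$", re.IGNORECASE)

# Header text quoted in the post. The on-disk encoding is an assumption,
# so both an ASCII and a UTF-16LE rendering are tried.
HEADER_TEXT = "Microsoft Search Gatherer Transaction Log. Format Version 4.9"
HEADER_CANDIDATES = (HEADER_TEXT.encode("ascii"), HEADER_TEXT.encode("utf-16-le"))


def gthr_file_number(filename):
    """Return X from SystemIndex.NtfyX.gthr, or None if the name does not match."""
    match = GTHR_NAME.match(filename)
    return int(match.group(1)) if match else None


def looks_like_gthr_header(first_bytes):
    """True if the header text appears in the first 512 bytes (assumed window)."""
    window = first_bytes[:512]
    return any(candidate in window for candidate in HEADER_CANDIDATES)
```

On a live system the first few hundred bytes of each candidate file would be read and passed to looks_like_gthr_header; other format versions would need the version string relaxed.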

Sometimes artefacts are only of academic interest, but it was fairly apparent that this data could have some evidential value. Each file or folder has a record entry, parts of which had been deconstructed by Stewart. I was able to identify two additional pieces of information within each record: the length of the Filename block and a value that is possibly a sequence or index number, or is used to denote priority. I also observed some variations in parts of the record that had been constant in Stewart's test data.

  • Record Header 0x4D444D44 [4 bytes]
  • Unknown variable data [12 bytes]
  • FILETIME Entry [8 bytes]
  • FILETIME Entry [8 bytes] or a value of 0x[0100]00000000000000
  • Unknown variable data [12 bytes]
  • Length of file path following plus 1 byte (or plus 2 bytes if file path stored as unicode) [4 bytes] stored as 32 bit integer
  • Name and full path of file/folder (ASCII or Unicode, version dependent) [variable length]
  • 0x000000000000000000FFFFFFFF [13 bytes]
  • FILETIME Entry [8 bytes] or a value of 0x[0100]00000000000000
  • 0xFFFFFFFF [4 bytes]
  • FILETIME Entry [8 bytes] or a value of 0x[0100]00000000000000
  • Unknown variable data [4 bytes]
  • Sequence or index number? [1 byte] stored as 8 bit integer
  • Unknown variable data [15 bytes]
  • FILETIME Entry [8 bytes] or a value of 0x[0100]00000000000000
  • FILETIME Entry [8 bytes] or a value of 0x[0100]00000000000000
  • Unknown variable data [20 bytes]
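
As a rough illustration of the layout above, the following Python sketch reads the start of a single record: the header, the first two timestamps, the length field and the path. It is not the enscript mentioned in this post. It assumes the header bytes appear on disk in the order written above, that integers and FILETIME values are little-endian, and that the length field is a byte count that includes the terminator. The function and key names are hypothetical.

```python
import struct
from datetime import datetime, timedelta, timezone

RECORD_HEADER = bytes.fromhex("4D444D44")  # assumed on-disk byte order
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def filetime_to_datetime(value):
    """Convert a FILETIME (100 ns ticks since 1601-01-01 UTC) to a datetime."""
    return FILETIME_EPOCH + timedelta(microseconds=value // 10)


def parse_record_prefix(buf, offset=0, unicode_paths=False):
    """Parse the leading fields of one record starting at the given offset."""
    if buf[offset:offset + 4] != RECORD_HEADER:
        raise ValueError("record header not found at offset %d" % offset)
    # Layout: 4-byte header, 12 unknown bytes, two 8-byte FILETIMEs,
    # 12 unknown bytes, 4-byte length, then the path itself.
    ts1, ts2 = struct.unpack_from("<QQ", buf, offset + 16)
    (path_len,) = struct.unpack_from("<I", buf, offset + 44)
    path_start = offset + 48
    raw_path = buf[path_start:path_start + path_len]
    encoding = "utf-16-le" if unicode_paths else "ascii"
    path = raw_path.decode(encoding, errors="replace").rstrip("\x00")
    return {
        "timestamp_1_raw": ts1,  # kept raw: a slot may hold a placeholder value
        "timestamp_2_raw": ts2,
        "path": path,
        "next_offset": path_start + path_len,  # where the 13-byte block begins
    }
```

The timestamps are returned as raw integers because some slots hold a placeholder rather than a real FILETIME; filetime_to_datetime can be applied to the values that look plausible.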

Microsoft do not seem to have publicly documented the record structure. To establish how useful this data can be, I concluded that I needed to recover all of these records from unallocated clusters. I needed an enscript, and Oliver Smith over at Cy4or kindly wrote one for me. I wanted the enscript to parse out the file and path information, the sequence or index number, the six timestamps and a hex representation of each unknown range of data into a spreadsheet. The script searches for and parses individual records (from the live SystemIndex file and from unallocated clusters) as opposed to entire files. I was astonished at just how much information the script parsed out - email me if you want a copy. Setting the spreadsheet to use a fixed-width font (Courier New) lines up the extracted hex very well, should anyone want to reverse engineer these records further. As it stands, the file paths and timestamps can provide some useful evidential information, particularly when the records have been recovered from unallocated clusters and relate to a file system older than the current one.
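
For readers without Encase, the search step (finding individual records rather than whole files) can be approximated in Python as below. This is only a sketch of the idea and is not Oliver's enscript. It assumes the record header bytes appear on disk in the order given in the list above, and the function name is hypothetical.

```python
def find_record_offsets(data, signature=bytes.fromhex("4D444D44")):
    """Yield the offset of every occurrence of the record header in a buffer."""
    position = data.find(signature)
    while position != -1:
        yield position
        # Step forward one byte so that overlapping matches are not missed.
        position = data.find(signature, position + 1)
```

Each offset is only a candidate: a four-byte signature will produce false positives in a large image, so every hit needs validating (for example by checking that the timestamps and path length are plausible) before it is treated as a record. A real image would also have to be read in chunks with a small overlap rather than loaded whole.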

Timestamps
Once you have run this enscript or manually examined the records, the first question that arises is what the timestamps represent. Establishing this has not been as easy as it could be and hopefully a little bit of crowd sourcing will sort this out for all of us. Post a comment if you can help in this regard.

One approach is to use the hex 64-bit FILETIME value as a keyword and see where you get hits. A hit in another timestamp indicates that the two timestamps are identical down to the 100-nanosecond resolution of a FILETIME value. Carrying out this process will result in hits in OS system files and fragments of them. I have found, on the limited test data set I have used, that Timestamp 3 matched the File Modified (File Altered) date within the MFT for the file concerned and the timestamp for the same file in the USN Change Journal. The timestamp in the USN Change Journal record is the absolute system time that the change journal event was logged (1). It is worth reminding readers who are Encase users that Encase uses different terminology for the timestamps within the MFT: File Modified is referred to as Last Written.

I think it likely that Timestamps 1 and 2 are linked to the indexing function (e.g. time submitted for indexing) given the journalling nature of the file, but I can neither prove this by testing nor confirm it from Microsoft documentation. I can say that, in testing, sorting on Timestamp 1 gave a clear timeline of the file system activity I had provoked within User accessible folders.
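
To build a keyword for the FILETIME search described above, the 64-bit value has to be expressed as the eight bytes that appear on disk. The sketch below assumes the value is stored little-endian, as FILETIME values normally are on Windows; the function names are hypothetical.

```python
import struct
from datetime import datetime, timezone

FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def datetime_to_filetime(moment):
    """Convert a timezone-aware datetime to FILETIME ticks (100 ns units)."""
    delta = moment - FILETIME_EPOCH
    whole_seconds = delta.days * 86400 + delta.seconds
    return whole_seconds * 10_000_000 + delta.microseconds * 10


def filetime_keyword(value):
    """Return the eight little-endian bytes of a FILETIME as a hex string."""
    return struct.pack("<Q", value).hex().upper()
```

Where possible, take the raw value straight from a record and pass it to filetime_keyword. A value rebuilt from a displayed date and time loses anything below a microsecond, so it may not match the stored bytes exactly.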



Example CSV output of Enscript

References
Good citizenship when developing background services for Windows Vista - Microsoft
Forensic Implications of Windows Vista - Barrie Stewart
Forensic Artefacts Present in Microsoft Windows Desktop Search - John Douglas MSc Thesis
Indexing Process in Windows Search - Microsoft MSDN
(1) http://msdn.microsoft.com/en-us/library/cc232038%28PROT.10%29.aspx


6 comments:

Jon Stewart said...

Have you tried graphing the different timestamps vs the separate filesystem timestamps? I wonder whether you'd see any overlays.

Jim Gordon said...

Richard,

As you are aware, John Douglas of QCC fame and I both covered the Windows indexing service for our dissertations at Cranfield. I think that I did more experimentation around the Gatherer Transaction Log file and managed to decode some further entries in addition to the ones that Barrie identified. I'll try and dig my thesis out and email it to you. I think it identified the various FILETIME entries and what they relate to.

Regards

Jim

DC1743 said...

Jim,

Thanks - that will be a great help and I'll update the blog post.

Richard

HP said...

Rich

I don't know if you have looked at this but I was looking at a windows.edb file today and noticed a table with references in there to what must be the gthr files and this included some date fields.

H

DC1743 said...

Thanks Harry,

Jim Gordon's MSc thesis was on Windows Desktop Search. He found that one of the tables within the Windows.edb file was entitled SystemIndex_Gthr. Jim identified four FILETIME stamps within each record in this table: First Accessed, Last Accessed, Last Modified and Time MD5 changed. I don't think he established conclusively a direct relationship between these timestamps and the timestamps within the Gatherer Transaction Log Files.

He also established that, in certain circumstances, the absence of certain timestamps indicated that a file had been deleted, but concluded that a longer test regime was required.

I have carried out some further testing aided by Jim's research, but at this stage I can only commit to two things: Timestamp 3 is the File Modified time (as recorded in the MFT) of the file at the time of indexing, and, where there is a full complement of timestamps, Timestamp 1 represents the time the file was sent for indexing.

Richard

H. Carvey said...

I know it's well after the fact, but I looked these up on Win7, and they have an entirely different format now...