Background
Facebook has a built in instant messaging facility which has grown in popularity along with the Facebook social networking site itself. Many cases involve potential grooming offences in which the use of instant messaging needs to be investigated.
The instant messaging facility creates a number of artefacts which are easily found and I know have been commentated on elsewhere. The purpose of this blog post is to suggest a methodology to automate the discovery and reporting of Facebook messages.
For those who have not looked at this area in detail yet messages are cached in small html files with a file name P_xxxxxxxx.htm (or .txt). These messages can be found in browser cache, unallocated clusters, pagefiles, system restore points, the MFT as resident data and possibly other places. It is possible for the messages to be cached within the main Facebook profile page (although I have never seen them there - the main facebook page does not seem to be cached that often).
An example of a message is shown below:
for (;;);{"t":"msg","c":"p_1572402994","ms":[{"type":"msg","msg":{"text":"Another Message","time":1237390150796,"clientTime":1237390150114,"msgID":"3078127486"},"from":212300220,"to":1123402994,"from_name":"Mark PPPPPP","to_name":"Richard XXXX","from_first_name":"Mark","to_first_name":"Richard"}]}
The bulk of the message is in fact formatted as JavaScript Object Notation normally referred to as JSON. The format is a text based and human readable way for representing data structures. The timestamps are 13 digit unix timestamps that include milliseconds - they can be divided by 1000 to get a standard unix timestamp.
Although keyword searches will find these messages they are difficult to review particularly if you are only interested in communication between selected parties. Having found relevant hits you then have to create a sweeping bookmark for each one. For these reasons I follow the following methodology.
Suggested Methodology
- Create a Custom File Type within the Encase Case Processor File Finder module entitled Facebook Messages using the Header "text":" and the footer }]} making sure GREP is not selected.
- Run the file finder with the Facebook Messages option selected.
- When the file finder completes you will have a number of text files in your export directory (providing there are messages to be found).
- These text files are in the form of the example above. They do not have Carriage Return and Line Feed characters at the end of the text. We need to remedy this by utilising a DOS command at the command prompt.
- At the command prompt navigate to the directory containing your exported messages (please note Encase creates additional sub directories beneath your originally specified directory).
- Then run the following command:
FOR %c in (*.txt) DO (Echo.>>%~nc.txt)
This command adds a Carriage Return and Line Feed to the end of the extracted message.
- Next we want to concatenate the message text files into one file using the command at the DOS prompt: copy *.txt combined.txt
- Alternatively create (or email me for) a batch file that executes these two commands direct from windows.
- An additional file combined.txt will be created in your export directory.
- Launch Microsoft Excel and instigate the Text Import Wizard specifying Delimited with the Delimiter being a comma and the text qualifier " .
Put the data into your worksheet (or cell J3 of my pre-formatted worksheet). - All that's needed now is to tidy up the worksheet with some Excel formulas the full details of which can be found within my example pre-formatted worksheet. The formula to process the time values (which are Unix time stamps) is (RIGHT(K2,13))/1000/86400+25569 where K2 is the cell containing the source time data.
- Perform a sanity check and remove obviously corrupt entries.
- It can be seen below that after applying a data sort filter you can sort by time or user.
- The spreadsheet also allows you to de-duplicate the found messages. In my recent case over half the recovered messages were duplicates. In Excel 2007 these duplicate (rows) are easily removed (Data/DataTools/Remove Duplicates). In Excel 2003 an add-in called The Duplicate Master will do this for you.
Further Thoughts
Non Encase users may be able to use an alternative file carver (e.g. Blade) to carve out the messages. I am sure that the header and footer could be refined a bit to reduce false positives, however for me the ratio of legitimate versus false positives is OK. UPDATE 22nd April 2009 - non encase users may wish to look at my more recent post.
I have the pre-formatted spreadsheet in template form. Please email me for a copy (with a brief explanation of who you are - thanks).
To further investigate the data you recover you may wish to check out http://www.facebook.com/profile.php?id=xxxxxxx. Just substitute the xxxxx with the User ID's you recovered.
Enscript Method
I have collaborated with Simon Key and now have an enscript to parse out JSON objects including messages. It outputs to a csv spreadsheet and in my tests parsed 160GB in about an hour. It might not be as tolerant of corrupt strings as the method detailed above. The script will only run in 6.13 or newer. I have a template that tidies up the formatting of the csv- email me if you want a copy.
References and thanks
http://coderrr.wordpress.com/2008/05/06/facebook-chat-api
http://video.google.com/videoplay?docid=6088074310102786759
http://json.org/
http://en.wikipedia.org/wiki/JSON
http://www.wilsonmar.com/datepgms.htm#UNIXStamp
Thanks to Glenn Siddall for sparking my interest and providing me with some notes of his research.
Thanks to Mark Payton for his assistance in researching this.