Wednesday, 9 April 2014

Mac OS X "Set date and time automatically"

Just as I've found the time to write up a blog post, along comes an appropriately time-related subject.

Recently I examined an Apple iMac running Mountain Lion.  I was only given access to an image, which presented a problem because the matter under investigation relied on accurate time stamps and I had no system clock to check. I knew that by default Apple OS X (Snow Leopard through to Mavericks for certain, and probably earlier versions) will Set date and time automatically whilst connected to the internet, synchronising with a network time server using the Network Time Protocol (NTP), as shown in Figure 1.

Figure 1

The problem to overcome then was establishing where this setting is stored - I knew it would be in a plist somewhere - isn't everything Mac related stored in a plist?  :-)  So many plists, so little time!  A little research and testing allowed me to determine exactly where the setting is stored, so I can share it with you here. Figure 2 shows the overrides.plist in the Finder at the path /private/var/db/launchd.db/com.apple.launchd.

Figure 2

This plist is an XML plist so it can be read easily.  Figure 3 shows the plist viewed with the Xcode Property List Editor.

Figure 3

The key we are interested in is the org.ntp.ntpd key.  If the Disabled value is true, Set date and time automatically is turned off.
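For anyone who wants to check the setting programmatically, here is a minimal sketch using Python's plistlib (the mount point in the path is hypothetical; the key names are those shown in Figure 3):

    import plistlib

    # Path to the plist as mounted from the image - adjust the mount point to suit
    path = "/Volumes/evidence/private/var/db/launchd.db/com.apple.launchd/overrides.plist"

    with open(path, "rb") as f:
        overrides = plistlib.load(f)        # handles both XML and binary plists

    ntp = overrides.get("org.ntp.ntpd", {})
    disabled = ntp.get("Disabled", False)
    print("Set date and time automatically:", "OFF" if disabled else "ON")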

Hope this helps someone.

Richard

Tuesday, 10 December 2013

Apple Safari update and fsCachedData

Recently I have had cause to look again at how the Apple Safari web browser stores its cache. Surprisingly, much of what I wrote concerning Safari back in 2010 still holds true.  The introduction of OS X Lion brought some changes in that a new table, cfurl_cache_receiver_data, was created within the SQLite cache.db database and used to store the cached item as a binary large object in the receiver_data field.   Previously this field was within the cfurl_cache_blob_data table.

I have now looked at Safari version 7 running in OS X Mavericks and found that not all cached data is now stored as binary large objects within cache.db.  In Figure 1 we can see a number of entries in the cfurl_cache_receiver_data table.  A number of them contain BLOBs as expected, but in entry 18109 we can see that the new field isDataOnFS has a value of 1 and the receiver_data field has a value of ED6AC03A-4E37-46E2-8C95-E29EDC92BF6E.

Figure 1

In Figure 2 we can see that the new fsCachedData folder contains the file ED6AC03A-4E37-46E2-8C95-E29EDC92BF6E.

Figure 2

It appears that any file 4,096 bytes and larger is now cached externally in the fsCachedData folder.  I note that the files do not have file extensions so a file signature analysis will be required.  The file name is structured like a GUID. It does not appear to be a hash of the relevant URL.
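If you want to list the externally cached items programmatically, a minimal sqlite3 sketch along these lines should do it (I am assuming the table retains an entry_ID field as in the older schema; isDataOnFS and receiver_data are the fields shown in Figure 1):

    import sqlite3

    conn = sqlite3.connect("Cache.db")      # path to the exported Safari cache database
    rows = conn.execute(
        "SELECT entry_ID, receiver_data FROM cfurl_cache_receiver_data "
        "WHERE isDataOnFS = 1"
    ).fetchall()

    for entry_id, file_name in rows:
        # file_name is the GUID-style name to look for in the fsCachedData folder
        print(entry_id, file_name)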

The eagle eyed amongst my readers will have already spotted another type of entry within the receiver_data field in Figure 1.  Figure 3 shows more of them.

Figure 3

It can be seen that the data in the receiver_data field has the structure '474946.................................3B'. Seasoned forensicators will recognise this data as hex and that contained within it are the header and footer of a GIF file. This data is stored within the SQLite database as a BLOB; however, the SQLite utility reading the BLOB has been configured to display BLOBs less than 50 bytes in size as a string.

Until next time.

Richard

Monday, 4 March 2013

Location Data within JPGs

We have become accustomed to the fact that many of our digital photographs have location data embedded within them, populated by a GPS receiver. This data is often utilized by modern photo management programs such as iPhoto and could conceivably have some evidential value at some point.  So where is it stored? You are probably thinking, like I was, that it sits along with all that other Exif data, is possibly in plain text and would be easy to locate and retrieve.  In fact there is a little more to it.

In Figure 1 we can see the first part of a jpg taken with an iPhone 5.


Figure 1
My readers will recognize the start of the jpg header 0xFF 0xD8 0xFF etc., but I want to draw your attention to 0x4D 0x4D.  These two bytes are the start of a section of data known as a TIFF structure and are part of an 8 byte header.  The TIFF file specification published by Adobe details the information residing in this header.


Figure 2

In ASCII these bytes are read as MM.  These bytes indicate that where applicable the data following should be interpreted Big Endian.  The MM and II references hark back to the competing Intel and Motorola technologies.  Immediately following are the two bytes 0x00 0x2A which decoded as an (in this case big endian) integer have the value 42.  This value 42 always follows MM or II and is part of the TIFF structure header (note the HHGTTG reference). The following four bytes 00 00 00 08 detail the offset (8 in this case) from the start of the TIFF structure header to the start of a structure called an IFD.


An IFD is an Image File Directory. Adobe defines it as: a 2 byte count of the number of directory entries (i.e. the number of fields), followed by a sequence of 12 byte field entries, followed by a 4 byte offset of the next IFD (or 0 if none).

We can see that in this case we have 0x00 0x0A (10) 12 byte directory entries, the start of each being marked in green.  Adobe details the structure of each entry as follows:

Bytes 0-1 The tag that identifies the field

Bytes 2-3 The field type

Bytes 4-7 The number of values (the Count) of the indicated type

Bytes 8-11 The Value Offset, the file offset (in bytes) of the Value of the field

The tag we are interested in within the IFD is 0x88 0x25. A comprehensive list of tag numbers can be found at www.exiv2.org/tags.html. The 0x88 0x25 tag is used to store a pointer to another IFD, the GPS Info IFD. If this tag does not exist there isn't a GPS Info IFD and therefore no JPG location information (but don't forget the tag may be stored little endian 0x25 0x88). Bytes 2-3 in this directory entry are 0x00 0x04. This indicates that the field type is type 4. Adobe details the possible field types as follows:

1 = BYTE 8-bit unsigned integer.

2 = ASCII 8-bit byte that contains a 7-bit ASCII code; the last byte must be NUL (binary zero).

3 = SHORT 16-bit (2-byte) unsigned integer.

4 = LONG 32-bit (4-byte) unsigned integer.

5 = RATIONAL Two LONGs: the first represents the numerator of a fraction; the second, the denominator.


We can see that the field type is 4, indicating that the value of this entry will be stored as one or more (depending on the Count) 32 bit unsigned integers.  Bytes 4-7 in this directory entry are 0x00 0x00 0x00 0x01.  This indicates how many instances (a count) of the field type we require, in this case just one, so this directory entry will contain only one 32 bit unsigned integer.

The remaining four bytes of any directory entry, bytes 8-11, contain an offset from the TIFF structure header to the location of the Value data.  However, Adobe states that where the value is four bytes or less (as in this case):

To save time and space the Value Offset contains the Value instead of pointing to the Value if and only if the Value fits into 4 bytes. If the Value is shorter than 4 bytes, it is left-justified within the 4-byte Value Offset, i.e., stored in the lower numbered bytes. Whether the Value fits within 4 bytes is determined by the Type and Count of the field.

Our value in this example is one 32 bit unsigned integer which will fit into 4 bytes.  Therefore the remaining 4 bytes in our directory entry 0x00 0x00 0x02 0x5A are the value itself.  You will recall that the 0x88 0x25 tag is used to store a pointer to the GPS Info IFD.  0x00 0x00 0x02 0x5A decoded gives us a value of 602.  This indicates that the GPS Info IFD can be found at offset 602 from the TIFF Structure header.  Figure 3 below shows the data at offset 602 (from the TIFF structure header, not the start of the file).  We can see that the first two bytes at this offset are 0x00 0x09. This indicates the number of 12 byte directory entries to be found in this IFD, in this case 9, the start of each marked in green.
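To put the walkthrough so far into code, below is a minimal sketch that locates the TIFF structure within a jpg and walks the first IFD looking for the 0x8825 GPS Info IFD pointer. It is a simplification - a robust parser would locate the Exif APP1 segment properly rather than searching for the byte order marker - and the file name is hypothetical:

    import struct

    def find_gps_ifd_offset(jpg_path):
        data = open(jpg_path, "rb").read()
        # Simplified: locate the TIFF structure header by its byte order marker
        tiff = data.find(b"MM\x00\x2a")
        endian = ">"
        if tiff == -1:
            tiff = data.find(b"II\x2a\x00")
            endian = "<"
        if tiff == -1:
            return None
        # Bytes 4-7 of the TIFF header: offset from the TIFF header to the first IFD
        ifd0 = struct.unpack_from(endian + "I", data, tiff + 4)[0]
        # First two bytes of the IFD: number of 12 byte directory entries
        count = struct.unpack_from(endian + "H", data, tiff + ifd0)[0]
        for i in range(count):
            entry = tiff + ifd0 + 2 + i * 12
            tag, ftype, n, value = struct.unpack_from(endian + "HHII", data, entry)
            if tag == 0x8825:               # GPS Info IFD pointer
                return value                # offset from the TIFF structure header
        return None

    print(find_gps_ifd_offset("IMG_0001.jpg"))   # 602 for the example jpg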

Figure 3
The GPS IFD has a series of potential tag values.  A comprehensive list can be found at http://www.awaresystems.be/imaging/tiff/tifftags/privateifd/gps.html.  From this list we can see that the tags of most interest here are:
Hex    Name               Description
0001   GPSLatitudeRef     Indicates whether the latitude is north or south latitude.
0002   GPSLatitude        Indicates the latitude.
0003   GPSLongitudeRef    Indicates whether the longitude is east or west longitude.
0004   GPSLongitude       Indicates the longitude.


Figure 4
Figure 4 above shows the GPSLatitudeRef entry. 0x00 0x01 is the tag number. 0x00 0x02 indicates the value is stored as ASCII. 0x00 0x00 0x00 0x02 indicates how many instances of the ASCII field type we can expect, in this case two.  Two ASCII characters can be stored in two bytes, which means that the value itself will be stored in the leftmost bytes of the remaining four bytes of the entry.  0x53 decodes in ASCII to an uppercase S and is followed by the NUL terminator 0x00.  The GPSLongitudeRef entry decodes in a similar fashion.

Figure 5
Figure 5 above shows the GPSLatitude entry. 0x00 0x02 is the tag number. 0x00 0x05 indicates that the type of value stored is a RATIONAL. 0x00 0x00 0x00 0x03 indicates how many instances of the RATIONAL value we can expect, in this case three. RATIONALs are 8 bytes long and three of them are needed to store the required value, so the remaining four bytes of this entry 0x00 0x00 0x02 0xCC indicate where the value is stored offset from the start of the TIFF structure.  In this case the offset is 716.

Figure 6
Figure 6 above shows the three RATIONALs at offset 716.  Each RATIONAL consists of two LONGs highlighted in red and yellow.  The first LONG represents the numerator of a fraction, the second LONG the denominator. We can decode the RATIONAL values as follows:
0x00 0x00 0x00 0x06 / 0x00 0x00 0x00 0x01  = 6/1 = 6
0x00 0x00 0x05 0xFC / 0x00 0x00 0x00 0x64  = 1532/100  =  15.32
0x00 0x00 0x00 0x00 / 0x00 0x00 0x00 0x01  = 0/1  =  0

The specification for storing the latitude states: the latitude is expressed as three RATIONAL values giving the degrees, minutes, and seconds, respectively. If latitude is expressed as degrees, minutes and seconds, a typical format would be dd/1,mm/1,ss/1. When degrees and minutes are used and, for example, fractions of minutes are given up to two decimal places, the format would be dd/1,mmmm/100,0/1.  It can be seen here that the latitude is stored in degrees and minutes.

Figure 7
Figure 7 above shows the GPSLongitude entry. 0x00 0x04 is the tag number. 0x00 0x05 indicates that the type of value stored is a RATIONAL. 0x00 0x00 0x00 0x03 indicates how many instances of the RATIONAL value we can expect, in this case three. RATIONALs are 8 bytes long and three of them are needed to store the required value, so the remaining four bytes of this entry 0x00 0x00 0x02 0xE4 indicate where the value is stored offset from the start of the TIFF structure. In this case the offset is 740.

Figure 8
Figure 8 above shows the three RATIONALs at offset 740. Each RATIONAL consists of two LONGs highlighted in blue and yellow. It can be seen that these RATIONAL values follow those for the latitude.  The first LONG represents the numerator of a fraction, the second LONG the denominator. We can decode the RATIONAL values as follows:

0x00 0x00 0x00 0x6A / 0x00 0x00 0x00 0x01 = 106/1 = 106

0x00 0x00 0x12 0xFE / 0x00 0x00 0x00 0x64 = 4862/100 =  48.62

0x00 0x00 0x00 0x00 / 0x00 0x00 0x00 0x01 = 0/1 = 0

The specification for storing the longitude states: the longitude is expressed as three RATIONAL values giving the degrees, minutes, and seconds, respectively. If longitude is expressed as degrees, minutes and seconds, a typical format would be ddd/1,mm/1,ss/1. When degrees and minutes are used and, for example, fractions of minutes are given up to two decimal places, the format would be ddd/1,mmmm/100,0/1. It can be seen here that the longitude is stored in degrees and minutes.
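Putting the decoded RATIONALs together, here is a quick sketch of the conversion to the signed decimal degrees used by mapping services (south and west references are negated; the E longitude reference is taken from the map link below):

    def to_decimal(degrees, minutes, seconds, ref):
        value = degrees + minutes / 60.0 + seconds / 3600.0
        return -value if ref in ("S", "W") else value

    lat = to_decimal(6, 15.32, 0, "S")       # -6.255333...
    lon = to_decimal(106, 48.62, 0, "E")     # 106.810333...
    print(round(lat, 6), round(lon, 6))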
To confirm our findings I have used IrfanView to examine the Exif data; the GPS data can be seen in Figure 9 below.
Figure 9
For the curious the actual photograph is below.

Figure 10
The location can be viewed on a map at http://maps.google.com/maps?q=-6.255333,106.810333&z=15

Tuesday, 15 May 2012

Windows Live Messenger – MessengerCache folder

A recent case was unusual because most of the ipoc were located by the police examiner in a folder entitled MessengerCache at the path C:\Users\<user_name>\AppData\Local\Temp\MessengerCache.

My mission was to have a closer look at how this folder is utilised by the Windows Live Messenger program.  The folder is a hidden folder and is used for various purposes by WLM.  I found that the folder can be used to store the user tile (this may be an icon or a thumbnail photograph or graphic) and theme picture of a remote contact. Of course the remote user (who could be anywhere in the world) can change these at any time to a contraband image. The screenshot in Figure 1 below shows the Windows Live Messenger program running on the local user's computer. The two photographs arrowed and labelled as Remote User Tile and Remote User Theme Picture respectively have been received from the remote user Mars, with whom the local user is engaged in an instant messaging conversation.

Figure 1

It is also possible for a remote contact anywhere in the world, whilst engaged in an instant messaging conversation with the local user, to drag a picture file into the conversation window. This results in the picture concerned appearing in the local user's conversation window in full size and thumbnail form, and at the same time a copy of the picture and a thumbnail version are stored within the MessengerCache folder. In the case that the picture concerned was ipoc, the local user's only immediate option would be to close the conversation window. He would be unlikely to be aware that the photograph concerned was now stored upon his own computer in the MessengerCache folder. The screenshot in Figure 2 shows the local user's conversation window after the remote user Mars has dragged a photograph of tulips into his conversation window. This has caused the local user's conversation window to also display the tulip pictures. The tulip photograph would also be stored in full and thumbnail versions within the local user's MessengerCache folder.

Figure 2

Figure 3 below illustrates a forensic examination of the local user’s MessengerCache folder. It can be seen that it contains the Remote User Tile and Remote User Theme Picture together with three different versions (they differ in resolution) of the Tulip picture. At this point none of these five pictures were solicited or accepted by the local user.

Figure 3

In the case referred to, the prosecution, after discussions at court, offered no evidence in respect of all the counts on the indictment that relied on the pictures located within the MessengerCache folder.  The defendant pleaded guilty to one count of possession not related to the MessengerCache pictures.

Old Servers never die – unfortunately

But you can bet your last penny that at some stage you will have to image them.  That is the problem I faced one wet weekend recently when I was required to image an HP behemoth resplendent with two sizable RAID 5 arrays and two USB 1 ports.  All drive bays and ports were in use, so I could not insert a new drive into the box to image to, and I didn't fancy imaging all the elderly SCSI RAIDed hard drives separately.  I was permitted to shut down the server and had decided to boot the box to a forensic Linux distro that had suitable HP RAID controller drivers.

The problem I faced was USB1.  Obviously I needed to output my images somewhere and an external USB hard drive was an option.  But the maths didn’t add up – the maximum bandwidth of a USB1 port is 12 megabits per second (Mbps) which equates to 1.5 megabytes per second (MB/s) which equates to 5.4 Gigabytes per hour.  There were not going to be enough hours in this weekend to image both arrays on the server. 

What I did next I thought might be worth sharing with you.  I used dd to create a source image, netcat to pipe it to an onsite laptop across a network and ewfacquirestream to capture the dd image, hash it and write it into Encase evidence files. It can be carried out entirely at the command line.  Crucially I achieved an imaging speed of about 25 MB/s which is 1.46 gigabytes a minute or nearly 88 gigabytes an hour using gigabit network interface cards.  In testing I have achieved 39 gigabytes an hour using 10/100 NICS.

Method to image computers across a network

  1. I connected my onsite laptop and the server via Cat5E cables to a Netgear GS105 5 port gigabit switch.  I attached a 2TB external hard drive to my onsite laptop and booted both the server and my laptop to a DEFT 7 forensic linux distro.
  2. Configure Ethernet settings on both, using gigabit NICs (10/100/1000) if available
    • Launch terminal and at prompt type sudo su
    • At prompt type ifconfig to identify network cards
    • At prompt type ifconfig eth0 192.168.0.100 on onsite laptop and ifconfig eth0 192.168.0.101 on machine to be imaged (these commands assume that you are plugged into eth0 - if there is more than one NIC on the computer to be imaged it might be eth1 or higher)
    • Test connection by typing at prompt ping -c 5 192.168.0.100 or ping -c 5 192.168.0.101 as appropriate
  3. On on-site laptop
    • Connect collection hard disk drive
    • Launch terminal and at prompt type sudo su
    • At prompt type fdisk -l to identify storage drive
    • Create a folder to mount the storage drive to by typing mkdir /mnt/(name of your folder)
    • Next mount the storage drive to your folder by typing mount /dev/(sdb2 or whatever) /mnt/(name of your folder)
    • Now we create a netcat listener and a pipe to ewfacquirestream - at the prompt type, but don't press enter just yet, nc -l 1234 | ewfacquirestream -c none -b 4096 -C case_number -D description -w -E evidence_number -e 'Richard Drinkwater' -t /mnt/(name of your folder)/(name of your evidence files)
      [relevant switches: -c compression type: none, fast or best; -b amount of sectors to read at once: 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384 or 32768; words in italics change to suit and use single quote marks ('-- --') to group more than one word]
  4. On machine to be imaged
    • At prompt type sudo su
    • At prompt type fdisk -l to identify drive to be imaged
    • Next we prepare to dd the drive to be imaged and pipe it to netcat - at the prompt type dd if=/dev/sdb conv=noerror,sync bs=4096 | nc 192.168.0.100 1234 but don't press enter (if you are imaging a server with an HP RAID card the command might look something like dd if=/dev/cciss/c0d0 bs=4096 conv=noerror,sync | nc 192.168.0.100 1234)
  5. Start imaging process by
    • Press enter within terminal on onsite laptop first to start netcat listener
    • Then press enter within terminal on machine to be imaged to start dd
  6. When the acquisition completes, ewfacquirestream outputs an MD5 hash calculated over the acquired data to the terminal. Either photograph this value or copy and paste it to a text file on your collection hard disk drive.

 

Notes re imaging speed

In testing where the NICs are both gigabit, speeds of over 40 MB/s (144 GB/h) can be achieved. With 10/100 NICs up to 11 MB/s (39.6 GB/h) can be expected. Compression and block size do affect imaging speed and if you have time it may be worth fine-tuning these settings. The settings shown in this post are probably a good starting point. To fine-tune, run the imaging process with the settings in this post. After 5 minutes or so, if you are getting poor speeds, stop the process and try adjusting the compression type on the onsite laptop (i.e. change from none to fast). Sometimes either doubling or halving the block size on both source and receiver machines can make a difference also.

Wednesday, 8 February 2012

Adobe Bridge CS3 and some MySQL stuff

Like buses – you wait all day for one and then two come along at once!

A recent case involved a number of images found within a file entitled FileSystem_Nodes.MYD on an Apple Snow Leopard box. The indictment referred to each image by its file offset and the date of the offence was particularised with an arbitrary date relating to the date of seizure.  The forensic investigator had not presented any further evidence relating directly to the images.  The path to FileSystem_Nodes.MYD was

~\Library\Caches\Adobe\Bridge CS3\Cache\data\BridgeStore\FileSystem_Nodes.MYD
and within the same folder were two other files of note, FileSystem_Nodes.MYI and FileSystem_Nodes.frm.  As the path suggests these files are utilised by the Adobe Bridge CS3 program (later versions work differently).   Those readers who are familiar with MySQL databases will recognise that the .MYD, .MYI and .frm files are constituent parts of a MyISAM table within a MySQL database: .MYD - my data, .MYI - my index, and .frm - the definition of the table.
Adobe Bridge CS3 stores thumbnail and preview images as Binary Large Objects (BLOBs) within a MySQL database.  These images can be seen in Figure 1, where the highlighted picture of a giraffe in the filmstrip at the bottom of the screenshot is a thumbnail picture. The larger representation of the same picture in the centre of the screenshot is a preview picture.
Figure 1
I was tasked with ascertaining more information about each picture located within the FileSystem_Nodes.MYD file, so in order to do this:


  • I installed MySQL version 5.5.20 on a test Apple iMac running Snow Leopard (I would expect that a MS Windows version would work just as well).  

  • Once installed I made sure that the MySQL database server was not running (which if you have installed the Mac version of MySQL is easy to do because an option to toggle it on or off is added to the Mac's System Preferences - on a Windows box I would probably install MySQL Administrator and use that program to turn the db server on or off).  

  • Then I copied the BridgeStore folder into the default location for MySQL databases, which was at the path Macintosh HD/usr/local/mysql-5.5.20-osx10.6-x86/data on my Mac, and restarted the MySQL database server.

  • Using MySQL Administrator I could view some information relating to the BridgeStore database


Figure 2


  • We now need to query the database and also be able to view and save out the relevant BLOBs.  To do this I used a utility called RazorSQL.  On the menu bar of this program there is a Connections option.  Select this and Add Connection Profile.  Work through the wizard; the bulk of the configuration can be seen in Figure 3
Figure 3


  • When you are connected you can enter queries in the top right hand pane.

  • The query show columns from FileSystem_Nodes tells us that each record within the table has 52 fields as shown in Figure 4
Figure 4


  • The fields of particular interest are displayPath, created, thumbnail and preview

  • Other useful SQL queries may include Select * from FileSystem_Nodes or Select id, displayPath, created from FileSystem_Nodes

  • To view the thumbnail or preview BLOBs execute a SQL query such as one of the examples above and in the results pane scroll across to the preview or thumbnail column.  Once there select a cell containing an ASCII representation of the binary BLOB data, right click and select Binary Data Editor, and in the resulting window click on the View Image button.
In summary it was possible to establish the original path and date created for each picture carved from the FileSystem_Nodes.myd file using this method.
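For completeness, the same extraction can be scripted. The sketch below uses the mysql-connector-python driver; the connection details and output file names are hypothetical, while the database, table and field names are those discussed above:

    import mysql.connector      # assumes the mysql-connector-python package is installed

    conn = mysql.connector.connect(host="127.0.0.1", user="root",
                                   password="", database="BridgeStore")
    cur = conn.cursor()
    cur.execute("SELECT id, displayPath, created, thumbnail FROM FileSystem_Nodes")
    for node_id, display_path, created, thumbnail in cur:
        print(node_id, created, display_path)
        if thumbnail:
            # write the thumbnail BLOB out to disk; confirm the format by signature analysis
            with open("thumb_%s.bin" % node_id, "wb") as out:
                out.write(thumbnail)
    conn.close()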
 


References




http://en.wikipedia.org/wiki/MyISAM


http://dev.mysql.com/doc/refman/5.5/en/myisam-storage-engine.html

Missing in action

No, not me (although I have been missing for some time!). I'm talking about the registry key

HKEY_CURRENT_USER\Software\Microsoft\Windows\CurrentVersion\Internet Settings\Url History\DaysToKeep

For those of us looking at dead boxes this key can be found within each user's NTUSER.DAT registry file and is used to record how many days of history Internet Explorer keeps for the particular user.  As many of us know, the default is 20 days.  In a recent case the value of this key was pertinent and when I went to examine it on two separate boxes I found it was missing.


For the record one box was running IE8 on Windows 7, the other IE6 on XP.  I also checked out a test VM of mine running IE7 on Vista for good measure and the key was missing on that box also.


So after a little bit of testing I have established that this key is only created if the user visits the settings dialogue shown below in Figure 1.  If the user has not visited this dialogue since the browser's installation (or, for that matter, since the deletion of the registry key) the registry key is not created.  Once this dialogue is visited the registry key is created whether or not the days to keep setting is actually changed.




Figure 1


It has been suggested to me that in the absence of the registry key detailed above the registry key

HKEY_LOCAL_MACHINE\Software\Microsoft\Windows\CurrentVersion\Internet Settings\Url History\DaysToKeep


takes over (found in the SOFTWARE registry file on a dead box).  In my testing this key could be edited to another value instead of the default 20 days and the days to keep listed in the settings in Figure 1 would remain at 20 days.  This obviously begs the question: what is this key for?  As far as I can ascertain it is possible to set a group policy to enable a per machine (as opposed to a per user) setting, and in that case this key takes effect.
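On a live machine the presence or absence of both keys can be checked quickly; a sketch using Python's standard winreg module is shown below (on a dead box you would of course parse the NTUSER.DAT and SOFTWARE hives with your usual tools):

    import winreg

    SUBKEY = r"Software\Microsoft\Windows\CurrentVersion\Internet Settings\Url History"

    def days_to_keep(hive):
        try:
            with winreg.OpenKey(hive, SUBKEY) as key:
                value, _ = winreg.QueryValueEx(key, "DaysToKeep")
                return value
        except FileNotFoundError:
            return None                 # key or value not present

    print("Per user   :", days_to_keep(winreg.HKEY_CURRENT_USER))
    print("Per machine:", days_to_keep(winreg.HKEY_LOCAL_MACHINE))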

Until next time.

Saturday, 2 July 2011

SQLite overflow pages and other loose ends...

This is the fourth post dealing with the elements making up SQLite databases and complements the previous three: Carving SQLite databases from unallocated clusters, SQLite Pointer Maps pages and An analysis of the record structure within SQLite databases.
We will remember from these previous posts that:
  • The entire database file is divided into equally sized pages - SQLite database files always consist of an exact number of pages
  • The page size is always a power of two between 512 (2^9) and 65536 (2^16) bytes
  • All multibyte integer values are read big endian
  • The page size for a database file is determined by the 2 byte integer located at an offset of 16 bytes from the beginning of the database file
  • Pages are numbered beginning from 1, not 0
  • Therefore to navigate to a particular page when you have a page number you have to calculate the offset from the start of the database using the formula:
    offset = (page number-1) x page size
and that the database may have the following possible page types:
  • An index B-Tree internal node
  • An index B-Tree leaf node
  • A table B-Tree internal node
  • A table B-Tree leaf node
  • An overflow page
  • A freelist page
  • A pointer map page
  • The locking page
In this post I am going to take a closer look at Overflow pages.

Overflow pages are required when a record within a database requires more space than that available within a cell in one database page. One SQLite database of forensic interest is the Cache.db file maintained by the Apple Safari web browser. One of the tables within this database is entitled cfurl_cache_blob_data which uses the receiver_data field to store the cached item itself (e.g. cached jpgs, gifs, pngs, html and so on) as a BLOB. A BLOB is a Binary Large OBject. These cached objects often require overflow pages and we can demonstrate the mechanics of them by walking through a record within Cache.db.

If you run a file carver across a Cache.db file searching for pictures you are likely to carve out a number of corrupt pictures as shown in the example within Figure 1.

Figure 1

It can be seen that this picture starts at File offset 4583317 within Cache.db. By examining the two bytes at offset 16 within this SQLite database we have established that the database page size is 1024 bytes. The record that contains this picture has six fields as shown in Figure 2.

Figure 2

As discussed in my earlier post An analysis of the record structure within SQLite databases, the data making up this record is stored in a serialised way (the data representing field 1 is immediately followed by the data representing field 2 and so on, with no delimiters). It can be seen therefore that a cell storing a record within the cfurl_cache_blob_data table is almost bound to overflow the 1024 byte database page.
In our example our corrupt picture starts at FO 4583317. To calculate the database page it is stored in we divide the offset by the page size 4583317/1024=4475.8955078125 and round up to establish the page number. Our corrupt picture header is in page 4476.
The SQLite.org file format states that overflow pages are chained together using a singly linked list. The first 4 bytes of each overflow page is a big-endian unsigned integer value containing the page number of the next page in the list. The remaining usable database page space is available for record data.
We know that our picture is likely to be stored in a number of overflow pages and we can establish the next page by looking at the first 4 bytes of the page it is in. Using the formula offset = (page number-1) x page size I can calculate that the offset of these 4 bytes at the start of the page is 4475 x 1024 = 4582400. This offset can be seen in Figure 3.
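Here is a minimal sketch of this page arithmetic and of reading the 4 byte next-page pointer (the 1024 byte page size and file name are those of this example):

    import struct

    PAGE_SIZE = 1024

    def page_for_offset(file_offset):
        # pages are numbered from 1, not 0
        return file_offset // PAGE_SIZE + 1

    def next_overflow_page(db_path, page_number):
        with open(db_path, "rb") as f:
            f.seek((page_number - 1) * PAGE_SIZE)
            return struct.unpack(">I", f.read(4))[0]   # 0 means the last page in the chain

    print(page_for_offset(4583317))                    # 4476
    print(next_overflow_page("Cache.db", 4476))        # 4477 in this example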

Figure 3

The 4 bytes in hex are 00 00 11 7D which decoded big endian is 4477. The next page therefore in the linked list is page 4477. The first 4 bytes of this page found at offset 4583424 in hex are 00 00 11 7E which decoded big endian is 4478. The first 4 bytes of this page found at offset 4584448 in hex are 00 00 11 7F which decoded big endian is 4479. The first four bytes of this page found at offset 4585472 are in hex 00 00 00 00. This value signifies the last page in the linked list.
It can be seen in this example that our corrupt picture starts in page 4476 and overflows into pages 4477, 4478 and 4479. Obviously the overflow pages are contiguous in this case, so in theory at least, if I copy the data from the jpg header of the corrupt picture to the jpg footer and edit out the 'corruption' I should end up with a complete picture. The corruption was likely caused by the overflow page values at the start of each page so using a hex editor I can remove these and hey presto:

Essentially what we have done here is start in the middle of the record and work forwards to the end. Because overflow pages are chained together using a linked list this is relatively straightforward.
But what do we do if we want to locate the earlier pages in the record? This is a little more complicated because each overflow page does not contain the page number of the previous page. The Safari cache.db SQLite 3 database is an auto-vacuum database so we could utilise Pointer Map pages to locate the parent page of the page (4476) where our corrupt picture header is stored. You will recall from my previous post that Pointer Map pages store a 5 byte record relating to every page that follows the Pointer Map page. Pointer Map pages found in Safari Cache.db files will have a lot of entries that relate to overflow pages. The 5 byte records are structured with 1 byte indicating a Page Type and then 4 bytes, decoded big endian, referencing the parent page number as follows:
  • 0x01 0x00 0x00 0x00 0x00
    This record relates to a B-tree root page which obviously does not have a parent page, hence the page number being indicated as zero.
  • 0x02 0x00 0x00 0x00 0x00
    This record relates to a free page, which also does not have a parent page.
  • 0x03 0xVV 0xVV 0xVV 0xVV (where VV indicates a variable)
    This record relates to the first page in an overflow chain. The parent page number is the number of the B-Tree page containing the B-Tree cell to which the overflow chain belongs.
  • 0x04 0xVV 0xVV 0xVV 0xVV (where VV indicates a variable)
    This record relates to a page that is part of an overflow chain, but not the first page in that chain. The parent page number is the number of the previous page in the overflow chain linked-list.
  • 0x05 0xVV 0xVV 0xVV 0xVV (where VV indicates a variable)
    This record relates to a page that is part of a table or index B-Tree structure, and is not an overflow page or root page. The parent page number is the number of the page containing the parent tree node in the B-Tree structure.
We can expect to see a lot of Page Types 0x03 and 0x04. So how do we find the pointer map page? We know that the Pointer Map page may contain up to (database page size/5) records (rounded down if necessary) - in this case 1024/5=204.8 so there are 204 records in each Pointer Map page. The first Pointer Map page is Page 2. This is followed by 204 pages and then another Pointer Map page, page 207, followed by 204 pages and then another Pointer Map page, page 412 and so on. In other words there is a Pointer Map page every 205th page, starting page 2. In our example we know that our corrupt picture header is in database page 4476 and the applicable Pointer Map page is prior to it. To calculate the applicable Pointer Map page number we divide 4476 by 205 = 21.834146341463415, round down to 21 and multiply by 205 and then add 2 which equals 4307. The applicable Pointer Map page for page 4476 of the database is page 4307. Using the formula offset = (page number-1) x page size I can calculate that the offset to this page is 4409344. This page can be seen in Figure 4. Each Page Type flag where it references an Overflow page is bookmarked in green, other Page Types are in blue. The first 5 byte record relates to database page 4308, the second 5 byte record page 4309 and so on. The record for page 4476 is the 169th record on the Pointer Map page (4476-4307).
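The calculation generalises. Below is a small sketch, assuming a 1024 byte page with no reserved bytes so that each Pointer Map page holds 204 entries, which reproduces the worked example above:

    PAGE_SIZE = 1024
    ENTRIES = PAGE_SIZE // 5        # 204 five byte records per Pointer Map page
    CYCLE = ENTRIES + 1             # a Pointer Map page plus the pages it covers

    def pointer_map_page_for(page_number):
        # page 2 covers pages 3-206, page 207 covers pages 208-411, and so on
        return 2 + ((page_number - 3) // CYCLE) * CYCLE

    def entry_number_on_map(page_number):
        return page_number - pointer_map_page_for(page_number)

    print(pointer_map_page_for(4476))    # 4307
    print(entry_number_on_map(4476))     # the 169th five byte record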

Figure 4

It can be seen that the 169th record is in hex 04 00 00 11 7B. This record has a flag 0x04 which indicates that this record relates to a page that is part of an overflow chain, but not the first page in that chain. The parent page number is 00 00 11 7B decoded big endian to page 4475. The record for page 4475 is the 168th record on the page 04 00 00 11 7A. This record also indicates that the page is part of an overflow chain but not the first page. The parent page number is 00 00 11 7A decoded big endian to page 4474. The record for page 4474 is the 167th record on the page 03 00 00 11 6E. This record has a flag 0x03 which indicates that this record relates to the first page in an overflow chain. The parent page number is the number of the B-Tree page containing the B-Tree cell to which the overflow chain belongs. The parent page number is 00 00 11 6E decoded big endian to page 4462.
Using the formula offset = (page number-1) x page size I can calculate that the offset to the beginning of this page is 4568064. We can now decode the page header (detailed more fully in the post An analysis of the record structure within SQLite databases ) shown in Figure 5.

Figure 5

Figure 5 shows the page header of the B-tree leaf node page. The first byte 0D is a flag indicating the page is a table B-tree leaf node. The next two bytes 00 00 indicate that there are no free blocks within the page. The next two bytes 00 03 read big endian indicate that there are three cells stored on the page. The next two bytes at offset 5 within the page header 00 EB decoded big endian give a value of 235 which is the byte offset of the first byte of the cell content area relative to the start of the page. The last byte of the eight byte page header 00 is used to indicate the number of fragmented free bytes on the page, in this case there are none. The remaining highlighted three pairs of bytes 02 60, 01 F1 and 00 EB are the cell pointer array for this page. These three values are offsets to the start of each cell when decoded big endian are 608, 497 and 235 respectively. We will focus on the cell at offset 235. At offset 235 we find two Varints representing the Length of Payload and Row ID (see Figure 6).

Figure 6

The varints are B1 66 and 82 11. The calculation needed to decode them follows in Figure 7:

Figure 7

Following the Length of Payload and Row ID are varints representing the Length of the Payload Header and the serial type codes of the entry_ID, response_object, request_object, receiver_data, proto_props and user_info fields respectively as shown in Figure 8:

Figure 8

Figure 8 shows highlighted in blue and green the first three elements of the Cell make up - the Payload Length, the Row ID and the Payload Header. We have already decoded the Payload Length B1 66 and the Row ID 82 11. The next byte h0A denotes the length of the Payload Header which is in this case 10 bytes (including the Payload Header Length byte). It can be seen therefore that the remaining 9 bytes contain the varints 00, 9B 54, 96 3E, B1 4A, 00 and 00. To determine what each varint indicates we have to consult the Serial Type Code chart detailed in the post An analysis of the record structure within SQLite databases . Each Serial Type Code details the type and length of the data in the payload that follows the payload header.
  • 00 This serial type code indicates that the first field is NULL and the content length is 0 bytes. We know that the first field in our record relates to Row ID however the SQLite.org file format states If a database table column is declared as an INTEGER PRIMARY KEY, then it is an alias for the rowid field, which is stored as the table B-Tree key value. Instead of duplicating the integer value in the associated record, the record field associated with the INTEGER PRIMARY KEY column is always set to an SQL NULL.
  • 9B 54 This serial type code has a value of 3540 which is greater than 12 and an even number. The chart indicates therefore that this field is a BLOB (3540-12)/2 bytes in length [1764 bytes]
  • 96 3E This serial type code has a value of 2878 which is greater than 12 and an even number. The chart indicates therefore that this field is a BLOB (2878-12)/2 bytes in length [1433 bytes]
  • B1 4A This serial type code has a value of 6346 which is greater than 12 and an even number. The chart indicates therefore that this field is a BLOB (6346-12)/2 bytes in length [3167 bytes]
  • 00 This serial type code indicates that the field is NULL
  • 00 This serial type code indicates that the field is NULL
The serial type code for the response_object indicates that this field is a BLOB 1764 bytes in length. The entire database page would not be big enough to store this BLOB and the cell is even less capable. Figure 9 shows each cell highlighted alternately blue and green:

Figure 9

The first blue shaded area is the cell whose elements are discussed above. The last four bytes of this cell, highlighted in darker blue, are the hex value 00 00 11 7A which when decoded big endian gives the value 4474. This is the page number of the first Overflow page for this cell and is consistent with the information found in the Pointer Map page discussed above.

Next Post
Following on from my earlier SQLite blog posts James Crabtree has been kind enough to code a Varint decoder and Alex Caithness of CCL has supplied me with his fully featured SQLite record recovery tool EPILOG. I'll review this software next time. Thanks to James and CCL.

References
http://www.sqlite.org/fileformat.html
http://www.sqlite.org/fileformat2.html

Tuesday, 10 May 2011

An analysis of the record structure within SQLite databases

My two previous posts Carving SQLite databases from unallocated clusters and SQLite Pointer Maps pages looked at the structure of SQLite databases as a whole. Information contained in those posts may hopefully facilitate the carving of complete SQLite databases. This post is aimed at examining the potential of carving individual records within an SQLite database but should be read in conjunction with the Carving SQLite databases from unallocated clusters post.

Background
Carving individual SQLite database records in certain circumstances may be more fruitful than carving whole databases. There are in fact a number of applications that do exactly this for some types of SQLite database. For example Firefox 3 History Recovery (FF3HR) is an application written to recover Firefox records. A paper entitled Forensic analysis of the Firefox 3 Internet history and recovery of deleted SQLite records written by Murilo Tito Pereira also deals with the recovery of Firefox records.
SQLite databases can be considered as a mini file system in their own right. Within this file system are areas that are marked as free that may contain deleted data. Record based recovery may help identify records that have been deleted but are still contained within the parent SQLite database. More obviously record based recovery is indicated where only deleted and partially overwritten databases are available. However for record based recovery to be useful the data you wish to recover must be stored within one table within the SQLite database concerned. If it is necessary to query two or more tables to extract useful data record based recovery is probably not going to be appropriate.

Table Record

Figure 1


Figure 1 shows, as viewed with the SQLite Database Browser software, a record within the Google Chrome History file URL table at row ID 608. This record is stored within the Google Chrome History SQLite database within a B-tree table leaf node in an area known as a cell. It can be seen from the column headers that this record consists of an id, a url, a title, a visit_count, a typed_count, a last_visit_time, a hidden flag and lastly a favicon_id.  To aid viewing I will repeat the record data below:
  • ID
    608
  • URL
    http://www.sqlite.org/fileformat2.html
  • title
    File Format For SQLite Databases
  • visit_count
    1
  • typed_count
    0
  • last_visit_time
    12949409092779476
  • hidden
    0
  • favicon_id
    46

The urls table is stored within one table B-tree which will consist of a root page and possibly a number of internal and leaf node pages.  I have established that the data representing the record detailed above is stored in a cell that exists within a B-tree table leaf node database page.

Cells
Let's recap some of the information dealt with in the earlier post Carving SQLite databases from unallocated clusters.  SQLite databases are divided into equal sized pages, the size of which is detailed in two bytes, decoded as a 16 bit integer big endian, at offset 16 of the database file within the database header.  Most of an SQLite database consists of B-tree structures consisting of one or more B-tree pages.  Each B-tree page has either an 8 or 12 byte page header (depending on whether it is a leaf or internal node).
Figure 2

As can be seen in Figure 2 the cells tend to be at the end of each database page in an area referred to as the cell content area.  These cells are used to store the database records, one record per cell.   The first cell to be written in a database page is stored at the end of the page and additional cells work back towards the start of the page.

The number of cells and their location within a database page is stored within the B-Tree page header at the following offsets.
  • Offset 1, 2 bytes, 16 bit integer read big endian: byte offset of the first block of free space on this page (0 if no free blocks)
  • Offset 3, 2 bytes, 16 bit integer read big endian: number of entries (cells) on the page
  • Offset 5, 2 bytes, 16 bit integer read big endian: byte offset of the first byte of the cell content area relative to the start of the page. If this value is zero, then it should be interpreted as 65536
Figure 3  Page Header and Cell Pointer array

Figure 3 shows the page header of the B-tree leaf node page that contains the record detailed in Figure 1 above.  The first byte 0D is a flag indicating the page is a table B-tree leaf node. The next two bytes 00 00 indicate that there are no free blocks within the page. The next two bytes 00 03 read big endian indicate that there are three cells stored on the page.  The next two bytes at offset 5 within the page header 0B 8E decoded big endian give a value of 2958 which is the byte offset of the first byte of the cell content area relative to the start of the page.  The last byte of the eight byte page header 00 is used to indicate the number of fragmented free bytes on the page, in this case there are none.

The remaining highlighted three pairs of bytes 0C 44,  0B EC and 0B 8E are the cell pointer array for this page. The SQLite.org file format notes helpfully state that the cell pointer array of a b-tree page immediately follows the b-tree page header.  Let K be the number of cells on the b-tree. The cell pointer array consists of K 2-byte integer offsets to the cell contents. The cell pointers are arranged in key order with left-most cell (the cell with the smallest key) first and the right-most cell (the cell with the largest key) last.  The key value referred to is the row ID.  In this case we have three cells and therefore three offsets which when decoded big endian are 3140, 3052 and 2958.   These offsets allow us to find the start of each cell, it is worth pointing out that there may be free blocks or fragments between each cell so we can not use the offsets to determine the length of each cell.
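Here is a short sketch of reading a table B-tree leaf page header and its cell pointer array, following the offsets just described (it expects the raw bytes of a single database page other than page 1, which carries the 100 byte database header first):

    import struct

    def parse_leaf_page(page):
        flag = page[0]                                      # 0x0D = table B-tree leaf node
        first_freeblock = struct.unpack(">H", page[1:3])[0]
        cell_count = struct.unpack(">H", page[3:5])[0]
        content_start = struct.unpack(">H", page[5:7])[0] or 65536
        fragmented_bytes = page[7]
        pointers = [struct.unpack(">H", page[8 + i * 2:10 + i * 2])[0]
                    for i in range(cell_count)]
        return flag, first_freeblock, content_start, fragmented_bytes, pointers

    # For the page shown in Figure 3 this returns
    # (0x0D, 0, 2958, 0, [3140, 3052, 2958])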

The record detailed in Figure 1 is contained within the cell at offset 2958 within the page.  We will decode the contents of this cell but first we better look at the make up of a cell.

Figure 4 Cell make up
Figure 4 indicates four areas of interest.  The payload is the data forming the record as detailed in this example in Figure 1 and as suggested in the diagram it is stored in a serialized way with all the relevant data concatenated together.  The payload header details how we can identify each field within the concatenated data (see the Payload Header section below for details of how this works).  The Row ID number and the Payload length are stored using variable length integers known as varints.  To successfully decode the Cell and Payload headers we have to be able to decode a varint.


Varint
http://www.sqlite.org/fileformat2.html and http://www.sqlite.org/fileformat.html#varint_format provide some detail in respect of how varints are structured.  I will try here to simplify things and provide a few example decodings as we decode the cell relating to the record detailed in Figure 1.

  • Varints are variable length integers between 1 and 9 bytes in length depending on the value stored
  • They are a static Huffman encoding of 64-bit twos-complement integers that uses less space for small positive values
  • Where the most significant bit of byte 1 is set this indicates that byte 2 is required, where the most significant bit of byte 2 is set this indicates that byte 3 is required, and so on
  • Varints are big-endian: bits taken from the earlier bytes of the varint are more significant than bits taken from the later bytes
  • Seven bits are used from each of the first eight bytes present, and, if present, all eight from the final ninth byte
Figure 5
Figure 5 shows the beginning of the cell at offset 2958 within the page.  As shown in Figure 4 the first value is the payload length represented by a varint.  The first byte is 5B.  We have to establish the value of the most significant bit and this can be done by converting the hex 5B to binary:

0101 1011

It can be seen that in this case the most significant bit is zero and therefore not set.  This varint is only one byte long and indicates that the payload length is 91 bytes.  The payload length is the length in bytes of both the payload header and the payload.

The next byte is 84.  Converting this to binary:

1000 0100

The most significant bit here has the value of one and therefore is set.  This indicates that this varint includes, at least, the next byte 60 which converted to binary is:

0110 0000

It can be seen that the most significant bit is zero and therefore not set.  This byte therefore is not followed by another and is the last byte of this varint.  To establish the value of this varint we now have to take the least significant 7 bits of each of the two contributing bytes and concatenate them together:

00001001100000

We discard the leading zeros and convert the binary 1001100000 to decimal, giving a value of 608.  This  varint represents the row ID and we can see in figure 1 that the row ID is confirmed as 608.

The calculation we have carried out can be represented by a formula.   If we say that the value of the varint is N and the unsigned integer value of the first byte is x and the unsigned integer value of the second byte is y we can use the formula:

N =  (x-128) x 128 + y

If we substitute the value of our unsigned integers 132 and 96 into the formula:

(132-128) x 128 + 96 = 608

This formula works for two byte varints that can represent a maximum value of 16383.  I suspect we are not likely to encounter larger varints in the SQLite databases we have an interest in with the possible exception of SQLite databases used to store browser cache.  It is also worth noting that the most significant bit if included and allowed to contribute to the unsigned integer value would have a value of 128 (hence the [x-128]).  Therefore if the first byte of a varint is less than 128 you can exclude the possibility of there being a second byte in the varint.
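For the general case, below is a small decoder covering varints of one to nine bytes; it returns the value and the number of bytes consumed (the unsigned interpretation, which is all we need here):

    def decode_varint(buf, offset=0):
        value = 0
        for i in range(9):
            byte = buf[offset + i]
            if i == 8:
                # the ninth byte, if present, contributes all eight of its bits
                return (value << 8) | byte, 9
            value = (value << 7) | (byte & 0x7F)
            if byte & 0x80 == 0:            # most significant bit not set - last byte
                return value, i + 1

    print(decode_varint(bytes([0x5B])))         # (91, 1)  - the payload length
    print(decode_varint(bytes([0x84, 0x60])))   # (608, 2) - the row ID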

Payload Header and Payload
We have already looked at two of the four areas of interest within a cell, the payload length and row ID. Next up is the Payload Header and Payload.

Figure 6  Payload Header make up
Figure 6 shows the make up of the payload header of a record within the URLs table of the Google Chrome History SQLite database.  The payload is the data forming the record stored in a serialized way with all the relevant data concatenated together.  The payload header details how we can identify each field within the concatenated data and will vary from table to table, its contents dictated by the fields required in each record. All payload headers will have, however, a Payload Header Length followed by one or more Serial Type Codes.  The Serial Type Code is used to denote the type of data found in a field within the payload and its length. All possible Serial Type Codes are varints and are detailed in a chart provided by SQLite.org shown at Figure 7:

Figure 7
Let's have a look at the Payload Header of our example record detailed in Figure 1.

Figure 8

Figure 8 shows highlighted in blue and green the first three elements of the Cell make up shown in Figure 4 - the Payload Length, the Row ID and the Payload Header.  We have already decoded the Payload Length 5B and the Row ID 84 60.  The next byte h09 denotes the length of the Payload Header which is in this case 9 bytes (including the Payload Header Length byte). It can be seen therefore that the remaining 8 bytes shown in hex are 00, 59, 4D, 01, 01, 06, 01 and 01. These bytes represent varints so we have to consider that a value may be represented by more than one byte; however in this case the unsigned integer value of each byte is less than 128.  We can conclude therefore that each varint is only a single byte in length.  To determine what each varint indicates we have to consult the Serial Type Code chart shown at Figure 7.  Each Serial Type Code details the type and length of the data in the payload that follows the payload header.  The multi byte integers are decoded big endian.
  • 00  This serial type code indicates that the first field is NULL and the content length is 0 bytes.  We know that the first field in our record relates to Row ID (see figure 1) however the SQLite.org file format states If a database table column is declared as an INTEGER PRIMARY KEY, then it is an alias for the rowid field, which is stored as the table B-Tree key value. Instead of duplicating the integer value in the associated record, the record field associated with the INTEGER PRIMARY KEY column is always set to an SQL NULL.
  • 59  This serial type code has a value of 89 which is greater than 13 and an odd number.  The chart indicates therefore that this field is a text string (89-13)/2 bytes in length [38 bytes]
  • 4D  This serial type code has a value of 77 which is greater than 13 and an odd number.  The chart indicates therefore that this field is a text string (77-13)/2 bytes in length [32 bytes]
  • 01  This serial type code has a value of 1 indicating the next field is an 8 bit integer using 1 byte
  • 01  This serial type code has a value of 1 indicating the next field is an 8 bit integer using 1 byte
  • 06  This serial type code has a value of 6 indicating the next field is a 64 bit integer using 8 bytes
  • 01  This serial type code has a value of 1 indicating the next field is an 8 bit integer using 1 byte
  • 01  This serial type code has a value of 1 indicating the next field is an 8 bit integer using 1 byte
It can be seen that our payload is 82 bytes in length (38+32+1+1+8+1+1).  The payload header was 9 bytes and therefore the overall payload length is 91 (82+9) bytes, as previously calculated, and represented by the byte 5B.

Figure 9

Figure 9 shows each element of the payload highlighted alternately in green and blue.  The first element is http://www.sqlite.org/fileformat2.html 38 bytes in length, the next element is File Format For SQLite Databases 32 bytes in length.  The next element is represented by the byte 01 which denotes the visit_count of 1.  This is followed by the byte 00 denoting the typed_count of 0.  Next are the eight bytes 00 2E 01 6B 41 06 BD D4 decoded as a 64 bit integer big endian giving a value of 12949409092779476, the last_visit_time (stored in the Google format).  The next byte is 00, the hidden flag, followed lastly by 2E decoded as 46, the favicon_ID.  The next record in this case immediately follows at offset 3052 within the database page.
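As an aside, the Google (Chrome/WebKit) timestamp format is commonly interpreted as microseconds since 1 January 1601, so the last_visit_time can be converted with a couple of lines (times are UTC):

    from datetime import datetime, timedelta

    def chrome_time(value):
        return datetime(1601, 1, 1) + timedelta(microseconds=value)

    print(chrome_time(12949409092779476))   # 2011-05-09 10:04:52.779476 for this record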

Notes
I have glossed over some possible combinations of data found in stored records in order to try and simplify things a little.  It is possible for a record to require more space than the space available in a cell within one database page.  In this eventuality pages known as overflow pages come into play.  I will leave any commentary on this to another day :-)

Carving Considerations
It can be seen that each record of the Google Chrome History URL table may vary in content and length.  This precludes simple carving of records using known headers.  It may be possible to define a scheme to assist with carving however by focussing on the parameters of each element of the record.  It is clear that for the Google Chrome History URL table the scheme would be fairly complicated, allowing for very large URLs and Page titles which may well induce many false positives.  For databases using a simpler record structure things are a bit easier.  A presentation presented by Alex Caithness of CCL details an approach that can be adopted for carving iPhone calls.db databases.

Deleted Data within Live Databases
This area will require another blog post on another day!  I am aware of two programs possibly written to recover this deleted data: SQL Undeleter from Chirashi Security and Epilog from CCL.  If the developers will let me test these programs out I will report the results to you.


References
http://www.sqlite.org/fileformat.html
http://www.sqlite.org/fileformat2.html
http://www.ccl-forensics.com/images/f3%20presentation3.pdf
http://mobileforensics.wordpress.com/2011/04/30/sqlite-records/