Monday, January 16, 2012

Timeline Analysis: The Hybrid Approach

Harlan Carvey recently blogged about approaches to conducting timeline analysis:

"So, anyway...I've been thinking about some of the things that I put into pretty much all of my timeline analysis presentations.  When it comes to creating timelines, IMHO there are essentially two "camps", or approaches.  One is what I call the "kitchen sink" approach, which is basically, "Give me everything and let me do the analysis."  The other is what I call the "layered" or "overlay" approach, in which the analyst is familiar with the system being analyzed and adds successive "layers" to the timeline.  When I had a chance to chat with Chad Tilbury at PFIC 2011, he recommended a hybrid of the two approaches...get everything, and then view the data a layer at a time, using something he referred to as a "zoom" capability.  This is something I think is completely within reach...but I digress."

I very much agree with the various approaches outlined above and their respective descriptions. Well put, Harlan and Chad Tilbury.

Over the years I have observed the traditional "kitchen sink" approach evolve into a "layered - overlay" approach, which has become the fundamental building block of timeline analysis. Harlan, Rob Lee, and The Sleuth Kit have been primary drivers of this transformation with contributions such as "regtime.pl", "mac_daddy", and "fls". These tools let us take the "kitchen sink", an entire hard drive image, and break it up into different "layers", each representing a specific artifact type such as the registry or the file system.

What I appreciate about the "layered - overlay" approach is that it is an effective method of "removing the noise", which is my way of saying it lets you zero in on specific areas of interest. In contrast, the "kitchen sink" approach can produce overwhelming volumes of data that easily lead to distraction.

For example, if I'm only interested in reviewing USB connections, there are only specific "data points" I need to look at. In that case, I would apply only the relevant layers of data points to my timeline (i.e. the registry and setupapi.log) to identify the connections. Then, if needed, I could double-check my results by adding a third layer to the timeline: ".evtx" files (Windows 7 event logs record USB connections). These should overlay my existing USB connections and confirm my results.

Perhaps I then want to see whether any ".lnk" files were created on the hard drive image, showing files being accessed from the USB device around the date/time of a USB connection. A fourth layer, file system activity, could then be added to the timeline for review and quickly filtered for ".lnk" files. In summary, this process of building a timeline layer by layer is the essence of the "layered - overlay" approach.
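The layering walkthrough above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how regtime.pl, fls, or log2timeline work internally. Each layer is assumed to be already parsed into (timestamp, description) tuples, and both the sample events and the `build_timeline` helper are hypothetical.

```python
from datetime import datetime

# Hypothetical, already-parsed layers: {layer name: [(timestamp, description), ...]}
LAYERS = {
    "registry": [(datetime(2011, 11, 1, 9, 15), "USBSTOR device last written")],
    "setupapi": [(datetime(2011, 11, 1, 9, 15), "setupapi.log: USBSTOR device installed")],
    "evtx": [(datetime(2011, 11, 1, 9, 16), "System log: USB mass storage device")],
    "filesystem": [
        (datetime(2011, 11, 1, 9, 20), "Created: budget.xlsx.lnk"),
        (datetime(2011, 11, 1, 9, 21), "Modified: x.tmp"),
    ],
}


def build_timeline(layers, selected, filters=None):
    """Overlay the selected layers into one time-sorted list of (time, layer, description).

    filters optionally maps a layer name to a predicate on the description,
    e.g. to keep only .lnk entries from the file system layer.
    """
    filters = filters or {}
    events = []
    for layer in selected:
        keep = filters.get(layer, lambda desc: True)
        events.extend((when, layer, desc) for when, desc in layers[layer] if keep(desc))
    return sorted(events)


# Layers 1 and 2: USB connections from the registry and setupapi.log
usb_only = build_timeline(LAYERS, ["registry", "setupapi"])

# Layer 3 (.evtx) to confirm, then layer 4 (file system) filtered to .lnk files
full_view = build_timeline(
    LAYERS,
    ["registry", "setupapi", "evtx", "filesystem"],
    filters={"filesystem": lambda desc: desc.lower().endswith(".lnk")},
)
for when, layer, desc in full_view:
    print(when.isoformat(sep=" "), layer.ljust(10), desc)
```

Adding or dropping a layer only changes the `selected` list. The underlying data stays untouched, which keeps each layer easy to toggle on and off.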

Adobe Photoshop (a graphic design application) is a good example of this concept in use. For anyone not familiar with the product, multiple layers are used to represent and control each part of an image: background, shading/coloring, objects, etc. All of the individual layers merged together (overlaid) make up the "entire picture."

However, as Harlan alluded to, not using the "kitchen sink" approach reduces visibility into the context of specific artifacts, limiting your analysis to specific layers instead of the "entire picture":

"the more data we have, the more context there is likely to be available.  After all, a file modification can be pretty meaningless, in and of itself...but if you are able to see other events going on "nearby", you'll begin to see what events led up to and occurred immediately following the file modification."

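To make "nearby" concrete, here is a minimal sketch of pulling the events within a few minutes of a pivot event, such as a file modification, out of a larger timeline. The `events_near` helper, the five-minute default window, and the sample events are all hypothetical.

```python
from datetime import datetime, timedelta


def events_near(timeline, pivot, minutes=5):
    """Return events within +/- `minutes` of `pivot`, in time order.

    `timeline` is any iterable of (datetime, description) tuples.
    """
    window = timedelta(minutes=minutes)
    return sorted(e for e in timeline if abs(e[0] - pivot) <= window)


# Hypothetical "kitchen sink" timeline
timeline = [
    (datetime(2011, 12, 1, 14, 0), "Prefetch: WINRAR.EXE run"),
    (datetime(2011, 12, 1, 14, 2), "File modified: plans.docx"),
    (datetime(2011, 12, 1, 14, 3), "File created: plans.rar"),
    (datetime(2011, 12, 1, 18, 30), "Event log: user logoff"),
]

pivot = datetime(2011, 12, 1, 14, 2)  # the file modification of interest
for when, desc in events_near(timeline, pivot):
    print(when, desc)
```

On its own, the modification of plans.docx says little. The program run just before it and the archive created just after it are what give it context.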
So how does DAV NADS combine the best of the two approaches into one, the Hybrid Approach?


To my knowledge there's no "out of the box" or "push button" solution for this. It's a process of using multiple tools and applications: a manual but comprehensive process. And, as all processes should be, it is constantly being refined to adapt to new technology and needs.


It all starts with owning a lot of real estate: 2 x 24" monitors :-) Tall, wide monitors are key for any type of timeline analysis. They let you see more data (and context) at a glance and increase efficiency by reducing clicking n' scrolling.

I use one monitor to display the timeline output from log2timeline-sift in SPLUNK, a process described in detail by Klein&Co. Why do I use SPLUNK to display my log2timeline-sift output?
  • Running log2timeline-sift on a 120 GB hard drive image can easily produce 2-3 GB of output, never mind running log2timeline on a 500 GB image. Microsoft Excel ain't going to work for reviewing all of your data. It has limitations, period (a worksheet tops out at 1,048,576 rows).
  • Sure, you can use "l2t_process" to cull the output from log2timeline by criteria such as date range, but this still does not guarantee the result will be a manageable volume. It also strips away context and makes building the timeline an iterative process if you need to adjust later on.
  • Most people know enough Python, SQL, GREP, or PERL to be dangerous but not productive. Therefore, a GUI-based platform similar to Excel tends to be preferred for reviewing timeline data.
  • SPLUNK indexes timeline data, providing the ability to search, filter, and sort data on the fly. It's also scalable, in the sense that it's an enterprise tool designed to work with GBs of data. With the click of a button I can refine my timeline to show only certain data types. Note: DAV NADS does not work for SPLUNK; it's just the best solution I have found.
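
Without SPLUNK, you can approximate a similar "zoom" on a large log2timeline CSV by streaming the file instead of loading it into Excel. This sketch assumes the l2t_csv output format with `date` (MM/DD/YYYY), `time` (HH:MM:SS) and `source` columns, so check the header of your own output before relying on it. The `zoom` helper is hypothetical, and the sample uses a trimmed, made-up subset of columns.

```python
import csv
import io
from datetime import datetime


def zoom(csv_file, start, end, sources=None):
    """Yield rows of an l2t_csv-style timeline whose timestamp falls in [start, end].

    Rows are streamed one at a time, so a multi-GB file is never held in memory.
    `sources`, if given, is a set of values from the 'source' column (layers) to keep.
    """
    for row in csv.DictReader(csv_file):
        try:
            when = datetime.strptime(row["date"] + " " + row["time"], "%m/%d/%Y %H:%M:%S")
        except (TypeError, ValueError):
            continue  # skip short rows or malformed timestamps
        if start <= when <= end and (sources is None or row["source"] in sources):
            yield row


# Hypothetical sample; for a real file use something like:
#   with open("timeline.csv", newline="", encoding="utf-8", errors="replace") as f:
sample = io.StringIO(
    "date,time,timezone,MACB,source,sourcetype,type,short\n"
    "01/10/2012,09:15:00,UTC,M...,REG,NTUSER key,Last Written,UserAssist\n"
    "01/10/2012,09:16:00,UTC,..C.,FILE,NTFS $MFT,crtime,report.lnk\n"
    "01/11/2012,22:00:00,UTC,M...,EVT,Event Log,Entry written,Logoff\n"
)
hits = list(zoom(sample, datetime(2012, 1, 10), datetime(2012, 1, 10, 23, 59, 59),
                 sources={"REG", "FILE"}))
for row in hits:
    print(row["date"], row["time"], row["source"], row["short"])
```

Because the full CSV stays on disk, you can widen the window or add a source later without rebuilding the timeline.
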
Harlan raises an excellent point, "That leads me to this question...if you're running a tool that someone else designed and put together, and you're just pushing a button or launching a command, how do you know that the tool got everything?  How do you know that what you're looking at in the output of the tool is, in fact, everything?"

If I were to rely solely on the output of log2timeline, with SPLUNK as my review tool, that would be an issue for 2 reasons:

First, let's be honest: regardless of the tool used (commercial or open source), they all have, or at one point had, BUGS. As recently as a week ago, a bug in log2timeline was identified on the win4n6 list and subsequently fixed.

Second, timelines are what I like to refer to as skeletons: they do not show you the meat on the bones. Reviewing timeline data may reveal that "Top Secret - Receipt for Coke.docx" was created and opened. However, the limitation of timeline data is that you can't view the document. That's when the second monitor comes into the picture...

I use the second monitor to display the hard drive image in a forensic tool (EnCase, FTK, etc.). This lets me take a look at "Top Secret - Receipt for Coke.docx" and see that it's just a document discussing how Coke's secret formula is now on exhibit at the World of Coca-Cola in Atlanta! It also lets me see anything related to this event that is not displayed as a layer in my timeline.

Leveraging a second tool simultaneously to view the data from a different perspective also lets me double-check and verify findings. For instance, if my timeline data shows that how_to_kill_the_dog.doc was created on January 1, 2013, I can quickly check whether my forensic tool shows the same thing. If it doesn't, this is an odd anomaly and potentially an issue with my timeline.
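A creation date of January 1, 2013 on an image examined in early 2012 is exactly the kind of entry worth cross-checking. A minimal way to pre-screen timeline output for such entries is to flag any timestamp outside a plausible window, for example anything after the image acquisition date. The `flag_anomalies` helper and both bounds below are hypothetical. A flagged entry is a prompt to verify in the forensic tool, not a conclusion.

```python
from datetime import datetime


def flag_anomalies(events, earliest, acquired):
    """Return (timestamp, name) events falling outside [earliest, acquired].

    Anything dated after acquisition, or before the system could plausibly
    have existed (e.g. before the OS install), deserves a second look.
    """
    return [(when, name) for when, name in events if not earliest <= when <= acquired]


events = [
    (datetime(2011, 11, 30, 8, 0), "Created: budget.xlsx"),
    (datetime(2013, 1, 1, 0, 0), "Created: how_to_kill_the_dog.doc"),
]
# Hypothetical bounds: OS install date and image acquisition date
suspect = flag_anomalies(events, earliest=datetime(2010, 6, 1), acquired=datetime(2012, 1, 16))
for when, name in suspect:
    print("Verify in forensic tool:", when, name)
```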

From my experience, the Hybrid timeline analysis approach is really about finding synergy between the "kitchen sink" and "layered - overlay" approaches. To deploy it successfully, you need to understand the strengths and weaknesses of the tools you use. For instance, timeline data (output from log2timeline or wherever) may contain only X while the full disk image contains Z, and you need to build a process that fills those gaps. This allows you to develop a Hybrid approach, like the one I described above, that fits your needs.


- DAV NADS,  tweetin' @DAVNADS tweet at me cyber girls!



Wednesday, January 4, 2012

Thank you to all of my #DFIR followers. Hope everyone had a great New Years. Let 2012 bring many dongles, matching hashes, and cold blowing CPU fans to everyone!

-DAV NADS






Sunday, December 18, 2011

Monday, November 28, 2011

Extending Reg Ripper, again.

A few months ago I posted how to automate the process of reporting all date/time instances a USB connection was made (including from Restore Points), using a combination of Mount Image Pro, SubInACL.exe, Reg Ripper, and some batch script Kung Foo. For one engagement, the scope was 50+ hard drives. Using this process reduced analysis time from hours to minutes per hard drive, which translated into significant time and cost savings for the client.

Recently, I received 50+ SYSTEM registry hives from various host systems. Note: due to special circumstances only the SYSTEM hives were provided (fyi, there are other artifacts that log USB connections). All hives were preserved in Logical Evidence File (L01) format. Using EnCase I took a look at the L01 files. Based on the full path information of the SYSTEM registry hives collected, they appeared to come from both active and Restore Point locations.

For this engagement I needed to report all date/time instances a USB connection was made based on the SYSTEM registry hives provided...

Since I was dealing with hives from various hosts within the L01s, the only thing segregating them was the directory structure (full path information) they were preserved in. It was key to preserve this same full path information for each hive in whatever output/report I created, so that each hive could later be tied back to a specific host.

Therefore, it was time to put my thinking cap on. Below is the list of options I came up with:
  1. Manually parse out the Hives.
  2. Run the EnCase Advanced EnScript USB parser, but that outputs into a messy log file that is not delimited. Experience also tells me it can be hit or miss.
  3. Export the Hives and run Reg Ripper on each of them one by one, manually building a report as I go.
  4. Build a Reg Ripper batch script, but this would not preserve the file name and full path source of the hive in the output.
  5. Script that sh!@t!!
I like being challenged so scripting that sh!@t using Python sounded trivial. Note, as I stated in my post about using Python to automate the process of creating folder structures, my coding skillz are script kiddie at best so please no LuLzing.

The requirements of the tool needed to be:
  • Recursively walk through a directory structure (using EnCase I exported all L01s, preserving folder paths, to a case folder).
  • Identify any "SYSTEM" or "_REGISTRY_MACHINE_SYSTEM" registry hives.
  • For each Hive it finds:
    • Append File name to processing audit log
    • Run Reg Ripper against it with a specific plugin (USBSTOR3, to show me all USB connections)
    • Import Reg Ripper output into a Python memory-based list/db
    • For each line imported, append the full path of the original hive parsed (for audit purposes; this allows me to tie a hive back to its original source later).
  • Export CSV report for all hive files found.
Below is the pretty Python code I put together. For fun I’m going to try to add some error handling, convert it to OO, and port it into an executable. For now, all I can say is it works and saved me a ton of manual effort/time.

import csv
import fnmatch
import os
import subprocess

# Adjust these paths for your environment
RIP_EXE = r'C:\Program Files (x86)\RegRipper032911\rip.exe'
SEARCH_ROOT = r'C:\directory_structure_to_search'
LOG_PATH = r'C:\log.txt'
OUTPUT_PATH = r'C:\output.csv'


def find_files(directory, pattern):
    """Recursively walk a directory tree, yielding paths whose file name matches pattern."""
    print('Recursively searching directory for SYSTEM hives...')
    for root, dirs, files in os.walk(directory):
        for basename in files:
            if fnmatch.fnmatch(basename, pattern):
                yield os.path.join(root, basename)


rows = []
with open(LOG_PATH, 'a') as log:
    # '*SYSTEM' matches both "SYSTEM" and "_REGISTRY_MACHINE_SYSTEM" hives
    for filename in find_files(SEARCH_ROOT, '*SYSTEM'):
        print('Found Hive:', filename)
        print('Ripping...')
        # Run the usbstor3 plugin against the hive and capture its output directly
        result = subprocess.run([RIP_EXE, '-r', filename, '-p', 'usbstor3'],
                                capture_output=True, text=True)
        print('Done Ripping.')
        print('Processing Output...')
        for row in csv.reader(result.stdout.splitlines()):
            row.append(filename)  # full path ties each row back to its source hive
            rows.append(row)
        log.write(filename + '\n')  # audit log of every hive processed

print('Writing output CSV...')
with open(OUTPUT_PATH, 'w', newline='') as output:
    csv.writer(output).writerows(rows)
print('Done')

Truly,  
Dav Nads



Sunday, November 13, 2011

Intellectual Property (IP) Theft and Technology 1o1o1o1

I'm working on a paper on High Tech Intellectual Property Theft so I thought I would share some food for thought!

According to Wikipedia (whatev that's worth), Intellectual Property (IP) is a term referring to a number of distinct types of creations of the mind for which a set of exclusive rights are recognized, along with the corresponding fields of law. Theft is the illegal taking of another person's property without that person's freely-given consent.

Do the math: IP + Theft is an equation for stealing s$% you shouldn't!! Add technology as a variable to this equation, and stealing $#% can get super geeky. For instance, an employee can copy the text of a document containing the recipe for Coke onto a website called pastebin.com. This is a website where you can freely copy and paste text, making it accessible to the world with just a few clicks. It is a convenient and "virtually untraceable" platform for people to share large amounts of text. The website has traditionally been used by programmers to store source code. More recently, HaX0r groups like Anonymous, 4chan, and LulzSec have also used it to post their pirated caches and booty.

Methods of IP theft are becoming more advanced and increasingly difficult to detect. Traditional methods of detection (i.e. USB connection analysis, print spool files, e-mail, etc.) are not going to CUT it in some cases. I used one example of an insider COPYING and PASTING IP out of a network, but there are many other advanced methods. These range from transferring data from a laptop to a mobile device in someone's pocket via ad-hoc networking, to installing mobile malware/spyware on a VIP's device.

However, traditional methods of IP theft may be less advanced but just as difficult (if not more difficult) to detect. For instance, someone might take pictures of IP with a camera phone, or call a partner and communicate IP over the phone. In these cases it's more important to be aware of these methods and put governance and policies in place to prevent them, so you're NOT responding to the "perfect crime".

Let's also not forget how the simplest digital crime can become ah so difficult. For instance, a terminated user transferred documents from a computer to a USB storage device a week before they resigned. During that week, Windows Update also ran, and all USB last-connection date/time information in the active registry was unfortunately updated. Now you, as a forensic examiner, are challenged to think outside of the box and look elsewhere ;-)

Below is a collaborative (thank you, unnamed co-worker) brain dump of potential methods of IP theft. Note: some of these methods may leave little to NO forensic residue. The emphasis of the paper I'm writing is identification and detection from a computer forensics perspective. The purpose of this list is to promote awareness and hopefully assist with your due diligence or your next IP theft investigation.
  1. Personal e-mail account usage (i.e. user logs into personal e-mail account via web mail and attaches documents or copies text to e-mail message).
  2. Instant Messaging software such as AIM, MSN, Yahoo, Gtalk, or ICQ (i.e. transfer text or attachment over instant messaging conversation)
  3. Internet activity to online storage tools, file sharing services, social media platforms, and public/private forums (i.e. upload documents to online storage service or copy text to website such as pastebin.com).
  4. Access to network resources such as file servers (i.e. copy documents from a file server directly to a USB device without subsequently opening them).
  5. Network connectivity to private networks via Bluetooth, wifi, or remote access to transfer data (i.e. computer transfers documents to another computer via Bluetooth network).
  6. Removable storage device (i.e. user copies data to a thumb drive or external hard drive). Keep in mind removable storage devices do not always get tracked comprehensively (i.e. an O/S update occurs that updates all USB last connection date/time information in the registry).
  7. Screen capture applications run from removable devices to minimize forensic residue (i.e. run screen recording tool from USB drive).
  8. Use of non-standard applications/protocols such as VPN, FTP, SFTP, P2P, SSH (i.e. use an FTP application to transfer data to a remote server).
  9. Copy data to a device that can be configured as a USB storage device, such as a mobile phone or music player (i.e. copy data via USB to an iPhone or iPod).
  10. Bypassing the operating system by booting the system from a bootable disk to copy data to an external drive (i.e. anti-forensic or forensic software such as Helix or Knoppix).
  11. Traditional forensic and IT methods of cloning hard drives (i.e. extract hard drive from system and use forensic software/hardware to copy/clone data).
  12. Host and Mobile device based Spyware/Malware
  13. Other "low tech" methods of exfiltrating data include:
    1. Taking hard copy documents or electronic devices,
    2. Photography or video,
    3. printing,
    4. scanning,
    5. use of unknown devices,
    6. making a phone call and communicating the IP. 

Stay tuned.. I will be posting some more forensication soon.

-Dav Nads

Monday, November 7, 2011

Reminiscing about my CEIC 2010 video competition entry

In 2010, Guidance Software hosted a video competition for 2 free passes to their CEIC conference. We did not win because, apparently, our entry was not appropriate. I still went anyway, but here I am reminiscing about our great video!

Wednesday, August 24, 2011

Debian GNU/Linux Postfix Server Incident - p'owned?

Reason to believe a server was compromised and it's a physical Debian GNU/Linux mail server in a production environment?  ..Sounds like fun!

Below is a short list of items to consider when responding to an incident. This is from a technical perspective and by no means a work plan for a comprehensive investigation.

If you haven't already, try to get a physical or logical image of the device. If the server can't be turned off to acquire physically, consider acquiring the logical partitions live:

1.    Attach USB
2.    mkdir /m1
3.    mount /dev/sdb1 /m1 # Substitute your USB device's partition for /dev/sdb1; fdisk -l helps identify it
4.    dd if=/dev/sda1 of=/m1/my_image.img # this basic cmd will dd the partition to the USB disk. If the server uses LVM or software RAID, image the logical volume instead, as reconstructing the RAID/LVM later could be an issue.
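
A basic dd copy does not verify itself. As a minimal sketch, assuming Python is available on the server, the same block-by-block copy can compute a SHA-256 hash while it reads, so the image can be re-hashed and compared later. The `image_with_hash` and `hash_file` helpers are hypothetical, and they make no attempt to skip unreadable blocks. Because the partition is live, the hash describes the bytes that were copied; it will not match a later re-read of /dev/sda1.

```python
import hashlib


def image_with_hash(source, destination, block_size=1024 * 1024):
    """Copy `source` to `destination` block by block (like a basic dd)
    and return the SHA-256 of the bytes that were read."""
    digest = hashlib.sha256()
    with open(source, "rb") as src, open(destination, "wb") as dst:
        while True:
            block = src.read(block_size)
            if not block:
                break
            dst.write(block)
            digest.update(block)
    return digest.hexdigest()


def hash_file(path, block_size=1024 * 1024):
    """SHA-256 of a file, read in blocks so large images don't need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


# Example (run as root, paths as in the steps above):
# acquired = image_with_hash("/dev/sda1", "/m1/my_image.img")
# assert acquired == hash_file("/m1/my_image.img")
```

Record the returned hash in your notes at acquisition time so the image can be verified whenever it is copied or analyzed.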

Identify all logs that could contain evidence related to the intrusion. Logs are going to be one of the key points of analysis in Linux-based investigations. To that point, don't forget to inquire about log retention policies and procedures during your scoping. For instance, are logs from the target server collected by a SIM, backed up to tape, or is logging not even enabled? A good analogy: make sure to account for (or "eat") all the crumbs that may be surrounding the cookie.

Here is a short list:

1.    /var/log/secure
2.    /var/log/secure.*
3.    /var/log/messages
4.    /var/log/messages.*
5.    /var/log/wtmp
6.    /var/log/wtmp.*
7.    /var/log/btmp
8.    /var/log/btmp.*
9.    /var/log/mail.log
10.    /var/log/mail.log.*
11.    /var/log/apache
12.    /var/log/auth.log
13.    /var/spool/
14.    Check syslog configuration (/etc/syslog.conf typically) and see if additional log files are stored
15.    If the machine is behind a firewall, check the firewall (machine/appliance) logs.
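
For item 14, here is a minimal sketch that lists where a classic syslog.conf sends messages. It picks out local files (optionally prefixed with `-`) and `@host` forwarding, which may point to logs kept on another machine. Newer Debian releases use rsyslog (/etc/rsyslog.conf), whose traditional rules follow the same selector/action layout; its `$` directives and newer syntax are skipped here. The `syslog_destinations` helper and the sample configuration are hypothetical.

```python
def syslog_destinations(conf_text):
    """Return (log_files, remote_hosts) named in classic syslog.conf rules.

    Each rule is 'selector<whitespace>action'. File actions start with '/'
    (a leading '-' disables syncing); '@host' forwards to another machine.
    Comments, rsyslog '$' directives and other action types are skipped.
    """
    files, remotes = [], []
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "$")):
            continue
        parts = line.split()
        if len(parts) < 2:
            continue
        action = parts[-1].split(";")[0]  # drop an rsyslog ';template' suffix
        if action.lstrip("-").startswith("/"):
            files.append(action.lstrip("-"))
        elif action.startswith("@"):
            remotes.append(action.lstrip("@"))
    return files, remotes


sample = """
# Hypothetical Debian-style rules
auth,authpriv.*             /var/log/auth.log
*.*;auth,authpriv.none      -/var/log/syslog
mail.*                      -/var/log/mail.log
*.emerg                     *
*.*                         @loghost.example.com
"""
files, remotes = syslog_destinations(sample)
print("Collect:", files)
print("Also check remote log host(s):", remotes)
```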