Extracting Data with Bulk Extractor

When it comes to forensics, styles and methodologies vary from person to person (or organization to organization). Some methods take longer than others, and results may vary. One tool and technique that I lean on time and time again is Bulk Extractor. Bulk Extractor is a program that extracts key information, such as email addresses, URLs, and credit card numbers, from digital media. It is valuable no matter what type of case you are working. A list of the types of information it can extract is available on the project's wiki at https://github.com/simsong/bulk_extractor/wiki/Testing.

There are Windows and Linux variants of the program, both capable of running from the command line or a GUI. It is typically 4-8 times faster than other tools like EnCase or FTK due to its multi-threading. The program can handle image files, raw devices, or directories. When a run completes, it writes its findings to an output directory of feature files along with a report.xml file, which can be read back into the Bulk Extractor GUI for analysis. The output will look similar to below.
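For command-line use, a run can also be scripted. The following is a minimal Python sketch, assuming the bulk_extractor binary is on the PATH. The helper names build_command and run_bulk_extractor and the example paths are hypothetical. It relies only on the -o (output directory) and -e (enable a scanner) options, so check bulk_extractor -h on your version before depending on it.

```python
import subprocess
from pathlib import Path


def build_command(image, out_dir, extra_scanners=()):
    """Build the argument list for a bulk_extractor run.

    -o names the output directory; -e enables an additional scanner.
    """
    cmd = ["bulk_extractor", "-o", str(out_dir)]
    for scanner in extra_scanners:
        cmd += ["-e", scanner]
    cmd.append(str(image))  # the image file or raw device goes last
    return cmd


def run_bulk_extractor(image, out_dir, extra_scanners=()):
    """Run bulk_extractor and return the expected path of report.xml.

    Assumption: out_dir does not exist yet or is empty, since the tool
    may refuse to write into a directory that already holds results.
    """
    subprocess.run(build_command(image, out_dir, extra_scanners), check=True)
    return Path(out_dir) / "report.xml"


if __name__ == "__main__":
    # Hypothetical paths, shown only to illustrate the resulting command.
    print(" ".join(build_command("evidence.raw", "be_output", ["wordlist"])))
```

Keeping the command construction separate from the call to subprocess makes it easy to log or review the exact command before anything touches the evidence.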


The scanners that you selected to run against your image file will output to a report in the reports column. Not all scanners generate their own report, as some bucket the information they find with another report. The chart above can help you determine where a scanner will output. Also, when a selected scanner doesn’t return any suitable data, you will not see a report for it. When you select a report, its findings appear in the middle column. From there you can type in strings to search for or just scroll down to view the data. If you want to dig further into the data, click on one of the findings in the middle column and more output will appear in the image column on the far right. By default, the image column displays the text and the location of the data in the image file. There is an option, though, to change the image output from text to hex.
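The same kind of search can be done outside the GUI, because each report corresponds to a plain-text feature file (for example email.txt) in the output directory. The sketch below assumes the usual feature file layout: comment lines starting with #, then one finding per line as tab-separated offset, feature, and context. The function names and sample data are hypothetical.

```python
def parse_feature_lines(lines):
    """Yield (offset, feature, context) from feature file lines.

    The offset is kept as a string because it can be a path such as
    "2048-GZIP-10" rather than a plain number.
    """
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        offset, _, rest = line.partition("\t")
        feature, _, context = rest.partition("\t")
        yield offset, feature, context


def search_rows(rows, needle):
    """Return rows whose feature or context contains needle (case-insensitive)."""
    needle = needle.lower()
    return [r for r in rows if needle in r[1].lower() or needle in r[2].lower()]


def read_feature_file(path):
    """Load every finding from one feature file on disk."""
    # utf-8-sig tolerates an optional byte order mark at the start.
    with open(path, encoding="utf-8-sig", errors="replace") as fh:
        return list(parse_feature_lines(fh))


if __name__ == "__main__":
    # Made-up sample lines standing in for a real email.txt.
    sample = [
        "# Feature-Recorder: email\n",
        "1024\talice@example.com\tmail alice@example.com now\n",
        "2048-GZIP-10\tbob@example.org\tfrom bob@example.org\n",
    ]
    for row in search_rows(parse_feature_lines(sample), "bob"):
        print(row)
```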

On top of all of this, there are a few post-processing capabilities. One in particular is bulk_diff.py. It takes the results of two Bulk Extractor runs and shows the differences between them, which in effect tells you what changed between two disk images: it notes the features that Bulk Extractor found in one image but not in the other. It can be used, for example, to tell whether a computer user has been visiting websites they are not supposed to by comparing disk images of their computer taken a week apart.
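To make the idea concrete, here is a simplified Python illustration of that kind of comparison. It is not the bulk_diff.py implementation, just a sketch of the underlying set difference. It assumes feature files with tab-separated offset, feature, and context fields, and the function names and directory names are hypothetical.

```python
from pathlib import Path


def feature_set(lines):
    """Collect the distinct feature values, ignoring offsets and context."""
    features = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header comments and blank lines
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) >= 2:
            features.add(fields[1])
    return features


def diff_features(before_lines, after_lines):
    """Report features that appeared or disappeared between two runs."""
    before = feature_set(before_lines)
    after = feature_set(after_lines)
    return {"added": sorted(after - before), "removed": sorted(before - after)}


def diff_runs(before_dir, after_dir, name="url.txt"):
    """Compare the same feature file from two output directories."""
    def load(directory):
        path = Path(directory) / name
        with open(path, encoding="utf-8-sig", errors="replace") as fh:
            return fh.readlines()
    return diff_features(load(before_dir), load(after_dir))
```

For the weekly comparison described above, something like diff_runs("week1_out", "week2_out") would list the URLs that show up only in the second image under "added".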

With all said and done, if you work in forensics, Bulk Extractor should be in your kitbag, as it will be an enabler no matter how it is used or deployed.