Full Code of korczis/foremost for AI

master 9b2ccf2a6d92 cached
17 files
196.0 KB
59.3k tokens
107 symbols
1 requests
Download .txt
Showing preview only (203K chars total). Download the full file or copy to clipboard to get everything.
Repository: korczis/foremost
Branch: master
Commit: 9b2ccf2a6d92
Files: 17
Total size: 196.0 KB

Directory structure:
gitextract_vb7wi003/

├── CHANGES
├── Makefile
├── README
├── api.c
├── cli.c
├── config.c
├── dir.c
├── engine.c
├── extract.c
├── extract.h
├── foremost.8
├── foremost.conf
├── helpers.c
├── main.c
├── main.h
├── ole.h
└── state.c

================================================
FILE CONTENTS
================================================

================================================
FILE: CHANGES
================================================
Version  1.5.7
	-Added support for MP4 files
Version  1.5.6
	-Added support for Office 2007 file as well as bug fixes
Version  1.5.5
	-Added patch submitted by John K. Antonishek as well as cleaning 
	up compiler warnings and man file installation.
Version  1.5.4
	-Added patch submitted by Milan Broz & Eamonn Saunders that 
	 fixes jpeg extraction bug.
warnings
	and an 64 bit bug. 
Version  1.5.3
	-Added patches submitted by Toshio Kuratomi that fix compiler warnings
	and an 64 bit bug. 
Version  1.5.2
	-Fixed problem with gap code thanks to Jeffry Turnmire
Version  1.5.1 
	-Fixed jpeg extraction bug thanks to Jeffry Turnmire
	-Fixed bug in OLE extraction thanks to Filip Van Raemdonck
Version  1.5
	-Fixed Endian errors on OSX
	-Fixed several bugs reported by John K. Antonishek
Version  1.4
	-Fixed realpath problems when compiling with cygwin
	-Fixed flaw in Zip extraction
	-Made indirect block detection a little more stable
Version  1.3
	- Fixed flaw in ZIP algorithm that didn't take into acct zeroized local file headers
	  that contain valid compressed/uncompressed info in the data descriptors
Version  1.2
	- Fixed conf file typos
Version  1.1
	- Improved Speed of extraction functions
	- Added NEXT option to config file
	- Fixed some integer overflow problems
	- Updated config file
	- Added ASCII option for the config file
Version  1.0 
	- Changed display functionq
	- Enhanced RaR and PE extraction
	- Minor bug fixes thanks to Eamon Walsh for the bug report and patch
	- Added support for Windows PE executables
	- Added support for multiple files
	- Thanks to Toshio Kuratomi for fixing some compiler warnings under gcc 4
	- Fixed bugs with respect to unique file names, and quick mode
Version  0.9.4
	- Improved speed and reliability of zip and mpeg extraction algorithms.
Version  0.9.3
	-  Added subdirectories for each output type as opposed to 1 directory
		containing 90,000 files.	
Version  0.9.2
	- Greatly improved OLE extraction capabilites.
Version  0.9.1
	- Re-wrote code to run on LINUX,OSX,BSD,and SOLARIS
	- Added builtin extraction functions
	- Changed default behavior to look for the conf file in /usr/local/etc as 
	  well as the the current dir.  Also the conf file is not required
	  for the program to run if the -t option is enabled.
	- Added a -i switch to specify an input file as opposed to using stdin
	- Added -k to allow the user to change the default chunk size as well 
	  as -b to change the default block size
	- Changed the output dir to a time stamp of when the program was run.
	- Added -d for indirect block detection
Version 0.69 (Our thanks to Zach Kanner for these improvements...)
	- Corrected a bug that prevented the "reverse footer search" option
	  from working correctly.
	- Added a new "NEXT" option, specify NEXT after the footer on any
          search specification line and foremost will search for 
	  the last occurence (forward only currently) of that footer in the 
          window of size length but not including that footer in the resulting 
          output file created.  This feature lets you search for files that 
          don't have good ending footers but are separated by multiple starting           footers or other identifiable data which you know should not be 
          included in the output.  This works really well for MS Word documents           where you don't know where the end is.  The start of another document
          becomes the end.  With this feature as you can specific the "NEXT" 
	  or something after the end of the data we are looking for.
	- Updated the default foremost.conf file to use the feature for .doc
	  files.  Also added tags for ScanSoft PaperPort files (.max), and
          a Windows program called PINs (.pins), which stores encrypted 
	  passwords.
Version 0.68
Version 0.67
	- Added "reverse footer search" option, specify REVERSE after the 
	  footer on any search specification line and foremost will search for
	  the last occurence of that footer in the window of size length.

Version 0.66
	- Changed normal search to Boyer-Moore algorithm. Much faster!
 	- Added progress meter
	- Added ability to suppress extensions from a single file type or 
	   from all file types.
	- Added "chop" field to show when files have been trimmed
           based on their definitions in the configuration file
	- Added "interior" field to show when files have been found 
           somewhere other than a sector boundary
	- Added OpenBSD support
	- Added Win32 support via native compilation (Mingw)
        - Added Win32 support via Cygwin, to include:
                 -using %lld instead of %Ld
                 -ignoring the fnctl line for O_LARGEFILE in Win32
                 -redeclare strsignal as const char strsignal
                 -write function basename for Win32 using '\\' as delimiter
                 -updated Makefile
	- Removed unneccessary header files from foremost.h
             
(Version 0.65 was not published)

Version 0.64 - Audit file now records full paths of input and output files
               Foremost now requires that the output directory is empty
                 before running. If necessary, foremost will create the
	         output directory (ie. if it doesn't exist)
               Added structure to internal code of foremost.c and created 
                 dig.c file
               Fixed bug that generated wrong line number in configuration
                 file error messages
	       Fixed bug on empty wildcard definitions
	       Added limit for number of file types in configuration file

Version 0.63 - Increased speed by using files already loaded in memory	
	         instead of going back to the disk every time.
	       Minor speed increase to helper functions
               Added footers for several file formats including ZIP

Version 0.62 - Added man page and make install functionality
               Added "internal" indicator to show when a file is found
                 off the start of the sector. 
               Fixed discrepancy between audit file and screen output
                regarding file numbers and offset locations (off by one)
               Added more graceful error handling

Version 0.61 - Added check for "^M" line feeds added by MSDOS editors
                 while reading configuration files.

Version 0.6 - Renamed project to "foremost"
	      Added support for wildcards
              Added -q for quick mode
              More code clean up
              Removed BSD porting code (oops) and added support
               for large (>2GB) files.

Version 0.5 - Added -v for verbose mode
              Added more intelligble output regarding file locations
	      Added error handling procedures
	      Added support for loading specification files from the disk

Version 0.4 - More code cleanup
	      (not actually released, used as test during investigation)

Version 0.3 - Code cleanup continues, moved all variables into the 
              state variable. The program still needs a LOT of work.

Version 0.2 - Code cleanup by Jesse Kornblum. Removed linux specific
              code and ported to OpenBSD. Added support for handling
              multiple images from the command line and created the
              state variable. 07 March 2001

Version 0.1 - Proof of concept code written by Kris Kendall,
              originally called "snarfit" 05 March 2001


================================================
FILE: Makefile
================================================

RAW_CC = gcc
RAW_FLAGS = -Wall -O2
LINK_OPT = 
VERSION = 1.5.7
# Try to determine the host system
SYS := $(shell uname -s | tr -d "[0-9]" | tr -d "-" | tr "[A-Z]" "[a-z]")


# You can cross compile this program for Win32 using Linux and the 
# MinGW compiler. See the README for details. If you have already
# installed MinGW, put the location ($PREFIX) here:
CR_BASE = /usr/local/cross-tools/i386-mingw32msvc/bin

# You shouldn't need to change anything below this line
#---------------------------------------------------------------------

# This should be commented out when debugging is done
#RAW_FLAGS += -D__DEBUG -ggdb

NAME = foremost
MAN_PAGES = $(NAME).8.gz

RAW_FLAGS += -DVERSION=\"$(VERSION)\"

# Where we get installed
BIN = /usr/local/bin
MAN = /usr/share/man/man8
CONF= /usr/local/etc
# Setup for compiling and cross-compiling for Windows
# The CR_ prefix refers to cross compiling from OSX to Windows
CR_CC = $(CR_BASE)/gcc
CR_OPT = $(RAW_FLAGS) -D__WIN32
CR_LINK = -liberty
CR_STRIP = $(CR_BASE)/strip
CR_GOAL = $(NAME).exe
WINCC = $(RAW_CC) $(RAW_FLAGS) -D__WIN32

# Generic "how to compile C files"
CC = $(RAW_CC) $(RAW_FLAGS) -D__UNIX
.c.o:   
	$(CC) -c $<


# Definitions we'll need later (and that should rarely change)
HEADER_FILES = main.h ole.h extract.h
SRC =  main.c state.c helpers.c config.c cli.c engine.c dir.c extract.c api.c
OBJ =  main.o state.o helpers.o config.o cli.o engine.o dir.o extract.o api.o
DOCS = Makefile README CHANGES $(MAN_PAGES) foremost.conf
WINDOC = README.txt CHANGES.txt


#---------------------------------------------------------------------
# OPERATING SYSTEM DIRECTIVES
#---------------------------------------------------------------------

all: $(SYS) goals

goals: $(NAME)

linux: CC += -D__LINUX -DLARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64
linux: goals

sunos: solaris
solaris: CC += -D__SOLARIS -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64
solaris: goals

darwin: CC += -D__MACOSX
darwin: goals

mac: CC += -D__MACOSX
mac: goals

netbsd:  unix
openbsd: unix
freebsd: unix
unix: goals

#Fore some reasons BSD variants get confused on how to build engine.o
#so lets make it real clear

engine.o:       engine.c
	$(CC) -c engine.c


# Common commands for compiling versions for Windows. 
# See cross and windows directives below.
win_general: LINK_OPT = $(CR_LINK)
win_general: GOAL = $(CR_GOAL)
win_general: goals
	$(STRIP) $(CR_GOAL)

# Cross compiling from Linux to Windows. See README for more info
cross: CC = $(CR_CC) $(CR_OPT)
cross: STRIP = $(CR_STRIP)
cross: win_general

# See the README for information on Windows compilation
windows: CC = $(WINCC)
windows: STRIP = strip
windows: win_general 

cygwin_nt.: unix
cygwin: unix


#---------------------------------------------------------------------
# COMPILE THE PROGRAMS
#   This section must be updated each time you add an algorithm
#---------------------------------------------------------------------

foremost: $(OBJ)
	$(CC) $(OBJ) -o $(NAME) $(LINK_OPT)


#---------------------------------------------------------------------
# INSTALLATION AND REMOVAL 
#---------------------------------------------------------------------

install: goals
	install -m 755 $(NAME) $(BIN)
	install -m 444 $(MAN_PAGES) $(MAN)
	install -m 444 foremost.conf $(CONF)
macinstall: BIN = /usr/local/bin/
macinstall: MAN = /usr/share/man/man1/
macinstall: CONF = /usr/local/etc/
macinstall: mac install


uninstall:
	rm -f -- $(BIN)/{$(RM_GOALS)}
	rm -f -- $(MAN)/{$(RM_DOCS)}

macuninstall: BIN = /usr/bin
macuninstall: MAN = /usr/share/man/man1
macuninstall: uninstall

#---------------------------------------------------------------------
# CLEAN UP
#---------------------------------------------------------------------

# This is used for debugging
preflight:
	grep -n RBF *.1 *.h *.c README CHANGES

nice:
	rm -f -- *~

clean: nice
	rm -f -- *.o
	rm -f -- $(CR_GOAL) $(NAME) $(WIN_DOC)
	rm -f -- $(TAR_FILE).gz $(DEST_DIR).zip $(DEST_DIR).zip.gpg

#-------------------------------------------------------------------------
# MAKING PACKAGES
#-------------------------------------------------------------------------

EXTRA_FILES = 
DEST_DIR = $(NAME)-$(VERSION)
TAR_FILE = $(DEST_DIR).tar
PKG_FILES = $(SRC) $(HEADER_FILES) $(DOCS) $(EXTRA_FILES)

# This packages me up to send to somebody else
package: clean
	rm -f $(TAR_FILE) $(TAR_FILE).gz
	mkdir $(DEST_DIR)
	cp $(PKG_FILES) $(DEST_DIR)
	tar cvf $(TAR_FILE) $(DEST_DIR)
	rm -rf $(DEST_DIR)
	gzip $(TAR_FILE)


# This Makefile is designed for Mac OSX to package the file. 
# To do this on a linux box, The big line below starting with "/usr/bin/tbl"
# should be replaced with:
#
#	man ./$(MD5GOAL).1 | col -bx > README.txt
#
# and the "flip -d" command should be replaced with dos2unix
#
# The flip command can be found at:
# http://ccrma-www.stanford.edu/~craig/utility/flip/#
win-doc:
	/usr/bin/tbl ./$(MD5GOAL).1 | /usr/bin/groff -S -Wall -mtty-char -mandoc -Tascii | /usr/bin/col > README.txt
	cp CHANGES CHANGES.txt
	flip -d $(WINDOC)

cross-pkg: clean cross win-doc
	rm -f $(DEST_DIR).zip
	zip $(DEST_DIR).zip $(CR_MD5GOAL) $(CR_SHA1GOAL) $(CR_SHA256GOAL) $(WINDOC)
	rm -f $(WINDOC)

world: package cross-pkg


================================================
FILE: README
================================================

FOREMOST 
----------------------------------------------------------------------

Foremost is a Linux program to recover files based on their headers and
footers. Foremost can work on image files, such as those generated by dd,
Safeback, Encase, etc, or directly on a drive. The headers and footers are
specified by a configuration file, so you can pick and choose which
headers you want to look for.



--------------------------------------------
INSTALL FOREMOST
--------------------------------------------

To run foremost, you must:

- uncompress the archive
- compile
- install

Here's how to do it:

LINUX:
$ tar zxvf foremost-xx.tar.gz
$ cd foremost-xx
$ make
$ make install

BSD:
$ tar zxvf foremost-xx.tar.gz
$ cd foremost-xx
$ make unix
$ make install

SOLARIS:
$ tar zxvf foremost-xx.tar.gz
$ cd foremost-xx
$ make solaris
$ make install

OSX:
$ tar zxvf foremost-xx.tar.gz
$ cd foremost-xx
$ make mac
$ make macinstall

On systems with older versions of glibc (earlier than 2.2.0), you will get 
some harmless warnings about ftello and fseeko not being defined. You can 
ignore these.


If you ever need to remove foremost from your system, you can do this:

$ make uninstall



--------------------------------------------
USING FOREMOST
--------------------------------------------

A description of the command line arguments can be found in the man page. 
To view it:

$ man foremost



--------------------------------------------
CONFIGURATION FILE FORMAT
--------------------------------------------

The configuration file is used to control what types of files foremost
searches for. A sample configuration file, foremost.conf, is included with
this distribution. For each file type, the configuration file describes
the file's extension, whether the header and footer are case sensitive,
the maximum file size, and the header and footer for the file. The footer
field is optional, but header, size, case sensitivity, and extension are
not!

Any line that begins with a '#' is considered a comment and ignored. Thus,
to skip a file type just put a '#' at the beginning of that line

Headers and footers are decoded before use. To specify a value in
hexadecimal use \x[0-f][0-f], and for octal use \[0-7][0-7][0-7].  Spaces
can be represented by \s. Example: "\x4F\123\I\sCCI" decodes to "OSI CCI".

To match any single character (aka a wildcard) use a '?'. If you need to
search for the '?' character, you will need to change the 'wildcard' line
*and* every occurrence of the old wildcard character in the configuration
file. Don't forget those hex and octal values! '?' is equal to 0x3f and
\063.

Here's a sample set of headers and footers:

# extension  case-sens  max-size   header			footer		(option)
#
# GIF and JPG files (very common)
	gif	y	155000	\x47\x49\x46\x38\x37\x61	\x00\x3b
  	gif	y 	155000	\x47\x49\x46\x38\x39\x61	\x00\x00\x3b
  	jpg	y	200000	\xff\xd8\xff			\xff\xd9

Note: the option is a method of specifying additional options.  Current the following options exist:

FORWARD: Specify to search from the header to the footer (optional) up to the max-size.
REVERSE: Specify to search from the footer to the header up to the max-size.
NEXT: Specify to search from the header to the data just past the footer.  This allows you to  specify data that you know is 'NOT' in the data you are looking for and should terminated the search, up to the max-size.

--------------------------------------------
BUG REPORTING
--------------------------------------------

Please report ALL bugs to nick dot mikus AT gmail d0t com. Please include a 
description of the bug, how you found it, and your contact information.




--------------------------------------------
CREDITS AND THANKS
--------------------------------------------

Foremost was written by Special Agent Kris Kendall and Special Agent Jesse
Kornblum of the United States Air Force Office of Special Investigations
starting in March 2001. This program would not be what it is today without
help from (in no particular order): Rob Meekins, Dan Kalil, and Chet
Maciag. This project was inspired by CarvThis, written by the Defense
Computer Forensic Lab in 1999.


--------------------------------------------
LEGAL NOTICE
--------------------------------------------

dd, Safeback, and Encase are copyrighted works and any questions regarding 
these tools should be directed to the copyright holders. The United States 
Government does not endorse the use of these or any other imaging tools. 


================================================
FILE: api.c
================================================
/*
	Modified API from http://chicago.sourceforge.net/devel/docs/ole/
	Basically the same API, added error checking and the ability
	to check buffers for docs, not just files.
*/
#include "main.h"
#include "ole.h"

/*Some ugly globals
* This API should be re-written
* in a modular fashion*/
unsigned char	buffer[OUR_BLK_SIZE];
char			*extract_name;
int				extract = 0;
int				dir_count = 0;
int				*FAT;
int				verbose = TRUE;
int				FATblk;
int				currFATblk;
int				highblk = 0;
int				block_list[OUR_BLK_SIZE / sizeof(int)];
extern int		errno;

/*Inititialize those globals used by extract_ole*/
void init_ole()
{
	int i = 0;
	extract = 0;
	dir_count = 0;
	FAT = NULL;
	highblk = 0;
	FATblk = 0;
	currFATblk = -1;
	dirlist = NULL;
	dl = NULL;
	for (i = 0; i < OUR_BLK_SIZE / sizeof(int); i++)
		{
		block_list[i] = 0;
		}

	for (i = 0; i < OUR_BLK_SIZE; i++)
		{
		buffer[i] = 0;
		}
}

void *Malloc(size_t bytes)
{
	void	*x;

	x = malloc(bytes);
	if (x)
		return x;
	die("Can't malloc %d bytes.\n", (char *)bytes);
	return 0;
}

void die(char *fmt, void *arg)
{
	fprintf(stderr, fmt, arg);
	exit(1);
}

int get_dir_block(unsigned char *fd, int blknum, int buffersize)
{
	int				i;
	struct OLE_DIR	*dir;
	unsigned char	*dest = NULL;

	dest = get_ole_block(fd, blknum, buffersize);
	if (dest == NULL)
		{
		return FALSE;
		}

	for (i = 0; i < DIRS_PER_BLK; i++)
		{
		dir = (struct OLE_DIR *) &dest[sizeof(struct OLE_DIR) * i];
		if (dir->type == NO_ENTRY)
			break;
		}

	if (i == DIRS_PER_BLK)
		{
		return TRUE;
		}
	else
		{
		return SHORT_BLOCK;
		}
}

int get_dir_info(unsigned char *src)
{
	int				i, j;
	char			*p, *q;
	struct OLE_DIR	*dir;
	int				punctCount = 0;
	short			name_size = 0;

	for (i = 0; i < DIRS_PER_BLK; i++)
		{
		dir = (struct OLE_DIR *) &src[sizeof(struct OLE_DIR) * i];
		punctCount = 0;

		//if(dir->reserved!=0) return FALSE;
		if (dir->type < 0)	//Should we check if values are > 5 ?????
		{
#ifdef DEBUG
			printf("\n	Invalid directory type\n");
			printf("type:=%c size:=%lu \n", dir->type, dir->size);
#endif
			return FALSE;
		}

		if (dir->type == NO_ENTRY)
			break;

#ifdef DEBUG

		//dump_dirent (i);
#endif
		dl = &dirlist[dir_count++];
		if (dl == NULL)
		{
#ifdef DEBUG
			printf("dl==NULL!!! bailing out\n");
#endif
			return FALSE;
		}

		if (dir_count > 500)
			return FALSE;	/*SANITY CHECKING*/
		q = dl->name;
		p = dir->name;

		name_size = htos((unsigned char *) &dir->namsiz, FOREMOST_LITTLE_ENDIAN);

#ifdef DEBUG
		printf(" dir->namsiz:=%d\n", name_size);
#endif
		if (name_size > 64 || name_size <= 0)
			return FALSE;

		if (*p < ' ')
			p += 2;			/* skip leading short */
		for (j = 0; j < name_size; j++, p++)
			{

			if (p == NULL || q == NULL)
				return FALSE;
			if (*p && isprint(*p))
				{

				if (ispunct(*p))
					punctCount++;
				*q++ = *p;

				}
			}

		if (punctCount > 3)
		{
#ifdef DEBUG
			printf("dl->name:=%s\n", dl->name);
			printf("pcount > 3!!! bailing out\n");
#endif
			return FALSE;
		}

		if (dl->name == NULL)
		{
#ifdef DEBUG
			printf("	***NULL dir name. bailing out \n");
#endif
			return FALSE;
		}

		/*Ignore Catalogs*/
		if (strstr(dl->name, "Catalog"))
			return FALSE;
		*q = 0;
		dl->type = dir->type;
		dl->size = htoi((unsigned char *) &dir->size, FOREMOST_LITTLE_ENDIAN);

		dl->start_block = htoi((unsigned char *) &dir->start_block, FOREMOST_LITTLE_ENDIAN);
		dl->next = htoi((unsigned char *) &dir->next_dirent, FOREMOST_LITTLE_ENDIAN);
		dl->prev = htoi((unsigned char *) &dir->prev_dirent, FOREMOST_LITTLE_ENDIAN);
		dl->dir = htoi((unsigned char *) &dir->dir_dirent, FOREMOST_LITTLE_ENDIAN);
		if (dir->type != STREAM)
			{
			dl->s1 = dir->secs1;
			dl->s2 = dir->secs2;
			dl->d1 = dir->days1;
			dl->d2 = dir->days2;
			}
		}

	return TRUE;
}

static int	*lnlv;			/* last next link visited ! */
int reorder_dirlist(struct DIRECTORY *dir, int level)
{

	//printf("	Reordering the dirlist\n");
	dir->level = level;
	if (dir->dir != -1 || dir->dir > dir_count)
		{
		return 0;
		}
	else if (!reorder_dirlist(&dirlist[dir->dir], level + 1))
		return 0;

	/* reorder next-link subtree, saving the most next link visited */
	if (dir->next != -1)
		{
		if (dir->next > dir_count)
			return 0;
		else if (!reorder_dirlist(&dirlist[dir->next], level))
			return 0;
		}
	else
		lnlv = &dir->next;

	/* move the prev child to the next link and reorder it, if any exist
 */
	if (dir->prev != -1)
		{
		if (dir->prev > dir_count)
			return 0;
		else
			{
			*lnlv = dir->prev;
			dir->prev = -1;
			if (!reorder_dirlist(&dirlist[*lnlv], level))
				return 0;
			}
		}

	return 1;
}

int get_block(unsigned char *fd, int blknum, unsigned char *dest, long long int buffersize)
{
	unsigned char		*temp = fd;
	int					i = 0;
	unsigned long long	jump = (unsigned long long)OUR_BLK_SIZE * (unsigned long long)(blknum + 1);
	if (blknum < -1 || jump < 0 || blknum > buffersize || buffersize < jump)
	{
#ifdef DEBUG
		printf("	Bad blk read1 blknum:=%d  jump:=%lld buffersize=%lld\n", blknum, jump, buffersize);
#endif
		return FALSE;
	}

	temp = fd + jump;
#ifdef DEBUG
	printf("	Jumping to %lld blknum=%d buffersize=%lld\n", jump, blknum, buffersize);
#endif
	for (i = 0; i < OUR_BLK_SIZE; i++)
		{
		dest[i] = temp[i];
		}

	if ((blknum + 1) > highblk)
		highblk = blknum + 1;
	return TRUE;
}

unsigned char *get_ole_block(unsigned char *fd, int blknum, unsigned long long buffersize)
{
	unsigned long long	jump = (unsigned long long)OUR_BLK_SIZE * (unsigned long long)(blknum + 1);
	if (blknum < -1 || jump < 0 || blknum > buffersize || buffersize < jump)
	{
#ifdef DEBUG
		printf("	Bad blk read1 blknum:=%d  jump:=%lld buffersize=%lld\n", blknum, jump, buffersize);
#endif
		return FALSE;
	}

#ifdef DEBUG
	printf("	Jumping to %lld blknum=%d buffersize=%lld\n", jump, blknum, buffersize);
#endif
	return (fd + jump);
}

int get_FAT_block(unsigned char *fd, int blknum, int *dest, int buffersize)
{
	static int	FATblk;

	//   static int currFATblk = -1;
	FATblk = htoi((unsigned char *) &FAT[blknum / (OUR_BLK_SIZE / sizeof(int))],
				  FOREMOST_LITTLE_ENDIAN);
#ifdef DEBUG
	printf("****blknum:=%d FATblk:=%d currFATblk:=%d\n", blknum, FATblk, currFATblk);
#endif
	if (currFATblk != FATblk)
	{
#ifdef DEBUG
		printf("*****blknum:=%d FATblk:=%d\n", blknum, FATblk);
#endif
		if (!get_block(fd, FATblk, (unsigned char *)dest, buffersize))
			{
			return FALSE;
			}

		currFATblk = FATblk;
	}

	return TRUE;
}

void dump_header(struct OLE_HDR *h)
{
	int i, *x;

	//struct OLE_HDR *h = (struct OLE_HDR *) buffer;
	// fprintf (stderr, "clsid  = ");
	//printx(h->clsid,0,16);
	fprintf(stderr,
			"\nuMinorVersion  = %u\t",
			htos((unsigned char *) &h->uMinorVersion, FOREMOST_LITTLE_ENDIAN));
	fprintf(stderr,
			"uDllVersion  = %u\t",
			htos((unsigned char *) &h->uDllVersion, FOREMOST_LITTLE_ENDIAN));
	fprintf(stderr,
			"uByteOrder  = %u\n",
			htos((unsigned char *) &h->uByteOrder, FOREMOST_LITTLE_ENDIAN));
	fprintf(stderr,
			"uSectorShift  = %u\t",
			htos((unsigned char *) &h->uSectorShift, FOREMOST_LITTLE_ENDIAN));
	fprintf(stderr,
			"uMiniSectorShift  = %u\t",
			htos((unsigned char *) &h->uMiniSectorShift, FOREMOST_LITTLE_ENDIAN));
	fprintf(stderr,
			"reserved  = %u\n",
			htos((unsigned char *) &h->reserved, FOREMOST_LITTLE_ENDIAN));
	fprintf(stderr,
			"reserved1  = %u\t",
			htoi((unsigned char *) &h->reserved1, FOREMOST_LITTLE_ENDIAN));
	fprintf(stderr,
			"reserved2  = %u\t",
			htoi((unsigned char *) &h->reserved2, FOREMOST_LITTLE_ENDIAN));
	fprintf(stderr,
			"csectMiniFat = %u\t",
			htoi((unsigned char *) &h->csectMiniFat, FOREMOST_LITTLE_ENDIAN));
	fprintf(stderr,
			"miniSectorCutoff = %u\n",
			htoi((unsigned char *) &h->miniSectorCutoff, FOREMOST_LITTLE_ENDIAN));
	fprintf(stderr,
			"root_start_block  = %u\n",
			htoi((unsigned char *) &h->root_start_block, FOREMOST_LITTLE_ENDIAN));
	fprintf(stderr,
			"dir flag = %u\n",
			htoi((unsigned char *) &h->dir_flag, FOREMOST_LITTLE_ENDIAN));
	fprintf(stderr,
			"# FAT blocks = %u\n",
			htoi((unsigned char *) &h->num_FAT_blocks, FOREMOST_LITTLE_ENDIAN));
	fprintf(stderr,
			"FAT_next_block = %u\n",
			htoi((unsigned char *) &h->FAT_next_block, FOREMOST_LITTLE_ENDIAN));
	fprintf(stderr,
			"# extra FAT blocks = %u\n",
			htoi((unsigned char *) &h->num_extra_FAT_blocks, FOREMOST_LITTLE_ENDIAN));
	x = (int *) &h[1];
	fprintf(stderr, "bbd list:");
	for (i = 0; i < 109; i++, x++)
		{
		if ((i % 10) == 0)
			fprintf(stderr, "\n");
		if (*x == '\xff')
			break;
		fprintf(stderr, "%x ", *x);
		}

	fprintf(stderr, "\n	**************End of header***********\n");
}

struct OLE_HDR *reverseBlock(struct OLE_HDR *dest, struct OLE_HDR *h)
{
	int i, *x, *y;
	dest->uMinorVersion = htos((unsigned char *) &h->uMinorVersion, FOREMOST_LITTLE_ENDIAN);
	dest->uDllVersion = htos((unsigned char *) &h->uDllVersion, FOREMOST_LITTLE_ENDIAN);
	dest->uByteOrder = htos((unsigned char *) &h->uByteOrder, FOREMOST_LITTLE_ENDIAN);				/*28*/
	dest->uSectorShift = htos((unsigned char *) &h->uSectorShift, FOREMOST_LITTLE_ENDIAN);
	dest->uMiniSectorShift = htos((unsigned char *) &h->uMiniSectorShift, FOREMOST_LITTLE_ENDIAN);	/*32*/
	dest->reserved = htos((unsigned char *) &h->reserved, FOREMOST_LITTLE_ENDIAN);					/*34*/
	dest->reserved1 = htoi((unsigned char *) &h->reserved1, FOREMOST_LITTLE_ENDIAN);				/*36*/
	dest->reserved2 = htoi((unsigned char *) &h->reserved2, FOREMOST_LITTLE_ENDIAN);				/*40*/
	dest->num_FAT_blocks = htoi((unsigned char *) &h->num_FAT_blocks, FOREMOST_LITTLE_ENDIAN);		/*44*/
	dest->root_start_block = htoi((unsigned char *) &h->root_start_block, FOREMOST_LITTLE_ENDIAN);	/*48*/
	dest->dfsignature = htoi((unsigned char *) &h->dfsignature, FOREMOST_LITTLE_ENDIAN);			/*52*/
	dest->miniSectorCutoff = htoi((unsigned char *) &h->miniSectorCutoff, FOREMOST_LITTLE_ENDIAN);	/*56*/
	dest->dir_flag = htoi((unsigned char *) &h->dir_flag, FOREMOST_LITTLE_ENDIAN);					/*60 first sec in the mini fat chain*/
	dest->csectMiniFat = htoi((unsigned char *) &h->csectMiniFat, FOREMOST_LITTLE_ENDIAN);			/*64 number of sectors in the minifat */
	dest->FAT_next_block = htoi((unsigned char *) &h->FAT_next_block, FOREMOST_LITTLE_ENDIAN);		/*68*/
	dest->num_extra_FAT_blocks = htoi((unsigned char *) &h->num_extra_FAT_blocks,
									  FOREMOST_LITTLE_ENDIAN);

	x = (int *) &h[1];
	y = (int *) &dest[1];
	for (i = 0; i < 109; i++, x++)
		{
		*y = htoi((unsigned char *)x, FOREMOST_LITTLE_ENDIAN);
		y++;
		}

	return dest;
}

void dump_ole_header(struct OLE_HDR *h)
{
	int i, *x;

	//fprintf (stderr, "clsid  = ");
	//printx(h->clsid,0,16);
	fprintf(stderr,
			"\nuMinorVersion  = %u\t",
			htos((unsigned char *) &h->uMinorVersion, FOREMOST_LITTLE_ENDIAN));
	fprintf(stderr,
			"uDllVersion  = %u\t",
			htos((unsigned char *) &h->uDllVersion, FOREMOST_LITTLE_ENDIAN));
	fprintf(stderr,
			"uByteOrder  = %u\n",
			htos((unsigned char *) &h->uByteOrder, FOREMOST_LITTLE_ENDIAN));
	fprintf(stderr,
			"uSectorShift  = %u\t",
			htos((unsigned char *) &h->uSectorShift, FOREMOST_LITTLE_ENDIAN));
	fprintf(stderr,
			"uMiniSectorShift  = %u\t",
			htos((unsigned char *) &h->uMiniSectorShift, FOREMOST_LITTLE_ENDIAN));
	fprintf(stderr,
			"reserved  = %u\n",
			htos((unsigned char *) &h->reserved, FOREMOST_LITTLE_ENDIAN));
	fprintf(stderr,
			"reserved1  = %u\t",
			htoi((unsigned char *) &h->reserved1, FOREMOST_LITTLE_ENDIAN));
	fprintf(stderr,
			"reserved2  = %u\t",
			htoi((unsigned char *) &h->reserved2, FOREMOST_LITTLE_ENDIAN));
	fprintf(stderr,
			"csectMiniFat = %u\t",
			htoi((unsigned char *) &h->csectMiniFat, FOREMOST_LITTLE_ENDIAN));
	fprintf(stderr,
			"miniSectorCutoff = %u\n",
			htoi((unsigned char *) &h->miniSectorCutoff, FOREMOST_LITTLE_ENDIAN));
	fprintf(stderr,
			"root_start_block  = %u\n",
			htoi((unsigned char *) &h->root_start_block, FOREMOST_LITTLE_ENDIAN));
	fprintf(stderr,
			"dir flag = %u\n",
			htoi((unsigned char *) &h->dir_flag, FOREMOST_LITTLE_ENDIAN));
	fprintf(stderr,
			"# FAT blocks = %u\n",
			htoi((unsigned char *) &h->num_FAT_blocks, FOREMOST_LITTLE_ENDIAN));
	fprintf(stderr,
			"FAT_next_block = %u\n",
			htoi((unsigned char *) &h->FAT_next_block, FOREMOST_LITTLE_ENDIAN));
	fprintf(stderr,
			"# extra FAT blocks = %u\n",
			htoi((unsigned char *) &h->num_extra_FAT_blocks, FOREMOST_LITTLE_ENDIAN));
	x = (int *) &h[1];
	fprintf(stderr, "bbd list:");
	for (i = 0; i < 109; i++, x++)
		{
		if ((i % 10) == 0)
			fprintf(stderr, "\n");
		if (*x == '\xff')
			break;
		fprintf(stderr, "%x ", htoi((unsigned char *)x, FOREMOST_LITTLE_ENDIAN));
		}

	fprintf(stderr, "\n	**************End of header***********\n");
}

int dump_dirent(int which_one)
{
	int				i;
	char			*p;
	short			unknown;
	struct OLE_DIR	*dir;

	dir = (struct OLE_DIR *) &buffer[which_one * sizeof(struct OLE_DIR)];
	if (dir->type == NO_ENTRY)
		return TRUE;
	fprintf(stderr, "DIRENT_%d :\t", dir_count);
	fprintf(stderr,
			"%s\t",
			(dir->type == ROOT) ? "root directory" : (dir->type == STORAGE) ? "directory" : "file");

	/* get UNICODE name */
	p = dir->name;
	if (*p < ' ')
		{
		unknown = *((short *)p);

		//fprintf (stderr, "%04x\t", unknown);
		p += 2; /* step over unknown short */
		}

	for (i = 0; i < dir->namsiz; i++, p++)
		{
		if (*p && (*p > 0x1f))
			{
			if (isprint(*p))
				{
				fprintf(stderr, "%c", *p);
				}
			else
				{
				printf("***	Invalid char %x ***\n", *p);
				return FALSE;
				}
			}
		}

	fprintf(stderr, "\n");

	//fprintf (stderr, "prev_dirent = %lu\t", dir->prev_dirent);
	//fprintf (stderr, "next_dirent = %lu\t", dir->next_dirent);
	//fprintf (stderr, "dir_dirent  = %lu\n", dir->dir_dirent);
	//fprintf (stderr, "name  = %s\t", dir->name);
	fprintf(stderr, "namsiz  = %u\t", dir->namsiz);
	fprintf(stderr, "type  = %d\t", dir->type);
	fprintf(stderr, "reserved  = %u\n", dir->reserved);

	fprintf(stderr, "start block  = %lu\n", dir->start_block);
	fprintf(stderr, "size  = %lu\n", dir->size);
	fprintf(stderr, "\n	**************End of dirent***********\n");
	return TRUE;
}


================================================
FILE: cli.c
================================================


#include "main.h"

void fatal_error (f_state * s, char *msg)
	{
	fprintf(stderr, "%s: %s%s", __progname, msg, NEWLINE);
	if (get_audit_file_open(s))
		{
		audit_msg(s, msg);
		close_audit_file(s);
		}
	exit(EXIT_FAILURE);
	}

void print_error(f_state *s, char *fn, char *msg)
{
	if (!(get_mode(s, mode_quiet)))
		fprintf(stderr, "%s: %s: %s%s", __progname, fn, msg, NEWLINE);
}

void print_message(f_state *s, char *format, va_list argp)
{
	vfprintf(stdout, format, argp);
	fprintf(stdout, "%s", NEWLINE);
}


================================================
FILE: config.c
================================================


#include "main.h"

int translate (char *str)
	{
	char	next;
	char	*rd = str, *wr = str, *bad;
	char	temp[1 + 3 + 1];
	char	ch;

	if (!*rd)					//If it's a null string just return
		{
		return 0;
		}

	while (*rd)
		{

		/* Is it an escaped character ? */
		if (*rd == '\\')
			{
			rd++;
			switch (*rd)
				{
				case '\\':
					*rd++;
					*wr++ = '\\';
					break;

				case 'a':
					*rd++;
					*wr++ = '\a';
					break;

				case 's':
					*rd++;
					*wr++ = ' ';
					break;

				case 'n':
					*rd++;
					*wr++ = '\n';
					break;

				case 'r':
					*rd++;
					*wr++ = '\r';
					break;

				case 't':
					*rd++;
					*wr++ = '\t';
					break;

				case 'v':
					*rd++;
					*wr++ = '\v';
					break;

				/* Hexadecimal/Octal values are treated in one place using strtoul() */
				case 'x':
				case '0':
				case '1':
				case '2':
				case '3':
					next = *(rd + 1);
					if (next < 48 || (57 < next && next < 65) || (70 < next && next < 97) || next > 102)
						break;	//break if not a digit or a-f, A-F
					next = *(rd + 2);
					if (next < 48 || (57 < next && next < 65) || (70 < next && next < 97) || next > 102)
						break;	//break if not a digit or a-f, A-F
					temp[0] = '0';
					bad = temp;
					strncpy(temp + 1, rd, 3);
					temp[4] = '\0';
					ch = strtoul(temp, &bad, 0);
					if (*bad == '\0')
						{
						*wr++ = ch;
						rd += 3;
						}		/* else INVALID CHARACTER IN INPUT ('\\' followed by *rd) */
					break;

				default:		/* INVALID CHARACTER IN INPUT (*rd)*/
					*wr++ = '\\';
					break;
				}
			}

		/* Unescaped characters go directly to the output */
		else
			*wr++ = *rd++;
		}
	*wr = '\0';					//Null terminate the string that we just created...
	return wr - str;
	}

char *skipWhiteSpace(char *str)
{
	while (isspace(str[0]))
		str++;
	return str;
}

int extractSearchSpecData(f_state *state, char **tokenarray)
{

	/* Process a normal line with 3-4 tokens on it
   token[0] = suffix
   token[1] = case sensitive
   token[2] = size to snarf
   token[3] = begintag
   token[4] = endtag (optional)
   token[5] = search for footer from back of buffer flag and other options (whew!)
*/

	/* Allocate the memory for these lines.... */
	s_spec	*s = &search_spec[state->num_builtin];

	s->suffix = malloc(MAX_SUFFIX_LENGTH * sizeof(char));
	s->header = malloc(MAX_STRING_LENGTH * sizeof(char));
	s->footer = malloc(MAX_STRING_LENGTH * sizeof(char));
	s->type = CONF;
	if (!strncasecmp(tokenarray[0], FOREMOST_NOEXTENSION_SUFFIX, strlen(FOREMOST_NOEXTENSION_SUFFIX)
		))
		{
		s->suffix[0] = ' ';
		s->suffix[1] = 0;
		}
	else
		{

		/* Assign the current line to the SearchSpec object */
		memcpy(s->suffix, tokenarray[0], MAX_SUFFIX_LENGTH);
		}

	/* Check for case sensitivity */
	s->case_sen = (!strncasecmp(tokenarray[1], "y", 1) || !strncasecmp(tokenarray[1], "yes", 3));

	s->max_len = atoi(tokenarray[2]);

	/* Determine which search type we want to use for this needle */
	s->searchtype = SEARCHTYPE_FORWARD;
	if (!strncasecmp(tokenarray[5], "REVERSE", strlen("REVERSE")))
		{

		s->searchtype = SEARCHTYPE_REVERSE;
		}
	else if (!strncasecmp(tokenarray[5], "NEXT", strlen("NEXT")))
		{
		s->searchtype = SEARCHTYPE_FORWARD_NEXT;
		}

	// this is the default, but just if someone wants to provide this value just to be sure
	else if (!strncasecmp(tokenarray[5], "FORWARD", strlen("FORWARD")))
		{
		s->searchtype = SEARCHTYPE_FORWARD;
		}
	else if (!strncasecmp(tokenarray[5], "ASCII", strlen("ASCII")))
		{
			//fprintf(stderr,"Setting ASCII TYPE\n");
		s->searchtype = SEARCHTYPE_ASCII;
		}

	/* Done determining searchtype */

	/* We copy the tokens and translate them from the file format.
   The translate() function does the translation and returns
   the length of the argument being translated */
	s->header_len = translate(tokenarray[3]);
	memcpy(s->header, tokenarray[3], s->header_len);
	s->footer_len = translate(tokenarray[4]);
	memcpy(s->footer, tokenarray[4], s->footer_len);

	init_bm_table(s->header, s->header_bm_table, s->header_len, s->case_sen, s->searchtype);
	init_bm_table(s->footer, s->footer_bm_table, s->footer_len, s->case_sen, s->searchtype);

	return TRUE;
}

int process_line(f_state *s, char *buffer, int line_number)
{

	char	*buf = buffer;
	char	*token;
	char	**tokenarray = (char **)malloc(6 * sizeof(char[MAX_STRING_LENGTH]));
	int		i = 0, len = strlen(buffer);

	/* Any line that ends with a CTRL-M (0x0d) has been processed
   by a DOS editor. We will chop the CTRL-M to ignore it */
	if (buffer[len - 2] == 0x0d && buffer[len - 1] == 0x0a)
		{
		buffer[len - 2] = buffer[len - 1];
		buffer[len - 1] = buffer[len];
		}

	buf = (char *)skipWhiteSpace(buf);
	token = strtok(buf, " \t\n");

	/* Any line that starts with a '#' is a comment and can be skipped */
	if (token == NULL || token[0] == '#')
		{
		return TRUE;
		}

	/* Check for the wildcard */
	if (!strncasecmp(token, "wildcard", 9))
		{
		if ((token = strtok(NULL, " \t\n")) != NULL)
			{
			translate(token);
			}
		else
			{
			return TRUE;
			}

		if (strlen(token) > 1)
			{
			fprintf(stderr,
					"Warning: Wildcard can only be one character,"
					" but you specified %zu characters.\n"
				"         Using the first character, \"%c\", as the wildcard.\n",
			strlen(token),
					token[0]);
			}

		wildcard = token[0];
		return TRUE;
		}

	while (token && (i < NUM_SEARCH_SPEC_ELEMENTS))
		{
		tokenarray[i] = token;
		i++;
		token = strtok(NULL, " \t\n");
		}

	switch (NUM_SEARCH_SPEC_ELEMENTS - i)
		{
		case 2:
			tokenarray[NUM_SEARCH_SPEC_ELEMENTS - 1] = "";
			tokenarray[NUM_SEARCH_SPEC_ELEMENTS - 2] = "";
			break;

		case 1:
			tokenarray[NUM_SEARCH_SPEC_ELEMENTS - 1] = "";
			break;

		case 0:
			break;

		default:
			fprintf(stderr, "\nERROR: In line %d of the configuration file.\n", line_number);
			return FALSE;
			return TRUE;

		}

	if (!extractSearchSpecData(s, tokenarray))
		{
		fprintf(stderr,
				"\nERROR: Unknown error on line %d of the configuration file.\n",
				line_number);
		}

	s->num_builtin++;

	return TRUE;
}

int load_config_file(f_state *s)
{
	FILE	*f;
	char	*buffer = (char *)malloc(MAX_STRING_LENGTH * sizeof(char));
	off_t	line_number = 0;

#ifdef __DEBUG
	printf("About to open config file %s%s", get_config_file(s), NEWLINE);
#endif

	if ((f = fopen(get_config_file(s), "r")) == NULL)
	{

		/*Can't find  a conf in the current directory
    * So lets try the /usr/local/etc*/
#ifdef __WIN32
		set_config_file(s, "/Program Files/foremost/foremost.conf");
#else
		set_config_file(s, "/usr/local/etc/foremost.conf");
#endif
		if ((f = fopen(get_config_file(s), "r")) == NULL)
			{
			print_error(s, get_config_file(s), strerror(errno));
			free(buffer);
			return TRUE;
			}

	}

	while (fgets(buffer, MAX_STRING_LENGTH, f))
		{
		++line_number;
		if (!process_line(s, buffer, line_number))
			{
			free(buffer);
			fclose(f);
			return TRUE;

			}
		}

	fclose(f);
	free(buffer);
	return FALSE;
}


================================================
FILE: dir.c
================================================


#include "main.h"

int is_empty_directory (DIR * temp)
	{

	/* Empty directories contain two entries for . and .. 
     A directory with three entries, therefore, is not empty */
	if (readdir(temp) && readdir(temp) && readdir(temp))
		return FALSE;

	return TRUE;
	}

/*Try to cleanup the ouput directory if nothing to a sub-dir*/
void cleanup_output(f_state *s)
{
	char			dir_name[MAX_STRING_LENGTH];

	DIR				*temp;
	DIR				*outputDir;
	struct dirent	*entry;

	if ((outputDir = opendir(get_output_directory(s))) == NULL)
		{

		/*Error?*/
		}

	while ((entry = readdir(outputDir)))
		{
		memset(dir_name, 0, MAX_STRING_LENGTH - 1);
		strcpy(dir_name, get_output_directory(s));
		strcat(dir_name, "/");
		strcat(dir_name, entry->d_name);
		temp = opendir(dir_name);
		if (temp != NULL)
			{
			if (is_empty_directory(temp))
				{
				rmdir(dir_name);
				}
			}

		}

}

int make_new_directory(f_state *s, char *fn)
{

#ifdef __WIN32

	#ifndef __CYGWIN
fprintf(stderr,"Calling mkdir with\n");	
	if (mkdir(fn))
	#endif

#else
		mode_t	new_mode =
			(
				S_IRUSR |
				S_IWUSR |
				S_IXUSR |
				S_IRGRP |
				S_IWGRP |
				S_IXGRP |
				S_IROTH |
				S_IWOTH
			);
	if (mkdir(fn, new_mode))
#endif
		{
		if (errno != EEXIST)
			{
			print_error(s, fn, strerror(errno));
			return TRUE;
			}
		}

	return FALSE;
}

/*Clean the timestamped dir name to make it a little more file system friendly*/
char *clean_time_string(char *time)
{
	int len = strlen(time);
	int i = 0;

	for (i = 0; i < len; i++)
	{
#ifdef __WIN32
		if (time[i] == ':' && time[i + 1] != '\\')
			{
			time[i] = '_';
			}

#else
		if (time[i] == ' ' || time[i] == ':')
			{
			time[i] = '_';
			}
#endif
	}

	return time;
}

int create_output_directory(f_state *s)
{
	DIR		*d;
	char	dir_name[MAX_STRING_LENGTH];
  
	memset(dir_name, 0, MAX_STRING_LENGTH - 1);
	if (s->time_stamp)
		{
		strcpy(dir_name, get_output_directory(s));
		strcat(dir_name, "_");
		strcat(dir_name, get_start_time(s));
		clean_time_string(dir_name);
		set_output_directory(s, dir_name);
		}
#ifdef DEBUG
	printf("Checking output directory %s\n", get_output_directory(s));
#endif

	if ((d = opendir(get_output_directory(s))) != NULL)
		{

		/* The directory exists already. It MUST be empty for us to continue */
		if (!is_empty_directory(d))
			{
			printf("ERROR: %s is not empty\n \tPlease specify another directory or run with -T.\n",
				   get_output_directory(s));

			exit(EXIT_FAILURE);
			}

		/* The directory exists and is empty. We're done! */
		closedir(d);
		return FALSE;
		}

	/* The error value ENOENT means that either the directory doesn't exist,
     which is fine, or that the filename is zero-length, which is bad.
     All other errors are, of course, bad. 
*/
	if (errno != ENOENT)
		{
		print_error(s, get_output_directory(s), strerror(errno));
		return TRUE;
		}

	if (strlen(get_output_directory(s)) == 0)
		{

		/* Careful! Calling print_error will try to display a filename
       that is zero characters! In theory this should never happen 
       as our call to realpath should avoid this. But we'll play it safe. */
		print_error(s, "(output_directory)", "Output directory name unknown");
		return TRUE;
		}

	return (make_new_directory(s, get_output_directory(s)));
}

/*Create file type sub dirs, can get tricky when multiple types use one 
 extraction algorithm (OLE)*/
int create_sub_dirs(f_state *s)
{
	int		i = 0;
	int		j = 0;
	char	dir_name[MAX_STRING_LENGTH];
	char	ole_types[7][4] = { "ppt", "doc", "xls", "sdw", "mbd", "vis", "ole" };
	char	riff_types[2][4] = { "avi", "wav" };
	char	zip_types[8][5] = { "sxc", "sxw", "sxi", "sx", "jar","docx","pptx","xlsx" };

	for (i = 0; i < s->num_builtin; i++)
		{
		memset(dir_name, 0, MAX_STRING_LENGTH - 1);
		strcpy(dir_name, get_output_directory(s));
		strcat(dir_name, "/");
		strcat(dir_name, search_spec[i].suffix);
		make_new_directory(s, dir_name);

		if (search_spec[i].type == OLE)
			{
			for (j = 0; j < 7; j++)
				{
				if (strstr(ole_types[j], search_spec[i].suffix))
					continue;

				memset(dir_name, 0, MAX_STRING_LENGTH - 1);
				strcpy(dir_name, get_output_directory(s));
				strcat(dir_name, "/");
				strcat(dir_name, ole_types[j]);
				make_new_directory(s, dir_name);
				}
			}
		else if (get_mode(s, mode_write_all))
			{
			for (j = 0; j < 7; j++)
				{
				if (strstr(search_spec[i].suffix, ole_types[j]))
					{
					for (j = 0; j < 7; j++)
						{
						if (strstr(ole_types[j], search_spec[i].suffix))
							continue;

						memset(dir_name, 0, MAX_STRING_LENGTH - 1);
						strcpy(dir_name, get_output_directory(s));
						strcat(dir_name, "/");
						strcat(dir_name, ole_types[j]);
						make_new_directory(s, dir_name);
						}
					break;
					}

				}
			}

		if (search_spec[i].type == EXE)
			{
			memset(dir_name, 0, MAX_STRING_LENGTH - 1);
			strcpy(dir_name, get_output_directory(s));
			strcat(dir_name, "/");
			strcat(dir_name, "dll");
			make_new_directory(s, dir_name);
			}

		if (search_spec[i].type == RIFF)
			{
			for (j = 0; j < 2; j++)
				{
				if (strstr(ole_types[j], search_spec[i].suffix))
					continue;
				memset(dir_name, 0, MAX_STRING_LENGTH - 1);
				strcpy(dir_name, get_output_directory(s));
				strcat(dir_name, "/");
				strcat(dir_name, riff_types[j]);
				make_new_directory(s, dir_name);
				}
			}
		else if (get_mode(s, mode_write_all))
			{
			for (j = 0; j < 2; j++)
				{
				if (strstr(search_spec[i].suffix, riff_types[j]))
					{
					for (j = 0; j < 2; j++)
						{
						if (strstr(ole_types[j], search_spec[i].suffix))
							continue;

						memset(dir_name, 0, MAX_STRING_LENGTH - 1);
						strcpy(dir_name, get_output_directory(s));
						strcat(dir_name, "/");
						strcat(dir_name, riff_types[j]);
						make_new_directory(s, dir_name);
						}
					break;
					}

				}
			}

		if (search_spec[i].type == ZIP)
			{
			for (j = 0; j < 8; j++)
				{
				if (strstr(ole_types[j], search_spec[i].suffix))
					continue;

				memset(dir_name, 0, MAX_STRING_LENGTH - 1);
				strcpy(dir_name, get_output_directory(s));
				strcat(dir_name, "/");
				strcat(dir_name, zip_types[j]);
				make_new_directory(s, dir_name);
				}
			}
		else if (get_mode(s, mode_write_all))
			{
			for (j = 0; j < 8; j++)
				{
				if (strstr(search_spec[i].suffix, zip_types[j]))
					{
					for (j = 0; j < 5; j++)
						{
						if (strstr(ole_types[j], search_spec[i].suffix))
							continue;

						memset(dir_name, 0, MAX_STRING_LENGTH - 1);
						strcpy(dir_name, get_output_directory(s));
						strcat(dir_name, "/");
						strcat(dir_name, zip_types[j]);
						make_new_directory(s, dir_name);
						}
					break;
					}
				}
			}

		}

	return TRUE;
}

/*We have found a file so write to disk*/
int write_to_disk(f_state *s, s_spec *needle, u_int64_t len, unsigned char *buf, u_int64_t t_offset)
{

	char		fn[MAX_STRING_LENGTH];
	FILE		*f;
	FILE		*test;
	long		byteswritten = 0;
	char		temp[32];
	u_int64_t	block = ((t_offset) / s->block_size);
	int			i = 1;

	//Name files based on their block offset
	needle->written = TRUE;

	if (get_mode(s, mode_write_audit))
		{
		if (needle->comment == NULL)
			strcpy(needle->comment, " ");

		audit_msg(s,
				  "%d:\t%10ld.%s \t %10s \t %10llu \t %s",
				  s->fileswritten,
				  block,
				  needle->suffix,
				  human_readable(len, temp),
				  t_offset,
				  needle->comment);
		s->fileswritten++;
		needle->found++;
		return TRUE;
		}

	snprintf(fn,
			 MAX_STRING_LENGTH,
			 "%s/%s/%0*llu.%s",
			 s->output_directory,
			 needle->suffix,
			 8,
			 block,
			 needle->suffix);

	test = fopen(fn, "rb");
	while (test)	/*Test the files to make sure we have unique file names, some headers could be within the same block*/
		{
		memset(fn, 0, MAX_STRING_LENGTH - 1);
		snprintf(fn,
				 MAX_STRING_LENGTH - 1,
				 "%s/%s/%0*llu_%d.%s",
				 s->output_directory,
				 needle->suffix,
				 8,
				 block,
				 i,
				 needle->suffix);
		i++;
		fclose(test);
		test = fopen(fn, "rb");
		}

	if (!(f = fopen(fn, "wb")))
		{
		printf("fn = %s  failed\n", fn);
		fatal_error(s, "Can't open file for writing \n");
		}

	if ((byteswritten = fwrite(buf, sizeof(char), len, f)) != len)
		{
		fprintf(stderr, "fn=%s bytes=%lu\n", fn, byteswritten);
		fatal_error(s, "Error writing file\n");
		}

	if (fclose(f))
		{
		fatal_error(s, "Error closing file\n");
		}

	if (needle->comment == NULL)
		strcpy(needle->comment, " ");
	
	if (i == 1) {
      audit_msg(s,"%d:\t%08llu.%s \t %10s \t %10llu \t %s",
         s->fileswritten,
         block,
         needle->suffix,
         human_readable(len, temp),
         t_offset,
         needle->comment);
         } else {
      audit_msg(s,"%d:\t%08llu_%d.%s \t %10s \t %10llu \t %s",
         s->fileswritten,
         block,
         i - 1,
         needle->suffix, 
         human_readable(len, temp),
         t_offset,
         needle->comment);
         }

/*
	audit_msg(s,"%d:\t%10llu.%s \t %10s \t %10llu \t %s",
			  s->fileswritten,
			  block,
			  needle->suffix,
			  human_readable(len, temp),
			  t_offset,
			  needle->comment);

*/
	s->fileswritten++;
	needle->found++;
	return TRUE;
}


================================================
FILE: engine.c
================================================

	 /* FOREMOST
 *
 * By Jesse Kornblum, Kris Kendall, & Nick Mikus
 *
 * This is a work of the US Government. In accordance with 17 USC 105,
 * copyright protection is not available for any work of the US Government.
 *
 * This program is distributed in the hope that it will be useful, but
 * WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
 *
 */

#include "main.h"

int user_interrupt (f_state * s, f_info * i)
	{
	audit_msg(s, "Interrupt received at %s", current_time());

	/* RBF - Write user_interrupt */
	fclose(i->handle);
	free(s);
	free(i);
	cleanup_output(s);
	exit(-1);
	return FALSE;
	}

unsigned char *read_from_disk(u_int64_t offset, f_info *i, u_int64_t length)
{

	u_int64_t		bytesread = 0;
	unsigned char	*newbuf = (unsigned char *)malloc(length * sizeof(char));
	if (!newbuf) {
           fprintf(stderr, "Ran out of memory in read_from_disk()\n");
           exit(1);
         }

	fseeko(i->handle, offset, SEEK_SET);
	bytesread = fread(newbuf, 1, length, i->handle);
	if (bytesread != length)
	{
		free(newbuf);
		return NULL;
	}
	else
	{
		return newbuf;
	}
}

/*
   Perform a modified boyer-moore string search (w/ support for wildcards and case-insensitive searches)
   and allows the starting position in the buffer to be manually set, which allows data to be skipped
*/
unsigned char *bm_search_skipn(unsigned char *needle, size_t needle_len, unsigned char *haystack,
							   size_t haystack_len, size_t table[UCHAR_MAX + 1], int casesensitive,
							   int searchtype, int start_pos)
{
	register size_t shift = 0;
	register size_t pos = start_pos;
	unsigned char	*here;

	if (needle_len == 0)
		return haystack;

	if (searchtype == SEARCHTYPE_FORWARD || searchtype == SEARCHTYPE_FORWARD_NEXT)
		{
		while (pos < haystack_len)
			{
			while (pos < haystack_len && (shift = table[(unsigned char)haystack[pos]]) > 0)
				{
				pos += shift;
				}

			if (0 == shift)
				{
				here = (unsigned char *) &haystack[pos - needle_len + 1];
				if (0 == memwildcardcmp(needle, here, needle_len, casesensitive))
					{
					return (here);
					}
				else
					pos++;
				}
			}

		return NULL;
		}
	else if (searchtype == SEARCHTYPE_REVERSE)	//Run our search backwards
		{
		while (pos < haystack_len)
			{
			while
			(
				pos < haystack_len &&
				(shift = table[(unsigned char)haystack[haystack_len - pos - 1]]) > 0
			)
				{
				pos += shift;
				}

			if (0 == shift)
				{
				if (0 == memwildcardcmp(needle, here = (unsigned char *) &haystack[haystack_len - pos - 1],
					needle_len, casesensitive))
					{
					return (here);
					}
				else
					pos++;
				}
			}

		return NULL;
		}

	return NULL;
}

/*
   Perform a modified boyer-moore string search (w/ support for wildcards and case-insensitive searches)
   and allows the starting position in the buffer to be manually set, which allows data to be skipped
*/
unsigned char *bm_search(unsigned char *needle, size_t needle_len, unsigned char *haystack,
						 size_t haystack_len, size_t table[UCHAR_MAX + 1], int case_sen,
						 int searchtype)
{

	//printf("The needle2 is:\t");
	//printx(needle,0,needle_len);
	return bm_search_skipn(needle,
						   needle_len,
						   haystack,
						   haystack_len,
						   table,
						   case_sen,
						   searchtype,
						   needle_len - 1);

}

void setup_stream(f_state *s, f_info *i)
{
	char	buffer[MAX_STRING_LENGTH];
	u_int64_t	skip = (((u_int64_t) s->skip) * ((u_int64_t) s->block_size));
#ifdef DEBUG
	printf("s->skip=%d s->block_size=%d total=%llu\n",
		   s->skip,
		   s->block_size,
		   (((u_int64_t) s->skip) * ((u_int64_t) s->block_size)));
#endif
	i->bytes_read = 0;
	i->total_megs = i->total_bytes / ONE_MEGABYTE;

	if (i->total_bytes != 0)
		{
		audit_msg(s,
				  "Length: %s (%llu bytes)",
				  human_readable(i->total_bytes, buffer),
				  i->total_bytes);
		}
	else
		audit_msg(s, "Length: Unknown");

	if (s->skip != 0)
		{
		audit_msg(s, "Skipping: %s (%llu bytes)", human_readable(skip, buffer), skip);
		fseeko(i->handle, skip, SEEK_SET);
		if (i->total_bytes != 0)
			i->total_bytes -= skip;
		}

	audit_msg(s, " ");

#ifdef __WIN32
	i->last_read = 0;
	i->overflow_count = 0;
#endif

}

void audit_layout(f_state *s)
{
	audit_msg(s,
			  "Num\t %s (bs=%d)\t %10s\t %s\t %s \n",
			  "Name",
			  s->block_size,
			  "Size",
			  "File Offset",
			  "Comment");

}

void dumpInd(unsigned char *ind, int bs)
{
	int i = 0;
	printf("\n/*******************************/\n");

	while (bs > 0)
		{
		if (i % 10 == 0)
			printf("\n");

		//printx(ind,0,10);
		printf("%4u ", htoi(ind, FOREMOST_LITTLE_ENDIAN));

		bs -= 4;
		ind += 4;
		i++;
		}

	printf("\n/*******************************/\n");
}

/********************************************************************************
 *Function: ind_block
 *Description: check if the block foundat is pointing to looks like an indirect 
 *	block
 *Return: TRUE/FALSE
 **********************************************************************************/
int ind_block(unsigned char *foundat, u_int64_t buflen, int bs)
{

	unsigned char	*temp = foundat;
	int				jump = 12 * bs;
	unsigned int	block = 0;
	unsigned int	block2 = 0;
	unsigned int	dif = 0;
	int				i = 0;
	unsigned int	one = 1;
	unsigned int	numbers = (bs / 4) - 1;

	//int reconstruct=FALSE;

	/*Make sure we don't jump past the end of the buffer*/
	if (buflen < jump + 16)
		return FALSE;

	while (i < numbers)
		{
		block = htoi(&temp[jump + (i * 4)], FOREMOST_LITTLE_ENDIAN);

		if (block < 0)
			return FALSE;

		if (block == 0)
			{
			break;
			}

		i++;
		block2 = htoi(&temp[jump + (i * 4)], FOREMOST_LITTLE_ENDIAN);
		if (block2 < 0)
			return FALSE;

		if (block2 == 0)
			{
			break;
			}

		dif = block2 - block;

		if (dif == one)
		{

#ifdef DEBUG
			printf("block1:=%u, block2:=%u dif=%u\n", block, block2, dif);
#endif
		}
		else
		{

#ifdef DEBUG
			printf("Failure, dif!=1\n");
			printf("\tblock1:=%u, block2:=%u dif=%u\n", block, block2, dif);
#endif

			return FALSE;
		}

#ifdef DEBUG
		printf("block1:=%u, block2:=%u dif=%u\n", block, block2, dif);
#endif
		}

	if (i == 0)
		return FALSE;

	/*Check if the rest of the bytes are zero'd out */
	for (i = i + 1; i < numbers; i++)
		{
		block = htoi(&temp[jump + (i * 4)], FOREMOST_LITTLE_ENDIAN);
		if (block != 0)
			{

			//printf("Failure, 0 test\n");
			return FALSE;
			}
		}

	return TRUE;
}

/********************************************************************************
 *Function: search_chunk
 *Description: Analyze the given chunk by running each defined search spec on it
 *Return: TRUE/FALSE
 **********************************************************************************/
int search_chunk(f_state *s, unsigned char *buf, f_info *i, u_int64_t chunk_size, u_int64_t f_offset)
{

	u_int64_t		c_offset = 0;
	//u_int64_t               foundat_off = 0;
	//u_int64_t               buf_off = 0;

	unsigned char	*foundat = buf;
	unsigned char	*current_pos = NULL;
	unsigned char	*header_pos = NULL;
	unsigned char	*newbuf = NULL;
	unsigned char	*ind_ptr = NULL;
	u_int64_t		current_buflen = chunk_size;
	int				tryBS[3] = { 4096, 1024, 512 };
	unsigned char	*extractbuf = NULL;
	u_int64_t		file_size = 0;
	s_spec			*needle = NULL;
	int				j = 0;
	int				bs = 0;
	int				rem = 0;
	int				x = 0;
	int				found_ind = FALSE;
	 off_t saveme;
	//char comment[32];
	for (j = 0; j < s->num_builtin; j++)
		{
		needle = &search_spec[j];
		foundat = buf;										/*reset the buffer for the next search spec*/
#ifdef DEBUG
		printf("	SEARCHING FOR %s's\n", needle->suffix);
#endif
		bs = 0;
		current_buflen = chunk_size;
		while (foundat)
			{
			needle->written = FALSE;
			found_ind = FALSE;
			memset(needle->comment, 0, COMMENT_LENGTH - 1);
                        if (chunk_size <= (foundat - buf)) {
#ifdef DEBUG
				printf("avoided seg fault in search_chunk()\n");
#endif
				foundat = NULL;
				break;
			}
			current_buflen = chunk_size - (foundat - buf);

			//if((foundat-buf)< 1 ) break;	
#ifdef DEBUG
			//foundat_off=foundat;
			//buf_off=buf;
			//printf("current buf:=%llu (foundat-buf)=%llu \n", current_buflen, (u_int64_t) (foundat_off - buf_off));
#endif
			if (signal_caught == SIGTERM || signal_caught == SIGINT)
				{
				user_interrupt(s, i);
				printf("Cleaning up.\n");
				signal_caught = 0;
				}

			if (get_mode(s, mode_quick))					/*RUN QUICK SEARCH*/
			{
#ifdef DEBUG

				//printf("quick mode is on\n");
#endif

				/*Check if we are not on a block head, adjust if so*/
				rem = (foundat - buf) % s->block_size;
				if (rem != 0)
					{
					foundat += (s->block_size - rem);
					}

				if (memwildcardcmp(needle->header, foundat, needle->header_len, needle->case_sen
					) != 0)
					{

					/*No match, jump to the next block*/
					if (current_buflen > s->block_size)
						{
						foundat += s->block_size;
						continue;
						}
					else									/*We are out of buffer lets go to the next search spec*/
						{
						foundat = NULL;
						break;
						}
					}

				header_pos = foundat;
			}
			else											/**********RUN STANDARD SEARCH********************/
				{
				foundat = bm_search(needle->header,
									needle->header_len,
									foundat,
									current_buflen,			//How much to search through
									needle->header_bm_table,
									needle->case_sen,		//casesensative
									SEARCHTYPE_FORWARD);

				header_pos = foundat;
				}

			if (foundat != NULL && foundat >= 0)			/*We got something, run the appropriate heuristic to find the EOF*/
				{
				current_buflen = chunk_size - (foundat - buf);

				if (get_mode(s, mode_ind_blk))
				{
#ifdef DEBUG
					printf("ind blk detection on\n");
#endif

					//dumpInd(foundat+12*1024,1024);
					for (x = 0; x < 3; x++)
						{
						bs = tryBS[x];

						if (ind_block(foundat, current_buflen, bs))
							{
							if (get_mode(s, mode_verbose))
								{
								sprintf(needle->comment, " (IND BLK bs:=%d)", bs);
								}

							//dumpInd(foundat+12*bs,bs);
#ifdef DEBUG
							printf("performing mem move\n");
#endif
							if(current_buflen >  13 * bs)//Make sure we have enough buffer
								{
								if (!memmove(foundat + 12 * bs, foundat + 13 * bs, current_buflen - 13 * bs))
								break;

								found_ind = TRUE;
#ifdef DEBUG
								printf("performing mem move complete\n");
#endif
								ind_ptr = foundat + 12 * bs;
								current_buflen -= bs;
								chunk_size -= bs;
								break;
								}
							}

						}

				}

				c_offset = (foundat - buf);
				current_pos = foundat;

				/*Now lets analyze the file and see if we can determine its size*/

				// printf("c_offset=%llu %x %x %llx\n", c_offset,foundat,buf,c_offset);
				foundat = extract_file(s, c_offset, foundat, current_buflen, needle, f_offset);
#ifdef DEBUG
				if (foundat == NULL)
					{
					printf("Foundat == NULL!!!\n");
					}
#endif
				if (get_mode(s, mode_write_all))
					{
					if (needle->written == FALSE)
						{

						/*write every header we find*/
						if (current_buflen >= needle->max_len)
							{
							file_size = needle->max_len;
							}
						else
							{
							file_size = current_buflen;
							}

						sprintf(needle->comment, " (Header dump)");
						extractbuf = (unsigned char *)malloc(file_size * sizeof(char));
						memcpy(extractbuf, header_pos, file_size);
						write_to_disk(s, needle, file_size, extractbuf, c_offset + f_offset);
						free(extractbuf);
						}
					}
				else if (!foundat)							/*Should we search further?*/
					{

					/*We couldn't determine where the file ends, now lets check to see
			* if we should try again
			*/
					if (current_buflen < needle->max_len)	/*We need to bridge the gap*/
					{
#ifdef DEBUG
						printf("	Bridge the gap\n");
#endif
						saveme = ftello(i->handle);
						/*grow the buffer and try to extract again*/
						newbuf = read_from_disk(c_offset + f_offset, i, needle->max_len);
						if (newbuf == NULL)
							break;
						current_pos = extract_file(s,
												   c_offset,
												   newbuf,
												   needle->max_len,
												   needle,
												   f_offset);
						
						/*Lets put the fp back*/
						fseeko(i->handle, saveme, SEEK_SET);
						

						free(newbuf);
					}
					else
						{
						foundat = header_pos;				/*reset the foundat pointer to the location of the last header*/
						foundat += needle->header_len + 1;	/*jump past the header*/
						}
					}


				}

			if (found_ind)
				{

				/*Put the ind blk back in, re-arrange the buffer so that the future blks names come out correct*/
#ifdef DEBUG
						printf("Replacing the ind block\n");
#endif
				/*This is slow, should we do this??????*/
				if (!memmove(ind_ptr + 1 * bs, ind_ptr, current_buflen - 13 * bs))
					break;
				memset(ind_ptr, 0, bs - 1);
				chunk_size += bs;
				memset(needle->comment, 0, COMMENT_LENGTH - 1);
				}
			}	//end while
		}

	return TRUE;
}

/********************************************************************************
 *Function: search_stream
 *Description: Analyze the file by reading 1 chunk (default: 100MB) at a time and 
 *passing it to	search_chunk
 *Return: TRUE/FALSE
 **********************************************************************************/
int search_stream(f_state *s, f_info *i)
{
	u_int64_t		bytesread = 0;
	u_int64_t		f_offset = 0;
	u_int64_t		chunk_size = ((u_int64_t) s->chunk_size) * MEGABYTE;
	unsigned char	*buf = (unsigned char *)malloc(sizeof(char) * chunk_size);

	setup_stream(s, i);

	audit_layout(s);
#ifdef DEBUG
	printf("\n\t READING THE FILE INTO MEMORY\n");
#endif

	while ((bytesread = fread(buf, 1, chunk_size, i->handle)) > 0)
		{
		if (signal_caught == SIGTERM || signal_caught == SIGINT)
			{
			user_interrupt(s, i);
			printf("Cleaning up.\n");
			signal_caught = 0;
			}

#ifdef DEBUG
		printf("\n\tbytes_read:=%llu\n", bytesread);
#endif
		search_chunk(s, buf, i, bytesread, f_offset);
		f_offset += bytesread;
		if (!get_mode(s, mode_quiet))
			{
			fprintf(stderr, "*");

			//displayPosition(s,i,f_offset);
			}

		/*FIX ME***
	* We should jump back and make sure we didn't miss any headers that are 
	* bridged between chunks.  What is the best way to do this?\
  	*/
		}

	if (!get_mode(s, mode_quiet))
		{
		fprintf(stderr, "|\n");
		}

#ifdef DEBUG
	printf("\n\tDONE READING bytes_read:=%llu\n", bytesread);
#endif
	if (signal_caught == SIGTERM || signal_caught == SIGINT)
		{
		user_interrupt(s, i);
		printf("Cleaning up.\n");
		signal_caught = 0;
		}

	free(buf);
	return FALSE;
}

void audit_start(f_state *s, f_info *i)
{
	if (!get_mode(s, mode_quiet))
		{
		fprintf(stderr, "Processing: %s\n|", i->file_name);
		}

	audit_msg(s, FOREMOST_DIVIDER);
	audit_msg(s, "File: %s", i->file_name);
	audit_msg(s, "Start: %s", current_time());
}

void audit_finish(f_state *s, f_info *i)
{
	audit_msg(s, "Finish: %s", current_time());
}

int process_file(f_state *s)
{

	//printf("processing file\n");
	f_info	*i = (f_info *)malloc(sizeof(f_info));
	char	temp[PATH_MAX];

	if ((realpath(s->input_file, temp)) == NULL)
		{
		print_error(s, s->input_file, strerror(errno));
		return TRUE;
		}

	i->file_name = strdup(s->input_file);
	i->is_stdin = FALSE;
	audit_start(s, i);

	//  printf("opening file %s\n",i->file_name);
#if defined(__LINUX)
	#ifdef DEBUG
	printf("Using 64 bit fopen\n");
	#endif
	i->handle = fopen64(i->file_name, "rb");
#elif defined(__WIN32)

	/*I would like to be able to read from
	* physical devices in Windows, have played
	* with different options to fopen and the
	* dd src says you need write access on WinXP
	* but nothing seems to work*/
	i->handle = fopen(i->file_name, "rb");
#else
	i->handle = fopen(i->file_name, "rb");
#endif
	if (i->handle == NULL)
		{
		print_error(s, s->input_file, strerror(errno));
		audit_msg(s, "Error: %s", strerror(errno));
		return TRUE;
		}

	i->total_bytes = find_file_size(i->handle);
	search_stream(s, i);
	audit_finish(s, i);

	fclose(i->handle);
	free(i);
	return FALSE;
}

int process_stdin(f_state *s)
{
	f_info	*i = (f_info *)malloc(sizeof(f_info));

	i->file_name = strdup("stdin");
	s->input_file = "stdin";
	i->handle = stdin;
	i->is_stdin = TRUE;

	/* We can't compute the size of this stream, we just ignore it*/
	i->total_bytes = 0;
	audit_start(s, i);

	search_stream(s, i);

	free(i->file_name);
	free(i);
	return FALSE;
}


================================================
FILE: extract.c
================================================

	 /* extract.c
 * Copyright (c) 2005, Nick Mikus
 * This file contains the file specific functions used to extract
 * data from an image.
 *
 * Each has a similar structure
 * f_state *s:  state of the program.
 * c_offset:	offset that the header was recorded within the current chunk
 * foundat:	The location the header was "foundat"
 * buflen:	How much buffer is left until the end of the current chunk
 * needle:	Search specification
 * f_offset:	Offset that the current chunk is located within the file
 */

#include "main.h"
#include "extract.h"
#include "ole.h"
extern unsigned char buffer[OUR_BLK_SIZE];
extern int	verbose;
extern int	dir_count;
extern int	block_list[OUR_BLK_SIZE / sizeof(int)];
extern int	*FAT;
extern char *extract_name;
extern int	extract;
extern int	FATblk;
extern int	highblk;

/********************************************************************************
 *Function: extract_zip
 *Description: Given that we have a ZIP header jump through the file headers
    until we reach the EOF.
 *Return: A pointer to where the EOF of the ZIP is in the current buffer
**********************************************************************************/
unsigned char *extract_zip(f_state *s, u_int64_t c_offset, unsigned char *foundat, u_int64_t buflen,
						   s_spec *needle, u_int64_t f_offset, char *type)
{
	unsigned char				*currentpos = NULL;
	unsigned char				*buf = foundat;
	unsigned short				comment_length = 0;
	unsigned char				*extractbuf = NULL;
	struct zipLocalFileHeader	localFH;
	u_int64_t					bytes_to_search = 50 * KILOBYTE;
	u_int64_t					file_size = 0;
	int							oOffice = FALSE;
	int							office2007 = FALSE;

	char						comment[32];
	localFH.genFlag=0;
	localFH.compressed=0;
	localFH.uncompressed =0;
	if (buflen < 100)
		return NULL;

	if (strncmp((char *) &foundat[30], "mimetypeapplication/vnd.sun.xml.", 32) == 0)
		{
		oOffice = TRUE;
		if (strncmp((char *) &foundat[62], "calc", 4) == 0)
			{
			needle->suffix = "sxc";
			}
		else if (strncmp((char *) &foundat[62], "impress", 7) == 0)
			{
			needle->suffix = "sxi";
			}
		else if (strncmp((char *) &foundat[62], "writer", 6) == 0)
			{
			needle->suffix = "sxw";
			}
		else
			{
			sprintf(comment, " (OpenOffice Doc?)");
			strcat(needle->comment, comment);
			needle->suffix = "sx";
			}
		}
	else
		{
		needle->suffix = "zip";
		}

	
	while (1)	//Jump through each local file header until the central directory structure is reached, much faster than searching 
		{
		
		if (foundat[2] == '\x03' && foundat[3] == '\x04')	//Verfiy we are looking at a local file header//
			{
			
			localFH.compression=htos(&foundat[8], FOREMOST_LITTLE_ENDIAN);
			localFH.compressed = htoi(&foundat[18], FOREMOST_LITTLE_ENDIAN);
			localFH.uncompressed = htoi(&foundat[22], FOREMOST_LITTLE_ENDIAN);
			localFH.filename_length = htos(&foundat[26], FOREMOST_LITTLE_ENDIAN);
			localFH.extra_length = htos(&foundat[28], FOREMOST_LITTLE_ENDIAN);;
			localFH.genFlag = htos(&foundat[6], FOREMOST_LITTLE_ENDIAN);	

			// Sanity checking
			if (localFH.compressed > needle->max_len)
				return foundat + needle->header_len;

			if (localFH.filename_length > 100)
				return foundat + needle->header_len;

			//Check if we should grab more from the disk
			if (localFH.compressed + 30 > buflen - (foundat - buf))
				{
				return NULL;								
				}
				
			//Size of the local file header data structure
			foundat += 30;									

			if (strcmp(needle->suffix,"zip")==0)
				{
				if (strncmp((char *)foundat, "content.xml", 11) == 0 && strcmp(needle->suffix,"zip")==0)
					{
					oOffice = TRUE;
					sprintf(comment, " (OpenOffice Doc?)");
					strcat(needle->comment, comment);
					needle->suffix = "sx";
					}
				else if (strstr((char *)foundat, ".class") || strstr((char *)foundat, ".jar") ||
						 strstr((char *)foundat, ".java"))
					{
					needle->suffix = "jar";
					}
				else if(strncmp((char *)foundat, "[Content_Types].xml",19)==0)
					{
						office2007=TRUE;
					}
				else if(strncmp((char *)foundat, "ppt/slides",10)==0 && office2007==TRUE)
					{
						needle->suffix = "pptx";
					}
				else if(strncmp((char *)foundat, "word/document.xml",17)==0 && office2007==TRUE)
					{	
						needle->suffix = "docx";
					}
				else if(strncmp((char *)foundat, "xl/workbook.xml",15)==0 && office2007==TRUE)
					{	
						needle->suffix = "xlsx";
					}
					
					
				else
					{
						printf("foundat=%s\n",foundat);
					}	
				}

			foundat += localFH.compressed;
			foundat += localFH.filename_length;
			foundat += localFH.extra_length;
			
			if (localFH.genFlag == 8)
				{
#ifdef DEBUG	
					fprintf(stderr,"We have extra stuff!!!");
#endif
				}
			
			
			if(localFH.genFlag & 1<<3 && localFH.uncompressed==0 &&  localFH.compressed==0 )
				{
#ifdef DEBUG
				fprintf(stderr,"No data to jmp Just search for the next file Footer (localFH.genFlag:=%d)\n",localFH.genFlag);
#endif
				break;
				}

	#ifdef DEBUG
				printf("localFH.compressed:=%d  localFH.uncompressed:=%d\n\t jumping %d bytes filename=%d bytes",
					   localFH.compressed,
					   localFH.uncompressed,localFH.filename_length+localFH.compressed+localFH.extra_length,localFH.filename_length);
				printx(foundat, 0, 16);
	#endif

			}	
		else if (oOffice && localFH.genFlag == 8)
			{
			break;
			}
		else
			{
			break;
			}
			
		
	}//end while loop
	
	if (oOffice)
		{

		//We have an OO doc how long should we search for?
		bytes_to_search = 1 * MEGABYTE;
		}
	else if (localFH.genFlag & 1<<3 && localFH.uncompressed==0 &&  localFH.compressed==0 )
		{
		bytes_to_search = needle->max_len;
		}
	else
		{
		bytes_to_search = (buflen < (foundat - buf) ? buflen : buflen - (foundat - buf));
		}

	//Make sure we are not searching more than what he have
        if (buflen <= (foundat - buf)) {
#ifdef DEBUG
		printf("avoided bug in extract_zip!\n");
#endif
		bytes_to_search = 0;
	} else {
		if (buflen - (foundat - buf) < bytes_to_search)
		{
		bytes_to_search = buflen - (foundat - buf);
		}
	}


	currentpos = foundat;
#ifdef DEBUG
	printf("Search for the footer bytes_to_search:=%lld buflen:=%lld\n", bytes_to_search, buflen);
#endif

	foundat = bm_search(needle->footer,
						needle->footer_len,
						foundat,
						bytes_to_search,
						needle->footer_bm_table,
						needle->case_sen,
						SEARCHTYPE_FORWARD);
#ifdef DEBUG
	printf("Search complete \n");
#endif

	if (foundat)											/*Found the end of the central directory structure, determine the exact length and extract*/
	{

		/*Jump to the comment length field*/
#ifdef DEBUG
		printf("distance searched:=%lu\n", foundat - currentpos);
#endif
		if (buflen - (foundat - buf) > 20)
			{
			foundat += 20;
			}
		else
			{
			return NULL;
			}

		comment_length = htos(foundat, FOREMOST_LITTLE_ENDIAN);
		foundat += comment_length + 2;
		file_size = (foundat - buf);
#ifdef DEBUG
		printf("File size %lld\n", file_size);
		printf("Found a %s type:=%s\n", needle->suffix, type);
#endif
		extractbuf = buf;
		if (strcmp(type,"all")==0 || strcmp(type,needle->suffix)==0)
		{
#ifdef DEBUG
			printf("Writing a %s to disk\n", needle->suffix);
#endif
			write_to_disk(s, needle, file_size, extractbuf, c_offset + f_offset);
		}

#ifdef DEBUG
		printf("Found a %s\n", needle->suffix);
#endif
		return foundat-2;
	}

	if (bytes_to_search > buflen - (currentpos - buf))
		return NULL;

#ifdef DEBUG
	printf("I give up \n");
#endif
	return currentpos;
}

/********************************************************************************
 *Function: extract_pdf
 *Description: Given that we have a PDF header check if it is Linearized, if so
    grab the file size and we are done, else search for the %%EOF
*Return: A pointer to where the EOF of the PDF is in the current buffer
**********************************************************************************/
unsigned char *extract_pdf(f_state *s, u_int64_t c_offset, unsigned char *foundat, u_int64_t buflen,
						   s_spec *needle, u_int64_t f_offset)
{
	unsigned char		*currentpos = NULL;
	unsigned char		*buf = foundat;
	unsigned char		*extractbuf = NULL;
	unsigned char		*tempsize;
	unsigned long int	size = 0;
	int					file_size = 0;
	unsigned char		*header = foundat;
	int					bytes_to_search = 0;
	char				comment[32];

	foundat += needle->header_len;	/* Jump Past the %PDF HEADER */
	currentpos = foundat;

#ifdef DEBUG
	printf("PDF SEARCH\n");
#endif

	/*Determine when we have searched enough*/
	if (buflen >= needle->max_len)
		{
		bytes_to_search = needle->max_len;
		}
	else
		{
		bytes_to_search = buflen;
		}

	/*Check if the buffer is less than 100 bytes, if so search what we have*/
	if (buflen < 512)
		return NULL;
	else
		{
		currentpos = foundat;

		/*Check for .obj in the first 100 bytes*/
		foundat = bm_search(needle->markerlist[1].value,
							needle->markerlist[1].len,
							foundat,
							100,
							needle->markerlist[1].marker_bm_table,
							needle->case_sen,
							SEARCHTYPE_FORWARD);

		if (!foundat)
		{
#ifdef DEBUG
			printf("no obj found\n");
#endif
			return currentpos + 100;
		}

		foundat = currentpos;

		/*Search for "./L " to see if the file is linearized*/
		foundat = bm_search(needle->markerlist[2].value,
							needle->markerlist[2].len,
							foundat,
							512,
							needle->markerlist[2].marker_bm_table,
							needle->case_sen,
							SEARCHTYPE_FORWARD);

		if (foundat)
			{
			foundat = bm_search(needle->markerlist[0].value,
								needle->markerlist[0].len,
								foundat,
								512,
								needle->markerlist[0].marker_bm_table,
								needle->case_sen,
								SEARCHTYPE_FORWARD);
			}
		else
		{
#ifdef DEBUG
			printf("not linearized\n");
#endif
		}
		}

	if (foundat)					/*The PDF is linearized extract the size and we are done*/
		{
		sprintf(comment, " (PDF is Linearized)");
		strcat(needle->comment, comment);

		foundat += needle->markerlist[0].len;
		tempsize = (unsigned char *)malloc(8 * sizeof(char));
		tempsize = memcpy(tempsize, foundat, 8);
		size = atoi((char *)tempsize);

		free(tempsize);
		if (size <= 0)
			return foundat;
		if (size > buflen)
			{
			if (size > needle->max_len)
				return foundat;
			else
				return NULL;
			}

		header += size;
		foundat = header;
		foundat -= needle->footer_len;

		/*Jump back 10 bytes and see if we actually have and EOF there*/
		foundat -= 10;
		currentpos = foundat;
		foundat = bm_search(needle->footer,
							needle->footer_len,
							foundat,
							needle->footer_len + 9,
							needle->footer_bm_table,
							needle->case_sen,
							SEARCHTYPE_FORWARD);
		if (foundat)				/*There is an valid EOF at the end, Write to disk*/
			{
			foundat += needle->footer_len + 1;
			file_size = (foundat - buf);

			extractbuf = buf;
			write_to_disk(s, needle, file_size, extractbuf, c_offset + f_offset);

			return foundat;
			}

		return NULL;

		}
	else							/*Search for Linearized PDF failed, just look for %%EOF */
	{
#ifdef DEBUG
		printf("	Linearized search failed, searching %d bytes, buflen:=%lld\n",
			   bytes_to_search,
			   buflen - (header - buf));
#endif
		foundat = currentpos;
		foundat = bm_search(needle->footer,
							needle->footer_len,
							foundat,
							bytes_to_search,
							needle->footer_bm_table,
							needle->case_sen,
							SEARCHTYPE_FORWARD);

		if (foundat)				/*Write the non-linearized PDF to disk*/
			{
			foundat += needle->footer_len + 1;
			file_size = (foundat - buf);
			extractbuf = buf;

			write_to_disk(s, needle, file_size, extractbuf, c_offset + f_offset);

			return foundat;

			}

		return NULL;
	}

}

/********************************************************************************
 *Function: extract_cpp
 *Description: Use keywords to attempt to find C/C++ source code
*Return: A pointer to where the EOF of the CPP file is in the current buffer
**********************************************************************************/
unsigned char *extract_cpp(f_state *s, u_int64_t c_offset, unsigned char *foundat, u_int64_t buflen,
						   s_spec *needle, u_int64_t f_offset)
{

	unsigned char	*header = foundat;
	unsigned char	*buf = foundat;
	unsigned char	*extractbuf = NULL;
	int				end = 0;
	int				start = 0;
	int				i = 0;
	int				marker_score = 0;
	int				ok = FALSE;
	int				file_size = 0;
	unsigned char	*footer = NULL;

	/*Search for a " or a < within 20 bytes of a #include statement*/
	for (i = 0; i < 20; i++)
		{
		if (foundat[i] == '\x22' || foundat[i] == '\x3C')
			{
			ok = TRUE;
			}
		}

	if (!ok)
		return foundat + needle->header_len;

	/*Keep running through the buffer until an non printable character is reached*/
	while (isprint(foundat[end]) || foundat[end] == '\x0a' || foundat[end] == '\x09')
		{
		end++;
		}

	foundat += end - 1;
	footer = foundat;

	if (end < 50)
		return foundat;

	/*Now lets go the other way and grab all those comments at the begining of the file*/
	while (isprint(buf[start]) || buf[start] == '\x0a' || buf[start] == '\x09')
		{
		start--;
		}

	header = &buf[start + 1];
	file_size = (footer - header);

	foundat = header;

	/*Now we have an ascii file to look for keywords in*/
	foundat = bm_search(needle->footer,
						needle->footer_len,
						header,
						file_size,
						needle->footer_bm_table,
						FALSE,
						SEARCHTYPE_FORWARD);
	if (foundat)
		marker_score += 1;

	foundat = header;
	foundat = bm_search(needle->markerlist[0].value,
						needle->markerlist[0].len,
						header,
						file_size,
						needle->markerlist[0].marker_bm_table,
						1,
						SEARCHTYPE_FORWARD);
	if (foundat)
		marker_score += 1;

	if (marker_score == 0)
		return foundat;

	if (foundat)
		{
		extractbuf = buf;
		write_to_disk(s, needle, file_size, extractbuf, c_offset + f_offset + start + 1);
		
		return footer;

		}

	return NULL;
}

/********************************************************************************
 *Function: extract_htm
 *Description: Given that we have a HTM header
    search for the file EOF and check that the bytes areound the header are ascii
*Return: A pointer to where the EOF of the HTM is in the current buffer
**********************************************************************************/
unsigned char *extract_htm(f_state *s, u_int64_t c_offset, unsigned char *foundat, u_int64_t buflen,
						   s_spec *needle, u_int64_t f_offset)
{
	unsigned char	*buf = foundat;
	unsigned char	*extractbuf = NULL;
	unsigned char	*currentpos = NULL;

	int				bytes_to_search = 0;
	int				i = 0;
	int				file_size = 0;

	/*Jump past the <HTML tag*/
	foundat += needle->header_len;

	/*Check the first 16 bytes to see if they are ASCII*/
	for (i = 0; i < 16; i++)
		{
		if (!isprint(foundat[i]) && foundat[i] != '\x0a' && foundat[i] != '\x09')
			{
			return foundat + 16;
			}
		}

	/*Determine if the buffer is large enough to encompass a reasonable search*/
	if (buflen < needle->max_len)
		{
		bytes_to_search = buflen - (foundat - buf);
		}
	else
		{
		bytes_to_search = needle->max_len;
		}

	/*Store the current position and search for the HTML> tag*/
	currentpos = foundat;
	foundat = bm_search(needle->footer,
						needle->footer_len,
						foundat,
						bytes_to_search,
						needle->footer_bm_table,
						needle->case_sen,
						SEARCHTYPE_FORWARD);
	if (foundat)	//Found the footer, write to disk
		{
		file_size = (foundat - buf) + needle->footer_len;
		extractbuf = buf;
		write_to_disk(s, needle, file_size, extractbuf, c_offset + f_offset);
		foundat += needle->footer_len;
		return foundat;

		}
	else
		{
		return NULL;
		}

}

/********************************************************************************
 *Function: validOLEheader
 *Description: run various tests aginst an OLE-HEADER to determine whether or not
 	it is valid.
*Return: TRUE/FALSE
**********************************************************************************/
int valid_ole_header(struct OLE_HDR *h)
{

	if (htos((unsigned char *) &h->reserved, FOREMOST_LITTLE_ENDIAN) != 0 ||
		htoi((unsigned char *) &h->reserved1, FOREMOST_LITTLE_ENDIAN) != 0 ||
		htoi((unsigned char *) &h->reserved2, FOREMOST_LITTLE_ENDIAN) != 0)
		{
		return FALSE;
		}

	/*The minimum sector shift is usually 2^6(64) and the uSectorShift is 2^9(512))*/
	if (htos((unsigned char *) &h->uMiniSectorShift, FOREMOST_LITTLE_ENDIAN) != 6 ||
		htos((unsigned char *) &h->uSectorShift, FOREMOST_LITTLE_ENDIAN) != 9 ||
		htoi((unsigned char *) &h->dir_flag, FOREMOST_LITTLE_ENDIAN) < 0)
		{
		return FALSE;
		}

	/*Sanity Checking*/
	if (htoi((unsigned char *) &h->num_FAT_blocks, FOREMOST_LITTLE_ENDIAN) <= 0 ||
		htoi((unsigned char *) &h->num_FAT_blocks, FOREMOST_LITTLE_ENDIAN) > 100)
		{
		return FALSE;
		}

	if (htoi((unsigned char *) &h->num_extra_FAT_blocks, FOREMOST_LITTLE_ENDIAN) < 0 ||
		htoi((unsigned char *) &h->num_extra_FAT_blocks, FOREMOST_LITTLE_ENDIAN) > 100)
		{
		return FALSE;
		}

	return TRUE;

}

/********************************************************************************
 *Function:checkOleName
 *Description: Determine what type of file is stored in the OLE format based on the
 	names of DIRENT in the FAT table.
*Return: A char* consisting of the suffix of the appropriate file.
**********************************************************************************/
char *check_ole_name(char *name)
{
	if (strstr(name, "WordDocument"))
		{
		return "doc";
		}
	else if (strstr(name, "Worksheet") || strstr(name, "Book") || strstr(name, "Workbook"))
		{
		return "xls";
		}
	else if (strstr(name, "Power"))
		{
		return "ppt";
		}
	else if (strstr(name, "Access") || strstr(name, "AccessObjSiteData"))
		{
		return "mbd";
		}
	else if (strstr(name, "Visio"))
		{
		return "vis";
		}
	else if (strstr(name, "Sfx"))
		{
		return "sdw";
		}
	else
		{
		return NULL;
		}

	return NULL;

}

int adjust_bs(int size, int bs)
{
	int rem = (size % bs);

	if (rem == 0)
		{

		return size;
		}

#ifdef DEBUG
	printf("\tnew size:=%d\n", size + (bs - rem));
#endif
	return (size + (bs - rem));

}

/********************************************************************************
 *Function: extract_ole
 *Description: Given that we have a OLE header, jump through the OLE structure and
    determine what type of file it is.
*Return: A pointer to where the EOF of the OLE is in the current buffer
**********************************************************************************/
unsigned char *extract_ole(f_state *s, u_int64_t c_offset, unsigned char *foundat, u_int64_t buflen,
						   s_spec *needle, u_int64_t f_offset, char *type)
{
	unsigned char	*buf = foundat;
	unsigned char	*extractbuf = NULL;
	char			*temp = NULL;
	char			*suffix = "ole";
	int				totalsize = 0;
	int				extrasize = 0;
	int				oldblk = 0;
	int				i, j;
	int				size = 0;
	int				blknum = 0;
	int				validblk = 512;
	int				file_size = 0;
	int				num_extra_FAT_blocks = 0;
	unsigned char	*htoi_c = NULL;
	int				extra_dir_blocks = 0;
	int				num_FAT_blocks = 0;
	int				next_FAT_block = 0;
	unsigned char	*p;
	int				fib = 1024;
	struct OLE_HDR	*h = NULL;

	int				result = 0;
	int				highblock = 0;
	unsigned long	miniSectorCutoff = 0;
	unsigned long	csectMiniFat = 0;

	/*Deal with globals defined in the OLE API, ugly*/
	if (dirlist != NULL)
		free(dirlist);
	if (FAT != NULL)
		free(FAT);
	init_ole();

	if (buflen < validblk)
		validblk = buflen;
	h = (struct OLE_HDR *)foundat;	/*cast the header block to point at foundat*/
#ifdef DEBUG
	dump_header(h);
#endif
	num_FAT_blocks = htoi((unsigned char *) &h->num_FAT_blocks, FOREMOST_LITTLE_ENDIAN);

	if (!valid_ole_header(h))
		return (buf + validblk);

	miniSectorCutoff = htoi((unsigned char *) &h->miniSectorCutoff, FOREMOST_LITTLE_ENDIAN);
	csectMiniFat = htoi((unsigned char *) &h->csectMiniFat, FOREMOST_LITTLE_ENDIAN);
	next_FAT_block = htoi((unsigned char *) &h->FAT_next_block, FOREMOST_LITTLE_ENDIAN);
	num_extra_FAT_blocks = htoi((unsigned char *) &h->num_extra_FAT_blocks, FOREMOST_LITTLE_ENDIAN);

	FAT = (int *)Malloc(OUR_BLK_SIZE * (num_FAT_blocks + 1));
	p = (unsigned char *)FAT;
	memcpy(p, &h[1], OUR_BLK_SIZE - FAT_START);
	if (next_FAT_block > 0)
		{
		p += (OUR_BLK_SIZE - FAT_START);
		blknum = next_FAT_block;
		for (i = 0; i < num_extra_FAT_blocks; i++)
			{
			if (!get_block(buf, blknum, p, buflen))
				return buf + validblk;
			validblk = (blknum + 1) * OUR_BLK_SIZE;
			p += OUR_BLK_SIZE - sizeof(int);
			blknum = htoi(p, FOREMOST_LITTLE_ENDIAN);
			}
		}

	blknum = htoi((unsigned char *) &h->root_start_block, FOREMOST_LITTLE_ENDIAN);

	if(blknum < 0)
	{
		return buf + 10;
	}

	highblock = htoi((unsigned char *) &h->dir_flag, FOREMOST_LITTLE_ENDIAN);
#ifdef DEBUG
	printf("getting dir block\n");
#endif

	//if(!get_dir_block (buf, blknum, buflen)) return buf+validblk;
	if (!get_block(buf, blknum, buffer, buflen))
		return buf + validblk;		/*GET DIR BLOCK*/
#ifdef DEBUG
	printf("done getting dir block\n");
#endif
	validblk = (blknum + 1) * OUR_BLK_SIZE;	
	while (blknum != END_OF_CHAIN)
	{
#ifdef DEBUG
		printf("finding dir info extra_dir_blks:=%d\n", extra_dir_blocks);
#endif
		if (extra_dir_blocks > 300)
			return buf + validblk;

		/**PROBLEMA**/
#ifdef DEBUG
		printf("***blknum:=%d FATblk:=%d ourblksize=%d\n", blknum, FATblk,OUR_BLK_SIZE);
#endif
		oldblk = blknum;
		htoi_c = (unsigned char *) &FAT[blknum / (OUR_BLK_SIZE / sizeof(int))];

		FATblk = htoi(htoi_c, FOREMOST_LITTLE_ENDIAN);
#ifdef DEBUG
		printf("***blknum:=%d FATblk:=%d\n", blknum, FATblk);
#endif

		if (!get_FAT_block(buf, blknum, block_list, buflen))
			return buf + validblk;
		blknum = htoi((unsigned char *) &block_list[blknum % 128], FOREMOST_LITTLE_ENDIAN);
#ifdef DEBUG
		printf("**blknum:=%d FATblk:=%d\n", blknum, FATblk);
#endif
		if (blknum == END_OF_CHAIN || oldblk == blknum)
		{
#ifdef DEBUG
			printf("EOC\n");
#endif
			break;
		}

		extra_dir_blocks++;
		result = get_dir_block(buf, blknum, buflen);
		if (result == SHORT_BLOCK)
		{
#ifdef DEBUG
			printf("SHORT BLK\n");
#endif
			break;
		}
		else if (!result)
			return buf + validblk;

	}

#ifdef DEBUG
	printf("DONE WITH WHILE\n");
#endif
	blknum = htoi((unsigned char *) &h->root_start_block, FOREMOST_LITTLE_ENDIAN);
	size = OUR_BLK_SIZE * (extra_dir_blocks + 1);
	dirlist = (struct DIRECTORY *)Malloc(size);
	memset(dirlist, 0, size);

	if (!get_block(buf, blknum, buffer, buflen))
		return buf + validblk;		/*GET DIR BLOCK*/

	if (!get_dir_info(buffer))
		{
		return foundat + validblk;
		}

	for (i = 0; i < extra_dir_blocks; i++)
		{
		if (!get_FAT_block(buf, blknum, block_list, buflen))
			return buf + validblk;
		blknum = htoi((unsigned char *) &block_list[blknum % 128], FOREMOST_LITTLE_ENDIAN);
		if (blknum == END_OF_CHAIN)
			break;
#ifdef DEBUG
		printf("getting dir blk blknum=%d\n", blknum);
#endif
		if (!get_block(buf, blknum, buffer, buflen))
			return buf + validblk;	/*GET DIR BLOCK*/
		if (!get_dir_info(buffer))
			{
			return buf + validblk;
			}
		}

#ifdef DEBUG
	printf("dir count is %d\n", i);
#endif
	for (dl = dirlist, i = 0; i < dir_count; i++, dl++)
		{
		memset(buffer, ' ', 75);
		j = htoi((unsigned char *) &dl->level, FOREMOST_LITTLE_ENDIAN) * 4;
		sprintf((char *) &buffer[j], "%-s", dl->name);
		j = strlen((char *)buffer);

		if (dl->name[0] == '@')
			return foundat + validblk;
		if (dl->type == STREAM)
			{
			buffer[j] = ' ';
			sprintf((char *) &buffer[60], "%8d\n", dl->size);

			if (temp == NULL)		/*check if we have alread defined the type*/
				{
				temp = check_ole_name(dl->name);
				if (temp)
					suffix = temp;
				}

			if (dl->size > miniSectorCutoff)
				{
				totalsize += adjust_bs(dl->size, 512);
				}
			else
				{
				totalsize += adjust_bs(dl->size, 64);
				}

#ifdef DEBUG
			fprintf(stdout, buffer);
#endif
			}
		else
			{
			sprintf((char *) &buffer[j], "\n");
#ifdef DEBUG
			printf("\tnot stream data \n");
			fprintf(stdout, buffer);
#endif

			extrasize += adjust_bs(dl->size, 512);

			}
		}

	totalsize += fib;
#ifdef DEBUG
	printf("DIR SIZE:=%d, numFATblks:=%d MiniFat:=%d\n",
		   adjust_bs(((dir_count) * 128), 512),
		   (num_FAT_blocks * 512),
		   adjust_bs((64 * csectMiniFat), 512));
#endif
	totalsize += adjust_bs(((dir_count) * 128), 512);
	totalsize += (num_FAT_blocks * 512);
	totalsize += adjust_bs((64 * csectMiniFat), 512);
	if ((highblk + 5) > highblock && highblk > 0)
		{
		highblock = highblk + 5;
		}

	highblock = highblock * 512;

#ifdef DEBUG
	printf("\t highblock:=%d\n", highblock);
#endif
	if (highblock > totalsize)
	{
#ifdef DEBUG
		printf("	Total size:=%d a difference of %lld\n", totalsize, buflen - totalsize);
		printf("	Extra size:=%d \n", extrasize);
		printf("	Highblock is greater than totalsize\n");
#endif
		totalsize = highblock;
	}

	totalsize = adjust_bs(totalsize, 512);
#ifdef DEBUG
	printf("	Total size:=%d a difference of %lld\n", totalsize, buflen - totalsize);
	printf("	Extra size:=%d \n", extrasize);
#endif

	if (buflen < totalsize)
	{
#ifdef DEBUG
		printf("	***Error not enough left in the buffer left:=%lld needed=%d***\n",
			   buflen,
			   totalsize);
#endif
		totalsize = buflen;
	}

	foundat = buf;
	highblock -= 5 * 512;
	if (highblock > 0 && highblock < buflen)
		{
		foundat += highblock;
		}
	else
		{
		foundat += totalsize;
		}

	/*Return to the highest blknum read in the file, that way we don't miss files that are close*/
	file_size = totalsize;
	extractbuf = buf;

	if (suffix)
		needle->suffix = suffix;

	if (!strstr(needle->suffix, type) && strcmp(type,"all")!=0)
		{
		return foundat;
		}

	write_to_disk(s, needle, file_size, extractbuf, c_offset + f_offset);
	return foundat;

}

//********************************************************************************/
int check_mov(unsigned char *atom)
{
#ifdef DEBUG
	printf("Atom:= %c%c%c%c\n", atom[0], atom[1], atom[2], atom[3]);
#endif
	if (strncmp((char *)atom, "free", 4) == 0 || strncmp((char *)atom, "mdat", 4) == 0 ||
		strncmp((char *)atom, "free", 4) == 0 || strncmp((char *)atom, "wide", 4) == 0 ||
		strncmp((char *)atom, "PICT", 4) == 0)
		{
		return TRUE;
		}

	if (strncmp((char *)atom, "trak", 4) == 0 || strncmp((char *)atom, "mdat", 4) == 0 ||
		strncmp((char *)atom, "mp3", 3) == 0 || strncmp((char *)atom, "wide", 4) == 0 ||
		strncmp((char *)atom, "moov", 4) == 0)
		{
		return TRUE;
		}

	return FALSE;
}

/********************************************************************************
 *Function: extract_mov
 *Description: Given that we have a MOV header JUMP through the mov data structures
    until we reach EOF
*Return: A pointer to where the EOF of the MOV is in the current buffer
**********************************************************************************/
unsigned char *extract_mov(f_state *s, u_int64_t c_offset, unsigned char *foundat, u_int64_t buflen,
						   s_spec *needle, u_int64_t f_offset)
{
	unsigned char	*buf = foundat - 4;
	unsigned char	*extractbuf = NULL;
	unsigned int	atomsize = 0;
	unsigned int	filesize = 0;
	int				mdat = FALSE;
	foundat -= 4;
	buflen += 4;
	while (1)						/*Loop through all the atoms until the EOF is reached*/
		{
		atomsize = htoi(foundat, FOREMOST_BIG_ENDIAN);
#ifdef DEBUG
		printf("Atomsize:=%d\n", atomsize);
#endif
		if (atomsize <= 0 || atomsize > needle->max_len)
			{
			return foundat + needle->header_len + 4;
			}

		filesize += atomsize;		/*Add the atomsize to the total file size*/
		if (filesize > buflen)
		{
#ifdef DEBUG
			printf("file size > buflen fs:=%d bf:=%lld\n", filesize, buflen);
#endif
			if (buflen >= needle->max_len)
				return foundat + needle->header_len + 4;
			else
				{
				return NULL;
				}
		}

		foundat += atomsize;
		if (buflen - (foundat - buf) < 5)
			{
			if (mdat)
				{
				break;
				}
			else
			{
#ifdef DEBUG
				printf("No mdat found");
#endif
				return foundat;
			}
			}

		/*Check if we have an mdat atom, these are required thus can be used to
	* Weed out corrupted file*/
		if (strncmp((char *)foundat + 4, "mdat", 4) == 0)
			{
			mdat = TRUE;
			}

		if (check_mov(foundat + 4)) /*Check to see if we are at a valid header*/
		{
#ifdef DEBUG
			printf("Checkmov succeeded\n");
#endif
		}
		else
		{
#ifdef DEBUG
			printf("Checkmov failed\n");
#endif
			if (mdat)
				{
				break;
				}
			else
			{
#ifdef DEBUG
				printf("No mdat found");
#endif
				return foundat;

			}
		}
		}							//End loop

	if (foundat)
		{

		filesize = (foundat - buf);
#ifdef DEBUG
		printf("file size:=%d\n", filesize);
#endif
		extractbuf = buf;
		write_to_disk(s, needle, filesize, extractbuf, c_offset + f_offset - 4);
		return foundat;
		}

#ifdef DEBUG
	printf("NULL Atomsize:=%d\n", atomsize);
#endif
	return NULL;

}

/********************************************************************************
 *Function: extract_wmv
 *Description: Given that we have a WMV header
    search for the file header and grab the file size.
*Return: A pointer to where the EOF of the WMV is in the current buffer
**********************************************************************************/
unsigned char *extract_wmv(f_state *s, u_int64_t c_offset, unsigned char *foundat, u_int64_t buflen,
						   s_spec *needle, u_int64_t f_offset)
{

	unsigned char	*currentpos = NULL;
	unsigned char	*header = foundat;
	unsigned char	*extractbuf = NULL;
	unsigned char	*buf = foundat;
	unsigned int		size = 0;
	u_int64_t		file_size = 0;
	u_int64_t			headerSize = 0;
	u_int64_t			fileObjHeaderSize = 0;
	int				numberofHeaderObjects = 0;
	int				reserved[2];
	int				bytes_to_search = 0;

	/*If we have less than a WMV header bail out*/
	if (buflen < 70)
		return NULL;

	foundat += 16;		/*Jump to the header size*/
	headerSize = htoll(foundat, FOREMOST_LITTLE_ENDIAN);
	//printx(foundat,0,8);
	foundat += 8;
	numberofHeaderObjects = htoi(foundat, FOREMOST_LITTLE_ENDIAN);
	foundat += 4;		//Jump to the begin File properties obj
	reserved[0] = foundat[0];
	reserved[1] = foundat[1];
	foundat += 2;
	//printf("found WMV\n");
	//end header obj
	//****************************************************/
	//Sanity Check
	//printf("WMV num_header_objs=%d headerSize=%llu\n",numberofHeaderObjects,headerSize);

	if (headerSize <= 0 || numberofHeaderObjects <= 0 || reserved[0] != 1)
		{
		printf("WMV err num_header_objs=%d headerSize=%llu\n",numberofHeaderObjects,headerSize);
		return foundat;
		}

	currentpos = foundat;
	if (buflen - (foundat - buf) >= needle->max_len)
		bytes_to_search = needle->max_len;
	else
		bytes_to_search = buflen - (foundat - buf);

	/*Note we are not searching for the footer here, just the file header ID so we can get the file size*/
	foundat = bm_search(needle->footer,
						needle->footer_len,
						foundat,
						bytes_to_search,
						needle->footer_bm_table,
						needle->case_sen,
						SEARCHTYPE_FORWARD);
	if (foundat)
		{
		foundat += 16;	/*jump to the headersize*/
		fileObjHeaderSize = htoll(foundat, FOREMOST_LITTLE_ENDIAN);
		//printx(foundat,0,8);
		foundat += 24;	//Jump to the file size obj
		size = htoi(foundat, FOREMOST_LITTLE_ENDIAN);
		//printx(foundat,0,8);
		
#ifdef DEBUG
		printf("SIZE:=%u fileObjHeaderSize=%llu\n", size,fileObjHeaderSize);
#endif
		}
	else
		{
		return NULL;
		}

	/*Sanity check data*/
	if (size > 0 && size <= needle->max_len && size <= buflen)
		{
		header += size;
#ifdef DEBUG
		printf("	Found a WMV at:=%lld,File size:=%lld\n", c_offset, size);
		printf("	Headersize:=%d, numberofHeaderObjects:= %d ,reserved:=%d,%d\n",
			   headerSize,
			   numberofHeaderObjects,
			   reserved[0],
			   reserved[1]);
#endif

		/*Everything seem ok, write to disk*/
		file_size = (header - buf);
		extractbuf = buf;
		write_to_disk(s, needle, file_size, extractbuf, c_offset + f_offset);
		foundat += file_size;
		return header;
		}

	return NULL;

}

/********************************************************************************
 *Function: extract_riff
 *Description: Given that we have a RIFF header parse header and grab the file size.
 *Return: A pointer to where the EOF of the RIFF is in the current buffer
 **********************************************************************************/
unsigned char *extract_riff(f_state *s, u_int64_t c_offset, unsigned char *foundat, u_int64_t buflen,
							s_spec *needle, u_int64_t f_offset, char *type)
{
	unsigned char	*buf = foundat;
	unsigned char	*extractbuf = NULL;
	int				size = 0;
	u_int64_t		file_size = 0;

	size = htoi(&foundat[4], FOREMOST_LITTLE_ENDIAN);		/* Grab the total file size in little endian from offset 4*/
	if (strncmp((char *) &foundat[8], "AVI", 3) == 0)		/*Sanity Check*/
		{
		if (strncmp((char *) &foundat[12], "LIST", 4) == 0) /*Sanity Check*/
			{
			if (size > 0 && size <= needle->max_len && size <= buflen)
			{
#ifdef DEBUG
				printf("\n	Found an AVI at:=%lld,File size:=%d\n", c_offset, size);
#endif
				file_size = size;
				extractbuf = buf;
				needle->suffix = "avi";
				if (!strstr(needle->suffix, type) && strcmp(type,"all")!=0)
					return foundat + size;
				write_to_disk(s, needle, file_size, extractbuf, c_offset + f_offset);
				foundat += size;
				return foundat;
			}

			return buf + needle->header_len;

			}
		else
			{
			return buf + needle->header_len;
			}
		}
	else if (strncmp((char *) &foundat[8], "WAVE", 4) == 0) /*Sanity Check*/
		{
		if (size > 0 && size <= needle->max_len && size <= buflen)
		{
#ifdef DEBUG
			printf("\n	Found a WAVE at:=%lld,File size:=%d\n", c_offset, size);
#endif

			file_size = size;
			extractbuf = buf;
			needle->suffix = "wav";
			if (!strstr(needle->suffix, type) && strcmp(type,"all")!=0)
				return foundat + size;

			write_to_disk(s, needle, file_size, extractbuf, c_offset + f_offset);
			foundat += file_size;
			return foundat;
		}

		return buf + needle->header_len;

		}
	else
		{
		return buf + needle->header_len;
		}

	return NULL;

}

/********************************************************************************
 *Function: extract_bmp
 *Description: Given that we have a BMP header parse header and grab the file size.
 *Return: A pointer to where the EOF of the BMP is in the current buffer
 **********************************************************************************/
unsigned char *extract_bmp(f_state *s, u_int64_t c_offset, unsigned char *foundat, u_int64_t buflen,
						   s_spec *needle, u_int64_t f_offset)
{
	unsigned char	*buf = foundat;
	int				size = 0;
	int				headerlength = 0;
	int				v_size = 0;
	int				h_size = 0;
	unsigned char	*extractbuf = NULL;
	u_int64_t		file_size = 0;
	char			comment[32];
	int				dataOffset = 0;
	int				dataSize = 0;

	if (buflen < 100)
		return buf + needle->header_len;

	/*JUMP the first to bytes of the header (BM)*/
	size = htoi(&foundat[2], FOREMOST_LITTLE_ENDIAN);	/*Grab the total file size in little_endian*/

	/*Sanity Check*/
	if (size <= 100 || size > needle->max_len)
		return buf + needle->header_len;

	dataOffset = htoi(&foundat[10], FOREMOST_LITTLE_ENDIAN);
	dataSize = htoi(&foundat[34], FOREMOST_LITTLE_ENDIAN);

	headerlength = htoi(&foundat[14], FOREMOST_LITTLE_ENDIAN);

	if (dataSize + dataOffset != size)
		{

		//printf("newtest != dataSize:=%d dataOffset:=%d\n",dataSize,dataOffset);
		}

	//Header length
	if (headerlength > 1000 || headerlength <= 0)
		return buf + needle->header_len;

	//foundat+=4;
	v_size = htoi(&foundat[22], FOREMOST_LITTLE_ENDIAN);
	h_size = htoi(&foundat[18], FOREMOST_LITTLE_ENDIAN);

	//Vertical length
	if (v_size <= 0 || v_size > 2000 || h_size <= 0)
		return buf + needle->header_len;

#ifdef DEBUG
	printf("\n	The size of the BMP is %d, Header length:=%d , Vertical Size:= %d, dataSize:=%d dataOffset:=%d\n",
	   size,
		   headerlength,
		   v_size,
		   dataSize,
		   dataOffset);
#endif
	if (size <= buflen)
		{

		sprintf(comment, " (%d x %d)", h_size, v_size);
		strcat(needle->comment, comment);

		file_size = size;
		extractbuf = buf;
		
		write_to_disk(s, needle, file_size, extractbuf, (c_offset + f_offset));
		foundat += file_size;
		return foundat;

		}

	return NULL;
}

/********************************************************************************
 *Function: extract_gif
 *Description: Given that we have a GIF header parse the given buffer to determine
 *	where the file ends.
 *Return: A pointer to where the EOF of the GIF is in the current buffer
 **********************************************************************************/
unsigned char *extract_gif(f_state *s, u_int64_t c_offset, unsigned char *foundat, u_int64_t buflen,
						   s_spec *needle, u_int64_t f_offset)
{
	unsigned char	*buf = foundat;
	unsigned char	*currentpos = foundat;
	unsigned char	*extractbuf = NULL;
	int				bytes_to_search = 0;
	unsigned short	width = 0;
	unsigned short	height = 0;
	u_int64_t		file_size = 0;
	char			comment[32];
	foundat += 4;		/*Jump the first 4 bytes of the gif header (GIF8)*/

	/*Check if the GIF is type 89a or 87a*/
	if (strncmp((char *)foundat, "9a", 2) == 0 || strncmp((char *)foundat, "7a", 2) == 0)
		{
		foundat += 2;	/*Jump the length of the header*/
		width = htos(foundat, FOREMOST_LITTLE_ENDIAN);
		height = htos(&foundat[2], FOREMOST_LITTLE_ENDIAN);

		sprintf(comment, " (%d x %d)", width, height);
		strcat(needle->comment, comment);

		currentpos = foundat;
		if (buflen - (foundat - buf) >= needle->max_len)
			bytes_to_search = needle->max_len;
		else
			bytes_to_search = buflen - (foundat - buf);
		foundat = bm_search(needle->footer,
							needle->footer_len,
							foundat,
							bytes_to_search,
							needle->footer_bm_table,
							needle->case_sen,
							SEARCHTYPE_FORWARD);
		if (foundat)
		{

			/*We found the EOF, write the file to disk and return*/
#ifdef DEBUG
			printx(foundat, 0, 16);
#endif
			file_size = (foundat - buf) + needle->footer_len;
#ifdef DEBUG
			printf("The GIF file size is  %llu  c_offset:=%llu\n", file_size, c_offset);
#endif
			extractbuf = buf;
			write_to_disk(s, needle, file_size, extractbuf, c_offset + f_offset);
			foundat += needle->footer_len;
			return foundat;
		}

		return NULL;

		}
	else				/*Invalid GIF header return the current pointer*/
		{
		return foundat;
		}

}

/********************************************************************************
 *Function: extract_mpg
 * Not done yet
 **********************************************************************************/
unsigned char *extract_mpg(f_state *s, u_int64_t c_offset, unsigned char *foundat, u_int64_t buflen,
						   s_spec *needle, u_int64_t f_offset)
{
	unsigned char	*buf = foundat;
	unsigned char	*currentpos = NULL;

	unsigned char	*extractbuf = NULL;
	int				bytes_to_search = 0;
	unsigned short	size = 0;
	u_int64_t		file_size = 0;

	/*
    size=htos(&foundat[4],FOREMOST_BIG_ENDIAN);
    printf("size:=%d\n",size);

    printx(foundat,0,16);
    foundat+=4;
    */
	int				j = 0;
	if (foundat[15] == (unsigned char)'\xBB')
		{
		}
	else
		{

		return buf + needle->header_len;
		}

	if (buflen <= 2 * KILOBYTE)
		{
		bytes_to_search = buflen;
		}
	else
		{
		bytes_to_search = 2 * KILOBYTE;
		}

	while (1)
		{
		j = 0;
		currentpos = foundat;
#ifdef DEBUG
		printf("Searching for marker\n");
#endif
		foundat = bm_search(needle->markerlist[0].value,
							needle->markerlist[0].len,
							foundat,
							bytes_to_search,
							needle->markerlist[0].marker_bm_table,
							needle->case_sen,
							SEARCHTYPE_FORWARD);

		if (foundat)
		{
#ifdef DEBUG
			printf("Found after searching %d\n", foundat - currentpos);
#endif
			while (1)
				{

				if (foundat[3] >= (unsigned char)'\xBB' && foundat[3] <= (unsigned char)'\xEF')
				{
#ifdef DEBUG
					printf("jumping %d:\n", j);
#endif
					size = htos(&foundat[4], FOREMOST_BIG_ENDIAN);
#ifdef DEBUG
					printf("\t hit: ");
					printx(foundat, 0, 16);
					printf("size:=%d\n\tjump: ", size);
#endif
					file_size += (foundat - buf) + size;
					if (size <= 0 || size > buflen - (foundat - buf))
					{
#ifdef DEBUG
						printf("Not enough room in the buffer ");
#endif
						if (size <= 50 * KILOBYTE && size > 0)
							{

							/*We should probably search more*/
							if (file_size < needle->max_len)
								{
								return NULL;
								}
							else
								{
								break;
								}
							}
						else
							{
							return currentpos + needle->header_len;
							}
					}

					foundat += size + 6;
#ifdef DEBUG
					printx(foundat, 0, 16);
#endif
					j++;
				}
				else
					{

					break;
					}
				}

			if (foundat[3] == (unsigned char)'\xB9')
				{
				break;
				}
			else if (foundat[3] != (unsigned char)'\xBA' && foundat[3] != (unsigned char)'\x00')
				{

				/*This is the error state where this doesn't seem to be an mpg anymore*/
				size = htos(&foundat[4], FOREMOST_BIG_ENDIAN);
#ifdef DEBUG
				printf("\t ***TEST: %x\n", foundat[3]);
				printx(foundat, 0, 16);

				printf("size:=%d\n", size);
#endif
				if ((currentpos - buf) >= 1 * MEGABYTE)
					{
					foundat = currentpos;
					break;
					}

				return currentpos + needle->header_len;

				}
			else if (foundat[3] == (unsigned char)'\xB3')
				{
				foundat += 3;
				}
			else
				{
				foundat += 3;
				}
		}
		else
			{
			if ((currentpos - buf) >= 1 * MEGABYTE)
				{
				foundat = currentpos;
				break;
				}
			else
			{
#ifdef DEBUG
				printf("RETURNING BUF\n");
#endif
				return buf + needle->header_len;
			}
			}
		}

	if (foundat)
		{
		file_size = (foundat - buf) + needle->footer_len;
		if (file_size < 1 * KILOBYTE)
			return buf + needle->header_len;
		}
	else
		{
		return buf + needle->header_len;
		}

	if (file_size > buflen)
		file_size = buflen;
	foundat = buf;
#ifdef DEBUG
	printf("The file size is  %llu  c_offset:=%llu\n", file_size, c_offset);
#endif

	extractbuf = buf;
	write_to_disk(s, needle, file_size, extractbuf, c_offset + f_offset);
	foundat += file_size;
	return foundat;
}


/********************************************************************************
 *Function: extract_mp4
 * Not done yet
 **********************************************************************************/
unsigned char *extract_mp4(f_state *s, u_int64_t c_offset, unsigned char *foundat, u_int64_t buflen,
						   s_spec *needle, u_int64_t f_offset)
{
	unsigned char	*buf = foundat;

	unsigned char	*extractbuf = NULL;
	unsigned int	size = 0;
	u_int64_t		file_size = 0;

   
	while(1)
	{
	 	size=htoi(&foundat[28],FOREMOST_BIG_ENDIAN);
		if(size ==0)
		{
			//printf("size ==0\n");
			foundat+=28;
			break;
		}
    		//printf("size:=%d\n",size);
		if(size > 0 && size < buflen)
		{
			if(!isprint(foundat[32]) ||  !isprint(foundat[33]))
			{
				//printf("print err\n");
				break;
				//return foundat+8;
			}
			foundat+=size;
			
		}
		else
		{
			if (size < needle->max_len)
			{
				//printf("Searching More\n");
				return NULL;
			}
			else
			{
				//printf("ERR\n");
				//return foundat+8;
				break;
			}
		}	
	
		//printx(foundat,0,32);

	}
	if (foundat)
	{
		file_size = (foundat - buf) + needle->footer_len;
		if (file_size < 1 * KILOBYTE)
			return buf + needle->header_len;
	}
	

	if (file_size > buflen)
		file_size = buflen;
	foundat = buf;


	extractbuf = buf;	
	write_to_disk(s, needle, file_size, extractbuf, c_offset + f_offset);
	foundat += file_size;
	return foundat;
}


/********************************************************************************
 *Function: extract_png
 *Description: Given that we have a PNG header parse the given buffer to determine
 *	where the file ends.
 *Return: A pointer to where the EOF of the PNG is in the current buffer
 **********************************************************************************/
unsigned char *extract_png(f_state *s, u_int64_t c_offset, unsigned char *foundat, u_int64_t buflen,
						   s_spec *needle, u_int64_t f_offset)
{
	unsigned char	*buf = foundat;
	unsigned char	*currentpos = NULL;

	unsigned char	*extractbuf = NULL;
	int				size = 0;
	int				height = 0;
	int				width = 0;
	u_int64_t		file_size = 0;
	char			comment[32];

	if (buflen < 100)
		return NULL;
	foundat += 8;
	width = htoi(&foundat[8], FOREMOST_BIG_ENDIAN);
	height = htoi(&foundat[12], FOREMOST_BIG_ENDIAN);

	if (width < 1 || height < 1)
		return foundat;

	if (width > 3000 || height > 3000)
		return foundat;

	sprintf(comment, " (%d x %d)", width, height);
	strcat(needle->comment, comment);

	while (1)	/* Jump through the headers until we reach the "data" part of the file*/
		{
		size = htoi(foundat, FOREMOST_BIG_ENDIAN);
#ifdef DEBUG
		printx(foundat, 0, 16);
		printf("Size:=%d\n", size);
#endif

		currentpos = foundat;
		if (size <= 0 || size > buflen - (foundat - buf))
		{
#ifdef DEBUG
			printf("buflen - (foundat-buf)=%lu\n", buflen - (foundat - buf));
#endif
			return currentpos;
		}

		/*12 is the length of the size, TYPE, and CRC field*/
		foundat += size + 12;

		if (isprint(foundat[4]))
			{
			if (strncmp((char *) &foundat[4], "IEND", 4) == 0)
				{
				break;
				}
			}
		else
		{
#ifdef DEBUG
			printx(foundat, 0, 16);
			printf("Not ascii returning\n");
#endif
			return currentpos;
		}

		}

	if (foundat)
		{
		file_size = (foundat - buf) + htoi(foundat, FOREMOST_BIG_ENDIAN) + 12;

		if (file_size > buflen)
			file_size = buflen;
		foundat = buf;
#ifdef DEBUG
		printf("The file size is  %llu  c_offset:=%llu\n", file_size, c_offset);
#endif
		extractbuf = buf;
		write_to_disk(s, needle, file_size, extractbuf, c_offset + f_offset);
		foundat += file_size;
		return foundat;
		}

	return NULL;
}

/********************************************************************************
 *Function: extract_jpeg
 *Description: Given that we have a JPEG header parse the given buffer to determine
 *	where the file ends.
 *Return: A pointer to where the EOF of the JPEG is in the current buffer
 **********************************************************************************/
unsigned char *extract_jpeg(f_state *s, u_int64_t c_offset, unsigned char *foundat, u_int64_t buflen,
							s_spec *needle, u_int64_t f_offset)
{
	unsigned char	*buf = foundat;
	unsigned char	*currentpos = NULL;

	unsigned char	*extractbuf = NULL;
	unsigned short	headersize;
	int				bytes_to_search = 0;
	int				hasTable = FALSE;
	int				hasHuffman = FALSE;
	u_int64_t		file_size = 0;

	// char comment[32];

	/*Check if we have a valid header*/
	if (buflen < 128)
		{
		return NULL;
		}

	if (foundat[3] == (unsigned char)'\xe0')
		{

		//JFIF header
		//sprintf(comment," (JFIF)");
		//strcat(needle->comment,comment);
		}
	else if (foundat[3] == (unsigned char)'\xe1')
		{

		//sprintf(comment," (EXIF)");
		//strcat(needle->comment,comment);
		}
	else
		return foundat + needle->header_len;	//Invalid keep searching
	while (1)									/* Jump through the headers until we reach the "data" part of the file*/
	{
#ifdef DEBUG
		printx(foundat, 0, 16);
#endif
		foundat += 2;
		headersize = htos(&foundat[2], FOREMOST_BIG_ENDIAN);
#ifdef DEBUG
		printf("Headersize:=%d buflen:=%lld\n", headersize, buflen);
#endif

		
		if (((foundat + headersize) - buf) > buflen){ return NULL; }	

		foundat += headersize;
		
		if (foundat[2] != (unsigned char)'\xff')
			{
			break;
			}

		/*Ignore 2 "0xff" side by side*/
		if (foundat[2] == (unsigned char)'\xff' && foundat[3] == (unsigned char)'\xff')
			{
			foundat++;
			}

		if (foundat[3] == (unsigned char)'\xdb' || foundat[4] == (unsigned char)'\xdb')
			{
			hasTable = TRUE;
			}
		else if (foundat[3] == (unsigned char)'\xc4')
			{
			hasHuffman = TRUE;
			}
	}

	/*All jpegs must contain a Huffman marker as well as a quantization table*/
	if (!hasTable || !hasHuffman)
	{
#ifdef DEBUG
		printf("No Table or Huffman \n");
#endif
		return buf + needle->header_len;
	}

	currentpos = foundat;

	//sprintf("Searching for footer\n");
	if (buflen < (foundat - buf)) {
#ifdef DEBUG
		printf("avoided bug in extract_jpeg!\n");
#endif
		bytes_to_search = 0;
	} else {
		if (buflen - (foundat - buf) >= needle->max_len)
			bytes_to_search = needle->max_len;
		else
			bytes_to_search = buflen - (foundat - buf);
	}

	foundat = bm_search(needle->footer,
						needle->footer_len,
						foundat,
						bytes_to_search,
						needle->footer_bm_table,
						needle->case_sen,
						SEARCHTYPE_FORWARD);

	if (foundat)								/*Found found a valid JPEG*/
		{

		/*We found the EOF, write the file to disk and return*/
		file_size = (foundat - buf) + needle->footer_len;
#ifdef DEBUG
		printf("The jpeg file size is  %llu  c_offset:=%llu\n", file_size, c_offset);
#endif

		//extractbuf=(unsigned char*) malloc(file_size*sizeof(char));
		//memcpy(extractbuf,buf,file_size);
		extractbuf = buf;
		write_to_disk(s, needle, file_size, extractbuf, c_offset + f_offset);
		foundat += needle->footer_len;

		////free(extractbuf);
		return foundat;
		}
	else
		{
		return NULL;
		}

}	//End extract_jpeg

/********************************************************************************
 *Function: extract_generic
 *Description:
 *Return: A pointer to where the EOF of the
 **********************************************************************************/
unsigned char *extract_generic(f_state *s, u_int64_t c_offset, unsigned char *foundat,
							   u_int64_t buflen, s_spec *needle, u_int64_t f_offset)
{
	unsigned char	*buf = foundat;
	unsigned char	*endptr = foundat;
	unsigned char	*beginptr = foundat;
	unsigned char	*extractbuf = NULL;
	int		bytes_to_search = 0;
	u_int64_t	file_size = 0;
	int begin=0;
	int end=0;
	

	if (buflen - (foundat - buf) >= needle->max_len)
		bytes_to_search = needle->max_len;
	else
		bytes_to_search = buflen - (foundat - buf);

  	if(needle->searchtype ==SEARCHTYPE_FORWARD_NEXT)
	{
			foundat+=needle->header_len;
			foundat = bm_search(needle->header,
							needle->header_len,
							foundat,
							bytes_to_search,
							needle->footer_bm_table,
							needle->case_sen,
							SEARCHTYPE_FORWARD);
	}
	else if(needle->searchtype ==SEARCHTYPE_ASCII)
	{
			
	
			while (isprint(foundat[end]) || foundat[end] == '\x0a' || foundat[end] == '\x0d' || foundat[end] == '\x09')
			{
				end++;
			}
			
			foundat+=end;
			endptr=foundat;
			foundat=buf;
			
			while (isprint(foundat[begin-1]) || foundat[begin-1] == '\x0a' || foundat[begin-1] == '\x0d' || foundat[begin-1] == '\x09')
			{
				begin--;
			}
			
			foundat+=begin;
			beginptr=foundat;
			
			buf=beginptr;
			foundat=endptr;
			//printx(buf,0,4);	
			
			file_size=end-begin;	
			//fprintf(stderr,"file_size=%llu end=%d begin=%d ptrsize=%d ptrsize2=%d\n",file_size,end,begin,endptr-beginptr,foundat-buf);
			if(buf==foundat) 
			{
					fprintf(stderr,"Returning Foundat\n");
					return foundat+needle->header_len;
			}			
	}
  	else if (needle->footer == NULL || strlen((char *)needle->footer) < 1)
	{
#ifdef DEBUG
		printf("footer is NULL\n");
#endif
		foundat = NULL;
	}
	else
	{
#ifdef DEBUG
		printf("footer is not NULL %p\n", needle->footer);
#endif
		foundat = bm_search(needle->footer,
							needle->footer_len,
							foundat,
							bytes_to_search,
							needle->footer_bm_table,
							needle->case_sen,
							SEARCHTYPE_FORWARD);
	}

	if (foundat)
	{
#ifdef DEBUG
		printf("found %s!!!\n", needle->footer);
#endif
		if(needle->searchtype ==SEARCHTYPE_FORWARD_NEXT || needle->searchtype ==SEARCHTYPE_ASCII)
		{
				file_size = (foundat - buf);
		}
		else
		{
				file_size = (foundat - buf) + needle->footer_len;
		}	
	}
	else
	{
		file_size = needle->max_len;
	}

	if (file_size == 0)
	{
		file_size = needle->max_len;
	}

	if (file_size > (buflen-begin))
	{
		file_size = buflen;
	}
	
#ifdef DEBUG
	printf("The file size is  %llu  c_offset:=%llu\n", file_size, c_offset);
#endif

	extractbuf = buf;
	write_to_disk(s, needle, file_size, extractbuf, c_offset + f_offset);
	
	if(needle->searchtype !=SEARCHTYPE_ASCII)
	{
		foundat=buf;
		foundat += needle->header_len;
	}
	return foundat;		
	
	
	
}

/********************************************************************************
 *Function: extract_exe
 *Description:
 *Return: A pointer to where the EOF of the
 **********************************************************************************/
unsigned char *extract_exe(f_state *s, u_int64_t c_offset, unsigned char *foundat, u_int64_t buflen,
						   s_spec *needle, u_int64_t f_offset)
{
	unsigned char	*buf = foundat;
	unsigned char	*extractbuf = NULL;
	u_int64_t		file_size = 0;
	unsigned short	pe_offset = 0;
	unsigned int	SizeOfCode = 0;
	unsigned int	SizeOfInitializedData = 0;
	unsigned int	SizeOfUninitializedData = 0;
	unsigned int	rva = 0;
	unsigned int	offset = 0;
	unsigned short	sections = 0;
	unsigned int	sizeofimage = 0;
	unsigned int	raw_section_size = 0;
	unsigned int	size_of_headers = 0;
	unsigned short	dll = 0;
	unsigned int	sum = 0;
	unsigned short	exe_char = 0;
	unsigned int	align = 0;
	int				i = 0;
	time_t			compile_time = 0;
	struct tm		*ret_time;
	char			comment[32];
	char			ascii_time[32];

	if (buflen < 100)
		return foundat + 2;
	pe_offset = htos(&foundat[60], FOREMOST_LITTLE_ENDIAN);
	if (pe_offset < 1 || pe_offset > 1000 || pe_offset > buflen)
		{
		return foundat + 60;
		}

	foundat += pe_offset;
	if (foundat[0] != (unsigned char)'\x50' || foundat[1] != (unsigned char)'\x45')
		{
		return foundat;
		}

	sections = htos(&foundat[6], FOREMOST_LITTLE_ENDIAN);
	if (buflen < (40 * sections + 224))
		{
		return foundat;
		}

	compile_time = (time_t) htoi(&foundat[8], FOREMOST_LITTLE_ENDIAN);
	ret_time = gmtime(&compile_time);
	sprintf(ascii_time,
			"%02d/%02d/%04d %02d:%02d:%02d",
			ret_time->tm_mon + 1,
			ret_time->tm_mday,
			ret_time->tm_year + 1900,
			ret_time->tm_hour,
			ret_time->tm_min,
			ret_time->tm_sec);
	chop(ascii_time);

	sprintf(comment, ascii_time);
	strcat(needle->comment, comment);
	exe_char = htos(&foundat[22], FOREMOST_LITTLE_ENDIAN);
	if (exe_char & 0x2000)
		{
		dll = 1;
		}
	else if (exe_char & 0x1000)
		{

		//printf("System File!!!\n");
		}
	else if (exe_char & 0x0002)
		{

		//printf("EXE !!!\n");
		}
	else
		{
		return foundat;
		}

	foundat += 0x18;	/*Jump to opt header should be 0x0b 0x01*/

	SizeOfCode = htoi(&foundat[4], FOREMOST_LITTLE_ENDIAN);
	SizeOfInitializedData = htoi(&foundat[8], FOREMOST_LITTLE_ENDIAN);
	SizeOfUninitializedData = htoi(&foundat[12], FOREMOST_LITTLE_ENDIAN);
	rva = htoi(&foundat[16], FOREMOST_LITTLE_ENDIAN);
	align = htoi(&foundat[36], FOREMOST_LITTLE_ENDIAN);

	sizeofimage = htoi(&foundat[56], FOREMOST_LITTLE_ENDIAN);
	size_of_headers = htoi(&foundat[60], FOREMOST_LITTLE_ENDIAN);
	foundat += 224;

	/*Start of sections*/
	for (i = 0; i < sections; i++)
		{

		//strncpy(name,foundat,8);
		offset = htoi(&foundat[20], FOREMOST_LITTLE_ENDIAN);
		raw_section_size = htoi(&foundat[16], FOREMOST_LITTLE_ENDIAN);

		//printf("\t%s size=%d offset=%d\n",name,raw_section_size,offset);
		foundat += 40;

		//rem+=(raw_section_size%align);
		//sum+=raw_section_size;
		sum = offset + raw_section_size;
		}

	/*
    printf("rva is %d sum= %d\n",rva,sum);
    printf("soi is %d,soh is %d \n",sizeofimage,size_of_headers);
    printf("we are off by %d\n",sum-buflen);
    printf("soc=%d ,soidr=%d, souid=%d\n",SizeOfCode,SizeOfInitializedData,SizeOfUninitializedData);
    printf("fs=%d ,extr=%d\n",SizeOfCode+SizeOfInitializedData,SizeOfUninitializedData);
		*/
	file_size = sum;
	if (file_size < 512 || file_size > 4 * MEGABYTE)
		{
		return foundat + 60;
		}

	if (file_size > buflen)
		file_size = buflen;
	foundat = buf;
#ifdef DEBUG
	printf("The file size is  %llu  c_offset:=%llu\n", file_size, c_offset);
#endif

	extractbuf = buf;
	if (dll == 1)
		{
		strcpy(needle->suffix, "dll");
		write_to_disk(s, needle, file_size, extractbuf, c_offset + f_offset);
		strcpy(needle->suffix, "exe");
		}
	else
		{
		write_to_disk(s, needle, file_size, extractbuf, c_offset + f_offset);
		}

	foundat += needle->header_len;
	return (buf + file_size);
}


/********************************************************************************
 *Function: extract_reg
 *Description:
 *Return: A pointer to where the EOF of the
 **********************************************************************************/
unsigned char *extract_reg(f_state *s, u_int64_t c_offset, unsigned char *foundat, u_int64_t buflen,
						   s_spec *needle, u_int64_t f_offset)
{
	unsigned char	*buf = foundat;
	unsigned char	*extractbuf = NULL;
	int sizeofreg = htoi(&foundat[0x28], FOREMOST_LITTLE_ENDIAN);
	int file_size=0;
	if(sizeofreg < 0 || sizeofreg > needle->max_len)	
	{
		return (foundat+4);
	}	
	foundat+=sizeofreg;
	file_size = (foundat - buf);

	extractbuf = buf;


	write_to_disk(s, needle, file_size , extractbuf, c_offset + f_offset);

			
	return NULL;
}
/********************************************************************************
 *Function: extract_rar
 *Description:
 *Return: A pointer to where the EOF of the
 **********************************************************************************/
unsigned char *extract_rar(f_state *s, u_int64_t c_offset, unsigned char *foundat, u_int64_t buflen,
						   s_spec *needle, u_int64_t f_offset)
{
	unsigned char	*buf = foundat;
	unsigned char	*extractbuf = NULL;
	u_int64_t		file_size = 0;
	unsigned short	headersize = 0;
	unsigned short	flags = 0;
	unsigned int	filesize = 0;
	unsigned int	tot_file_size = 0;
	unsigned int	ufilesize = 0;
	int				i = 0;
	int				scan = 0;
	int				flag = 0;
	int				passwd = 0;
	u_int64_t		bytes_to_search = 50 * KILOBYTE;
	char			comment[32];

	/*Marker Block*/
	headersize = htos(&foundat[5], FOREMOST_LITTLE_ENDIAN);
	foundat += headersize;

	/*Archive Block*/
	headersize = htos(&foundat[5], FOREMOST_LITTLE_ENDIAN);
	filesize = htoi(&foundat[7], FOREMOST_LITTLE_ENDIAN);

	if (foundat[2] != '\x73')
		{
		return foundat; /*Error*/
		}

	flags = htos(&foundat[3], FOREMOST_LITTLE_ENDIAN);
	if ((flags & 0x01) != 0)
		{
		sprintf(comment, " Multi-volume:");
		strcat(needle->comment, comment);
		}

	if (flags & 0x02)
		{
		sprintf(comment, " an archive comment is present:");
		strcat(needle->comment, comment);
		}

	foundat += headersize;

	if (foundat[2] != '\x74')
		{
		for (i = 0; i < 500; i++)
			{
			if (foundat[i] == '\x74')
				{
				foundat += i - 2;
				scan = 1;
				break;
				}
			}
		}

	if (headersize == 13 && foundat[2] != '\x74')
		{

		if (scan == 0)
			{
			sprintf(comment, "Encrypted Headers!");
			strcat(needle->comment, comment);
			}

		if (buflen - (foundat - buf) >= needle->max_len)
			bytes_to_search = needle->max_len;
		else
			bytes_to_search = buflen - (foundat - buf);

		//printf("bytes_to_search:=%d needle->footer_len:=%d needle->header_len:=%d\n",bytes_to_search,needle->footer_len,needle->header_len);
		foundat = bm_search(needle->footer,
							needle->footer_len,
							foundat,
							bytes_to_search,
							needle->footer_bm_table,
							needle->case_sen,
							SEARCHTYPE_FORWARD);
		if (foundat == NULL)
			{
			tot_file_size = bytes_to_search;
			foundat = buf + tot_file_size;
			}
		}
	else
		{

		/*Loop through files*/
		while (foundat[2] == '\x74')
			{

			headersize = htos(&foundat[5], FOREMOST_LITTLE_ENDIAN);
			filesize = htoi(&foundat[7], FOREMOST_LITTLE_ENDIAN);
			ufilesize = htoi(&foundat[11], FOREMOST_LITTLE_ENDIAN);

			if (headersize < 1 || headersize > buflen)
				flag = 1;
			if (filesize < 0 || filesize > buflen)
				flag = 1;
			if ((headersize + filesize) > buflen)
				flag = 1;
			if (ufilesize < 0)
				flag = 1;

			flags = htos(&foundat[3], FOREMOST_LITTLE_ENDIAN);
			if ((flags & 0x04) != 0)
				{
				passwd = 1;
				}

			tot_file_size = (foundat - buf);
			if ((tot_file_size + headersize + filesize) > buflen)
				{
				break;
				}

			foundat += headersize + filesize;
			}

		if (passwd == 1)
			{
			sprintf(comment, "Password Protected:");
			strcat(needle->comment, comment);
			}

		if (flag == 1)
			{
			sprintf(comment, "Encrypted Headers!");
			strcat(needle->comment, comment);
			foundat = bm_search(needle->footer,
								needle->footer_len,
								foundat,
								bytes_to_search,
								needle->footer_bm_table,
								needle->case_sen,
								SEARCHTYPE_FORWARD);
			if (foundat == NULL)
				{
				tot_file_size = bytes_to_search;
				foundat = buf + tot_file_size;
				}
			}

		if (foundat[2] != '\x7B' && tot_file_size == 0)
			{

			//printf("Error 7B!!!! %x\n",foundat[2]);
			return foundat;
			}

		foundat += 7;

		}

	if (foundat)
		{

		/*We found the EOF, write the file to disk and return*/
		tot_file_size = (foundat - buf);
		if (tot_file_size > buflen)
			file_size = buflen;

		extractbuf = buf;
		write_to_disk(s, needle, tot_file_size, extractbuf, c_offset + f_offset);
		return foundat;
		}
	else
		{
		return NULL;
		}

	return NULL;
}

unsigned char *extract_file(f_state *s, u_int64_t c_offset, unsigned char *foundat, u_int64_t buflen,
							s_spec *needle, u_int64_t f_offset)
{
	if (needle->type == JPEG)
		{
		return extract_jpeg(s, c_offset, foundat, buflen, needle, f_offset);
		}
	else if (needle->type == GIF)
		{
		return extract_gif(s, c_offset, foundat, buflen, needle, f_offset);
		}
	else if (needle->type == PNG)
		{
		return extract_png(s, c_offset, foundat, buflen, needle, f_offset);
		}
	else if (needle->type == BMP)
		{
		return extract_bmp(s, c_offset, foundat, buflen, needle, f_offset);
		}
	else if (needle->type == RIFF)
		{
		needle->suffix = "rif";
		return extract_riff(s, c_offset, foundat, buflen, needle, f_offset, "all");
		}
	else if (needle->type == AVI)
		{
		return extract_riff(s, c_offset, foundat, buflen, needle, f_offset, "avi");
		}
	else if (needle->type == WAV)
		{
		needle->suffix = "rif";
		return extract_riff(s, c_offset, foundat, buflen, needle, f_offset, "wav");
		}
	else if (needle->type == WMV)
		{
		return extract_wmv(s, c_offset, foundat, buflen, needle, f_offset);
		}
	else if (needle->type == OLE)
		{
		needle->suffix = "ole";
		return extract_ole(s, c_offset, foundat, buflen, needle, f_offset, "all");
		}
	else if (needle->type == DOC)
		{
		return extract_ole(s, c_offset, foundat, buflen, needle, f_offset, "doc");
		}
	else if (needle->type == PPT)
		{
		return extract_ole(s, c_offset, foundat, buflen, needle, f_offset, "ppt");
		}
	else if (needle->type == XLS)
		{
		needle->suffix = "ole";
		return extract_ole(s, c_offset, foundat, buflen, needle, f_offset, "xls");
		}
	else if (needle->type == PDF)
		{
		return extract_pdf(s, c_offset, foundat, buflen, needle, f_offset);
		}
	else if (needle->type == CPP)
		{
		return extract_cpp(s, c_offset, foundat, buflen, needle, f_offset);
		}
	else if (needle->type == HTM)
		{
		return extract_htm(s, c_offset, foundat, buflen, needle, f_offset);
		}
	else if (needle->type == MPG)
		{
		return extract_mpg(s, c_offset, foundat, buflen, needle, f_offset);
		}
	else if (needle->type == MP4)
		{
		return extract_mp4(s, c_offset, foundat, buflen, needle, f_offset);
		}
	else if (needle->type == ZIP)
		{
		return extract_zip(s, c_offset, foundat, buflen, needle, f_offset, "all");
		}
	else if (needle->type == RAR)
		{
		return extract_rar(s, c_offset, foundat, buflen, needle, f_offset);
		}
	else if (needle->type == SXW)
		{
		return extract_zip(s, c_offset, foundat, buflen, needle, f_offset, "sxw");
		}
	else if (needle->type == SXC)
		{
		return extract_zip(s, c_offset, foundat, buflen, needle, f_offset, "sxc");
		}
	else if (needle->type == SXI)
		{
		return extract_zip(s, c_offset, foundat, buflen, needle, f_offset, "sxi");
		}
	else if (needle->type == EXE)
		{
		return extract_exe(s, c_offset, foundat, buflen, needle, f_offset);
		}
	else if (needle->type == MOV || needle->type == VJPEG)
		{
		return extract_mov(s, c_offset, foundat, buflen, needle, f_offset);
		}
	else if (needle->type == CONF)
		{
		return extract_generic(s, c_offset, foundat, buflen, needle, f_offset);
		}
	else
		{
		return NULL;
		}
	return NULL;	
}


================================================
FILE: extract.h
================================================
/*
	local file header signature     4 bytes  (0x04034b50)
        version needed to extract       2 bytes
        general purpose bit flag        2 bytes
        compression method              2 bytes
        last mod file time              2 bytes
        last mod file date              2 bytes
        crc-32                          4 bytes
        compressed size                 4 bytes
        uncompressed size               4 bytes
        filename length                 2 bytes
        extra field length              2 bytes
*/

/*
 	central file header signature   4 bytes  (0x02014b50)
        version made by                 2 bytes
        version needed to extract       2 bytes
        general purpose bit flag        2 bytes
        compression method              2 bytes
        last mod file time              2 bytes
        last mod file date              2 bytes
        crc-32                          4 bytes
        compressed size                 4 bytes
        uncompressed size               4 bytes
        filename length                 2 bytes
        extra field length              2 bytes
        file comment length             2 bytes
        disk number start               2 bytes
        internal file attributes        2 bytes
        external file attributes        4 bytes
        relative offset of local header 4 bytes
*/

/* end of central dir signature    4 bytes  (0x06054b50)
        number of this disk             2 bytes
        number of the disk with the
        start of the central directory  2 bytes
        total number of entries in
        the central dir on this disk    2 bytes
        total number of entries in
        the central dir                 2 bytes
        size of the central directory   4 bytes
        offset of start of central
        directory with respect to
        the starting disk number        4 bytes
        zipfile comment length          2 bytes
        zipfile comment (variable size)
	*/
struct zipLocalFileHeader
{
	unsigned int	signature;					//0
	unsigned short	version;					//4
	unsigned short	genFlag;					//6
	signed short	compression;				//8
	unsigned short	last_mod_time;				//10
	unsigned short	last_mod_date;				//12
	unsigned int	crc;						//14
	unsigned int	compressed;					//18
	unsigned int	uncompressed;				//22
	unsigned short	filename_length;			//26
	unsigned short	extra_length;				//28
};
struct zipCentralFileHeader
{
	unsigned int	signature;					//0
	unsigned char	version_extract[2];			//4
	unsigned char	version_madeby[2];			//6
	unsigned short	genFlag;					//8
	unsigned short	compression;				//10
	unsigned short	last_mod_time;				//12
	unsigned short	last_mod_date;				//14
	unsigned int	crc;						//16
	unsigned int	compressed;					//20
	unsigned int	uncompressed;				//24
	unsigned short	filename_length;			//28
	unsigned short	extra_length;				//30
	unsigned short	filecomment_length;			//32
	unsigned short	disk_number_start;			//34
};
struct zipEndCentralFileHeader
{
	unsigned int	signature;					//0
	unsigned short	numOfdisk;					//4
	unsigned short	compression;				//6
	unsigned short	start_of_central_dir;		//8
	unsigned short	num_entries_in_central_dir; //10
	unsigned int	size_of_central_dir;		//12
	unsigned int	offset;						//16
	unsigned short	comment_length;				//20
};

void print_zip(struct zipLocalFileHeader *fileHeader, struct zipCentralFileHeader *centralHeader)
{
	printf("\n	Local Header Data\n");
	printf("GenFlag:=%d,compressed:=%d,uncompressed:=%d\n",
		   fileHeader->genFlag,
		   fileHeader->compressed,
		   fileHeader->uncompressed);
	printf("Compression:=%d, filename_len:=%d,extralen:=%d\n",
		   fileHeader->compression,
		   fileHeader->filename_length,
		   fileHeader->extra_length);

	printf("	Central Header Data\n");
	printf("GenFlag:=%d,compressed:=%d,uncompressed:=%d\n",
		   centralHeader->genFlag,
		   centralHeader->compressed,
		   centralHeader->uncompressed);
	printf("Compression:=%d, Version Madeby:=%x%x\n",
		   centralHeader->compression,
		   centralHeader->version_madeby[0],
		   centralHeader->version_madeby[1]);
}


================================================
FILE: foremost.8
================================================
.TH FOREMOST "8" "v1.5 - May 2009"

.SH NAME
foremost \- Recover files using their headers, footers, and data structures

.SH SYNOPSIS
.B foremost[\fB-h\fR][\fB-V\fR][\fB-d\fR][\fB-vqwQT\fR][\fB-b\fR<blocksize>][\fB-o\fR<dir>]
[\fB-t\fR<type>][\fB-s\fR<num>][\fB-i\fR<file>] 

.SH BUILTIN FORMATS
.PP
Recover files from a disk image based on file types specified by the
user using the -t switch.

.TP
.B jpg
Support for the JFIF and Exif formats including implementations used 
in modern digital cameras.


.TP
.B gif
.TP
.B png
.TP
.B bmp
Support for windows bmp format.
.TP
.B avi
.TP
.B exe 
Support for Windows PE binaries, will extract DLL and EXE files along
with their compile times.
.TP
.B mpg 
Support for most MPEG files (must begin with 0x000001BA) 
.TP
.B mp4
.TP
.B wav
.TP
.B riff 
This will extract AVI and RIFF since they use the same file 
format (RIFF). note faster than running each separately. 
.TP
.B wmv
Note may also extract -wma files as they have similar format.
.TP
.B mov
.TP
.B pdf
.TP
.B ole
This will grab any file using the OLE file structure.  This includes
PowerPoint, Word, Excel, Access, and StarWriter
.TP
.B doc
Note it is more efficient to run OLE as you get more bang for your buck.  
If you wish to ignore all other ole files then use this.
.TP
.B zip
Note is will extract .jar files as well because they use a similar format.
Open Office docs are just zip'd XML files so they are extracted as well.  
These include SXW, SXC, SXI, and SX? for undetermined OpenOffice files.
Office 2007 files are also XML based (PPTX,DOCX,XLSX)
.TP
.B rar
.TP
.B htm
.TP
.B cpp
C source code detection, note this is primitive and may 
generate documents other than C code.
.TP
.B all
Run all pre-defined extraction methods. [Default if no -t is specified]

.SH DESCRIPTION
.PP
Recover files from a disk image based on headers and footers specified by the
user.

.TP
\fB\-h\fR
Show a help screen and exit.

.TP

\fB\-V\fR
Show copyright information and exit.
.TP

\fB\-d\fR
Turn on indirect block detection, this works well for Unix file systems.
.TP
\fB\-T\fR
Time stamp the output directory so you don't have to delete the output
dir when running multiple times.

.TP
\fB\-v\fR
Enables verbose mode. This causes more information regarding the current
state of the program to be displayed on the screen, and is highly recommended.


.TP
\fB\-q\fR
Enables quick mode. In quick mode, only the start of each sector is 
searched for matching headers. That is, the header is searched only up to 
the length of the longest header. The rest of the sector, usually about 500 
bytes, is ignored. This mode makes foremost run considerably faster, but it 
may cause you to miss files that are embedded in other files. For example, 
using quick mode you will not be able to find JPEG images embedded in 
Microsoft Word documents. 

Quick mode should not be used when examining NTFS file systems. Because 
NTFS will store small files inside the Master File Table, these files will 
be missed during quick mode.
.br

.TP
\fB\-Q\fR
Enables Quiet mode. Most error messages will be suppressed.
.br

.TP
\fB\-w\fR
Enables write audit only mode.  No files will be extracted. 
.br

.TP
\fB\-a\fR
Enables write all headers, perform no error detection in terms of corrupted files.
.br

.TP
\fB\-b\fR \fInumber\fR
Allows you to specify the block size used in foremost.  This is relevant for 
file naming and quick searches.  The default is 512.
	ie.	foremost -b 1024 image.dd
.br
.TP
\fB\-k\fR \fInumber\fR
Allows you to specify the chunk size used in foremost.  This can improve 
speed if you have enough RAM to fit the image in.  It reduces the checking 
that occurs between chunks of the buffer.  For example if you had > 500MB of RAM.
	ie.	foremost -k 500 image.dd
.br

.TP
\fB\-i\fR \fIfile\fR
The \fIfile\fR is used as the input file.  If no input file is specified
or the input file cannot be read then stdin is used.

.TP
\fB-o\fR \fIdirectory\fR
Recovered files are written to the directory
\fIdirectory\fR. 

.TP
\fB-c\fR \fIfile\fR
Sets the configuration file to use. If none is specified, the file 
"foremost.conf" from the current directory is used, if that doesn't
exist then "/etc/foremost.conf" is used. The format for
the configuration file is described in the default configuration
file included with this program. See the \fICONFIGURATION FILE\fR
section below for more information.

.TP

\fB-s\fR \fInumber\fR
Skips \fInumber\fR blocks in the input file before beginning the search
for headers.    
	ie.  foremost -s 512 -t jpeg -i /dev/hda1
.TP


.PP

.SH CONFIGURATION FILE
The configuration file is used to control what types of files foremost
searches for. A sample configuration file, foremost.conf, is included with
this distribution. For each file type, the configuration file describes
the file's extension, whether the header and footer are case sensitive,
the maximum file size, and the header and footer for the file. The footer
field is optional, but header, size, case sensitivity, and extension are
not!

Any line that begins with a pound sign 
is considered a comment and ignored. Thus,
to skip a file type just put a pound sign at the beginning of that line

Headers and footers are decoded before use. To specify a value in
hexadecimal use \\x[0-f][0-f], and for octal use \\[0-7][0-7][0-7].  Spaces
can be represented by \\s. Example: "\\x4F\\123\\I\\sCCI" decodes to "OSI CCI".

To match any single character (aka a wildcard) use a ?. If you need to
search for the ? character, you will need to change the wildcard line
*and* every occurrence of the old wildcard character in the configuration
file. Do not forget those hex and octal values! ? is equal to \\x3f and
\\063.

There is a sample set of headers in the README file.

.SH EXAMPLES
.TP
.SH Search for jpeg format skipping the first 100 blocks
foremost -s 100 -t jpg -i image.dd 
.TP
.SH Only generate an audit file, and print to the screen (verbose mode)
foremost -av image.dd 
.TP
.SH Search all defined types
foremost -t all -i image.dd
.TP
.SH Search for gif and pdf's 
foremost -t gif,pdf -i image.dd
.TP
.SH Search for office documents and jpeg files in a Unix file system in verbose mode.  
foremost -vd -t ole,jpeg -i image.dd
.TP
.SH Run the default case
foremost image.dd
.PP

.SH AUTHORS
Original Code written by Special Agent Kris Kendall and Special Agent Jesse Kornblum of 
the United States Air Force Office of Special Investigations.

Modification by Nick Mikus a Research Associate at the Naval Postgraduate 
School Center for Information Systems Security Studies and Research.  The modification
of Foremost was part of a masters thesis at NPS.

.SH BUGS
When compiling foremost on systems with versions of glibc 2.1.x or older,
you will get some (harmless) compiler warnings regarding the implicit 
declaration of fseeko and ftello. You can safely ignore these warnings.
.PP

.SH "REPORTING BUGS"
Because Foremost could be used to obtain evidence for criminal 
prosecutions, we
take all bug reports \fIvery\fR seriously. Any bug that jeopardizes the
forensic integrity of this program could have serious consequenses. When submitting a bug report, please include a description
of the problem, how you found it, and your contact information.
.PP
Send bug reports to:
.br
namikus AT users d0t sf d0t net
.PP
.SH COPYRIGHT
This program is a work of the US Government. In accordance with 17 USC 105,
copyright protection is not available for any work of the US Government.
.PP
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

.SH "SEE ALSO"
There is more information in the README file. 
.PP
Foremost was originally designed to imitate the functionality of CarvThis, 
a DOS program written by the Defense Computer Forensics Lab in in 1999.




================================================
FILE: foremost.conf
================================================
#
# Foremost configuration file
#-------------------------------------------------------------------------
# Note the foremost configuration file is provided to support formats which
# don't have built-in extraction functions.  If the format is built-in to foremost
# simply run foremost with -t <suffix> and provide the format you wish to extract. 
#
# The configuration file is used to control what types of files foremost
# searches for. A sample configuration file, foremost.conf, is included with
# this distribution. For each file type, the configuration file describes
# the file's extension, whether the header and footer are case sensitive,
# the maximum file size, and the header and footer for the file. The footer
# field is optional, but header, size, case sensitivity, and extension are
# not!
#
# Any line that begins with a '#' is considered a comment and ignored. Thus,
# to skip a file type just put a '#' at the beginning of that line
#

# Headers and footers are decoded before use. To specify a value in
# hexadecimal use \x[0-f][0-f], and for octal use \[0-3][0-7][0-7].  Spaces
# can be represented by \s. Example: "\x4F\123\I\sCCI" decodes to "OSI CCI".
#
# To match any single character (aka a wildcard) use a '?'. If you need to
# search for the '?' character, you will need to change the 'wildcard' line
# *and* every occurrence of the old wildcard character in the configuration
# file. Don't forget those hex and octal values! '?' is equal to 0x3f and
# \063.
#
# If you would like to extract files without an extension enter the value
# "NONE" in the extension column (note: you can change the value of this
# "no suffix" flag by setting the variable FOREMOST_NOEXTENSION_SUFFIX
# in foremost.h and recompiling).
#
# The ASCII option will extract all ASCII printable characters before and after 
# the keyword provided.
#
# The NEXT keyword after a footer instructs foremost to search forwards for data 
# that starts with the header provided and terminates or is followed by data in 
# the footer -- the footer data is not included in the output.  The data in the 
# footer, when used with the NEXT keyword effectively allows you to search for 
# data that you know for sure should not be in the output file.  This method for 
# example, lets you search for two 'starting' headers in a document that doesn't 
# have a good ending footer and you can't say exactly what the footer is, but 
# you know if you see another header, that should end the search and an output
# file should be written.

# To redefine the wildcard character, change the setting below and all
# occurances in the formost.conf file.
#
#wildcard  ?
#
#		case	size	header			footer
#extension   sensitive	
#
#---------------------------------------------------------------------
# EXAMPLE WITH NO SUFFIX
#---------------------------------------------------------------------
#
# Here is an example of how to use the no extension option. Any files 
# containing the string "FOREMOST" would be extracted to a file without 
# an extension (eg: 00000000,00000001)
#      NONE     y      1000     FOREMOST
#
#---------------------------------------------------------------------
# GRAPHICS FILES
#---------------------------------------------------------------------	
#
#
# AOL ART files
#	art	y	150000	\x4a\x47\x04\x0e	\xcf\xc7\xcb
#  	art	y 	150000	\x4a\x47\x03\x0e	\xd0\xcb\x00\x00
#
# GIF and JPG files (very common)
#	(NOTE THESE FORMATS HAVE BUILTIN EXTRACTION FUNCTION)
#	gif	y	155000000	\x47\x49\x46\x38\x37\x61	\x00\x3b
#  	gif	y 	155000000	\x47\x49\x46\x38\x39\x61	\x00\x00\x3b
#  	jpg	y	20000000	\xff\xd8\xff\xe0\x00\x10	\xff\xd9
#  	jpg	y	20000000	\xff\xd8\xff\xe1 \xff\xd9 
#  	jpg	y	20000000	\xff\xd8	\xff\xd9
#
# PNG   (used in web pages)
#	(NOTE THIS FORMAT HAS A BUILTIN EXTRACTION FUNCTION)
#  	png	y	200000	\x50\x4e\x47?	\xff\xfc\xfd\xfe
#
#
# BMP 	
#	(NOTE THIS FORMAT HAS A BUILTIN EXTRACTION FUNCTION)
#	bmp	y	100000	BM??\x00\x00\x00
#
# TIF
#  	tif	y	200000000	\x49\x49\x2a\x00
#
#---------------------------------------------------------------------	
# ANIMATION FILES
#---------------------------------------------------------------------	
#
# AVI (Windows animation and DiVX/MPEG-4 movies)
#	(NOTE THIS FORMAT HAS A BUILTIN EXTRACTION FUNCTION)
#  	avi	y	4000000 RIFF????AVI
#
# Apple Quicktime
#	(NOTE THIS FORMAT HAS A BUILTIN EXTRACTION FUNCTION)
#	mov	y	4000000	????????\x6d\x6f\x6f\x76
#	mov	y	4000000	????????\x6d\x64\x61\x74
#
# MPEG Video
#	mpg	y	4000000	mpg	eof
#	mpg	y	20000000 \x00\x00\x01\xba      \x00\x00\x01\xb9
#	mpg     y 	20000000 \x00\x00\x01\xb3 	\x00\x00\x01\xb7
#
# Macromedia Flash
#	fws	y	4000000	FWS
#
#---------------------------------------------------------------------	
# MICROSOFT OFFICE 
#---------------------------------------------------------------------	
#
# Word documents
#	(NOTE THIS FORMAT HAS A BUILTIN EXTRACTION FUNCTION)
#	doc	y	12500000  \xd0\xcf\x11\xe0\xa1\xb1
#
# Outlook files
#	pst	y	400000000 \x21\x42\x4e\xa5\x6f\xb5\xa6
#	ost	y	400000000 \x21\x42\x44\x4e
#
# Outlook Express
#	dbx	y	4000000	\xcf\xad\x12\xfe\xc5\xfd\x74\x6f
#	idx	y	4000000	\x4a\x4d\x46\x39
#	mbx	y	4000000	\x4a\x4d\x46\x36
#
#---------------------------------------------------------------------	
# WORDPERFECT
#---------------------------------------------------------------------
#
#	wpc	y	100000	?WPC
#
#---------------------------------------------------------------------	
# HTML		(NOTE THIS FORMAT HAS A BUILTIN EXTRACTION FUNCTION)
#---------------------------------------------------------------------	
#
#	htm	n	50000   <html			</html>
#
#---------------------------------------------------------------------	
# ADOBE PDF	(NOTE THIS FORMAT HAS A BUILTIN EXTRACTION FUNCTION)
#---------------------------------------------------------------------	
#
#	pdf	y	5000000	%PDF-  %EOF 
#
#
#---------------------------------------------------------------------	
# AOL (AMERICA ONLINE)
#---------------------------------------------------------------------	
#
# AOL Mailbox
#	mail	y	500000	 \x41\x4f\x4c\x56\x4d
#
#
#	
#---------------------------------------------------------------------	
# PGP (PRETTY GOOD PRIVACY)
#---------------------------------------------------------------------	
#
# PGP Disk Files
#	pgd	y	500000	\x50\x47\x50\x64\x4d\x41\x49\x4e\x60\x01
#
# Public Key Ring
#	pgp	y	100000	\x99\x00
# Security Ring
#	pgp	y	100000	\x95\x01
#	pgp	y	100000	\x95\x00
# Encrypted Data or ASCII armored keys
#	pgp	y	100000	\xa6\x00
# (there should be a trailer for this...)
#	txt	y	100000	-----BEGIN\040PGP
#
#
#---------------------------------------------------------------------	
# RPM (Linux package format)
#---------------------------------------------------------------------	
#	rpm	y	1000000	\xed\xab
#
#
#---------------------------------------------------------------------	
# SOUND FILES
#---------------------------------------------------------------------	
#	(NOTE THIS FORMAT HAS A BUILTIN EXTRACTION FUNCTION)
#	wav     y	200000	RIFF????WAVE
#
# Real Audio Files
#	ra	y	1000000	\x2e\x72\x61\xfd
#	ra	y	1000000	.RMF
#
#	asf     y       8000000	 \x30\x26\xB2\x75\x8E\x66\xCF\x11\xA6\xD9\x00\xAA\x00\x62\xCE\x6C
#
#	wmv     y       20000000 \x30\x26\xB2\x75\x8E\x66\xCF\x11\xA6\xD9\x00\xAA\x00\x62\xCE\x6C
#
#	wma     y       8000000  \x30\x26\xB2\x75    \x00\x00\x00\xFF
#
#	wma     y       8000000  \x30\x26\xB2\x75    \x52\x9A\x12\x46
#
#	mp3     y    	8000000 \xFF\xFB??\x44\x00\x00
#	mp3     y    	8000000 \x57\x41\x56\45            \x00\x00\xFF\
#	mp3     y    	8000000 \xFF\xFB\xD0\            \xD1\x35\x51\xCC\
#	mp3     y    	8000000 \x49\x44\x33\
#	mp3     y    	8000000 \x4C\x41\x4D\x45\
#---------------------------------------------------------------------	
# WINDOWS REGISTRY FILES
#---------------------------------------------------------------------	
# 
# Windows NT registry
#	dat	y	4000000	regf
# Windows 95 registry
#	dat	y	4000000	CREG
#
#    	lnk     y    	5000	\x4C\x00\x00\x00\x01\x14\x02\x00\x00\x00\x00\x00\xC0\x00\x00
#    	chm     y    	100000	\x49\x54\x53\x46\x03\x00\x00\x00\x60\x00\x00\x00\x01\x00\x00
#    	cookie  n    	4096    id=
#    	rdp     y    	4096	\xFF\xFE\x73\x00\x63\x00\x72\x00\x65\x00\x65\x00\x6E\x00\x20\x00\x6D
#
#---------------------------------------------------------------------	
# MISCELLANEOUS
#---------------------------------------------------------------------	
#	(NOTE THIS FORMAT HAS BUILTIN EXTRACTION FUNCTION)
#	zip	y	10000000	PK\x03\x04	\x3c\xac
#	(NOTE THIS FORMAT HAS BUILTIN EXTRACTION FUNCTION)
#	rar	y	10000000	Rar!
#
#	java	y	1000000	\xca\xfe\xba\xbe
#
#	cpp	y	20000	#include	#include	ASCII
#---------------------------------------------------------------------	
# ScanSoft PaperPort "Max" files
#---------------------------------------------------------------------	
#      max   y     1000000    \x56\x69\x47\x46\x6b\x1a\x00\x00\x00\x00   \x00\x00\x05\x80\x00\x00 
#---------------------------------------------------------------------	
# PINs Password Manager program
#---------------------------------------------------------------------	
#      pins  y     8000     \x50\x49\x4e\x53\x20\x34\x2e\x32\x30\x0d


================================================
FILE: helpers.c
================================================

	 /* MD5DEEP - helpers.c
 *
 * By Jesse Kornblum
 *
 * This is a work of the US Government. In accordance with 17 USC 105,
 * copyright protection is not available for any work of the US Government.
 *
 * This program is distributed in the hope that it will be useful, but
 * WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
 *
 */

#include "main.h"

/* Removes any newlines at the end of the string buf.
   Works for both *nix and Windows styles of newlines.
   Returns the new length of the string. */
unsigned int chop (char *buf)
	{

	/* Windows newlines are 0x0d 0x0a, *nix are 0x0a */
	unsigned int	len = strlen(buf);
	if (buf[len - 1] == 0x0a)
		{
		if (buf[len - 2] == 0x0d)
			{
			buf[len - 2] = buf[len - 1];
			}
		buf[len - 1] = buf[len];
		}
	return strlen(buf);
	}

char *units(unsigned int c)
{
	switch (c)
		{
		case 0:		return "B";
		case 1:		return "KB";
		case 2:		return "MB";
		case 3:		return "GB";
		case 4:		return "TB";
		case 5:		return "PB";
		case 6:		return "EB";
		/* Steinbach's Guideline for Systems Programming:
       Never test for an error condition you don't know how to handle.

       Granted, given that no existing system can handle anything 
       more than 18 exabytes, this shouldn't be an issue. But how do we
       communicate that 'this shouldn't happen' to the user? */
		default:	return "??";
		}
}

char *human_readable(off_t size, char *buffer)
{
	unsigned int	count = 0;
	while (size > 1024)
		{
		size /= 1024;
		++count;
		}

	/* The size will be, at most, 1023, and the units will be
     two characters no matter what. Thus, the maximum length of
     this string is six characters. e.g. strlen("1023 EB") = 6 */
	if (sizeof(off_t) == 4)
		{
		snprintf(buffer, 8, "%u %s", (unsigned int)size, units(count));
		}
	else if (sizeof(off_t) == 8)
		{
		snprintf(buffer, 8, "%llu %s", (u_int64_t) size, units(count));
		}

	return buffer;
}

char *current_time(void)
{
	time_t	now = time(NULL);
	char	*ascii_time = ctime(&now);
	chop(ascii_time);
	return ascii_time;
}

/* Shift the contents of a string so that the values after 'new_start'
   will now begin at location 'start' */
void shift_string(char *fn, int start, int new_start)
{
	if (start < 0 || start > strlen(fn) || new_start < 0 || new_start < start)
		return;

	while (new_start < strlen(fn))
		{
		fn[start] = fn[new_start];
		new_start++;
		start++;
		}

	fn[start] = 0;
}

void make_magic(void)
{
	printf("%s%s",
		   "\x53\x41\x4E\x20\x44\x49\x4D\x41\x53\x20\x48\x49\x47\x48\x20\x53\x43\x48\x4F\x4F\x4C\x20\x46\x4F\x4F\x54\x42\x41\x4C\x4C\x20\x52\x55\x4C\x45\x53\x21",
	   NEWLINE);
}

#if defined(__UNIX)

/* Return the size, in bytes of an open file stream. On error, return 0 */
	#if defined(__LINUX)

off_t find_file_size(FILE *f)
{
	off_t		num_sectors = 0;
	int			fd = fileno(f);
	struct stat sb;

	if (fstat(fd, &sb))
		{
		return 0;
		}

	if (S_ISREG(sb.st_mode) || S_ISDIR(sb.st_mode))
		return sb.st_size;
	else if (S_ISCHR(sb.st_mode) || S_ISBLK(sb.st_mode))
		{
		if (ioctl(fd, BLKGETSIZE, &num_sectors))
		{
		#if defined(__DEBUG)
			fprintf(stderr, "%s: ioctl call to BLKGETSIZE failed.%s", __progname, NEWLINE);
		#endif
		}
		else
			return (num_sectors * 512);
		}

	return 0;
}

	#elif defined(__MACOSX)

		#include <stdint.h>
		#include <sys/ioctl.h>
		#include <sys/disk.h>

off_t find_file_size(FILE *f)
{
		#ifdef DEBUG
	printf("	FIND MAC file size\n");
		#endif
	return 0;	/*FIX ME this function causes strange problems on MACOSX, so for now return 0*/
	struct stat info;
	off_t		total = 0;
	off_t		original = ftello(f);
	int			ok = TRUE, fd = fileno(f);

	/* I'd prefer not to use fstat as it will follow symbolic links. We don't
     follow symbolic links. That being said, all symbolic links *should*
     have been caught before we got here. */
	fstat(fd, &info);

	/* Block devices, like /dev/hda, don't return a normal filesize.
     If we are working with a block device, we have to ask the operating
     system to tell us the true size of the device. 
     
     The following only works on Linux as far as I know. If you know
     how to port this code to another operating system, please contact
     the current maintainer of this program! */
	if (S_ISBLK(info.st_mode))
		{
		daddr_t blocksize = 0;
		daddr_t blockcount = 0;

		/* Get the block size */
		if (ioctl(fd, DKIOCGETBLOCKSIZE, blocksize) < 0)
			{
			ok = FALSE;
		#if defined(__DEBUG)
			perror("DKIOCGETBLOCKSIZE failed");
		#endif
			}

		/* Get the number of blocks */
		if (ok)
			{
			if (ioctl(fd, DKIOCGETBLOCKCOUNT, blockcount) < 0)
			{
		#if defined(__DEBUG)
				perror("DKIOCGETBLOCKCOUNT failed");
		#endif
			}
			}

		total = blocksize * blockcount;

		}

	else
		{

		/* I don't know why, but if you don't initialize this value you'll
       get wildly innacurate results when you try to run this function */
		if ((fseeko(f, 0, SEEK_END)))
			return 0;
		total = ftello(f);
		if ((fseeko(f, original, SEEK_SET)))
			return 0;
		}

	return (total - original);
}

	#else

/* This is code for general UNIX systems 
   (e.g. NetBSD, FreeBSD, OpenBSD, etc) */
static off_t midpoint(off_t a, off_t b, long blksize)
{
	off_t	aprime = a / blksize;
	off_t	bprime = b / blksize;
	off_t	c, cprime;

	cprime = (bprime - aprime) / 2 + aprime;
	c = cprime * blksize;

	return c;
}

off_t find_dev_size(int fd, int blk_size)
{

	off_t	curr = 0, amount = 0;
	void	*buf;

	if (blk_size == 0)
		return 0;

	buf = malloc(blk_size);

	for (;;)
		{
		ssize_t nread;

		lseek(fd, curr, SEEK_SET);
		nread = read(fd, buf, blk_size);
		if (nread < blk_size)
			{
			if (nread <= 0)
				{
				if (curr == amount)
					{
					free(buf);
					lseek(fd, 0, SEEK_SET);
					return amount;
					}

				curr = midpoint(amount, curr, blk_size);
				}
			else
				{	/* 0 < nread < blk_size */
				free(buf);
				lseek(fd, 0, SEEK_SET);
				return amount + nread;
				}
			}
		else
			{
			amount = curr + blk_size;
			curr = amount * 2;
			}
		}

	free(buf);
	lseek(fd, 0, SEEK_SET);
	return amount;
}

off_t find_file_size(FILE *f)
{
	int			fd = fileno(f);
	struct stat sb;
	return 0;		/*FIX ME SOLARIS FILE SIZE CAUSES SEG FAULT, for now just return 0*/

	if (fstat(fd, &sb))
		return 0;

	if (S_ISREG(sb.st_mode) || S_ISDIR(sb.st_mode))
		return sb.st_size;
	else if (S_ISCHR(sb.st_mode) || S_ISBLK(sb.st_mode))
		return find_dev_size(fd, sb.st_blksize);

	return 0;
}

	#endif /* UNIX Flavors */
#endif /* ifdef __UNIX */

#if defined(__WIN32)
off_t find_file_size(FILE *f)
{
	off_t	total = 0, original = ftello(f);

	if ((fseeko(f, 0, SEEK_END)))
		return 0;

	total = ftello(f);
	if ((fseeko(f, original, SEEK_SET)))
		return 0;

	return total;
}

#endif /* ifdef __WIN32 */

void print_search_specs(f_state *s)
{
	int i = 0;
	int j = 0;
	printf("\nDUMPING BUILTIN SEARCH INFO\n\t");
	for (i = 0; i < s->num_builtin; i++)
		{

		printf("%s:\n\t footer_len:=%d, header_len:=%d, max_len:=%llu ",
			   search_spec[i].suffix,
			   search_spec[i].footer_len,
			   search_spec[i].header_len,
			   search_spec[i].max_len);
		printf("\n\t header:\t");
		printx(search_spec[i].header, 0, search_spec[i].header_len);
		printf("\t footer:\t");
		printx(search_spec[i].footer, 0, search_spec[i].footer_len);
		for (j = 0; j < search_spec[i].num_markers; j++)
			{
			printf("\tmarker: \t");
			printx(search_spec[i].markerlist[j].value, 0, search_spec[i].markerlist[j].len);
			}

		}

}

void print_stats(f_state *s)
{
	int i = 0;
	audit_msg(s, "\n%d FILES EXTRACTED\n\t", s->fileswritten);
	for (i = 0; i < s->num_builtin; i++)
		{

		if (search_spec[i].found != 0)
			{
			if (search_spec[i].type == OLE)
				search_spec[i].suffix = "ole";
			else if (search_spec[i].type == RIFF)
				search_spec[i].suffix = "rif";
			else if (search_spec[i].type == ZIP)
				search_spec[i].suffix = "zip";
			audit_msg(s, "%s:= %d", search_spec[i].suffix, search_spec[i].found);
			}
		}
}

int charactersMatch(char a, char b, int caseSensitive)
{

	//if(a==b) return 1;
	if (a == wildcard || a == b)
		return 1;
	if (caseSensitive || (a < 'A' || a > 'z' || b < 'A' || b > 'z'))
		return 0;

	/* This line is equivalent to (abs(a-b)) == 'a' - 'A' */
	return (abs(a - b) == 32);
}

int memwildcardcmp(const void *s1, const void *s2, size_t n, int caseSensitive)
{
	if (n != 0)
		{
		register const unsigned char	*p1 = s1, *p2 = s2;
		do
			{
			if (!charactersMatch(*p1++, *p2++, caseSensitive))
				return (*--p1 -*--p2);
			}
		while (--n != 0);
		}

	return (0);
}

void printx(unsigned char *buf, int start, int end)
{
	int i = 0;
	for (i = start; i < end; i++)
		{
		printf("%x ", buf[i]);
		}

	printf("\n");
}

char *reverse_string(char *to, char *from, int startLocation, int endLocation)
{
	int i = endLocation;
	int j = 0;
	for (j = startLocation; j < endLocation; j++)
		{
		i--;
		to[j] = from[i];
		}

	return to;
}

unsigned short htos(unsigned char s[], int endian)
{

	unsigned char	*bytes = (unsigned char *)malloc(sizeof(unsigned short) * sizeof(char));
	unsigned short	size = 0;
	char			temp = 'x';
	bytes = memcpy(bytes, s, sizeof(short));

	if (endian == FOREMOST_BIG_ENDIAN && BYTE_ORDER == LITTLE_ENDIAN)
		{

		//printf("switching the byte order\n");
		temp = bytes[0];
		bytes[0] = bytes[1];
		bytes[1] = temp;

		}
	else if (endian == FOREMOST_LITTLE_ENDIAN && BYTE_ORDER == BIG_ENDIAN)
		{
		temp = bytes[0];
		bytes[0] = bytes[1];
		bytes[1] = temp;
		}

	size = *((unsigned short *)bytes);
	free(bytes);
	return size;
}

unsigned int htoi(unsigned char s[], int endian)
{

	int				length = sizeof(int);
	unsigned char	*bytes = (unsigned char *)malloc(length * sizeof(char));
	unsigned int	size = 0;

	bytes = memcpy(bytes, s, length);

	if (endian == FOREMOST_BIG_ENDIAN && BYTE_ORDER == LITTLE_ENDIAN)
		{

		bytes = (unsigned char *)reverse_string((char *)bytes, (char *)s, 0, length);
		}
	else if (endian == FOREMOST_LITTLE_ENDIAN && BYTE_ORDER == BIG_ENDIAN)
		{

		bytes = (unsigned char *)reverse_string((char *)bytes, (char *)s, 0, length);
		}

	size = *((unsigned int *)bytes);

	free(bytes);
	return size;
}

u_int64_t htoll(unsigned char s[], int endian)
{
	int				length = sizeof(u_int64_t);
	unsigned char	*bytes = (unsigned char *)malloc(length * sizeof(char));
	u_int64_t	size = 0;
	bytes = memcpy(bytes, s, length);
#ifdef DEBUG
	printf("htoll len=%d endian=%d\n",length,endian);
#endif	
	if (endian == FOREMOST_BIG_ENDIAN && BYTE_ORDER == LITTLE_ENDIAN)
		{
#ifdef DEBUG
		printf("reverse0\n");
#endif
		bytes = (unsigned char *)reverse_string((char *)bytes, (char *)s, 0, length);
		}
	else if (endian == FOREMOST_LITTLE_ENDIAN && BYTE_ORDER == BIG_ENDIAN)
		{
#ifdef DEBUG
	printf("reverse1\n");
#endif
		bytes = (unsigned char *)reverse_string((char *)bytes, (char *)s, 0, length);
		}

	size = *((u_int64_t *)bytes);
#ifdef DEBUG
	printf("htoll size=%llu\n",size);
	printx(bytes,0,length);
#endif	
	

	free(bytes);
	return size;
}

/* display Position: Tell the user how far through the infile we are */
int displayPosition(f_state *s, f_info *i, u_int64_t pos)
{

	int			percentDone = 0;
	static int	last_val = 0;
	int			count;
	int			flag = FALSE;
	int			factor = 4;
	int			multiplier = 25;
	int			number_of_stars = 0;
	char		buffer[256];
	long double skip = s->skip * s->block_size;

	long double tot_bytes = (long double)((i->total_bytes));
	tot_bytes -= skip;
	if (i->total_bytes > 0)
		{
		percentDone = (((long double)pos) / ((long double)tot_bytes)) * 100;
		if (percentDone != last_val)
			flag = TRUE;
		last_val = percentDone;
		}
	else
		{
		flag = TRUE;
		factor = 4;
		multiplier = 25;
		}

	if (flag)
		{
		number_of_stars = percentDone / factor;

		printf("%s: |", s->input_file);
		for (count = 0; count < number_of_stars; count++)
			{
			printf("*");
			}

		for (count = 0; count < (multiplier - number_of_stars); count++)
			{
			printf(" ");
			}

		if (i->total_bytes > 0)
			{
			printf("|\t %d%% done\n", percentDone);
			}
		else
			{
			printf("|\t %s done\n", human_readable(pos, buffer));

			}
		}

	if (percentDone == 100)
		{
		last_val = 0;
		}

	return TRUE;
}


================================================
FILE: main.c
================================================



/* FOREMOST
 *
 * By Jesse Kornblum and Kris Kendall
 * 
 * This is a work of the US Government. In accordance with 17 USC 105,
 * copyright protection is not available for any work of the US Government.
 *
 * This program is distributed in the hope that it will be useful, but
 * WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
 *
 *
 */
#include "main.h"

#ifdef __WIN32

/* Allows us to open standard input in binary mode by default 
   See http://gnuwin32.sourceforge.net/compile.html for more */
int _CRT_fmode = _O_BINARY;
#endif

void catch_alarm(int signum)
{
	signal_caught = signum;
	signal(signum, catch_alarm);
}

void register_signal_handler(void)
{
	signal_caught = 0;

	if (signal(SIGINT, catch_alarm) == SIG_IGN)
		signal(SIGINT, SIG_IGN);
	if (signal(SIGTERM, catch_alarm) == SIG_IGN)
		signal(SIGTERM, SIG_IGN);

#ifndef __WIN32

	/* Note: I haven't found a way to get notified of
     console resize events in Win32.  Right now the statusbar
     will be too long or too short if the user decides to resize
     their console window while foremost runs.. */

	/* RBF - Handle TTY events  */

	// The function setttywidth is in the old helpers.c
	// signal(SIGWINCH, setttywidth);
#endif
}

void try_msg(void)
{
	fprintf(stderr, "Try `%s -h` for more information.%s", __progname, NEWLINE);
}

/* The usage function should, at most, display 22 lines of text to fit
   on a single screen */
void usage(void)
{
	fprintf(stderr, "%s version %s by %s.%s", __progname, VERSION, AUTHOR, NEWLINE);
	fprintf(stderr,
			"%s %s [-v|-V|-h|-T|-Q|-q|-a|-w-d] [-t <type>] [-s <blocks>] [-k <size>] \n\t[-b <size>] [-c <file>] [-o <dir>] [-i <file] %s%s",
		CMD_PROMPT,
			__progname,
			NEWLINE,
			NEWLINE);
	fprintf(stderr, "-V  - display copyright information and exit%s", NEWLINE);
	fprintf(stderr, "-t  - specify file type.  (-t jpeg,pdf ...) %s", NEWLINE);
	fprintf(stderr, "-d  - turn on indirect block detection (for UNIX file-systems) %s", NEWLINE);
	fprintf(stderr, "-i  - specify input file (default is stdin) %s", NEWLINE);
	fprintf(stderr,
			"-a  - Write all headers, perform no error detection (corrupted files) %s",
			NEWLINE);
	fprintf(stderr,
			"-w  - Only write the audit file, do not write any detected files to the disk %s",
			NEWLINE);
	fprintf(stderr,
			"-o  - set output directory (defaults to %s)%s",
			DEFAULT_OUTPUT_DIRECTORY,
			NEWLINE);
	fprintf(stderr,
			"-c  - set configuration file to use (defaults to %s)%s",
			DEFAULT_CONFIG_FILE,
			NEWLINE);
	fprintf(stderr,
			"-q  - enables quick mode. Search are performed on 512 byte boundaries.%s",
			NEWLINE);
	fprintf(stderr, "-Q  - enables quiet mode. Suppress output messages. %s", NEWLINE);

	/* RBF - What should verbose mode be? */
	fprintf(stderr, "-v  - verbose mode. Logs all messages to screen%s", NEWLINE);
}

void process_command_line(int argc, char **argv, f_state *s)
{

	int		i;
	char	*ptr1, *ptr2;

	while ((i = getopt(argc, argv, "o:b:c:t:s:i:k:hqmQTadvVw")) != -1)
		{
		switch (i)
			{

			case 'v':
				set_mode(s, mode_verbose);
				break;

			case 'd':
				set_mode(s, mode_ind_blk);
				break;

			case 'w':
				set_mode(s, mode_write_audit);	/*Only write audit*/
				break;

			case 'a':
				set_mode(s, mode_write_all);	/*Write all headers*/
				break;

			case 'b':
				set_block(s, atoi(optarg));
				break;

			case 'o':
				set_output_directory(s, optarg);
				break;

			case 'q':
				set_mode(s, mode_quick);
				break;

			case 'Q':
				set_mode(s, mode_quiet);
				break;

			case 'c':
				set_config_file(s, optarg);
				break;

			case 'm':
				set_mode(s, mode_multi_file);

			case 'k':
				set_chunk(s, atoi(optarg));
				break;

			case 's':
				set_skip(s, atoi(optarg));
				break;

			case 'i':
				set_input_file(s, optarg);
				break;

			case 'T':
				s->time_stamp = TRUE;
				break;

			case 't':

				/*See if we have multiple file types to define*/
				ptr1 = ptr2 = optarg;
				while (1)
					{
					if (!*ptr2)
						{
						if (!set_search_def(s, ptr1, 0))
							{
							usage();
							exit(EXIT_SUCCESS);
							}
						break;
						}

					if (*ptr2 == ',')
						{
						*ptr2 = '\0';
						if (!set_search_def(s, ptr1, 0))
							{
							usage();
							exit(EXIT_SUCCESS);
							}

						*ptr2++ = ',';
						ptr1 = ptr2;
						}
					else
						{
						ptr2++;
						}
					}
				break;

			case 'h':
				usage();
				exit(EXIT_SUCCESS);

			case 'V':
				printf("%s%s", VERSION, NEWLINE);

				/* We could just say printf(COPYRIGHT), but that's a good way
	 to introduce a format string vulnerability. Better to always
	 use good programming practice... */
				printf("%s", COPYRIGHT);
				exit(EXIT_SUCCESS);

			default:
				try_msg();
				exit(EXIT_FAILURE);

			}

		}

#ifdef __DEBUG
	dump_state(s);
#endif

}

int main(int argc, char **argv)
{

	FILE	*testFile = NULL;
	f_state *s = (f_state *)malloc(sizeof(f_state));
	int		input_files = 0;
	char	**temp = argv;
	DIR* 	dir;

#ifndef __GLIBC__
	__progname = basename(argv[0]);
#endif

	/*Initialize the global state struct*/
	if (initialize_state(s, argc, argv))
		fatal_error(s, "Unable to initialize state");

	register_signal_handler();
	process_command_line(argc, argv, s);

	load_config_file(s);

	if (s->num_builtin == 0)
		{

		/*Nothing specified via the command line or the conf
	file so default to all builtin search types*/
		set_search_def(s, "all", 0);
		}
	
	if (create_output_directory(s))
		fatal_error(s, "Unable to open output directory");	

	if (!get_mode(s, mode_write_audit))
		{
		create_sub_dirs(s);
		}

	if (open_audit_file(s))
		fatal_error(s, "Can't open audit file");

	/* Scan for valid files to open */
	while (*argv != NULL)
	{
		if(strcmp(*argv,"-c")==0)
		{
			/*jump past the conf file so we don't process it.*/
			argv+=2;
		}
		testFile = fopen(*argv, "rb");
		if (testFile)
		{
			fclose(testFile);
			dir = opendir(*argv);
			
			if(!strstr(s->config_file,*argv)!=0 && !dir)
			{
				input_files++;
			}
			
			if(dir) closedir(dir);		
		}

		++argv;
	}

	argv = temp;
	if (input_files > 1)
		{
		set_mode(s, mode_multi_file);
		}

	++argv;
	while (*argv != NULL)
		{
		testFile = fopen(*argv, "rb");

		if (testFile)
			{
				fclose(testFile);
				dir = opendir(*argv);
				if(!strstr(s->config_file,*argv)!=0 && !dir)
				{
					set_input_file(s, *argv);
					process_file(s);
				}
				if(dir) closedir(dir);	
			}

		++argv;
		}

	if (input_files == 0)
		{

		//printf("using stdin\n");
		process_stdin(s);
		}

	print_stats(s);

	/*Lets try to clean up some of the extra sub_dirs*/
	cleanup_output(s);

	if (close_audit_file(s))
		{

		/* Hells bells. This is bad, but really, what can we do about it? 
       Let's just report the error and try to get out of here! */
		print_error(s, AUDIT_FILE_NAME, "Error closing audit file");
		}

	free_state(s);
	free(s);
	return EXIT_SUCCESS;
}


================================================
FILE: main.h
================================================

/* FOREMOST
 *
 * By Jesse Kornblum
 *
 * This is a work of the US Government. In accordance with 17 USC 105,
 * copyright protection is not available for any work of the US Government.
 *
 * This program is distributed in the hope that it will be useful, but
 * WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
 *
 */
 
//#define DEBUG 1
   
#ifndef __FOREMOST_H
#define __FOREMOST_H

/* Version information is defined in the Makefile */

#define AUTHOR      "Jesse Kornblum, Kris Kendall, and Nick Mikus"

/* We use \r\n for newlines as this has to work on Win32. It's redundant for
   everybody else, but shouldn't cause any harm. */
#define COPYRIGHT   "This program is a work of the US Government. "\
"In accordance with 17 USC 105,\r\n"\
"copyright protection is not available for any work of the US Government.\r\n"\
"This is free software; see the source for copying conditions. There is NO\r\n"\
"warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.\r\n"

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <limits.h>
#include <dirent.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>
#include <time.h>
#include <math.h>
#include <ctype.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <signal.h>

/* For va_arg */
#include <stdarg.h>

#ifdef __LINUX
#include <sys/ioctl.h>
#include <sys/mount.h>
#define   u_int64_t   unsigned long long
#endif 


#ifdef __LINUX

#ifndef __USE_BSD
#define __USE_BSD
#endif
#include <endian.h>

#elif defined (__SOLARIS)

#define BIG_ENDIAN    4321
#define LITTLE_ENDIAN 1234

#include <sys/isa_defs.h>
#ifdef _BIG_ENDIAN       
#define BYTE_ORDER BIG_ENDIAN
#else
#define BYTE_ORDER LITTLE_ENDIAN
#endif

#elif defined (__WIN32)
#include <sys/param.h>

#elif defined (__MACOSX)
#include <machine/endian.h>
#define __U16_TYPE unsigned short
#endif


#define TRUE   1
#define FALSE  0
#define ONE_MEGABYTE  1048576


/* RBF - Do we need these type definitions? */ 
#ifdef __SOLARIS
#define   u_int32_t   unsigned int
#define   u_int64_t   unsigned long long
#endif 


/* The only time we're *not* on a UNIX system is when we're on Windows */
#ifndef __WIN32
#ifndef __UNIX
#define __UNIX
#endif  /* ifndef __UNIX */
#endif  /* ifndef __WIN32 */


#ifdef __UNIX

#ifndef __U16_TYPE
#define __U16_TYPE unsigned short
#endif

#include <libgen.h>

#ifndef BYTE_ORDER 

#define BIG_ENDIAN    4321
#define LITTLE_ENDIAN 1234

#define BYTE_ORDER LITTLE_ENDIAN

#endif
/* This avoids compiler warnings on older systems */
int fseeko(FILE *stream, off_t offset, int whence);
off_t ftello(FILE *stream);


#define CMD_PROMPT "$"
#define DIR_SEPARATOR   '/'
#define NEWLINE "\n"
#define LINE_LENGTH 74
#define BLANK_LINE \
"                                                                          "

#endif /* #ifdef __UNIX */

/* This allows us to open standard input in binary mode by default 
   See http://gnuwin32.sourceforge.net/compile.html for more */
#include <fcntl.h>

/* Code specific to Microsoft Windows */
#ifdef __WIN32

/* By default, Windows uses long for off_t. This won't do. We
   need an unsigned number at minimum. Windows doesn't have 64 bit
   numbers though. */
#ifdef off_t
#undef off_t
#endif
#define off_t unsigned long

#define CMD_PROMPT "c:\\>"
#define  DIR_SEPARATOR   '\\'
#define NEWLINE "\r\n"
#define LINE_LENGTH 72
#define BLANK_LINE \
"                                                                        "


/* It would be nice to use 64-bit file lengths in Windows */
#define ftello   ftell
#define fseeko   fseek

#ifndef __CYGWIN
#define  snprintf         _snprintf
#endif

#define  u_int32_t        unsigned long

/* We create macros for the Windows equivalent UNIX functions.
   No worries about lstat to stat; Windows doesn't have symbolic links */
#define lstat(A,B)      stat(A,B)

#define u_int64_t unsigned __int64

#ifndef __CYGWIN
	#define realpath(A,B)   _fullpath(B,A,PATH_MAX) 
#endif
/* Not used in md5deep anymore, but left in here in case I 
   ever need it again. Win32 documentation searches are evil.
   int asprintf(char **strp, const char *fmt, ...);
*/

char *basename(char *a);
extern char *optarg;
extern int optind;
int getopt(int argc, char *const argv[], const char *optstring);

#endif   /* ifdef _WIN32 */


/* On non-glibc systems we have to manually set the __progname variable */
#ifdef __GLIBC__
extern char *__progname;
#else
char *__progname;
#endif /* ifdef __GLIBC__ */

/* -----------------------------------------------------------------
   Program Defaults
   ----------------------------------------------------------------- */
#define MAX_STRING_LENGTH   1024
#define COMMENT_LENGTH   64

/* Modes refer to options that can be set by the user. */

#define mode_none                0
#define mode_verbose          1<<1
#define mode_quiet            1<<2
#define mode_ind_blk          1<<3
#define mode_quick            1<<4
#define mode_write_all        1<<5
#define mode_write_audit      1<<6
#define mode_multi_file	      1<<7

#define MAX_NEEDLES                   254
#define NUM_SEARCH_SPEC_ELEMENTS        6
#define MAX_SUFFIX_LENGTH               8
#define MAX_FILE_TYPES                100
#define FOREMOST_NOEXTENSION_SUFFIX "NONE"
/* Modes 3 to 31 are reserved for future use. We shouldn't use
   modes higher than 31 as Win32 can't go that high. */

#define DEFAULT_MODE              mode_none
#define DEFAULT_CONFIG_FILE       "foremost.conf"
#define DEFAULT_OUTPUT_DIRECTORY  "output"
#define AUDIT_FILE_NAME           "audit.txt"
#define FOREMOST_DIVIDER          "------------------------------------------------------------------"

#define JPEG 0
#define GIF 1
#define BMP 2
#define MPG 3
#define PDF 4
#define DOC 5
#define AVI 6
#define WMV 7
#define HTM 8
#define ZIP 9
#define MOV 10
#define XLS 11
#define PPT 12
#define WPD 13
#define CPP 14
#define OLE 15
#define GZIP 16
#define RIFF 17
#define WAV 18
#define VJPEG 19
#define SXW 20
#define SXC 21
#define SXI 22
#define CONF 23
#define PNG 24
#define RAR 25
#define EXE 26
#define ELF 27
#define REG 28
#define DOCX 29
#define XLSX 30
#define PPTX 31
#define MP4 32


#define KILOBYTE                  1024
#define MEGABYTE                  1024 * KILOBYTE
#define GIGABYTE                  1024 * MEGABYTE
#define TERABYTE                  1024 * GIGABYTE
#define PETABYTE                  1024 * TERABYTE
#define EXABYTE                   1024 * PETABYTE

#define UNITS_BYTES                     0
#define UNITS_KILOB                     1
#define UNITS_MEGAB                     2
#define UNITS_GIGAB                     3
#define UNITS_TERAB                     4
#define UNITS_PETAB                     5
#define UNITS_EXAB                      6

#define SEARCHTYPE_FORWARD      0
#define SEARCHTYPE_REVERSE      1
#define SEARCHTYPE_FORWARD_NEXT 2
#define SEARCHTYPE_ASCII        3

#define FOREMOST_BIG_ENDIAN 0
#define FOREMOST_LITTLE_ENDIAN 1
/*DEFAULT CHUNK SIZE In MB*/
#define CHUNK_SIZE 100 


/* Wildcard is a global variable because it's used by very simple
   functions that don't need the whole state passed to them */

/* -----------------------------------------------------------------
   State Variable and Global Variables
   ----------------------------------------------------------------- */
char wildcard;
typedef struct f_state 
{
  off_t mode;
  char *config_file;
  char *input_file;
  char *output_directory;
  char *start_time;
  char *invocation;
  char *audit_file_name;
  FILE *audit_file;
  int audit_file_open;
  int num_builtin;
  int chunk_size; /*IN MB*/
  int fileswritten;
  int block_size;
  int skip;
  
  int time_stamp;
} f_state;

typedef struct marker
{
    unsigned char* value;
    int len;
    size_t marker_bm_table[UCHAR_MAX+1];
}marker;

typedef struct s_spec
{
    char* suffix;
    int type;
    u_int64_t max_len;
    unsigned char* header;
    unsigned int header_len;
    size_t header_bm_table[UCHAR_MAX+1];

    unsigned char* footer;
    unsigned int footer_len;
    size_t footer_bm_table[UCHAR_MAX+1];
    marker markerlist[5];
    int num_markers;
    int searchtype;                               

    int case_sen;
    
    int found;
    
    char comment[MAX_STRING_LENGTH];/*Used for audit*/
    int written; /*used for -a mode*/
}s_spec;

s_spec search_spec[50];  /*ARRAY OF BUILTIN SEARCH TYPES*/

typedef struct f_info {
  char *file_name;
  off_t total_bytes;

  /* We never use the total number of bytes in a file, 
     only the number of megabytes when we display a time estimate */
  off_t total_megs;
  off_t bytes_read;

#ifdef __WIN32
  /* Win32 is a 32-bit operating system and can't handle file sizes
     larger than 4GB. We use this to keep track of overflows */
  off_t last_read;
  off_t overflow_count;
#endif

  FILE *handle;
  int is_stdin;
} f_info;

/* Set if the user hits ctrl-c */
int signal_caught;

/* -----------------------------------------------------------------
   Function definitions
   ----------------------------------------------------------------- */

/* State functions */

int initialize_state(f_state *s, int argc, char **argv);
void free_state(f_state *s);

char *get_invocation(f_state *s);
char *get_start_time(f_state *s);

int set_config_file(f_state *s, char *fn);
char* get_config_file(f_state *s);

int set_output_directory(f_state *s, char *fn);
char* get_output_directory(f_state *s);

void set_audit_file_open(f_state *s);
int get_audit_file_open(f_state *s);

void set_mode(f_state *s, off_t new_mode);
int get_mode(f_state *s, off_t check_mode);

int set_search_def(f_state *s,char* ft,u_int64_t max_file_size);
void get_search_def(f_state s);

void set_input_file(f_state *s,char* filename);
void get_input_file(f_state *s);

void set_chunk(f_state *s, int size);

void init_bm_table(unsigned char *needle, size_t table[UCHAR_MAX + 1], size_t len, int casesensitive,int searchtype);

void set_skip(f_state *s, int size);
void set_block(f_state *s, int size);


#ifdef __DEBUG
void dump_state(f_state *s);
#endif

/* The audit file */
int open_audit_file(f_state *s);
void audit_msg(f_state *s, char *format, ...);
int close_audit_file(f_state *s);


/* Set up our output directory */
int create_output_directory(f_state *s);
int write_to_disk(f_state *s,s_spec * needle,u_int64_t len,unsigned char* buf,  u_int64_t t_offset);
int create_sub_dirs(f_state *s);
void cleanup_output(f_state *s);

/* Configuration Files */
int load_config_file(f_state *s);


/* Helper functions */
char *current_time(void);
off_t find_file_size(FILE *f);
char *human_readable(off_t size, char *buffer);
char *units(unsigned int c);
unsigned int chop(char *buf);
void print_search_specs(f_state *s);
int memwildcardcmp(const void *s1, const void *s2,size_t n,int caseSensitive);
int charactersMatch(char a, char b, int caseSensitive);
void printx(unsigned char* buf,int start, int end);
unsigned short htos(unsigned char s[],int endian);
unsigned int htoi(unsigned char s[],int endian);
u_int64_t htoll(unsigned char s[],int endian);
int displayPosition(f_state* s,f_info* i,u_int64_t pos);


/* Interface functions 
   These functions stay the same regardless if we're using a
   command line interface or a GUI */
void fatal_error(f_state *s, char *msg);
void print_error(f_state *s, char *fn, char *msg);
void print_message(f_state *s, char *format, va_list argp);
void print_stats(f_state *s);

/* Engine */
int process_file(f_state *s);
int process_stdin(f_state *s);
unsigned char *bm_search(unsigned char *needle, size_t needle_len,unsigned char *haystack, size_t haystack_len,
	size_t table[UCHAR_MAX + 1], int case_sen,int searchtype);
unsigned char *bm_search_skipn(unsigned char *needle, size_t needle_len,unsigned char *haystack, size_t haystack_len,
	size_t table[UCHAR_MAX + 1], int casesensitive,int searchtype, int start_pos) ;	
#endif /* __FOREMOST_H */

/* BUILTIN */
unsigned char* extract_file(f_state *s,  u_int64_t c_offset,unsigned char *foundat,  u_int64_t buflen, s_spec * needle, u_int64_t f_offset);







================================================
FILE: ole.h
================================================
#define TRUE			1
#define FALSE			0
#define SPECIAL_BLOCK	- 3
#define END_OF_CHAIN	- 2
#define UNUSED			- 1

#define NO_ENTRY		0
#define STORAGE			1
#define STREAM			2
#define ROOT			5
#define SHORT_BLOCK		3

#define FAT_START		0x4c
#define OUR_BLK_SIZE	512
#define DIRS_PER_BLK	4
#ifndef __CYGWIN
	#define MIN(x, y)	((x) < (y) ? (x) : (y))
#endif

#include <stdarg.h>
#include <string.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>
#include <stdlib.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <ctype.h>

struct OLE_HDR
{
	char			magic[8];				/*0*/
	char			clsid[16];				/*8*/
       __U16_TYPE      uMinorVersion;                  /*24*/
       __U16_TYPE      uDllVersion;                    /*26*/
       __U16_TYPE      uByteOrder;                             /*28*/
       __U16_TYPE      uSectorShift;                   /*30*/
       __U16_TYPE      uMiniSectorShift;               /*32*/
       __U16_TYPE      reserved;                               /*34*/
       u_int32_t       reserved1;                              /*36*/
       u_int32_t       reserved2;                              /*40*/
       u_int32_t       num_FAT_blocks;                 /*44*/
       u_int32_t       root_start_block;               /*48*/
       u_int32_t       dfsignature;                    /*52*/
       u_int32_t       miniSectorCutoff;               /*56*/
       u_int32_t       dir_flag;                               /*60 first sec in the mini fat chain*/
       u_int32_t       csectMiniFat;                   /*64 number of sectors in the minifat */
       u_int32_t       FAT_next_block;                 /*68*/
       u_int32_t       num_extra_FAT_blocks;   /*72*/
	/* FAT block list starts here !! first 109 entries  */
};

struct OLE_DIR
{
	char			name[64];
	unsigned short	namsiz;
	char			type;
	char			bflags;					//0 or 1
	unsigned long	prev_dirent;
	unsigned long	next_dirent;
	unsigned long	dir_dirent;
	char			clsid[16];
	unsigned long	userFlags;
	int				secs1;
	int				days1;
	int				secs2;
	int				days2;
	unsigned long	start_block;			//starting SECT of stream
	unsigned long	size;
	short			reserved;				//must be 0
};

struct DIRECTORY
{
	char	name[64];
	int		type;
	int		level;
	int		start_block;
	int		size;
	int		next;
	int		prev;
	int		dir;
	int		s1;
	int		s2;
	int		d1;
	int		d2;
}
*dirlist, *dl;

int				get_dir_block(unsigned char *fd, int blknum, int buffersize);
int				get_dir_info(unsigned char *src);
void			extract_stream(char *fd, int blknum, int size);
void			dump_header(struct OLE_HDR *h);
int				dump_dirent(int which_one);
int				get_block(unsigned char *fd, int blknum, unsigned char *dest, long long int buffersize);
int				get_FAT_block(unsigned char *fd, int blknum, int *dest, int buffersize);
int				reorder_dirlist(struct DIRECTORY *dir, int level);

unsigned char	*get_ole_block(unsigned char *fd, int blknum, unsigned long long buffersize);
struct OLE_HDR	*reverseBlock(struct OLE_HDR *dest, struct OLE_HDR *h);

void			dump_ole_header(struct OLE_HDR *h);
void			*Malloc(size_t bytes);
void			die(char *fmt, void *arg);
void			init_ole();


================================================
FILE: state.c
================================================


#include "main.h"

int initialize_state (f_state * s, int argc, char **argv)
	{
	char	**argv_copy = argv;

	/* The routines in current_time return statically allocated memory.
     We strdup the result so that we don't accidently free() the wrong
     thing later on. */
	s->start_time = strdup(current_time());
	wildcard = '?';
	s->audit_file_open = FALSE;
	s->mode = DEFAULT_MODE;
	s->input_file = NULL;
	s->fileswritten = 0;
	s->block_size = 512;

	/* We use the setter fuctions here to call realpath */
	set_config_file(s, DEFAULT_CONFIG_FILE);
	set_output_directory(s, DEFAULT_OUTPUT_DIRECTORY);

	s->invocation = (char *)malloc(sizeof(char) * MAX_STRING_LENGTH);
	s->invocation[0] = 0;
	s->chunk_size = CHUNK_SIZE;
	s->num_builtin = 0;
	s->skip = 0;
	s->time_stamp = FALSE;
	do
		{
		strncat(s->invocation, *argv_copy, MAX_STRING_LENGTH - strlen(s->invocation));
		strncat(s->invocation, " ", MAX_STRING_LENGTH - strlen(s->invocation));
		++argv_copy;
		}
	while (*argv_copy);

	return FALSE;
	}

void free_state(f_state *s)
{
	free(s->start_time);
	free(s->output_directory);
	free(s->config_file);
}

int get_audit_file_open(f_state *s)
{
	return (s->audit_file_open);
}

char *get_invocation(f_state *s)
{
	return (s->invocation);
}

char *get_start_time(f_state *s)
{
	return (s->start_time);
}

char *get_config_file(f_state *s)
{
	return (s->config_file);
}

int set_config_file(f_state *s, char *fn)
{
	char	temp[PATH_MAX];

	/* If the configuration file doesn't exist, this realpath will return
     NULL. We don't error check here as the user may specify a file
     that doesn't currently exist */
	realpath(fn, temp);

	/* RBF - Does this create a memory leak? What happens to the old value? */
	s->config_file = strdup(temp);
	return FALSE;
}

char *get_output_directory(f_state *s)
{
	return (s->output_directory);
}

int set_output_directory(f_state *s, char *fn)
{
	char	temp[PATH_MAX];
  int 	fullpathlen=0;
	/* We don't error check here as it's quite possible that the
     output directory doesn't exist yet. If it doesn't, realpath
     resolves the path correctly, but still returns NULL. */
  //strncpy(s->output_directory,fn,PATH_MAX);
  
	realpath(fn, temp);
	fullpathlen=strlen(temp);

	if(fullpathlen!=0)
	{
		s->output_directory = strdup(temp);
	}
	else
	{
		/*Realpath failed just use cwd*/
		s->output_directory = strdup(fn);
	}
	return FALSE;
}

int get_mode(f_state *s, off_t check_mode)
{
	return (s->mode & check_mode);
}

void set_mode(f_state *s, off_t new_mode)
{
	s->mode |= new_mode;
}

void set_chunk(f_state *s, int size)
{
	s->chunk_size = size;
}

void set_skip(f_state *s, int size)
{
	s->skip = size;
}

void set_block(f_state *s, int size)
{
	s->block_size = size;
}

void write_audit_header(f_state *s)
{
	audit_msg(s, "Foremost version %s by %s", VERSION, AUTHOR);
	audit_msg(s, "Audit File");
	audit_msg(s, "");
	audit_msg(s, "Foremost started at %s", get_start_time(s));
	audit_msg(s, "Invocation: %s", get_invocation(s));
	audit_msg(s, "Output directory: %s", get_output_directory(s));
	audit_msg(s, "Configuration file: %s", get_config_file(s));
}

int open_audit_file(f_state *s)
{
	char	fn[MAX_STRING_LENGTH];

	snprintf(fn,
			 MAX_STRING_LENGTH,
			 "%s%c%s",
			 get_output_directory(s),
			 DIR_SEPARATOR,
			 AUDIT_FILE_NAME);

	if ((s->audit_file = fopen(fn, "w")) == NULL)
		{
		print_error(s, fn, strerror(errno));
		fatal_error(s, "Can't open audit file");
		}

	s->audit_file_open = TRUE;
	write_audit_header(s);

	return FALSE;
}

int close_audit_file(f_state *s)
{
	audit_msg(s, FOREMOST_DIVIDER);
	audit_msg(s, "");
	audit_msg(s, "Foremost finished at %s", current_time());

	if (fclose(s->audit_file))
		{
		print_error(s, AUDIT_FILE_NAME, strerror(errno));
		return TRUE;
		}

	return FALSE;
}

void audit_msg(f_state *s, char *format, ...)
{
	va_list argp;
	va_start(argp, format);

	if (get_mode(s, mode_verbose)) {
		print_message(s, format, argp);
		va_end(argp);
		va_start(argp, format);
	}

	vfprintf(s->audit_file, format, argp);
	va_end(argp);

	fprintf(s->audit_file, "%s", NEWLINE);
	fflush(stdout);
}

void set_input_file(f_state *s, char *filename)
{
	s->input_file = (char *)malloc((strlen(filename) + 1) * sizeof(char));
	strncpy(s->input_file, filename, strlen(filename) + 1);
}

/*Initialize any search specs*/
int init_builtin(f_state *s, int type, char *suffix, char *header, char *footer, int header_len,
				 int footer_len, u_int64_t max_len, int case_sen)
{

	int i = s->num_builtin;

	search_spec[i].type = type;
	search_spec[i].suffix = (char *)malloc((strlen(suffix)+1) * sizeof(char));
	search_spec[i].num_markers = 0;
	strcpy(search_spec[i].suffix, suffix);

	search_spec[i].header_len = header_len;
	search_spec[i].footer_len = footer_len;

	search_spec[i].max_len = max_len;
	search_spec[i].found = 0;
	search_spec[i].header = (unsigned char *)malloc(search_spec[i].header_len * sizeof(unsigned char));
	search_spec[i].footer = (unsigned char *)malloc(search_spec[i].footer_len * sizeof(unsigned char));
	search_spec[i].case_sen = case_sen;
	memset(search_spec[i].comment, 0, COMMENT_LENGTH - 1);

	memcpy(search_spec[i].header, header, search_spec[i].header_len);
	memcpy(search_spec[i].footer, footer, search_spec[i].footer_len);

	init_bm_table(search_spec[i].header,
				  search_spec[i].header_bm_table,
				  search_spec[i].header_len,
				  search_spec[i].case_sen,
				  SEARCHTYPE_FORWARD);
	init_bm_table(search_spec[i].footer,
				  search_spec[i].footer_bm_table,
				  search_spec[i].footer_len,
				  search_spec[i].case_sen,
				  SEARCHTYPE_FORWARD);
	s->num_builtin++;

	return i;
}

/*Markers are a method to search for any unique information besides just the header and the footer*/
void add_marker(f_state *s, int index, char *marker, int markerlength)
{
	int i = search_spec[index].num_markers;
	if (marker == NULL)
		{
		search_spec[index].num_markers = 0;
		return;
		}

	search_spec[index].markerlist[i].len = markerlength;
	search_spec[index].markerlist[i].value = (unsigned char *)malloc(search_spec[index].markerlist[i].len * sizeof(unsigned char));

	memcpy(search_spec[index].markerlist[i].value, marker, search_spec[index].markerlist[i].len);
	init_bm_table(search_spec[index].markerlist[i].value,
				  search_spec[index].markerlist[i].marker_bm_table,
				  search_spec[index].markerlist[i].len,
				  TRUE,
				  SEARCHTYPE_FORWARD);
	search_spec[index].num_markers++;
}

/*Initial every search spec we know about*/
void init_all(f_state *state)
{
	int index = 0;
	init_builtin(state, JPEG, "jpg", "\xff\xd8\xff", "\xff\xd9", 3, 2, 20 * MEGABYTE, TRUE);
	index = init_builtin(state, GIF, "gif", "\x47\x49\x46\x38", "\x00\x3b", 4, 2, MEGABYTE, TRUE);
	add_marker(state, index, "\x00\x00\x3b", 3);
	init_builtin(state, BMP, "bmp", "BM", NULL, 2, 0, 2 * MEGABYTE, TRUE);
	init_builtin(state,
				 WMV,
				 "wmv",
				 "\x30\x26\xB2\x75\x8E\x66\xCF\x11",
				 "\xA1\xDC\xAB\x8C\x47\xA9",
				 8,
				 6,
				 40 * MEGABYTE,
				 TRUE);
	init_builtin(state, MOV, "mov", "moov", NULL, 4, 0, 40 * MEGABYTE, TRUE);
	init_builtin(state, MP4, "mp4", "\x00\x00\x00\x1c\x66\x74\x79\x70", NULL, 8, 0, 600 * MEGABYTE, TRUE);
	init_builtin(state, RIFF, "rif", "RIFF", "INFO", 4, 4, 20 * MEGABYTE, TRUE);
	init_builtin(state, HTM, "htm", "<html", "</html>", 5, 7, MEGABYTE, FALSE);
	init_builtin(state,
				 OLE,
				 "ole",
				 "\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1\x00\x00\x00\x00\x00\x00\x00\x00",
				 NULL,
				 16,
				 0,
				 5 * MEGABYTE,
				 TRUE);
	init_builtin(state,
				 ZIP,
				 "zip",
				 "\x50\x4B\x03\x04",
				 "\x4b\x05\x06\x00",
				 4,
				 4,
				 100 * MEGABYTE,
				 TRUE);
	init_builtin(state,
				 RAR,
				 "rar",
				 "\x52\x61\x72\x21\x1A\x07\x00",
				 "\x00\x00\x00\x00\x00\x00\x00\x00",
				 7,
				 8,
				 100 * MEGABYTE,
				 TRUE);
	init_builtin(state, EXE, "exe", "MZ", NULL, 2, 0, 1 * MEGABYTE, TRUE);

	index = init_builtin(state,
						 PNG,
						 "png",
						 "\x89\x50\x4E\x47\x0D\x0A\x1A\x0A",
						 "IEND",
						 8,
						 4,
						 1 * MEGABYTE,
						 TRUE);
	index = init_builtin(state,
						 MPG,
						 "mpg",
						 "\x00\x00\x01\xba",
						 "\x00\x00\x01\xb9",
						 4,
						 4,
						 50 * MEGABYTE,
						 TRUE);
	add_marker(state, index, "\x00\x00\x01", 3);

	index = init_builtin(state, PDF, "pdf", "%PDF-1.", "%%EOF", 7, 5, 40 * MEGABYTE, TRUE);
	add_marker(state, index, "/L ", 3);
	add_marker(state, index, "obj", 3);
	add_marker(state, index, "/Linearized", 11);
	add_marker(state, index, "/Length", 7);
}

/*Process any command line args following the -t switch)*/
int set_search_def(f_state *s, char *ft, u_int64_t max_file_size)
{
	int index = 0;

	if (strcmp(ft, "jpg") == 0 || strcmp(ft, "jpeg") == 0)
		{
		if (max_file_size == 0)
			max_file_size = 20 * MEGABYTE;
		init_builtin(s, JPEG, "jpg", "\xff\xd8\xff", "\xff\xd9", 3, 2, max_file_size, TRUE);
		}
	else if (strcmp(ft, "gif") == 0)
		{
		if (max_file_size == 0)
			max_file_size = 1 * MEGABYTE;
		index = init_builtin(s,
							 GIF,
							 "gif",
							 "\x47\x49\x46\x38",
							 "\x00\x3b",
							 4,
							 2,
							 max_file_size,
							 TRUE);

		add_marker(s, index, "\x00\x00\x3b", 3);
		}
	else if (strcmp(ft, "bmp") == 0)
		{

		if (max_file_size == 0)
			max_file_size = 2 * MEGABYTE;

		init_builtin(s, BMP, "bmp", "BM", NULL, 2, 0, max_file_size, TRUE);
		}
	else if (strcmp(ft, "mp4") == 0)
		{
			init_builtin(s, MP4, "mp4", "\x00\x00\x00\x1c\x66\x74\x79\x70", NULL, 8, 0, 600 * MEGABYTE, TRUE);
		}
	else if (strcmp(ft, "exe") == 0)
		{

		if (max_file_size == 0)
			max_file_size = 1 * MEGABYTE;

		init_builtin(s, EXE, "exe", "MZ", NULL, 2, 0, max_file_size, TRUE);
		}
	else if (strcmp(ft, "elf") == 0)
		{

		if (max_file_size == 0)
			max_file_size = 1 * MEGABYTE;

		init_builtin(s, ELF, "elf", "0x7fELF", NULL, 4, 0, max_file_size, TRUE);
		}	
	else if (strcmp(ft, "reg") == 0)
		{

		if (max_file_size == 0)
			max_file_size = 2 * MEGABYTE;

		init_builtin(s, REG, "reg", "regf", NULL, 4, 0, max_file_size, TRUE);

		}	
	else if (strcmp(ft, "mpg") == 0 || strcmp(ft, "mpeg") == 0)
		{
		if (max_file_size == 0)
			max_file_size = 50 * MEGABYTE;

		//20000000 \x00\x00\x01\xb3      \x00\x00\x01\xb7 //system data
		index = init_builtin(s,
							 MPG,
							 "mpg",
							 "\x00\x00\x01\xba",
							 "\x00\x00\x01\xb9",
							 4,
							 4,
							 max_file_size,
							 TRUE);
		add_marker(s, index, "\x00\x00\x01", 3);

		/*
	    add_marker(s,index,"\x00\x00\x01\xBB",4);
	    add_marker(s,index,"\x00\x00\x01\xBE",4);
	    add_marker(s,index,"\x00\x00\x01\xB3",4);
	    */
		}
	else if (strcmp(ft, "wmv") == 0)
		{

		if (max_file_size == 0)
			max_file_size = 20 * MEGABYTE;

		init_builtin(s,
					 WMV,
					 "wmv",
					 "\x30\x26\xB2\x75\x8E\x66\xCF\x11",
					 "\xA1\xDC\xAB\x8C\x47\xA9",
					 8,
					 6,
					 max_file_size,
					 TRUE);
		}
	else if (strcmp(ft, "avi") == 0)
		{

		if (max_file_size == 0)
			max_file_size = 20 * MEGABYTE;

		init_builtin(s, AVI, "avi", "RIFF", "INFO", 4, 4, max_file_size, TRUE);
		}

	else if (strcmp(ft, "rif") == 0)
		{

		if (max_file_size == 0)
			max_file_size = 20 * MEGABYTE;
		init_builtin(s, RIFF, "rif", "RIFF", "INFO", 4, 4, max_file_size, TRUE);
		}
	else if (strcmp(ft, "wav") == 0)
		{

		if (max_file_size == 0)
			max_file_size = 20 * MEGABYTE;
		init_builtin(s, WAV, "wav", "RIFF", "INFO", 4, 4, max_file_size, TRUE);

		}
	else if (strcmp(ft, "html") == 0 || strcmp(ft, "htm") == 0)
		{

		if (max_file_size == 0)
			max_file_size = 1 * MEGABYTE;
		init_builtin(s, HTM, "htm", "<html", "</html>", 5, 7, max_file_size, FALSE);
		}

	else if (strcmp(ft, "ole") == 0 || strcmp(ft, "office") == 0)
		{

		if (max_file_size == 0)
			max_file_size = 10 * MEGABYTE;
		init_builtin(s,
					 OLE,
					 "ole",
					 "\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1\x00\x00\x00\x00\x00\x00\x00\x00",
					 NULL,
					 16,
					 0,
					 max_file_size,
					 TRUE);
		}
	else if (strcmp(ft, "doc") == 0)
		{
		if (max_file_size == 0)
			max_file_size = 20 * MEGABYTE;
		init_builtin(s,
					 DOC,
					 "doc",
					 "\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1\x00\x00\x00\x00\x00\x00\x00\x00",
					 NULL,
					 16,
					 0,
					 max_file_size,
					 TRUE);
		}
	else if (strcmp(ft, "xls") == 0)
		{
		if (max_file_size == 0)
			max_file_size = 10 * MEGABYTE;

		init_builtin(s,
					 XLS,
					 "xls",
					 "\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1\x00\x00\x00\x00\x00\x00\x00\x00",
					 NULL,
					 16,
					 0,
					 max_file_size,
					 TRUE);

		}
	else if (strcmp(ft, "ppt") == 0)
		{

		if (max_file_size == 0)
			max_file_size = 10 * MEGABYTE;
		init_builtin(s,
					 PPT,
					 "ppt",
					 "\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1\x00\x00\x00\x00\x00\x00\x00\x00",
					 NULL,
					 16,
					 0,
					 max_file_size,
					 TRUE);
		}
	else if (strcmp(ft, "zip") == 0)
		{
		if (max_file_size == 0)
			max_file_size = 100 * MEGABYTE;

		init_builtin(s,
					 ZIP,
					 "zip",
					 "\x50\x4B\x03\x04",
					 "\x50\x4b\x05\x06",
					 4,
					 4,
					 max_file_size,
					 TRUE);

		}
	else if (strcmp(ft, "rar") == 0)
		{
		if (max_file_size == 0)
			max_file_size = 100 * MEGABYTE;

		init_builtin(s,
					 RAR,
					 "rar",
					 "\x52\x61\x72\x21\x1A\x07\x00",
					 "\x00\x00\x00\x00\x00\x00\x00\x00",
					 7,
					 8,
					 max_file_size,
					 TRUE);

		}
	else if (strcmp(ft, "sxw") == 0)
		{
		if (max_file_size == 0)
			max_file_size = 10 * MEGABYTE;

		init_builtin(s,
					 SXW,
					 "sxw",
					 "\x50\x4B\x03\x04",
					 "\x4b\x05\x06\x00",
					 4,
					 4,
					 max_file_size,
					 TRUE);

		}
	else if (strcmp(ft, "sxc") == 0)
		{
		if (max_file_size == 0)
			max_file_size = 10 * MEGABYTE;

		init_builtin(s,
					 SXC,
					 "sxc",
					 "\x50\x4B\x03\x04",
					 "\x4b\x05\x06\x00",
					 4,
					 4,
					 max_file_size,
					 TRUE);

		}
	else if (strcmp(ft, "sxi") == 0)
		{
		if (max_file_size == 0)
			max_file_size = 10 * MEGABYTE;

		init_builtin(s,
					 SXI,
					 "sxi",
					 "\x50\x4B\x03\x04",
					 "\x4b\x05\x06\x00",
					 4,
					 4,
					 max_file_size,
					 TRUE);

		}
	else if (strcmp(ft, "docx") == 0)
		{
		if (max_file_size == 0)
			max_file_size = 10 * MEGABYTE;

		init_builtin(s,
					 DOCX,
					 "docx",
					 "\x50\x4B\x03\x04",
					 "\x4b\x05\x06\x00",
					 4,
					 4,
					 max_file_size,
					 TRUE);

		}
	else if (strcmp(ft, "pptx") == 0)
		{
		if (max_file_size == 0)
			max_file_size = 10 * MEGABYTE;

		init_builtin(s,
					 PPTX,
					 "pptx",
					 "\x50\x4B\x03\x04",
					 "\x4b\x05\x06\x00",
					 4,
					 4,
					 max_file_size,
					 TRUE);

		}
	else if (strcmp(ft, "xlsx") == 0)
		{
		if (max_file_size == 0)
			max_file_size = 10 * MEGABYTE;

		init_builtin(s,
					 XLSX,
					 "xlsx",
					 "\x50\x4B\x03\x04",
					 "\x4b\x05\x06\x00",
					 4,
					 4,
					 max_file_size,
					 TRUE);

		}
	else if (strcmp(ft, "gzip") == 0 || strcmp(ft, "gz") == 0)
		{
		if (max_file_size == 0)
			max_file_size = 100 * MEGABYTE;

		init_builtin(s, GZIP, "gz", "\x1F\x8B", "\x00\x00\x00\x00", 2, 4, max_file_size, TRUE);
		}
	else if (strcmp(ft, "pdf") == 0)
		{
		if (max_file_size == 0)
			max_file_size = 20 * MEGABYTE;

		index = init_builtin(s, PDF, "pdf", "%P
Download .txt
gitextract_vb7wi003/

├── CHANGES
├── Makefile
├── README
├── api.c
├── cli.c
├── config.c
├── dir.c
├── engine.c
├── extract.c
├── extract.h
├── foremost.8
├── foremost.conf
├── helpers.c
├── main.c
├── main.h
├── ole.h
└── state.c
Download .txt
SYMBOL INDEX (107 symbols across 12 files)

FILE: api.c
  function init_ole (line 25) | void init_ole()
  function die (line 58) | void die(char *fmt, void *arg)
  function get_dir_block (line 64) | int get_dir_block(unsigned char *fd, int blknum, int buffersize)
  function get_dir_info (line 93) | int get_dir_info(unsigned char *src)
  function reorder_dirlist (line 203) | int reorder_dirlist(struct DIRECTORY *dir, int level)
  function get_block (line 244) | int get_block(unsigned char *fd, int blknum, unsigned char *dest, long l...
  function get_FAT_block (line 288) | int get_FAT_block(unsigned char *fd, int blknum, int *dest, int buffersize)
  function dump_header (line 314) | void dump_header(struct OLE_HDR *h)
  type OLE_HDR (line 380) | struct OLE_HDR
  type OLE_HDR (line 380) | struct OLE_HDR
  type OLE_HDR (line 380) | struct OLE_HDR
  function dump_ole_header (line 412) | void dump_ole_header(struct OLE_HDR *h)
  function dump_dirent (line 477) | int dump_dirent(int which_one)

FILE: cli.c
  function fatal_error (line 5) | void fatal_error (f_state * s, char *msg)
  function print_error (line 16) | void print_error(f_state *s, char *fn, char *msg)
  function print_message (line 22) | void print_message(f_state *s, char *format, va_list argp)

FILE: config.c
  function translate (line 5) | int translate (char *str)
  function extractSearchSpecData (line 106) | int extractSearchSpecData(f_state *state, char **tokenarray)
  function process_line (line 182) | int process_line(f_state *s, char *buffer, int line_number)
  function load_config_file (line 273) | int load_config_file(f_state *s)

FILE: dir.c
  function is_empty_directory (line 5) | int is_empty_directory (DIR * temp)
  function cleanup_output (line 17) | void cleanup_output(f_state *s)
  function make_new_directory (line 50) | int make_new_directory(f_state *s, char *fn)
  function create_output_directory (line 110) | int create_output_directory(f_state *s)
  function create_sub_dirs (line 170) | int create_sub_dirs(f_state *s)
  function write_to_disk (line 311) | int write_to_disk(f_state *s, s_spec *needle, u_int64_t len, unsigned ch...

FILE: engine.c
  function user_interrupt (line 17) | int user_interrupt (f_state * s, f_info * i)
  function setup_stream (line 144) | void setup_stream(f_state *s, f_info *i)
  function audit_layout (line 184) | void audit_layout(f_state *s)
  function dumpInd (line 196) | void dumpInd(unsigned char *ind, int bs)
  function ind_block (line 223) | int ind_block(unsigned char *foundat, u_int64_t buflen, int bs)
  function search_chunk (line 311) | int search_chunk(f_state *s, unsigned char *buf, f_info *i, u_int64_t ch...
  function search_stream (line 563) | int search_stream(f_state *s, f_info *i)
  function audit_start (line 623) | void audit_start(f_state *s, f_info *i)
  function audit_finish (line 635) | void audit_finish(f_state *s, f_info *i)
  function process_file (line 640) | int process_file(f_state *s)
  function process_stdin (line 690) | int process_stdin(f_state *s)

FILE: extract.c
  type zipLocalFileHeader (line 42) | struct zipLocalFileHeader
  function valid_ole_header (line 615) | int valid_ole_header(struct OLE_HDR *h)
  function adjust_bs (line 691) | int adjust_bs(int size, int bs)
  type OLE_HDR (line 736) | struct OLE_HDR
  type OLE_HDR (line 752) | struct OLE_HDR
  type DIRECTORY (line 855) | struct DIRECTORY
  function check_mov (line 1009) | int check_mov(unsigned char *atom)
  type tm (line 2112) | struct tm

FILE: extract.h
  type zipLocalFileHeader (line 50) | struct zipLocalFileHeader
  type zipCentralFileHeader (line 64) | struct zipCentralFileHeader
  type zipEndCentralFileHeader (line 81) | struct zipEndCentralFileHeader
  function print_zip (line 93) | void print_zip(struct zipLocalFileHeader *fileHeader, struct zipCentralF...

FILE: helpers.c
  function chop (line 20) | unsigned int chop (char *buf)
  function shift_string (line 91) | void shift_string(char *fn, int start, int new_start)
  function make_magic (line 106) | void make_magic(void)
  function off_t (line 118) | off_t find_file_size(FILE *f)
  function off_t (line 152) | off_t find_file_size(FILE *f)
  function off_t (line 223) | static off_t midpoint(off_t a, off_t b, long blksize)
  function off_t (line 235) | off_t find_dev_size(int fd, int blk_size)
  function off_t (line 284) | off_t find_file_size(FILE *f)
  function off_t (line 305) | off_t find_file_size(FILE *f)
  function print_search_specs (line 321) | void print_search_specs(f_state *s)
  function print_stats (line 348) | void print_stats(f_state *s)
  function charactersMatch (line 368) | int charactersMatch(char a, char b, int caseSensitive)
  function memwildcardcmp (line 381) | int memwildcardcmp(const void *s1, const void *s2, size_t n, int caseSen...
  function printx (line 397) | void printx(unsigned char *buf, int start, int end)
  function htos (line 421) | unsigned short htos(unsigned char s[], int endian)
  function htoi (line 450) | unsigned int htoi(unsigned char s[], int endian)
  function u_int64_t (line 476) | u_int64_t htoll(unsigned char s[], int endian)
  function displayPosition (line 512) | int displayPosition(f_state *s, f_info *i, u_int64_t pos)

FILE: main.c
  function catch_alarm (line 26) | void catch_alarm(int signum)
  function register_signal_handler (line 32) | void register_signal_handler(void)
  function try_msg (line 55) | void try_msg(void)
  function usage (line 62) | void usage(void)
  function process_command_line (line 98) | void process_command_line(int argc, char **argv, f_state *s)
  function main (line 226) | int main(int argc, char **argv)

FILE: main.h
  type f_state (line 296) | typedef struct f_state
  type marker (line 316) | typedef struct marker
  type s_spec (line 323) | typedef struct s_spec
  type f_info (line 349) | typedef struct f_info {

FILE: ole.h
  type OLE_HDR (line 30) | struct OLE_HDR
  type OLE_DIR (line 53) | struct OLE_DIR
  type DIRECTORY (line 73) | struct DIRECTORY
  type OLE_HDR (line 93) | struct OLE_HDR
  type DIRECTORY (line 97) | struct DIRECTORY
  type OLE_HDR (line 100) | struct OLE_HDR
  type OLE_HDR (line 100) | struct OLE_HDR
  type OLE_HDR (line 100) | struct OLE_HDR
  type OLE_HDR (line 102) | struct OLE_HDR

FILE: state.c
  function initialize_state (line 5) | int initialize_state (f_state * s, int argc, char **argv)
  function free_state (line 41) | void free_state(f_state *s)
  function get_audit_file_open (line 48) | int get_audit_file_open(f_state *s)
  function set_config_file (line 68) | int set_config_file(f_state *s, char *fn)
  function set_output_directory (line 87) | int set_output_directory(f_state *s, char *fn)
  function get_mode (line 111) | int get_mode(f_state *s, off_t check_mode)
  function set_mode (line 116) | void set_mode(f_state *s, off_t new_mode)
  function set_chunk (line 121) | void set_chunk(f_state *s, int size)
  function set_skip (line 126) | void set_skip(f_state *s, int size)
  function set_block (line 131) | void set_block(f_state *s, int size)
  function write_audit_header (line 136) | void write_audit_header(f_state *s)
  function open_audit_file (line 147) | int open_audit_file(f_state *s)
  function close_audit_file (line 170) | int close_audit_file(f_state *s)
  function audit_msg (line 185) | void audit_msg(f_state *s, char *format, ...)
  function set_input_file (line 203) | void set_input_file(f_state *s, char *filename)
  function init_builtin (line 210) | int init_builtin(f_state *s, int type, char *suffix, char *header, char ...
  function add_marker (line 250) | void add_marker(f_state *s, int index, char *marker, int markerlength)
  function init_all (line 272) | void init_all(f_state *state)
  function set_search_def (line 349) | int set_search_def(f_state *s, char *ft, u_int64_t max_file_size)
  function init_bm_table (line 744) | void init_bm_table(unsigned char *needle, size_t table[UCHAR_MAX + 1], s...
  function dump_state (line 785) | void dump_state(f_state *s)
Condensed preview — 17 files, each showing path, character count, and a content snippet. Download the .json file or copy for the full structured content (226K chars).
[
  {
    "path": "CHANGES",
    "chars": 7386,
    "preview": "Version  1.5.7\n\t-Added support for MP4 files\nVersion  1.5.6\n\t-Added support for Office 2007 file as well as bug fixes\nVe"
  },
  {
    "path": "Makefile",
    "chars": 5182,
    "preview": "\nRAW_CC = gcc\nRAW_FLAGS = -Wall -O2\nLINK_OPT = \nVERSION = 1.5.7\n# Try to determine the host system\nSYS := $(shell uname "
  },
  {
    "path": "README",
    "chars": 4487,
    "preview": "\nFOREMOST \n----------------------------------------------------------------------\n\nForemost is a Linux program to recove"
  },
  {
    "path": "api.c",
    "chars": 13929,
    "preview": "/*\n\tModified API from http://chicago.sourceforge.net/devel/docs/ole/\n\tBasically the same API, added error checking and t"
  },
  {
    "path": "cli.c",
    "chars": 510,
    "preview": "\n\n#include \"main.h\"\n\nvoid fatal_error (f_state * s, char *msg)\n\t{\n\tfprintf(stderr, \"%s: %s%s\", __progname, msg, NEWLINE)"
  },
  {
    "path": "config.c",
    "chars": 6907,
    "preview": "\n\n#include \"main.h\"\n\nint translate (char *str)\n\t{\n\tchar\tnext;\n\tchar\t*rd = str, *wr = str, *bad;\n\tchar\ttemp[1 + 3 + 1];\n\t"
  },
  {
    "path": "dir.c",
    "chars": 9096,
    "preview": "\n\n#include \"main.h\"\n\nint is_empty_directory (DIR * temp)\n\t{\n\n\t/* Empty directories contain two entries for . and .. \n   "
  },
  {
    "path": "engine.c",
    "chars": 16300,
    "preview": "\n\t /* FOREMOST\n *\n * By Jesse Kornblum, Kris Kendall, & Nick Mikus\n *\n * This is a work of the US Government. In accorda"
  },
  {
    "path": "extract.c",
    "chars": 63371,
    "preview": "\n\t /* extract.c\n * Copyright (c) 2005, Nick Mikus\n * This file contains the file specific functions used to extract\n * d"
  },
  {
    "path": "extract.h",
    "chars": 4098,
    "preview": "/*\n\tlocal file header signature     4 bytes  (0x04034b50)\n        version needed to extract       2 bytes\n        genera"
  },
  {
    "path": "foremost.8",
    "chars": 7856,
    "preview": ".TH FOREMOST \"8\" \"v1.5 - May 2009\"\n\n.SH NAME\nforemost \\- Recover files using their headers, footers, and data structures"
  },
  {
    "path": "foremost.conf",
    "chars": 9141,
    "preview": "#\n# Foremost configuration file\n#-------------------------------------------------------------------------\n# Note the fo"
  },
  {
    "path": "helpers.c",
    "chars": 12100,
    "preview": "\n\t /* MD5DEEP - helpers.c\n *\n * By Jesse Kornblum\n *\n * This is a work of the US Government. In accordance with 17 USC 1"
  },
  {
    "path": "main.c",
    "chars": 6909,
    "preview": "\n\n\n/* FOREMOST\n *\n * By Jesse Kornblum and Kris Kendall\n * \n * This is a work of the US Government. In accordance with 1"
  },
  {
    "path": "main.h",
    "chars": 12527,
    "preview": "\r\n/* FOREMOST\r\n *\r\n * By Jesse Kornblum\r\n *\r\n * This is a work of the US Government. In accordance with 17 USC 105,\r\n * "
  },
  {
    "path": "ole.h",
    "chars": 3106,
    "preview": "#define TRUE\t\t\t1\n#define FALSE\t\t\t0\n#define SPECIAL_BLOCK\t- 3\n#define END_OF_CHAIN\t- 2\n#define UNUSED\t\t\t- 1\n\n#define NO_E"
  },
  {
    "path": "state.c",
    "chars": 17802,
    "preview": "\n\n#include \"main.h\"\n\nint initialize_state (f_state * s, int argc, char **argv)\n\t{\n\tchar\t**argv_copy = argv;\n\n\t/* The rou"
  }
]

About this extraction

This page contains the full source code of the korczis/foremost GitHub repository, extracted and formatted as plain text for AI agents and large language models (LLMs). The extraction includes 17 files (196.0 KB), approximately 59.3k tokens, and a symbol index with 107 extracted functions, classes, methods, constants, and types. Use this with OpenClaw, Claude, ChatGPT, Cursor, Windsurf, or any other AI tool that accepts text input. You can copy the full output to your clipboard or download it as a .txt file.

Extracted by GitExtract — free GitHub repo to text converter for AI. Built by Nikandr Surkov.

Copied to clipboard!