Repository: xme/pastemon
Branch: master
Commit: 0229f9db1efb
Files: 6
Total size: 66.1 KB

Directory structure:
gitextract_v29hyv1s/
├── README
├── pastemon.conf.sample
├── pastemon.pl
├── proxies.conf
├── regex.conf.sample
└── user-agents.conf

================================================
FILE CONTENTS
================================================

================================================
FILE: README
================================================
Introduction
------------
pastemon.pl is a script which runs in the background as a daemon and monitors
pastebin.com for interesting content (based on regular expressions). Found
information is sent to syslog. The script can also generate CEF events.
More information is available here:
http://blog.rootshell.be/2012/01/17/monitoring-pastebin-com-within-your-siem/

v1.14 - 2012/10/31
------------------
- [FEATURE] Added SQLite DB support to store pastie details (some fields must
  still be implemented)

v1.13 - 2012/10/24
------------------
- [CONTRIBUTION] Added support for multiple SMTP recipients (email addresses
  separated by commas)
  Contribution from coreyroach@hotmail.com
- [CONTRIBUTION] Added a new macro to specify the site name in the dump
  function. '%S' will be replaced by the site name.
  Example: '%S/%Y/%M' => 'pastebin.com/2012/10'.

v1.12 - 2012/09/20
------------------
- [BUGFIX] Fixed FuzzyMatch(), which was broken with gzipped pasties.
- [FEATURE] Email notification: the subject is now appended with the field(s)
  corresponding to the matched regex(es). This gives a better overview of
  received emails and makes them easier to filter.
- [BUGFIX] Fixed FuzzyMatch() to properly detect duplicate pasties (fixed regex).

v1.11 - 2012/09/14
------------------
- [FEATURE] Added support for nopaste.me.
- [FEATURE] Added support for pastesite.com.
- [FEATURE] Added configurable sleep delays per pastie website.
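The dump-directory macros described in this changelog (%S above; %Y, %M and %D since v1.8) can be illustrated with a minimal Perl sketch. It is not the script's own validateDumpDir(), whose body is not shown here: the helper name expandDumpDir is hypothetical, and it assumes %M and %D expand to zero-padded two-digit values, consistent with the '%S/%Y/%M' => 'pastebin.com/2012/10' example.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical helper (not part of pastemon.pl): expand the dump-directory
# macros %S (site name), %Y (year), %M (month) and %D (day).
# @tm is a broken-down time list as returned by localtime().
sub expandDumpDir {
    my ($template, $site, @tm) = @_;
    my %macro = (
        S => $site,
        Y => strftime('%Y', @tm),
        M => strftime('%m', @tm),    # assumed zero-padded month (01-12)
        D => strftime('%d', @tm),    # assumed zero-padded day (01-31)
    );
    # Replace only the four known macros; any other '%' sequence is kept as-is
    $template =~ s/%([SYMD])/$macro{$1}/g;
    return $template;
}

# Broken-down time for 15 October 2012: (sec, min, hour, mday, mon - 1, year - 1900)
my @tm = (0, 0, 0, 15, 9, 112);
print expandDumpDir('%S/%Y/%M', 'pastebin.com', @tm), "\n";
print expandDumpDir('/home/pastemon/dump/%Y/%M/%D', 'pastebin.com', @tm), "\n";
```

In a real run the time list would come from `localtime()`. The template is not handed to strftime() directly because strftime gives %M and %S different meanings (minutes and seconds).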
v1.10 - 2012/08/01
------------------
- [FEATURE] If no configuration file is specified, pastemon.pl tries to load
  /etc/pastemon.conf by default.
- [FEATURE] pastemon.pl uses specific Perl modules like WordPress::XMLRPC or
  Text::JaroWinkler. The script now properly handles environments without
  those modules; it is no longer required to comment them out in the code. If
  a module is missing, the related configuration is automatically disabled.
- [FEATURE] Added optional compression (via IO::Compress::Gzip) of dumped
  pasties. In configuration file: yes

v1.9 - 2012/07/23
-----------------
- [FEATURE] pastemon can now follow URLs detected in pasties and search them
  for regexes. This is configured via the main configuration file:
  yes (bit\.ly)
- [FEATURE] The regex.conf format changed to an XML format. Examples:
  \d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3} 10 IP Address
- [FEATURE] A minimum number of regex occurrences required to notify can be
  defined ( tag in the XML file)
- [FEATURE] HTTP requests now use a random User-Agent.
- [BUGFIX] Optimized the detection of already processed pasties. This reduces
  the number of HTTP requests sent to the website.

v1.8 - 2012/06/25
-----------------
- [FEATURE] Added support for pastie.org!
- [FEATURE] Added multi-thread support (1 thread per monitored website)
- [FEATURE] Added substitution macros in the dump directory. Supported macros are:
    %Y - replaced with the current year
    %M - replaced with the current month
    %D - replaced with the current day
  The directory is automatically created.
  Example: /home/user/pastemon/%Y/%M/%D
- [FEATURE] Added a new configuration directive: yes|1
  This feature enables a dump of *ALL* pasties, whether they match a regex or
  not. This is similar to a mirror mode.
  WARNING: Huge disk space might be required by this feature!
- [BUGFIX] Test whether the provided SMTP server (for mail notifications) is
  available (Thanks to @manuelsubredu for the patch)
- [BUGFIX] Fixed an issue in createBlogPost() which caused an unexpected
  process exit.
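The minimum number of occurrences introduced in v1.9 acts as a threshold: a regex only contributes to a notification once it matches at least that many times. The Perl sketch below illustrates that check together with the optional include/exclude filters that analyzePastie() applies afterwards. The helper name countMatches and its option names (ignoreCase, include, exclude) are hypothetical, not part of pastemon.pl.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of pastemon.pl), modelled on the checks in
# analyzePastie(): return the number of matches of $search in $content when it
# reaches $minCount and passes the optional include/exclude filters, else 0.
sub countMatches {
    my ($content, $search, $minCount, %opt) = @_;
    # Compile a pattern with multi-line mode, optionally case-insensitive
    my $compile = sub {
        my $pattern = shift;
        return $opt{ignoreCase} ? qr/$pattern/mi : qr/$pattern/m;
    };
    my $re = $compile->($search);

    # Count each match once, whatever the number of capture groups
    my $count = 0;
    $count++ while $content =~ /$re/g;
    return 0 if $count < $minCount;

    if (defined $opt{include} && length $opt{include}) {
        # The include regex must also be present
        return 0 unless $content =~ $compile->($opt{include});
    }
    elsif (defined $opt{exclude} && length $opt{exclude}) {
        # The exclude regex must not be present
        return 0 if $content =~ $compile->($opt{exclude});
    }
    return $count;
}

my $text = "10.0.0.1 and 192.168.1.1 and 8.8.8.8";
my $ip   = '\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}';
print countMatches($text, $ip, 3), "\n";                            # 3
print countMatches($text, $ip, 4), "\n";                            # 0 (below threshold)
print countMatches($text, $ip, 1, exclude => '8\.8\.8\.8'), "\n";   # 0 (excluded)
```

Counting with a `while (... =~ /$re/g)` loop, rather than the list assignment `() = $content =~ /$re/g` used in the script, keeps the count correct when a regex contains capture groups: in list context a global match returns every captured substring, so a pattern with two groups would be counted twice per match.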
v1.7 - 2012/05/11
-----------------
- Added support for "included" regular expressions
- Fixed a bug in getRegexDesc()
- Added support for comments ('#') in the regex configuration file
- Moved configuration parameters from command line switches to an XML file
- Added matching regex description in dump files
- Added SMTP notifications
- Added distance check to detect duplicate pasties (using Jaro-Winkler algorithm)

v1.6 - 2012/02/21
-----------------
- Added detection of "slow down" messages returned by Pastebin (adds a small pause)
- Added support for Wordpress XMLRPC
- Added support for random proxies
- Some bug fixes

v1.5 - 2012/02/19
-----------------
- Fixed the regex to grab pasties from the archive page (HTML code changed).

v1.4 - 2012/02/15
-----------------
- Fixed a bug with CEF events: custom fields start at 1, not 0!
  (Thanks to Heiko Hansen for the report)
- Report the presence of a proxy variable (HTTP_PROXY)

v1.3 - 2012/01/26
-----------------
- Added a '--pidfile=file' configuration switch to specify an alternative
  location for the PID file. This allows the script to be executed with a
  non-root account.
- Added a '--sample=x' configuration switch to display a sample of data
  matching a regular expression. 'x' is the number of bytes displayed before
  and after the matching string. This is useful to estimate the value of the
  pastie. Example:
  Found in http://pastebin.com/raw.php?i=Q8pQRHKW : belgium (2 times) | Sample: g(0) ""\n [32] => string(11) "Belgium(32)"\n [31] => string(14) "Ne

v1.2 - 2012/01/21
-----------------
- Fixed a bug affecting the case-sensitive search
- New feature: an exception can be associated with a regular expression in the
  configuration file. The syntax is: "regex1 _EXCLUDE_ regex2". This can
  prevent some false positive matches.

v1.1 - 2012/01/20
-----------------
- Added a '--dump' configuration switch to save matching pasties in a directory.
  This allows keeping pasties that were posted with an expiration date
  (example: for later review).

v1.0 - 2012/01/18
-----------------
Initial release

================================================
FILE: pastemon.conf.sample
================================================
yes /var/run/pastemon.pid regex.conf 256 proxies.conf user-agents.conf /home/pastemon/dump/%Y/%M/%D yes yes 15 0.95 10240 yes 10 yes 120 yes 300 yes 300 yes (anonpaste|pastebin\.com|pastie\.org|pastehtml\.com|pastebay\.net|pastee\.org) 10.0.0.1 514 3 daemon 127.0.0.1 pastemon@rootshell.be recipient@domain.com PasteMon Alert www.myblog.com editor averystrongpassword favorite /home/pastemon/pastemon.db

================================================
FILE: pastemon.pl
================================================
#!/usr/bin/perl
#
# pastemon.pl
#
# This script runs in the background as a daemon and monitors pastebin.com for
# interesting content (based on regular expressions). Found information is sent
# to syslog.
#
# This script is based on the Python script written by Xavier Garcia
# (http://www.shellguardians.com/2011/07/monitoring-pastebin-leaks.html)
#
# Copyright (c) 2012 Xavier Mertens
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
# 1. Redistributions of source code must retain the above copyright
#    notice, this list of conditions and the following disclaimer.
# 2. Redistributions in binary form must reproduce the above copyright
#    notice, this list of conditions and the following disclaimer in the
#    documentation and/or other materials provided with the distribution.
# 3. Neither the name of copyright holders nor the names of its
#    contributors may be used to endorse or promote products derived
#    from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
# TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL COPYRIGHT HOLDERS OR CONTRIBUTORS
# BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
# CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
# SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
# INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
# CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
# ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
# POSSIBILITY OF SUCH DAMAGE.
#
# History
# -------
# See README file

use strict;
use threads;
use threads::shared;
use Digest::MD5 qw(md5 md5_hex md5_base64);
use File::Path;
use Getopt::Long;
use IO::Socket;
use LWP::UserAgent;
use HTML::Entities;
use Sys::Syslog;
use Encode;
use XML::XPath;
use XML::XPath::XMLParser;
use Net::SMTP;
use POSIX qw(setsid);

# Optional modules
my $haveWordPressXMLRMC = eval "use WordPress::XMLRPC; 1";
my $haveTextJaroWinkler = eval "use Text::JaroWinkler qw(strcmp95); 1";
my $haveIOCompressGzip = eval "use IO::Compress::Gzip; 1";
my $haveIOUncompressGunzip = eval "use IO::Uncompress::Gunzip; 1";
my $haveDBI = eval "use DBI; 1";

use constant PROCESS_URL => 1;

use constant PASTEBIN => 0;     # Supported websites
use constant PASTIE => 1;
use constant NOPASTE => 2;
use constant PASTESITE => 3;

my @webSiteNames = (            # Self-defined names for multiple usages
    "pastebin.com",
    "pastie.net",
    "nopaste.me",
    "pastesite.com",
);

my $program = "pastemon.pl";
my $version = "v1.14";
my $debug;
my $help;
my $ignoreCase;                 # By default respect case in string searches
my $cefDestination;             # Send CEF events to this destination:port
my $cefPort = 514;
my $cefSeverity = 3;
my $caught = 0;
my $httpTimeout = 10;           # Default HTTP timeout
my @pasties;
my @seenPasties;
my $maxPasties = 1000;          # TODO: Make it configurable?
my @regexList;                  # List of interesting regex (with the data)
my $pidFile = "/var/run/pastemon.pid";
my $configFile = "/etc/pastemon.conf";  # Main XML configuration file
my $regexFile;                  # Regular expressions definitions
my $wpConfigFile;
my $proxyFile;
my @proxies;
my $uaFile;
my @uas;
my $wpSite;                     # Wordpress settings
my $wpUser;
my $wpPass;
my $wpCategory;
my $smtpServer;                 # SMTP settings
my $smtpFrom;
my $smtpRecipient;
my $smtpSubject;
my @smtpRecipients;
my $distanceMin;
my $distanceMaxSize;
my $followUrls;                 # Follow URLs found in pastie
my $followMatching;
my $checkPastebin;              # Websites to monitor
my $checkPastie;
my $checkNopaste;
my $checkPastesite;
my $delayPastebin = 300;        # Delays between pastie fetches
my $delayPastie = 300;
my $delayNopaste = 300;
my $delayPastesite = 300;
my $syslogFacility = "daemon";
my $dumpDir;
my $dumpAll;
my $compressDump;
my $sampleSize;
my %matches;
my $dbFile;                     # SQLite3 DB file

# Process arguments
my $result = GetOptions(
    "debug"    => \$debug,
    "help"     => \$help,
    "config=s" => \$configFile,
);

# TODO: Add a "--drop-sql-table" option to rebuild a fresh DB?

if ($help) {
    print <<__HELP__;
Usage: $0 --config=filepath [--debug] [--help]
Where:
--config : Specify the XML configuration file
--debug : Enable debug mode (verbose - do not detach)
--help : What you're reading now.
__HELP__
    exit 0;
}

parseXMLConfigFile($configFile);

($debug) && print STDERR "+++ Running in foreground.\n";

($cefDestination) && syslogOutput("Sending CEF events to $cefDestination:$cefPort (severity $cefSeverity)");

# Do not allow multiple running instances!
if (-r $pidFile) {
    open(PIDH, "<$pidFile") || die "Cannot read pid file!";
    my $currentpid = <PIDH>;
    close(PIDH);
    die "$program already running (PID $currentpid)";
}

loadRegexFromFile($regexFile) || die "Cannot load regex from file $regexFile";
loadUserAgentFromFile($uaFile) || die "Cannot load user-agents from file $uaFile";

if (!$debug) {
    my $pid = fork;
    die "Cannot fork" unless defined($pid);
    exit(0) if $pid;

    # We are the child
    (POSIX::setsid != -1) or die "setsid failed";
    chdir("/") || die "Cannot change working directory to /";
    close(STDOUT);
    close(STDERR);
    close(STDIN);
}

syslogOutput("Running with PID $$");

open(PIDH, ">$pidFile") || die "Cannot write PID file $pidFile: $!";
print PIDH "$$";
close(PIDH);

# Notify if HTTP proxy settings detected
if ($ENV{'HTTP_PROXY'}) {
    ($proxyFile) && die "The HTTP_PROXY environment variable conflicts with the use of a proxies list";
    syslogOutput("Using detected HTTP proxy: " . $ENV{'HTTP_PROXY'});
}

my @threads;
my @webSites;
($checkPastebin) && push(@webSites, PASTEBIN);
($checkPastie) && push(@webSites, PASTIE);
($checkNopaste) && push(@webSites, NOPASTE);
($checkPastesite) && push(@webSites, PASTESITE);

# Launch threads based on the number of websites to monitor
for my $webSite (@webSites) {
    my $t = threads->new(\&mainLoop, $webSite);
    push(@threads, $t);
}

$SIG{'TERM'} = \&sigHandler;
$SIG{'INT'} = \&sigHandler;
$SIG{'KILL'} = \&sigHandler;
$SIG{'USR1'} = sub {
    foreach my $t (@threads) {
        $t->kill('SIGUSR1');
    }
};

# Parent process just waiting for a signal
while (1) {
    sleep(1);
    if ($caught) {
        syslogOutput("Killing my threads");
        foreach my $t (@threads) {
            $t->kill('SIGKILL');
        }
    }
}
exit 0;

# ---------
# Main loop
# ---------
sub mainLoop {
    $SIG{'USR1'} = \&sigReload;     # Handle config reload
    $SIG{'KILL'} = \&sigHandler;
    my $webSite = shift;
    while (1) {
        my $pastie;
        if (!&fetchLastPasties($webSite)) {
            foreach $pastie (@pasties) {
                exit 0 if ($caught == 1);
                analyzePastie($webSite, $pastie, PROCESS_URL);
            }
            exit 0 if ($caught == 1);
        }
        purgeOldPasties($maxPasties);

        # Wait some seconds (depending on the website)
        DELAY: {
            $webSite == PASTEBIN && do {
                ($debug) && print STDERR "Sleeping $delayPastebin\n";
                sleep($delayPastebin);
                last DELAY;
            };
            $webSite == PASTIE && do {
                ($debug) && print STDERR "Sleeping $delayPastie\n";
                sleep($delayPastie);
                last DELAY;
            };
            $webSite == NOPASTE && do {
                ($debug) && print STDERR "Sleeping $delayNopaste\n";
                sleep($delayNopaste);
                last DELAY;
            };
            $webSite == PASTESITE && do {
                ($debug) && print STDERR "Sleeping $delayPastesite\n";
                sleep($delayPastesite);
                last DELAY;
            };
        }
    }
}

#
# analyzePastie
#
sub analyzePastie {
    my $webSite = shift;
    my $pastie = shift or return;
    my $processUrl = shift;
    my $regex;
    my $md5;

    # Compare URLs literally: a pastie URL contains regex metacharacters ('?', '.')
    if (!grep { $_ eq $pastie } @seenPasties) {
        my $content = fetchPastie($pastie);
        if ($content) {
            # If we receive a "slow down" message, follow Pastebin's recommendation!
            if ($content =~ /Please slow down/) {
                ($debug) && print STDERR "+++ Slow down message received. Paused 5 seconds\n";
                sleep(5);
            }
            else {
                # Compute the MD5 digest
                $md5 = md5_hex(encode('UTF8', $content));
                if (!dbSearchMD5($md5)) {
                    undef(%matches);    # Reset the matches regex/counters
                    my $i = 0;
                    my $regexSearch;
                    my $regexInclude;
                    my $regexExclude;
                    my $regexDesc;
                    my $regexCount;
                    foreach $regex (@regexList) {
                        $regexSearch = $regex->[0];
                        $regexInclude = $regex->[1];
                        $regexExclude = $regex->[2];
                        $regexDesc = $regex->[3];
                        $regexCount = $regex->[4];
                        my $sampleData;
                        my ($startPos, $endPos);
                        my $preCount = 0;
                        if ($ignoreCase) {
                            $preCount += () = $content =~ /$regexSearch/mgi;
                            $startPos = $-[0];
                            $endPos = $+[0];
                        }
                        else {
                            $preCount += () = $content =~ /$regexSearch/mg;
                            $startPos = $-[0];
                            $endPos = $+[0];
                        }
                        if ($preCount >= $regexCount) {
                            if ($sampleSize) {
                                # Optional: extract a sample of the data
                                $startPos = (($startPos - $sampleSize) < 0) ? 0 : ($startPos - $sampleSize);
                                $sampleData = encode('UTF8', substr($content, $startPos, ($endPos - $startPos) + $sampleSize));
                            }
                            # Process "include" regex defined
                            if ($regexInclude ne "") {
                                my $postCount = 0;
                                if ($ignoreCase) {
                                    $postCount += () = $content =~ /$regexInclude/mgi;
                                }
                                else {
                                    $postCount += () = $content =~ /$regexInclude/mg;
                                }
                                if ($postCount) {
                                    # Matches for include $regex
                                    $matches{$i} = [ ( $regexSearch, $preCount, $sampleData ) ];
                                    $i++;
                                }
                            }
                            elsif ($regexExclude ne "") {
                                my $postCount = 0;
                                if ($ignoreCase) {
                                    $postCount += () = $content =~ /$regexExclude/mgi;
                                }
                                else {
                                    $postCount += () = $content =~ /$regexExclude/mg;
                                }
                                if (! $postCount) {
                                    # No match for exclude $regex
                                    $matches{$i} = [ ( $regexSearch, $preCount, $sampleData ) ];
                                    $i++;
                                }
                            }
                            else {
                                $matches{$i} = [ ( $regexSearch, $preCount, $sampleData ) ];
                                $i++;
                            }
                        }
                    }

                    if ($followUrls && $processUrl) {
                        $i += processUrls($content);
                    }

                    if ($i) {
                        # Try to find a corresponding pastie?
                        if (!FuzzyMatch($webSite, $content)) {
                            # Generate the results based on matches
                            my $buffer = "Found in " . $pastie . " : ";
                            my $key;
                            for $key (keys %matches) {
                                $buffer = $buffer . $matches{$key}[0] . " (" . $matches{$key}[1] . " times) ";
                            }
                            if ($sampleSize) {
                                # Optional: Add sample of data
                                my $safeData = $matches{0}[2];
                                # Sanitize the data
                                $safeData =~ s/ //g;
                                $safeData =~ s/\r/\\r/g;
                                $safeData =~ s/\n/\\n/g;
                                $safeData =~ s/\t/\\t/g;
                                $buffer = $buffer . "| Sample: " . $safeData;
                            }
                            syslogOutput($buffer);

                            # Generating CEF event (if configured)
                            ($cefDestination) && sendCEFEvent($pastie);

                            # Generating blog post (if configured)
                            ($wpSite) && createBlogPost($pastie);

                            # Send SMTP notification (if configured)
                            if ($smtpServer) {
                                # Net::SMTP->new() reports its failure reason in $@
                                my $smtp = Net::SMTP->new($smtpServer) or die "Cannot create SMTP connection to $smtpServer: $@";
                                $smtp->mail($smtpFrom);
                                $smtp->recipient(@smtpRecipients, { SkipBad => 1 });
                                $smtp->data();
                                my $subjectTags;
                                for $key (keys %matches) {
                                    my $tempDesc = getRegexDesc($matches{$key}[0]);
                                    if (length($tempDesc) > 0) {
                                        $subjectTags = $subjectTags . '(' . $tempDesc . ') ';
                                    }
                                }
                                my $smtpBody = "To: $smtpRecipient\nSubject: $smtpSubject $subjectTags\n\n";
                                for $key (keys %matches) {
                                    $smtpBody = $smtpBody . "Matched: " . $matches{$key}[0] . " (" . $matches{$key}[1] . " time(s))\n";
                                }
                                $smtpBody = $smtpBody . "\nSource: " . $pastie . "\n\n" . $content;
                                $smtp->datasend($smtpBody);
                                $smtp->dataend();
                                $smtp->quit();
                            }

                            # Save pastie content in the dump directory (if configured)
                            if ($dumpDir) {
                                my $tempPastie = getPastieID($pastie);
                                # Generate and create dump directory
                                my $tempDir = validateDumpDir($webSite, $dumpDir);
                                (-d $tempDir) or die "Cannot validate directory $tempDir: $!";
                                open(DUMP, ">:encoding(UTF-8)", "$tempDir/$tempPastie.raw") or die "Cannot write to $tempDir/$tempPastie.raw : $!";
                                for $key (keys %matches) {
                                    print DUMP "Matched: " . $matches{$key}[0] . " (" . $matches{$key}[1] . " time(s))\n";
                                }
                                print DUMP "\n$content";
                                close(DUMP);
                                if ($compressDump) {
                                    # Compress pastie. gzip() is called fully qualified because
                                    # IO::Compress::Gzip is an optional module loaded at runtime
                                    my $in = "$tempDir/$tempPastie.raw";
                                    my $out = "$tempDir/$tempPastie.gz";
                                    if (IO::Compress::Gzip::gzip($in => $out)) {
                                        unlink($in);
                                    }
                                    else {
                                        syslogOutput("Cannot compress $in: $IO::Compress::Gzip::GzipError");
                                    }
                                }
                            }
                        }
                    }
                    elsif ($dumpAll && $dumpDir) {
                        # Mirroring mode - dump the pastie in all cases
                        my $tempPastie = getPastieID($pastie);
                        my $tempDir = validateDumpDir($webSite, $dumpDir);
                        (-d $tempDir) or die "Cannot validate directory $tempDir: $!";
                        open(DUMP, ">:encoding(UTF-8)", "$tempDir/$tempPastie.raw") or die "Cannot write to $tempDir/$tempPastie.raw : $!";
                        print DUMP "\n$content";
                        close(DUMP);
                        if ($compressDump) {
                            # Compress pastie (see above for the fully qualified gzip() call)
                            my $in = "$tempDir/$tempPastie.raw";
                            my $out = "$tempDir/$tempPastie.gz";
                            if (IO::Compress::Gzip::gzip($in => $out)) {
                                unlink($in);
                            }
                            else {
                                syslogOutput("Cannot compress $in: $IO::Compress::Gzip::GzipError");
                            }
                        }
                    }

                    # Flag this pastie as "seen"
                    push(@seenPasties, $pastie);

                    # Save pastie data in SQLite
                    if ($dbFile) {
                        dbSavePastie($pastie, $md5);
                    }

                    # Wait a random number of seconds to not mess with pastebin.com webmasters
                    sleep(int(rand(5)));
                }
                else {
                    # MD5 already exists in DB
                    ($debug) && print "DEBUG: MD5 $md5 already found in DB!\n";
                }
            }
        }
    }
}

#
# Search for interesting data in URLs found inside the pastie
#
sub processUrls {
    my $pastie = shift || return 0;
    while ($pastie =~ m,(http.*?://([^\s)\"](?!ttp:))+),g) { # "
        my $url = $&;
        if ($url =~ /$followMatching/gi) {  # Process only URLs matching our regex!
            ($debug) && print "+++ Following URL: $url\n";
            my $ua = LWP::UserAgent->new;
            $ua->agent(getRandomUA());
            my $r = $ua->head("$url");
            if ($r->is_success && substr($r->header('Content-Type'), 0, 5) eq "text/") { # Only process "text"
                analyzePastie($url);
            }
        }
        # Protect us against pastebin.com blacklist?
        #sleep(int(rand(15)));
    }
    return 0;
}

#
# parseXMLConfigFile
# Load the configuration from provided XML file
# Args:
# $configFile = Main pastemon.conf XML file
#
sub parseXMLConfigFile {
    my $configFile = shift;
    (-r $configFile) || die "Cannot load XML file $configFile: $!";
    ($debug) && print STDERR "+++ Loading XML file $configFile.\n";
    my $xml = XML::XPath->new(filename => "$configFile");
    my $buff;
    my $nodes;

    # Reset settings
    undef $pidFile;
    undef $sampleSize;
    undef $dumpDir;
    undef $dumpAll;
    undef $compressDump;
    undef $proxyFile;
    undef $uaFile;
    undef $cefDestination;
    undef $cefPort;
    undef $cefSeverity;
    undef $smtpServer;
    undef $smtpFrom;
    undef $smtpRecipient;
    undef $smtpSubject;
    undef $wpSite;
    undef $wpUser;
    undef $wpPass;
    undef $wpCategory;
    undef $distanceMin;
    undef $distanceMaxSize;
    undef $checkPastebin;
    undef $checkPastie;
    undef $checkNopaste;
    undef $checkPastesite;
    undef $followUrls;
    undef $followMatching;
    undef $dbFile;

    # Core Parameters
    $nodes = $xml->find('/pastemon/core');
    foreach my $node ($nodes->get_nodelist) {
        $buff = $node->find('ignore-case')->string_value;
        if (lc($buff) eq "yes" || $buff eq "1") {
            $ignoreCase++;
            ($debug) && print STDERR "+++ Case-insensitive search enabled.\n";
        }
        $buff = $node->find('dump-all')->string_value;
        if (lc($buff) eq "yes" || $buff eq "1") {
            $dumpAll++;
            ($debug) && print STDERR "+++ Dumping all pasties (mirror mode).\n";
        }
        $buff = $node->find('compress-pasties')->string_value;
        if (lc($buff) eq "yes" || $buff eq "1") {
            $compressDump++;
            ($debug) && print STDERR "+++ Compressing dumped pasties.\n";
        }
        $pidFile = $node->find('pid-file')->string_value;
        $regexFile = $node->find('regex-file')->string_value;
        $sampleSize = $node->find('sample-size')->string_value;
        $dumpDir = $node->find('dump-directory')->string_value;
        $proxyFile = $node->find('proxy-config')->string_value;
        $uaFile = $node->find('ua-config')->string_value;
        $httpTimeout = $node->find('http-timeout')->string_value;
        $distanceMin = $node->find('distance-min')->string_value;
        $distanceMaxSize = $node->find('distance-max-size')->string_value;
    }

    # Monitored websites
    $nodes = $xml->find('/pastemon/websites');
    foreach my $node ($nodes->get_nodelist) {
        $buff = $node->find('pastebin')->string_value;
        if (lc($buff) eq "yes" || $buff eq "1") {
            $checkPastebin++;
            ($debug) && print STDERR "+++ pastebin.com monitoring activated.\n";
        }
        $buff = $node->find('pastie')->string_value;
        if (lc($buff) eq "yes" || $buff eq "1") {
            $checkPastie++;
            ($debug) && print STDERR "+++ pastie.org monitoring activated.\n";
        }
        $buff = $node->find('nopaste')->string_value;
        if (lc($buff) eq "yes" || $buff eq "1") {
            $checkNopaste++;
            ($debug) && print STDERR "+++ nopaste.me monitoring activated.\n";
        }
        $buff = $node->find('pastesite')->string_value;
        if (lc($buff) eq "yes" || $buff eq "1") {
            $checkPastesite++;
            ($debug) && print STDERR "+++ pastesite.com monitoring activated.\n";
        }
        $delayPastebin = $node->find('pastebin-delay')->string_value;
        $delayPastie = $node->find('pastie-delay')->string_value;
        $delayNopaste = $node->find('nopaste-delay')->string_value;
        $delayPastesite = $node->find('pastesite-delay')->string_value;
    }

    # Follow URLs
    $nodes = $xml->find('/pastemon/urls');
    foreach my $node ($nodes->get_nodelist) {
        $buff = $node->find('follow')->string_value;
        if (lc($buff) eq "yes" || $buff eq "1") {
            $followUrls++;
            ($debug) && print STDERR "+++ Follow URLs feature activated.\n";
        }
        $followMatching = $node->find('matching')->string_value;
    }

    # CEF Parameters
    $nodes = $xml->find('/pastemon/cef-output');
    foreach my $node ($nodes->get_nodelist) {
        $cefDestination = $node->find('destination')->string_value;
        $cefPort = $node->find('port')->string_value;
        $cefSeverity = $node->find('severity')->string_value;
    }

    # Syslog Parameters
    $nodes = $xml->find('/pastemon/syslog-output');
    foreach my $node ($nodes->get_nodelist) {
        $syslogFacility = $node->find('facility')->string_value;
    }

    # Wordpress Parameters
    $nodes = $xml->find('/pastemon/wordpress-output');
    foreach my $node ($nodes->get_nodelist) {
        $wpSite = $node->find('site')->string_value;
        $wpUser = $node->find('user')->string_value;
        $wpPass = $node->find('password')->string_value;
        $wpCategory = $node->find('category')->string_value;
    }

    # SMTP Parameters
    $nodes = $xml->find('/pastemon/smtp-output');
    foreach my $node ($nodes->get_nodelist) {
        $smtpServer = $node->find('smtp-server')->string_value;
        $smtpFrom = $node->find('from')->string_value;
        $smtpRecipient = $node->find('recipient')->string_value;
        $smtpSubject = $node->find('subject')->string_value;
    }

    # SQLite3 Parameters
    $nodes = $xml->find('/pastemon/db-output');
    foreach my $node ($nodes->get_nodelist) {
        $dbFile = $node->find('db-file')->string_value;
    }

    # ---------------------
    # Parameters validation
    # ---------------------

    # Check if the provided dump directory is writable to us
    if ($dumpDir) {
        # (-w $dumpDir) or die "Directory $dumpDir is not writable: $!";
        syslogOutput("Using $dumpDir as dump directory");
    }

    # Compress dumped pasties?
    if ($compressDump) {
        if ($haveIOCompressGzip) {              # Module IO::Compress::Gzip installed?
            if (!$dumpDir) {
                syslogOutput("Option compress-pasties disabled: No dump directory defined");
                undef $compressDump;
            }
            if (!$haveIOUncompressGunzip) {     # Module IO::Uncompress::Gunzip installed?
                syslogOutput("Option compress-pasties disabled: IO::Uncompress::Gunzip not installed");
                undef $compressDump;
            }
        }
        else {
            syslogOutput("Option compress-pasties disabled: IO::Compress::Gzip not installed");
            undef $compressDump;
        }
    }

    # Dumping all pasties requires a dump directory
    if ($dumpAll && !$dumpDir) {
        syslogOutput("No dump directory specified");
    }

    # Verify sampleSize format if specified (digits only)
    if ($sampleSize) {
        die "Sample buffer length must be an integer!" if $sampleSize !~ /^\d+$/;
        syslogOutput("Dumping $sampleSize bytes samples");
    }

    # Verify the HTTP timeout if specified (digits only)
    if ($httpTimeout) {
        die "HTTP timeout must be an integer!" if $httpTimeout !~ /^\d+$/;
        syslogOutput("HTTP timeout: $httpTimeout seconds");
    }

    # Verify Wordpress config
    if ($wpSite) {
        if ($haveWordPressXMLRMC) {     # Module WordPress::XMLRPC installed?
            (!$wpSite || !$wpUser || !$wpPass || !$wpCategory) && die "Incomplete Wordpress configuration";
            ($sampleSize) || die "A sample buffer length must be given with Wordpress output";
            syslogOutput("Dumping data to $wpSite/xmlrpc.php");
        }
        else {
            syslogOutput("Wordpress configuration disabled: WordPress::XMLRPC not installed");
            undef $wpSite;
        }
    }

    # Verify SMTP config
    if ($smtpServer) {
        (!$smtpServer || !$smtpFrom || !$smtpRecipient || !$smtpSubject) && die "Incomplete SMTP configuration";
        # Net::SMTP->new() reports its failure reason in $@
        my $smtp = Net::SMTP->new($smtpServer) or die "Cannot use SMTP server $smtpServer: $@";
        $smtp->quit();
        @smtpRecipients = split(/[, ]+/, $smtpRecipient);
        syslogOutput("Sending SMTP notifications to <" . $smtpRecipient . ">");
    }

    # Load proxies
    if ($proxyFile) {
        (-r $proxyFile) or die "Cannot read proxy configuration file $proxyFile: $!";
        loadProxyFromFile($proxyFile) || die "Cannot load proxies from file $proxyFile";
    }

    # Distance
    if ($distanceMin) {
        if ($haveTextJaroWinkler) {     # Module Text::JaroWinkler installed?
            (!$dumpDir) && die "A dump directory must be configured to use the distance check";
            ($distanceMin > 0 && $distanceMin < 1) or die "Minimum distance must be between 0 and 1";
            if ($distanceMaxSize) {
                die "Distance max size must be an integer!" if $distanceMaxSize !~ /^\d+$/;
                syslogOutput("Enabled duplicate detection with distance of $distanceMin (size limit: $distanceMaxSize bytes)");
            }
            else {
                syslogOutput("Enabled duplicate detection with distance of $distanceMin");
            }
        }
        else {
            syslogOutput("Distance configuration disabled: Text::JaroWinkler not installed");
            undef $distanceMin;
        }
    }

    # SQLite3 Output
    if ($dbFile) {
        if ($haveDBI) {     # Module DBI installed?
            # Do we have to initialize the DB (first execution)?
            my $dbh = DBI->connect("dbi:SQLite:dbname=" . $dbFile) or die "Cannot connect to the SQLite DB " . $dbFile . "\n";
            my $sth = $dbh->prepare("SELECT name FROM sqlite_master WHERE type='table' AND name='pasties'");
            $sth->execute();
            my $data = $sth->fetch();
            if (!$data) {
                # Table 'pasties' does not exist. Create it.
                $sth = $dbh->prepare("CREATE TABLE pasties (id VARCHAR(50), timestamp DATETIME, url VARCHAR(128), matched VARCHAR(256), path VARCHAR(256), md5 VARCHAR(32) PRIMARY KEY, type INTEGER)");
                $sth->execute() or die "Cannot create table 'pasties'";
                $sth = $dbh->prepare("CREATE UNIQUE INDEX pasties_idx ON pasties(id)");
                $sth->execute() or die "Cannot create index 'pasties_idx'";
                ($debug) && print STDERR "+++ Created database " . $dbFile . "\n";
            }
            $dbh->disconnect();
        }
        else {
            syslogOutput("DB support disabled: DBI not installed");
            undef $dbFile;
        }
    }

    # Follow URL
    if ($followUrls && !$followMatching) {
        syslogOutput("Warning: No regex defined to match URLs");
        $followMatching = ".*";     # Match everything
    }
    return;
}

#
# Download the latest pasties and load them in a Perl array
# (http://pastebin.com/archive)
#
sub fetchLastPasties {
    my $webSite = shift;
    my $tempProxy;
    my $ua = LWP::UserAgent->new;
    $ua->timeout($httpTimeout);
    if (@proxies) {
        $tempProxy = selectRandomProxy();
        $ua->proxy('http', $tempProxy);
    }
    else {
        ($ENV{'HTTP_PROXY'}) && $ua->env_proxy;
    }
    $ua->agent(getRandomUA());
    undef @pasties;     # Reset the array first!

    # www.pastebin.com
    if ($webSite == PASTEBIN) {
        ($debug) && print STDERR "Loading new pasties from pastebin.com.\n";
        my $response = $ua->get("http://pastebin.com/archive");
        if ($response->is_success) {
            # Load the pasties into an array
            # @pasties = $response->decoded_content =~ /.+<\/a><\/td>/g;
            # New format (2012/02/19):
            my @tempPasties = $response->decoded_content =~ /.+<\/a><\/td>/g;
            # Append the complete URL
            foreach my $p (@tempPasties) {
                $p = 'http://pastebin.com/raw.php?i=' . $p;
            }
            push(@pasties, @tempPasties);
        }
        else {
            syslogOutput("Cannot fetch www.pastebin.com: " . $response->status_line);
            # If we cannot fetch the pasties and we use proxies, disable the current one!
            (@proxies) && disableProxy($tempProxy);
            return 1;
        }
    }
    elsif ($webSite == PASTIE) {
        #($debug) && print STDERR "Loading new pasties from pastie.org.\n";
        my $response = $ua->get("http://pastie.org/pastes");
        if ($response->is_success) {
            my @tempPasties = $response->decoded_content =~ //g;
            # Append the complete URL
            foreach my $p (@tempPasties) {
                $p = $p . '/download';
            }
            push(@pasties, @tempPasties);
        }
        else {
            syslogOutput("Cannot fetch www.pastie.org: " . $response->status_line);
            # If we cannot fetch the pasties and we use proxies, disable the current one!
            (@proxies) && disableProxy($tempProxy);
            return 1;
        }
    }
    elsif ($webSite == NOPASTE) {
        #($debug) && print STDERR "Loading new pasties from nopaste.me.\n";
        my $response = $ua->get("http://nopaste.me/recent");
        if ($response->is_success) {
            my @tempPasties = $response->decoded_content =~ //ig;
            # Append the complete URL
            foreach my $p (@tempPasties) {
                $p = 'http://nopaste.me/raw/' . $p . '.txt';
            }
            push(@pasties, @tempPasties);
        }
        else {
            syslogOutput("Cannot fetch nopaste.me: " . $response->status_line);
            # If we cannot fetch the pasties and we use proxies, disable the current one!
            (@proxies) && disableProxy($tempProxy);
            return 1;
        }
    }
    elsif ($webSite == PASTESITE) {
        ($debug) && print STDERR "Loading new pasties from pastesite.com.\n";
        my $response = $ua->get("http://pastesite.com/recent");
        if ($response->is_success) {
            my @tempPasties = $response->decoded_content =~ /status_line);
            # If we cannot fetch the pasties and we use proxies, disable the current one!
            (@proxies) && disableProxy($tempProxy);
            return 1;
        }
    }
    else {
        die "Unknown website constant: $webSite";
    }

    # DEBUG
    #foreach my $p (@pasties) {
    #    print "DEBUG: $p\n";
    #}
    return 0;
}

#
# Fetch the raw content of a pastie and return its content
#
sub fetchPastie {
    my $tempProxy;
    my $pastie = shift;
    my $ua = LWP::UserAgent->new;
    $ua->timeout($httpTimeout);
    if (@proxies) {
        $tempProxy = selectRandomProxy();
        $ua->proxy('http', $tempProxy);
    }
    else {
        ($ENV{'HTTP_PROXY'}) && $ua->env_proxy;
    }
    $ua->agent(getRandomUA());
    my $response = $ua->get("$pastie");
    if ($response->is_success) {
        # Hack for pastesite.com: Extract data from the
        # (To bypass the button)
        if ($pastie =~ /http:\/\/pastesite.com/) {
            if ($response->decoded_content =~ /\