Finding vulnerabilities in PHP code
(via static code analysis)
By Peter Serwylo
http://peter.serwylo.com
@serwylo
Locate your local PHP source code path/file (e.g. /var/www/project1/ or /var/www/index.php), choose the vulnerability type you are looking for and click scan!
Check subdirs to include all subdirectories in the scan. It is recommended to scan only the root directory of your project, because RIPS automatically scans files in subdirectories when the PHP code includes them. However, enabling subdirs can improve the scan result and the include success rate (shown in the result).
Debug errors or improve your scan result by choosing a different verbosity level (default level 1 is recommended).
After the scan has finished, four new buttons appear in the upper right. You can switch between the types of vulnerabilities that were found by clicking their names in the stats window. Click user input to get a list of entry points, functions for a list and graph of all user-defined functions, or files for a list and graph of all scanned files and their includes. All lists are linked to the Code Viewer.
Change the syntax highlighting schema on-the-fly by selecting a different code style.
Before scanning you can choose which way the code flow should be displayed: bottom-up or top-down.
You are about to scan " + amount + " files. "; warning+="Depending on the amount of codelines and includes this may take a while."; warning+="The author of RIPS recommends to scan only the root directory of your project without subdirs.
"; warning+="Do you want to continue anyway?
"; warning+=" "; warning+=""; warning+="Could not access main.php. Make sure your webserver is running.
"; else if(this.status == 404) warning+="Could not access main.php. Make sure you copied all files.
"; else if(this.status == 500) warning+="Scan aborted. Try to scan only one entry file at once or increase the set_time_limit() in config/general.php.
"; warning+="You are about to scan " + amount + " files. "; warning+="Depending on the amount of codelines and includes this may take a while. "; warning+="The author of RIPS recommends to scan only the root directory of your project without subdirs.
"; warning+="Do you want to continue anyway?
"; warning+=" "; warning+=""; warning+="Could not access windows/leakscan.php. Make sure your webserver is running.
"; else if(this.status == 404) warning+="Could not access windows/leakscan.php. Make sure you copied all files.
"; else if(this.status == 500) warning+="Scan aborted. Try to scan only one entry file at once or increase the set_time_limit() in config/general.php.
"; warning+="| ',"\n",
' filename), '\',\'',
implode(',', $tree->lines), '\');"> '."\n",
'',"\n"; if(isset($GLOBALS['scan_functions'][$tree->name])) { // help button echo '',"\n"; if(isset($GLOBALS['F_DATABASE'][$tree->name]) || isset($GLOBALS['F_FILE_AFFECT'][$tree->name]) || isset($GLOBALS['F_FILE_READ'][$tree->name]) || isset($GLOBALS['F_LDAP'][$tree->name]) || isset($GLOBALS['F_XPATH'][$tree->name]) || isset($GLOBALS['F_POP'][$tree->name]) ) { // data leak scan if(!empty($vulnBlock->dataleakvar)) { echo '',"\n"; // line } else { $tree->title .= ' (Blind exploitation)'; } } } if(!empty($tree->get) || !empty($tree->post) || !empty($tree->cookie) || !empty($tree->files) || !empty($tree->server) ) { /*echo ' filename),
'\',\'',implode(',',array_unique($tree->get)),
'\',\'',implode(',',array_unique($tree->post)),
'\',\'',implode(',',array_unique($tree->cookie)),
'\',\'',implode(',',array_unique($tree->files)),
'\',\'',implode(',',array_unique($tree->server)),'\');"> ',"\n",*/
echo 'filename),
'\',\'',implode(',',array_unique($tree->get)),
'\',\'',implode(',',array_unique($tree->post)),
'\',\'',implode(',',array_unique($tree->cookie)),
'\',\'',implode(',',array_unique($tree->files)),
'\',\'',implode(',',array_unique($tree->server)),'\');"> ';
}
// $tree->title
echo ' | ',$tree->title,'',
' ',"\n";
if($treestyle == 1)
traverseBottomUp($tree);
else if($treestyle == 2)
traverseTopDown($tree);
echo ' ',"\n", '
|
|
| declaration | calls |
|---|---|
',$func_name,' | '; $calls = array(); if(isset($info[3])) { foreach($info[3] as $call) { $calls[] = ''.$call[1].''; } } echo implode(',',array_unique($calls)).' |
| type[parameter] | taints |
|---|---|
| $input_name | ",implode(',',array_unique($finds)),' |
',htmlentities($filename),' |
',htmlentities($filename),'
|
| Result |
|---|
| Sum: | ',$count_all,' |
| No vulnerabilities found. | |
| ',(($count_matches == 0) ? 'No' : $count_matches),' matches found. | |
| Scanned files: | ',count($files),' | |
| Include success: | '; if($count_inc > 0) { echo ($count_inc_success=$count_inc-$count_inc_fail).'/'.$count_inc, ' ('.$round_inc_success=round(($count_inc_success/$count_inc)*100,0).'%)'; } else { echo 'No includes.'; } echo ' | |
| Considered sinks: | ',count($scan_functions),' | '; if(empty($_POST['search']) && $count_all > 0) { echo ''; } echo ' |
| User-defined functions: | '.(count($user_functions_offset)-(count($user_functions_offset)>0?1:0)).' | |
| Unique sources: | '.count($user_input).' | |
| Sensitive sinks: | '.(is_array($file_sinks_count) ? array_sum($file_sinks_count) : 0).' | |
| Info: | ',$detail,' |
| Info: | Your include success is low. Enable subdirs for better filename guesses. |
| Scan time: |
/**
* For a fairly comprehensive set of languages see the README file that came
* with this source. At a minimum, the lexer should work on a number of
* languages including C and friends, Java, Python, Bash, SQL, HTML, XML, CSS,
* Javascript, and Makefiles. It works passably on Ruby, PHP and Awk and a
* subset of Perl, but, because of commenting conventions, doesn't work on
* Smalltalk, Lisp-like, or CAML-like languages without an explicit lang class.
*
* Usage:
* Mark the {@code <pre>} and {@code <code>} tags in your source with
* {@code class=prettyprint.}
* You can also use the (html deprecated) {@code <xmp>} tag, but the pretty
* printer needs to do more substantial DOM manipulations to support that, so
* some css styles may not be preserved.
*
* Add another class to the {@code <pre>} or {@code <code>} element to specify the
* language, as in {@code <pre class="prettyprint lang-java">}. Any class that
* starts with "lang-" followed by a file extension, specifies the file type.
* See the "lang-*.js" files in this directory for code that implements
* per-language file handlers.
*
* Change log:
* cbeust, 2006/08/22
*
* Java annotations (start with "@") are now captured as literals ("lit")
*
* @requires console
*/
// JSLint declarations
/*global console, document, navigator, setTimeout, window */
/**
* Split {@code prettyPrint} into multiple timeouts so as not to interfere with
* UI events.
* If set to {@code false}, {@code prettyPrint()} is synchronous.
*/
window['PR_SHOULD_USE_CONTINUATION'] = true;
/** the number of characters between tab columns */
window['PR_TAB_WIDTH'] = 8;
/** Contains functions for creating and registering new language handlers.
* @type {Object}
*/
window['PR']
/** Pretty print a chunk of code.
*
* @param {string} sourceCodeHtml code as html
* @return {string} code as html, but prettier
*/
= window['prettyPrintOne']
/** Find all the {@code <pre>} and {@code <code>} tags in the DOM with
* {@code class=prettyprint} and prettify them.
* @param {Function?} opt_whenDone if specified, called when the last entry
* has been finished.
*/
= window['prettyPrint'] = void 0;
(function () {
// Keyword lists for various languages.
var FLOW_CONTROL_KEYWORDS =
"break continue do else for if return while ";
var C_KEYWORDS = FLOW_CONTROL_KEYWORDS + "auto case char const default " +
"double enum extern float goto int long register short signed sizeof " +
"static struct switch typedef union unsigned void volatile ";
var COMMON_KEYWORDS = C_KEYWORDS + "catch class delete false import " +
"new operator private protected public this throw true try typeof ";
var CPP_KEYWORDS = COMMON_KEYWORDS + "alignof align_union asm axiom bool " +
"concept concept_map const_cast constexpr decltype " +
"dynamic_cast explicit export friend inline late_check " +
"mutable namespace nullptr reinterpret_cast static_assert static_cast " +
"template typeid typename using virtual wchar_t where ";
var JAVA_KEYWORDS = COMMON_KEYWORDS +
"abstract boolean byte extends final finally implements import " +
"instanceof null native package strictfp super synchronized throws " +
"transient ";
var CSHARP_KEYWORDS = JAVA_KEYWORDS +
"as base by checked decimal delegate descending dynamic event " +
"fixed foreach from group implicit in interface internal into is lock " +
"object out override orderby params partial readonly ref sbyte sealed " +
"stackalloc string select uint ulong unchecked unsafe ushort var ";
var COFFEE_KEYWORDS = "all and by catch class else extends false finally " +
"for if in is isnt loop new no not null of off on or return super then " +
"true try unless until when while yes ";
var JSCRIPT_KEYWORDS = COMMON_KEYWORDS +
"debugger eval export function get null set undefined var with " +
"Infinity NaN ";
var PERL_KEYWORDS = "caller delete die do dump elsif eval exit foreach for " +
"goto if import last local my next no our print package redo require " +
"sub undef unless until use wantarray while BEGIN END ";
var PYTHON_KEYWORDS = FLOW_CONTROL_KEYWORDS + "and as assert class def del " +
"elif except exec finally from global import in is lambda " +
"nonlocal not or pass print raise try with yield " +
"False True None ";
var RUBY_KEYWORDS = FLOW_CONTROL_KEYWORDS + "alias and begin case class def" +
" defined elsif end ensure false in module next nil not or redo rescue " +
"retry self super then true undef unless until when yield BEGIN END ";
var SH_KEYWORDS = FLOW_CONTROL_KEYWORDS + "case done elif esac eval fi " +
"function in local set then until ";
var ALL_KEYWORDS = (
CPP_KEYWORDS + CSHARP_KEYWORDS + JSCRIPT_KEYWORDS + PERL_KEYWORDS +
PYTHON_KEYWORDS + RUBY_KEYWORDS + SH_KEYWORDS);
// token style names. correspond to css classes
/** token style for a string literal */
var PR_STRING = 'str';
/** token style for a keyword */
var PR_KEYWORD = 'kwd';
/** token style for a comment */
var PR_COMMENT = 'com';
/** token style for a type */
var PR_TYPE = 'typ';
/** token style for a literal value. e.g. 1, null, true. */
var PR_LITERAL = 'lit';
/** token style for a punctuation string. */
var PR_PUNCTUATION = 'pun';
/** token style for plain text. */
var PR_PLAIN = 'pln';
/** token style for an sgml tag. */
var PR_TAG = 'tag';
/** token style for a markup declaration such as a DOCTYPE. */
var PR_DECLARATION = 'dec';
/** token style for embedded source. */
var PR_SOURCE = 'src';
/** token style for an sgml attribute name. */
var PR_ATTRIB_NAME = 'atn';
/** token style for an sgml attribute value. */
var PR_ATTRIB_VALUE = 'atv';
/**
* A class that indicates a section of markup that is not code, e.g. to allow
* embedding of line numbers within code listings.
*/
var PR_NOCODE = 'nocode';
/** A set of tokens that can precede a regular expression literal in
* javascript.
* http://www.mozilla.org/js/language/js20/rationale/syntax.html has the full
* list, but I've removed ones that might be problematic when seen in
* languages that don't support regular expression literals.
*
* Specifically, I've removed any keywords that can't precede a regexp
* literal in a syntactically legal javascript program, and I've removed the
* "in" keyword since it's not a keyword in many languages, and might be used
* as a count of inches.
*
* The link above does not accurately describe EcmaScript rules since
* it fails to distinguish between (a=++/b/i) and (a++/b/i) but it works
* very well in practice.
*
* @private
*/
var REGEXP_PRECEDER_PATTERN = function () {
var preceders = [
"!", "!=", "!==", "#", "%", "%=", "&", "&&", "&&=",
"&=", "(", "*", "*=", /* "+", */ "+=", ",", /* "-", */ "-=",
"->", /*".", "..", "...", handled below */ "/", "/=", ":", "::", ";",
"<", "<<", "<<=", "<=", "=", "==", "===", ">",
">=", ">>", ">>=", ">>>", ">>>=", "?", "@", "[",
"^", "^=", "^^", "^^=", "{", "|", "|=", "||",
"||=", "~" /* handles =~ and !~ */,
"break", "case", "continue", "delete",
"do", "else", "finally", "instanceof",
"return", "throw", "try", "typeof"
];
var pattern = '(?:^^|[+-]';
for (var i = 0; i < preceders.length; ++i) {
pattern += '|' + preceders[i].replace(/([^=<>:&a-z])/g, '\\$1');
}
pattern += ')\\s*'; // matches at end, and matches empty string
return pattern;
// CAVEAT: this does not properly handle the case where a regular
// expression immediately follows another since a regular expression may
// have flags for case-sensitivity and the like. Having regexp tokens
// adjacent is not valid in any language I'm aware of, so I'm punting.
// TODO: maybe style special characters inside a regexp as punctuation.
}();
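The idea behind the preceder pattern can be illustrated in isolation. The following is a minimal sketch, not part of the library: `PRECEDER_SKETCH` and `slashStartsRegex` are hypothetical names, and the preceder list is deliberately much shorter than the real one above. It decides whether a `/` should be read as the start of a regular expression literal by looking only at the text that comes before it.

```javascript
// Hypothetical, much shorter preceder list than the one above.
var PRECEDER_SKETCH =
    /(?:^|[=(,:;!&|?{\[+\-*%<>~^]|\b(?:return|typeof|case|throw))\s*$/;

// Returns true when a "/" that follows textBeforeSlash would be read as the
// start of a regular expression literal rather than a division operator.
function slashStartsRegex(textBeforeSlash) {
  return PRECEDER_SKETCH.test(textBeforeSlash);
}

var afterAssign = slashStartsRegex('x = ');     // true
var afterIdent = slashStartsRegex('total ');    // false
var afterReturn = slashStartsRegex('return ');  // true
var afterParen = slashStartsRegex('(a + b) ');  // false
```

As in the original, this is a heuristic: it looks at a single preceding token, so it can misjudge unusual code, but it is cheap and behaves well on typical source.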
/**
* Given a group of {@link RegExp}s, returns a {@code RegExp} that globally
* matches the union of the sets of strings matched by the input RegExp.
* Since it matches globally, if the input strings have a start-of-input
* anchor (/^.../), it is ignored for the purposes of unioning.
* @param {Array.<RegExp>} regexs non multiline, non-global regexs.
* @return {RegExp} a global regex.
*/
function combinePrefixPatterns(regexs) {
var capturedGroupIndex = 0;
var needToFoldCase = false;
var ignoreCase = false;
for (var i = 0, n = regexs.length; i < n; ++i) {
var regex = regexs[i];
if (regex.ignoreCase) {
ignoreCase = true;
} else if (/[a-z]/i.test(regex.source.replace(
/\\u[0-9a-f]{4}|\\x[0-9a-f]{2}|\\[^ux]/gi, ''))) {
needToFoldCase = true;
ignoreCase = false;
break;
}
}
function decodeEscape(charsetPart) {
if (charsetPart.charAt(0) !== '\\') { return charsetPart.charCodeAt(0); }
switch (charsetPart.charAt(1)) {
case 'b': return 8;
case 't': return 9;
case 'n': return 0xa;
case 'v': return 0xb;
case 'f': return 0xc;
case 'r': return 0xd;
case 'u': case 'x':
return parseInt(charsetPart.substring(2), 16)
|| charsetPart.charCodeAt(1);
case '0': case '1': case '2': case '3': case '4':
case '5': case '6': case '7':
return parseInt(charsetPart.substring(1), 8);
default: return charsetPart.charCodeAt(1);
}
}
function encodeEscape(charCode) {
if (charCode < 0x20) {
return (charCode < 0x10 ? '\\x0' : '\\x') + charCode.toString(16);
}
var ch = String.fromCharCode(charCode);
if (ch === '\\' || ch === '-' || ch === '[' || ch === ']') {
ch = '\\' + ch;
}
return ch;
}
function caseFoldCharset(charSet) {
var charsetParts = charSet.substring(1, charSet.length - 1).match(
new RegExp(
'\\\\u[0-9A-Fa-f]{4}'
+ '|\\\\x[0-9A-Fa-f]{2}'
+ '|\\\\[0-3][0-7]{0,2}'
+ '|\\\\[0-7]{1,2}'
+ '|\\\\[\\s\\S]'
+ '|-'
+ '|[^-\\\\]',
'g'));
var groups = [];
var ranges = [];
var inverse = charsetParts[0] === '^';
for (var i = inverse ? 1 : 0, n = charsetParts.length; i < n; ++i) {
var p = charsetParts[i];
switch (p) {
case '\\B': case '\\b':
case '\\D': case '\\d':
case '\\S': case '\\s':
case '\\W': case '\\w':
groups.push(p);
continue;
}
var start = decodeEscape(p);
var end;
if (i + 2 < n && '-' === charsetParts[i + 1]) {
end = decodeEscape(charsetParts[i + 2]);
i += 2;
} else {
end = start;
}
ranges.push([start, end]);
// If the range might intersect letters, then expand it.
if (!(end < 65 || start > 122)) {
if (!(end < 65 || start > 90)) {
ranges.push([Math.max(65, start) | 32, Math.min(end, 90) | 32]);
}
if (!(end < 97 || start > 122)) {
ranges.push([Math.max(97, start) & ~32, Math.min(end, 122) & ~32]);
}
}
}
// [[1, 10], [3, 4], [8, 12], [14, 14], [16, 16], [17, 17]]
// -> [[1, 12], [14, 14], [16, 17]]
ranges.sort(function (a, b) { return (a[0] - b[0]) || (b[1] - a[1]); });
var consolidatedRanges = [];
var lastRange = [NaN, NaN];
for (var i = 0; i < ranges.length; ++i) {
var range = ranges[i];
if (range[0] <= lastRange[1] + 1) {
lastRange[1] = Math.max(lastRange[1], range[1]);
} else {
consolidatedRanges.push(lastRange = range);
}
}
var out = ['['];
if (inverse) { out.push('^'); }
out.push.apply(out, groups);
for (var i = 0; i < consolidatedRanges.length; ++i) {
var range = consolidatedRanges[i];
out.push(encodeEscape(range[0]));
if (range[1] > range[0]) {
if (range[1] + 1 > range[0]) { out.push('-'); }
out.push(encodeEscape(range[1]));
}
}
out.push(']');
return out.join('');
}
function allowAnywhereFoldCaseAndRenumberGroups(regex) {
// Split into character sets, escape sequences, punctuation strings
// like ('(', '(?:', ')', '^'), and runs of characters that do not
// include any of the above.
var parts = regex.source.match(
new RegExp(
'(?:'
+ '\\[(?:[^\\x5C\\x5D]|\\\\[\\s\\S])*\\]' // a character set
+ '|\\\\u[A-Fa-f0-9]{4}' // a unicode escape
+ '|\\\\x[A-Fa-f0-9]{2}' // a hex escape
+ '|\\\\[0-9]+' // a back-reference or octal escape
+ '|\\\\[^ux0-9]' // other escape sequence
+ '|\\(\\?[:!=]' // start of a non-capturing group
+ '|[\\(\\)\\^]' // start/end of a group, or line start
+ '|[^\\x5B\\x5C\\(\\)\\^]+' // run of other characters
+ ')',
'g'));
var n = parts.length;
// Maps captured group numbers to the number they will occupy in
// the output or to -1 if that has not been determined, or to
// undefined if they need not be capturing in the output.
var capturedGroups = [];
// Walk over and identify back references to build the capturedGroups
// mapping.
for (var i = 0, groupIndex = 0; i < n; ++i) {
var p = parts[i];
if (p === '(') {
// groups are 1-indexed, so max group index is count of '('
++groupIndex;
} else if ('\\' === p.charAt(0)) {
var decimalValue = +p.substring(1);
if (decimalValue && decimalValue <= groupIndex) {
capturedGroups[decimalValue] = -1;
}
}
}
// Renumber groups and reduce capturing groups to non-capturing groups
// where possible.
for (var i = 1; i < capturedGroups.length; ++i) {
if (-1 === capturedGroups[i]) {
capturedGroups[i] = ++capturedGroupIndex;
}
}
for (var i = 0, groupIndex = 0; i < n; ++i) {
var p = parts[i];
if (p === '(') {
++groupIndex;
if (capturedGroups[groupIndex] === undefined) {
parts[i] = '(?:';
}
} else if ('\\' === p.charAt(0)) {
var decimalValue = +p.substring(1);
if (decimalValue && decimalValue <= groupIndex) {
parts[i] = '\\' + capturedGroups[decimalValue];
}
}
}
// Remove any prefix anchors so that the output will match anywhere.
// ^^ really does mean an anchored match though.
for (var i = 0, groupIndex = 0; i < n; ++i) {
if ('^' === parts[i] && '^' !== parts[i + 1]) { parts[i] = ''; }
}
// Expand letters to groups to handle mixing of case-sensitive and
// case-insensitive patterns if necessary.
if (regex.ignoreCase && needToFoldCase) {
for (var i = 0; i < n; ++i) {
var p = parts[i];
var ch0 = p.charAt(0);
if (p.length >= 2 && ch0 === '[') {
parts[i] = caseFoldCharset(p);
} else if (ch0 !== '\\') {
// TODO: handle letters in numeric escapes.
parts[i] = p.replace(
/[a-zA-Z]/g,
function (ch) {
var cc = ch.charCodeAt(0);
return '[' + String.fromCharCode(cc & ~32, cc | 32) + ']';
});
}
}
}
return parts.join('');
}
var rewritten = [];
for (var i = 0, n = regexs.length; i < n; ++i) {
var regex = regexs[i];
if (regex.global || regex.multiline) { throw new Error('' + regex); }
rewritten.push(
'(?:' + allowAnywhereFoldCaseAndRenumberGroups(regex) + ')');
}
return new RegExp(rewritten.join('|'), ignoreCase ? 'gi' : 'g');
}
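To make the purpose of the function above concrete, here is a heavily simplified sketch of the same idea. It is not the implementation above: `unionPatterns` is a hypothetical helper that only strips a leading `^` anchor and joins the sources, and it skips the case folding and group renumbering that `combinePrefixPatterns` performs.

```javascript
// Simplified illustration only: no case folding, no group renumbering.
function unionPatterns(regexes) {
  var sources = [];
  for (var i = 0; i < regexes.length; ++i) {
    // Drop a leading ^ so each alternative can match anywhere in the input.
    sources.push('(?:' + regexes[i].source.replace(/^\^/, '') + ')');
  }
  return new RegExp(sources.join('|'), 'g');
}

// Three anchored token patterns: string literal, number, lowercase word.
var tokenRe = unionPatterns([/^"[^"]*"/, /^\d+/, /^[a-z]+/]);
var tokens = 'x = 42 + "hi"'.match(tokenRe);
// tokens is ['x', '42', '"hi"']
```

The sketch shows why the anchors have to go: each input pattern is written to match at the start of a token, while the combined global pattern is used to scan for the next token anywhere in the remaining source.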
/**
* Split markup into a string of source code and an array mapping ranges in
* that string to the text nodes in which they appear.
*
*
* The HTML DOM structure:
*
* (Element "p"
* (Element "b"
* (Text "print ")) ; #1
* (Text "'Hello '") ; #2
* (Element "br") ; #3
* (Text " + 'World';")) ; #4
*
*
* corresponds to the HTML
* {@code <p><b>print </b>'Hello '<br> + 'World';</p>}.
*
*
* It will produce the output:
*
* {
* source: "print 'Hello '\n + 'World';",
* //                 1          2
* //       012345678901234 5678901234567
* spans: [0, #1, 6, #2, 14, #3, 15, #4]
* }
*
*
* where #1 is a reference to the {@code "print "} text node above, and so
* on for the other text nodes.
*
*
*
* The {@code spans} array is an array of pairs. Even elements are the start
* indices of substrings, and odd elements are the text nodes (or BR elements)
* that contain the text for those substrings.
* Substrings continue until the next index or the end of the source.
*
*
* @param {Node} node an HTML DOM subtree containing source-code.
* @return {Object} source code and the text nodes in which they occur.
*/
function extractSourceSpans(node) {
var nocode = /(?:^|\s)nocode(?:\s|$)/;
var chunks = [];
var length = 0;
var spans = [];
var k = 0;
var whitespace;
if (node.currentStyle) {
whitespace = node.currentStyle.whiteSpace;
} else if (window.getComputedStyle) {
whitespace = document.defaultView.getComputedStyle(node, null)
.getPropertyValue('white-space');
}
var isPreformatted = whitespace && 'pre' === whitespace.substring(0, 3);
function walk(node) {
switch (node.nodeType) {
case 1: // Element
if (nocode.test(node.className)) { return; }
for (var child = node.firstChild; child; child = child.nextSibling) {
walk(child);
}
var nodeName = node.nodeName;
if ('BR' === nodeName || 'LI' === nodeName) {
chunks[k] = '\n';
spans[k << 1] = length++;
spans[(k++ << 1) | 1] = node;
}
break;
case 3: case 4: // Text
var text = node.nodeValue;
if (text.length) {
if (!isPreformatted) {
text = text.replace(/[ \t\r\n]+/g, ' ');
} else {
text = text.replace(/\r\n?/g, '\n'); // Normalize newlines.
}
// TODO: handle tabs here?
chunks[k] = text;
spans[k << 1] = length;
length += text.length;
spans[(k++ << 1) | 1] = node;
}
break;
}
}
walk(node);
return {
source: chunks.join('').replace(/\n$/, ''),
spans: spans
};
}
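The layout of the spans array is easiest to see with a lookup. The sketch below is an illustration under stated assumptions, not library code: `nodeForIndex` is a hypothetical helper, and plain strings stand in for the DOM text nodes from the example in the comment above.

```javascript
// The spans array from the documented example; strings replace DOM nodes.
var spans = [0, '#1', 6, '#2', 14, '#3', 15, '#4'];

// Returns the node whose substring contains sourceIndex. Even elements are
// start indices, odd elements are the nodes, and each substring runs until
// the next start index.
function nodeForIndex(spans, sourceIndex) {
  var node = null;
  for (var i = 0; i < spans.length; i += 2) {
    if (spans[i] > sourceIndex) { break; }
    node = spans[i + 1];
  }
  return node;
}

var first = nodeForIndex(spans, 0);    // '#1', inside "print "
var second = nodeForIndex(spans, 7);   // '#2', inside "'Hello '"
var lineBreak = nodeForIndex(spans, 14); // '#3', the BR element
var last = nodeForIndex(spans, 20);    // '#4', inside " + 'World';"
```

A linear scan keeps the example short; because the start indices are sorted, a binary search would work just as well for long documents.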
/**
* Apply the given language handler to sourceCode and add the resulting
* decorations to out.
* @param {number} basePos the index of sourceCode within the chunk of source
* whose decorations are already present on out.
*/
function appendDecorations(basePos, sourceCode, langHandler, out) {
if (!sourceCode) { return; }
var job = {
source: sourceCode,
basePos: basePos
};
langHandler(job);
out.push.apply(out, job.decorations);
}
/** Given triples of [style, pattern, context], returns a lexing function.
* The lexing function interprets the patterns to find token boundaries and
* returns a decoration list of the form
* [index_0, style_0, index_1, style_1, ..., index_n, style_n]
* where index_n is an index into the sourceCode, and style_n is a style
* constant like PR_PLAIN. index_n-1 <= index_n, and style_n-1 applies to
* all characters in sourceCode[index_n-1:index_n].
*
* The stylePatterns is a list whose elements have the form
* [style : string, pattern : RegExp, DEPRECATED, shortcut : string].
*
* Style is a style constant like PR_PLAIN, or can be a string of the
* form 'lang-FOO', where FOO is a language extension describing the
* language of the portion of the token in $1 after pattern executes.
* E.g., if style is 'lang-lisp', and group 1 contains the text
* '(hello (world))', then that portion of the token will be passed to the
* registered lisp handler for formatting.
* The text before and after group 1 will be restyled using this decorator
* so decorators should take care that this doesn't result in infinite
* recursion. For example, the HTML lexer rule for SCRIPT elements looks
* something like ['lang-js', /<[s]cript>(.+?)<\/script>/]. This may match
* '