ISAPI_Rewrite 1.3 Documentation

锺离马鲁
2023-12-01

Introduction
What's new
Main concept
Lite version limitation
Special notes for the IIS6
Configuration
Configuration file format
CacheClockRate directive
RepeatLimit directive
RewriteCond directive
RewriteRule directive
RewriteHeader directive
Regular expression syntax
Special note about "pathological" regular expressions
Literals
Wildcard
Repeats
Non-greedy repeats
Parenthesis
Non-Marking Parenthesis
Alternatives
Sets
Character classes
Collating elements
Equivalence classes
Line anchors
Back references
Forward lookahead asserts
Word operators
Escape operator
What gets matched?
Format string syntax
Examples

 

Introduction

ISAPI_Rewrite is a powerful URL manipulation engine based on regular expressions. It works much like Apache's mod_rewrite, but is designed specifically for Microsoft Internet Information Server (IIS). If you have ever wanted to change your web site's URL scheme, this product is for you!

Some key benefits of ISAPI_Rewrite:

  • Speed

    ISAPI_Rewrite is an extremely fast and highly scalable solution. It is written in pure C/C++ using only the Win32 API and ISAPI, and it uses an intelligent configuration cache. All work is done in a single stage: there are no recursive requests or other operations that may take a long time.

  • Security

    ISAPI_Rewrite is designed to operate in a shared environment and can serve as many sites as you have. ISPs and hosting providers can safely let their users configure ISAPI_Rewrite, knowing that any configuration change affects only that user's own environment. ISAPI_Rewrite can even solve many security problems, for example by blocking access to certain folders or file extensions, or through more complex rules.

  • Power

    The flexibility and power of ISAPI_Rewrite come from its regular expression nature. With regular expressions you don't need to write thousands of check strings; URLs can be compared and replaced with a few string patterns. As a result, ISAPI_Rewrite can do many things that cannot be done with other solutions available for IIS. See the Examples section for more information.

What's new

ISAPI_Rewrite version 1.3 build 16:

  • Introduced modifications to the Regex++ regular expression engine to overcome a problem with "pathological" rules that require exponential processing time. The time to process a single rule is now limited to half a second. If a rule fails to complete in this time, processing stops and ISAPI_Rewrite sends "500 Internal Server Error" to the client to indicate a configuration error.
  • Added the new N (Next) flag to the RewriteRule and RewriteHeader directives. It makes it possible to organize loops while processing rules.
  • Added the RepeatLimit directive to limit the number of possible loops.
  • Added the F (Forbidden) flag to the RewriteRule and RewriteHeader directives. It sends a 403 Forbidden response to the client when a match is found.
  • Added the O (nOrmalize) flag to the RewriteRule, RewriteHeader and RewriteCond directives. It indicates that the checked string should first be normalized (i.e. URL encoding, illegal characters, etc. removed).
  • Added the ability to check server variables with the RewriteCond directive, using %ServerVariable in place of a header name.
  • Improved error logging for configuration parsing. Error messages now contain line numbers.

ISAPI_Rewrite version 1.2 build 14 (Full version only):

  • Fixed a problem introduced in Full version 1.1 build 11: the CacheClockRate configuration directive was parsed incorrectly, and after the first cache cleanup the inetinfo.exe process began to consume 99% of CPU time.

ISAPI_Rewrite version 1.2 build 13:

  • Added the new U (Unmangle Log) flag. ISAPI_Rewrite can now log the URL as it was originally requested.

ISAPI_Rewrite version 1.1 build 11:

  • Fixed a problem with the truncation of the last character of a configuration file.
  • Fixed several shortcomings with documentation and default configuration files.
  • Included additional optimisation for Internet Information Server 6.0.
  • ISAPI_Rewrite now adds a custom header with the original URL to the client request, so the original URL can be retrieved in a server script.
  • The new RewriteHeader directive allows rewriting not only the URL part of the client request, but any other HTTP header, or even the method and version information.

Main concept

ISAPI_Rewrite provides a rule-based rewriting engine that rewrites requested URLs on the fly. It supports a virtually unlimited number of rules and an unlimited number of conditions attached to each rule, providing a flexible and powerful URL manipulation mechanism. (In practice, the size of a configuration file is limited to 2 MB to bound the parsing overhead.) URL manipulations can depend on tests of HTTP headers, the Request-URI, the method and the version.

This program operates on the Request-URI (including the query string) and HTTP headers, as described in RFC 2068, in both per-server (global) and per-virtual-site contexts. The result of an operation is either a rewrite or a redirect.

Rewriting causes the server to continue processing the request with the new URI as if it had been the originally requested URI. The new URI can include a query string (following a question mark) and may point to any file, script call, program invocation, etc.

Redirection causes the server to send an immediate response to the client with a redirect instruction (HTTP response code 302 with a Location header), providing the resulting URI as the new location. You can use absolute URIs (as required by RFC 2068) in a redirect instruction to redirect the request to a different host, port or protocol. A redirect instruction always causes the rewriting engine to stop the processing sequence.

Rules are processed in the order in which they appear in the configuration file. ISAPI_Rewrite processes per-server (global) rules first and then the rules of the individual virtual site, if any are specified. There are no recursive requests or rollbacks in the processing order, so you will never get into an infinite loop.

The rewriting engine loops through the ruleset rule by rule (RewriteRule directives). A rule is applied only if its pattern matches the URI and all of its conditions (RewriteCond directives) match their test strings. ISAPI_Rewrite uses a match (rather than search) algorithm: an expression matches only if it matches the whole input string. If a rule is applied, ISAPI_Rewrite continues to loop through the ruleset with the new URI until the last rule has been processed.
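
The processing order described above can be modelled in a few lines. The following is a minimal sketch in Python, not ISAPI_Rewrite's actual implementation: Python's re module stands in for Regex++, the RULES table and the rewrite function are hypothetical names, conditions are omitted, and "\1" plays the role of "$1" in a format string.

```python
import re

# Hypothetical rule table: (pattern, replacement, flags).
RULES = [
    (r"/old/(.*)", r"/new/\1", ""),
    (r"/new/(.*)\.htm", r"/new/\1.html", "L"),
    (r"/new/(.*)", r"/never/\1", ""),
]

def rewrite(uri, rules=RULES):
    """Apply rules in file order; each must match the WHOLE current URI."""
    for pattern, replacement, flags in rules:
        match = re.fullmatch(pattern, uri)
        if match is None:
            continue  # rule does not apply, try the next one
        uri = match.expand(replacement)  # later rules see the rewritten URI
        if "L" in flags:
            break  # L (last rule): stop processing here
    return uri
```

In this model "/old/page.htm" is rewritten by the first rule and then by the second, which stops processing, while "/prefix/old/x" is left untouched because no pattern matches the whole string.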

ISAPI_Rewrite saves the original path info and query string, before any manipulation of the URL, in an HTTP header named X-Rewrite-URL. It can then be retrieved in ASP with Request.ServerVariables("HTTP_X_REWRITE_URL").

Whenever you use parentheses in Pattern or in one of the CondPatterns, back references are created internally; they can be used within the format string (using $N syntax) or within other patterns (using \N syntax). The references are global to the entire RewriteRule directive and its RewriteCond directives. Sub-matches are numbered from top to bottom and from left to right, beginning with the first RewriteCond directive (if there is one) attached to the RewriteRule directive.
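
The numbering scheme can be illustrated with a short Python sketch. This is only a model under stated assumptions: Python's re module stands in for Regex++, and collect_submatches is a hypothetical helper, not part of ISAPI_Rewrite. It gathers the groups of the conditions first and the groups of the rule afterwards, so that the combined list corresponds to $1, $2 and so on.

```python
import re

def collect_submatches(conditions, rule_pattern, uri):
    """Return the shared sub-match list, or None if anything fails to match.

    `conditions` is a list of (test_string, pattern) pairs in file order.
    """
    groups = []
    for test_string, pattern in conditions:
        m = re.fullmatch(pattern, test_string)
        if m is None:
            return None
        groups.extend(m.groups())  # condition groups are numbered first
    m = re.fullmatch(rule_pattern, uri)
    if m is None:
        return None
    groups.extend(m.groups())  # rule groups continue the numbering
    return groups

# Models "RewriteCond Host: (.+)" followed by "RewriteRule (.*) /$1$2".
subs = collect_submatches([("www.example.com", r"(.+)")], r"(.*)", "/index.html")
new_uri = "/" + subs[0] + subs[1]  # subs[0] is $1, subs[1] is $2
```

Here $1 comes from the Host condition and $2 from the rule pattern, so the request is mapped to a folder named after the host.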

To simplify rules and strengthen server security it is strongly recommended that you disable parent paths in the IIS settings.

Lite version limitation

The Lite and Full versions of ISAPI_Rewrite are the same, except that the Lite version does not support per-virtual-site configuration; only global rules are processed.

Special notes for the IIS6

These notes concern new features of Internet Information Server 6.0, built into Windows .NET Server, and the limitations those features impose on ISAPI_Rewrite functionality.

The main difference between IIS6 and its predecessors is a new default process model called Worker Process Isolation (WPI) mode. Although IIS6 can also operate in IIS5-compatibility mode (which has no effect on ISAPI_Rewrite's functionality), its main advantages can be achieved only in WPI mode.

In WPI mode, virtual web sites or even individual web applications run inside application pools, and each application pool is served by one or more isolated worker processes (w3wp.exe). This resembles High isolation mode in IIS5, with one significant difference: filters no longer run inside the inetinfo.exe process. They run inside the worker processes, like ordinary applications.

This means that there can be multiple instances of a single filter (one per worker process), which in itself is not a problem for ISAPI_Rewrite. But consider the case where a virtual site belongs to one application pool while a child web application belongs to another. There will be two instances of the filter: one processes requests for the site, the other processes requests for the child web application. The drawback is that IIS determines the target application pool before the first invocation of the filter, and rewriting a URL from one pool to another is prohibited. So there is no easy way to pass a request from one pool to another except by sending an HTTP redirect response to the client browser.

Configuration

Configuration file format

There are two types of configuration files: global (per-server) and individual (per-virtual-site). The global configuration file must be named httpd.ini and reside in the ISAPI_Rewrite installation directory; a shortcut to this file is provided in the Start menu. Individual configuration files must also be named httpd.ini and may be placed in the physical root directories of virtual sites. Both types use the same format: a standard Windows INI file divided into sections. The only section allowed in this version of ISAPI_Rewrite is [ISAPI_Rewrite]. All directives must be placed in this section, one directive per line. Any text outside this section is ignored.

httpd.ini file example:

[ISAPI_Rewrite]

# This is a comment

# 300 = 5 minutes
CacheClockRate 300
RepeatLimit 20

# Protect httpd.ini and httpd.parse.errors files
# from being accessed through HTTP
RewriteRule ^/httpd(\.ini|\.parse\.errors).* . [F,I]

# Some custom rules
RewriteCond Host: (.+)
RewriteRule (.*) /$1$2 [I]

When ISAPI_Rewrite parses a configuration file, it creates an error log file named httpd.parse.errors in the same directory as the parsed file.
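
To make the file layout concrete, here is a rough Python sketch of how such a file could be read. It is an illustration only, not the parser ISAPI_Rewrite uses: parse_httpd_ini is a hypothetical name, the exact-case comparison of the section name is an assumption, and no directive validation or error logging is attempted.

```python
def parse_httpd_ini(text):
    """Return (directive, arguments) pairs found in the [ISAPI_Rewrite] section."""
    directives = []
    in_section = False
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        if line.startswith("[") and line.endswith("]"):
            in_section = (line == "[ISAPI_Rewrite]")
            continue
        if in_section:
            name, _, rest = line.partition(" ")
            directives.append((name, rest.strip()))
        # text outside the section is silently ignored
    return directives

SAMPLE = """\
ignored text before the section
[ISAPI_Rewrite]
# 300 = 5 minutes
CacheClockRate 300
RepeatLimit 20
"""
parsed = parse_httpd_ini(SAMPLE)
```

Each directive occupies one line, comments start with "#", and anything before the section header is dropped.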

CacheClockRate directive

Syntax: CacheClockRate Interval

This directive can appear only in the global configuration context. If it is found in a per-virtual-site context, it is ignored and an error message is written to the httpd.parse.errors file.

ISAPI_Rewrite caches every configuration file the first time it is loaded. This directive specifies the period of inactivity after which a site's configuration is purged from the cache. By setting this parameter high enough you can force ISAPI_Rewrite never to purge its cache. Note that any change to a configuration file updates the cache on the next request, regardless of this interval.

  • Interval

    Specifies the period of inactivity (in seconds) after which a configuration is purged from the cache. The default value is 3600 (1 hour).

RepeatLimit directive

Syntax: RepeatLimit Limit

This directive can appear in both global and per-virtual-site configuration files. In the global configuration file it sets the global limit for all sites. In a per-virtual-site configuration file it sets the limit for that site only, and this limit cannot exceed the global limit.

ISAPI_Rewrite allows loops while processing rules (see the description of the N flag of the RewriteRule and RewriteHeader directives). This directive limits the maximum number of loops. Set it to zero or one to disable looping.

  • Limit

    Specifies a maximum number of allowed loops. The default value is 32.
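
The interaction between the global and per-site limits can be summarized with a small Python sketch. The function name and the clamping behaviour are assumptions made for illustration; the documentation only states that a site limit cannot exceed the global one, not how an excessive value is handled.

```python
DEFAULT_REPEAT_LIMIT = 32  # default stated in the documentation

def effective_repeat_limit(global_limit=None, site_limit=None):
    """Combine global and per-site RepeatLimit values (None means not set)."""
    limit = DEFAULT_REPEAT_LIMIT if global_limit is None else global_limit
    if site_limit is not None:
        # Assumption: a site may lower the limit but never raise it.
        limit = min(limit, site_limit)
    return limit
```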

RewriteCond directive

Syntax: RewriteCond TestVerb CondPattern [Flags]

The RewriteCond directive defines a rule condition. Precede a RewriteRule directive with one or more RewriteCond directives. The rewriting rule that follows is applied only if its pattern matches the current state of the URI and these additional conditions are also met.

  • TestVerb

    Specifies verb that will be matched against regular expression.

    TestVerb=(URL | METHOD | VERSION | HTTPHeaderName: | %ServerVariable) where:

    • URL - returns Request-URI of client request as described in RFC 2068 (HTTP 1.1);
    • METHOD - returns HTTP method of client request (OPTIONS, GET, HEAD, POST, PUT, DELETE or TRACE);
    • VERSION - returns HTTP version;
    • HTTPHeaderName - returns the value of the specified HTTP header. HTTPHeaderName can be any valid HTTP header name. Header names must include the trailing colon ":". If the specified header does not exist in the client's request, TestVerb is treated as an empty string.
      HTTPHeaderName = 
      Accept:
      Accept-Charset:
      Accept-Encoding:
      Accept-Language:
      Authorization:
      Cookie:
      From:
      Host:
      If-Modified-Since:
      If-Match:
      If-None-Match:
      If-Range:
      If-Unmodified-Since:
      Max-Forwards:
      Proxy-Authorization:
      Range:
      Referer:
      User-Agent:
      Any-Custom-Header:

      For more information about HTTP headers and their values refer to RFC 2068.

    • ServerVariable - returns the value of the specified server variable, for example SERVER_PORT. A complete list of server variables can be found in the IIS documentation. The variable name must be prefixed with a % sign.
  • CondPattern

    The regular expression to match TestVerb.

  • [Flags]

    Flags is a comma-separated list of the following flags:

    • O (nOrmalize)

      Normalizes the string before processing. Normalization includes removal of URL encoding, illegal characters, etc. This flag is useful with URLs and URL-encoded headers.
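
The way a TestVerb selects the string to be tested can be sketched in Python as follows. This is a simplified model, not ISAPI_Rewrite code: resolve_test_verb and the layout of the request dictionary are hypothetical, header lookup is case-sensitive here for brevity, and returning an empty string for an unknown server variable is an assumption.

```python
def resolve_test_verb(verb, request):
    """Return the string that a RewriteCond pattern would be tested against."""
    if verb in ("URL", "METHOD", "VERSION"):
        return request[verb.lower()]
    if verb.startswith("%"):
        # %ServerVariable form; a missing variable yields "" (assumption).
        return request["server_variables"].get(verb[1:], "")
    if verb.endswith(":"):
        # A header that is absent from the request is treated as empty.
        return request["headers"].get(verb[:-1], "")
    raise ValueError("unknown TestVerb: " + verb)

request = {
    "url": "/catalog?id=7",
    "method": "GET",
    "version": "HTTP/1.1",
    "headers": {"Host": "example.com"},
    "server_variables": {"SERVER_PORT": "80"},
}
```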

    RewriteRule directive

    Syntax: RewriteRule Pattern FormatString [Flags]

    The RewriteRule directive is the real rewriting workhorse. The directive can occur more than once. Each directive defines one single rewriting rule. The definition order of these rules is important, because this order is used when applying the rules at run-time.

    • Pattern

      Specifies regular expression that will be matched against Request-URI. See regular expression syntax section for more info.

    • FormatString

      Specifies format string that will generate new URI. See format string syntax section for more info.

    • [Flags]

      Flags is a comma-separated list of the following flags:

      • I (ignore case)

        Indicates that characters are matched regardless of case. This flag affects RewriteRule directive and all corresponding RewriteCond directives.

      • F (Forbidden)

        Stops the rewriting process and sends a 403 Forbidden response to the client. Note that FormatString is not used in this case and can be set to any non-empty string.

      • L (last rule)

        Stop the rewriting process here and don't apply any more rewriting rules. Use this flag to prevent the currently rewritten URI from being rewritten further by following rules.

      • N (Next iteration)

        Forces the rewriting engine to modify the rule's target and restart rule checking from the beginning (all modifications are saved). The number of restarts is limited by the value specified in the RepeatLimit directive. If this number is exceeded, the N flag is simply ignored.

      • R (explicit redirect)

        Force server to send immediate response to client with redirect instruction, providing result URI as a new location. Redirect rule is always the last rule.

      • U (Unmangle Log)

        Log the URL as it was originally requested and not as the URL was rewritten.

      • O (nOrmalize)

        Normalizes the string before processing. Normalization includes removal of URL encoding, illegal characters, etc. This flag is useful with URLs and URL-encoded headers.
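
The effect of the N flag together with RepeatLimit can be modelled as below. This Python sketch is an assumption-laden illustration rather than the engine's real logic: rewrite_with_next is a hypothetical name, Python's re module stands in for Regex++, and the limit is counted in passes over the ruleset so that a limit of zero or one disables looping, as the RepeatLimit description states.

```python
import re

def rewrite_with_next(uri, rules, repeat_limit=32):
    """Apply (pattern, replacement, flags) rules; N restarts from the first rule."""
    passes = 1
    index = 0
    while index < len(rules):
        pattern, replacement, flags = rules[index]
        match = re.fullmatch(pattern, uri)
        index += 1
        if match is None:
            continue
        uri = match.expand(replacement)
        if "N" in flags and passes < repeat_limit:
            passes += 1
            index = 0  # restart rule checking with the modified URI
        elif "L" in flags:
            break
    return uri

# One rule that strips a single leading "/x" segment per pass.
STRIP_X = [(r"/x(/.*)", r"\1", "N")]
```

With the default limit the rule keeps firing until no leading "/x" remains; with a limit of one it is applied once and the N flag is ignored.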

    RewriteHeader directive

    Syntax: RewriteHeader  HeaderName  Pattern  FormatString  [Flags]

    The RewriteHeader directive is a more general variant of the RewriteRule directive. It is designed to rewrite not only the URL part of the client request, but any HTTP header. This directive can be used to rewrite, create or delete any HTTP header, or even to change the method of the client request.

    • HeaderName 

      Specifies the HTTP header that will be rewritten. Possible values are the same as for the TestVerb parameter of the RewriteCond directive. Thus, the RewriteRule directive is a synonym for RewriteHeader URL Pattern FormatString [Flags].

    • Pattern

      Specifies regular expression that will be matched against specified header. See regular expression syntax section for more information.

    • FormatString

      Specifies format string that will generate new header value. See format string syntax section for more information.

    • [Flags]

      Flags is a comma-separated list of the following flags:

      • I (ignore case)

        Indicates that characters are matched regardless of case. This flag affects RewriteHeader directive and all corresponding RewriteCond directives.

      • F (Forbidden)

        Stops the rewriting process and sends a 403 Forbidden response to the client. Note that FormatString is not used in this case and can be set to any non-empty string.

      • L (last rule)

        Stop the rewriting process here and don't apply any more rewriting rules.

      • N (Next iteration)

        Forces the rewriting engine to modify the rule's target and restart rule checking from the beginning (all modifications are saved). The number of restarts is limited by the value specified in the RepeatLimit directive. If this number is exceeded, the N flag is simply ignored.

      • R (explicit redirect)

        Force server to send immediate response to client with redirect instruction, providing new URI as a new location. Redirect rule is always the last rule.

      • U (Unmangle Log)

        Log the URL as it was originally requested and not as the URL was rewritten.

      • O (nOrmalize)

        Normalizes the string before processing. Normalization includes removal of URL encoding, illegal characters, etc. This flag is useful with URLs and URL-encoded headers.

    To remove a header, the format string should generate an empty string. For example, this rule will remove the user agent information from the client request:

    RewriteHeader User-Agent: .* $0

    And this rule will add Old-URL header to the request, providing a Request-URL as a header value:

    RewriteCond URL (.*)
    RewriteHeader Old-URL: ^$ $1

    This last example will direct all WebDAV requests to the /webdav.asp script by changing request method:

    RewriteCond METHOD OPTIONS
    RewriteRule (.*) /webdav.asp?$1
    RewriteHeader METHOD OPTIONS GET

    Regular expression syntax

    This section covers the regular expression syntax used by ISAPI_Rewrite.

    Special note about "pathological" regular expressions

    ISAPI_Rewrite uses Regex++, a very powerful regular expression engine written by Dr. John Maddock. Like any real-world software, however, it is not ideal: there are some "pathological" expressions that may require exponential time to match. These all involve nested repetition operators; for example, attempting to match the expression "(a*a)*b" against N letter a's requires time proportional to 2^N. Such expressions can (almost) always be rewritten to avoid the problem; for example, "(a*a)*b" could be rewritten as "a*b", which requires only time linearly proportional to N. In the general case, non-nested repeat expressions require time proportional to N^2; however, if the clauses are mutually exclusive they can be matched in linear time. This is the case with "a*b": for each character the matcher will either match an "a" or a "b" or fail, whereas with "a*a" the matcher can't tell which branch to take (the first "a" or the second) and so has to try both.
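
The claim that the nested and the flat form accept the same strings can be checked with a short Python sketch. Python's re module is used here as a stand-in for Regex++ (an assumption; the two engines are similar backtracking matchers but not identical), and the inputs are deliberately kept short so that the nested form finishes quickly.

```python
import re

NESTED = re.compile(r"(a*a)*b")  # nested repeats: the "pathological" form
FLAT = re.compile(r"a*b")        # equivalent pattern without nesting

samples = ["b", "ab", "aaaab", "aaaa", "ba", ""]
agreement = [
    (NESTED.fullmatch(s) is not None) == (FLAT.fullmatch(s) is not None)
    for s in samples
]
```

Both patterns give the same answer for every sample, so the flat form can replace the nested one without changing which requests match.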

    In version 1.3 of ISAPI_Rewrite we modified the Regex++ engine to overcome the problem of "pathological" rules that require exponential processing time. The time to process a single rule is now limited to half a second. If a rule fails to complete in this time, processing stops and ISAPI_Rewrite sends "500 Internal Server Error" to the client to indicate a configuration error. The failed rule is also disabled to prevent further performance losses. This solution is a hack of Regex++ and should be considered temporary. We have contacted Dr. Maddock, and he told us that he will try to detect such "pathological" expressions in a future version of Regex++, so that no hack will be needed.

    Literals

    All characters are literals except: ".", "*", "?", "+", "(", ")", "{", "}", "[", "]", "^" and "$". These characters are literals when preceded by a "\". A literal is a character that matches itself.

    Wildcard

    The dot character "." matches any single character except null character and newline character. 

    Repeats

    A repeat is an expression that is repeated an arbitrary number of times. An expression followed by "*" can be repeated any number of times including zero. An expression followed by "+" can be repeated any number of times, but at least once. An expression followed by "?" may be repeated zero or one times only. When it is necessary to specify the minimum and maximum number of repeats explicitly, the bounds operator "{}" may be used, thus "a{2}" is the letter "a" repeated exactly twice, "a{2,4}" represents the letter "a" repeated between 2 and 4 times, and "a{2,}" represents the letter "a" repeated at least twice with no upper limit. Note that there must be no white-space inside the {}, and there is no upper limit on the values of the lower and upper bounds. All repeat expressions refer to the shortest possible previous sub-expression: a single character; a character set, or a sub-expression grouped with "()" for example.

    Examples:

    • "ba*" will match all of "b", "ba", "baaa" etc.
    • "ba+" will match "ba" or "baaaa" for example but not "b".
    • "ba?" will match "b" or "ba".
    • "ba{2,4}" will match "baa", "baaa" and "baaaa".
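
The examples above can be verified with a few lines of Python. This is a sketch that uses Python's re module in place of Regex++ (the repeat operators shown behave the same way in both); matches_whole is a hypothetical helper that applies the whole-string matching rule.

```python
import re

def matches_whole(pattern, text):
    """True if `pattern` matches the entire string."""
    return re.fullmatch(pattern, text) is not None

results = {
    "ba*": [matches_whole("ba*", s) for s in ("b", "ba", "baaa")],
    "ba+": [matches_whole("ba+", s) for s in ("ba", "baaaa", "b")],
    "ba?": [matches_whole("ba?", s) for s in ("b", "ba", "baa")],
    "ba{2,4}": [matches_whole("ba{2,4}", s) for s in ("ba", "baa", "baaaa", "baaaaa")],
}
```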

    Non-greedy repeats

    Non-greedy repeats are possible by appending a '?' after the repeat; a non-greedy repeat is one which will match the shortest possible string.

    For example, to match HTML tag pairs one could use something like:

    "<\s*tagname[^>]*>(.*?)<\s*/tagname\s*>"

    In this case $1 will contain the text between the tag pairs, and will be the shortest possible matching string. 
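
The difference between greedy and non-greedy repeats is easiest to see side by side. The sketch below uses Python's re module as a stand-in for Regex++ and searches inside a longer string (rather than matching the whole string) purely to show which text the group captures; the tag name "b" and the sample text are made up for the illustration.

```python
import re

html = "<b>one</b> and <b>two</b>"

# Greedy: (.*) runs to the LAST closing tag.
greedy = re.search(r"<\s*b[^>]*>(.*)<\s*/b\s*>", html).group(1)

# Non-greedy: (.*?) stops at the FIRST closing tag.
lazy = re.search(r"<\s*b[^>]*>(.*?)<\s*/b\s*>", html).group(1)
```

The greedy form captures everything between the first opening tag and the last closing tag, while the non-greedy form captures only the contents of the first pair.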

    Parenthesis

    Parentheses serve two purposes: to group items together into a sub-expression, and to mark what generated the match. For example, the expression "(ab)*" would match all of the string "ababab". All sub-matches marked by parentheses can be back referenced using \N or $N syntax. It is permissible for sub-expressions to match null strings. Sub-expressions are indexed from left to right starting from 1; sub-expression 0 is the whole expression.

    Non-Marking Parenthesis

    Sometimes you need to group sub-expressions with parenthesis, but don't want the parenthesis to spit out another marked sub-expression, in this case a non-marking parenthesis (?:expression) can be used. For example the following expression creates no sub-expressions:

    "(?:abc)*" 

    Alternatives

    Alternatives occur when the expression can match either one sub-expression or another, each alternative is separated by a "|". Each alternative is the largest possible previous sub-expression; this is the opposite behaviour from repetition operators.

    Examples:

    • "a(b|c)" could match "ab" or "ac".
    • "abc|def" could match "abc" or "def". 

    Sets

    A set is a set of characters that can match any single character that is a member of the set. Sets are delimited by "[" and "]" and can contain literals, character ranges, character classes, collating elements and equivalence classes. Set declarations that start with "^" contain the complement of the elements that follow.

    Examples:

    Character literals:

    • "[abc]" will match either of "a", "b", or "c".
    • "[^abc]" will match any character other than "a", "b", or "c".

    Character ranges:

    • "[a-z]" will match any character in the range "a" to "z".
    • "[^A-Z]" will match any character other than those in the range "A" to "Z".

    Character classes

    Character classes are denoted using the syntax "[:classname:]" within a set declaration, for example "[[:space:]]" is the set of all whitespace characters. The available character classes are: 

    alnum      Any alphanumeric character.
    alpha      Any alphabetical character a-z and A-Z. Other characters may also be included depending upon the locale.
    blank      Any blank character, either a space or a tab.
    cntrl      Any control character.
    digit      Any digit 0-9.
    graph      Any graphical character.
    lower      Any lower case character a-z. Other characters may also be included depending upon the locale.
    print      Any printable character.
    punct      Any punctuation character.
    space      Any whitespace character.
    upper      Any upper case character A-Z. Other characters may also be included depending upon the locale.
    xdigit     Any hexadecimal digit character, 0-9, a-f and A-F.
    word       Any word character - all alphanumeric characters plus the underscore.
    unicode    Any character whose code is greater than 255; this applies to the wide character traits classes only.

    There are some shortcuts that can be used in place of the character classes:

    • \w in place of [:word:]
    • \s in place of [:space:]
    • \d in place of [:digit:]
    • \l in place of [:lower:]
    • \u in place of [:upper:]

    Collating elements

    Collating elements take the general form [.tagname.] inside a set declaration, where tagname is either a single character, or a name of a collating element, for example [[.a.]] is equivalent to [a], and [[.comma.]] is equivalent to [,]. ISAPI_Rewrite supports all the standard POSIX collating element names, and in addition the following digraphs: "ae", "ch", "ll", "ss", "nj", "dz", "lj", each in lower, upper and title case variations. Multi-character collating elements can result in the set matching more than one character, for example [[.ae.]] would match two characters, but note that [^[.ae.]] would only match one character. 

    Equivalence classes

    Equivalence classes take the general form [=tagname=] inside a set declaration, where tagname is either a single character or the name of a collating element, and match any character that is a member of the same primary equivalence class as the collating element [.tagname.]. An equivalence class is a set of characters that collate the same; a primary equivalence class is a set of characters whose primary sort keys are all the same (for example, strings are typically collated by character, then by accent, and then by case; the primary sort key then relates to the character, the secondary to the accentuation, and the tertiary to the case). If there is no equivalence class corresponding to tagname, then [=tagname=] is exactly the same as [.tagname.].

    To include a literal "-" in a set declaration, make it the first character after the opening "[" or "[^", the endpoint of a range, or a collating element, or precede it with an escape character as in "[\-]". To include a literal "[", "]" or "^" in a set, make it the endpoint of a range or a collating element, or precede it with an escape character.

    Line anchors

    An anchor is something that matches the null string at the start or end of a line: "^" matches the null string at the start of a line, "$" matches the null string at the end of a line. 

    Back references

    A back reference is a reference to a previous sub-expression that has already been matched; the reference is to what the sub-expression matched, not to the expression itself. A back reference consists of the escape character "\" followed by a digit "1" to "9": "\1" refers to the first sub-expression, "\2" to the second, etc. For example, the expression "(.*)\1" matches any string that is repeated about its mid-point, such as "abcabc" or "xyzxyz". A back reference to a sub-expression that did not participate in any match matches the null string. In ISAPI_Rewrite all back references are global to the entire RewriteRule directive and its RewriteCond directives. Sub-matches are numbered from top to bottom and from left to right, beginning with the first RewriteCond directive of the corresponding RewriteRule directive, if there is one.
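
The "repeated about its mid-point" example can be tried out in Python, whose re module supports the same back reference syntax (it is used here only as a stand-in for Regex++; is_doubled is a hypothetical helper).

```python
import re

REPEATED = re.compile(r"(.*)\1")

def is_doubled(text):
    """True if `text` is some string immediately followed by itself."""
    return REPEATED.fullmatch(text) is not None
```

The back reference compares against the text that the group actually matched, so "abcabc" is accepted while "abcabd" is not.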

    Forward Lookahead Asserts

    There are two forms of these; one for positive forward lookahead asserts, and one for negative lookahead asserts:

    • "(?=abc)" matches zero characters only if they are followed by the expression "abc".
    • "(?!abc)" matches zero characters only if they are not followed by the expression "abc".
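
Because a lookahead matches zero characters, it can be combined with a pattern that still has to match the whole string. The Python sketch below shows both forms; Python's re module stands in for Regex++, and the helper names and sample values are hypothetical.

```python
import re

def host_needs_www(host):
    """True if `host` does not already start with "www."."""
    return re.fullmatch(r"(?!www\.).*", host) is not None

def is_admin_path(path):
    """True if `path` starts with "/admin"; the lookahead consumes nothing."""
    return re.fullmatch(r"(?=/admin).*", path) is not None
```

In both helpers the ".*" after the assertion still matches the complete input, including the characters that the lookahead inspected.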

    Word operators

    The following operators are provided for compatibility with the GNU regular expression library.

    • "\w" matches any single character that is a member of the "word" character class; this is identical to the expression "[[:word:]]".
    • "\W" matches any single character that is not a member of the "word" character class; this is identical to the expression "[^[:word:]]".
    • "\<" matches the null string at the start of a word.
    • "\>" matches the null string at the end of a word.
    • "\b" matches the null string at either the start or the end of a word.
    • "\B" matches a null string within a word.

    Escape operator

    The escape character "\" has several meanings.

    • The escape operator may introduce an operator, for example a back reference or a word operator.
    • The escape operator may make the following character normal; for example, "\*" represents a literal "*" rather than the repeat operator.

    Single character escape sequences:

    The following escape sequences are aliases for single characters:
     

    Escape sequence   Character code   Meaning
    \a                0x07             Bell character.
    \t                0x09             Tab character.
    \v                0x0B             Vertical tab.
    \e                0x1B             ASCII Escape character.
    \0dd              0dd              An octal character code, where dd is one or more octal digits.
    \xXX              0xXX             A hexadecimal character code, where XX is one or more hexadecimal digits.
    \x{XX}            0xXX             A hexadecimal character code, where XX is one or more hexadecimal digits, optionally a unicode character.
    \cZ               z-@              An ASCII escape sequence control-Z, where Z is any ASCII character greater than or equal to the character code for '@'.
     

    Miscellaneous escape sequences:

    The following are provided mostly for Perl compatibility, but note that there are some differences in the meanings of \l, \L, \u and \U:

    Escape sequence   Meaning
    \w                Equivalent to [[:word:]].
    \W                Equivalent to [^[:word:]].
    \s                Equivalent to [[:space:]].
    \S                Equivalent to [^[:space:]].
    \d                Equivalent to [[:digit:]].
    \D                Equivalent to [^[:digit:]].
    \l                Equivalent to [[:lower:]].
    \L                Equivalent to [^[:lower:]].
    \u                Equivalent to [[:upper:]].
    \U                Equivalent to [^[:upper:]].
    \C                Any single character, equivalent to '.'.
    \X                Match any Unicode combining character sequence, for example "a\x 0301" (a letter a with an acute).
    \Q                The begin quote operator; everything that follows is treated as a literal character until a \E end quote operator is found.
    \E                The end quote operator; terminates a sequence begun with \Q.
     

    What gets matched?

    The regular expression will match the first possible matching string; if more than one string starting at a given location can match, it matches the longest possible string. In cases where there are multiple possible matches all starting at the same location and all of the same length, the match chosen is the one with the longest first sub-expression; if that is the same for two or more matches, the second sub-expression is examined, and so on. Note that ISAPI_Rewrite uses the match algorithm: a result is produced only if the expression matches the whole input sequence. For example:

    • RewriteCond URL ^/somedir/.* #will match any request to somedir directory and subdirectories, while
    • RewriteCond URL ^/somedir/ #will match only request to the root of the somedir.
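The whole-input behaviour can be illustrated outside IIS. The sketch below uses Python's re.fullmatch as a stand-in for the MATCH algorithm; it is only an approximation of Regex++ (the two engines differ in other respects), and rule_matches is a hypothetical helper name:

```python
import re


def rule_matches(pattern: str, url: str) -> bool:
    """Whole-input matching: the pattern must consume the entire URL."""
    return re.fullmatch(pattern, url) is not None


# The first pattern covers the directory and everything below it;
# the second only covers the directory root itself.
print(rule_matches(r"^/somedir/.*", "/somedir/sub/page.htm"))
print(rule_matches(r"^/somedir/", "/somedir/sub/page.htm"))
print(rule_matches(r"^/somedir/", "/somedir/"))
```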

    Format string syntax

    In format strings, all characters are treated as literals except: "(", ")", "$", "\", "?", ":".

    To use any of these as literals you must prefix them with the escape character \

    The following special sequences are recognized: 

    Grouping:

    Use the parenthesis characters ( and ) to group sub-expressions within the format string; use \( and \) to represent literal '(' and ')'.

    Sub-expression expansions:

    The following Perl-like expressions expand to a particular matched sub-expression:

    $`    Expands to all the text from the end of the previous match to the start of the current match; if there was no previous match in the current operation, then everything from the start of the input string to the start of the match.
    $'    Expands to all the text from the end of the match to the end of the input string.
    $&    Expands to all of the current match.
    $0    Expands to all of the current match.
    $N    Expands to the text that matched sub-expression N.
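To make the expansions concrete, here is a rough Python sketch that computes each of them for the first match of a pattern. Python's re.search stands in for Regex++, the function name expansions is hypothetical, and treating a non-participating sub-expression as an empty string is an assumption of this sketch:

```python
import re


def expansions(pattern: str, text: str) -> dict:
    """Return what each expansion would produce for the first match in text."""
    m = re.search(pattern, text)
    if m is None:
        return {}
    result = {
        "$`": text[:m.start()],   # text before the match
        "$'": text[m.end():],     # text after the match
        "$&": m.group(0),         # the whole match
        "$0": m.group(0),         # same as $&
    }
    for n in range(1, m.re.groups + 1):
        # Assumption: an unmatched sub-expression expands to nothing.
        result[f"${n}"] = m.group(n) or ""
    return result


for name, value in expansions(r"(\d+)-(\d+)", "pages 10-20 only").items():
    print(name, repr(value))
```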
     

    Conditional expressions:

    Conditional expressions allow two different format strings to be selected dependent upon whether a sub-expression participated in the match or not:

    ?Ntrue_expression:false_expression

    Executes true_expression if sub-expression N participated in the match, otherwise executes false_expression.

    Example: suppose we search for "(while)|(for)" then the format string "?1WHILE:FOR" would output what matched, but in upper case.
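The same example can be mimicked in Python, which has no conditional format syntax of its own, by choosing the branch in a replacement function. This is a sketch of the idea only; conditional and upper_keywords are hypothetical names:

```python
import re


def conditional(m: "re.Match", n: int, if_true: str, if_false: str) -> str:
    """Emulate ?Ntrue:false by checking whether sub-expression n participated."""
    return if_true if m.group(n) is not None else if_false


def upper_keywords(text: str) -> str:
    # Search for (while)|(for); sub-expression 1 participates only for "while".
    return re.sub(r"(while)|(for)",
                  lambda m: conditional(m, 1, "WHILE", "FOR"),
                  text)


print(upper_keywords("for x in y: while z: pass"))
```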

    Escape sequences:

    The following escape sequences are also allowed:

    \a      The bell character.
    \f      The form feed character.
    \n      The newline character.
    \r      The carriage return character.
    \t      The tab character.
    \v      A vertical tab character.
    \x      A hexadecimal character, for example \x0D.
    \x{}    A possible unicode hexadecimal character, for example \x{1A0}.
    \cx     The ASCII escape character x, for example \c@ is equivalent to escape-@.
    \e      The ASCII escape character.
    \dd     An octal character constant, for example \10.

    Examples

    Emulating host-header-based virtual sites on a single site

    For example, suppose you have registered two domains, www.site1.com and www.site2.com. You can now create two different sites using a single physical site. Add the following rules to your httpd.ini file:

    [ISAPI_Rewrite]
    
    RewriteCond  Host:  (?:www\.)?site1\.com
    RewriteRule  (.*)   /site1$1
    
    RewriteCond  Host:  (?:www\.)?site2\.com
    RewriteRule  (.*)   /site2$1

    Now just place your sites in /site1 and /site2 directories.

    Or you can use more general rules:

    [ISAPI_Rewrite]
    
    RewriteCond  Host:  (?:www\.)?(.+)
    RewriteRule  (.*)   /$1$2

    The directory names for sites should be like /somesite1.com, /somesite2.info, etc.
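The general rule can be traced with a small Python sketch. It assumes, as the format string in the rule implies, that sub-expression numbering continues from the condition into the rule (the first sub-expression comes from the Host condition, the second from the URL pattern). Python's re is only a stand-in for Regex++, and map_host_to_directory is a hypothetical name:

```python
import re


def map_host_to_directory(host: str, url: str):
    """Return the rewritten URL, or None when the condition or rule fails."""
    cond = re.fullmatch(r"(?:www\.)?(.+)", host)  # the Host condition
    if cond is None:
        return None
    rule = re.fullmatch(r"(.*)", url)             # the rule pattern
    if rule is None:
        return None
    # Format: slash, host name without "www.", then the original URL.
    return "/" + cond.group(1) + rule.group(1)


print(map_host_to_directory("www.somesite1.com", "/index.htm"))
print(map_host_to_directory("somesite2.info", "/"))
```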

    Using loops (Next flag) to convert request parameters

    Suppose you wish to access physical URLs like http://www.myhost.com/foo.asp?a=A&b=B&c=C using requests like http://www.myhost.com/a/A/b/B/c/C/foo.asp and the number of parameters may vary from one request to another.

    There exist at least two possible solutions. You could simply add a separate rule for each possible number of parameters, or you could use the technique demonstrated by the following example.

    [ISAPI_Rewrite]
    
    RewriteRule /([^/]*)/([^/]*)(.*)foo.asp(.+)? $3foo.asp(?4$4&:\?)$1=$2 [N,I]

    This rule extracts one parameter from the request URL, appends it to the end of the request string and restarts rule processing from the beginning. It therefore loops until all parameters have been moved to the right place (or until the RepeatLimit is exceeded).
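The loop can be simulated step by step. The Python sketch below re-applies an equivalent of the rule until it no longer matches. It is an illustration under assumptions: Python's re approximates Regex++, the format string is emulated by hand, and the repeat_limit parameter with its value of 32 is a hypothetical stand-in for RepeatLimit, not a documented default:

```python
import re

# Same pattern as the rule; re.IGNORECASE plays the role of the I flag.
PATTERN = re.compile(r"/([^/]*)/([^/]*)(.*)foo.asp(.+)?", re.IGNORECASE)


def rewrite_once(url: str):
    """Apply the rule once; return None when the pattern does not match."""
    m = PATTERN.fullmatch(url)
    if m is None:
        return None
    name, value, rest, query = m.groups()
    # Conditional part of the format: existing query plus "&" if the fourth
    # sub-expression took part in the match, otherwise a literal "?".
    prefix = query + "&" if query is not None else "?"
    return f"{rest}foo.asp{prefix}{name}={value}"


def rewrite_with_next(url: str, repeat_limit: int = 32) -> str:
    """Emulate the N flag: restart processing until nothing changes."""
    for _ in range(repeat_limit):
        rewritten = rewrite_once(url)
        if rewritten is None:
            break
        url = rewritten
    return url


print(rewrite_with_next("/a/A/b/B/c/C/foo.asp"))
```

Each pass moves the leftmost name/value pair into the query string, so a request with three pairs takes three passes before the pattern stops matching.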

    Moving sites from UNIX to IIS

    These rules change URLs from /~username to /username and from /file.html to /file.htm. They can be useful if you have just moved your site from UNIX to IIS and keep getting hits on the old pages from search engines and other external pages.

    [ISAPI_Rewrite]
    
    #redirecting to update old links
    RewriteRule (.*)\.html $1.htm
    RewriteRule  /~(.*)  http\://myserver/$1 [R]

    Moving site location

    Many webmasters have asked for a solution to the following problem: they want to redirect all requests for one web server to another web server. Such problems usually arise when you need to set up a newer web server that will replace the old one over time. The solution is to use ISAPI_Rewrite on the old web server:

    [ISAPI_Rewrite]
    
    #redirecting to update old links
    RewriteRule  (.+)  http\://newwebserver$1 [R]

    Browser-dependent content

    It is sometimes necessary to provide browser-dependent content, at least for important top-level pages; that is, one has to provide a full-featured version for Internet Explorer, a minimum-featured version for the Lynx browser and an average-featured version for all others.

    We have to act on the HTTP header "User-Agent". The sample code does the following: if the HTTP header "User-Agent" contains "MSIE", the target foo.htm is rewritten to foo.IE.htm. If the browser is "Lynx" or "Mozilla" version 1 or 2, the URL becomes foo.20.htm. All other browsers receive the page foo.32.htm. All this is done by the following ruleset:

    [ISAPI_Rewrite]
    
    RewriteCond  User-Agent:  .*MSIE.*
    RewriteRule  foo\.htm  foo.IE.htm  [L]
    
    RewriteCond  User-Agent:  (?:Lynx|Mozilla/[12]).*
    RewriteRule  foo\.htm  foo.20.htm  [L]
    
    RewriteRule  foo\.htm  foo.32.htm  [L]

    Dynamically generated robots.txt

    robots.txt is a file that search engines use to discover which URLs should or should not be indexed. Creating this file for a large site with a lot of dynamic content is a complex task. Have you ever dreamed of a dynamically generated robots.txt? Let's write a robots.asp script:

    <%@ Language=JScript EnableSessionState=False%>
    <%
    
    //The script must return plain text
    Response.ContentType="text/plain";
    
    /*
    Place generation code here
    */
    
    %>

    Now serve it as robots.txt using a single rule:

    [ISAPI_Rewrite]
    
    RewriteRule  /robots\.txt  /robots.asp

    Server side XML processing

    Suppose the content of the site is stored in XML files. There is an /XMLProcess.asp file that processes the XML files on the server and returns HTML to the end user. URLs to the documents have the form:
    http://www.mysite.com/XMLProcess.asp?xml=/somedir/somedoc.xml
    But many popular search engines will not index such documents because the URLs contain a question mark (the document is dynamically generated). ISAPI_Rewrite can completely eliminate this problem.

    [ISAPI_Rewrite]
    
    RewriteRule  /doc(.*)\.htm  /XMLProcess.asp\?xml=$1.xml

    Now, to access documents, use URLs like http://www.mysite.com/doc/somedir/somedoc.htm. Search engines will never know that there is physically no somedoc.htm file and that the content is dynamically generated.
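The mapping performed by this rule can be checked with a few lines of Python. This is only a sketch of the substitution, with Python's re standing in for Regex++ and doc_to_xml_url as a hypothetical helper name:

```python
import re


def doc_to_xml_url(url: str):
    """Map a /doc...htm URL onto the XMLProcess.asp query form."""
    m = re.fullmatch(r"/doc(.*)\.htm", url)
    if m is None:
        return None  # the rule does not apply to this URL
    return f"/XMLProcess.asp?xml={m.group(1)}.xml"


print(doc_to_xml_url("/doc/somedir/somedoc.htm"))
```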

    Using conditional expressions

    Sometimes you need to apply a rule when some pattern does not match. Unlike Apache's mod_Rewrite, ISAPI_Rewrite does not support non-matching patterns, because it is not obvious what the rewriting engine should do when a non-matching pattern generates submatches. Instead of non-matching patterns, you can use conditional expressions in format strings.

    For example, suppose you need to move all users who are not using Internet Explorer to another location:

    [ISAPI_Rewrite]
    
    #if user agent is Internet Explorer leave URI untouched
    #else precede it with /nonie
    
    RewriteCond  User-Agent:  .*(MSIE)?.*
    RewriteRule  (.+) ?1$2:/nonie$2

    Proxy throughput

    We are planning to implement proxy throughput abilities in the next version of ISAPI_Rewrite. But since rewriting URLs can lead to script or program invocations, you can implement your own proxy mechanism using ASP or CGI/ISAPI applications. Create a proxy.asp file in the root of the site and add the following ASP code:

    <%@ Language=JScript EnableSessionState=False%>
    <%
    
    //we are using MSXML3.0 to travel via HTTP
    
    var httpReq;
    if(Request.QueryString.Count)
    {
        httpReq=Server.CreateObject("Msxml2.ServerXMLHTTP");
        httpReq.setTimeouts(5000,5000,15000,15000);
        httpReq.open("GET",""+Request.QueryString, false);
        httpReq.send();
        Response.Status=httpReq.status;
        Response.ContentType=httpReq.getResponseHeader("Content-Type");
        Response.BinaryWrite(httpReq.responseBody);
    }
    %>

    Now /proxy.asp can be used as a proxy, with the requested URL supplied in the query string. Usage example:

    [ISAPI_Rewrite]
    
    #pass all content of the /images folder through to another server
    RewriteRule  /images(.+)  /proxy.asp\?http\://myimagearchive.net/image$1

    Blocking inline-images

    Assume we have some pages with inlined GIF graphics under http://www.quux-corp.de/. These graphics are nice, so others directly incorporate them via hyperlinks to their pages. We don't like this practice because it adds useless traffic to our server.

    While we cannot fully protect the images from inclusion, we can at least restrict the cases where the browser sends an HTTP Referer header.

    [ISAPI_Rewrite]
    
    RewriteCond  Referer: .+
    RewriteCond  Referer: (http://www\.quux-corp\.de/)?.*
    RewriteRule  (.*\.gif)  ?1$2:/404.asp [I]
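The decision made by this rule set can be sketched in Python. The sketch assumes that both conditions must hold for the rule to apply, that sub-expression numbering continues from the conditions into the rule, and that the I flag affects only the rule pattern; these are assumptions of the illustration. Python's re stands in for Regex++, and rewrite_gif_request is a hypothetical name:

```python
import re


def rewrite_gif_request(url: str, referer: str) -> str:
    """Return the URL to serve for a request with the given Referer header."""
    # First condition: only act when a non-empty Referer was sent.
    if re.fullmatch(r".+", referer) is None:
        return url
    # Second condition: sub-expression 1 participates only for our own site.
    cond = re.fullmatch(r"(http://www\.quux-corp\.de/)?.*", referer)
    # Rule pattern: any URL ending in .gif, matched case-insensitively.
    rule = re.fullmatch(r"(.*\.gif)", url, re.IGNORECASE)
    if cond is None or rule is None:
        return url
    # Conditional format: keep the URL for our own pages, else send /404.asp.
    return rule.group(1) if cond.group(1) is not None else "/404.asp"


print(rewrite_gif_request("/img/logo.gif", "http://www.quux-corp.de/index.htm"))
print(rewrite_gif_request("/img/logo.gif", "http://other.example/page.htm"))
```

Requests without a Referer header pass through untouched, which is why this technique cannot block every case of inclusion.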

    The ISAPI_Rewrite filter uses the Regex++ library. This document contains part of the Regex++ library documentation.

    Regex++ (Version Boost 1.28.0)  
    Copyright (c) 1998-2002, Dr John Maddock
