changeset 1919:dcfb4f3ac8a7
Added the "Regular expressions" section to the development guide.
author | Vladimir Homutov <vl@nginx.com> |
---|---|
date | Wed, 01 Mar 2017 14:06:46 +0300 |
parents | 4ecc39397e97 |
children | de5251816480 |
files | xml/en/docs/dev/development_guide.xml |
diffstat | 1 files changed, 121 insertions(+), 0 deletions(-) |
--- a/xml/en/docs/dev/development_guide.xml Tue Feb 28 15:31:38 2017 +0300
+++ b/xml/en/docs/dev/development_guide.xml Wed Mar 01 14:06:46 2017 +0300
@@ -527,6 +527,127 @@
 </section>
+<section name="Regular expressions" id="regex">
+
+<para>
+The regular expressions interface in nginx is a wrapper around
+the <link url="http://www.pcre.org">PCRE</link>
+library.
+The corresponding header file is <path>src/core/ngx_regex.h</path>.
+</para>
+
+<para>
+To use a regular expression for string matching, first, it needs to be
+compiled, this is usually done at configuration phase.
+Note that since PCRE support is optional, all code using the interface must
+be protected by the surrounding <literal>NGX_PCRE</literal> macro:
+<programlisting>
+#if (NGX_PCRE)
+ngx_regex_t *re;
+ngx_regex_compile_t rc;
+
+u_char errstr[NGX_MAX_CONF_ERRSTR];
+
+ngx_str_t value = ngx_string("message (\\d\\d\\d).*Codeword is '(?<cw>\\w+)'");
+
+ngx_memzero(&rc, sizeof(ngx_regex_compile_t));
+
+rc.pattern = value;
+rc.pool = cf->pool;
+rc.err.len = NGX_MAX_CONF_ERRSTR;
+rc.err.data = errstr;
+/* rc.options are passed as is to pcre_compile() */
+
+if (ngx_regex_compile(&rc) != NGX_OK) {
+    ngx_conf_log_error(NGX_LOG_EMERG, cf, 0, "%V", &rc.err);
+    return NGX_CONF_ERROR;
+}
+
+re = rc.regex;
+#endif
+</programlisting>
+After successful compilation, <literal>ngx_regex_compile_t</literal> structure
+fields <literal>captures</literal> and <literal>named_captures</literal>
+are filled with count of all and named captures respectively found in the
+regular expression.
+</para>
+
+<para>
+Later, the compiled regular expression may be used to match strings against it:
+<programlisting>
+ngx_int_t n;
+int captures[(1 + rc.captures) * 3];
+
+ngx_str_t input = ngx_string("This is message 123. Codeword is 'foobar'.");
+
+n = ngx_regex_exec(re, &input, captures, (1 + rc.captures) * 3);
+if (n >= 0) {
+    /* string matches expression */
+
+} else if (n == NGX_REGEX_NO_MATCHED) {
+    /* no match was found */
+
+} else {
+    /* some error */
+    ngx_log_error(NGX_LOG_ALERT, log, 0, ngx_regex_exec_n " failed: %i", n);
+}
+</programlisting>
+The arguments of <literal>ngx_regex_exec()</literal> are: the compiled regular
+expression <literal>re</literal>, string to match <literal>s</literal>,
+optional array of integers to hold found <literal>captures</literal>
+and its <literal>size</literal>.
+The <literal>captures</literal> array size must be a multiple of three,
+per requirements of the
+<link url="http://www.pcre.org/original/doc/html/pcreapi.html">PCRE API</link>.
+In the example, its size is calculated from a total number of captures plus
+one for the matched string itself.
+</para>
+
+<para>
+Now, if there are matches, captures may be accessed:
+<programlisting>
+u_char *p;
+size_t size;
+ngx_str_t name, value;
+
+/* all captures */
+for (i = 0; i < n * 2; i += 2) {
+    value.data = input.data + captures[i];
+    value.len = captures[i + 1] - captures[i];
+}
+
+/* accessing named captures */
+
+size = rc.name_size;
+p = rc.names;
+
+for (i = 0; i < rc.named_captures; i++, p += size) {
+
+    /* capture name */
+    name.data = &p[2];
+    name.len = ngx_strlen(name.data);
+
+    n = 2 * ((p[0] << 8) + p[1]);
+
+    /* captured value */
+    value.data = &input.data[captures[n]];
+    value.len = captures[n + 1] - captures[n];
+}
+</programlisting>
+</para>
+
+<para>
+The <literal>ngx_regex_exec_array()</literal> function accepts the array of
+<literal>ngx_regex_elt_t</literal> elements (which are just compiled regular
+expressions with associated names), a string to match and a log.
+The function will apply expressions from the array to the string until
+the match is found or no more expressions are left.
+The return value is <literal>NGX_OK</literal> in case of match and
+<literal>NGX_DECLINED</literal> otherwise, or <literal>NGX_ERROR</literal>
+in case of error.
+</para>
+
+</section>
+
 </section>