comparison xml/en/docs/dev/development_guide.xml @ 1919:dcfb4f3ac8a7

Added the "Regular expressions" section to the development guide.
author Vladimir Homutov <vl@nginx.com>
date Wed, 01 Mar 2017 14:06:46 +0300
parents 8b7c3b0ef1a4
children de5251816480
1918:4ecc39397e97 1919:dcfb4f3ac8a7
525 </list> 525 </list>
526 </para> 526 </para>
527 527
528 </section> 528 </section>
529 529
530 <section name="Regular expressions" id="regex">
531
532 <para>
533 The regular expressions interface in nginx is a wrapper around
534 the <link url="http://www.pcre.org">PCRE</link>
535 library.
536 The corresponding header file is <path>src/core/ngx_regex.h</path>.
537 </para>
538
539 <para>
540 To use a regular expression for string matching, it must first be
541 compiled, which is usually done at the configuration phase.
542 Note that since PCRE support is optional, all code using the interface must
543 be protected by the surrounding <literal>NGX_PCRE</literal> macro:
544 <programlisting>
545 #if (NGX_PCRE)
546 ngx_regex_t *re;
547 ngx_regex_compile_t rc;
548
549 u_char errstr[NGX_MAX_CONF_ERRSTR];
550
551 ngx_str_t value = ngx_string("message (\\d\\d\\d).*Codeword is '(?&lt;cw&gt;\\w+)'");
552
553 ngx_memzero(&amp;rc, sizeof(ngx_regex_compile_t));
554
555 rc.pattern = value;
556 rc.pool = cf->pool;
557 rc.err.len = NGX_MAX_CONF_ERRSTR;
558 rc.err.data = errstr;
559 /* rc.options are passed as is to pcre_compile() */
560
561 if (ngx_regex_compile(&amp;rc) != NGX_OK) {
562 ngx_conf_log_error(NGX_LOG_EMERG, cf, 0, "%V", &amp;rc.err);
563 return NGX_CONF_ERROR;
564 }
565
566 re = rc.regex;
567 #endif
568 </programlisting>
569 After successful compilation, the <literal>captures</literal> and
570 <literal>named_captures</literal> fields of the
571 <literal>ngx_regex_compile_t</literal> structure contain the total number of
572 captures and the number of named captures found in the regular expression.
573 </para>
574
575 <para>
576 The compiled regular expression can then be used to match strings:
577 <programlisting>
578 ngx_int_t n;
579 int captures[(1 + rc.captures) * 3];
580
581 ngx_str_t input = ngx_string("This is message 123. Codeword is 'foobar'.");
582
583 n = ngx_regex_exec(re, &amp;input, captures, (1 + rc.captures) * 3);
584 if (n >= 0) {
585 /* string matches expression */
586
587 } else if (n == NGX_REGEX_NO_MATCHED) {
588 /* no match was found */
589
590 } else {
591 /* some error */
592 ngx_log_error(NGX_LOG_ALERT, log, 0, ngx_regex_exec_n " failed: %i", n);
593 }
594 </programlisting>
595 The arguments of <literal>ngx_regex_exec()</literal> are the compiled regular
596 expression <literal>re</literal>, the string to match <literal>s</literal>,
597 an optional array of integers to hold any <literal>captures</literal> found,
598 and the array's <literal>size</literal>.
599 The <literal>captures</literal> array size must be a multiple of three,
600 per requirements of the
601 <link url="http://www.pcre.org/original/doc/html/pcreapi.html">PCRE API</link>.
602 In the example, the size is calculated from the total number of captures plus
603 one for the matched string itself.
604 </para>
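<para>
The sizing rule and the layout of the <literal>captures</literal> array can be
illustrated outside nginx with a minimal sketch in plain C.
It assumes the PCRE convention that offsets are stored as (start, end) pairs,
with pair 0 describing the whole match and a pair of -1 values marking a group
that did not participate in the match.
The helper names <literal>sketch_ovector_size()</literal> and
<literal>sketch_capture()</literal> are hypothetical and are not part of nginx
or PCRE.
</para>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: array slots needed for a pattern with ncaptures groups. */
int
sketch_ovector_size(int ncaptures)
{
    /* one extra pair for the whole match; the last third is PCRE workspace */
    return (1 + ncaptures) * 3;
}

/*
 * Hypothetical helper: locate capture idx (0 is the whole match) in subject.
 * Returns NULL and sets *len to 0 when the group is unset, which PCRE
 * reports as negative offsets.
 */
const char *
sketch_capture(const char *subject, const int *ovector, int idx, size_t *len)
{
    int  start, end;

    assert(idx >= 0);

    start = ovector[2 * idx];
    end = ovector[2 * idx + 1];

    if (start < 0 || end < start) {
        *len = 0;
        return NULL;
    }

    *len = (size_t) (end - start);
    return subject + start;
}
```

<para>
Checking for an unset group before computing the length avoids building a
string from negative offsets when an optional group is skipped.
</para>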
605
606 <para>
607 If there are matches, captures can be accessed as follows:
608 <programlisting>
609 u_char *p;
610 size_t size;
611 ngx_str_t name, value;
612
613 /* all captures */
614 for (i = 0; i &lt; n * 2; i += 2) {
615 value.data = input.data + captures[i];
616 value.len = captures[i + 1] - captures[i];
617 }
618
619 /* accessing named captures */
620
621 size = rc.name_size;
622 p = rc.names;
623
624 for (i = 0; i &lt; rc.named_captures; i++, p += size) {
625
626 /* capture name */
627 name.data = &amp;p[2];
628 name.len = ngx_strlen(name.data);
629
630 n = 2 * ((p[0] &lt;&lt; 8) + p[1]);
631
632 /* captured value */
633 value.data = &amp;input.data[captures[n]];
634 value.len = captures[n + 1] - captures[n];
635 }
636 </programlisting>
637 </para>
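<para>
The named-captures loop above relies on the layout of the PCRE name table:
each entry is <literal>name_size</literal> bytes long, its first two bytes
hold the capture number with the most significant byte first, and the
NUL-terminated name follows.
The following standalone sketch, which assumes the 8-bit PCRE table format,
looks up a capture number by name.
The function <literal>sketch_named_capture_index()</literal> is hypothetical
and is not part of nginx or PCRE.
</para>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return the capture number recorded for name in a
 * PCRE-style name table, or -1 when the name is absent.
 */
int
sketch_named_capture_index(const unsigned char *names, size_t name_size,
    size_t count, const char *name)
{
    size_t                i;
    const unsigned char  *p;

    /* every entry needs two index bytes plus at least a terminating NUL */
    assert(name_size > 2);

    for (i = 0, p = names; i < count; i++, p += name_size) {

        /* bytes 2 and up: NUL-terminated capture name */
        if (strcmp((const char *) &p[2], name) == 0) {

            /* bytes 0 and 1: capture number, most significant byte first */
            return (p[0] << 8) + p[1];
        }
    }

    return -1;
}
```

<para>
Multiplying the returned number by two gives the position of the capture's
start offset in the <literal>captures</literal> array, which is the value
computed as <literal>n</literal> in the loop above.
</para>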
638
639 <para>
640 The <literal>ngx_regex_exec_array()</literal> function accepts an array of
641 <literal>ngx_regex_elt_t</literal> elements (which are just compiled regular
642 expressions with associated names), a string to match, and a log.
643 The function applies expressions from the array to the string until
644 either a match is found or no more expressions are left.
645 The return value is <literal>NGX_OK</literal> when there is a match,
646 <literal>NGX_DECLINED</literal> when there is none, and <literal>NGX_ERROR</literal>
647 in case of error.
648 </para>
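<para>
The first-match behaviour described above can be sketched without nginx.
The example below substitutes POSIX <literal>regexec()</literal> for PCRE, and
its types, function name and return codes
(<literal>sketch_regex_elt_t</literal>,
<literal>sketch_regex_exec_array()</literal>,
<literal>SKETCH_OK</literal> and so on) are hypothetical stand-ins rather than
the nginx definitions.
It is meant only to show the control flow: stop at the first match, stop on
the first error, and decline when the array is exhausted.
</para>

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical return codes standing in for NGX_OK, NGX_DECLINED, NGX_ERROR. */
enum { SKETCH_OK, SKETCH_DECLINED, SKETCH_ERROR };

/* Hypothetical element type: a compiled expression with an associated name. */
typedef struct {
    regex_t      re;
    const char  *name;
} sketch_regex_elt_t;

int
sketch_regex_exec_array(sketch_regex_elt_t *elts, size_t nelts, const char *s)
{
    int     rc;
    size_t  i;

    assert(elts != NULL || nelts == 0);

    for (i = 0; i < nelts; i++) {
        rc = regexec(&elts[i].re, s, 0, NULL, 0);

        if (rc == 0) {
            /* first match wins; later expressions are not tried */
            return SKETCH_OK;
        }

        if (rc != REG_NOMATCH) {
            /* anything other than "no match" is treated as a failure */
            return SKETCH_ERROR;
        }
    }

    /* every expression was tried and none matched */
    return SKETCH_DECLINED;
}
```

<para>
Because the loop returns on the first match, the order of the expressions in
the array determines which one takes effect when several could match.
</para>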
649
650 </section>
530 651
531 </section> 652 </section>
532 653
533 654
534 <section name="Containers" id="containers"> 655 <section name="Containers" id="containers">