Commit b520021b authored by Nibaldo González, committed by Dominik Haumann

Update documentation of highlight & RegExp

parent c05c1677
......@@ -339,7 +339,8 @@ In this example, the <userinput>itemData</userinput> <emphasis>Normal Text</emph
<varlistentry>
<term>The last part of a highlight definition is the optional
<userinput>general</userinput> section. It may contain information
about keywords, code folding, comments, indentation, empty lines and
spell checking.</term>
<listitem>
<para>The <userinput>comment</userinput> section defines with what
......@@ -350,12 +351,24 @@ user presses the corresponding shortcut for <emphasis>comment/uncomment</emphasi
<para>The <userinput>keywords</userinput> section defines whether
keyword lists are case sensitive or not. Other attributes will be
explained later.</para>
<para>The other sections, <userinput>folding</userinput>,
<userinput>emptyLines</userinput> and <userinput>spellchecking</userinput>,
are usually not necessary and are explained later.</para>
<programlisting>
&lt;general&gt;
  &lt;comments&gt;
    &lt;comment name="singleLine" start="#"/&gt;
  &lt;/comments&gt;
  &lt;keywords casesensitive="1"/&gt;
  &lt;folding indentationsensitive="0"/&gt;
  &lt;emptyLines&gt;
    &lt;emptyLine regexpr="\s+"/&gt;
    &lt;emptyLine regexpr="\s*#.*"/&gt;
  &lt;/emptyLines&gt;
  &lt;spellchecking&gt;
    &lt;encoding char="&#225;" string="\&#39;a"/&gt;
    &lt;encoding char="&#224;" string="\&#96;a"/&gt;
  &lt;/spellchecking&gt;
&lt;/general&gt;
&lt;/language&gt;
</programlisting>
......@@ -397,6 +410,10 @@ to the context specified in fallthroughContext if no rule matches.
Default: <emphasis>false</emphasis>.</para>
<para><userinput>fallthroughContext</userinput> specifies the next context
if no rule matches.</para>
<para><userinput>noIndentationBasedFolding</userinput> disables indentation-based folding
in the context. If indentation-based folding is not enabled, this attribute has no effect.
Indentation-based folding is enabled in the element <emphasis>folding</emphasis> of the group <emphasis>general</emphasis>.
Default: <emphasis>false</emphasis>.</para>
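<para>A minimal sketch, assuming a language whose blocks are delimited by indentation: indentation-based
folding is enabled in the group <userinput>general</userinput>, and a context whose lines should not
be folded by indentation opts out with <userinput>noIndentationBasedFolding</userinput>. The context
name <emphasis>Heredoc</emphasis>, the terminator <userinput>EOF</userinput> and the attribute names are hypothetical.</para>
<programlisting>
&lt;!-- In the group general: enable indentation-based folding --&gt;
&lt;folding indentationsensitive="1"/&gt;

&lt;!-- In the contexts: lines inside this (hypothetical) context do not fold by indentation --&gt;
&lt;context name="Heredoc" attribute="String" lineEndContext="#stay" noIndentationBasedFolding="true"&gt;
  &lt;StringDetect context="#pop" attribute="String" String="EOF"/&gt;
&lt;/context&gt;
</programlisting>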
</listitem>
</varlistentry>
......@@ -490,6 +507,35 @@ do not need to set it, as it defaults to <emphasis>false</emphasis>.</para>
</varlistentry>
<varlistentry>
<term>The element <userinput>emptyLine</userinput> in the group <userinput>emptyLines</userinput>
defines which lines should be treated as empty lines. This allows modifying the behavior of the
<emphasis>lineEmptyContext</emphasis> attribute of <userinput>context</userinput> elements.
Available attributes are:</term>
<listitem>
<para><userinput>regexpr</userinput> defines a regular expression; lines that match it are treated as empty lines.
By default, only lines that contain no characters at all are empty, so this attribute adds further cases,
for example, if lines that contain only spaces should also be considered empty.
However, most syntax definitions do not need to set this attribute.</para>
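<para>For example, assuming a hypothetical language whose comments start with <userinput>#</userinput>,
the following definitions make lines that contain only whitespace, or only a comment, count as
empty lines too, so the <emphasis>lineEmptyContext</emphasis> of the context also applies to them.
The context name <emphasis>Block</emphasis> and the attribute <emphasis>Operator</emphasis> are illustrative.</para>
<programlisting>
&lt;!-- In the group general --&gt;
&lt;emptyLines&gt;
  &lt;emptyLine regexpr="\s+"/&gt;
  &lt;emptyLine regexpr="\s*#.*"/&gt;
&lt;/emptyLines&gt;

&lt;!-- In the contexts: leave the (hypothetical) block at the first empty line --&gt;
&lt;context name="Block" attribute="Normal Text" lineEndContext="#stay" lineEmptyContext="#pop"&gt;
  &lt;DetectChar context="#stay" attribute="Operator" char=":"/&gt;
&lt;/context&gt;
</programlisting>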
</listitem>
</varlistentry>
<varlistentry>
<term>The element <userinput>encoding</userinput> in the group <userinput>spellchecking</userinput>
defines a character encoding for spell checking. Available attributes are:</term>
<listitem>
<para><userinput>char</userinput> is the encoded character.</para>
<para><userinput>string</userinput> is a sequence of characters that is treated as
the character <emphasis>char</emphasis> during spell checking.
For example, in LaTeX, the string <userinput>\&quot;{A}</userinput> represents
the character <userinput>&#196;</userinput>.</para>
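<para>As a sketch of how this could look for LaTeX (the entries in the actual LaTeX syntax definition
may differ), the following maps umlaut commands to the characters they produce, so that a word
written as <userinput>f\&quot;{u}r</userinput> would be spell checked as <userinput>f&#252;r</userinput>.
Inside an XML attribute delimited by double quotes, the double quote itself must be written as an entity.</para>
<programlisting>
&lt;spellchecking&gt;
  &lt;encoding char="&#196;" string="\&amp;quot;{A}"/&gt;
  &lt;encoding char="&#228;" string="\&amp;quot;{a}"/&gt;
  &lt;encoding char="&#252;" string="\&amp;quot;{u}"/&gt;
&lt;/spellchecking&gt;
</programlisting>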
</listitem>
</varlistentry>
</variablelist>
......@@ -654,7 +700,7 @@ current context in its <userinput>string</userinput> or
<userinput>char</userinput> attributes. In a <userinput>string</userinput>,
the placeholder <replaceable>%N</replaceable> (where N is a number) will be
replaced with the corresponding capture <replaceable>N</replaceable>
from the calling regular expression, starting from 1. In a
<userinput>char</userinput> the placeholder must be a number
<replaceable>N</replaceable> and it will be replaced with the first character of
the corresponding capture <replaceable>N</replaceable> from the calling regular
......@@ -666,6 +712,93 @@ expression. Whenever a rule allows this attribute it will contain a
</listitem>
</itemizedlist>
<para>How it works:</para>
<para>In the <link linkend="regular-expressions">regular expressions</link> of
<userinput>RegExpr</userinput> rules, all text matched within round brackets
<userinput>(PATTERN)</userinput> is captured and remembered.
These captures can be used in the context that the rule switches to, in rules with the
attribute <userinput>dynamic</userinput> set to <emphasis>true</emphasis>, through
<replaceable>%N</replaceable> (in <emphasis>String</emphasis>) or
<replaceable>N</replaceable> (in <emphasis>char</emphasis>).</para>
<para>Note that text captured in a <userinput>RegExpr</userinput> rule is
only stored for the context it switches to, which is specified in its <userinput>context</userinput> attribute.</para>
<tip>
<itemizedlist>
<listitem>
<para>If the captures will not be used, either by dynamic rules or within the same regular expression,
use <userinput>non-capturing groups</userinput>: <userinput>(?:PATTERN)</userinput></para>
<para><emphasis>Lookahead</emphasis> and <emphasis>lookbehind</emphasis> groups, such as
<userinput>(?=PATTERN)</userinput>, <userinput>(?!PATTERN)</userinput> or <userinput>(?&lt;=PATTERN)</userinput>, are not captured.
See <link linkend="regular-expressions">Regular Expressions</link> for more information.</para>
</listitem>
<listitem>
<para>Capture groups can also be used within the same regular expression,
using <replaceable>\N</replaceable> instead of <replaceable>%N</replaceable>.
For more information, see <link linkend="regex-capturing">Capturing matching text (back references)</link>
in <link linkend="regular-expressions">Regular Expressions</link>.</para>
</listitem>
</itemizedlist>
</tip>
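<para>A sketch that combines both tips, assuming an itemData named <emphasis>String</emphasis>:
the opening quote is captured so that <replaceable>\1</replaceable> can require the same closing
quote, while the alternation between escaped and ordinary characters uses a non-capturing group
because it is never referenced. The double quote is written as an entity inside the attribute.</para>
<programlisting>
&lt;!-- Matches 'text' or "text"; \1 is the opening quote, the (?:...) group is not captured --&gt;
&lt;RegExpr context="#stay" attribute="String" String="(['&amp;quot;])(?:\\.|(?!\1).)*\1"/&gt;
</programlisting>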
<para>Example 1:</para>
<para>In this simple example, the text matched by the regular expression
<userinput>=*</userinput> is captured and inserted into <replaceable>%1</replaceable>
in the dynamic rule. This allows the comment to end with the same number of
<userinput>=</userinput> as at the beginning. This matches text like:
<userinput>[[ comment ]]</userinput>, <userinput>[=[ comment ]=]</userinput> or
<userinput>[=====[ comment ]=====]</userinput>.</para>
<para>In addition, the captures are only available in the context switched to,
<emphasis>Multi-line Comment</emphasis>.</para>
<programlisting>
&lt;context name="Normal" attribute="Normal Text" lineEndContext="#stay"&gt;
&lt;RegExpr context="Multi-line Comment" attribute="Comment" String="\[(=*)\[" beginRegion="RegionComment"/&gt;
&lt;/context&gt;
&lt;context name="Multi-line Comment" attribute="Comment" lineEndContext="#stay"&gt;
&lt;StringDetect context="#pop" attribute="Comment" String="]%1]" dynamic="true" endRegion="RegionComment"/&gt;
&lt;/context&gt;
</programlisting>
<para>Example 2:</para>
<para>In the dynamic rule, <replaceable>%1</replaceable> corresponds to the capture that matches
<userinput>#+</userinput>, and <replaceable>%2</replaceable> to <userinput>&quot;+</userinput>.
This matches text like: <userinput>#label""""inside the context""""#</userinput>.</para>
<para>These captures will not be available in other contexts, such as
<emphasis>OtherContext</emphasis>, <emphasis>FindEscapes</emphasis> or
<emphasis>SomeContext</emphasis>.</para>
<programlisting>
&lt;context name="SomeContext" attribute="Normal Text" lineEndContext="#stay"&gt;
&lt;RegExpr context="#pop!NamedString" attribute="String" String="(#+)(?:[\w-]|[^[:ascii:]])(&amp;quot;+)"/&gt;
&lt;/context&gt;
&lt;context name="NamedString" attribute="String" lineEndContext="#stay"&gt;
&lt;RegExpr context="#pop!OtherContext" attribute="String" String="%2(?:%1)?" dynamic="true"/&gt;
&lt;DetectChar context="FindEscapes" attribute="Escape" char="\"/&gt;
&lt;/context&gt;
</programlisting>
<para>Example 3:</para>
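<para>This matches text like:
<userinput>Class::function&lt;T&gt;( ... )</userinput>.
Since the <userinput>RegExpr</userinput> rule uses <userinput>lookAhead</userinput>, the matched text is
not consumed: in the context <emphasis>FunctionName</emphasis>, the dynamic rules highlight the class name
(<replaceable>%1</replaceable>), the operator <userinput>::</userinput> (<replaceable>%2</replaceable>) and the
function name (<replaceable>%3</replaceable>), and the last rule leaves the context at the opening
parenthesis (the first character of capture <replaceable>4</replaceable>).</para>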
<programlisting>
&lt;context name="Normal" attribute="Normal Text" lineEndContext="#stay"&gt;
&lt;RegExpr context="FunctionName" String="\b([a-zA-Z_][\w-]*)(::)([a-zA-Z_][\w-]*)(?:&amp;lt;[\w\-\s]*&amp;gt;)?(\()" lookAhead="true"/&gt;
&lt;/context&gt;
&lt;context name="FunctionName" attribute="Normal Text" lineEndContext="#pop"&gt;
&lt;StringDetect context="#stay" attribute="Class" String="%1" dynamic="true"/&gt;
&lt;StringDetect context="#stay" attribute="Operator" String="%2" dynamic="true"/&gt;
&lt;StringDetect context="#stay" attribute="Function" String="%3" dynamic="true"/&gt;
&lt;DetectChar context="#pop" attribute="Normal Text" char="4" dynamic="true"/&gt;
&lt;/context&gt;
</programlisting>
<sect3 id="highlighting-rules-in-detail">
<title>The Rules in Detail</title>
......@@ -955,6 +1088,16 @@ The attribute <userinput>column</userinput> counts characters, so a tabulator is
</para>
</listitem>
<listitem>
<para>In <userinput>RegExpr</userinput> rules, use the attribute <userinput>column="0"</userinput> if the pattern
<userinput>^PATTERN</userinput> is used to match text at the beginning of a line.
This improves performance, since the rule will not look for matches in the remaining columns.</para>
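<para>For instance, a rule for a C-like preprocessor directive could be written as follows, so that
it is only tried at the first column of each line. The attribute name <emphasis>Preprocessor</emphasis>
is illustrative and assumed to be defined in the itemDatas.</para>
<programlisting>
&lt;!-- Only tried at column 0; the pattern is anchored with ^ as well --&gt;
&lt;RegExpr context="#stay" attribute="Preprocessor" String="^#\s*define\b" column="0"/&gt;
</programlisting>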
</listitem>
<listitem>
<para>In regular expressions, use non-capturing groups <userinput>(?:PATTERN)</userinput> instead of
capturing groups <userinput>(PATTERN)</userinput>, if the captures will not be used in the same regular
expression or in dynamic rules. This avoids storing captures unnecessarily.</para>
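<para>For example, both of the following rules match hexadecimal literals such as <userinput>0x1F</userinput>.
Assuming neither a dynamic rule nor a back reference uses the prefix, the second form is preferable
because it stores no capture. The attribute name <emphasis>Hex</emphasis> is illustrative.</para>
<programlisting>
&lt;!-- Capturing group: the prefix is stored although nothing uses it --&gt;
&lt;RegExpr context="#stay" attribute="Hex" String="\b(0x|0X)[0-9a-fA-F]+\b"/&gt;
&lt;!-- Non-capturing group: same match, no capture stored --&gt;
&lt;RegExpr context="#stay" attribute="Hex" String="\b(?:0x|0X)[0-9a-fA-F]+\b"/&gt;
</programlisting>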
</listitem>
<listitem>
<para>You can switch contexts without processing characters. Assume that you
want to switch context when you meet the string <userinput>*/</userinput>, but
need to process that string in the next context. The below rule will match, and
......
......@@ -240,15 +240,14 @@ corresponding to the octal number ooo (between 0 and
<varlistentry>
<term><userinput>\w</userinput></term>
<listitem><para>Matches any <quote>word character</quote> - in this case any letter, digit or underscore.
Equal to <literal>[a-zA-Z0-9_]</literal></para></listitem>
</varlistentry>
<varlistentry>
<term><userinput>\W</userinput></term>
<listitem><para>Matches any non-word character - anything but letters, numbers or underscore.
Equal to <literal>[^a-zA-Z0-9_]</literal> or <literal>[^\w]</literal></para></listitem>
</varlistentry>
......@@ -256,13 +255,17 @@ Equal to <literal>[^a-zA-Z0-9]</literal> or <literal>[^\w]</literal></para></lis
</para>
<para>The <emphasis>POSIX notation of classes</emphasis>,
<userinput>[:&lt;class name&gt;:]</userinput>, is also supported inside character classes.
For example, <userinput>[[:digit:]]</userinput> is equivalent to <userinput>\d</userinput>,
and <userinput>[[:space:]]</userinput> to <userinput>\s</userinput>.
See the <ulink url="https://www.regular-expressions.info/posixbrackets.html">full list of POSIX character classes</ulink>.</para>
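<para>POSIX classes are written inside the brackets of a character class, so they can be combined with
other characters. For example, the following syntax definition rule matches an identifier that starts
with a letter or underscore. The attribute name <emphasis>Identifier</emphasis> is illustrative.</para>
<programlisting>
&lt;!-- A letter or underscore, followed by any number of letters, digits or underscores --&gt;
&lt;RegExpr context="#stay" attribute="Identifier" String="[[:alpha:]_][[:alnum:]_]*"/&gt;
</programlisting>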
<para>The abbreviated classes can be put inside a custom class, for
example to match a word character, a blank or a dot, you could write
<userinput>[\w \.]</userinput>.</para>
<sect3>
<title>Characters with special meanings inside character classes</title>
......@@ -331,12 +334,14 @@ put the alternatives inside a subpattern:
</sect3>
<sect3 id="regex-capturing">
<title>Capturing matching text (back references)</title>
<para>If you want to use a back reference, use a sub pattern <userinput>(PATTERN)</userinput>
to have the desired part of the pattern remembered.
To prevent the sub pattern from being remembered, use a non-capturing group
<userinput>(?:PATTERN)</userinput>.</para>
<para>For example, if you want to find two occurrences of the same
word separated by a comma and possibly some whitespace, you could
......@@ -657,6 +662,28 @@ pattern.</para>
</listitem>
</varlistentry>
<varlistentry>
<term><userinput>(PATTERN)</userinput> (Capturing group)</term>
<listitem><para>The sub pattern within the parentheses is captured and remembered,
so that it can be used in back references. For example, the expression
<userinput>(&quot;+)[^&quot;]*\1</userinput> matches
<userinput>&quot;&quot;&quot;&quot;text&quot;&quot;&quot;&quot;</userinput> and
<userinput>&quot;text&quot;</userinput>.</para>
<para>See the section <link linkend="regex-capturing">Capturing matching text (back references)</link>
for more information.</para>
</listitem>
</varlistentry>
<varlistentry>
<term><userinput>(?:PATTERN)</userinput> (Non-capturing group)</term>
<listitem><para>The sub pattern within the parentheses is neither captured nor
remembered. It is preferable to always use non-capturing groups if
the captures will not be used.</para>
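<para>Non-capturing groups are also not counted when back references are numbered. In the following
sketch of a syntax definition rule (the attribute name is illustrative), <userinput>(?:ab)+</userinput>
is not numbered, so <replaceable>\1</replaceable> refers to <userinput>(c)</userinput> and the pattern
matches text such as <userinput>ababcc</userinput>.</para>
<programlisting>
&lt;!-- (?:ab)+ gets no number, so \1 refers to (c) --&gt;
&lt;RegExpr context="#stay" attribute="Normal Text" String="(?:ab)+(c)\1"/&gt;
</programlisting>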
</listitem>
</varlistentry>
</variablelist>
</para>
......