
[WIP] use of several contexts in parallel

Jonathan Poelen requested to merge jpoelen/syntax-highlighting:prectx into master

This is an attempt to implement shell-session or gcov highlighting without completely rewriting the existing syntaxes. The goal is to check another context before the one normally used, creating a "new branch" that takes priority.

  <!-- Shell Session -->
  <context attribute="Normal Text" lineEndContext="#stay" name="Start" fallthroughContext="Output">
    <Detect2Chars attribute="Prompt" context="Bash" char="$" char1=" "/>
    <Detect2Chars attribute="Prompt" context="Bash" char="$" char1="&tab;"/>
    <Detect2Chars attribute="Prompt2" context="Bash" char=">" char1=" "/>
    <Detect2Chars attribute="Prompt2" context="Bash" char=">" char1="&tab;"/>
  </context>

  <context attribute="Output" lineEndContext="#pop" name="Output">
  </context>

  <!-- new attribute: preContext -->
  <context attribute="Normal Text" lineEndContext="#pop" name="Bash" preContext="LinePrefix">
    <IncludeRules context="BashOneLine##Bash" includeAttrib="true"/>
  </context>

  <context attribute="Normal Text" lineEndContext="#pop" name="LinePrefix" fallthroughContext="#pop" lineEmptyContext="#pop#pop">
    <Detect2Chars attribute="Prompt2" context="#pop" char=">" char1=" " column="0"/>
    <Detect2Chars attribute="Prompt2" context="#pop" char=">" char1="&tab;" column="0"/>
    <RegExpr context="#pop#pop" String="." column="0" lookAhead="1"/>
  </context>

Here, LinePrefix is tried before each use of the normal flow rules. A single #pop jumps into the normal flow; two #pop return to the previous state (Start) and completely unwind the normal flow.
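To make the one-#pop / two-#pop behaviour concrete, here is a minimal Python sketch of the line-level decisions only. It is not the highlighter's implementation: `label_session` and the label names are hypothetical, and the "Bash flow" is reduced to a toy rule where an unbalanced single quote keeps the flow open across lines.

```python
PROMPTS = ("$ ", "$\t")        # Prompt rules of the Start context
CONTINUATIONS = ("> ", ">\t")  # Prompt2 rules (Start and LinePrefix)

def label_session(lines):
    """Label each line as 'command', 'continuation' or 'output' (toy model)."""
    labels = []
    quote_open = False  # stand-in for "the Bash flow still has nested contexts"
    for line in lines:
        if quote_open and line.startswith(CONTINUATIONS):
            # LinePrefix matched Prompt2: one #pop, resume the normal Bash flow.
            label = "continuation"
        else:
            # LinePrefix did not match (or nothing is open):
            # two #pop, the Bash flow is unwound and we are back in Start.
            quote_open = False
            if line.startswith(PROMPTS + CONTINUATIONS):
                label = "command"
            else:
                labels.append("output")  # fallthrough to the Output context
                continue
        # Toy Bash rule: an odd number of single quotes toggles the open state.
        if line[2:].count("'") % 2 == 1:
            quote_open = not quote_open
        labels.append(label)
    return labels
```

With this model, an unterminated quote followed by a line that does not start with `> ` is treated as output again instead of staying inside the string, which is the effect preContext is meant to provide.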

[Screenshot: 2022-01-03-232629_459x458_scrot] (right: without preContext)

This mechanism is complicated to implement and has a flaw: it cannot be applied in the middle of a rule. For example, if the normal flow contains a regex of the form <.*>, preContext applies at the < and to what comes after the >, but not to the text the rule consumes.

This is not a problem for shell-session and gcov, since the information is at the beginning of the line, but it can be problematic if we want, for example, to represent a complete syntax inside a string or another construct that may end in the middle of the line.

bla bla `py: # a comment` bla bla
                        ^ end of syntax
                          ^ normal text

I thought of something to represent transitions and cut the text into sub-parts on which individual contexts are applied.

It should be possible to represent a transition such as gcov -> C++:

<sequence> <!-- new rule ? context attribute ? -->
  <group context="GCov"/> <!-- when we exit GCov, we switch to C++ -->
  <group context="C++"/> <!-- when we exit C++ you keep your context state -->
</sequence>

It should also be possible to isolate part of a line:

<context name="pyCode" ....>
  <partition String="`"> <!-- equivalent regular expression: ([^`]*)(`)(.*) -->
    <group context="python"/>
    <group context="#pop"/>
    <!-- <group context="..."/> ignored because the previous context is #pop -->
  </partition>
  <!-- partition is useful to have several parallel contexts -->
</context>
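As a rough illustration of the proposed partition semantics, the following Python sketch splits a line at the first delimiter, in the same way as the equivalent regular expression given in the comment, and hands each piece to its group's context. The function name `apply_partition` and the return shape are assumptions made for this example; nothing like it exists in the highlighter yet.

```python
def apply_partition(text, delimiter, group_contexts):
    """Split text at the first delimiter and pair each piece with a context.

    Returns (segments, rest): segments is a list of (piece, context) pairs,
    rest is the text left for the parent context after a "#pop" group.
    Returns None when the delimiter is absent (the rule does not match).
    """
    before, sep, after = text.partition(delimiter)
    if not sep:
        return None
    parts = [before, sep, after]
    segments = []
    for i, (part, ctx) in enumerate(zip(parts, group_contexts)):
        segments.append((part, ctx))
        if ctx == "#pop":
            # Groups after a "#pop" are ignored; the remaining text goes
            # back to whichever context we return to.
            return segments, "".join(parts[i + 1:])
    return segments, "".join(parts[len(segments):])
```

Applied to the text after `py: ` in the earlier example, `apply_partition("# a comment` bla bla", "`", ["python", "#pop"])` assigns `# a comment` to the python context, the backtick to `#pop`, and leaves ` bla bla` to the surrounding context.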

I think it might also allow syntaxes like Doxygen to be independent of comments used by the language.

So much for the general idea. I would like to get feedback on it 😄
