Friday 15 August 2014

parsing - Why does the lexer rule for strings take precedence over all my other rules?


Using the Alex lexer generator, I am creating a lexer to tokenize the "from" header of an email. Here is a sample header:

  to: "John Doe" & lt; John@doe.org>   

"John Doe" is called "Display Name" and assume that it can contain any ASCII characters.

Similarly, let's assume the parts of the email address can contain any ASCII characters.

Below is my Alex program. When I run it on the header above, I get a single token:

  [tokenstrings "from: \" John doe \ "  

Apparently this rule:

  $us_ascii_character+  { \s -> TokenString s }

takes priority over all my other rules. Why?

I thought priority was based on the order in which the rules are physically listed in my program: check whether the input matches the first rule; if it does not, check whether it matches the second rule; and so on. No?

How do I express my rules so that, on the header above, the lexer emits these tokens:

  to:, "John doe",   

while still allowing the display name and the email parts to contain any ASCII characters? Here is my Alex lexer:

  {
  module Main (main) where
  }

  %wrapper "basic"

  $digit = 0-9
  $alpha = [a-zA-Z]
  $us_ascii_character = [\t\n\r\ \!\"\#\$\%\&\'\(\)\*\+\,\-\.\/0123456789\:\;\<\=\>\?\@ABCDEFGHIJKLMNOPQRSTUVWXYZ\[\\\]\^_`abcdefghijklmnopqrstuvwxyz\{\|\}~\DEL]

  tokens :-

    $white+               ;
    from                  { \s -> TokenFrom }
    \:                    { \s -> TokenColon }
    \"                    { \s -> TokenQuote }
    \<                    { \s -> TokenLeftAngleBracket }
    \>                    { \s -> TokenRightAngleBracket }
    \@                    { \s -> TokenAtSign }
    \.                    { \s -> TokenPeriod }
    $us_ascii_character+  { \s -> TokenString s }

  {
  -- Each action has type: String -> Token

  -- The token type:
  data Token =
      TokenFrom
    | TokenColon
    | TokenQuote
    | TokenLeftAngleBracket
    | TokenRightAngleBracket
    | TokenAtSign
    | TokenPeriod
    | TokenString String
    deriving (Eq, Show)

  main = do
    s <- getContents
    print (alexScanTokens s)
  }
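For completeness, this is the standard way to build and run an Alex spec to reproduce the output above (the file name Lexer.x is only illustrative):

  $ alex Lexer.x                  # generates Lexer.hs
  $ ghc Lexer.hs -o lexer
  $ printf 'from: "John Doe" <john@doe.org>' | ./lexer
  [TokenString "from: \"John Doe\" <john@doe.org>"]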

From the Alex documentation: when the input stream matches more than one rule, the rule which matches the longest prefix of the input stream wins. If there are still several rules which match an equally long prefix, then the rule that appears first in the file wins.

So, as that says, the order in which the rules are listed matters only when several rules match equally long prefixes.
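The catch-all rule wins here because $us_ascii_character contains every character in the input, including the space, so $us_ascii_character+ can always extend its match past any delimiter and therefore always matches a longer prefix than the single-character rules. A minimal illustration of the same effect (TokenFor and TokenIdent are names made up for this sketch):

  for      { \s -> TokenFor }      -- matches a 3-character prefix of "format"
  $alpha+  { \s -> TokenIdent s }  -- matches all 6 characters, so it wins

On the input "format", the second rule emits a single TokenIdent "format" no matter which rule is listed first, because it matches a longer prefix.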

Since $us_ascii_character+ matches the whole input, you get just a single token: [TokenString "from: \"John Doe\" <john@doe.org>"].

In order to tokenize the input the way you describe, you need to remove the delimiter characters (colon, quote, angle brackets, @, period) from the character set that feeds the catch-all rule, so that something narrower than $us_ascii_character+ drives { \s -> TokenString s }.

(Disclaimer: I do not know the Alex syntax, so in practice this would probably look somewhat different.)
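A minimal sketch of that fix, assuming Alex's # set-difference operator and the token type from the question ($delimiter and $string_char are names made up for this sketch):

  $delimiter   = [\:\"\<\>\@\.]
  $string_char = $us_ascii_character # $delimiter  -- any ASCII char except the delimiters

  tokens :-

    $white+        ;
    from           { \s -> TokenFrom }
    \:             { \s -> TokenColon }
    \"             { \s -> TokenQuote }
    \<             { \s -> TokenLeftAngleBracket }
    \>             { \s -> TokenRightAngleBracket }
    \@             { \s -> TokenAtSign }
    \.             { \s -> TokenPeriod }
    $string_char+  { \s -> TokenString s }

Since $string_char still contains the space, "John Doe" comes out as a single TokenString, and the sample header should now lex to [TokenFrom, TokenColon, TokenQuote, TokenString "John Doe", TokenQuote, TokenLeftAngleBracket, TokenString "john", TokenAtSign, TokenString "doe", TokenPeriod, TokenString "org", TokenRightAngleBracket]. Note that on "from" both the from rule and $string_char+ match four characters; that tie is the one place where the order of the rules in the file actually decides.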
