Adding UTF8 IDENTIFIERS to Flex
Newsgroups: comp.compilers
From: SeeScreen <seescr...@gmail.com>
Date: Sat, 24 Oct 2009 13:48:19 -0700 (PDT)
Local: Sat 24 Oct 2009 21:48
Subject: Adding UTF8 IDENTIFIERS to Flex
The solution is based on the GREEN portions of the first chart shown
on this link:
  http://www.w3.org/2005/03/23-lex-U

UTF8_BYTE_ORDER_MARK   [\xEF][\xBB][\xBF]

D           [0-9]
ASCII       [\x00-\x7F]

U1          [a-zA-Z_]
U2          [\xC2-\xDF][\x80-\xBF]
U3          [\xE0][\xA0-\xBF][\x80-\xBF]
U4          [\xE1-\xEC][\x80-\xBF][\x80-\xBF]
U5          [\xED][\x80-\x9F][\x80-\xBF]
U6          [\xEE-\xEF][\x80-\xBF][\x80-\xBF]
U7          [\xF0][\x90-\xBF][\x80-\xBF][\x80-\xBF]
U8          [\xF1-\xF3][\x80-\xBF][\x80-\xBF][\x80-\xBF]
U9          [\xF4][\x80-\x8F][\x80-\xBF][\x80-\xBF]

L           {U1}|{U2}|{U3}|{U4}|{U5}|{U6}|{U7}|{U8}|{U9}
U           {ASCII}|{U2}|{U3}|{U4}|{U5}|{U6}|{U7}|{U8}|{U9}

%{

 #include <stdio.h>
 #include <string.h>
 #include <math.h>
 #include "y.tab.h"
 #define YY_NO_UNISTD_H
 int lineNumber = 0;

%}

%%

{UTF8_BYTE_ORDER_MARK}       {  /*  Byte Order Mark */   }

"int"                   { return (INT); }

{L}({L}|{D})*           { return (IDENTIFIER); }
";"                     { return (';'); }
")"                     { return (')'); }
"("                     { return ('('); }
"="                     { return ('='); }
[ \t\v\f]               { /* skip horizontal whitespace */ }
\r\n|\r|\n              { lineNumber++; /* CRLF counts as one line */ }
.                       { /* ignore bad characters */ }

%%

int yywrap(void)
{
  return (1);   /* no further input files */
}

***********************************************
The U pattern above recognizes the full set of well-formed UTF-8 byte
sequences. The L pattern recognizes the same set, except that it
excludes characters that cannot be used in {C, C++, or Java}
IDENTIFIERS.

This solution also ignores the UTF-8 byte order mark (if present at
the beginning of the text file), but only when the BOM is followed by
at least one blank character or one blank line. The likely reason is
that the BOM bytes EF BB BF also match {U6}: when an identifier follows
the BOM directly, Flex's longest-match rule picks the IDENTIFIER rule,
with the BOM as the identifier's first character.

The above works with very old versions of Flex, as long as the -l (lex
compatibility) flag is not specified. The -8 (generate 8-bit scanner)
flag was also specified, even though it may be the default.
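
For anyone who wants to check input outside the scanner, the same byte
ranges can be written as a plain C function. This is only a sketch:
the names in_range and utf8_seq_len are hypothetical and not part of
the scanner above, and the branches simply mirror the ASCII and U2..U9
definitions.

```c
#include <stddef.h>

/* Return 1 if b lies in the inclusive byte range [lo, hi]. */
static int in_range(unsigned char b, unsigned char lo, unsigned char hi)
{
    return b >= lo && b <= hi;
}

/*
 * Length in bytes (1-4) of the well-formed UTF-8 sequence starting at s,
 * or 0 if the bytes are ill-formed or truncated. Hypothetical helper;
 * each branch follows one of the U2..U9 definitions.
 */
size_t utf8_seq_len(const unsigned char *s, size_t n)
{
    unsigned char lo = 0x80, hi = 0xBF;   /* allowed range for byte 2 */
    size_t len, i;

    if (n == 0) return 0;
    if (s[0] <= 0x7F) return 1;                          /* ASCII */
    else if (in_range(s[0], 0xC2, 0xDF)) len = 2;        /* U2 */
    else if (s[0] == 0xE0) { len = 3; lo = 0xA0; }       /* U3 */
    else if (in_range(s[0], 0xE1, 0xEC)) len = 3;        /* U4 */
    else if (s[0] == 0xED) { len = 3; hi = 0x9F; }       /* U5 */
    else if (in_range(s[0], 0xEE, 0xEF)) len = 3;        /* U6 */
    else if (s[0] == 0xF0) { len = 4; lo = 0x90; }       /* U7 */
    else if (in_range(s[0], 0xF1, 0xF3)) len = 4;        /* U8 */
    else if (s[0] == 0xF4) { len = 4; hi = 0x8F; }       /* U9 */
    else return 0;          /* 0x80-0xC1 and 0xF5-0xFF never lead */

    if (n < len) return 0;                    /* truncated sequence */
    if (!in_range(s[1], lo, hi)) return 0;    /* restricted 2nd byte */
    for (i = 2; i < len; i++)
        if (!in_range(s[i], 0x80, 0xBF)) return 0;
    return len;
}
```

Called on the three BOM bytes it returns 3 through the U6 branch, which
is the same overlap that lets the IDENTIFIER rule swallow a BOM.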


Newsgroups: comp.compilers
From: Hans Aberg <haberg_20080...@math.su.se>
Date: Sun, 25 Oct 2009 11:06:37 +0100
Local: Sun 25 Oct 2009 10:06
Subject: Re: Adding UTF8 IDENTIFIERS to Flex

SeeScreen wrote:
> The solution is based on the GREEN portions of the first chart shown
> on this link:
>   http://www.w3.org/2005/03/23-lex-U

I hacked this together; it converts Unicode character ranges to
Flex-like expressions:
   http://lists.gnu.org/archive/html/help-flex/2005-01/msg00043.html

(For single characters <char>, one can just feed a UTF-8 .l file to Flex
in 8-bit mode, with "<char>" expressions.)
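
For the single-character case, the bytes can also be spelled out as
hex escapes instead of typing the character itself. The sketch below
is not the linked script and does not handle ranges; flex_escape is a
hypothetical helper that encodes one code point as UTF-8 and formats
the bytes the way they would appear in a Flex pattern.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Write the UTF-8 encoding of code point cp into out as Flex hex
 * escapes, e.g. \xC3\xA9 for U+00E9. Returns the number of UTF-8
 * bytes (1-4), or 0 if cp is a surrogate, lies above U+10FFFF, or
 * out is too small (4 characters per byte plus the terminator).
 */
size_t flex_escape(unsigned long cp, char *out, size_t outsz)
{
    unsigned char b[4];
    size_t n, i;

    if (cp > 0x10FFFFUL || (cp >= 0xD800UL && cp <= 0xDFFFUL))
        return 0;

    if (cp < 0x80UL) {
        b[0] = (unsigned char)cp;
        n = 1;
    } else if (cp < 0x800UL) {
        b[0] = (unsigned char)(0xC0 | (cp >> 6));
        b[1] = (unsigned char)(0x80 | (cp & 0x3F));
        n = 2;
    } else if (cp < 0x10000UL) {
        b[0] = (unsigned char)(0xE0 | (cp >> 12));
        b[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        b[2] = (unsigned char)(0x80 | (cp & 0x3F));
        n = 3;
    } else {
        b[0] = (unsigned char)(0xF0 | (cp >> 18));
        b[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        b[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        b[3] = (unsigned char)(0x80 | (cp & 0x3F));
        n = 4;
    }

    if (outsz < 4 * n + 1)
        return 0;
    for (i = 0; i < n; i++)
        sprintf(out + 4 * i, "\\x%02X", (unsigned)b[i]);
    return n;
}
```

With a 17-byte buffer, flex_escape(0x20AC, buf, sizeof buf) returns 3
and leaves \xE2\x82\xAC in buf, which can be pasted into a rule.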

   Hans

