Friday, 15 May 2015

regex - Match Chinese character in Perl


I know this question has been asked before. I have checked all the previous replies, but I still could not solve my problem. Please forgive me if this is an obvious duplicate.

I am writing a Perl program to process a Chinese text file. I want to keep the lines that contain Chinese text and exclude all other lines, such as English or other languages and URLs. I use "use utf8;" and "$line =~ /(\p{Han}+)/", but it matches nothing. With "use utf8;", "$line =~ /信息/" matches nothing either. Without "use utf8;", "$line =~ /信息/" works, but "$line =~ /(\p{Han}+)/" still does not. I checked the file's encoding with "file -bi input.txt", which shows "text/plain; charset=utf-8". Here is the code:

  $| = 1;
  use strict;
  use utf8;

  my $in = $ARGV[0];

  sub main {
      open(IN, $in) or die "Cannot open $in\n";
      while (my $line = <IN>) {
          if ($line =~ /(\p{Han}+)/) {
              print "matched: $line\n";
          }
          if ($line =~ /信息/) {
              print "$line\n";
          }
      }
      close(IN);
  }

  main();
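The symptom can be reproduced without any input file. The sketch below is illustrative, not part of the original post: it hard-codes the UTF-8 bytes of "信息" (what a line read from a file opened without an encoding layer contains) and compares a byte string with the same text after decoding it using the core Encode module.

```perl
use strict;
use warnings;
use Encode qw(decode);

# The UTF-8 bytes of "信息" (U+4FE1 U+606F). This is what <IN> returns
# when a UTF-8 file is opened without an :encoding(UTF-8) layer.
my $bytes = "\xE4\xBF\xA1\xE6\x81\xAF";

# As a byte string, Perl sees six unrelated Latin-1 code points,
# none of which belongs to the Han script.
printf "bytes: length %d, Han match: %s\n",
    length($bytes), ($bytes =~ /\p{Han}/ ? 'yes' : 'no');

# Decoding turns the six bytes into two characters, and \p{Han} now matches.
my $chars = decode('UTF-8', $bytes);
printf "chars: length %d, Han match: %s\n",
    length($chars), ($chars =~ /\p{Han}/ ? 'yes' : 'no');
```

Running it should print "bytes: length 6, Han match: no" followed by "chars: length 2, Han match: yes".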

Thanks in advance for any help and advice!

You have to open the file as UTF-8:

  open IN, "<:encoding(UTF-8)", $in or die "Cannot open $in\n";

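Otherwise the file is read as a byte string: each Chinese character arrives as several separate bytes (typically three in UTF-8), so "\p{Han}" never matches, and under "use utf8;" the literal "/信息/", which is a character string, cannot match those bytes either.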
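Putting the fix together with the original goal (keep only lines containing Chinese, drop English-only lines and URLs), a complete script might look like the sketch below. It is an assumption-based illustration, not the answerer's code: the helper name han_lines and the demo text are hypothetical, and when no file name is given it reads an in-memory sample through the same :encoding(UTF-8) layer. It also sets an encoding on STDOUT, which avoids "Wide character in print" warnings when printing decoded text.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                  # this source file itself contains literal Chinese
use Encode qw(encode);

# Encode decoded characters back to UTF-8 on output.
binmode(STDOUT, ':encoding(UTF-8)');

# Hypothetical helper: return the lines from $fh that contain at least
# one Han character, which drops English-only lines and plain URLs.
sub han_lines {
    my ($fh) = @_;
    my @keep;
    while (my $line = <$fh>) {
        chomp $line;
        push @keep, $line if $line =~ /\p{Han}/;
    }
    return @keep;
}

# Hypothetical demo input, stored as UTF-8 bytes so that it goes through
# the same decoding layer a real file would.
my $demo_bytes = encode('UTF-8',
    "信息 line\nEnglish only\nhttp://example.com/\n中文 and English\n");

my $fh;
if (@ARGV) {
    # The fix from the answer: decode the file while reading it.
    my $in = $ARGV[0];
    open($fh, '<:encoding(UTF-8)', $in) or die "Cannot open $in: $!\n";
} else {
    open($fh, '<:encoding(UTF-8)', \$demo_bytes)
        or die "Cannot open in-memory file: $!\n";
}

my @matched = han_lines($fh);
close($fh);
print "$_\n" for @matched;
```

Run without arguments, it should print the two demo lines that contain Chinese ("信息 line" and "中文 and English"); pass a file name to filter a real UTF-8 file instead.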
