I know that this question has been asked before. I have checked all the previous answers, but I still could not solve my problem. Please forgive me for what is clearly a duplicate question.
I am writing a Perl program to process a text file in Chinese. I want to recognize the Chinese text and exclude all other lines, such as English or other languages and URLs.

If I "use utf8" and test "$line =~ /(\p{Han}+)/", it does nothing. If I "use utf8" and test "$line =~ /信息/", it also does nothing. If I do not "use utf8", then "$line =~ /信息/" works, but "$line =~ /(\p{Han}+)/" does not. I checked the encoding of the text file with "file -bi input.txt", which shows "text/plain; charset=utf-8". The code is as follows:

    $| = 1;
    use strict;
    use utf8;

    my $in = $ARGV[0];

    sub main {
        open(IN, $in) or die "Cannot open $in\n";
        while (my $line = <IN>) {
            if ($line =~ /(\p{Han}+)/) {
                print "sugar: $line\n";
            }
            if ($line =~ /信息/) {
                print "$line\n";
            }
        }
        # close(IN);
    }

Thanks in advance for any help and advice!

You have to open the file as UTF-8:

    open IN, "<:encoding(UTF-8)", $in or die "Cannot open $in\n";

Otherwise it is read as a byte string, which is not what you want.
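To make the point about the UTF-8 layer concrete, here is a minimal sketch of a complete script. It assumes the input really is UTF-8, as "file -bi" reported. The helper name han_lines and the lexical filehandle $fh are illustrative choices, not part of the original code. The file is opened with the ":encoding(UTF-8)" layer so each line arrives as decoded characters, which is what \p{Han} needs in order to match. STDOUT gets the same layer so the characters are encoded again on output.

```perl
use strict;
use warnings;
use utf8;                                # source literals such as 信息 are characters
binmode(STDOUT, ':encoding(UTF-8)');     # encode characters again when printing

# han_lines is a hypothetical helper, not part of the original post.
# It returns every line of $path that contains at least one Han character.
sub han_lines {
    my ($path) = @_;

    # The encoding layer decodes the bytes on read; without it,
    # \p{Han} is tested against raw bytes and never matches.
    open(my $fh, '<:encoding(UTF-8)', $path)
        or die "Cannot open $path: $!\n";

    my @matches;
    while (my $line = <$fh>) {
        chomp $line;
        push @matches, $line if $line =~ /\p{Han}/;
    }
    close($fh);
    return @matches;
}

# Print the matching lines of the file named on the command line, if any.
if (@ARGV) {
    print "$_\n" for han_lines($ARGV[0]);
}
```

Saved under a hypothetical name such as han_lines.pl, it would be run as "perl han_lines.pl input.txt". Lines that mix Chinese with English or a URL still match, because the test only asks for one Han character. To keep only lines made up entirely of Han characters, a stricter pattern such as /^\p{Han}+$/ could be used instead.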