Monday 15 July 2013

parsing - Java RTF Parser


Does anyone know of a robust RTF parser I can use in Java? I need to extract the plain text, including international text. It would be nice to extract embedded images and files as well. It could also be a C++ or other library that I can call easily, or, if it has good source code, I can port it to Java.

The following libraries either do not cover enough of RTF or fail to parse some valid RTF files:

  1. RTFEditorKit from Java Swing is quite basic and brittle. Apache Tika, Nutch, and many other tools use it.
  2. The RTF library from iText (com.lowagie.etc...) is not very comprehensive.
  3. The eTranslate RTF library is the most complete of the Java ones. My copy is not an up-to-date version, but it failed on some of my RTF files (the RTFs are valid; at least they open fine in MS Word and OpenOffice).
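For context, the Swing approach from item 1 usually amounts to a few lines like the sketch below. The class name `SwingRtfText` and the sample string are made up for illustration; only `RTFEditorKit` and `Document` come from the JDK. It reads the RTF into a Swing document and returns that document's plain text, so it inherits whatever limits RTFEditorKit has.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.swing.text.BadLocationException;
import javax.swing.text.Document;
import javax.swing.text.rtf.RTFEditorKit;

// Hypothetical helper class; only the javax.swing.text types are real JDK API.
final class SwingRtfText {

    /** Loads RTF into a Swing document and returns its plain text. */
    static String extractText(String rtf) throws IOException, BadLocationException {
        RTFEditorKit kit = new RTFEditorKit();
        Document doc = kit.createDefaultDocument();
        // Position 0 means "insert at the start of the (empty) document".
        kit.read(new StringReader(rtf), doc, 0);
        return doc.getText(0, doc.getLength());
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample document, small enough for RTFEditorKit to handle.
        String sample = "{\\rtf1\\ansi{\\fonttbl\\f0\\fswiss Helvetica;}"
                + "\\f0\\pard This is some {\\b bold} text.\\par}";
        System.out.println(extractText(sample));
    }
}
```

Formatting such as bold is dropped because only the document's character content is read back.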

    There is a C# library that is quite complete, but alas, it is C# and not Java.

    I also looked into OpenOffice. It is too slow for me, though it is probably the most comprehensive.

    (I searched the web and Stack Overflow before posting this question, so if you are tempted to mark it as "already asked", maybe there is an answer in there that I missed. Feel free to point me to it!)

    One library you may find useful provides a stream-based parser, which raises events as your document is parsed. A simple example is its text extractor, which shows how the API can be used.
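To illustrate what "stream-based parser that raises events" means, here is a minimal, self-contained sketch. It is not the API of the library mentioned above or of any real library: `RtfEventSketch`, `Listener`, `parse`, and `extractText` are hypothetical names. It also makes simplifying assumptions: `\'hh` bytes are treated as Latin-1, `\uN` is assumed to be followed by exactly one fallback character (`\uc1`), and only a handful of non-text destinations are skipped.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

final class RtfEventSketch {
    /** Hypothetical callback interface; not the API of any real library. */
    interface Listener {
        void groupStart();
        void groupEnd();
        void controlWord(String name, Integer param); // param is null when absent
        void text(String chars);
    }

    /** Reads RTF from a stream and raises events as tokens are recognised. */
    static void parse(Reader source, Listener listener) throws IOException {
        PushbackReader in = new PushbackReader(source, 2);
        StringBuilder text = new StringBuilder();
        int c;
        while ((c = in.read()) != -1) {
            if (c == '{' || c == '}') {
                flush(text, listener);
                if (c == '{') listener.groupStart(); else listener.groupEnd();
            } else if (c == '\\') {
                int next = in.read();
                if (next == -1) break;
                if (next == '\\' || next == '{' || next == '}') {
                    text.append((char) next);            // escaped literal
                } else if (next == '\'') {               // \'hh, assumed Latin-1
                    int hi = Character.digit(in.read(), 16);
                    int lo = Character.digit(in.read(), 16);
                    if (hi >= 0 && lo >= 0) text.append((char) (hi * 16 + lo));
                } else {
                    flush(text, listener);
                    if (isLetter(next)) { in.unread(next); readControlWord(in, listener); }
                    else listener.controlWord(String.valueOf((char) next), null); // e.g. \*
                }
            } else if (c != '\r' && c != '\n') {         // raw line breaks carry no meaning
                text.append((char) c);
            }
        }
        flush(text, listener);
    }

    private static void readControlWord(PushbackReader in, Listener l) throws IOException {
        StringBuilder name = new StringBuilder();
        int c = in.read();
        while (isLetter(c)) { name.append((char) c); c = in.read(); }
        StringBuilder digits = new StringBuilder();
        if (c == '-') {                                   // sign only counts before a digit
            int d = in.read();
            if (d >= '0' && d <= '9') { digits.append('-'); c = d; }
            else if (d != -1) in.unread(d);
        }
        while (c >= '0' && c <= '9') { digits.append((char) c); c = in.read(); }
        if (c != ' ' && c != -1) in.unread(c);            // one space is just a delimiter
        l.controlWord(name.toString(), digits.length() > 0 ? Integer.valueOf(digits.toString()) : null);
    }

    private static boolean isLetter(int c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    private static void flush(StringBuilder text, Listener l) {
        if (text.length() > 0) { l.text(text.toString()); text.setLength(0); }
    }

    // Destinations whose contents are not body text (an incomplete, illustrative list).
    private static final List<String> SKIPPED =
            Arrays.asList("*", "fonttbl", "colortbl", "stylesheet", "info", "pict");

    /** A small text extractor built on the events above. */
    static String extractText(String rtf) throws IOException {
        StringBuilder out = new StringBuilder();
        parse(new StringReader(rtf), new Listener() {
            private final Deque<Boolean> saved = new ArrayDeque<>();
            private boolean skip;      // true inside a skipped destination
            private boolean dropNext;  // true right after \uN: drop the fallback char

            public void groupStart() { saved.push(skip); }
            public void groupEnd() { if (!saved.isEmpty()) skip = saved.pop(); }
            public void controlWord(String name, Integer param) {
                if (SKIPPED.contains(name)) skip = true;
                else if (skip) return;
                else if (name.equals("par") || name.equals("line")) out.append('\n');
                else if (name.equals("tab")) out.append('\t');
                else if (name.equals("u") && param != null) { out.append((char) param.intValue()); dropNext = true; }
            }
            public void text(String chars) {
                if (skip) return;
                out.append(dropNext ? chars.substring(1) : chars);
                dropNext = false;
            }
        });
        return out.toString();
    }
}
```

The parser keeps no document in memory: it only forwards group, control-word, and text events, and the listener decides what to keep. Extracting embedded images would mean handling the `pict` destination in a listener instead of skipping it.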
