Wednesday 15 August 2012

c# - XML parser gets stuck on special characters, despite encoding


This is the situation as it is:

I am getting data from an XML API. The data sometimes contains a special apostrophe character, which crashes my parser. The crash occurs when I read the data from a local file. When I read the data from the stream, there is no crash, but I do not get a DOM tree either: it fails silently.

Below is a list of the attempts I have made:

  // does not work
  var web = new WebClient();
  web.Encoding = Encoding.UTF8;
  var response = web.DownloadString("http://thetvdb.com/api/apikey/series/" + show.TVDBID + "/");
  var tree = XDocument.Parse(response);

  // works
  XmlDocument doc = new XmlDocument();
  doc.Load("C:\\test\\test.xml");
  var response = doc.InnerXml;
  var tree = XDocument.Parse(response);

  // works
  var xmlDoc = XDocument.Parse(File.ReadAllText("C:\\test\\test.xml", System.Text.Encoding.UTF8));
  // or: var xmlDoc = XDocument.Load("C:\\test\\test.xml");
  var tree = xmlDoc;

  // does not work
  var web = new WebClient();
  web.Encoding = Encoding.UTF8;
  web.DownloadFile("http://thetvdb.com/api/apikey/series/" + show.TVDBID + "/", "C:\\test.xml");
  var tree = XDocument.Load("C:\\test.xml");

  // does not work
  var web = new WebClient();
  web.Encoding = Encoding.UTF8;
  var data = web.DownloadData("http://thetvdb.com/api/apikey/series/" + show.TVDBID + "/");
  var response = Encoding.UTF8.GetString(data);
  var tree = XDocument.Parse(response);

I decide whether an attempt works by checking whether it reaches the breakpoint on the first line inside this loop:

  if (root != null)
  {
      var lastupdate = root.Element("Series").Element("lastupdated").Value;
      foreach (var ep in tree.Descendants("Episode"))
      {
          var season = ep.Element("SeasonNumber").Value; // breakpoint here
      }
  }

The crash happens when the parser encounters this apostrophe: [image of the character; not preserved]

When I replace this character manually with a plain apostrophe or &#39;, no error is thrown and parsing continues until the next occurrence. When I look at the page source of the API request in Firefox and Chrome, both tell me the encoding is UTF-8, and the code examples on the API wiki also show UTF-8 at the top.

Is there anything I have not thought of yet?

I have also noticed that, while debugging, the result string from the API query contains only the <Series></Series> tag and no <Episode></Episode> tags, according to the XML/Text/HTML visualizers. However, when I run the same query in my browser, it shows me both. How is this possible?

Update:

When I use Unicode as the encoding, I get no errors and I am able to fully parse the local XML file! I'm not an encoding expert; is there a downside to using Unicode?

When I use Unicode for the data stream, I get a bunch of Asian characters.
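The "Asian characters" symptom is what one would expect if UTF-8 bytes are decoded as UTF-16, since `Encoding.Unicode` in .NET is UTF-16 little-endian and pairs up every two bytes into one character. The following is only a minimal sketch of that effect; the sample string `<Data>` and the class and method names are illustrative assumptions, not taken from the actual API response.

```csharp
using System;
using System.Text;

public static class MisdecodeDemo
{
    // Encode the text as UTF-8, then (wrongly) decode those bytes as UTF-16 LE.
    public static string DecodeUtf8BytesAsUtf16(string text)
    {
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(text);
        return Encoding.Unicode.GetString(utf8Bytes);
    }

    public static void Main()
    {
        // Hypothetical sample: six ASCII characters become three UTF-16 code units.
        string garbled = DecodeUtf8BytesAsUtf16("<Data>");
        foreach (char c in garbled)
        {
            // Print each resulting code unit as a code point.
            Console.WriteLine("U+{0:X4}", (int)c);
        }
    }
}
```

Each pair of ASCII bytes lands in the CJK ranges of Unicode, which is why the misdecoded stream looks like Chinese text rather than XML.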

This has to do with the encoding of your data. Try the following, which gets you the raw bytes (hence no problems with encoding):

  WebClient myWebClient = new WebClient();
  byte[] data = myWebClient.DownloadData(uri);
  string xmlContents = Encoding.UTF8.GetString(data);
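A possible variation, if guessing the encoding by hand keeps going wrong, is to skip the string step and hand the raw bytes to the XML reader, which picks the encoding from a byte order mark or the XML declaration. This is a sketch under assumptions: the sample document, its element names, and the helper name `LoadFromBytes` are made up for illustration and stand in for the bytes returned by `DownloadData`.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml.Linq;

public static class XmlBytesDemo
{
    // Parse XML directly from raw bytes. XDocument.Load(Stream) lets the
    // underlying XML reader determine the encoding from the BOM or the
    // <?xml ... encoding="..."?> declaration instead of a hard-coded guess.
    public static XDocument LoadFromBytes(byte[] data)
    {
        using (var stream = new MemoryStream(data))
        {
            return XDocument.Load(stream);
        }
    }

    public static void Main()
    {
        // Hypothetical sample standing in for the downloaded bytes.
        string xml =
            "<?xml version=\"1.0\" encoding=\"utf-8\"?>" +
            "<Data><Episode><SeasonNumber>1</SeasonNumber>" +
            "<EpisodeName>It\u2019s here</EpisodeName></Episode></Data>";
        byte[] data = Encoding.UTF8.GetBytes(xml);

        XDocument tree = LoadFromBytes(data);
        foreach (var ep in tree.Descendants("Episode"))
        {
            Console.WriteLine(ep.Element("EpisodeName").Value);
        }
    }
}
```

This only helps when the bytes really match what the declaration or BOM claims; if the server labels the data incorrectly, the parser will still fail.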

Edit: After your most recent findings with Unicode, I would say that the data is actually encoded in UTF-16. Unicode is not an encoding; it is essentially just a coded character set, i.e. a set of characters and a mapping between those characters and integer code points. When people say "encode some characters as Unicode", they usually mean UTF-16. Anyway, glad your problem has been solved!
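When it is unclear whether a file is UTF-8 or UTF-16, one option is to let `StreamReader` look for a byte order mark and fall back to UTF-8 if none is present. The sketch below assumes the data carries a BOM; the helper names `DecodeWithBomDetection` and `WithPreamble` and the sample text are hypothetical.

```csharp
using System;
using System.IO;
using System.Text;

public static class BomDetectDemo
{
    // Decode bytes, letting StreamReader detect the encoding from a BOM.
    // UTF-8 is used as the fallback when no BOM is found.
    public static string DecodeWithBomDetection(byte[] data)
    {
        using (var stream = new MemoryStream(data))
        using (var reader = new StreamReader(stream, Encoding.UTF8, true))
        {
            return reader.ReadToEnd();
        }
    }

    // Build sample bytes: the encoding's BOM (preamble) followed by the text.
    public static byte[] WithPreamble(Encoding encoding, string text)
    {
        byte[] preamble = encoding.GetPreamble();
        byte[] body = encoding.GetBytes(text);
        byte[] result = new byte[preamble.Length + body.Length];
        Buffer.BlockCopy(preamble, 0, result, 0, preamble.Length);
        Buffer.BlockCopy(body, 0, result, preamble.Length, body.Length);
        return result;
    }

    public static void Main()
    {
        // Hypothetical sample containing a typographic apostrophe (U+2019).
        string sample = "<a>it\u2019s</a>";

        // Encoding.Unicode is UTF-16 little-endian in .NET.
        byte[] utf16 = WithPreamble(Encoding.Unicode, sample);
        byte[] utf8 = WithPreamble(Encoding.UTF8, sample);

        Console.WriteLine(DecodeWithBomDetection(utf16) == sample);
        Console.WriteLine(DecodeWithBomDetection(utf8) == sample);
    }
}
```

Without a BOM this approach cannot tell the two encodings apart, so it complements rather than replaces checking what the server actually sends.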
