Consider:
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.newDocument();
Element root = doc.createElement("list");
doc.appendChild(root);
for (CorrectionEntry correction : dictionary) {
    Element elem = doc.createElement("elem");
    elem.setAttribute("from", correction.getEscapedFrom());
    elem.setAttribute("to", correction.getEscapedTo());
    root.appendChild(elem);
}
(The document is then written to an XML file.)
Here, getEscapedFrom and getEscapedTo return (in my code) something like fink&#xE9; if the originating word is finké, so as to perform a Unicode escape for the characters whose code is greater than 127.
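A minimal sketch of the kind of escaping described above, assuming hexadecimal character references such as &#xE9; are the target format. The class name XmlEscapes and the method escapeNonAscii are hypothetical stand-ins, not the actual code behind getEscapedFrom and getEscapedTo:

```java
public final class XmlEscapes {

    private XmlEscapes() {
        // Utility class, not meant to be instantiated.
    }

    /**
     * Replaces every code point above 127 with a hexadecimal
     * character reference (for example U+00E9 becomes "&#xE9;").
     * ASCII characters are copied through unchanged.
     */
    public static String escapeNonAscii(String input) {
        StringBuilder out = new StringBuilder(input.length());
        input.codePoints().forEach(cp -> {
            if (cp > 127) {
                out.append("&#x")
                   .append(Integer.toHexString(cp).toUpperCase())
                   .append(';');
            } else {
                out.appendCodePoint(cp);
            }
        });
        return out.toString();
    }
}
```

The result is an ordinary Java string that happens to contain an ampersand, which matters for what happens when it is later stored in a DOM attribute.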
The problem is that the final XML contains the line <elem from="finke" to="fink&amp;#xE9;" /> (from is finke, to is finké), whereas I would like it to be <elem from="finke" to="fink&#xE9;" />.
Following another answer on Stack Overflow, I tried to disable the escaping of ampersands by adding the line doc.appendChild(doc.createProcessingInstruction(StreamResult.PI_DISABLE_OUTPUT_ESCAPING, "&")); right after creating the doc, but without success.
How can I "tell XML" not to escape ampersands? Or, conversely, how can I get "XML" to convert from &#xE9;, or \u00E9, to é?
Update
I managed to narrow down the problem: right up until the file is written, the node (inspected in the debugger) seems to contain the right string. Once I call transformer.transform(domSource, streamResult); everything goes wrong.
DOMSource domSource = new DOMSource(doc);
ByteArrayOutputStream baos = new ByteArrayOutputStream();
StreamResult streamResult = new StreamResult(baos);
Transformer transformer = TransformerFactory.newInstance().newTransformer();
transformer.transform(domSource, streamResult);
System.out.println(baos.toString());
The problem seems to be the transformer.
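The behaviour can be reproduced in isolation with a small sketch like the one below. It assumes the JDK's built-in DOM and transformer implementations; the class name TransformerEscapeDemo and the helper serialize are made up for illustration. It relies on the fact that Element.setAttribute treats its value as literal text, so a pre-escaped string such as fink&#xE9; has its ampersand escaped again on output. As one possible alternative (an assumption about what is wanted here, not a confirmed fix), it also stores the raw word and sets OutputKeys.ENCODING to US-ASCII, which should make the serializer emit a numeric character reference by itself:

```java
import java.io.ByteArrayOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class TransformerEscapeDemo {

    /** Builds <list><elem to="..."/></list> and serializes it in the given encoding. */
    static String serialize(String toValue, String encoding) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .newDocument();
        Element root = doc.createElement("list");
        doc.appendChild(root);

        Element elem = doc.createElement("elem");
        // The value is stored as plain text; it is not parsed for entities.
        elem.setAttribute("to", toValue);
        root.appendChild(elem);

        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.ENCODING, encoding);

        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        transformer.transform(new DOMSource(doc), new StreamResult(baos));
        // Decode with the same encoding that was used for writing.
        return baos.toString(encoding);
    }

    public static void main(String[] args) throws Exception {
        // Pre-escaped value: the '&' is literal text, so it is written as &amp;.
        System.out.println(serialize("fink&#xE9;", "UTF-8"));
        // Raw value, ASCII output: the serializer has to use a character reference.
        System.out.println(serialize("fink\u00E9", "US-ASCII"));
        // Raw value, UTF-8 output: the character is written as-is.
        System.out.println(serialize("fink\u00E9", "UTF-8"));
    }
}
```

Whether the reference comes out in decimal or hexadecimal form depends on the serializer; both denote the same character to any XML parser.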