I am importing contents from an Excel-generated CSV-file into an XML document like:
$csv = fopen($csvfile, r);
$words = array();
while (($pair = fgetcsv($csv)) !== FALSE) {
array_push($words, array('en' => $pair[0], 'de' => $pair[1]));
}
The inserted data are English/German expressions.
I insert these values into an XML structure and output the XML as following:
$dictionary = new SimpleXMLElement('<dictionary></dictionary>');
//do things
$dom = dom_import_simplexml($dictionary) -> ownerDocument;
$dom -> formatOutput = true;
header('Content-encoding: utf-8'); //<3 UTF-8
header('Content-type: text/xml'); //Headers set to correct mime-type for XML output!!!!
echo $dom -> saveXML();
This is working fine, yet I am encountering one really strange problem. When the first letter of a String is an Umlaut (like in Österreich or Ägypten) the character will be omitted, resulting in gypten or sterreich. If the Umlaut is in the middle of the String (Russische Föderation) it gets transferred correctly. Same goes for things like ß or é or whatever.
All files are UTF-8 encoded and served in UTF-8.
This seems rather strange and bug-like to me, yet maybe I am missing something, there's a lot of smart people around here.