I’m working on a job for a client where data from a legacy database is used to generate an XML document that is then processed with an XSLT stylesheet.

The text in the database contains named HTML entities (e.g. ‘&middot;’), so when I loaded the generated XML into my DOMDocument, I got warnings like this one:

Warning: DOMDocument::loadXML() [function.DOMDocument-loadXML]: Entity ‘middot’ not defined in Entity, line: 963 in /usr/local/www/data-dist/sheds/includes/SDEHSFunctions.php on line 414
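To reproduce the failure outside the application, here is a minimal sketch. The <record> markup is a made-up sample rather than the client's data. libxml_use_internal_errors() collects the parser messages instead of emitting PHP warnings, which makes them easy to inspect.

```php
<?php
// Minimal reproduction with hypothetical sample markup: &middot; is an
// HTML entity, but XML itself only predefines amp, lt, gt, quot and apos.
libxml_use_internal_errors(true); // collect errors instead of printing warnings

$doc = new DOMDocument();
$ok = $doc->loadXML('<record>A&middot;B</record>');

$messages = [];
foreach (libxml_get_errors() as $error) {
    $messages[] = trim($error->message);
}
libxml_clear_errors();

var_dump($ok);      // bool(false): the markup is not well-formed XML
print_r($messages); // includes "Entity 'middot' not defined"
```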

XML predefines only five entities (&amp;, &lt;, &gt;, &quot; and &apos;), so instead of passing ‘&middot;’ in the XML string to DOMDocument::loadXML(), I needed to either declare every entity in the XML doctype (bothersome) or convert these named entities into numeric character references (e.g. ‘&middot;’ becomes ‘&#183;’).
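For comparison, the doctype route looks roughly like this. It is a sketch with hypothetical sample markup, and it declares a single entity. Every entity that could appear in the data would need a similar line, which is what makes this approach tedious. LIBXML_NOENT asks libxml to substitute the declared replacement text into the tree.

```php
<?php
// Sketch of the doctype approach (hypothetical sample markup):
// each HTML entity used in the data needs its own declaration.
$xml = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE record [
  <!ENTITY middot "&#183;">
]>
<record>A&middot;B</record>
XML;

$doc = new DOMDocument();
// LIBXML_NOENT replaces entity references with their declared text
$ok = $doc->loadXML($xml, LIBXML_NOENT);

var_dump($ok);                                 // bool(true)
echo $doc->documentElement->textContent, "\n"; // A·B
```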

I took a look around and found this handy function:

http://php.net/get_html_translation_table

Running print_r() on the returned table shows an array whose keys are the characters themselves and whose values are the named HTML entities. So here’s a quick snippet that looks up the numeric equivalent of an entity:

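$to_convert = '&middot;';
// Request a UTF-8 table explicitly; its keys are the decoded characters
$table = get_html_translation_table(HTML_ENTITIES, ENT_QUOTES | ENT_HTML401, 'UTF-8');
// mb_ord() (PHP 7.2+) gives the code point; ord() would only see the first UTF-8 byte
$equiv = '&#'.mb_ord(array_search($to_convert, $table), 'UTF-8').';'; // '&#183;'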
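The same lookup can be applied to a whole string before it is handed to loadXML(). The sketch below is a hypothetical helper, html_entities_to_numeric(), which is not part of PHP. It assumes PHP 7.2+ with the mbstring and DOM extensions and a UTF-8 translation table. It uses mb_ord() because plain ord() returns only the first byte of a multibyte UTF-8 character (194 instead of 183 for the middle dot).

```php
<?php
// Hypothetical helper (not a PHP built-in): rewrite every named
// HTML 4.01 entity in a string as a numeric character reference.
function html_entities_to_numeric(string $text): string
{
    // UTF-8 table: keys are the characters, values the named entities
    $table = get_html_translation_table(HTML_ENTITIES, ENT_QUOTES | ENT_HTML401, 'UTF-8');
    $map = [];
    foreach ($table as $char => $entity) {
        // mb_ord() returns the Unicode code point of the character
        $map[$entity] = '&#' . mb_ord((string) $char, 'UTF-8') . ';';
    }
    // strtr() tries the longest keys first and never rescans replaced text
    return strtr($text, $map);
}

$xml = '<record>' . html_entities_to_numeric('A&middot;B &euro;5') . '</record>';
echo $xml, "\n"; // <record>A&#183;B &#8364;5</record>

$doc = new DOMDocument();
var_dump($doc->loadXML($xml)); // bool(true)
```

Building the map once and calling strtr() converts every entity in a single pass, so the replaced text is never scanned a second time.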