PHP XML options

XML is the Extensible Markup Language. Common uses for XML include the documents produced by LibreOffice, OpenOffice, and Microsoft Office. PHP has several modules for reading and writing XML.

XML lets you define the permitted structure of your data in a DTD, a Document Type Definition. The DTD can be used to validate XML when reading or writing but is optional.

You can read an XML document as a whole document, as a long transaction type file, or as a continuous stream. PHP provides modules for all three types of processing.

Alternative formats

CSV, Comma Separated Values, files were used for row oriented data but cannot represent nested structures and were mostly replaced by XML for complicated data. JSON is now used for small structures of data. JSON syntax is strict, with no comments and no trailing commas, so a small slip during hand editing can make the whole file unreadable. Configuration files can be read from php.ini style files. PHP provides functions for all those formats. XML is the most flexible.
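As a rough illustration of the PHP functions for these formats, the sketch below reads one record from each. The record and its field names are made up for the example.

```php
<?php
// The same hypothetical record in three formats.
$csvLine  = 'widget,3,4.50';
$jsonText = '{"name": "widget", "quantity": 3, "price": 4.50}';
$iniText  = "name = widget\nquantity = 3\nprice = 4.50\n";

// CSV: one flat row of strings, with no field names unless you add a header row.
$fromCsv = str_getcsv($csvLine, ',', '"', '\\');

// JSON: nested arrays and objects are possible; true asks for PHP arrays.
$fromJson = json_decode($jsonText, true);

// INI: key = value pairs, the same style as php.ini.
$fromIni = parse_ini_string($iniText);
```

CSV and INI values arrive as strings, while json_decode returns numbers as PHP integers and floats.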

How does XML work?

Read other articles under XML for the details on how XML works. If you have created HTML, you have created something similar to XML as both XML and HTML are based on SGML.

XML contains elements with start and end tags. The element start tag can have attributes. You can have a structure of elements within elements. A LibreOffice document has a complex structure many levels deep and is best processed by modules that read the whole document into one structure, modules like the PHP DOM module.

Transaction type files have one outer element, the container, then one data row element repeated many times. Reading the whole file into memory can exhaust memory. You can read this type of file row by row with the PHP XML Parser module.

Streams are similar to transaction files and may run forever. Use the PHP XML Reader and XML Writer modules.

PHP XML modules

PHP provides several modules for reading XML. You decide whether to read the XML as a one-off document, as a long running file, or as a stream. There is more than one PHP option for each choice.

XML Parser

XML Parser is the most reliable code for small to medium files. XML Parser is the XML processor least likely to fail and lets you build the data structure you want. There are a few projects that started with DOM or SimpleXML then had to switch to XML Parser.
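A minimal sketch of XML Parser building your own data structure follows. The input, the element name row, and the 16 byte chunk size are assumptions for the example. A real program would more likely feed the parser chunks read from a file with fread.

```php
<?php
// Hypothetical transaction style input: one container, repeated <row> elements.
$xml = '<rows><row id="1">apple</row><row id="2">pear</row></rows>';

$rows = [];       // finished rows
$current = null;  // the row being built, or null when outside a <row>

$parser = xml_parser_create();
// Keep element names as written instead of folding them to upper case.
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

xml_set_element_handler(
    $parser,
    // Start tag: begin a new row and keep its id attribute.
    function ($parser, string $name, array $attributes) use (&$current) {
        if ($name === 'row') {
            $current = ['id' => $attributes['id'] ?? null, 'text' => ''];
        }
    },
    // End tag: the row is complete, so store it.
    function ($parser, string $name) use (&$rows, &$current) {
        if ($name === 'row' && $current !== null) {
            $rows[] = $current;
            $current = null;
        }
    }
);
xml_set_character_data_handler(
    $parser,
    // Text can arrive in several pieces, so append instead of assigning.
    function ($parser, string $data) use (&$current) {
        if ($current !== null) {
            $current['text'] .= $data;
        }
    }
);

// Feed the parser small chunks, the way you would when reading a large file.
$chunks = str_split($xml, 16);
$last = count($chunks) - 1;
foreach ($chunks as $i => $chunk) {
    if (xml_parse($parser, $chunk, $i === $last) !== 1) {
        throw new RuntimeException(xml_error_string(xml_get_error_code($parser)));
    }
}
```

Only the row being built is held by the handlers. The example keeps every finished row in $rows for demonstration; a long running job would process each row in the end tag handler and discard it.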

DOM

PHP provides the DOM module to read XML as a document, not a long running file or stream. I find the DOM module refuses to read XML containing even simple errors, and the structure DOM produces is complicated to work with. While there are reasons to use DOM for some projects, I have never used DOM successfully and had to replace DOM on a couple of projects.
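For comparison, here is a small sketch of reading a nested document with DOM. The order document and its element names are invented for the example.

```php
<?php
// Hypothetical nested document.
$xml = '<order id="7"><customer><name>Ann</name></customer>'
     . '<items><item sku="A1">Pen</item><item sku="B2">Ink</item></items></order>';

$doc = new DOMDocument();
if (!$doc->loadXML($xml)) {
    throw new RuntimeException('The XML could not be loaded.');
}

// Walk every <item> element, wherever it sits in the tree.
$skus = [];
foreach ($doc->getElementsByTagName('item') as $item) {
    $skus[$item->getAttribute('sku')] = $item->textContent;
}

// XPath reaches one value by its path through the structure.
$xpath = new DOMXPath($doc);
$customerName = $xpath->evaluate('string(/order/customer/name)');
```

The whole document is held in memory as a tree, which suits deep structures and does not suit very large files.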

libxml

PHP uses the libxml library in the DOM module and most of the other PHP XML modules. You can call the libxml functions directly to collect and decode XML errors.
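A short sketch of collecting libxml errors instead of letting them surface as PHP warnings. The broken input is made up for the example, and the exact wording of the messages depends on the installed libxml version.

```php
<?php
// Ask libxml to store errors internally instead of raising PHP warnings.
libxml_use_internal_errors(true);

// Hypothetical broken input: <body> is never closed.
$broken = '<note><to>Ann</to><body>Hello</note>';

$doc = new DOMDocument();
$loaded = $doc->loadXML($broken);

// Each stored error is a LibXMLError object with line, column, and message.
$messages = [];
foreach (libxml_get_errors() as $error) {
    $messages[] = sprintf(
        'line %d, column %d: %s',
        $error->line,
        $error->column,
        trim($error->message)
    );
}

// Empty the error buffer so the next parse starts clean.
libxml_clear_errors();
```

The same functions work after a failed SimpleXML load, because SimpleXML also parses through libxml.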

SDO

SDO, Service Data Objects, and SDO DAS XML help you handle XML from sources that can also supply data in other formats. For each format, there is an SDO DAS. SDO converts all the SDO DAS formats to one common format. I find that data from multiple sources in different formats is rarely compatible with a single process. You usually have to process each format first, making SDO of little use.

SimpleXML

PHP provides SimpleXML because SimpleXML is popular. SimpleXML works for reliable XML sources and fails on errors that XML Parser will tolerate. Some of the SimpleXML errors are difficult to interpret. The SimpleXML format is excellent for some uses and difficult to use for other uses.

The SimpleXML output works when you know what is in the input. XML Parser lets you see the data as the data is processed and vary the processing based on the input.
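A minimal SimpleXML sketch, assuming you already know the layout of the input. The catalog document and its element names are invented for the example.

```php
<?php
// Hypothetical input with a known layout.
$xml = '<catalog><book id="b1"><title>Dune</title><price>9.99</price></book>'
     . '<book id="b2"><title>Emma</title><price>4.50</price></book></catalog>';

$catalog = simplexml_load_string($xml);
if ($catalog === false) {
    throw new RuntimeException('The XML could not be loaded.');
}

// Child elements are properties and attributes are array keys.
// Both are SimpleXMLElement objects, so cast them to get plain values.
$titles = [];
foreach ($catalog->book as $book) {
    $titles[(string) $book['id']] = (string) $book->title;
}

$firstPrice = (float) $catalog->book[0]->price;
```

The code reads well because the element names are written into it, which is also why it only works when you know what is in the input.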

WDDX

WDDX is the Web Distributed Data Exchange and is similar to the XML-RPC format. SOAP is a later protocol that grew out of XML-RPC. WDDX was not as popular as regular XML or XML-RPC and is almost universally replaced by JSON. PHP shipped a WDDX extension up to PHP 7.3; the extension is no longer bundled as of PHP 7.4.

XML diff and merge

The PHP XML diff module takes two XML files, strings, or DOM structures and reports the differences. The output of the diff can be merged into an XML file, string, or DOM structure using the merge function. The diff output is human readable for easier testing.

XML Reader and XML Writer

XML Reader reads from an XML stream and XML Writer writes to an XML stream. XML Reader presents the stream as a series of nodes, where a node is an element start tag, a piece of text, an element end tag, and so on. XML Reader reads a node, presents the node for processing, then reads the next node. You have class methods including open, read, next, and close.
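The sketch below writes a small document with XML Writer, then reads it back node by node with XML Reader. The log and entry element names are invented, and the example works in memory with openMemory and XML; for a file or stream you would use openUri on the writer and open on the reader.

```php
<?php
// Write a small hypothetical log with XMLWriter, held in memory.
$writer = new XMLWriter();
$writer->openMemory();
$writer->startDocument('1.0', 'UTF-8');
$writer->startElement('log');
foreach (['start', 'stop'] as $i => $event) {
    $writer->startElement('entry');
    $writer->writeAttribute('seq', (string) ($i + 1));
    $writer->text($event);
    $writer->endElement(); // </entry>
}
$writer->endElement();     // </log>
$writer->endDocument();
$xml = $writer->outputMemory();

// Read it back one node at a time with XMLReader.
$reader = new XMLReader();
$reader->XML($xml);

$entries = [];
while ($reader->read()) {
    // Only act on the start tag of each <entry> element.
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'entry') {
        $entries[] = [
            'seq'  => $reader->getAttribute('seq'),
            'text' => $reader->readString(),
        ];
    }
}
$reader->close();
```

The reader only holds the current node, so memory use stays flat however long the input runs.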

XSL

The XSL module reads an XSLT stylesheet, a template, then converts an XML file from one XML format to another XML format. An XSLT stylesheet is easy to create for simple conversions and difficult to create for anything else. XSL is useless for converting data from non-XML formats.
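A sketch of one simple conversion, assuming the xsl extension is installed. The people document and the stylesheet are invented for the example.

```php
<?php
// The xsl extension is optional, so stop politely if it is missing.
if (!extension_loaded('xsl')) {
    exit("The xsl extension is not loaded.\n");
}

// Hypothetical source document.
$xml = '<people><person><first>Ann</first><last>Lee</last></person></people>';

// Hypothetical stylesheet: joins first and last into one <name> element.
$xslt = <<<'XSL'
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" omit-xml-declaration="yes"/>
  <xsl:template match="/people">
    <names>
      <xsl:for-each select="person">
        <name><xsl:value-of select="concat(first, ' ', last)"/></name>
      </xsl:for-each>
    </names>
  </xsl:template>
</xsl:stylesheet>
XSL;

// Both the source and the stylesheet are loaded as DOM documents.
$source = new DOMDocument();
$source->loadXML($xml);
$style = new DOMDocument();
$style->loadXML($xslt);

$processor = new XSLTProcessor();
$processor->importStylesheet($style);
$result = $processor->transformToXml($source);
```

Even this small conversion needs a stylesheet longer than the data, which gives a feel for how quickly XSLT grows with the complexity of the conversion.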
