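<p>Extracting data from XML is a common task, but to work with that data directly you need to understand how PHP parses XML. The XML Parser extension is event-based: it does not build a tree. Instead, it calls functions that you supply as it reads each tag and each piece of text. Several functions work together to pull the data out of an XML document. I'll go through each of them in turn and tie them all together at the end.</p>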
<h3><code>xml_parser_create()</code></h3>
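<p>This function creates the parser, which is used for the rest of the process. The parser stores data and configuration options, and it is passed to every other function involved. Since PHP 8.0, <code>xml_parser_create()</code> returns an <code>XMLParser</code> object; earlier versions return a resource.</p>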
<code>$xml_parser = xml_parser_create();</code>
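<p>As a quick check of what the parser object holds, the sketch below creates a parser and reads two of its settings back with <code>xml_parser_get_option()</code>. Passing <code>"UTF-8"</code> as the source encoding is an optional choice made here. The text above does not require it.</p>

```php
<?php
// Create a parser, naming the source encoding explicitly (optional).
$xml_parser = xml_parser_create("UTF-8");

// The parser carries its configuration with it; read two options back.
$caseFolding = xml_parser_get_option($xml_parser, XML_OPTION_CASE_FOLDING);
$target = xml_parser_get_option($xml_parser, XML_OPTION_TARGET_ENCODING);

// Cast to bool so the check does not depend on whether an int or a bool
// comes back for the case-folding option.
var_dump((bool) $caseFolding); // case folding is on by default
var_dump($target);             // target encoding follows the source encoding
```

<p>This should print <code>bool(true)</code> followed by <code>string(5) "UTF-8"</code>.</p>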
<h3><code>xml_set_element_handler()</code></h3>
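<p>Next, you need to tell the parser which functions to call while it works through the document. The <code>xml_set_element_handler()</code> function takes the following parameters:</p>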
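<ul><li><strong>xml_parser</strong>: The parser created by the call to <code>xml_parser_create()</code>.</li><li><strong>startElement</strong>: A callback reference to the function that the parser calls whenever it finds an opening tag.</li><li><strong>endElement</strong>: A callback reference to the function that the parser calls whenever it finds a closing tag.</li></ul>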
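<p>The last two arguments must name functions with a specific signature. Each function must accept the right number of parameters, but you can give it whatever name you like. Here is an example call to <code>xml_set_element_handler()</code>:</p>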
<code>xml_set_element_handler($xml_parser, "startElement", "endElement");</code>
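<p>The XML parser then calls <code>startElement()</code> and <code>endElement()</code> automatically once parsing is under way, that is, once you pass data to <code>xml_parse()</code>.</p>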
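<p>The small sketch below shows when the two handlers fire. It passes closures instead of function-name strings, since both are accepted as callbacks. It records each call in an array and parses a made-up <code>book</code> document in a single call to <code>xml_parse()</code>. Case folding is switched off so that the names stay as written.</p>

```php
<?php
// Every handler call is appended here so the order is easy to inspect.
$events = [];

$parser = xml_parser_create();
// Keep element names as written instead of upper-casing them.
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, false);

// Closures work as well as function names such as "startElement".
xml_set_element_handler(
    $parser,
    function ($p, $name, $attribs) use (&$events) {
        $events[] = "start:$name";
    },
    function ($p, $name) use (&$events) {
        $events[] = "end:$name";
    }
);

// Parse the whole (made-up) document at once; true marks the final chunk.
xml_parse($parser, '<book><title>PHP</title><note/></book>', true);

echo implode("\n", $events), "\n";
```

<p>The recorded order is <code>start:book</code>, <code>start:title</code>, <code>end:title</code>, <code>start:note</code>, <code>end:note</code>, <code>end:book</code>. The self-closing <code>note</code> element produces both a start call and an end call.</p>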
<h4><code>startElement()</code> function</h4>
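<p>Above the call to <code>xml_set_element_handler()</code>, define a function that reads the data for each opening tag. The function must accept the following parameters:</p><ul><li><strong>xml_parser</strong>: The parser created in the call to xml_parser_create.</li><li><strong>name</strong>: The name of the element.</li><li><strong>attribs</strong>: An associative array of the element's attributes, keyed by attribute name.</li></ul>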
<p>So your function might look like this:</p>
<code><pre>function startElement($xmlParser, $name, $attribs) {
    echo "Start: " . $name . "<br />";
}
</pre></code>
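<p>The values of <code>$name</code> and <code>$attribs</code> that reach <code>startElement()</code> depend on the case-folding option. The sketch below records what a start handler receives with case folding on (the default) and with it off. The <code>collectStarts()</code> helper and the sample document are made up for illustration.</p>

```php
<?php
// Hypothetical helper: parse $xml and return the [$name, $attribs] pairs
// that a start element handler receives.
function collectStarts(string $xml, bool $caseFolding): array
{
    $seen = [];
    $parser = xml_parser_create();
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, $caseFolding);
    xml_set_element_handler(
        $parser,
        function ($p, $name, $attribs) use (&$seen) {
            // $attribs maps attribute names to their values.
            $seen[] = [$name, $attribs];
        },
        function ($p, $name) {
            // Nothing to do at closing tags in this sketch.
        }
    );
    xml_parse($parser, $xml, true);
    return $seen;
}

$xml = '<book lang="en"><title>PHP</title></book>';
print_r(collectStarts($xml, true));  // default: names upper-cased
print_r(collectStarts($xml, false)); // names kept as written
```

<p>With case folding on, the element names arrive as <code>BOOK</code> and <code>TITLE</code> and the attribute key as <code>LANG</code>. The attribute value <code>en</code> is left unchanged. This is why a <code>case 'title':</code> test only matches once case folding has been turned off.</p>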
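<p>All this does is print out the name of the element, but you can do much more. For example, if one of your elements is called <code>title</code>, you can use an <code>if</code> or <code>switch</code> statement to store its name in a variable for later use, like this:</p><code><pre>function startElement($xmlParser, $name, $attribs) {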
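    // Global so that the other handlers can see which element is current.
    global $variable;
    // With case folding on (the default), $name arrives as 'TITLE',
    // so this lower-case match needs XML_OPTION_CASE_FOLDING turned off.
    switch ($name) {
        case 'title':
            $variable = $name;
            break;
    }
}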
</pre></code><p>Declare this function before the call to <code>xml_set_element_handler()</code>, so that it is clear which function the parser is being pointed at. Strictly speaking, PHP does not require a function declared unconditionally at the top level of a file to appear before it is referenced, but defining the handlers first keeps the script easy to follow.</p><h4><code>endElement()</code> function</h4><p>This function is called when the parser encounters an XML closing tag. As the counterpart of the start element function, it is a good place to clear any variable you stored in <code>startElement()</code>. As before, declare it ahead of the call to xml_set_element_handler. Note that a self-closing (empty) element still triggers this function, immediately after its start element call. The function must have the following parameters:</p><ul><li><strong>xml_parser</strong>: The parser created in the call to xml_parser_create.</li><li><strong>name</strong>: The name of the element.</li></ul><p>The following code just prints out the name of the closing element, but you can use this function to undo anything that was done in the <code>startElement()</code> function. For example, if you increment a counter in <code>startElement()</code> to track how deep the parser is in the XML document, you can decrement it here. This matters when more than one element has the same name but appears in a different context.</p><code><pre>function endElement($parser, $name) {
    echo "End: " . $name . "<br />";
}
</pre></code><h3><code>xml_set_character_data_handler()</code></h3><p>The next function to call is xml_set_character_data_handler. It takes two parameters:</p><ul><li><strong>xml_parser</strong>: The XML parser that was created in the call to xml_parser_create.</li><li><strong>characterData</strong>: A callback reference to the function that will be called when character data is found.</li></ul><p>This function works in the same way as the <code>xml_set_element_handler()</code> function: it simply records which function to call when character data is encountered. It is called like this:</p><code>xml_set_character_data_handler($xml_parser, "characterData");</code><h4><code>characterData()</code> function</h4><p>The <code>characterData()</code> function should again be declared before the call to <code>xml_set_character_data_handler()</code>, and it must have the following parameters:</p><ul><li><strong>xml_parser</strong>: The XML parser created in the call to xml_parser_create.</li><li><strong>data</strong>: A piece of the text held within the XML element. If the element contains a CDATA section, the parser passes on its contents without the CDATA markers, so there is no need to strip them out yourself.</li></ul><p>So whenever the parser finds character data, it calls this function. The following function just prints out the data.</p><code><pre>function characterData($parser, $data) {
    echo "Data: " . $data . "<br />";
}
</pre></code><p>One thing you must watch out for is that the parser does not always deliver an element's text in a single call. Under certain conditions it stops, calls the function with what it has so far, and then carries on. It repeats this until all of the data has been passed. Below are all the conditions I know of.</p><ul><li>The parser runs into an entity reference, such as <code>&amp;</code> (&) or <code>&apos;</code> (').</li><li>The parser finishes parsing an entity.</li><li>The parser runs into the new-line character (<code>\n</code>).</li><li>The parser runs into a series of tab characters (<code>\t</code>).</li><li>The content of the $data parameter is more than 1024 bytes.</li><li>A piece of text is split across two chunks passed to <code>xml_parse()</code>.</li></ul><p>The best way to explain this is with an example. Let's say that you have the following string as part of the data:</p><code><pre>some text&amp;
some more text&apos;
last bit of text
</pre></code><p>If you used the earlier example function, which just prints out the data, the parser would print something like the following:</p><code><pre>Data: some text
Data: &
Data: some more text
Data: '
Data: last bit of text
</pre></code><p>So make sure that your code copes with the character data of a single element arriving in several pieces. One approach is to have the <code>characterData()</code> function append the data to a string. The string is initialised when the startElement function is called and printed out when the endElement function is called.</p>
<h3><code>xml_parser_set_option()</code></h3><p>This function is optional and can be used to change how the parser behaves. For example, to turn off case folding, use the following code:</p><code>xml_parser_set_option($xml_parser, XML_OPTION_CASE_FOLDING, false);</code><p>Case folding means converting element and attribute names to uppercase, and for some reason the parser does this by default. However, XML names are case-sensitive: <code>Title</code> and <code>title</code> are different element names, and upper-casing them throws that distinction away. So if you are working with W3C-valid XML, make sure you use this function to turn case folding off. Here is a list of the available options for this function:</p><ul><li><strong>XML_OPTION_CASE_FOLDING</strong>: (integer) Controls whether case-folding is enabled for this XML parser. Enabled by default.</li><li><strong>XML_OPTION_SKIP_TAGSTART</strong>: (integer) Specify how many characters should be skipped in the beginning of a tag name.</li><li><strong>XML_OPTION_SKIP_WHITE</strong>: (integer) Whether to skip values consisting of whitespace characters.</li><li><strong>XML_OPTION_TARGET_ENCODING</strong>: (string) Sets which target encoding to use in this XML parser. By default, it is set to the same as the source encoding used by <code>xml_parser_create()</code>. Supported target encodings are ISO-8859-1, US-ASCII and UTF-8.</li></ul>
<h3><code>xml_parse()</code></h3><p>This function runs the parser over some input. It takes the following parameters:</p><ul><li><strong>xml_parser</strong>: The XML parser created by the <code>xml_parser_create()</code> function.</li><li><strong>data</strong>: A chunk of data to parse. This can be read from a file or a stream.</li><li><strong>end</strong>: (optional) Set this to true when <code>data</code> is the last chunk from the source, so that this is the last time the function is run.</li></ul><p>Because the data can arrive in chunks, <code>xml_parse()</code> can be called over and over again until all of the data has been read from the file.</p><code><pre>if (!($fp = fopen("an_xml_file.xml", "r"))) {
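    die("could not open XML input");
}
// Read the file in 4096-byte chunks and pass feof($fp) as the final flag
// so the parser knows when the input has ended.
while ($data = fread($fp, 4096)) {
    if (!xml_parse($xml_parser, $data, feof($fp))) {
        die(sprintf("XML error: %s at line %d",
            xml_error_string(xml_get_error_code($xml_parser)),
            xml_get_current_line_number($xml_parser)));
    }
}
fclose($fp);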
</pre></code><h3><code>xml_parser_free()</code></h3><p>As the name suggests, this function is called at the end of the parsing run. It frees the XML parser created at the start, along with the memory it was using. Since PHP 8.0 the parser is an object that is freed automatically, so on those versions this call has no effect.</p><code>xml_parser_free($xml_parser);</code><h3>Putting them all together</h3><p>As an example, I have put the code together into a script that prints out the structure of an XML document as simple, if rather ugly, HTML. It is meant as a starting point that you can expand on to create your own XML parsing script.</p><code><pre>// Start element handler
function startElement($xmlParser, $name, $attribs) {
    echo "Start: " . $name . "<br />";
}

// End element handler
function endElement($parser, $name) {
    echo "End: " . $name . "<br />";
}

// Character data handler
function characterData($parser, $data) {
    echo "Data: " . $data . "<br />";
}

$xml_parser = xml_parser_create();
xml_parser_set_option($xml_parser, XML_OPTION_CASE_FOLDING, false);
xml_set_element_handler($xml_parser, "startElement", "endElement");
xml_set_character_data_handler($xml_parser, "characterData");

if (!($fp = fopen("an_xml_file.xml", "r"))) {
    die("could not open XML input");
}

while ($data = fread($fp, 4096)) {
    if (!xml_parse($xml_parser, $data, feof($fp))) {
        die(sprintf("XML error: %s at line %d",
            xml_error_string(xml_get_error_code($xml_parser)),
            xml_get_current_line_number($xml_parser)));
    }
}

// Tidy up: close the file and release the parser.
fclose($fp);
xml_parser_free($xml_parser);
</pre></code>
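<p>The script above prints each piece of character data as soon as it arrives. The buffering approach suggested in the <code>characterData()</code> section can be sketched as follows. This minimal version assumes that only leaf elements (elements that contain text but no child elements) are of interest. The <code>$buffer</code> and <code>$texts</code> globals, the <code>parseOrDie()</code> helper and the sample document are all made up for illustration. The document is deliberately fed to <code>xml_parse()</code> in 8-byte chunks to mimic reading a file piece by piece, and an empty final call marks the end of the input.</p>

```php
<?php
// Hypothetical globals: the text gathered so far for the current element,
// and the finished text of each element, keyed by element name.
$buffer = '';
$texts = [];

function startElement($parser, $name, $attribs) {
    global $buffer;
    $buffer = '';              // start collecting afresh for this element
}

function characterData($parser, $data) {
    global $buffer;
    $buffer .= $data;          // may be called several times per element
}

function endElement($parser, $name) {
    global $buffer, $texts;
    $texts[$name] = $buffer;   // the text is complete once the tag closes
    $buffer = '';
}

// Hypothetical helper: parse one chunk, stopping with the same message as
// the loop in the article if the XML is not well-formed.
function parseOrDie($parser, $data, $isFinal) {
    if (!xml_parse($parser, $data, $isFinal)) {
        die(sprintf("XML error: %s at line %d",
            xml_error_string(xml_get_error_code($parser)),
            xml_get_current_line_number($parser)));
    }
}

$xml = '<doc><p>some text&amp; some more text&apos; last bit</p></doc>';

$xml_parser = xml_parser_create();
xml_parser_set_option($xml_parser, XML_OPTION_CASE_FOLDING, false);
xml_set_element_handler($xml_parser, "startElement", "endElement");
xml_set_character_data_handler($xml_parser, "characterData");

// Small chunks make text (and here even the &apos; reference) straddle
// chunk boundaries, so characterData() receives the text in pieces.
foreach (str_split($xml, 8) as $chunk) {
    parseOrDie($xml_parser, $chunk, false);
}
parseOrDie($xml_parser, '', true); // no more input

echo $texts['p'], "\n";
```

<p>This prints <code>some text& some more text' last bit</code>. The entity references are decoded and the pieces are joined back together. For documents with nested text, you would keep a stack of buffers instead of a single string.</p>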