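The XML modules on CPAN fall into three broad categories: modules that provide their own unique interface to XML data (usually concerned with converting between an XML instance and Perl data), modules that implement one of the standard XML APIs, and special-purpose modules that simplify some specific XML-related task. This month we look at the first group, the Perl-specific XML interfaces.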
<?xml version="1.0"?>
<camelids>
<species name="Camelus dromedarius">
<common-name>Dromedary, or Arabian Camel</common-name>
<physical-characteristics>
<mass>300 to 690 kg.</mass>
<appearance>
The dromedary camel is characterized by a long-curved
neck, deep-narrow chest, and a single hump.
...
</appearance>
</physical-characteristics>
<natural-history>
<food-habits>
The dromedary camel is an herbivore.
...
</food-habits>
<reproduction>
The dromedary camel has a lifespan of about 40-50 years
...
</reproduction>
<behavior>
With the exception of rutting males, dromedaries show
very little aggressive behavior.
...
</behavior>
<habitat>
The camels prefer desert conditions characterized by a
long dry season and a short rainy season.
...
</habitat>
</natural-history>
<conservation status="no special status">
<detail>
Since the dromedary camel is domesticated, the camel has
no special status in conservation.
</detail>
</conservation>
</species>
...
</camelids>
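Now let us assume that the complete document (available in this month's example code) contains full information about every member of the camel family, not just the dromedary shown above. To illustrate how each module extracts a subset of data from this file, we will write a very short script that processes camelids.xml and prints to STDOUT the common name, the Latin name (in parentheses), and the current conservation status of each species it finds. After processing the whole document, each script's output should be:

Bactrian Camel (Camelus bactrianus) endangered
Dromedary, or Arabian Camel (Camelus dromedarius) no special status
Llama (Lama glama) no special status
Guanaco (Lama guanicoe) special concern
Vicuna (Vicugna vicugna) endangered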
For the second task, turning Perl data into an XML document, the hash looks like this:
my %camelid_links = (
    one   => { url => 'http://www.online.discovery.com/news/picture/may99/photo20.html',
               description => 'Bactrian Camel in front of Great ' .
                              'Pyramids in Giza, Egypt.'},
    two   => { url => 'http://www.fotos-online.de/english/m/09/9532.htm',
               description => 'Dromedary Camel illustrates the ' .
                              'importance of Accessorizing.'},
    three => { url => 'http://www.eskimo.com/~wallama/funny.htm',
               description => 'Charlie - biography of a narcissistic llama.'},
    four  => { url => 'http://arrow.colorado.edu/travels/other/turkey.html',
               description => 'A visual metaphor for the perl5-porters ' .
                              'list?'},
    five  => { url => 'http://www.galaonline.org/pics.htm',
               description => 'Many cool alpacas.'},
    six   => { url => 'http://www.thpf.de/suedamerikareise/galerie/vicunas.htm',
               description => 'Wild Vicunas in a scenic landscape.'}
);

An example of the document we expect to create from the hash is:
<?xml version="1.0"?>
<html>
<body>
<a href="http://www.eskimo.com/~wallama/funny.htm">Charlie -
biography of a narcissistic llama.</a>
<a href="http://www.online.discovery.com/news/picture/may99/photo20.html">Bactrian
Camel in front of Great Pyramids in Giza, Egypt.</a>
<a href="http://www.fotos-online.de/english/m/09/9532.htm">Dromedary
Camel illustrates the importance of Accessorizing.</a>
<a href="http://www.galaonline.org/pics.htm">Many cool alpacas.</a>
<a href="http://arrow.colorado.edu/travels/other/turkey.html">A visual
metaphor for the perl5-porters list?</a>
<a href="http://www.thpf.de/suedamerikareise/galerie/vicunas.htm">Wild
Vicunas in a scenic landscape.</a>
</body>
</html>
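Nicely indented XML output (like that shown above) matters for readability, but this kind of whitespace handling is not something our test cases require. What we care about is that the resulting document is well-formed and that it correctly represents the data in the hash.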
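One way to check the well-formedness requirement mechanically is to hand the generated text to a parser and see whether it complains. The sketch below assumes XML::Parser is installed; the helper name is_well_formed and the two sample strings are made up for illustration and are not part of the article's example files.

```perl
use strict;
use warnings;
use XML::Parser;

# Hypothetical helper: returns 1 if $xml parses cleanly, 0 otherwise.
# XML::Parser dies on the first well-formedness error, so we trap it with eval.
sub is_well_formed {
    my ($xml) = @_;
    my $parser = XML::Parser->new();
    my $ok = eval { $parser->parse($xml); 1 };
    return $ok ? 1 : 0;
}

# Illustrative inputs only: one balanced document, one with an unclosed <a>.
my $good = '<html><body><a href="http://www.galaonline.org/pics.htm">'
         . 'Many cool alpacas.</a></body></html>';
my $bad  = '<html><body><a href="x">unclosed link</body></html>';

for my $case ([good => $good], [bad => $bad]) {
    my ($label, $xml) = @$case;
    print "$label: ", (is_well_formed($xml) ? 'well-formed' : 'not well-formed'), "\n";
}
```

A check like this says nothing about whether the links and descriptions are the right ones; it only confirms that the tags balance and the document can be parsed at all.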
With the tasks defined, it is time for the code examples.
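# XML::Simple, reading: load camelids.xml into a nested Perl data structure
use XML::Simple;

my $file = 'files/camelids.xml';
my $xs1  = XML::Simple->new();
my $doc  = $xs1->XMLin($file);

# By default XMLin folds the repeated species elements into a hash
# keyed on their name attribute, so each key is a Latin name.
foreach my $key (keys %{ $doc->{species} }) {
    print $doc->{species}->{$key}->{'common-name'} . ' (' . $key . ') ';
    print $doc->{species}->{$key}->{conservation}->{status} . "\n";
}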
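# XML::Simple, writing: serialize the links hash as XML
use XML::Simple;
require "files/camelid_links.pl";

my %camelid_links = get_camelid_data();
my $xsimple = XML::Simple->new();

# noattr turns every hash key into a child element instead of an attribute
print $xsimple->XMLout(\%camelid_links,
                       noattr  => 1,
                       xmldecl => '<?xml version="1.0"?>');

This data-to-document task exposes a weakness of XML::Simple: it gives us no way to say which keys in the hash should be returned as elements and which as attributes. The output of the example above comes close to our output requirements but does not meet them. For cases where you prefer to manipulate the contents of an XML document directly as Perl data structures but need finer control over the output, XML::Simple and XML::Writer work well together.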
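As a rough sketch of that pairing, the script below reads a document with XML::Simple and writes a reshaped one with XML::Writer. It assumes both modules are installed. The inline XML is a trimmed stand-in for camelids.xml, and the output element names summary and camelid are invented for this illustration.

```perl
use strict;
use warnings;
use XML::Simple;
use XML::Writer;

# A trimmed, made-up stand-in for camelids.xml so the sketch is self-contained.
my $xml = <<'END_XML';
<camelids>
  <species name="Camelus dromedarius">
    <common-name>Dromedary, or Arabian Camel</common-name>
    <conservation status="no special status"/>
  </species>
  <species name="Lama guanicoe">
    <common-name>Guanaco</common-name>
    <conservation status="special concern"/>
  </species>
</camelids>
END_XML

# Read side: XMLin accepts a string of XML as well as a file name.
# ForceArray and KeyAttr are spelled out so species is always a hash
# keyed on the name attribute, even if only one species is present.
my $doc = XML::Simple->new()->XMLin(
    $xml,
    ForceArray => ['species'],
    KeyAttr    => { species => 'name' },
);

# Write side: XML::Writer collects its output in $out (OUTPUT takes a scalar ref).
my $out    = '';
my $writer = XML::Writer->new(OUTPUT => \$out);

$writer->startTag('summary');
foreach my $latin (sort keys %{ $doc->{species} }) {
    my $sp = $doc->{species}{$latin};
    # Here the caller decides what becomes an attribute and what becomes text.
    $writer->startTag('camelid',
                      latin  => $latin,
                      status => $sp->{conservation}{status});
    $writer->characters($sp->{'common-name'});
    $writer->endTag('camelid');
}
$writer->endTag('summary');
$writer->end();

print $out, "\n";
```

The division of labour is the point: XML::Simple supplies convenient hash access on the way in, and each startTag call states explicitly which values are attributes on the way out.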
The following example shows how to use XML::Writer to meet our output requirements.
# XML::Writer, writing: emit the links document tag by tag
use XML::Writer;
require "files/camelid_links.pl";

my %camelid_links = get_camelid_data();
my $writer = XML::Writer->new();

$writer->xmlDecl();
$writer->startTag('html');
$writer->startTag('body');

foreach my $item ( keys (%camelid_links) ) {
    # href is written as an attribute, the description as element text
    $writer->startTag('a', 'href' => $camelid_links{$item}->{url});
    $writer->characters($camelid_links{$item}->{description});
    $writer->endTag('a');
}

$writer->endTag('body');
$writer->endTag('html');
$writer->end();
# XML::SimpleObject, reading: wrap an XML::Parser tree in simple accessor objects
use XML::Parser;
use XML::SimpleObject;

my $file   = 'files/camelids.xml';
my $parser = XML::Parser->new(ErrorContext => 2, Style => "Tree");
my $xso    = XML::SimpleObject->new( $parser->parsefile($file) );

foreach my $species ($xso->child('camelids')->children('species')) {
    print $species->child('common-name')->{VALUE};
    print ' (' . $species->attribute('name') . ') ';
    print $species->child('conservation')->attribute('status');
    print "\n";
}
# XML::TreeBuilder, reading: build a tree and search it by tag name
use XML::TreeBuilder;

my $file = 'files/camelids.xml';
my $tree = XML::TreeBuilder->new();
$tree->parse_file($file);

foreach my $species ($tree->find_by_tag_name('species')) {
    print $species->find_by_tag_name('common-name')->as_text;
    print ' (' . $species->attr_get_i('name') . ') ';
    print $species->find_by_tag_name('conservation')->attr_get_i('status');
    print "\n";
}
# XML::Element, writing: assemble the links document as a tree of elements
use XML::Element;
require "files/camelid_links.pl";

my %camelid_links = get_camelid_data();

my $root   = XML::Element->new('html');
my $body   = XML::Element->new('body');
# The XML declaration is built as a processing-instruction pseudo-element
my $xml_pi = XML::Element->new('~pi', text => 'xml version="1.0"');
$root->push_content($body);

foreach my $item ( keys (%camelid_links) ) {
    my $link = XML::Element->new('a', 'href' => $camelid_links{$item}->{url});
    $link->push_content($camelid_links{$item}->{description});
    $body->push_content($link);
}

print $xml_pi->as_XML;
print $root->as_XML();
# XML::Twig, reading: parse the file and walk the children of the root
use XML::Twig;

my $file = 'files/camelids.xml';
my $twig = XML::Twig->new();
$twig->parsefile($file);

my $root = $twig->root;

foreach my $species ($root->children('species')) {
    print $species->first_child_text('common-name');
    print ' (' . $species->att('name') . ') ';
    print $species->first_child('conservation')->att('status');
    print "\n";
}
# XML::Twig, writing: build the links document from XML::Twig::Elt objects
use XML::Twig;
require "files/camelid_links.pl";

my %camelid_links = get_camelid_data();

my $root = XML::Twig::Elt->new('html');
my $body = XML::Twig::Elt->new('body');
$body->paste($root);

foreach my $item ( keys (%camelid_links) ) {
    my $link = XML::Twig::Elt->new('a');
    $link->set_att('href', $camelid_links{$item}->{url});
    $link->set_text($camelid_links{$item}->{description});
    $link->paste('last_child', $body);
}

# The declaration is printed by hand before the element tree
print qq|<?xml version="1.0"?>|;
$root->print;
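These examples illustrate the basic usage of these general-purpose Perl XML modules. My goal was to provide enough examples to give you a feel for how to write code with each module. Next month we will look at the modules that implement one of the standard XML APIs, in particular XML::DOM, XML::XPath, and the large number of SAX and SAX-like modules.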