Perl Project Improvement(3) Perl and XML, IDE, Regex and SQS
任飞鸣
2023-12-01
1 XML Operations with Perl
Search each line for <referencenumber>, strip the <![CDATA[ ]]> wrapper when there is one, and write the extracted values into one file.
> time perl -ne 'if (/referencenumber/){ s/<!\[CDATA\[//; s/]]>//; s/.*?>//; s/<.*//; print;}' 1052.xml > referencenumber.xml
real 0m7.773s
user 0m7.084s
sys 0m0.564s
time only measures how long the command takes to execute. Searching the 2 GB XML file takes only about 7 seconds.
The result has 749,999 lines, 749,999 words and 24,625,055 characters.
> wc referencenumber.xml
749999 749999 24625055 referencenumber.xml
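The one-liner above removes the tags with a chain of substitutions. A minimal sketch of an alternative is a single regex with a capture group, run inside a script that streams the file line by line. It assumes, like the one-liner, that each <referencenumber> element sits on one line; it is not a real XML parser (elements spanning lines would need something like XML::Twig). The names extract_reference_number and extract_file are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return the value inside <referencenumber>...</referencenumber>
# on one line, with or without a CDATA wrapper; returns undef if there is none.
# Assumption: the whole element is on a single line.
sub extract_reference_number {
    my ($line) = @_;
    return unless $line =~ m{
        <referencenumber>\s*
        (?:<!\[CDATA\[)?      # optional CDATA opener
        \s*(.*?)\s*           # the value itself, trimmed
        (?:\]\]>)?            # optional CDATA closer
        \s*</referencenumber>
    }x;
    return $1;
}

# Hypothetical helper: stream a (possibly multi-GB) file and write one value
# per line; returns how many values were written.
sub extract_file {
    my ($in_path, $out_path) = @_;
    open my $in,  '<', $in_path  or die "Cannot open $in_path: $!\n";
    open my $out, '>', $out_path or die "Cannot open $out_path: $!\n";
    my $count = 0;
    while (my $line = <$in>) {
        my $value = extract_reference_number($line);
        next unless defined $value;
        print {$out} "$value\n";
        $count++;
    }
    close $in;
    close $out or die "Cannot close $out_path: $!\n";
    return $count;
}
```

Calling extract_file('/data/1052.xml', '/data/referencenumber.xml') would produce the same kind of one-value-per-line file, with whitespace trimmed on both sides of the value.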
Another approach uses grep and awk. It is much slower (about 44 seconds versus under 8 for the Perl one-liner):
> grep 'referencenumber' /data/12001.xml | awk -F"</?referencenumber>" '{ print $2}'
> time grep 'referencenumber' /data/1052.xml | awk -F"</?referencenumber>" '{ print $2}' > /data/referencenumber.xml
real 0m44.050s
user 0m45.492s
sys 0m0.809s
2 Setting Up the IDE Environment
EPIC, the Perl plugin for Eclipse
http://www.epic-ide.org/download.php
Download a small Eclipse package for Java only
http://www.eclipse.org/downloads/
Set Up the Plugin
http://www.epic-ide.org/running_perl_scripts_within_eclipse/eclipse-runperl-figure4.png
Once the latest Java-only Eclipse is installed, add the Perl plugin from this update site:
http://www.epic-ide.org/updates/testing
After installing it, set up the Perl preferences in Eclipse:
Perl executable: "/Users/carl/tool/perl-5.16.3/bin/perl"
Then open the Project Properties and set:
Perl Include Path -> Add to List ${project_loc}
Set up the Unit tests
[Run] -> [External Tools]->[External Tools Configurations]->[Program] -> New
RunAllTest
- Location: /Users/carl/tool/perl-5.16.3/bin/prove
- Working Directory: ${workspace_loc:/jobs-producer-perl}
- Arguments: ${build_files:t/*}
SingleTest
- Location: /Users/carl/tool/perl-5.16.3/bin/perl
- Working Directory: ${workspace_loc:/jobs-producer-perl}
- Arguments: t/NumberUtil.t
PerlApp
- Location: /Users/carl/tool/perl-5.16.3/bin/perl
- Working Directory: ${workspace_loc:/jobs-producer-perl}
- Arguments: JobProducerApp.pl
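The RunAllTest tool simply runs prove over the t/ directory. A minimal sketch of the same thing from Perl, using the core TAP::Harness module (the engine behind prove), can help when the tests need to run outside Eclipse. The sub name run_all_tests and the choice to put the project root on @INC are assumptions that mirror the ${project_loc} include path above.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use TAP::Harness;

# Hypothetical helper: run every t/*.t of a project from the project directory
# (the tests load conf/ files relative to cwd), with the project root on @INC
# so that module names such as lib::NumberUtil resolve.
sub run_all_tests {
    my ($project_dir, %opt) = @_;
    my $old_dir = getcwd();
    chdir $project_dir or die "Cannot chdir to $project_dir: $!\n";
    my $aggregator = eval {
        my $harness = TAP::Harness->new({
            lib       => ['.'],
            verbosity => $opt{verbosity} // 0,   # -3 is silent
        });
        my @tests = glob 't/*.t';
        $harness->runtests(@tests);
    };
    my $error = $@;
    # always restore the original working directory, even if a test run died
    chdir $old_dir or die "Cannot chdir back to $old_dir: $!\n";
    die $error if $error;
    return $aggregator;
}
```

The returned TAP::Parser::Aggregator answers all_passed, total and failed, so a wrapper script could exit non-zero when run_all_tests('/path/to/jobs-producer-perl')->all_passed is false.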
3 Perl Regex and Command Support
A Perl method to generate the difference files:
use File::Basename qw(dirname);

sub generateReferenceNumbers {
    die "Wrong arguments" if @_ != 2;
    #services
    my $logger = &loadLogger();
    my ($hugeFileName, $source_id) = @_;
    #prepare 2 arrays
    my @redisArray = ();
    my @xmlArray;
    #the big file location comes from the parameters,
    #the output reference number file goes in the same directory
    my $bigFile = $hugeFileName;
    my $referencenumberFile = dirname($bigFile) . "/referencenumber.xml";
    #command to regex the reference numbers
    `perl -ne "if (/referencenumber/){ s/<referencenumber>//; s/<\\/referencenumber>//; s/<!\\[CDATA\\[//; s/]]>//; s/\\s*\\t*//; print; }" $bigFile > $referencenumberFile`;
    die "Failed to extract reference numbers from $bigFile\n" if $? != 0;
    #read and trim the reference numbers from file to array
    open(my $fileHandler, "<", $referencenumberFile) or die "Failed to open file: $!\n";
    while (<$fileHandler>) {
        chomp;
        push @xmlArray, $_;
    }
    close $fileHandler;
    #find the differences
    my @differencesArray = lib::CollectionUtil::differenceInArrays(\@xmlArray, \@redisArray);
    #logging and testing the difference
    #$logger->info("the difference array = @differencesArray");
    #my $first = $differencesArray[0];
    #$logger->info("===$first==");
    #output the difference to XML and send it to the next steps
}
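The method relies on lib::CollectionUtil::differenceInArrays, whose code is not shown here. A minimal sketch of such a helper, assuming it should return the entries of the first array that are missing from the second (reference numbers in the XML but not in Redis), uses a hash as a lookup table. The name difference_in_arrays is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical stand-in for lib::CollectionUtil::differenceInArrays:
# return the elements of @$left that do not appear in @$right,
# keeping the original order of @$left.
sub difference_in_arrays {
    my ($left, $right) = @_;
    my %seen = map { $_ => 1 } @$right;   # constant-time lookups for the right side
    return grep { !$seen{$_} } @$left;
}
```

With around 750,000 reference numbers, the hash keeps the comparison linear; a nested loop over both arrays would be quadratic.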
The most important part is this:
`perl -ne "if (/referencenumber/){ s/<referencenumber>//; s/<\\/referencenumber>//; s/<!\\[CDATA\\[//; s/]]>//; s/\\s*\\t*//; print; }" $bigFile > $referencenumberFile`
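-n wraps the code in a loop that reads the input one line at a time into $_ without printing it, and -e supplies that code on the command line. Only lines matching /referencenumber/ reach the print.
s/<referencenumber>// replaces <referencenumber> with an empty string.
s/<\\/referencenumber>// removes </referencenumber>. The backslashes are doubled because the command lives inside a Perl backtick string; the perl process that runs it sees s/<\/referencenumber>//.
s/<!\\[CDATA\\[// removes <![CDATA[
s/]]>// removes ]]>
s/\\s*\\t*// removes the leading whitespace (spaces and tabs). There is no /g flag and the pattern can match at position 0, so only the run at the start of the line is removed; \\t* adds nothing, since \\s already matches tabs.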
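To check each substitution in isolation, the same chain can be applied to a single line inside a Perl script. This is a sketch; the sub name clean_line is hypothetical. In a plain .pl file the backslashes are single, because there is no backtick string to escape through.

```perl
use strict;
use warnings;

# Hypothetical helper applying the one-liner's substitutions to one line.
# Returns undef for lines that do not mention referencenumber.
sub clean_line {
    my ($line) = @_;
    return unless $line =~ /referencenumber/;
    $line =~ s/<referencenumber>//;
    $line =~ s/<\/referencenumber>//;
    $line =~ s/<!\[CDATA\[//;
    $line =~ s/]]>//;
    $line =~ s/\s*\t*//;    # strips leading whitespace only (no /g flag)
    return $line;
}
```

For example, "    <referencenumber><![CDATA[ABC123]]></referencenumber>\n" becomes "ABC123\n", while trailing spaces inside the element survive, since only the leading run is removed.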
Read the lines of the file and push them into an array:
open(my $fileHandler, "<", $referencenumberFile) or die "Failed to open file: $!\n";
while(<$fileHandler>) {
chomp;
push @xmlArray, $_;
}
close $fileHandler;
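The loop above can also be written with the idiom of reading all lines in list context and chomping the whole array at once. A short sketch; the sub name read_lines is hypothetical. It holds the whole file in memory, which is fine for a 24 MB reference number file but not for the 2 GB source XML.

```perl
use strict;
use warnings;

# Hypothetical helper: read a file into a list of lines without newlines.
# chomp applied to an array removes the trailing newline from every element.
sub read_lines {
    my ($path) = @_;
    open my $fh, '<', $path or die "Failed to open file $path: $!\n";
    chomp(my @lines = <$fh>);
    close $fh;
    return @lines;
}
```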
4 SQS
http://search.cpan.org/~penfold/Amazon-SQS-Simple-2.04/lib/Amazon/SQS/Simple.pm
http://search.cpan.org/~penfold/Amazon-SQS-Simple-2.04/
> cpan -fi Amazon::SQS::Simple
Error Message:
ERROR [try ]: On calling SendMessage: 501 Protocol scheme 'https' is not supported (LWP::Protocol::https not installed) at lib/QueueClientHandler.pm line 39.
Solution:
> cpan -fi LWP::Protocol::https
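The https error only shows up at the first SendMessage call. A small startup check can report missing modules earlier. This is a sketch; the sub name missing_modules and the choice of modules to check are assumptions for this project.

```perl
use strict;
use warnings;

# Hypothetical helper: return the names of the modules that cannot be loaded.
sub missing_modules {
    my @missing;
    for my $module (@_) {
        # Amazon::SQS::Simple -> Amazon/SQS/Simple.pm, the form require expects
        (my $file = "$module.pm") =~ s{::}{/}g;
        push @missing, $module unless eval { require $file; 1 };
    }
    return @missing;
}
```

At startup the application could then call `if (my @missing = missing_modules(qw(Amazon::SQS::Simple LWP::Protocol::https))) { die "Install first: cpan @missing\n" }`.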
Error Message:
t/QueueClientHandler.t (Wstat: 0 Tests: 2 Failed: 0)
Parse errors: No plan found in TAP output
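Solution:
Write the output through the logger instead of print. prove reads the TAP stream from the test's STDOUT, so stray print output there can confuse the parser; Test::More's note() and diag() are the TAP-safe way to emit diagnostics.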
Core class: QueueClientHandler.pm
package lib::QueueClientHandler;

use strict;
use warnings;
use IOC;
use Log::Log4perl;
use lib::CollectionUtil;
use Amazon::SQS::Simple;

# queue URL shared by sendMessage and fetchMessage
my $endpoint = 'https://sqs.us-east-1.amazonaws.com/216323611345/stage-tasks';

sub init {
    my $configService = &loadService('configService');
    my $logger = &loadLogger();
    $logger->debug("init SQS connection-----");
    $logger->debug("--------------------------");
    # placeholders: keep real keys out of source control (e.g. load them from the configuration)
    my $access_key = 'AKIAIMxxxxxxx'; # Your AWS Access Key ID
    my $secret_key = 'BIr5Xlu1xxxxxxxx'; # Your AWS Secret Key
    my $register = IOC::Registry->instance();
    my $container = $register->getRegisteredContainer('JobsProducer');
    my $queueClient = Amazon::SQS::Simple->new($access_key, $secret_key);
    $container->register(IOC::Service->new('queueService'
        => sub { $queueClient }));
    return 1;
}

sub sendMessage {
    my $queueService = &loadService('queueService');
    my $taskQueue = $queueService->GetQueue($endpoint);
    my $response = $taskQueue->SendMessage('Hello world!');
    return $response;
}

sub fetchMessage {
    # Retrieve a message
    my $queueService = &loadService('queueService');
    my $logger = &loadLogger();
    my $taskQueue = $queueService->GetQueue($endpoint);
    my $msg = $taskQueue->ReceiveMessage();
    if ($msg) {
        $logger->info("Message I get is = " . $msg->MessageBody());
        # Delete the message so that it is not delivered again
        $taskQueue->DeleteMessage($msg->ReceiptHandle);
    }
}

sub loadService {
    #check parameters
    die "Wrong arguments" if @_ != 1;
    my $serviceName = $_[0];
    my $register = IOC::Registry->instance();
    my $service = $register->searchForService($serviceName)
        || die "Failed to find the service name = " . $serviceName . " in QueueClientHandler.";
    return $service;
}

sub loadLogger {
    my $logger = Log::Log4perl::get_logger("lib::QueueClientHandler");
    return $logger;
}

1;
__END__
Test class that sends the messages: QueueClientHandler.t
use strict;
use warnings;
use Test::More qw(no_plan);
use Log::Log4perl::Level;
use Log::Log4perl qw(:easy);
use YAML::XS qw(LoadFile);
use Data::Dumper;
use Cwd;
use IOC;
# Verify module can be included via "use" pragma
BEGIN { use_ok('lib::QueueClientHandler') };
# Verify module can be included via "require" pragma
require_ok( 'lib::QueueClientHandler' );
#init the test class
#logging
Log::Log4perl->init(cwd() ."/conf/log4perl-test.conf");
our $logger = Log::Log4perl::get_logger("JobsProducer");
#load configuration
my $config = LoadFile(cwd() .'/conf/config.yaml');
$logger->debug("----init configuration --------");
$logger->debug(Dumper($config));
$logger->debug("-------------------------------");
my $container = IOC::Container->new('JobsProducer');
$container->register(IOC::Service->new('configService'
=> sub { $config } ));
my $register = IOC::Registry->new();
$register->registerContainer($container);
# Test the init and send operations
ok(lib::QueueClientHandler::init(), 'init the SQS client');
ok(lib::QueueClientHandler::sendMessage(), 'send a message to the task queue');
#lib::QueueClientHandler::fetchMessage();
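This test talks to the real stage queue. Because the handler looks up the client through the IOC container under the name queueService, a test could instead register an in-memory fake, for example `my $fake = FakeSQS->new; $container->register(IOC::Service->new('queueService' => sub { $fake }));`, and skip init. The sketch below is hypothetical: FakeSQS mimics only the calls the handler uses (GetQueue, SendMessage, ReceiveMessage, DeleteMessage, MessageBody, ReceiptHandle) and is not part of Amazon::SQS::Simple.

```perl
use strict;
use warnings;

# Hypothetical in-memory stand-in for the Amazon::SQS::Simple client.
package FakeSQS;
sub new { my ($class) = @_; return bless { queues => {} }, $class }
sub GetQueue {
    my ($self, $url) = @_;
    return $self->{queues}{$url} //= FakeSQS::Queue->new;   # one queue per URL
}

package FakeSQS::Queue;
sub new { my ($class) = @_; return bless { messages => [], next_id => 1 }, $class }
sub SendMessage {
    my ($self, $body) = @_;
    my $msg = FakeSQS::Message->new($body, 'handle-' . $self->{next_id}++);
    push @{ $self->{messages} }, $msg;
    return $msg;
}
sub ReceiveMessage {
    my ($self) = @_;
    return $self->{messages}[0];    # receiving does not remove the message
}
sub DeleteMessage {
    my ($self, $handle) = @_;
    @{ $self->{messages} } = grep { $_->ReceiptHandle ne $handle } @{ $self->{messages} };
    return 1;
}

package FakeSQS::Message;
sub new { my ($class, $body, $handle) = @_; return bless { body => $body, handle => $handle }, $class }
sub MessageBody   { $_[0]{body} }
sub ReceiptHandle { $_[0]{handle} }

package main;
```

Like SQS, the fake keeps a received message until DeleteMessage is called with its receipt handle, so a test can assert that fetchMessage really deletes what it reads.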
Consumer that pulls the messages: TaskConsumerApp.pl
# import advertiser job feeds
#
# usage: $0 stop stop after current batch
# $0 start import loop
use strict;
use warnings;
use IOC;
use Log::Log4perl::Level;
use Log::Log4perl qw(:easy);
use YAML::XS qw(LoadFile);
use Data::Dumper;
use lib::MysqlDAOHandler;
use lib::RedisClientHandler;
use lib::FeedFileHandler;
use lib::JobImportHandler;
use lib::StringUtil;
use lib::NumberUtil;
use lib::QueueClientHandler;
use threads;
use threads::shared;
use Time::Piece;
use Cwd;
use constant FLAG_PID => 'JOBS_PRODUCER_RUNNING';
my $runningEnv = $ENV{'RUNNING_ENV'} or die "Please set RUNNING_ENV, it selects conf/log4perl-<env>.conf and conf/config-<env>.yaml\n";
#logging
Log::Log4perl->init(cwd() . "/conf/log4perl-${runningEnv}.conf");
my $logger = Log::Log4perl::get_logger("JobsProducer");
#IOC
my $container = IOC::Container->new('JobsProducer');
my $register = IOC::Registry->new();
$register->registerContainer($container);
#configuration
my $config = LoadFile(cwd() . "/conf/config-${runningEnv}.yaml");
$logger->debug("----init configuration --------");
$logger->debug(Dumper($config));
$logger->debug("-------------------------------");
$container->register(IOC::Service->new('configService'
=> sub { $config } ));
#receive params
my $pidFileName = $config->{pidFilePath} . FLAG_PID;
# data file path
my $dataFilePath = $config->{dataFilePath};
# php script path
my $phpScriptPath = $config->{phpScriptPath};
#my $MAX_SPLIT_SIZE = 100_000_000; #max split file size
my $MAX_SPLIT_SIZE = $config->{maxSplitFileSize};
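if (@ARGV == 1 && $ARGV[0] eq 'stop') {
    # create the flag file; the running main loop exits once it sees it
    system 'touch', $pidFileName;
    $logger->info("Application is stopping.");
    exit 0;
} elsif (@ARGV == 1 && $ARGV[0] eq 'start') {
    $logger->info("Application is running on $runningEnv");
} else {
    print "Usage: $0 start|stop\n";
    exit 1;
}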
unlink $pidFileName;
#init database connection
lib::MysqlDAOHandler::init();
#init redis connection
lib::RedisClientHandler::init();
#init queue connection
lib::QueueClientHandler::init();
#main thread pulling from mysql
#multiple thread downloading the file
#single thread split the file
#multiple threads execute the php import
##################################################################
# Main Processor
##################################################################
$logger->info("Start the Main thread.");
while (!-f $pidFileName) {
#keep running in main thread
$logger->info("Main-Thread - Scanning for tasks");
lib::QueueClientHandler::fetchMessage();
sleep 15;
}
$logger->info("Main-Thread - TaskConsumerApp stopped running.");
__END__
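The main loop stops when the stop command has touched the flag file. A sketch of that loop factored into a sub, so it can be tested without SQS: the callback and the interval are parameters here (assumptions for testability), while the script hard-codes fetchMessage() and 15 seconds. The name poll_until_flag is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: call $fetch repeatedly, sleeping $interval seconds
# between rounds, until the flag file exists. Returns the number of rounds.
sub poll_until_flag {
    my (%args) = @_;
    my $flag     = $args{flag_file} or die "flag_file is required\n";
    my $fetch    = $args{fetch}     or die "fetch is required\n";
    my $interval = $args{interval} // 15;
    my $rounds = 0;
    while (!-f $flag) {
        $fetch->();
        $rounds++;
        # re-check before sleeping so a stop request does not wait a full interval
        sleep $interval if $interval > 0 && !-f $flag;
    }
    return $rounds;
}
```

Checking the flag again before sleeping means a stop request that arrives during a fetch is honored right away instead of after another 15-second sleep.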