Introduction to Perl

元阳荣

2023-12-01

(The idea is to cover all the basic topics with GPT, double-check the output, and trim or pad the content for easier reading; suitable for beginners familiar with C, Python, and Bash.)

Perl is a high-level, general-purpose programming language that is commonly used for web development, system administration, network programming, and more. Here are some basic Perl grammar and examples to get you started:

==> the article is structured around key concepts and methods; please pick up the various keywords and grammar details from the examples given.

==> Perl is C-like in parsing structure, Python-like in syntax and data structure, and Bash-like in variable and string.

Variables

Perl has three types of variables: scalars, arrays, and hashes. Scalars are single values, arrays are ordered lists of values, and hashes are unordered sets of key/value pairs ==> python dict.

# Scalar variable
my $name = "Alice";

# Array variable
my @numbers = (1, 2, 3, 4, 5);

# Hash variable
my %ages = ("Alice" => 25, "Bob" => 30, "Charlie" => 35);

In Perl, the `my` keyword is used to declare a lexically scoped variable. When you use `my` to declare a variable, it is only accessible within the block of code in which it was declared. This is useful for preventing naming collisions and for keeping your code organized.

Here's an example of how to use `my` to declare a variable in Perl:
sub my_subroutine {
  my $my_variable = "This variable is only accessible within this subroutine.";
  # do something with $my_variable
}

Naming rules:

In Perl, variable names must begin with a letter or underscore character, and may be followed by any combination of letters, digits, or underscore characters. Perl variable names are case-sensitive, meaning that $foo, $Foo, and $FOO are all different variables.

Here are some examples of valid Perl variable names:

```
$counter
$_total
$firstName
$_
```

And here are some examples of invalid Perl variable names:

```
$1stName # Cannot begin with a digit
$last-name # Hyphen is not allowed
$my var # Space is not allowed
```

Also, $Var is not @Var, not %Var ==> Perl keeps independent namespaces for each variable type
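
To illustrate the independent namespaces, here is a minimal sketch; the identifier `item` is arbitrary and chosen only for this example.

```perl
use strict;
use warnings;

# The same identifier can name a scalar, an array, and a hash at once.
my $item = "scalar value";
my @item = (10, 20, 30);
my %item = (color => "red");

# The sigil follows the kind of value being fetched, not the container.
my $from_scalar = $item;          # the scalar $item
my $from_array  = $item[1];       # element 1 of @item
my $from_hash   = $item{color};   # value stored under 'color' in %item

print "$from_scalar, $from_array, $from_hash\n";
```

Note that a single element of `@item` or `%item` is written with `$`, because the value being fetched is a scalar.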

Data Types

Numerical

Integer:

hex ==> 0x

oct ==> 0 ("zero"), e.g. 057 = 47

bin ==> 0b
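
The prefixes above apply to literals written in source code. A small sketch (all variable names are illustrative) shows them next to the built-in `hex` and `oct` functions, which do the same job for strings:

```perl
use strict;
use warnings;

# Literals in source code: the prefix selects the base.
my $hex = 0x1F;        # 31
my $oct = 057;         # 47
my $bin = 0b101;       # 5

# Strings are not parsed with these prefixes during numeric conversion;
# the built-in hex() and oct() functions handle that case.
my $from_hex_string = hex("1F");      # 31
my $from_oct_string = oct("057");     # 47
my $from_bin_string = oct("0b101");   # 5: oct() also understands 0b and 0x

print "$hex $oct $bin\n";
```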

String:

'xxx' for raw (uninterpolated) strings; the only escapes recognized inside are \' and \\, e.g. $str2 = 'don\'t', where the apostrophe must still be escaped

"xxx" for interpreted ones

@Array

In Perl, indexing an array with a list of indices feels a bit like NumPy fancy indexing; there are several ways to access array elements. Here are some common methods:

1. Indexing: You can access an array element by its index using the square bracket notation. The index starts at 0 for the first element of the array. For example:

my @array = (1, 2, 3, 4, 5);
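# a single element takes the '$' sigil; '@array[2]' is a one-element slice
# that still works but draws a warning under 'use warnings'
print $array[0]; # prints 1
print $array[2]; # prints 3

you can also "gather/scatter" by discrete indexing (an array slice):

@array[0, 2, 4] = (8, 9, 10); # the right-hand side must be a list in ();
                              # [8, 9, 10] would create a single array reference instead

exchange values at different indices by gather/scatter:

@array[0, 2, 4] = @array[1, 4, 3]; # both sides need the '@' sigil to form a slice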

2. Slicing: You can access a range of elements in an array using the range operator `..` inside the square brackets. For example:

my @array = (1, 2, 3, 4, 5);
my @slice = @array[1..3]; # elements at indices 1 to 3 (the 2nd to 4th)
print "@slice"; # prints "2 3 4"

3. Iteration: You can iterate over all elements of an array using a loop. For example:

my @array = (1, 2, 3, 4, 5);
foreach my $element (@array) {
    print "$element ";
} # prints "1 2 3 4 5 "

4. Built-in functions: Perl provides several built-in functions to manipulate arrays. Some of the most commonly used functions are `push`, `pop`, `shift`, `unshift`, `splice`, `join`, `sort`, `reverse`, and `grep`. For example:

my @array = (1, 2, 3, 4, 5);
push @array, 6; # add element at the end
pop @array; # remove last element
shift @array; # remove first element
unshift @array, 0; # add element at the beginning
splice @array, 2, 2; # remove 2 elements starting at index 2
my $string = join "-", @array; # join elements with "-"
my @sorted = sort @array; # sort elements (compared as strings by default)
my @reversed = reverse @array; # reverse elements
my @even = grep { $_ % 2 == 0 } @array; # filter even elements
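
Two points from the list above are easy to get wrong: slice assignment and the default sort order. The following is a small sketch with made-up values:

```perl
use strict;
use warnings;

my @array = (1, 2, 3, 4, 5);

# Scatter: assign a list to three non-adjacent positions.
@array[0, 2, 4] = (8, 9, 10);                 # (8, 2, 9, 4, 10)

# Gather and scatter in one statement: swap first and last elements.
@array[0, 4] = @array[4, 0];                  # (10, 2, 9, 4, 8)

# Default sort compares as strings, so "10" sorts before "2".
my @as_strings = sort @array;                 # (10, 2, 4, 8, 9)

# A comparison block with <=> sorts numerically.
my @as_numbers = sort { $a <=> $b } @array;   # (2, 4, 8, 9, 10)

print "@array\n@as_strings\n@as_numbers\n";
```

The right-hand side of the swap is evaluated in full before anything is assigned, which is why no temporary variable is needed.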

%Hash, or Associative Array

# Create a hash
my %hash = (
    "key1" => "value1",
    "key2" => "value2",
    "key3" => "value3"
);
# or simply, in the old-school manner, write it as a flat list of key, value, key, value
my %ARRAY = ("key1", "value1", "key2", "value2");

# Access a value in the hash
print $hash{"key1"}; # Output: value1

# Assignment
$hash{"key2"} = "new_value2";

Originally, a "hash" was called an "associative array", but this term is a bit outdated (people just got sick and tired of using seven syllables). Although it isn't intuitive for newcomers to programming, "hash" is now the preferred term. The name is derived from the computer science term, hashtable. (from wikibook)

In Perl, you can access a hash using several methods:

1. Using the hash key: You can access a value in a hash by specifying the key in curly braces.

2. Using the keys function: The `keys` function returns a list of all the keys in a hash, which you can then use to access the values. For example:

my %hash = ('key1' => 'value1', 'key2' => 'value2');
my @keys = keys %hash;
foreach my $key (@keys) {
    print "$key: $hash{$key}\n";
}

This will print (hash order is not guaranteed, so the lines may appear in either order):
key1: value1
key2: value2

3. Using the values function: The `values` function returns a list of all the values in a hash.

my %hash = ('key1' => 'value1', 'key2' => 'value2');
my @values = values %hash;
foreach my $value (@values) {
    print "$value\n";
}

This will print (again, in no guaranteed order):
value1
value2

4. Using the each function: The `each` function returns the key-value pair for the next element in a hash. You can use this function in a loop to iterate over all the key-value pairs in a hash. For example:

my %hash = ('key1' => 'value1', 'key2' => 'value2');
while (my ($key, $value) = each %hash) {
    print "$key: $value\n";
}

This will print (again, in no guaranteed order):
key1: value1
key2: value2

5. delete an entry:

delete $hash{'key2'};    # removes the key entirely; to keep the key but clear its value, assign '' instead
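
Because hash order is not guaranteed, a common way to get repeatable output is to sort the keys first. The sketch below (with made-up data) also shows the difference between deleting a key and clearing its value:

```perl
use strict;
use warnings;

my %ages = (Charlie => 35, Alice => 25, Bob => 30);

# Sorting the keys gives a stable, repeatable iteration order.
my @by_name = sort keys %ages;                 # Alice Bob Charlie
print "$_: $ages{$_}\n" for @by_name;

# delete removes the key and returns the value it held.
my $removed = delete $ages{Bob};               # 30

# Assigning '' keeps the key; only the value is cleared.
$ages{Alice} = '';

my $has_alice = exists $ages{Alice} ? 1 : 0;   # 1: key is still present
my $has_bob   = exists $ages{Bob}   ? 1 : 0;   # 0: key is gone
my $remaining = keys %ages;                    # scalar context: 2 keys left
```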

Operators

Basic Operators

==> the arithmetic, logical, and bitwise operators are largely the same as in C or Python, hence skipped; precedence follows C for the operators the two languages share.

==> Perl has the conditional (ternary) operator: cond ? expr_if_true : expr_if_false

==> ',' works as in C: expr1, expr2, expr3 are evaluated in the order 1, 2, 3

Assignment Operator

The basic assignment operator is = that sets the value on the left side to be equal to the value on the right side. It also returns the value. Thus you can do things like $a = 5 + ($b = 6), which will set $b to a value of 6 and $a to a value of 11 (5 + 6).

there are some implicit assignment rules concerning @Arrays:

1. when an array is assigned to a scalar, the array is evaluated in scalar context and the scalar receives its length:

@arr = (1, 2, 3);
$a = @arr;     #$a == 3

2. "$#arr" returns the last index of the array @arr

$b = $#arr;    #$b == 2

3. asymmetric assignment: when the list on the left is shorter than the array on the right, only the leading elements are assigned and the rest are dropped

($c) = @arr;    #$c == 1

4. hash to array: the hash is flattened into a (key, value, key, value, ...) list, in no guaranteed order

@associative_arr = %hash;
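
The four rules above can be checked side by side. This is a minimal sketch; the variable names are illustrative, and it avoids `$a` and `$b` because `sort` reserves them:

```perl
use strict;
use warnings;

my @arr  = (1, 2, 3);
my %hash = (one => 1);

my $count      = @arr;          # scalar context: number of elements, 3
my $last_index = $#arr;         # index of the last element, 2
my ($first)    = @arr;          # list context: first element only, 1
my ($x, $y, $z, $w) = @arr;     # surplus variables on the left become undef
my @flat       = %hash;         # the hash flattens to ('one', 1)

print "count=$count last=$last_index first=$first flat=@flat\n";
```

The parentheses around `$first` are what switch the assignment from scalar context to list context.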

String Operators

1. Concatenation: .
2. Repetition: x

my $str1 = "Hello"; 
my $str2 = "World"; 

print $str1 . " " . $str2 . "\n";    # Output: Hello World 
print $str1 x 3;                     # Output: HelloHelloHello

#concatenation assign: '.='
$str1 .= $str2;        #$str1 is now "HelloWorld"

3. Comparison Operators (eq, ne, lt, gt, le, ge) - These operators compare two strings; numbers are compared with ==, !=, <, >, <=, >= instead.

my $string1 = "Hello";
my $string2 = "World";
if ($string1 eq $string2) {
    print "Strings are equal";
} else {
    print "Strings are not equal";
}
# Output: Strings are not equal

4. Substring Operator (substr) - This operator is used to extract a substring from a string.

my $string = "Hello World";
my $result = substr($string, 0, 5);
print $result; # Output: Hello

5. Length Operator (length) - This operator is used to determine the length of a string.

my $string = "Hello World";
my $length = length($string);
print $length; # Output: 11

6. Regular Expression Operator (=~) - This operator is used to match a regular expression against a string. ==> more on pattern matching and RE later.

my $string = "Hello World";
if ($string =~ /World/) {
    print "Match found";
} else {
    print "Match not found";
}
# Output: Match found

7. chop(), used to "chop off" the last character; it was traditionally used to get rid of the trailing newline when reading a line from the terminal, e.g.

chop($a = <STDIN>);

Perl's chop and chomp functions can often be a source of confusion. Not only do they sound similar, they do similar things. Unfortunately, there is a critical difference: chop removes the last character of the string unconditionally, while chomp removes it only if it is a newline. ==> prefer chomp when the goal is to strip a newline.
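
The difference is easiest to see on two inputs, one with a trailing newline and one without. A small sketch with illustrative names:

```perl
use strict;
use warnings;

my $with_newline    = "hello\n";
my $without_newline = "hello";

# chomp removes a trailing newline only if one is present.
chomp(my $chomped1 = $with_newline);       # "hello"
chomp(my $chomped2 = $without_newline);    # "hello" (unchanged)

# chop always removes the last character, whatever it is.
chop(my $chopped1 = $with_newline);        # "hello"
chop(my $chopped2 = $without_newline);     # "hell"

print "$chomped1 $chomped2 $chopped1 $chopped2\n";
```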

Conditionals

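Perl has several conditional statements, including `if`, `elsif`, `else`, `unless`, and the experimental `given/when`.

my $x = 10;

if ($x > 5) {
    print "x is greater than 5\n";
} elsif ($x == 5) {
    print "x is equal to 5\n";
} else {
    print "x is less than 5\n";
}

"given/when" plays the role of switch in C; note that it has always been experimental and has been deprecated since Perl 5.38, so prefer if/elsif chains in new code: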

use feature 'switch';

my $num = 10;

given ($num) {
    when ($_ < 0) {
        print "Number is negative\n";
    }
    when ($_ > 0) {
        print "Number is positive\n";
    }
    default {
        print "Number is zero\n";
    }
}

'unless' is the complement to 'if'

my $name = "Alice";

unless ($name eq "Bob") {
    print "Hello $name\n";
}

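if, unless, while, and until can also be used as statement modifiers in single-line form:

expr if cond;   expr unless cond;   expr while cond;   expr until cond;

my $num = "";    # start with a non-numeric value so that the loop reads at least once
print "Enter a number: ";
chomp($num = <STDIN>) while ($num !~ /^\d+$/);
print "$num is ", ($num % 2 == 0 ? "even" : "odd"), "\n";

In this example, the `while` modifier keeps reading a line from STDIN until a valid number is entered (i.e., a string consisting of one or more digits). Note that the prompt itself is printed only once.

As with an ordinary `while` loop, Perl evaluates the condition (`$num !~ /^\d+$/`) before each execution of the statement, including the first; only a `do { ... } while` block runs its body once before testing.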
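
The same rules can be tried without typing any input. Below is a minimal sketch; `@log`, `$n`, and `$runs` are illustrative names:

```perl
use strict;
use warnings;

my @log;
my $n = 3;

# Statement modifiers: the condition is tested BEFORE the statement runs.
push @log, "positive" if $n > 0;
push @log, "not zero" unless $n == 0;
push @log, $n-- while $n > 0;          # pushes 3, 2, 1

# Only a do-block runs once before its condition is checked.
my $runs = 0;
do { $runs++ } while ($runs < 0);      # body runs once, then the loop stops

print "@log (runs=$runs)\n";
```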

Loops

Perl has several types of loops, including `for`, `while`, `until`, and `foreach`.

# for loop
for (my $i = 0; $i < 5; $i++) {
    print "$i\n";
}

# 'do { ... } while' and 'do { ... } until' run the block once before testing the condition
# while loop
my $i = 0;
while ($i < 5) {
    print "$i\n";
    $i++;
}

# until is the opposite of while: the loop ends once the condition becomes true
my $j = 0;
until ($j >= 5) {
    print "$j\n";
    $j++;
}

# foreach loop
my @numbers = (1, 2, 3, 4, 5);
foreach my $number (@numbers) {
    print "$number\n";
}

loop control: last, next, redo

"last" is "break" in C,

"next" is "continue" in C,

redo:

This statement restarts the current iteration of a loop without re-evaluating the loop condition. It cannot be used inside a "do { ... } while" block, which is not a true loop. Here's an example:

for (my $i = 1; $i <= 10; $i++) {
    print "$i\n";
    if ($i == 5) {
        $i++; # redo skips the for loop's own $i++, so without this increment the loop would never end
        redo; # Restart iteration
    }
}    

my $i = 0;
while ($i < 10) {
  $i++;
  if ($i == 5) {
    redo; # restart the loop block
  }
  print "The value of i is $i\n";
}    # prints 1 to 10, skipping 5
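
`last` and `next` are described above without an example, so here is a short sketch of both in one loop (the array `@seen` is just an illustrative name):

```perl
use strict;
use warnings;

my @seen;
for my $i (1 .. 10) {
    next if $i % 2;      # odd number: skip to the next iteration
    last if $i > 6;      # leave the loop entirely once $i exceeds 6
    push @seen, $i;
}

print "@seen\n";         # 2 4 6
```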

Subroutines

1. Defining and calling subroutines:

To define a subroutine in Perl, you use the `sub` keyword, followed by the name of the subroutine and its code block. Here's an example:

sub hello {
    print "Hello, world!\n";
}

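# to call this subroutine
hello();
&hello();    # older style with the & sigil
# the ancient "do hello();" call form was removed in Perl 5.20 and should not be used

- Subroutine names in Perl are case-sensitive, so `hello` and `Hello` are different subroutines.
- You can define named subroutines anywhere in your code, even inside other subroutines, but they are always visible package-wide. If a subroutine is defined more than once, the last definition wins (with a "redefined" warning under `use warnings`).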

2. Passing arguments into subroutines:

To pass arguments into a subroutine, you simply include them inside the parentheses when you call the subroutine.

!! the interface is completely mutable: a subroutine declares no fixed parameter list and accepts whatever arguments the caller passes !!

Here's an example:

sub greet {
    my ($name) = @_;
    print "Hello, $name!\n";
}

greet("Alice");

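Inside the subroutine, we use the `my` keyword to declare a new lexical variable called `$name`. Without `my`, a variable used inside a subroutine would be a package (global) variable; this is unlike Python, where assignment inside a function creates a local variable by default.

Arguments passed into the subroutine are stored in the special `@_` array.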
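
Since everything arrives in `@_`, a subroutine can take any number of arguments and supply its own defaults. A minimal sketch follows; `total` and `greet_with` are hypothetical helpers written for this example:

```perl
use strict;
use warnings;

# Hypothetical helper: sums however many arguments it receives.
sub total {
    my $sum = 0;
    $sum += $_ for @_;        # @_ holds every argument passed in
    return $sum;
}

# Hypothetical helper: $greeting falls back to "Hello" when omitted.
sub greet_with {
    my ($name, $greeting) = @_;
    $greeting //= "Hello";    # defined-or assignment (Perl 5.10 and later)
    return "$greeting, $name!";
}

my $none  = total();                   # 0
my $three = total(1, 2, 3);            # 6
my $plain = greet_with("Alice");       # "Hello, Alice!"
my $other = greet_with("Bob", "Hi");   # "Hi, Bob!"

print "$none $three $plain $other\n";
```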

my vs. local

In Perl, `my` and `local` are both used for declaring variables, but they have different scoping rules.

`my` declares a variable that is local to the block in which it is declared. This means that the variable is only accessible within that block, including any nested blocks. Once the block is exited, the variable goes out of scope and its value is lost. ==> so crucially the subroutine called by the block DOES NOT have access to the variable. Here is an example:
sub example {
    my $x = 10;
    if ($x == 10) {
        my $y = 20;
        print "x is $x, y is $y\n";
    }
    # $y is out of scope here
    print "x is $x\n";
}
`local` is used to temporarily override the value of a global variable within a specific scope. This means that the variable retains its value outside of the scope, but within the scope it has a new value, and subsequent subroutine called within the scope will have access to its "local" value. Here is an example:
$foo = 10;

sub example {
    local $foo = 20;
    print "foo is $foo\n";
}

example();  # prints "foo is 20"
print "foo is $foo\n";  # prints "foo is 10"
In this example, `$foo` is a global variable with a value of 10. Inside the `example` subroutine, `local` is used to temporarily set `$foo` to 20. This affects the value of `$foo` within `example` and within any subroutine called from it. Once the subroutine is exited, the value of `$foo` returns to 10.
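
The practical difference shows up when another subroutine is called from inside the scope. The sketch below uses hypothetical subroutines `show`, `with_local`, and `with_my` to contrast the two:

```perl
use strict;
use warnings;

our $level = "global";           # package variable, visible to every sub

sub show { return $level }       # reads whatever $level currently holds

sub with_local {
    local $level = "localized";  # temporary value for this dynamic scope
    return show();               # the callee sees "localized"
}

sub with_my {
    my $level = "lexical";       # a new variable private to this block
    return show();               # the callee still sees the package $level
}

my $from_local = with_local();   # "localized"
my $from_my    = with_my();      # "global"
my $afterwards = show();         # "global" again: local's change was undone

print "$from_local $from_my $afterwards\n";
```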

- You can pass any number of arguments into a subroutine; with `@_` there are no built-in default values, but a default can be supplied inside the subroutine, e.g. with the `//=` operator.

- You can use references to pass complex data structures into subroutines and return them as well.

# Define a variable
my $num = 10;

# Pass the variable by reference to a function
my_function(\$num);

# Define the function
sub my_function {
    my $num_ref = shift;
    $$num_ref = 20; # Update the value of the original variable
}

print "$num\n";    # prints 20; note that print adds no automatic newline in Perl, unlike Python
In Perl, you can pass variables by reference using the backslash operator \. This creates a reference to the original variable, which can be passed to a function or used in other parts of your code.

In this example, we define a variable $num and pass it by reference to a function my_function. Inside the function, we use the shift function to get the first argument (from `@_`) passed to the function, which is the reference to the original variable. We then dereference it by writing $$num_ref, which gives access to the original variable, and update its value to 20.

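- Subroutines can modify global variables, and even their arguments, because the elements of `@_` are aliases to the caller's variables, so be careful. ==> copy the arguments with `my (...) = @_;` unless modification is intended, and pass a reference when it is.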
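
A small sketch of the aliasing behaviour; `double_in_place` and `double_copy` are hypothetical helpers written for this illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: modifies the caller's variable through the @_ alias.
sub double_in_place {
    $_[0] *= 2;               # $_[0] is an alias for the first argument
    return;
}

# Hypothetical helper: works on a private copy, the caller is unaffected.
sub double_copy {
    my ($n) = @_;             # copying out of @_ breaks the alias
    $n *= 2;
    return $n;
}

my $x = 5;
my $y = 5;
double_in_place($x);          # $x is now 10
my $z = double_copy($y);      # $y stays 5, $z is 10

print "$x $y $z\n";
```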

3. Returning results from subroutines:

Use the `return` keyword. Here's an example:

sub square {
    my ($num) = @_;
    return $num * $num;    # 'return' is optional here: without it, the value of the last
                            # evaluated expression ($num * $num) is returned anyway
}

my $result = square(5);
print "The result is $result\n";

- You can return scalars or lists from a subroutine; arrays and hashes are flattened into a list on return, so return a reference when their structure must be kept.
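
A minimal sketch of both styles; `min_max` and `stats` are hypothetical helpers invented for this example:

```perl
use strict;
use warnings;

# Hypothetical helper: returns a list; the caller decides how to receive it.
sub min_max {
    my @sorted = sort { $a <=> $b } @_;
    return ($sorted[0], $sorted[-1]);
}

# Hypothetical helper: returns a hash reference, so the structure
# stays intact instead of flattening into a key/value list.
sub stats {
    my ($min, $max) = min_max(@_);
    return { min => $min, max => $max, count => scalar @_ };
}

my ($min, $max) = min_max(4, 1, 9);   # (1, 9)
my $info        = stats(4, 1, 9);     # reference to a hash

print "$min $max $info->{count}\n";   # 1 9 3
```

Fields of the returned hash reference are read with the arrow operator, as in `$info->{count}`.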

Basic I/O Processing

from Perl - File I/O

Open/Close and Read/Write

Following is the syntax to open file.txt in read-only mode. Here the less-than sign < indicates that the file has to be opened in read-only mode.

open(DATA, "<file.txt"); ==> returns false on failure
close(DATA);

Here DATA is the file handle, which will be used to read the file ==> it could also be a lexical variable, as in open(my $fh, "<", "file.txt"). Here is an example that opens a file and prints its content to the screen.

#!/usr/bin/perl

open(DATA, "<file.txt") or die "Couldn't open file file.txt, $!";

while(<DATA>) {
   print "$_";
}

Sr.No.	Mode (C fopen equivalent)	Definition
1	< (r)	Read Only Access
2	> (w)	Creates, Writes, and Truncates
3	>> (a)	Writes, Appends, and Creates
4	+< (r+)	Reads and Writes
5	+> (w+)	Reads, Writes, Creates, and Truncates
6	+>> (a+)	Reads, Writes, Appends, and Creates

==> Perl's open takes the symbol forms; the letters are the matching C fopen modes. The arrow points in the direction the data flows, as in shell redirection or C++ streams: < pulls data out of the file, > pushes data into it.

`$_` and `$!`

are special variables that are used for specific purposes.

`$_` is the default variable in Perl. It is often referred to as the "topic" variable because it is used as the default argument in many Perl functions. When no other variable is specified, Perl assumes that you are referring to `$_`. For example, the `print` function without any arguments will print the contents of `$_`.

Here's an example:
$_ = "Hello, world!";
print; # Output: Hello, world!
In this example, we set the value of `$_` to "Hello, world!" and then use the `print` function without any arguments, which prints the contents of `$_`.
#!/usr/bin/perl

my $string = "Hello, world!";

for ($string) {       # 'for' aliases $_ to $string inside the block
   if (/world/) {     # a pattern match without =~ tests $_
      print "Found 'world' in $_\n";
   }
}
In this example, the `for` statement makes $_ an alias for $string, so the bare pattern /world/ is matched against $_ and the message can interpolate it. Note that a match written as $string =~ /world/ does not set $_; the variable only takes a value when it is assigned explicitly or set implicitly by constructs such as `for`/`foreach`, `map`, `grep`, and `while (<FH>)`.

==> similarly, in the file handling example above, while (<DATA>) implicitly assigns each line read from the file to $_.

`$!` is a special variable that contains the system error message. It is often used in conjunction with the `die` function to print error messages when a program encounters an error. For example:
open(my $file, "<", "non_existent_file.txt") or die "Failed to open file: $!";
In this example, we attempt to open a file that does not exist. If the `open` function fails, the `die` function will be called with the error message "Failed to open file: $!", where `$!` will be replaced with the system error message.

Note that the error message contained in `$!` is specific to the system being used, so it may vary depending on the operating system and other factors.

a read example with "<>"

open(my $fh, "<", "file.txt") or die "Can't open file: $!";
while (my $line = <$fh>) {
    chomp $line;
    print "$line\n";
}
close($fh);

write example with printf, obviously print also works

my $filename = "output.txt";
open(my $fh, '>', $filename) or die "Could not open file '$filename' $!";

my $name = "John";
my $age = 30;
my $height = 6.2;

printf $fh "Name: %s, Age: %d, Height: %.1f\n", $name, $age, $height;

close $fh;
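
The modes can also be tried end to end. The sketch below writes, appends, and reads back; it uses the core File::Temp module to get a scratch file, which is an addition of this example and not part of the text above:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# A scratch file, so the sketch does not touch any real data.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
close $tmp_fh;

# '>' truncates (or creates) the file before writing.
open(my $out, '>', $path) or die "Cannot write '$path': $!";
print $out "first\n";
close $out;

# '>>' appends to what is already there.
open(my $app, '>>', $path) or die "Cannot append to '$path': $!";
print $app "second\n";
close $app;

# '<' reads; in list context <$in> returns all remaining lines.
open(my $in, '<', $path) or die "Cannot read '$path': $!";
chomp(my @lines = <$in>);
close $in;

print "@lines\n";    # first second
```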

the built-in filehandles STDIN, STDOUT, and STDERR are rather self-explanatory

Check File Status

In Perl, you can use the file test operators (-e, -f, -d, -r, -w, -x, ...) inside conditional statements such as if to check file properties, and the stat function to retrieve detailed information about a file, including its size, modification time, owner, and permissions.

use strict;
use warnings;

my $filename = 'example.txt';

# Check if the file exists
if (-e $filename) {
    print "File $filename exists\n";
} else {
    print "File $filename does not exist\n";
}

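# Get the file status (stat returns an empty list on failure)
my @status = stat($filename);

# Print the file size (element 7 of the stat list), if stat succeeded
if (@status) {
    my $size = $status[7];
    print "File $filename has size $size bytes\n";
}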

# other property tags:
if (-f "/path/to/file") {
    print "File is a regular file\n";
} else {
    print "File is not a regular file\n";
}

if (-d "/path/to/directory") {
    print "File is a directory\n";
} else {
    print "File is not a directory\n";
}

if (-r "/path/to/file") {
    print "File is readable\n";
} else {
    print "File is not readable\n";
}

if (-w "/path/to/file") {
    print "File is writable\n";
} else {
    print "File is not writable\n";
}

if (-x "/path/to/file") {
    print "File is executable\n";
} else {
    print "File is not executable\n";
}
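
Since the paths above are placeholders, here is a sketch that can run anywhere. It creates its own scratch file and directory with the core File::Temp module (an assumption of this example) and also shows `-s`, which returns the file size:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile tempdir);

my $dir = tempdir(CLEANUP => 1);
my ($fh, $file) = tempfile(DIR => $dir);
print $fh "12345";
close $fh;

my $is_file = -f $file ? 1 : 0;                  # 1: a regular file
my $is_dir  = -d $dir  ? 1 : 0;                  # 1: a directory
my $size    = -s $file;                          # size in bytes: 5
my $missing = -e "$dir/no_such_file" ? 1 : 0;    # 0: does not exist

# -s agrees with element 7 of the list returned by stat.
my $stat_size = (stat $file)[7];

print "$is_file $is_dir $size $missing $stat_size\n";
```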

Perl R.E.

Regular expressions are perhaps THE reason people use Perl; the topic is nonetheless left to the end, since RE takes time and practice to master.

see this awesome blog for introduction and cheatsheet for RE (albeit .Net based, not that different from Perl RE.)

Perl provides a rich set of regular expression (RE) operators that allow you to match and manipulate strings based on patterns. Here are some of the most commonly used Perl RE operators:

1. `=~`: This operator is used to match a string with a regular expression. The syntax is as follows:

$string =~ /pattern/;    #m// with m omitted here

This will return true if the pattern matches the string, and false otherwise.

2. `!~`: This operator is used to match a string against a regular expression, but it returns true if the pattern does not match the string. The syntax is as follows:

$string !~ /pattern/;

This will return true if the pattern does not match the string, and false otherwise.

3. `m//`: This is the matching operator, which is used to match a regular expression against a string. The syntax is as follows:

$string =~ m/pattern/modifiers;

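When `m` is written explicitly, the `/` delimiters can be replaced with almost any other non-alphanumeric character, such as `#` or `!`, or with bracket pairs as in `m{pattern}`. The `modifiers` are optional and can include `i` (case-insensitive), `g` (global: find every match rather than only the first; for s/// it means replace every matched pattern), and `s` (single-line: `.` also matches a newline).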

4. `s///`: This is the substitution operator, which is used to replace parts of a string that match a regular expression with a new string. The syntax is as follows:

$string =~ s/pattern/replacement/modifiers;

The `replacement` can include references to captured groups in the pattern using `$1`, `$2`, etc. The `modifiers` are the same as for the matching operator.

Substitution Operator Modifiers

from The Substitution Operator in Perl

Here is the list of all the modifiers used with substitution operator.

Sr.No	Modifier & Description
1	i Makes the match case insensitive.
2	m Specifies that if the string has newline or carriage return characters, the ^ and $ operators will now match against a newline boundary, instead of a string boundary.
3	o Evaluates the expression only once.
4	s Allows use of . to match a newline character.
5	x Allows you to use white space in the expression for clarity.
6	g Replaces all occurrences of the found expression with the replacement text.
7	e Evaluates the replacement as if it were a Perl statement, and uses its return value as the replacement text.

example for 'e' as a modifier
my $string = "Hello, World!";
$string =~ s/(\w+)/uc($1)/eg;
print $string;
In this example, the s/// operator is used to replace every word (a run of word characters) with its uppercase equivalent. The e modifier is used to evaluate the expression uc($1) for each match, which converts the matched string to uppercase.

The output of the above code would be:

HELLO, WORLD!

($digit is the equivalent of \digit in general R.E.; here (\w+) is the first capture group, so $1 is set to "Hello" and then to "World", because the 'g' option enables global matching).

here is another:
my $string = "The quick brown fox jumps over the lazy dog.";
$string =~ s/(\w+)/length($1) > 4 ? "red" : "blue"/eg;
print $string;
the result is:

blue red red blue red blue blue blue blue.

5. `tr///`: This is the transliteration operator, which is used to replace characters in a string based on a set of mappings. The syntax is as follows:

$string =~ tr/characters/mappings/;

The `characters` and `mappings` can be ranges of characters, such as `a-z` or `A-Z`. The `tr` operator does not use regular expressions.

6. `qr//`: This is the regex quote operator, which compiles a pattern into a regular expression object that can be used later in a matching or substitution operation. The syntax is as follows:

my $regex = qr/pattern/modifiers;
$string =~ $regex;

The `modifiers` are the same as for the matching operator.
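
The operators above can be tried together in one short script. This is a sketch with made-up strings, and all variable names are illustrative:

```perl
use strict;
use warnings;

my $text = "Cats chase cats; CATS nap.";

# m// with /g in list context returns every match; /i ignores case.
my @cats = $text =~ /cats/gi;                  # ("Cats", "cats", "CATS")

# With an explicit m, other delimiters avoid escaping '/'.
my $is_usr = "/usr/bin/perl" =~ m{^/usr/} ? 1 : 0;   # 1

# s/// with capture groups: reorder a date; the copy keeps $date intact.
my $date = "2023-12-01";
(my $reordered = $date) =~ s{^(\d+)-(\d+)-(\d+)$}{$3/$2/$1};   # "01/12/2023"

# tr/// maps characters one-to-one and returns how many it touched.
my $greeting = "hello world";
(my $upper = $greeting) =~ tr/a-z/A-Z/;        # "HELLO WORLD"
my $o_count = ($greeting =~ tr/o//);           # only counts the 'o's: 2

# qr// compiles a pattern once; use it alone or inside a larger pattern.
my $word = qr/perl/i;
my @hits = grep { $_ =~ $word } ("Perl rocks", "I like Python", "PERL 5");
my ($version) = "PERL 5" =~ /$word\s+(\d+)/;   # "5"

print scalar(@cats), " $is_usr $reordered $upper $o_count ",
      scalar(@hits), " $version\n";
```

The `(my $copy = $original) =~ ...` idiom copies the string first and then modifies the copy, which leaves the original untouched.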

Regular Expression Special Variables

from Perl - Special Variables

Variable	English name (with `use English`)	Description
$digit	(none)	Contains the text matched by the corresponding set of parentheses in the last pattern matched. For example, $1 matches whatever was contained in the first set of parentheses in the previous regular expression.
$&	$MATCH	The string matched by the last successful pattern match.
$`	$PREMATCH	The string preceding whatever was matched by the last successful pattern match.
$'	$POSTMATCH	The string following whatever was matched by the last successful pattern match.
$+	$LAST_PAREN_MATCH	The last bracket matched by the last search pattern. This is useful if you don't know which of a set of alternative patterns was matched. For example: /Version: (.*)|Revision: (.*)/ && ($rev = $+);
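
A short sketch showing what each of these variables holds after one successful match (the sample sentence and the variable names are made up):

```perl
use strict;
use warnings;

my $text = "The quick brown fox";

my ($before, $matched, $after, $first_group, $last_group);
if ($text =~ /(quick)\s+(brown)/) {
    $before      = $`;    # "The "
    $matched     = $&;    # "quick brown"
    $after       = $';    # " fox"
    $first_group = $1;    # "quick"
    $last_group  = $+;    # "brown" (the last group that matched)
}

print "[$before][$matched][$after][$first_group][$last_group]\n";
```

The values are copied into ordinary variables straight away because the next successful match overwrites all of these special variables.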