Question:

Sorting a hash in Perl when the keys are dynamic

容磊

2023-03-14

My hash looks like this:

my %data = (
    'B2' => {
        'one' => {
            timestamp => '00:12:30'
        },
        'two' => {
            timestamp => '00:09:30'
        }
    },
    'C3' => {
        'three' => {
            timestamp => '00:13:45'
        },
        'adam' => {
            timestamp => '00:09:30'
        }
    }
);

(The real structure is more complex than this; I have simplified it here.)

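I want to sort "globally" by timestamp, and then by the keys of the inner hashes (one, two, three, adam). But the inner hash keys are dynamic: I don't know what they will be until the data has been read from a file.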

I would like the sorted output for the hash above to be:

00:09:30,C3,adam
00:09:30,B2,two
00:12:30,B2,one
00:13:45,C3,three

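I have looked at many questions and answers about sorting a hash by key and/or value, but I haven't been able to work it out when the key names aren't known in advance. (Or maybe I just don't understand them.)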

What I'm doing now takes two steps.

Flatten the hash into an array:

my @flattened;
for my $outer_key (keys %data) {
    for my $inner_key (keys %{$data{$outer_key}}) {
        push @flattened, [
            $data{$outer_key}{$inner_key}{timestamp}
            , $outer_key
            , $inner_key
        ];
    }
}

Then sort it:

for my $ary (sort { $a->[0] cmp $b->[0] || $a->[2] cmp $b->[2] } @flattened) {
    print join ',' => @$ary;
    print "\n";
}

Is there a more concise, elegant, or efficient way to do this?

巫健柏

2023-03-14

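This type of question might be better suited to the Software Engineering or Code Review Stack Exchange sites. Since you are asking about implementation, though, I think it is fine to ask here; these sites tend to overlap somewhat.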

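As @DondiMichaelStroma pointed out, and as you already know, your code works fine! However, there is more than one way to do it. If this were a small script, I would probably leave it as it is and move on to the next part of the project. If it were part of a more professional codebase, I would make some changes.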
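As one illustration of another way, the flatten and sort steps can be collapsed into a single map feeding sort. This is only a sketch, assuming the %data structure from the question. It also adds the outer key as a final tie-breaker: keys returns hash keys in no particular order, so rows that tie on both timestamp and inner key would otherwise come out in a different order from run to run.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my %data = (
    B2 => { one   => { timestamp => '00:12:30' },
            two   => { timestamp => '00:09:30' } },
    C3 => { three => { timestamp => '00:13:45' },
            adam  => { timestamp => '00:09:30' } },
);

# Flatten to [timestamp, outer_key, inner_key] rows, then sort by
# timestamp, then inner key, then outer key (for a deterministic order).
my @rows = sort {
           $a->[0] cmp $b->[0]
        || $a->[2] cmp $b->[2]
        || $a->[1] cmp $b->[1]
    }
    map {
        my $outer = $_;
        map { [ $data{$outer}{$_}{timestamp}, $outer, $_ ] } keys %{ $data{$outer} }
    } keys %data;

print join( ',', @$_ ), "\n" for @rows;
```

It prints the four lines the question asks for, starting with 00:09:30,C3,adam. Whether one expression reads better than two loops is largely a matter of taste.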

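When writing code for a professional codebase, I try to keep a few things in mind:

Readability
Efficiency where it matters
No gold plating
Unit tests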

Let's look at your code:

my %data = (
    'B2' => {
        'one' => {
            timestamp => '00:12:30'
        },
        'two' => {
            timestamp => '00:09:30'
        }
    },
    'C3' => {
        'three' => {
            timestamp => '00:13:45'
        },
        'adam' => {
            timestamp => '00:09:30'
        }
    }
);

my @flattened;
for my $outer_key (keys %data) {
    for my $inner_key (keys %{$data{$outer_key}}) {
        push @flattened, [
            $data{$outer_key}{$inner_key}{timestamp}
            , $outer_key
            , $inner_key
        ];
    }
}
for my $ary (sort { $a->[0] cmp $b->[0] || $a->[2] cmp $b->[2] } @flattened) {
    print join ',' => @$ary;
    print "\n";
}

The variable names could be more descriptive, and @flattened repeats C3 and B2 in several rows (as Data::Dumper shows):

$VAR1 = [
          '00:13:45',
          'C3',
          'three'
        ];
$VAR2 = [
          '00:09:30',
          'C3',
          'adam'
        ];
$VAR3 = [
          '00:12:30',
          'B2',
          'one'
        ];
$VAR4 = [
          '00:09:30',
          'B2',
          'two'
        ];

Maybe this isn't a big deal, or maybe you want to keep the ability to get all of the data under a key such as B2.

Here is another way we could store the data:

my %flattened = (
    'B2' => [['one', '00:12:30'],
             ['two', '00:09:30']],
    'C3' => [['three','00:13:45'],
             ['adam', '00:09:30']]
);

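This may make the sorting more complex, but it makes the data structure simpler! Maybe this is edging toward gold plating, or maybe another part of your code would benefit from this structure. My preference is to keep data structures simple and add extra code when processing them. If you decide you need to dump %flattened to a log file, you probably won't want to wade through duplicated data.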
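To make the trade-off concrete, here is a small sketch that pulls everything under B2 out of both shapes. The data is just the sample from the question: @flattened holds [timestamp, outer_key, inner_key] rows as in the question's code, and %flattened is the hash-of-arrays shape above (Perl keeps the two in separate namespaces, so both names can coexist). With the flat rows you have to scan the whole list; with the hash it is a single lookup.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Shape 1: flat rows of [timestamp, outer_key, inner_key]
my @flattened = (
    [ '00:12:30', 'B2', 'one'   ],
    [ '00:09:30', 'B2', 'two'   ],
    [ '00:13:45', 'C3', 'three' ],
    [ '00:09:30', 'C3', 'adam'  ],
);

# Shape 2: outer_key => list of [inner_key, timestamp]
my %flattened = (
    'B2' => [ [ 'one',   '00:12:30' ], [ 'two',  '00:09:30' ] ],
    'C3' => [ [ 'three', '00:13:45' ], [ 'adam', '00:09:30' ] ],
);

# Everything under B2 from the flat rows: scan every row, keep the
# matches, and reshape them to [inner_key, timestamp].
my @b2_from_rows = map { [ $_->[2], $_->[0] ] } grep { $_->[1] eq 'B2' } @flattened;

# Everything under B2 from the hash of arrays: one lookup.
my @b2_from_hash = @{ $flattened{B2} };

printf "%s => %s\n", @$_ for @b2_from_hash;
```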

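Design: I think we want to keep these as two separate operations. That helps code clarity, and we can test each function on its own. The first function will convert between the data formats we want to use, and the second will sort the data. These functions should live in a Perl module so that we can unit test them with Test::More. I don't know where these functions are called from, so let's assume we call them from main.pl and put them in a module called Utilities::Helper (Utilities/Helper.pm). These names should be more descriptive, but I'm also not sure what the application is! Great names lead to readable code.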

This is what main.pl looks like. Even without comments, descriptive names make it self-documenting. These names could be improved, too!

#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;
use Utilities::Helper qw(sort_by_times_then_names convert_to_simple_format);

# populate_data() is a placeholder for whatever reads your file into %data.
my %data = populate_data();

my @sorted_data = @{ sort_by_times_then_names( convert_to_simple_format( \%data ) ) };

print Dumper(@sorted_data);
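populate_data() is not defined in the answer. Purely as a hypothetical illustration, assuming each input line looks like "B2,one,00:12:30" (the real file format is not given in the question), it could build the nested hash as below. Because Perl autovivifies nested hashes, the inner key names never need to be known in advance. Unlike the no-argument call in main.pl, this version takes a filehandle, and it reads from an in-memory string so the sketch runs on its own.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical reader: assumes one "outer,inner,timestamp" record per line.
sub populate_data {
    my ($fh) = @_;
    my %data;
    while ( my $line = <$fh> ) {
        chomp $line;
        next if $line =~ /^\s*$/;    # skip blank lines
        my ( $outer_key, $inner_key, $timestamp ) = split /,/, $line;
        # Autovivification creates $data{$outer_key}{$inner_key} on first
        # use, whatever the key names turn out to be.
        $data{$outer_key}{$inner_key}{timestamp} = $timestamp;
    }
    return %data;
}

# Stand-in for a real file: an in-memory filehandle.
my $sample = "B2,one,00:12:30\nB2,two,00:09:30\nC3,three,00:13:45\nC3,adam,00:09:30\n";
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
my %data = populate_data($fh);
close $fh;

print scalar( keys %data ), " outer keys\n";
```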

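Is main.pl readable and elegant? I think it could use some improvement. Here is the module; more descriptive variable names would help here too. Still, it is easy to test, and it keeps our main code clean and our data structures simple.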

package Utilities::Helper;
use strict;
use warnings;

use Exporter qw(import);
our @EXPORT_OK = qw(sort_by_times_then_names convert_to_simple_format);

# We could put a comment here explaining the expected input and output formats.
sub sort_by_times_then_names {

    my ( $data_ref ) = @_;

    # This follows the map-then-sort shape of the Schwartzian Transform:
    # the hash of arrays is turned into a flat list of
    # [outer_key, name, timestamp] rows, and those rows are sorted.
    # Normally we would just be sorting an array; maybe converting and
    # sorting should be two separate steps to make this clearer!
    # No final "undecorate" map is needed, because we keep the whole row.
    my @sorted = sort {
                        $a->[2] cmp $b->[2] # sort by timestamp
                                 ||
                        $a->[1] cmp $b->[1] # then sort by name
                      }
                 map  { my $outer_key=$_;       # convert $data_ref to an array of arrays
                        map {                    # first element is the outer_key
                             [$outer_key, @{$_}] # second element is the name
                            }                    # third element is the timestamp
                            @{$data_ref->{$_}}
                      }
                      keys %{$data_ref};
    # If you want the elements in a different order in the array,
    # you could modify the above code or change it when you print it.
    return \@sorted;
}


# We could put a comment here explaining the expected input and output formats.
sub convert_to_simple_format {
    my ( $data_ref ) = @_;

    my %reformatted_data;

    # $outer_key and $inner_key could be renamed to more accurately describe what the data they are representing.
    # Are they names? IDs? Places? License plate numbers?
    # Maybe we want to keep it generic so this function can handle different kinds of data.
    # I still like the idea of using nested for loops for this logic, because it is clear and intuitive.
    for my $outer_key ( keys %{$data_ref} ) {
        for my $inner_key ( keys %{$data_ref->{$outer_key}} ) {
            push @{$reformatted_data{$outer_key}},
                 [$inner_key, $data_ref->{$outer_key}{$inner_key}{timestamp}];
        }
    }

    return \%reformatted_data;
}

1;
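A side note on the Schwartzian Transform mentioned in sort_by_times_then_names: that code maps the data into rows and sorts them, but it never computes a separate sort key. The full transform (decorate, sort, undecorate) pays off when the key is costly to compute or needs normalizing. For example, cmp only orders these timestamps correctly because they are zero-padded HH:MM:SS strings. If the input could contain values like "9:30:00", a sketch like the following, assuming H:MM:SS or HH:MM:SS input and using a hypothetical to_seconds helper, converts each timestamp to seconds once per row rather than once per comparison.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: "H:MM:SS" or "HH:MM:SS" to seconds since midnight.
sub to_seconds {
    my ( $h, $m, $s ) = split /:/, shift;
    return $h * 3600 + $m * 60 + $s;
}

# Rows are [outer_key, name, timestamp]. '9:30:00' is not zero-padded,
# so a plain string comparison would put it after '10:00:00'.
my @rows = (
    [ 'B2', 'one',  '10:00:00' ],
    [ 'B2', 'two',  '9:30:00'  ],
    [ 'C3', 'adam', '9:30:00'  ],
);

my @sorted =
    map  { $_->[1] }                         # undecorate: keep the original row
    sort { $a->[0] <=> $b->[0]               # numeric seconds first
        || $a->[1][1] cmp $b->[1][1] }       # then by name
    map  { [ to_seconds( $_->[2] ), $_ ] }   # decorate: [seconds, row]
    @rows;

print join( ',', @$_ ), "\n" for @sorted;
```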

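Finally, let's implement some unit tests. This may be more than you wanted from this question, but I think clean test seams are part of elegant code, and I want to demonstrate that. Test::More is great for this. I'll even throw in a test harness and formatter so that we get some nice output. If you don't have TAP::Formatter::JUnit installed, you can use TAP::Formatter::Console instead. The harness script runs t/helper.t, which comes after it.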

#!/usr/bin/env perl
use strict;
use warnings;
use TAP::Harness;

my $harness = TAP::Harness->new({
    formatter_class => 'TAP::Formatter::JUnit',
    merge           => 1,
    verbosity       => 1,
    normalize       => 1,
    color           => 1,
    timer           => 1,
});

$harness->runtests('t/helper.t');

#!/usr/bin/env perl
use strict;
use warnings;
use Test::More;
use Utilities::Helper qw(sort_by_times_then_names convert_to_simple_format);

my %data = (
    'B2' => {
        'one' => {
            timestamp => '00:12:30'
        },
        'two' => {
            timestamp => '00:09:30'
        }
    },
    'C3' => {
        'three' => {
            timestamp => '00:13:45'
        },
        'adam' => {
            timestamp => '00:09:30'
        }
    }
);

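my %formatted_data = %{ convert_to_simple_format( \%data ) };
# Perl hash order is random, so sort each inner list by name before comparing.
my %sorted_formatted_data = map { $_ => [ sort { $a->[0] cmp $b->[0] } @{ $formatted_data{$_} } ] } keys %formatted_data;
my %expected_formatted_data = (
    'B2' => [['one', '00:12:30'],
             ['two', '00:09:30']],
    'C3' => [['adam', '00:09:30'],
             ['three','00:13:45']]
);
is_deeply(\%sorted_formatted_data, \%expected_formatted_data, "convert_to_simple_format test");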

my @sorted_data = @{ sort_by_times_then_names( \%formatted_data ) };

my @expected_sorted_data = ( ['C3','adam', '00:09:30'],
                             ['B2','two',  '00:09:30'],
                             ['B2','one',  '00:12:30'],
                             ['C3','thee','00:13:45'] # intentional typo to demonstrate the failure output
                            );

is_deeply(\@sorted_data, \@expected_sorted_data, "sort_by_times_then_names test");

done_testing;

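The nice thing about testing this way is that when a test fails, it tells you exactly where things went wrong. Here is the output produced by the intentional typo: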

<testsuites>
  <testsuite failures="1"
             errors="1"
             time="0.0478239059448242"
             tests="2"
             name="helper_t">
    <testcase time="0.0452120304107666"
              name="1 - convert_to_simple_format test"></testcase>
    <testcase time="0.000266075134277344"
              name="2 - sort_by_times_then_names test">
      <failure type="TestFailed"
               message="not ok 2 - sort_by_times_then_names test"><![CDATA[not ok 2 - sort_by_times_then_names test

#   Failed test 'sort_by_times_then_names test'
#   at t/helper.t line 45.
#     Structures begin differing at:
#          $got->[3][1] = 'three'
#     $expected->[3][1] = 'thee']]></failure>
    </testcase>
    <testcase time="0.00154280662536621" name="(teardown)" />
    <system-out><![CDATA[ok 1 - convert_to_simple_format test
not ok 2 - sort_by_times_then_names test

#   Failed test 'sort_by_times_then_names test'
#   at t/helper.t line 45.
#     Structures begin differing at:
#          $got->[3][1] = 'three'
#     $expected->[3][1] = 'thee'
1..2
]]></system-out>
    <system-err><![CDATA[Dubious, test returned 1 (wstat 256, 0x100)
]]></system-err>
    <error message="Dubious, test returned 1 (wstat 256, 0x100)" />
  </testsuite>
</testsuites>

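In summary, I prefer readable and clear over concise. Sometimes you can write code that is less efficient but easier to write and simpler in its logic. Putting ugly code in a function is a good way to hide it! It isn't worth spending your time to save 15 milliseconds of run time. If your data set is large enough that performance becomes an issue, Perl may not be the right tool for the job. If you are really after concise code, post a challenge on Code Golf Stack Exchange.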
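If the sort ever does look like a bottleneck, it is worth measuring before restructuring anything. Here is a minimal sketch using the core Benchmark module's cmpthese to compare the question's two-step version with a single map/sort expression. The generated data (1,000 outer keys with 10 inner keys each, random timestamps) is an arbitrary assumption, and the numbers it prints depend entirely on your machine and Perl version.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Synthetic data: an arbitrary size chosen only for illustration.
my %data;
for my $outer ( 1 .. 1000 ) {
    for my $inner ( 1 .. 10 ) {
        $data{"K$outer"}{"name$inner"}{timestamp} =
            sprintf '%02d:%02d:%02d', int rand 24, int rand 60, int rand 60;
    }
}

# Version 1: the question's approach (build the list, then sort it).
sub two_step {
    my @flattened;
    for my $outer_key ( keys %data ) {
        for my $inner_key ( keys %{ $data{$outer_key} } ) {
            push @flattened, [ $data{$outer_key}{$inner_key}{timestamp}, $outer_key, $inner_key ];
        }
    }
    return [ sort { $a->[0] cmp $b->[0] || $a->[2] cmp $b->[2] } @flattened ];
}

# Version 2: the same comparison written as one map/sort expression.
sub one_expression {
    return [
        sort { $a->[0] cmp $b->[0] || $a->[2] cmp $b->[2] }
        map {
            my $outer_key = $_;
            map { [ $data{$outer_key}{$_}{timestamp}, $outer_key, $_ ] } keys %{ $data{$outer_key} }
        } keys %data
    ];
}

# Run each version 20 times and print a comparison table. A negative
# count such as -2 would instead run each for at least 2 CPU seconds.
cmpthese( 20, { two_step => \&two_step, one_expression => \&one_expression } );
```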
