C-like arrays in perl


use PDL;
use Benchmark qw/cmpthese/;

my $N = 1_000_000;
my @perl = (0 .. $N - 1);
my $pdl = zeroes $N;

cmpthese(-1,{ 
    perl => sub{
        $perl[int(rand($N))]++;
    },
    pdl => sub{
        # note that I'm not even incrementing here just setting to 1
        $pdl->set(int(rand($N)), 1);
    }
});

I cannot tell what sort of performance you will get, but I recommend using the vec function (see perldoc -f vec) to split a string into bit fields. I have experimented and found that my Perl will tolerate a string up to 500_000_000 characters long, which corresponds to 125,000,000 32-bit values.

my $data = "\0" x 500_000_000;
vec($data, 0, 32)++;            # Increment data[0]
vec($data, 100_000_000, 32)++;  # Increment data[100_000_000]

If this isn't enough, there may be something in the build of Perl that controls the limit. Alternatively, if you think you can get by with smaller fields, say 16-bit counts, vec will accept any power-of-2 field width from 1 to 32 bits (or 64 where the platform supports it).
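As a minimal sketch of the smaller-field idea, the snippet below keeps 16-bit counters in a string. The helper name make_counters and the toy size of 1,000 counters are assumptions made for illustration. Bear in mind that a 16-bit field can hold at most 65,535.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original answer): allocate a zero-filled
# string large enough for $count counters of $bits bits each.
sub make_counters {
    my ($count, $bits) = @_;
    return "\0" x ($count * $bits / 8);
}

my $n        = 1_000;                    # toy size, chosen for illustration
my $counts16 = make_counters($n, 16);    # 2 bytes per counter

vec($counts16, 0,   16)++;               # counter 0   -> 1
vec($counts16, 999, 16) += 5;            # counter 999 -> 5
vec($counts16, 999, 16)++;               # counter 999 -> 6

printf "bytes used: %d\n", length $counts16;
printf "counts[0] = %d, counts[999] = %d\n",
    vec($counts16, 0, 16), vec($counts16, 999, 16);
```

Halving the field width halves the memory, so the same string length limit covers twice as many counters.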

Edit: I believe the string size limit is related to the 2GB maximum private working set on 32-bit Windows processes. If you are running Linux or have a 64-bit perl you may be luckier than me.
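One way to see which kind of perl you are running is to look at the build configuration through the core Config module. This is only a sketch: it reports pointer and integer sizes, and does not by itself tell you the largest string your process can allocate.

```perl
use strict;
use warnings;
use Config;    # core module exposing the build-time configuration

# ptrsize is 4 on a 32-bit build and 8 on a 64-bit build;
# ivsize is the size in bytes of perl's native integer type.
printf "perl %s built for %s\n", $], $Config{archname};
printf "pointer size: %d bytes, integer size: %d bytes\n",
    $Config{ptrsize}, $Config{ivsize};
```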

I have added a vec case to your benchmark program, like this:

my $vec = "\0" x ($N * 4);

cmpthese(-3,{ 
    perl => sub{
        $perl[int(rand($N))]++;
    },
    pdl => sub{
        # note that I'm not even incrementing here just setting to 1
        $pdl->set(int(rand($N)), 1);
    },
    vec => sub {
        vec($vec, int(rand($N)), 32)++; 
    },
});

giving these results:

          Rate  pdl  vec perl
pdl   472429/s   -- -76% -85%
vec  1993101/s 322%   -- -37%
perl 3157570/s 568%  58%   --

So vec runs at roughly 63% of the speed of a native array, a little under two-thirds. Presumably that's acceptable.
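For per-position counters of the kind the question describes, a rough sketch might look like the following. The chromosome length and the read intervals are made-up toy data. Because vec stores 32-bit fields in big-endian order, unpack with the 'N*' template can read a run of counters back as an ordinary list; on a real genome you would only do this for small slices.

```perl
use strict;
use warnings;

my $chrom_len = 100;                        # toy chromosome length (assumed)
my $cov       = "\0" x ($chrom_len * 4);    # one 32-bit counter per position

# Hypothetical reads as [start, end] pairs, 0-based and inclusive.
my @reads = ([10, 14], [12, 20], [14, 14]);

for my $read (@reads) {
    my ($start, $end) = @$read;
    vec($cov, $_, 32)++ for $start .. $end; # bump every covered position
}

# Read positions 10..15 back in one go: 6 counters of 4 bytes each.
my @slice = unpack 'N*', substr($cov, 10 * 4, 6 * 4);
print "@slice\n";                           # prints "1 1 2 2 3 1"
```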

manpreet Best Answer 3 years ago

I want to create and manipulate large arrays of (4-byte) integers in memory. By large, I mean on the order of hundreds of millions of elements. Each cell in the array will act as a counter for a position on a chromosome. All I need is for it to fit in memory and to have fast (O(1)) access to elements. The thing I'm counting is not a sparse feature, so I can't use a sparse array.

I can't do this with a regular Perl array, because perl (at least on my machine) uses 64 bytes per element, so the genomes of most of the organisms I work with are just too big. I've tried storing the data on disk via SQLite and hash tying, and though they work, they are very slow, especially on ordinary drives. (It works reasonably well when I run on 4-drive RAID 0 arrays.)

I thought I could use PDL arrays, because PDL stores its arrays just as C does, using only 4 bytes per element. However, I found the update speed to be excruciatingly slow compared to Perl arrays:

Returns:

          Rate  pdl perl
pdl   481208/s   -- -87%
perl 3640889/s 657%   --

Does anyone know how to increase pdl set() performance, or know of a different module that can accomplish this?


manpreet 3 years ago
