Posted on 16 Aug 2022, this text provides information on Learning Aids/Tools related to General Tech. Please note that while accuracy is prioritized, the data presented might not be entirely correct or up-to-date. This information is offered for general knowledge and informational purposes only, and should not be considered as a substitute for professional advice.
I want to create and manipulate large arrays of (4 byte) integers in-memory. By large, I mean on the order of hundreds of millions of elements. Each cell in the array will act as a counter for a position on a chromosome. All I need is for it to fit in memory, and have fast (O(1)) access to elements. The thing I'm counting is not a sparse feature, so I can't use a sparse array.
I can't do this with a regular Perl array, because Perl (at least on my machine) uses 64 bytes per element, so the genomes of most of the organisms I work with are just too big. I've tried storing the data on-disk via SQLite and hash tying, and though they work, they are very slow, especially on ordinary drives. (It works reasonably well when I run on 4-drive RAID 0 arrays.)
I thought I could use PDL arrays, because PDL stores its arrays just as C does, using only 4 bytes per element. However, I found the update speed to be excruciatingly slow compared to Perl arrays:
    use PDL;
    use Benchmark qw/cmpthese/;

    my $N    = 1_000_000;
    my @perl = (0 .. $N - 1);
    my $pdl  = zeroes $N;

    cmpthese(-1, {
        perl => sub {
            $perl[int(rand($N))]++;
        },
        pdl => sub {
            # note that I'm not even incrementing here, just setting to 1
            $pdl->set(int(rand($N)), 1);
        },
    });
Returns:
           Rate   pdl  perl
    pdl   481208/s    --  -87%
    perl 3640889/s  657%    --
Does anyone know how to increase pdl set() performance, or know of a different module that can accomplish this?
I cannot tell what sort of performance you will get, but I recommend using the vec function (documented in perlfunc) to split a string into bit fields. I have experimented and found that my Perl will tolerate a string up to 500_000_000 characters long, which corresponds to 125,000,000 32-bit values.
    my $data = "\0" x 500_000_000;
    vec($data, 0, 32)++;            # Increment data[0]
    vec($data, 100_000_000, 32)++;  # Increment data[100_000_000]
If this isn't enough, there may be something in the build of Perl that controls the limit. Alternatively, if you think you can get by with smaller fields, say 16-bit counts, vec will accept field widths of any power of 2 up to 32.
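As a rough sketch of the smaller-field idea, the helpers below wrap vec behind a configurable field width. The names make_counters, incr_counter and get_counter are hypothetical and not part of any module. Stopping at the field's maximum instead of letting the value overflow is a design choice of this sketch, not something the answer above prescribes.

```perl
use strict;
use warnings;

# Hypothetical helpers (not from the original answer): a packed counter
# array backed by a plain string and accessed through vec.
sub make_counters {
    my ($n, $bits) = @_;
    # $bits is assumed to be 8, 16 or 32, so each field is $bits / 8 bytes.
    return "\0" x ($n * $bits / 8);
}

sub incr_counter {
    my ($buf_ref, $i, $bits) = @_;
    my $max = 2**$bits - 1;
    my $cur = vec($$buf_ref, $i, $bits);
    # Saturate at the field's maximum rather than overflow it.
    vec($$buf_ref, $i, $bits) = $cur + 1 if $cur < $max;
    return;
}

sub get_counter {
    my ($buf_ref, $i, $bits) = @_;
    return vec($$buf_ref, $i, $bits);
}

# Usage: 1,000 16-bit counters occupy 2,000 bytes.
my $n      = 1_000;
my $bits   = 16;
my $counts = make_counters($n, $bits);
incr_counter(\$counts, 42, $bits) for 1 .. 3;
printf "counter 42 = %d, buffer = %d bytes\n",
    get_counter(\$counts, 42, $bits), length $counts;
```

With 16-bit fields the buffer is half the size of the 32-bit version, at the cost of capping each count at 65,535. The extra sub call will also be slower than calling vec inline, so in a hot loop it may be better to use vec directly.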
Edit: I believe the string size limit is related to the 2GB maximum private working set on 32-bit Windows processes. If you are running Linux or have a 64-bit perl you may be luckier than me.
I have added to your benchmark program like this
    my $vec = "\0" x ($N * 4);

    cmpthese(-3, {
        perl => sub {
            $perl[int(rand($N))]++;
        },
        pdl => sub {
            # note that I'm not even incrementing here, just setting to 1
            $pdl->set(int(rand($N)), 1);
        },
        vec => sub {
            vec($vec, int(rand($N)), 32)++;
        },
    });
giving these results
           Rate   pdl   vec  perl
    pdl   472429/s    --  -76%  -85%
    vec  1993101/s  322%    --  -37%
    perl 3157570/s  568%   58%    --
so vec runs at roughly two-thirds the speed of a native Perl array (about 63% in this run). Presumably that's acceptable.
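Once the counts are accumulated, a window of positions can be read back in one call rather than one vec lookup per element. This is a minimal sketch that relies on vec storing 16-bit and wider fields in big-endian order, the same layout as pack's N format for 32 bits. The helper name counter_slice is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return counters [$start, $start + $len) as a list.
sub counter_slice {
    my ($buf_ref, $start, $len) = @_;
    # Each 32-bit field is 4 bytes; 'N' is pack's big-endian unsigned
    # 32-bit format, which matches how vec lays out 32-bit fields.
    return unpack 'N*', substr($$buf_ref, $start * 4, $len * 4);
}

my $vec = "\0" x (10 * 4);      # ten 32-bit counters
vec($vec, 2, 32)++ for 1 .. 3;  # position 2 counted three times
vec($vec, 4, 32) = 7;           # position 4 set directly
my @window = counter_slice(\$vec, 1, 5);
print "@window\n";              # prints: 0 3 0 7 0
```

The returned list is an ordinary Perl array, so this only makes sense for windows small enough to fit in memory at Perl's per-element cost.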