
for loops to grep multiple texts from parent file to multiple files in single command


Manpreet Singh


Posted on 16 Aug 2022, this text provides information on Bugs & Fixes related to General Tech.

Answers (2)


manpreet Best Answer 2 years ago

I have 29 FASTA files (with the .fa extension), each named after a gene and containing the sequences for that gene.

(Examples: ribosomal protein L1, ribosomal protein L6P/L9E, ...)

Across all 29 FASTA files there are 722 species in total. Each record has its gene and species name on the first line and its sequence on the second line.

A single species can have sequences for more than one gene.

I want to move the sequences of the 722 species out of the 29 gene-sorted files into 722 separate files, i.e. sort them by species instead of by gene.

In the parent files, the species name is enclosed in square brackets [ ].
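Because every species name sits inside [ ] on a header line, the distinct names can be listed (and counted against the expected 722) before splitting anything. A minimal sketch, assuming one [species] tag per header placed at the end of the line as in the examples; the function name list_species is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each distinct species name once, taken from the
# text inside [ ] on FASTA header lines (lines that start with ">").
# Assumes one [species] tag per header, placed last, as in the examples.
list_species() {
  # -h suppresses the "filename:" prefix grep adds when given several files;
  # sed keeps only the bracketed name; sort -u prints each name once.
  grep -h '^>' "$@" |
    sed -n 's/.*\[\([^]]*\)].*/\1/p' |
    sort -u
}
```

For example, `list_species *.fa | wc -l` counts the species; for the data described here the count would be expected to be 722.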

How can I use a for loop to create the 722 files and name each file after its species?

Example from Ribosomal Protein L1.fa:

>gi|103486926|ref|YP_616487.1| 50S ribosomal protein L1 [Sphingopyxis alaskensis RB2256]
MAKLTKKQKALEGKVDAQKLHGVDEAIKLVRELATAKFDETLEIAMNLGVDPRHADQMVRGVVTLPAGTGKDVKVAVFAR

Example from Ribosomal Protein L6PL9E.fa:

>gi|410479108|ref|YP_006766745.1| ribosomal protein L6P/L9E [Leptospirillum ferriphilum ML-04]
MGFTHTVEFTLPSLIKASIEKQTIITLSSPDKELLGQFAADVRSIRPPEPYKGKGIKYSGEKILRKEGKTGKK

For the first example:

Species name: Sphingopyxis alaskensis RB2256

Gene sequence: MAKLTKKQKALEGKVDAQKLHGVDEAIKLVRELATAKFDETLEIAMNLGVDPRHADQMVRGVVTLPAGTGKDVKVAVFAR

I want to name the file Sphingopyxis alaskensis RB2256.fa and put every sequence for this species into it.

I am using the bash shell. For a single species, I can get this done with grep:

grep -A+1 "Sphingopyxis alaskensis RB2256" *.fa >> Sphingopyxis alaskensis RB2256.fa

But I would need to run it 722 times to get all my sequences sorted by species.

Can grep be used in a for loop to simplify this, or are there alternative methods?


manpreet 2 years ago

The FASTA format doesn't require that each sequence be on a single line. In fact, that isn't even common, since most biological sequences are long. So your grep will fail in any case where there's more than one line of sequence for the ID. Also, because the spaces in the target name are not quoted, the shell uses only Sphingopyxis as the redirection target and passes alaskensis and RB2256.fa to grep as extra input files, so your command creates a file called Sphingopyxis and not a file called Sphingopyxis alaskensis RB2256.fa.
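If you still want the per-species loop the question asks about, here is a minimal sketch that avoids both problems. It assumes bash, one [species] tag per header, and species names without a / character; the function name split_by_species_loop is hypothetical. A plain `for sp in $(...)` would split a name like Sphingopyxis alaskensis RB2256 at its spaces, so the sketch reads names with `while IFS= read -r` instead. It also passes -h to grep, which otherwise prefixes every output line with its file name when searching several files.

```shell
#!/usr/bin/env bash
# Hypothetical function: the per-species loop described in the question.
# Usage: split_by_species_loop OUTDIR FILE...
split_by_species_loop() {
  local outdir=$1
  shift
  mkdir -p "$outdir"
  # Distinct species names, taken from the [ ] tag on each header line.
  grep -h '^>' "$@" | sed -n 's/.*\[\([^]]*\)].*/\1/p' | sort -u |
    while IFS= read -r sp; do
      # Copy every record for this species: its header plus ALL following
      # sequence lines up to the next header. keep is reset at the start of
      # each input file. The quoted target keeps the spaces in one file name.
      awk -v sp="$sp" '
        FNR == 1 { keep = 0 }
        /^>/     { keep = (index($0, "[" sp "]") > 0) }
        keep
      ' "$@" > "$outdir/$sp.fa"
    done
}
```

For example, `split_by_species_loop by_species *.fa` writes files such as by_species/Sphingopyxis alaskensis RB2256.fa. It rereads every input file once per species (722 passes here), which is why a single awk pass is usually preferable.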

In any case, you can do something like this to put each sequence into a file named after its species:

awk -F'[][]' '/^>/{n=$2}; {print >> (n ".fa")}' *.fa

However, I strongly urge you not to use spaces in your file names since that will only make your life harder. A safer approach would be:

awk -F'[][]' '/^>/{n=$2; gsub(/ /,"_",n)}; {print >> (n ".fa")}' *.fa

The gsub replaces all spaces in the species name with _, resulting in these files:

Leptospirillum_ferriphilum_ML-04.fa  Sphingopyxis_alaskensis_RB2256.fa

Note that both approaches above can deal with multi-line sequences.
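The same one-pass idea can be wrapped in a reusable function. This is a sketch under stated assumptions, not a drop-in replacement: the function name split_by_species is hypothetical, and it differs from the one-liner in two ways. First, it writes into a separate output directory, so the new .fa files are never picked up by a later *.fa glob. Second, it closes each output file when the species changes, because some awk implementations may cap how many files a program can keep open, and 722 outputs could exceed that.

```shell
#!/usr/bin/env bash
# Hypothetical function: split FASTA records into one file per species in a
# single pass. Usage: split_by_species OUTDIR FILE...
split_by_species() {
  local outdir=$1
  shift
  mkdir -p "$outdir"
  awk -v dir="$outdir" -F'[][]' '
    /^>/ {
      n = $2                        # text inside the first [ ] on the header
      gsub(/ /, "_", n)             # replace spaces with underscores
      new = dir "/" n ".fa"
      if (out != "" && new != out)
        close(out)                  # done with the previous species file
      out = new
    }
    out != "" { print >> out }      # header and every sequence line after it
  ' "$@"
}
```

For example, `split_by_species by_species *.fa` should produce files such as by_species/Sphingopyxis_alaskensis_RB2256.fa. Because the function appends with >> (so a species seen again in a later gene file is added to its existing file), run it against an empty output directory; a rerun into the same directory would duplicate records.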


Similar Forum

Which operating system you favour and why?

What are the most popular tech portals in India?

What are best technologies available today for education / aiding learning?
