Friday 15 January 2010

regex - Extract line pattern from a text file


I have a file with many entries, and I want to change the header of each entry.

The file content looks something like this:

  >gi|215277009|ref|NR_024540.1| Homo sapiens protein family homolog 7 pseudogene (WASH7P), non-coding RNA
  RNARNARNARNARNA
  >gi|389886562|ref|NR_046018.2| Homo sapiens DEAD/H (Asp-Glu-Ala-Asp/His) box helicase 11 like 1 (DDX11L1), non-coding RNA
  MORERNARNARNARNA
  RNARNARNARNARNA
  ...

And I want to turn it into something like this:

  >NR_024540
  RNARNARNARNARNA
  >NR_046018
  MORERNARNARNARNA
  RNARNARNARNARNA
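The mapping above keeps only the accession from each header and drops its version suffix. As a point of comparison, and not the approach used in this thread, the same mapping can be sketched without one long regex by splitting each header on | with awk. This assumes every header has the layout >gi|number|ref|accession.version| description; a header with a different layout would produce an empty or wrong accession. The function name extract_accessions is hypothetical.

```shell
#!/bin/sh
# extract_accessions is a hypothetical helper name used for illustration.
# Assumes every header looks like ">gi|<number>|ref|<accession>.<version>| <description>",
# so that, with "|" as the field separator, the accession is field 4.
extract_accessions() {
    awk -F'|' '
        /^>/ {
            acc = $4                      # e.g. "NR_024540.1"
            sub(/\.[0-9]+$/, "", acc)     # strip the version suffix, leaving "NR_024540"
            print ">" acc
            next
        }
        { print }                         # sequence lines are copied unchanged
    '
}

# Example run on two entries modelled on the question's file.
extract_accessions <<'EOF'
>gi|215277009|ref|NR_024540.1| Homo sapiens protein family homolog 7 pseudogene (WASH7P), non-coding RNA
RNARNARNARNARNA
>gi|389886562|ref|NR_046018.2| Homo sapiens DEAD/H (Asp-Glu-Ala-Asp/His) box helicase 11 like 1 (DDX11L1), non-coding RNA
MORERNARNARNARNA
RNARNARNARNARNA
EOF
```

A single-character field separator such as | is taken literally by awk, so no escaping is needed there, unlike in a regular expression.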

I wrote a regex that works fine in Perl (on a test string), but when I run the following sed command on Ubuntu, nothing happens. What's wrong with this command?

  sed -ri 's/\>.[\w\|]+\|ref\|(\w+)\.\d+\|.*/\>\1/g' rna_copy.fa

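sed does not support Perl's \d, and \w does not act as a word-character class inside a bracket expression such as [\w\|]. Use POSIX character classes instead. Also, in GNU sed \> is an end-of-word anchor rather than a literal >, so the > at the start of the pattern must not be escaped: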

  sed -r 's/>[[:alnum:]|]+\|ref\|([a-zA-Z0-9_]+)\.[[:digit:]]+\|.*/>\1/g'
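A minimal sketch for checking the corrected expression on sample data, assuming a sed that accepts -E (GNU sed treats it the same as -r) and headers of the form >gi|number|ref|accession.version| description. The function name shorten_headers is hypothetical. Compared with the command above, the pattern is anchored with ^ so that only header lines can match, and the g flag is dropped because each line holds at most one header.

```shell
#!/bin/sh
# shorten_headers is a hypothetical helper name used for illustration.
# Reads FASTA-style text on stdin and rewrites every header
# ">gi|<number>|ref|<accession>.<version>|<description>" to ">accession".
# All other lines pass through unchanged.
shorten_headers() {
    # -E selects extended regular expressions (GNU sed also accepts -r).
    # [[:alnum:]|]+   : the "gi|<number>" part (letters, digits and pipes)
    # \|ref\|         : the literal "|ref|" (in ERE an unescaped | means "or")
    # ([[:alnum:]_]+) : the accession, captured as \1
    # \.[[:digit:]]+  : the version suffix, e.g. ".1", discarded
    # \|.*            : the closing pipe and the description, discarded
    sed -E 's/^>[[:alnum:]|]+\|ref\|([[:alnum:]_]+)\.[[:digit:]]+\|.*/>\1/'
}

# Example run on two entries modelled on the question's file.
shorten_headers <<'EOF'
>gi|215277009|ref|NR_024540.1| Homo sapiens protein family homolog 7 pseudogene (WASH7P), non-coding RNA
RNARNARNARNARNA
>gi|389886562|ref|NR_046018.2| Homo sapiens DEAD/H (Asp-Glu-Ala-Asp/His) box helicase 11 like 1 (DDX11L1), non-coding RNA
MORERNARNARNARNA
RNARNARNARNARNA
EOF
```

To edit rna_copy.fa in place, as in the question, GNU sed's -i option can be added to the sed call once the output looks right.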
