Merge multiple lines in a file
10-23-2009 03:53 PM
My input file looks like (Unix):
marker,allele1,allele2
RS1002244,1,1
RS1002244,1,3
RS1002244,3,3
RS1003719,2,2
RS1003719,2,4
RS1003719,4,4
Most markers are listed 3 times but a few have 3 alleles and are listed more.
An example of a marker with 3 alleles is:
marker,allele1,allele2
RS757210,2,2
RS757210,2,3
RS757210,2,4
RS757210,3,3
RS757210,3,4
RS757210,4,4
I would like to get output like:
marker,allele1,allele2,allele3
RS1002244,1,3,.
RS1003719,2,4,.
RS757210,2,3,4
Everything I've found gives me
RS1002244,1,1,1,3,3,3,
RS1003719,2,2,2,4,4,4,
etc.
Thanks very much in advance, Peggy 10/23
10-23-2009 06:36 PM
Re: Merge multiple lines in a file
Not a very complete description of what
you've tried.
I'd probably write a real program for a job
like this, but I assume that you're trying to
write a shell script.
Incomplete, but possibly useful:
dy # echo ',1,1,1,3,3,3,4,4' | sed -e 's/\(,.\)\1*/\1/g'
,1,3,4
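A note on why this is "incomplete": the backreference only collapses values that repeat back to back, and the `.` matches exactly one character, so multi-digit values would need a different pattern. The sketch below assumes the merged line for RS757210 is built the same way as the merged lines quoted in the question (each pair appended in input order); `collapse_adjacent` is a made-up wrapper name, not something from the thread.

```shell
# collapse_adjacent is a hypothetical wrapper around the sed command above.
collapse_adjacent() {
    sed -e 's/\(,.\)\1*/\1/g'
}

# Adjacent repeats collapse as intended:
echo 'RS1002244,1,1,1,3,3,3,' | collapse_adjacent
# prints: RS1002244,1,3,

# Non-adjacent repeats survive, so a three-allele marker is not fully de-duplicated:
echo 'RS757210,2,2,2,3,2,4,3,3,3,4,4,4,' | collapse_adjacent
# prints: RS757210,2,3,2,4,3,4,
```

So the sed approach is probably fine for two-allele markers but needs a real "seen" table for the rest.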
10-23-2009 07:08 PM
Solution
Are you sure that sample output matches that? Is the 'allele2' column used at all?
Can you re-state the problem with non-identical values in allele1 and allele2?
Like:
RS1003719,2,7
RS1003719,2,8
RS1003719,4,8
Or is it critical that an allele2 value comes back as allele1?
Is the input guaranteed to be sorted?
Anyway... here is some Perl which generates the specified output from the specified input, but admittedly I doubt it matches the actual need.
--- x.pl -----------
while (<>) {                              # go over all input
    $x{"$1 $2"} = 1 if /^(\w+),(\d+),/;   # remember marker and allele1 if found
}
$x{x} = 1;  # end sentinel: any key that sorts after the highest input marker
for (sort keys %x) {                      # go over accumulated markers
    ($marker, $a) = split;
    if ($marker eq $old) {                # just add a column if already seen
        $count++;
        $text .= ',' . $a;
    } else {
        $text .= ",." if $count == 2;     # add empty third column if need be
        print $text . "\n" if $count;     # print, except before the first marker
        $count = 1;                       # first allele for this marker
        $old = $marker;
        $text = $marker . ',' . $a;
    }
}
-------------
Run as : perl x.pl x
fwiw,
Hein.
10-23-2009 07:29 PM
Re: Merge multiple lines in a file
>An example of a marker with 3 alleles is:
>marker,allele1,allele2
I'm not sure I see the "3"? Also, is this a title line with a description of the fields?
>RS1002244,1,3,.
It seems you want to collect all of the numbers that occur after the first field (key) and sort unique them?
>Steven: echo ',1,1,1,3,3,3,4,4' | sed -e 's/\(,.\)\1*/\1/g'
Thanks, didn't know you could use \# on the LHS.
10-23-2009 09:00 PM
Re: Merge multiple lines in a file
I'd never thought of trying it before, and I
wasn't sure until I had tried it, but there
it is. "man 5 regex" doesn't limit it, and
there's even an example using it that way.
> I'm not sure I see the "3"?
I was guessing that
RS757210,2,2
RS757210,2,3
RS757210,2,4
RS757210,3,3
RS757210,3,4
RS757210,4,4
had the three alleles, 2, 3, and 4 (in
various places), attached to the name
("marker") RS757210.
> You appear to be using business logic
> (domain-specific) terminology [...]
Yup. It pays to watch CSI to keep up on the
latest genetics terminology.
I long ago stopped expecting clear problem
statements in this forum. Hoping for, yes;
expecting, no. (I keep asking, but my
success rate is pretty low.)
10-24-2009 04:06 AM
Re: Merge multiple lines in a file
The column headings can be anything; I'm happy with name, column 2, and column 3.
There are indeed 3 separate values for the 2nd example I give - 2,2 - 2,3 - 2,4; values are 2, 3, and 4.
I would like output that lists each number once for each of the names it goes with. It doesn't matter if it's sorted or not.
I didn't include any code because I haven't been able to do much. I found one thing on this web page which is what I used for my last example, where all numbers were included. It was from March of this year, and the subject was "Merging lines into one from one file using awk or gawk".
Sorry, Peggy
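For reference, the requirement as restated here (each value listed once per name, from either column, order irrelevant) could be sketched in awk roughly as follows. This is only a sketch under some assumptions: the file is comma-separated with exactly three fields, header lines have a non-numeric second field, and missing columns are padded with "." up to three values as in the sample output. The function name `merge_alleles` is invented for illustration.

```shell
# merge_alleles: hypothetical helper name, not from the thread.
# Reads marker,value,value rows on stdin and prints one line per marker
# with each distinct value once, in order of first appearance.
merge_alleles() {
    awk -F, '
        $2 !~ /^[0-9]+$/ { next }        # skip header lines such as "marker,allele1,allele2"
        {
            if (!($1 in n)) {            # first time this marker is seen
                order[++markers] = $1
                n[$1] = 0
            }
            for (col = 2; col <= 3; col++) {
                if (!(($1, $col) in seen)) {   # value not yet recorded for this marker
                    seen[$1, $col] = 1
                    list[$1] = list[$1] "," $col
                    n[$1]++
                }
            }
        }
        END {
            for (i = 1; i <= markers; i++) {
                m = order[i]
                line = m list[m]
                for (k = n[m]; k < 3; k++)     # pad to three value columns with "."
                    line = line ",."
                print line
            }
        }
    '
}
```

Usage would be something like `merge_alleles < inputfile`. Because it keeps a per-marker "seen" table, it does not depend on the rows for one marker being adjacent or sorted.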
10-24-2009 04:41 AM
Re: Merge multiple lines in a file
10-24-2009 05:39 PM
Re: Merge multiple lines in a file
I know you already got good suggestions.
Here is another one (just to show you
that we are all different :)
#!/usr/bin/perl
use strict;
use warnings;
my %seen = ();
my @MyArr = ();
my @arr = ();
my %myhash;
my %final;
while (<DATA>) {                        # read the sample rows after __DATA__
    chomp $_;
    my @arr = split(/,/, $_);
    push(@MyArr, join ",", $arr[0], $arr[1]);   # keep marker and allele1 only
}
foreach my $elem (@MyArr)
{
    $seen{$elem}++;
    $myhash{$elem} = "($seen{$elem})";
    my @arr = split(/,/, $elem);
    $myhash{$elem} =~ s/\(|\)//g;
    if ( defined($final{$arr[0]}) ) {
        if ( $myhash{$elem} < 2 ) {     # append a value only the first time it is seen
            $final{$arr[0]} = "$final{$arr[0]},$arr[1]";
        }
    }
    else {
        $final{$arr[0]} = "$arr[0],$arr[1]";
    }
}
foreach my $hkey (sort keys %final) {
    my $ff = $final{$hkey} =~ tr/,/,/;  # count the commas
    my $add = q{};
    if ( $ff < 3 ) {                    # fewer than three values: pad with "."
        $add=",.";
    }
    print "$final{$hkey}$add\n";
}
exit(0);
__DATA__
RS1002244,1,1
RS1002244,1,3
RS1002244,3,3
RS1003719,2,2
RS1003719,2,4
RS1003719,4,4
RS757210,2,2
RS757210,2,3
RS757210,2,4
RS757210,3,3
RS757210,3,4
RS757210,4,4
When you run it, the following is printed:
RS1002244,1,3,.
RS1003719,2,4,.
RS757210,2,3,4
Cheers,
VK2COT
10-24-2009 09:32 PM
Re: Merge multiple lines in a file
If all the rows for a given marker are guaranteed to be adjacent, then the output can be generated as the rows are processed.
For example:
---------------------------------------
while (<>) {                          # go over all input
    next unless /^(\w+),(\d+),/;      # marker and allele1 number on this line?
    if ($1 eq $old) {                 # same marker: add a column only for a new allele
        next if $allele{$2}++;
        print ",$2";
        $count++;
    } else {
        print ",." if $count == 2;    # pad an empty third column for the previous marker
        print "\n" if $count;         # end previous line, except before the first marker
        print "$1,$2";
        $old = $1;
        %allele = ($2 => 1);
        $count = 1;
    }
}
print ",." if $count == 2;
print "\n";
---------------
or using an array to build the output line....
--------------
while (<>) {                              # go over all input
    if ( /^(\w+),(\d+),/ ) {              # marker and allele1 number on this line?
        $marker = $1;
    } else {
        next;
    }
    if ($marker eq $old) {                # same marker as the previous line
        next if $allele{$2}++;            # seen this one already?
        $allele[$count++] = $2;           # put in list if new
    } else {
        print join(q(,), $old, @allele), "\n" if $count;  # print, except before the first marker
        $count = 1;                       # first allele for this marker
        $old = $marker;
        @allele = ($2, q(.), q(.));       # seed output columns
        %allele = ($2 => 1);              # only one allele value seen so far
    }
}
print join(q(,), $old, @allele), "\n";
--------------
TimTowTdi
enough already!
:-)
Hein.
10-25-2009 04:06 AM