01-16-2010 04:34 PM
Merging files in different formats
I have 3 files, 1 for each of 3 different chromosomes. Each of the 3 files looks like this (with different SNPs):
SNP Al1 Al2
rs7342690 C A
rs11862844 A C
rs4021617 A G
I have 3 other files (for the same 3 chromosomes) that all have the same number of lines: 2,457, one per individual. Each line looks like:
C005->C005-000 ML_DOSE 1.178 1.177 1.333 1.782 0.225 0.437 0.586 1.999 2.000 0.523 ....
Each of the 2,457 lines starts with a four character/digit family id pointing (->) to a study id. Each value in this file after ML_DOSE corresponds (in order) to the SNPs in the first file.
I need to match up the values in the file containing study ids to the file containing the SNPs. There are 577,282 SNPs in the first file (I need to match up 5), 598,112 SNPs in the second file (I need to match up 2), and 263,830 in the third file (I need to match up 4).
I've done this before but only for 1 study id, so it was just a matter of grepping out the ID, replacing blank spaces with \n in sed, and pasting the 2 files together. I can't figure out how to do this with 2,457 people though.
Any help will be very much appreciated, and I will certainly send more details about the files if it helps. Thanks to all, Peggy
01-17-2010 10:21 PM
Re: Merging files in different formats
Simply use hash arrays in Perl or awk.
First split the line into the fields you are interested in and then put the data in hash arrays as you wish.
Of course it may become a hash of more complex structures, but that too is not hard to do. For Perl, see these man pages:
perldata
perldsc
Regards,
Goran
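A minimal sketch of the hash approach, written in Python rather than Perl or awk. It assumes the layout described in the question: an SNP file with a header line and one SNP per line, and a dose file where each line holds an id, the ML_DOSE tag, and then one value per SNP in the same order. The function names, the list of wanted SNPs, and the second individual are made up for illustration, and the dose lines are truncated to three values.

```python
def snp_positions(snp_lines):
    """Map each SNP name to its 0-based position, ignoring the header line."""
    names = [line.split()[0] for line in snp_lines[1:] if line.strip()]
    return {name: index for index, name in enumerate(names)}


def extract_doses(dose_lines, positions, wanted):
    """Yield (id, {snp: value}) for every individual, keeping only wanted SNPs."""
    for line in dose_lines:
        fields = line.split()
        if not fields:
            continue
        # fields[0] is the id, fields[1] is the ML_DOSE tag, the rest are values
        person, values = fields[0], fields[2:]
        yield person, {snp: values[positions[snp]] for snp in wanted}


# Sample data: the SNP lines come from the question; the dose lines are
# shortened, and the second individual is hypothetical.
snp_file = ["SNP Al1 Al2", "rs7342690 C A", "rs11862844 A C", "rs4021617 A G"]
dose_file = [
    "C005->C005-000 ML_DOSE 1.178 1.177 1.333",
    "C006->C006-000 ML_DOSE 0.225 0.437 0.586",
]
wanted = ["rs7342690", "rs4021617"]  # assumed list of SNPs of interest

positions = snp_positions(snp_file)
result = dict(extract_doses(dose_file, positions, wanted))
for person, doses in result.items():
    print(person, " ".join(doses[snp] for snp in wanted))
```

With real files, the lists would be replaced by open file handles; the dose file can be read one line at a time, so only the SNP index has to stay in memory.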
01-18-2010 02:58 AM
Re: Merging files in different formats
If you need to match them based on the first field, then:
# man 1 join
If it's not the first field you want to match by, then use awk on both files to rearrange the columns.
If you want to join them without any matching (I mean 1st line to 1st line and so on...):
# man 1 paste
Unix operates with beer.
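For the purely positional case that paste covers, the same idea can be sketched in Python: the Nth SNP record is paired with the Nth value on one individual's line. This assumes, as the question states, that the values after ML_DOSE follow the order of the SNP file. The function name is illustrative and the dose line is truncated to three values.

```python
def paste_snps_with_doses(snp_lines, dose_line):
    """Pair the Nth SNP record with the Nth dose value of one individual."""
    snps = [line for line in snp_lines[1:] if line.strip()]  # drop the header
    fields = dose_line.split()
    person, values = fields[0], fields[2:]  # fields[1] is the ML_DOSE tag
    if len(snps) != len(values):
        raise ValueError(f"{person}: {len(snps)} SNPs but {len(values)} values")
    return [f"{snp} {value}" for snp, value in zip(snps, values)]


snp_file = ["SNP Al1 Al2", "rs7342690 C A", "rs11862844 A C", "rs4021617 A G"]
dose_line = "C005->C005-000 ML_DOSE 1.178 1.177 1.333"  # shortened sample line

merged = paste_snps_with_doses(snp_file, dose_line)
print("\n".join(merged))
```

The length check is there because a positional merge silently misaligns if the number of values does not equal the number of SNP records.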
01-18-2010 03:09 AM
Re: Merging files in different formats
Again, refer to my comment about domain-specific terminology in your previous thread:
http://forums.itrc.hp.com/service/forums/questionanswer.do?threadId=1381141
>C005->C005-000 ML_DOSE 1.178 1.177 1.333 1.782 0.225 0.437 0.586 1.999 2.000 0.523 ....
How many fields are on each line? Or, if there are more than X, do you have a continuation line?
>Each value in this file after ML_DOSE corresponds (in order) to the SNPs in the first file.
Is there one field for each record in your first file?
>There are 577,282 SNPs in the first file (I need to match up 5)
Match 5 to what?
>598,112 SNPs in the second file (I need to match up 2), and 263,830 in the third file (I need to match up 4).
I'm not sure which 2 and 4 you want to match.
You seem to mention 3 SNP files and 3 other files for study ids. Which files match which?
01-18-2010 04:36 AM
Re: Merging files in different formats
SNP:
SNP Al1 Al2
rs7342690 C A
rs11862844 A C
rs4021617 A G
study_ID:
C005->C005-000 ML_DOSE 1.178 1.177 1.333 1.782 0.225 0.437 0.586 1.999 2.000 0.523 ....
>Each value in this file after ML_DOSE corresponds (in order) to the SNPs in the first file.
How? I don't see any correspondence here.
And after matching these, what output would you like to have?
Unix operates with beer.
01-18-2010 04:23 PM
Re: Merging files in different formats
01-18-2010 07:59 PM
Re: Merging files in different formats
The computer science terminology is pretty simple: files, records, fields, and characters.
If you wish to name these items with your terminology, do so, but it is probably easier for us dummies to deal with "fileb", "field2", etc. :-)
It would also help if you can show some small sample input files and where fields need to go for your output.