merge files and match columns
08-30-2005 03:18 PM
#:::::::::::::::::::::::::::::::::::::::::::::::
#:::: File1 (en -> cn)
#:::::::::::::::::::::::::::::::::::::::::::::::
#:::::::::::::::::::::::::::::::::::::::::::::::
#:::: FILE 2 (en -> jp)
#:::::::::::::::::::::::::::::::::::::::::::::::
So I want to end up with an English sentence in column 1 and its translations (Chinese and Japanese) in columns 2 and 3, all on one line. Ideally every line would carry all three languages, but as you can see, the files differ to begin with and I can't always expect to have data in every language.
#:::::::::::::::::::::::::::::::::::::::::::::::
#:::: NewFile (en -> cn -> jp)
#:::::::::::::::::::::::::::::::::::::::::::::::
Rather than manually aligning the content in an editor, my idea is to run a script that compares the files line by line:
1. Reads the first column of File1.
2. Looks for a match in File2, grabs the matching line, and sends it to NewFile.
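The two steps above can be sketched with awk instead of a per-line grep. This is only a minimal sketch, assuming tab-separated files with the English text in column 1 and the translation in column 2; the file names and sample rows are made up for illustration, and lines that exist only in File2 are not emitted.

```shell
d=$(mktemp -d)
# Hypothetical sample data: tab-separated, English in column 1.
printf 'hello\tCN-hello\nworld\tCN-world\n' > "$d/file1.tsv"
printf 'hello\tJP-hello\nmoon\tJP-moon\n' > "$d/file2.tsv"

# First pass (NR == FNR) loads file2 into a lookup table keyed on column 1.
# Second pass prints each file1 line plus the matching translation, if any.
awk -F'\t' -v OFS='\t' '
    NR == FNR { jp[$1] = $2; next }
    { print $1, $2, ($1 in jp ? jp[$1] : "") }
' "$d/file2.tsv" "$d/file1.tsv" > "$d/newfile.tsv"

cat "$d/newfile.tsv"
```

Because File2 is read into memory once, each File1 line costs a table lookup rather than a fresh scan of File2.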
08-30-2005 03:58 PM
while read line; do
match=$(echo $line | sed -e "s/
grep $match
done <
hth.
08-30-2005 04:39 PM
I had a little trouble with your data, as it seems to be Unicode, no?
Here is one approach in Perl with normal text.
How did you wond the
Provide input files as arguments:
perl x.p file1 file2 ...
---- x.p ----
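while ($file = shift @ARGV) {
    open (FILE, "<$file") or die "Failed to open $file";
    while (<FILE>) {
        chop;                                # drop the trailing newline
        if (/\s+(<\w+>)/) {                  # whitespace, then a <language> tag
            $english{$`}++;                  # text left of the match is the English key
            $foreign{$`.$1} = "$1$'";        # store tag plus translation per key+language
        }
    }
}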
foreach (sort keys %english) {
printf ("%s\t%s\t%s\n", $_, $foreign{$_."
});
}
Here is a weakish attempt at handling Unicode with the normal libraries. It also finds the languages dynamically.
while ($file = shift @ARGV) {
    open (FILE, "<$file") or die "Failed to open $file";
    while (<FILE>) {
        chop;
        chop;
        if (/\s.(<.\w.\w.>.)/) {             # '.' skips the second byte of each 16-bit char
            $english{$`}++;
            $languages{$1}++;                # register every language tag seen
            $foreign{$`.$1} = "$1$'";
        }
    }
}
foreach (sort keys %languages) {
    $l[$i++] = $_;                           # inventory of languages, in sorted order
}
foreach (sort keys %english) {
    printf ("%-50s\t%30s\t%30s\n", $_, $foreign{$_.$l[0]}, $foreign{$_.$l[1]});
}
fwiw,
Hein.
08-30-2005 06:20 PM
Hein, could you also explain what the script does? It did not work for me either.
08-30-2005 07:01 PM
Have a look at the "join" command. It is very useful for matching lines in two files when they have an uneven number of lines or a "one-to-many" relation.
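As a rough illustration of how join copes with lines that exist in only one file, here is a small sketch. The tab-separated sample data and file names are invented, both inputs are assumed to be sorted on the first column already, and the `0` field in `-o` assumes a POSIX-conforming join such as the GNU one.

```shell
d=$(mktemp -d)
# Hypothetical sample data, already sorted on the first (English) column.
printf 'apple\tCN-apple\nbook\tCN-book\n' > "$d/file1.tsv"
printf 'book\tJP-book\ncat\tJP-cat\n' > "$d/file2.tsv"

tab=$(printf '\t')
# -a 1 -a 2 keeps unpaired lines from both files, -e fills missing fields,
# -o 0,1.2,2.2 prints the join key followed by each file's translation.
join -t "$tab" -a 1 -a 2 -e '-' -o 0,1.2,2.2 \
    "$d/file1.tsv" "$d/file2.tsv" > "$d/newfile.tsv"

cat "$d/newfile.tsv"
```

Each English key appears once, with `-` standing in for whichever translation is missing.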
08-30-2005 09:33 PM
#!/bin/ksh
file1="file1";
file2="file2";
newfile="newfile";
> $newfile
while read line;
do
col=$(echo $line | sed 's/
loop=0
grep "$col" $file2 | while read nline;
do
echo $line $nline >> ${newfile}
loop=1
done
[[ $loop -eq 0 ]] && echo $line >> ${newfile}
done < file1
# end #
## Check ###
# cat > file2
# cat file1
# cat scr.sh
file1="file1";
file2="file2";
newfile="newfile";
> $newfile
while read line;
do
col=$(echo $line | sed 's/
loop=0
grep "$col" $file2 | while read nline;
do
echo $line $nline
loop=1
done
[[ $loop -eq 0 ]] && echo $line
done < file1
# sh scr.sh
hth.
08-30-2005 09:59 PM
You can try this script, passing your input files as $1 and $2:
#!/usr/bin/sh
while read line1
do
F1=$(echo "$line1" | awk -F"
F11=$(echo "$line1" | awk -F"
while read line2
do
F2=$(echo "$line2" | awk -F"
F22=$(echo "$line2" | awk -F"
if [ "$F1" = "$F2" ]
then
echo "${F1}
else
continue
fi
done < $2
done < $1
However, it takes years to execute....
regards,
John K.
08-31-2005 12:35 AM
Solution
You can use the join command to solve your problem.
The files have to use "|" as the field separator (you can quickly change tabs to this with tr or vi), and the second file has to contain all the keys. Both files have to be sorted.
i.e.:
cat file1
cat file2
join -a2 -t"|" -j1 1 -j2 1 -o 2.1,1.2,2.2 file1 file2
This joins the files on the first field, keeps the records in file2 that have no matching key in file1, and writes file2.field1, file1.field2, file2.field2 to the output.
Type man join for further info.
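A possible end-to-end run, from tab-separated input to the joined output, might look like the sketch below. The sample rows and the `.tab` file names are invented for illustration, and `-1 1 -2 1` is used as the current spelling of the older `-j1 1 -j2 1` form.

```shell
d=$(mktemp -d)
# Hypothetical tab-separated inputs; file2 holds every key, as required above.
printf 'book\tCN-book\n' > "$d/file1.tab"
printf 'cat\tJP-cat\nbook\tJP-book\n' > "$d/file2.tab"

# Convert tabs to "|" and sort on the first field so join can pair the lines.
tr '\t' '|' < "$d/file1.tab" | sort -t '|' -k 1,1 > "$d/file1"
tr '\t' '|' < "$d/file2.tab" | sort -t '|' -k 1,1 > "$d/file2"

# Keep unpaired file2 lines; print the key from file2, then both translations.
join -a 2 -t '|' -1 1 -2 1 -o 2.1,1.2,2.2 "$d/file1" "$d/file2" > "$d/newfile"

cat "$d/newfile"
```

A key that is missing from file1 simply leaves the middle field empty.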
Hope this solves your problem.
Art
08-31-2005 01:00 AM
Sure... I often do, but it was late. I also forgot to suggest checking out 'join', as I had intended. That may do the job, but with less control and fewer options.
I suspected problems with Unicode chars. I could not cleanly paste the data from your topic into a Unix window. Windows accepted it, but suggested storing it as Unicode. The partially pasted sample does work with my first script on a Unix box. Try that? For better help, and for any future questions, be sure to ATTACH the real data (in a txt document?), as the forum munges tabs and spaces.
Anyways.... my first script:
1) Loop over the input arguments (allows for more than 2 language files some day).
2) Open the current input file.
3) Loop through the current input file.
4) chop to drop the newline for the future 'append'.
5) If you see whitespace followed by "<", a word, and ">", then it looks like a useful line.
6) Take everything 'left' of the matched string as a key for a list of English words.
7) Create a list of translated words using the English key with the language appended to it, storing the value for that combination.
8) When all the loops are done, loop through all the English words (from whatever input file) and print a line with the English and the looked-up translated values (if any)!
You may want to replace the \s+ with \t to match just a tab.
The second example dealt with the files stored as Unicode in a crummy way, by just treating each 16-bit Unicode char as a single char plus '.' = any-char.
It also 'counts' each foreign language usage, with the main intent being just to register that language.
After the input loops it then makes an inventory of the languages seen and uses those to select the foreign language entries.
This is currently hardcoded as just 2, but is easily expanded to a loop over more languages if that is ever needed.
hth,
Hein.
08-31-2005 11:11 AM
My file was encoded in UTF-8 and then pasted from Linux into IE on a Windows box (in order to post here on ITRC). If you're a Chinese or Japanese reader you'll notice the characters don't appear quite right, but you get the general idea, I think.
My file is several million lines long and after some testing on this snippet, I think I'm going to try the "join" solution -- very fast.
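One thing that may matter at that scale: join expects both inputs to be sorted in the same collating order that join itself uses, and a locale-aware sort can order keys differently from a byte-wise one. A minimal sketch, assuming "|"-separated files and invented sample rows, is to pin the locale for every step:

```shell
d=$(mktemp -d)
# Hypothetical unsorted input, "|" separated, with mixed-case keys.
printf 'zebra|CN-zebra\nApple|CN-Apple\napple|CN-apple\n' > "$d/file1"
printf 'zebra|JP-zebra\napple|JP-apple\n' > "$d/file2"

# Use the same byte-wise collation for sort and join so they agree on order.
LC_ALL=C sort -t '|' -k 1,1 "$d/file1" > "$d/file1.sorted"
LC_ALL=C sort -t '|' -k 1,1 "$d/file2" > "$d/file2.sorted"
LC_ALL=C join -t '|' -a 1 -o 0,1.2,2.2 \
    "$d/file1.sorted" "$d/file2.sorted" > "$d/newfile"

cat "$d/newfile"
```

In the C locale "Apple" sorts before "apple", so the two keys stay distinct and only the lowercase one picks up a Japanese entry.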