Splitting a large txt file into 200 smaller txt files on a regex using a shell script in bash
Hi guys, I hope the subject line is clear enough; I haven't found this in the already-asked bin. I've tried implementing it in Perl and Python, but I think I may be trying too hard.

Is there a simple shell command/pipeline that will split a 4 MB .txt file into separate .txt files, based on a beginning and an ending regex?

I've provided a short sample of the file below. As you can see, every "story" starts with the phrase "x of xxx documents", which can be used to split the file.

I think this should be easy, and I'd be surprised if bash can't do it faster than Perl/Python.

Here it is:
1 of 999 documents
copyright 2011 virginian-pilot companies llc rights reserved
virginian-pilot (norfolk, va.)
...
3 of 999 documents
copyright 2011 canwest news service rights reserved
canwest news service
...
Thanks in advance for your help.

Ross
awk '/[0-9]+ of [0-9]+ documents/{g++} { print $0 > g".txt"}' file
OS X users will need gawk; the built-in awk produces the error: awk: illegal statement at source line 1
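For what it's worth, that error most likely comes from the unparenthesised concatenation g".txt" in the print redirection, which the BSD awk shipped with OS X rejects. Here is a sketch of a variant that should work with either awk (my addition, not from the thread): it builds the filename in a variable first, and closes each finished chunk so a 200-file split can't hit the open-file limit.

# sketch: assumes every story begins with an "N of M documents" line, as in the sample
awk '/[0-9]+ of [0-9]+ documents/ { if (out) close(out); out = ++g ".txt" }
     out { print > out }' file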
Ruby (1.9+):
#!/usr/bin/env ruby
g = 1
f = File.open(g.to_s + ".txt", "w")
open("file").each do |line|
  # start a new output file whenever a story header appears
  if line[/\d+ of \d+ documents/]
    f.close
    g += 1
    f = File.open(g.to_s + ".txt", "w")
  end
  f.print line
end
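Not one of the thread's answers, but since the question asks for a simple shell command: GNU csplit was built for exactly this kind of split. A sketch (the -z flag and the '{*}' repeat count are GNU extensions, so this needs GNU coreutils rather than the stock OS X csplit):

# sketch: split before every "N of M documents" line, as many times as possible
csplit -z -n 3 file '/[0-9]\{1,\} of [0-9]\{1,\} documents/' '{*}'

This writes the stories to xx000, xx001, and so on; -z drops the empty piece before the first header, and the pieces can be renamed to .txt afterwards.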