Splitting a large txt file into 200 smaller txt files on a regex using shell script in BASH


Hi guys, hope the subject is clear enough; I haven't found this already asked. I've tried implementing it in Perl and Python, but I think I may be trying too hard.

Is there a simple shell command or pipeline to split a 4 MB .txt file into separate .txt files, based on a beginning and ending regex?

I provide a short sample of the file below. As you can see, every "story" starts with the phrase "x of xxx documents", which can be used to split the file.

I think this should be easy, and I'd be surprised if Bash couldn't do it faster than Perl or Python.

Here it is:

    1 of 999 documents

    copyright 2011 virginian-pilot companies llc
    rights reserved

    virginian-pilot (norfolk, va.)
    ...

    3 of 999 documents

    copyright 2011 canwest news service
    rights reserved

    canwest news service
    ...

Thanks in advance for your help.

Ross

awk '/[0-9]+ of [0-9]+ documents/{g++} { print $0 > g".txt"}' file 

OS X users will need gawk; the builtin awk produces the error: awk: illegal statement at source line 1
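As a quick sanity check, here is a minimal, hypothetical run of the awk approach (the sample file and its contents are invented for illustration; parentheses are added around the redirection target for portability across awk implementations):

```shell
# Build a tiny sample file (hypothetical content).
printf '%s\n' \
  '1 of 2 documents' 'first story' \
  '2 of 2 documents' 'second story' > file

# Every "N of M documents" line bumps the counter g, so each story's
# lines are appended to the next numbered output file (1.txt, 2.txt, ...).
awk '/[0-9]+ of [0-9]+ documents/{g++} { print $0 > (g ".txt") }' file
```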

Ruby (1.9+)

#!/usr/bin/env ruby
g = 1
f = File.open(g.to_s + ".txt", "w")
open("file").each do |line|
  if line[/\d+ of \d+ documents/]
    f.close
    g += 1
    f = File.open(g.to_s + ".txt", "w")
  end
  f.print line
end
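If gawk isn't available, GNU csplit can do the same split. This is a sketch under the assumption that GNU coreutils is installed (BSD/macOS csplit lacks the '{*}' repeat); the input file and its contents are invented for illustration:

```shell
# Hypothetical input file; each story begins with "N of M documents".
printf '%s\n' \
  '1 of 2 documents' 'first story' \
  '2 of 2 documents' 'second story' > sample.txt

# Split before every line matching the pattern (a BRE, hence [0-9][0-9]*
# instead of [0-9]+).  -z drops the empty chunk before the first match,
# -f/-b name the output files storyN.txt, and '{*}' repeats the split
# until the input is exhausted (a GNU extension).
csplit -z -f story -b '%d.txt' sample.txt \
  '/[0-9][0-9]* of [0-9][0-9]* documents/' '{*}'
```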
