MATLAB: Saving several variables to "-v7.3" (HDF5) .mat-files seems to be faster when using the "-append" flag. How come?


Note: This question deals with an issue observed in 2011 with an old MATLAB version (R2009b). As per the update below from July 2016, the issue/bug in MATLAB seems to no longer exist (tested with R2016a; scroll down to the end of the question to see the update).

I am using MATLAB R2009b and need to write a larger script that converts the contents of a larger set of .zip files to v7.3 mat files (with the underlying HDF5 data model). Reading is OK. The issue is with saving. And there is actually no problem: my files save nicely using the save command.

My question is more in the sense of: why am I observing the following surprising (for me) behavior in MATLAB?

Let's look at my issue in general. In this current test scenario I will be generating one output: a -v7.3 mat-file. This .mat-file will contain 40 blocks as individual variables. Each variable will be named "block_NNN" from 1 to 40 and will contain a struct with the fields frames and blockno. The field frames contains a 480x240x65 sequence of uint8 image data (here just random data generated using randi). The field blockno contains the block number.

Remark: In the real script (which I have yet to finish) I will be doing the above a total of 370 times, converting a total of 108 GB of raw data. That is why I am concerned with the following.

Anyway, first I define some general variables:

    % sizes for dummy data and loops:
    num_blockcount = 40;
    num_blocklength = 65;
    num_frameheight = 480;
    num_framewidth = 240;

I then generate some dummy data that has a shape and size identical to the actual raw data:

    % generate empty struct:
    stu_data2disk = struct();

    % loop over blocks:
    for num_k = 1:num_blockcount

       % generate block-name:
       temp_str_blockname = sprintf('block_%03u', num_k);

       % generate temp struct for current block:
       temp_stu_value = struct();
       temp_stu_value.frames = randi( ...
          [0 255], ...
          [num_frameheight num_framewidth num_blocklength], ...
          'uint8' ...
       );
       temp_stu_value.blockno = num_k;

       % using dynamic field names:
       stu_data2disk.(temp_str_blockname) = temp_stu_value;

    end
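As a sanity check on the file sizes reported below, the raw payload of this test data can be computed directly: 40 blocks of 480x240x65 uint8 values at one byte each. A minimal sketch; the variable names num_bytes and num_mb are illustrative and not part of the original script.

```matlab
% sizes as defined above (repeated so this snippet runs on its own)
num_blockcount  = 40;
num_blocklength = 65;
num_frameheight = 480;
num_framewidth  = 240;

% uint8 stores one byte per element, so the payload is the total element count
num_bytes = num_blockcount * num_frameheight * num_framewidth * num_blocklength;
num_mb = num_bytes / 2^20;

fprintf('payload: %d bytes (%.1f MB)\n', num_bytes, num_mb);
% prints: payload: 299520000 bytes (285.6 MB)
```

That is about 286 MB, in line with the files written below; the -v6 file in the later dir listing (299,528,768 bytes) is only 8,768 bytes larger than this raw payload.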

I now have all my random test data in the struct stu_data2disk. Now I would like to save the data using one of two possible methods.

Let's try the simple one first:

    % save data (simple):
    disp('save data the simple way:')
    tic;
    save converted.mat -struct stu_data2disk -v7.3;
    toc;

The file is written without problems (286 MB). The output is:

    save data the simple way:
    Elapsed time is 14.004449 seconds.

OK - then I remembered that I would like to follow the save procedure over the 40 blocks. Thus, instead of the above, I loop over the blocks and append them in sequence:

    % save to file, using append:
    disp('save data using -append:')
    tic;
    for num_k = 1:num_blockcount

       % generate block-name:
       temp_str_blockname = sprintf('block_%03u', num_k);

       temp_str_appendtoggle = '';
       if (num_k > 1)
          temp_str_appendtoggle = '-append';
       end

       % generate save command:
       temp_str_savecommand = [ ...
          'save ', ...
          'converted_append.mat ', ...
          '-struct stu_data2disk ', temp_str_blockname, ' ', ...
          temp_str_appendtoggle, ' ', ...
          '-v7.3', ...
          ';' ...
       ];

       % evaluate save command:
       eval(temp_str_savecommand);

    end
    toc;

And again the file saves nicely (286 MB). The output is:

    save data using -append:
    Elapsed time is 0.956968 seconds.

Interestingly, the append method is much faster? My question is: why?
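As an aside, the eval call is not needed for this: save also has a function form that takes the file name, options and variable names as separate char arguments. A sketch of the same append loop in that form, assuming a small stand-in for stu_data2disk (3 blocks of 48x24x6 frames, chosen only to keep it quick) and a target file in tempdir instead of the current folder. The save calls themselves should behave like the eval-built command strings.

```matlab
% small stand-in for stu_data2disk so the sketch runs on its own
% (the real data uses 40 blocks of 480x240x65 frames)
num_blockcount = 3;
stu_data2disk = struct();
for num_k = 1:num_blockcount
    stu_data2disk.(sprintf('block_%03u', num_k)) = struct( ...
        'frames', randi([0 255], [48 24 6], 'uint8'), ...
        'blockno', num_k);
end

str_file = fullfile(tempdir, 'converted_append.mat');

tic;
for num_k = 1:num_blockcount
    temp_str_blockname = sprintf('block_%03u', num_k);
    if num_k == 1
        % the first block creates (or overwrites) the file
        save(str_file, '-struct', 'stu_data2disk', temp_str_blockname, '-v7.3');
    else
        % later blocks are added to the existing file as new variables
        save(str_file, '-struct', 'stu_data2disk', temp_str_blockname, '-append', '-v7.3');
    end
end
toc;
```

Besides readability, the function form avoids trouble with file names containing spaces, which command syntax would split into separate arguments.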

Output of dir converted*.mat:

    09-02-2011  20:38       300,236,392 converted.mat
    09-02-2011  20:37       300,264,316 converted_append.mat
                   2 File(s)    600,500,708 bytes

The files are not identical in size. And a test with fc in Windows 7 revealed ... well, many binary differences. Perhaps the data was shifted a bit, so this tells us nothing.

Does anyone have an idea what is going on here? Is the appended file perhaps using a more optimized data structure? Or maybe Windows has cached the file and makes access to it much faster?
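One way to check the "more optimized data structure" idea directly, instead of comparing bytes with fc, is to ask the HDF5 layer how each file stores the same variable. A sketch under a few assumptions: it needs a newer release than the R2009b used here (h5info appeared in R2011a, strjoin in R2013a); it writes small stand-in files rather than the real converted.mat and converted_append.mat; and the dataset path /block_001/frames assumes a -v7.3 file stores each struct variable as an HDF5 group with one dataset per field.

```matlab
% write small stand-in files both ways
s = struct();
for k = 1:3
    s.(sprintf('block_%03u', k)) = struct( ...
        'frames', randi([0 255], [48 24 6], 'uint8'), 'blockno', k);
end
f_simple = fullfile(tempdir, 'layout_simple.mat');
f_append = fullfile(tempdir, 'layout_append.mat');
save(f_simple, '-struct', 's', '-v7.3');
names = fieldnames(s);
save(f_append, '-struct', 's', names{1}, '-v7.3');
for k = 2:numel(names)
    save(f_append, '-struct', 's', names{k}, '-append', '-v7.3');
end

% compare how the same field is stored in each file
files = {f_simple, f_append};
for i = 1:numel(files)
    % assumption: struct variable = HDF5 group, field = dataset inside it
    info = h5info(files{i}, '/block_001/frames');
    filter_names = {'none'};
    if ~isempty(info.Filters)
        filter_names = {info.Filters.Name};
    end
    fprintf('%s\n  size [%s], chunks [%s], filters {%s}\n', files{i}, ...
        num2str(info.Dataspace.Size), num2str(info.ChunkSize), ...
        strjoin(filter_names, ', '));
end
```

If chunking and filters match, the size difference more likely comes from how the HDF5 file's metadata and free space are laid out than from how the frames themselves are stored.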

I made the effort of test-reading the two files as well. Without presenting the numbers here: the appended version was a little bit faster (which could mean something in the long run, though).

[EDIT]: I just tried using no format flag (which defaults to -v7 on my system) and there is not much difference anymore:

    save data the simple way (-v7):
    Elapsed time is 13.092084 seconds.
    save data using -append (-v7):
    Elapsed time is 14.345314 seconds.

[EDIT]: I corrected the above mistake. Previously I mentioned that the stats were for -v6, but I was mistaken. I had just removed the format flag and assumed the default was -v6, but actually it is -v7.

I have created new test stats for all formats on my system using Andrew's fine framework (all formats use the same random test data, now read from file):

    15:15:51.422: Testing speed, format=-v6, R2009b on PCWIN, arch=x86, os=Microsoft Windows 7 Professional  6.1.7600 N/A Build 7600
    15:16:00.829: Save the simple way:          0.358 sec
    15:16:01.188: Save using multiple append:   7.432 sec
    15:16:08.614: Save using one big append:    1.161 sec

    15:16:24.659: Testing speed, format=-v7, R2009b on PCWIN, arch=x86, os=Microsoft Windows 7 Professional  6.1.7600 N/A Build 7600
    15:16:33.442: Save the simple way:         12.884 sec
    15:16:46.329: Save using multiple append:  14.442 sec
    15:17:00.775: Save using one big append:   13.390 sec

    15:17:31.579: Testing speed, format=-v7.3, R2009b on PCWIN, arch=x86, os=Microsoft Windows 7 Professional  6.1.7600 N/A Build 7600
    15:17:40.690: Save the simple way:         13.751 sec
    15:17:54.434: Save using multiple append:   3.970 sec
    15:17:58.412: Save using one big append:    6.138 sec

And the sizes of the files:

    10-02-2011  15:16       299,528,768 converted_format-v6.mat
    10-02-2011  15:16       299,528,768 converted_append_format-v6.mat
    10-02-2011  15:16       299,528,832 converted_append_batch_format-v6.mat
    10-02-2011  15:16       299,894,027 converted_format-v7.mat
    10-02-2011  15:17       299,894,027 converted_append_format-v7.mat
    10-02-2011  15:17       299,894,075 converted_append_batch_format-v7.mat
    10-02-2011  15:17       300,236,392 converted_format-v7.3.mat
    10-02-2011  15:17       300,264,316 converted_append_format-v7.3.mat
    10-02-2011  15:18       300,101,800 converted_append_batch_format-v7.3.mat
                   9 File(s)  2,698,871,005 bytes

Thus -v6 seems to be the fastest for writing. There are also no large differences in file sizes. As far as I know, HDF5 does have some basic (deflate) compression built in.
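The similar sizes are expected here: random uint8 data is essentially incompressible, so the compression that -v7 and -v7.3 apply by default has little to remove. A quick sketch to see the compression at work is to save low-entropy data (all zeros, similar in spirit to imax = 0 in Andrew's repro function below) next to random data of the same size and compare file sizes. The file names are illustrative.

```matlab
n = [480 240 65];                          % one block, as in the question
a_random = randi([0 255], n, 'uint8');     % high entropy: about 1 byte per element on disk
a_zeros  = zeros(n, 'uint8');              % low entropy: compresses very well

f_random = fullfile(tempdir, 'entropy_random.mat');
f_zeros  = fullfile(tempdir, 'entropy_zeros.mat');
save(f_random, 'a_random', '-v7.3');
save(f_zeros,  'a_zeros',  '-v7.3');

b_random = getfield(dir(f_random), 'bytes');
b_zeros  = getfield(dir(f_zeros),  'bytes');
fprintf('random: %d bytes, zeros: %d bytes, raw: %d bytes\n', ...
    b_random, b_zeros, prod(n));
```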

Hmm, probably some optimization in the underlying HDF5 write functions?

Currently I still think that some underlying fundamental HDF5 write function is optimized for adding datasets to an HDF5 file (which is what happens when adding new variables to a -v7.3 file). I believe I have read somewhere that HDF5 should be optimized in this very way... though I cannot be sure.
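One way to probe this hypothesis outside of save is to add the same 40 blocks as separate datasets to a plain HDF5 file with MATLAB's high-level HDF5 functions and time that. A sketch, assuming R2011a or later (h5create/h5write do not exist in R2009b) and a hypothetical file name blocks_test.h5. It writes uncompressed raw uint8 datasets without any MAT-file metadata, so it is a rough comparison at best, not a reproduction of what save does internally.

```matlab
nblocks = 40;
blocksize = [480 240 65];
h5file = fullfile(tempdir, 'blocks_test.h5');    % hypothetical file name
if exist(h5file, 'file'), delete(h5file); end

% generate the data first so only the HDF5 writes are timed
frames = cell(1, nblocks);
for k = 1:nblocks
    frames{k} = randi([0 255], blocksize, 'uint8');
end

t0 = tic;
for k = 1:nblocks
    dset = sprintf('/block_%03u', k);
    % each h5create call adds one more dataset to the growing file
    h5create(h5file, dset, blocksize, 'Datatype', 'uint8');
    h5write(h5file, dset, frames{k});
end
fprintf('wrote %d datasets in %.3f sec\n', nblocks, toc(t0));
```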

Other details to note:

The behavior is systemic, as seen in Andrew's answer below. It also seems to be quite important whether you run these things in the local scope of a function or in the "global" scope of an m-script. My first results were from an m-script, where the files were written to the current directory. I can still reproduce the 1-second write for -v7.3 in the m-script. The function calls apparently add some overhead.

Update July 2016:

I found this again and thought I might test it with the newest MATLAB available to me at the moment. With MATLAB R2016a on Windows 7 x64, the problem seems to have been fixed:

    14:04:06.277: Testing speed, imax=255, R2016a on PCWIN64, arch=AMD64, 16 GB, os=Microsoft Windows 7 Enterprise  Version 6.1 (Build 7601: Service Pack 1)
    14:04:10.600: basic -v7.3:                    7.599 sec      5.261 GB used
    14:04:18.229: basic -v7.3:                    7.894 sec      5.383 GB used
    14:04:26.154: basic -v7.3:                    7.909 sec      5.457 GB used
    14:04:34.096: basic -v7.3:                    7.919 sec      5.498 GB used
    14:04:42.048: basic -v7.3:                    7.886 sec      5.516 GB used     286 MB file   7.841 sec mean
    14:04:50.581: multiappend -v7.3:              7.928 sec      5.819 GB used
    14:04:58.544: multiappend -v7.3:              7.905 sec      5.834 GB used
    14:05:06.485: multiappend -v7.3:              8.013 sec      5.844 GB used
    14:05:14.542: multiappend -v7.3:              8.591 sec      5.860 GB used
    14:05:23.168: multiappend -v7.3:              8.059 sec      5.868 GB used     286 MB file   8.099 sec mean
    14:05:31.913: bigappend -v7.3:                7.727 sec      5.837 GB used
    14:05:39.676: bigappend -v7.3:                7.740 sec      5.879 GB used
    14:05:47.453: bigappend -v7.3:                7.645 sec      5.884 GB used
    14:05:55.133: bigappend -v7.3:                7.656 sec      5.877 GB used
    14:06:02.824: bigappend -v7.3:                7.963 sec      5.871 GB used     286 MB file   7.746 sec mean

This was tested with Andrew Janke's repromatfileappendspeedup function from the accepted answer below (5 passes with format 7.3). Now -append is equally slow as, or slower than, a single save, as it should be. Perhaps the problem was with the build of the HDF5 driver used in R2009b.

Answer (Andrew Janke):

Holy cow. I can reproduce. I tried the single-append variation too; it's even speedier. Looks like "-append" just magically makes HDF5-based save() 30x faster. I don't have an explanation, but I wanted to share what I found.

I wrapped your test code in a function, refactoring it to make the save logic agnostic of the test data structure so you can run it on other data sets, and added some more diagnostic output.

I don't see the big speedup everywhere. It's huge on my 64-bit XP box and a 32-bit Server 2003 box, big on my 64-bit Windows 7 box, and nonexistent on a 32-bit XP box. (Though multiple appends are a huge loss on Server 2003.) R2010b is slower in many cases. Maybe HDF5 appends, or save's use of them, just rock on newer Windows builds. (XP x64 is actually the Server 2003 kernel.) Or maybe it's just a machine config difference: there's a fast RAID on the XP x64 machine, and the 32-bit XP box has less RAM than the rest. What OS and architecture are you running? Can you try this repro too?

    19:36:40.289: Testing speed, format=-v7.3, R2009b on PCWIN64, arch=AMD64, os=Microsoft(R) Windows(R) XP Professional x64 Edition 5.2.3790 Service Pack 2 Build 3790
    19:36:55.930: Save the simple way:         11.493 sec
    19:37:07.415: Save using multiple append:   1.594 sec
    19:37:09.009: Save using one big append:    0.424 sec

    19:39:21.681: Testing speed, format=-v7.3, R2009b on PCWIN, arch=x86, os=Microsoft Windows XP Professional 5.1.2600 Service Pack 3 Build 2600
    19:39:37.493: Save the simple way:         10.881 sec
    19:39:48.368: Save using multiple append:  10.187 sec
    19:39:58.556: Save using one big append:   11.956 sec

    19:44:33.410: Testing speed, format=-v7.3, R2009b on PCWIN64, arch=AMD64, os=Microsoft Windows 7 Professional  6.1.7600 N/A Build 7600
    19:44:50.789: Save the simple way:         14.354 sec
    19:45:05.156: Save using multiple append:   6.321 sec
    19:45:11.474: Save using one big append:    2.143 sec

    20:03:37.907: Testing speed, format=-v7.3, R2009b on PCWIN, arch=x86, os=Microsoft(R) Windows(R) Server 2003, Enterprise Edition 5.2.3790 Service Pack 2 Build 3790
    20:03:58.532: Save the simple way:         19.730 sec
    20:04:18.252: Save using multiple append:  77.897 sec
    20:05:36.160: Save using one big append:    0.630 sec

This looks huge. If it holds up on other data sets, I might use this trick in a lot of places myself. I may bring it up with MathWorks, too. Could they use the fast append technique in normal saves, or on other OS versions, too?

Here's a self-contained repro function.

    function out = repromatfileappendspeedup(npasses, tests, imax, formats)
    %REPROMATFILEAPPENDSPEEDUP Show how -append makes v7.3 saves faster
    %
    % Examples:
    % repromatfileappendspeedup()
    % repromatfileappendspeedup(2, [], 0, {'7.3','7','6'}); % low-entropy test

    if nargin < 1 || isempty(npasses);  npasses = 1;  end
    if nargin < 2 || isempty(tests);    tests = {'basic','multiappend','bigappend'}; end
    if nargin < 3 || isempty(imax);     imax = 255; end
    if nargin < 4 || isempty(formats);  formats = '7.3'; end % -v7 and -v6 do not show the speedup
    tests = cellstr(tests);
    formats = cellstr(formats);

    fprintf('%s: Testing speed, imax=%d, R%s on %s\n',...
        timestamp, imax, version('-release'), systemdescription());

    tmpdir = setuptempdir();
    testdata = generatetestdata(imax);

    testmap = struct('basic','savesimple', 'multiappend','savemultiappend', 'bigappend','savebigappend');

    for iformat = 1:numel(formats)
        format = formats{iformat};
        formatflag = ['-v' format];
        %fprintf('%s: Format %s\n', timestamp, formatflag);
        for itest = 1:numel(tests)
            testname = tests{itest};
            savefcn = testmap.(testname);
            te = nan(1, npasses);
            for ipass = 1:npasses
                fprintf('%s: %-30s', timestamp, [testname ' ' formatflag ':']);
                t0 = tic;
                matfile = fullfile(tmpdir, sprintf('converted-%s-%s-%d.mat', testname, format, ipass));
                feval(savefcn, matfile, testdata, formatflag);
                te(ipass) = toc(t0);
                if ipass == npasses
                    fprintf('%7.3f sec      %5.3f GB used   %5.0f MB file   %5.3f sec mean\n',...
                        te(ipass), physicalmemoryused/(2^30), getfield(dir(matfile),'bytes')/(2^20), mean(te));
                else
                    fprintf('%7.3f sec      %5.3f GB used\n', te(ipass), physicalmemoryused/(2^30));
                end
            end
            % Verify the data to make sure it's sane
            gotback = load(matfile);
            gotback = rmfield(gotback, intersect({'dummy'}, fieldnames(gotback)));
            if ~isequal(gotback, testdata)
                fprintf('ERROR: Loaded data differs from original for %s %s\n', formatflag, testname);
            end
        end
    end

    % Clean up
    rmdir(tmpdir, 's');

    %%
    function savesimple(file, data, formatflag)
    save(file, '-struct', 'data', formatflag);

    %%
    function out = physicalmemoryused()
    if ~ispc
        out = NaN;
        return; % memory() only works on Windows
    end
    [u,s] = memory(); %#ok<ASGLU>
    out = s.PhysicalMemory.Total - s.PhysicalMemory.Available;

    %%
    function savebigappend(file, data, formatflag)
    dummy = 0; %#ok<NASGU>
    save(file, 'dummy', formatflag);
    fns = fieldnames(data);
    save(file, '-struct', 'data', fns{:}, '-append', formatflag);

    %%
    function savemultiappend(file, data, formatflag)
    fns = fieldnames(data);
    for i = 1:numel(fns)
        if (i > 1); appendflag = '-append'; else appendflag = ''; end
        save(file, '-struct', 'data', fns{i}, appendflag, formatflag);
    end

    %%
    function testdata = generatetestdata(imax)
    nblocks = 40;
    blocksize = [65 480 240];
    for i = 1:nblocks
        testdata.(sprintf('block_%03u', i)) = struct('blockno',i,...
            'frames', randi([0 imax], blocksize, 'uint8'));
    end

    %%
    function out = timestamp()
    %TIMESTAMP Showing timestamps to make sure it's not a tic/toc problem
    out = datestr(now, 'HH:MM:SS.FFF');

    %%
    function out = systemdescription()
    if ispc
        platform = [system_dependent('getos'),' ',system_dependent('getwinsys')];
    elseif ismac
        [fail, input] = unix('sw_vers');
        if ~fail
            platform = strrep(input, 'ProductName:', '');
            platform = strrep(platform, sprintf('\t'), '');
            platform = strrep(platform, sprintf('\n'), ' ');
            platform = strrep(platform, 'ProductVersion:', ' Version: ');
            platform = strrep(platform, 'BuildVersion:', 'Build: ');
        else
            platform = system_dependent('getos');
        end
    else
        platform = system_dependent('getos');
    end
    arch = getenv('PROCESSOR_ARCHITEW6432');
    if isempty(arch)
        arch = getenv('PROCESSOR_ARCHITECTURE');
    end
    try
        [~,sysmem] = memory();
    catch
        sysmem.PhysicalMemory.Total = NaN;
    end
    out = sprintf('%s, arch=%s, %.0f GB, os=%s',...
        computer, arch, sysmem.PhysicalMemory.Total/(2^30), platform);

    %%
    function out = setuptempdir()
    out = fullfile(tempdir, sprintf('%s - %s', mfilename, datestr(now, 'yyyymmdd-HHMMSS-FFF')));
    mkdir(out);

EDIT: Modified the repro function, adding multiple iterations and parameterizing it for save styles, file formats, and imax for the randi generator.

I think filesystem caching is a big factor in the fast -append behavior. When I do a bunch of runs in a row with repromatfileappendspeedup(20) and watch System Information in Process Explorer, most of them are under a second, and physical memory usage quickly ramps up by a couple of GB. Then every dozen passes, the write stalls and takes 20 or 30 seconds, and physical RAM usage slowly ramps back down to about where it started. I think this means that Windows is caching a lot of writes in RAM, and something about -append makes it more willing to do so. But the amortized time, including those stalls, is still a lot faster than the basic save, for me.
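Because write caching can make some passes look nearly free and then charge the cost to a later stall, per-pass timings can mislead. A sketch for summarizing a series of timed passes by total (amortized) time as well as median and worst case. The pass count, the single 480x240x65 block and the file name are arbitrary choices for illustration; the save line stands in for whichever variant is being compared. Writes still sitting in the OS cache when the loop ends are not counted, so more passes give a fairer picture.

```matlab
npasses = 20;                                   % arbitrary; more passes expose more stalls
data = struct('frames', randi([0 255], [480 240 65], 'uint8'));
matfile = fullfile(tempdir, 'amortized_test.mat');

te = zeros(1, npasses);
for ipass = 1:npasses
    t0 = tic;
    save(matfile, '-struct', 'data', '-v7.3');  % the save variant being measured
    te(ipass) = toc(t0);
end

% the total is what a long batch job actually pays, stalls included;
% a small median with a large max points at cached writes being flushed later
fprintf('total %.2f sec, mean %.3f, median %.3f, worst %.3f sec\n', ...
    sum(te), mean(te), median(te), max(te));
```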

By the way, after doing multiple passes for a couple of hours, I'm having a hard time reproducing the original timings.

