.net - Help understanding C# optimization
I was playing with C# and wanted to speed up a program. I made some changes and was able to do so. However, I need help understanding why the change made it faster.
I've attempted to reduce the code to something easier to understand in a question. Score1 and Report1 are the slower way. Score2 and Report2 are the faster way. The first method stores a string and an int in a struct, in parallel. Next, in a serial loop, it loops through the array of structs and writes their data to a buffer. The second method writes the data to a string buffer, in parallel. Next, in a serial loop, it writes the string data to a buffer. Here are some sample run times:
run 1 total average time = 0.492087 sec
run 2 total average time = 0.273619 sec
When I was working with an earlier non-parallel version of this, the times were almost the same. Why is there a difference with the parallel version?
Even if I reduce the loop in Report1 to write a single line of output to the buffer, it is still slower (total time .42 sec).
Here is the simplified code:
    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    using System.Diagnostics;
    using System.Threading.Tasks;
    using System.IO;

    namespace OptimizationQuestion
    {
        class Program
        {
            struct ValidWord
            {
                public string word;
                public int score;
            }

            ValidWord[] valid;
            StringBuilder output;
            int total;

            // Slower way, step 1 (run in parallel): store each result in a struct.
            public void Score1(string[] words)
            {
                valid = new ValidWord[words.Length];
                for (int i = 0; i < words.Length; i++)
                {
                    StringBuilder builder = new StringBuilder();
                    foreach (char c in words[i])
                    {
                        if (c != 'u')
                            builder.Append(c);
                    }
                    if (words[i].Length == 3)
                    {
                        valid[i] = new ValidWord { word = builder.ToString(), score = words[i].Length };
                    }
                }
            }

            // Slower way, step 2 (run serially): format every struct into the shared buffer.
            public void Report1(StringBuilder outputBuffer)
            {
                int total = 0;
                foreach (ValidWord wordInfo in valid)
                {
                    if (wordInfo.score > 0)
                    {
                        outputBuffer.AppendLine(String.Format("{0} {1}", wordInfo.word.ToString(), wordInfo.score));
                        total += wordInfo.score;
                    }
                }
                outputBuffer.AppendLine(String.Format("total = {0}", total));
            }

            // Faster way, step 1 (run in parallel): format straight into a per-instance buffer.
            public void Score2(string[] words)
            {
                output = new StringBuilder();
                total = 0;
                for (int i = 0; i < words.Length; i++)
                {
                    StringBuilder builder = new StringBuilder();
                    foreach (char c in words[i])
                    {
                        if (c != 'u')
                            builder.Append(c);
                    }
                    if (words[i].Length == 3)
                    {
                        output.AppendLine(String.Format("{0} {1}", builder.ToString(), words[i].Length));
                        total += words[i].Length;
                    }
                }
            }

            // Faster way, step 2 (run serially): copy the already formatted text.
            public void Report2(StringBuilder outputBuffer)
            {
                outputBuffer.Append(output.ToString());
                outputBuffer.AppendLine(String.Format("total = {0}", total));
            }

            static void Main(string[] args)
            {
                Program[] program = new Program[100];
                for (int i = 0; i < program.Length; i++)
                    program[i] = new Program();

                string[] words = File.ReadAllLines("words.txt");
                Stopwatch stopwatch = new Stopwatch();
                const int TIMING_REPETITIONS = 20;

                double averageTime1 = 0.0;
                StringBuilder output = new StringBuilder();
                for (int i = 0; i < TIMING_REPETITIONS; ++i)
                {
                    stopwatch.Reset();
                    stopwatch.Start();
                    output.Clear();
                    Parallel.ForEach<Program>(program, p => { p.Score1(words); });
                    for (int k = 0; k < program.Length; k++)
                        program[k].Report1(output);
                    stopwatch.Stop();
                    averageTime1 += stopwatch.Elapsed.TotalSeconds;
                    GC.Collect();
                }
                averageTime1 /= (double)TIMING_REPETITIONS;
                Console.WriteLine(string.Format("run 1 total average time = {0:0.000000} sec", averageTime1));

                double averageTime2 = 0.0;
                for (int i = 0; i < TIMING_REPETITIONS; ++i)
                {
                    stopwatch.Reset();
                    stopwatch.Start();
                    output.Clear();
                    Parallel.ForEach<Program>(program, p => { p.Score2(words); });
                    for (int k = 0; k < program.Length; k++)
                        program[k].Report2(output);
                    stopwatch.Stop();
                    averageTime2 += stopwatch.Elapsed.TotalSeconds;
                    GC.Collect();
                }
                averageTime2 /= (double)TIMING_REPETITIONS;
                Console.WriteLine(string.Format("run 2 total average time = {0:0.000000} sec", averageTime2));
                Console.ReadLine();
            }
        }
    }
First of all, you are parallelizing the repeated runs. This will improve your benchmark time, but it may not help your real production run time very much. To accurately measure how long it takes to run through one word list, you need to have exactly one word list going at a time. Otherwise, the individual threads processing the lists are competing with each other to some extent for system resources, and the time per list suffers, even if the time to do all the lists in total is faster.
To speed up the time to process one word list, you want to process the individual words in the list in parallel, for exactly one list at a time. To get enough definition/size for a good measurement, either make the list very long or process the list many times in serial.
In your case, this gets a bit tricky because the StringBuilder needed for your final product is not documented as being thread-safe. It's not that bad, though. Here's an example of calling Parallel.ForEach for a single word list:
    var locker = new Object(); // I'd make this static, but it should end up in a closure and still work
    var outputBuffer = new StringBuilder(); // you can improve things further if you can estimate the final size and force it to allocate the memory it will need up front
    int score = 0;
    Parallel.ForEach(words, w =>
    {
        // We want to push as much of the work to the individual threads as possible.
        // If run in one thread, a StringBuilder per word would be bad.
        // Run in parallel, it allows us to do a little more of the work outside of the locked code.
        var buf = new StringBuilder(w.Length + 5);
        // Enumerable.Concat needs a sequence, not a single char, so the space is appended separately.
        string word = buf.Append(w.Where(c => c != 'u').ToArray()).Append(' ').Append(w.Length).ToString();

        lock (locker)
        {
            outputBuffer.Append(word);
            score += w.Length;
        }
    });
    outputBuffer.Append("total = ").Append(score);
Just call that 20 times in a normal, sequentially processed for loop. Again, it might finish the benchmarks a little slower, but I think it will perform a little faster in the real world because of the flaw in your benchmark. Also note that I typed this right into the reply window; I've never even tried to compile it, so it's not likely to be perfect right out of the gate.
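To make the suggested benchmark shape concrete, here is a minimal sketch of timing one list at a time, with the parallelism inside the list rather than across lists. The names `BenchmarkSketch`, `ProcessList`, and `AverageSeconds` are hypothetical and not from the question. Unlike the snippet above, the sketch also assumes the question's length-3 filter and one output line per word, so treat it as an illustration rather than a tuned implementation.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

// Hypothetical harness; all names here are illustrative assumptions.
public static class BenchmarkSketch
{
    // Processes ONE word list, spreading the words (not the lists) across threads.
    public static (string Output, int Total) ProcessList(string[] words)
    {
        var locker = new object();
        var outputBuffer = new StringBuilder();
        int total = 0;

        Parallel.ForEach(words, w =>
        {
            if (w.Length != 3) return; // assumed: same filter as Score2 in the question

            // Build the line outside the lock so the locked region stays short.
            string line = new string(w.Where(c => c != 'u').ToArray()) + " " + w.Length;

            lock (locker) // StringBuilder is not thread-safe
            {
                outputBuffer.AppendLine(line);
                total += w.Length;
            }
        });

        outputBuffer.Append("total = ").Append(total);
        return (outputBuffer.ToString(), total);
    }

    // Runs the repetitions one after another, so only one list is in flight at a time.
    public static double AverageSeconds(string[] words, int repetitions)
    {
        var stopwatch = new Stopwatch();
        double totalSeconds = 0.0;
        for (int i = 0; i < repetitions; i++)
        {
            stopwatch.Restart();
            ProcessList(words);
            stopwatch.Stop();
            totalSeconds += stopwatch.Elapsed.TotalSeconds;
        }
        return totalSeconds / repetitions;
    }
}
```

Calling `BenchmarkSketch.AverageSeconds(File.ReadAllLines("words.txt"), 20)` would then give a per-list average. Because the words are appended under a lock from several threads, the order of the output lines is not deterministic.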
After fixing your benchmark to more accurately reflect how parallel code will impact your real-world processing time, the next step is to do some profiling to see where your program is actually spending its time. This is how you know which areas to look at for improvement.
Out of curiosity, I'd also like to know how this version performs:
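    var agg = words.Where(w => w.Length == 3)
        // Carry the original length along, since the stripped word may be shorter.
        .Select(w => new { Word = new string(w.Where(c => c != 'u').ToArray()), Score = w.Length })
        .Aggregate(
            new { Score = 0, OutputBuffer = new StringBuilder() },
            (a, w) =>
            {
                a.OutputBuffer.AppendFormat("{0} {1}\n", w.Word, w.Score);
                // Anonymous types are immutable, so return a new accumulator with the updated score.
                return new { Score = a.Score + w.Score, a.OutputBuffer };
            });
    agg.OutputBuffer.Append("total = ").Append(agg.Score);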