c++ - What's the meaning of thread concurrency overhead time in the profiler output?


I'd appreciate it if someone with experience of Intel VTune Amplifier could explain this to me.

I recently received a performance analysis report that some other people produced by running Intel VTune Amplifier against our program. It says there is a high overhead time in the thread concurrency area.

What's the meaning of overhead time? They don't know either (I asked them), and I don't have access to Intel VTune Amplifier.

I have some vague ideas. The program has many thread sleep calls: because pthread condition variables are unstable on the target platform (or we used them badly), we changed many routines to work in a loop like the one below:

    while (true) {
        mutex.lock();
        if (event_changed) {      // flag set by another thread
            mutex.unlock();
            // ... handle the event ...
            break;
        } else {
            mutex.unlock();
            usleep(3 * 1000);     // sleep 3 ms, then poll again
        }
    }

Could this be flagged as overhead time?

Any advice?


I found documentation about overhead time on the Intel site: http://software.intel.com/sites/products/documentation/hpc/amplifierxe/en-us/win/ug_docs/olh/common/overhead_time.html#overhead_time

Excerpt:

Overhead time is a duration that starts with the release of a shared resource and ends with the receipt of that resource. Ideally, the duration of overhead time is short, because this reduces the time a thread has to wait to acquire a resource. However, not all CPU time in a parallel application may be spent on doing real payload work. In cases when a parallel runtime (Intel® Threading Building Blocks, OpenMP*) is used inefficiently, a significant portion of time may be spent inside the parallel runtime, wasting CPU time at high concurrency levels. For example, this may result from low granularity of the work split in recursive parallel algorithms: when the workload size becomes too low, the overhead of splitting the work and performing the housekeeping work becomes significant.
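To make the "low granularity" case concrete, here is a sketch of a recursive parallel sum with a grain-size cutoff. It uses `std::async` as a stand-in for a real parallel runtime such as TBB or OpenMP, which would use a thread pool instead. The function name `parallel_sum` and the `grain` parameter are hypothetical. Below `grain` elements the range is summed serially. Above it, the range is split and the left half runs as a separate task. A very small grain creates so many tasks that most of the time goes into launching and joining them rather than adding numbers.

```cpp
#include <cstddef>
#include <functional>
#include <future>
#include <numeric>
#include <vector>

// Recursive divide-and-conquer sum over v[begin, end).
// grain: ranges of at most this many elements are summed serially.
// std::launch::async forces a new thread per split, which exaggerates the
// per-task cost compared to a pooled runtime, but shows the same effect.
long long parallel_sum(const std::vector<int>& v, std::size_t begin,
                       std::size_t end, std::size_t grain) {
    if (end - begin <= grain) {
        // Leaf: real payload work.
        return std::accumulate(v.begin() + begin, v.begin() + end, 0LL);
    }
    // Split: housekeeping work (task creation, later a join via get()).
    std::size_t mid = begin + (end - begin) / 2;
    auto left = std::async(std::launch::async, parallel_sum,
                           std::cref(v), begin, mid, grain);
    long long right = parallel_sum(v, mid, end, grain);
    return left.get() + right;
}
```

With `grain` in the thousands, each task does plenty of real work. With `grain` near 1 on a large vector, this version would try to start roughly one thread per element, so keep such experiments small.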

It's still confusing. Does it mean "you used unnecessary or too frequent locking"?

I'm not much of an expert on this, though I have tried using pthreads a bit myself.

To demonstrate my understanding of overhead time, let's take the example of a simple single-threaded program that computes an array sum:

    for (i = 0; i < num; i++) {
        sum += array[i];
    }

In a simple (reasonably done) multi-threaded version of this code, the array is broken into one piece per thread, each thread keeps its own sum, and after the threads are done, their sums are added together.
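A minimal sketch of that reasonable version, using `std::thread`. The name `chunked_sum` is hypothetical. Each thread sums its own contiguous slice into a local variable and writes it once into its own slot of `partial`. Threads never contend for a shared variable, and the partial sums are combined only after every thread has joined.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// One contiguous piece per thread, one private sum per thread.
// num_threads must be at least 1.
long long chunked_sum(const std::vector<int>& array, unsigned num_threads) {
    std::vector<long long> partial(num_threads, 0);
    std::vector<std::thread> workers;
    const std::size_t chunk = (array.size() + num_threads - 1) / num_threads;
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&, t] {
            std::size_t begin = std::min(array.size(), t * chunk);
            std::size_t end = std::min(array.size(), begin + chunk);
            long long local = 0;  // accumulate locally, no sharing in the hot loop
            for (std::size_t i = begin; i < end; ++i) local += array[i];
            partial[t] = local;   // single write to this thread's own slot
        });
    }
    for (auto& w : workers) w.join();
    // Serial combine step after all threads are done.
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```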

In a poorly written multi-threaded version, the array is broken down as before, but every thread does an atomicAdd into a single global sum.

In that case, the atomic addition can only be done by one thread at a time. I believe overhead time is a measure of how long all of the other threads spend waiting to do their own atomicAdd (you could try writing this program to check, if you want to be sure).

Of course, it also takes into account the time spent switching semaphores and mutexes around. In your case, that would mean a significant amount of time is spent in the internals of mutex.lock and mutex.unlock.

I parallelized a piece of software a while ago (using pthread_barrier) and had issues where it took longer to run with the barriers than it did using one thread. It turned out that the loop, which needed 4 barriers in it, was executed often enough that the overhead made parallelizing it not worth it.
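A hypothetical reconstruction of that kind of loop, not the original code. Each iteration does a little work per thread and then passes through four `pthread_barrier_wait` calls. The names `barrier_loop` and `work_per_phase` are made up for illustration. When `work_per_phase` is tiny, threads spend most of their time waiting at the barriers for the slowest thread. Note that pthread barriers are an optional POSIX feature and are not available on macOS.

```cpp
#include <pthread.h>
#include <thread>
#include <vector>

// Each thread runs `iterations` loop passes; each pass has 4 phases, and each
// phase does `work_per_phase` units of stand-in work followed by a barrier.
// Returns the total work done, so callers can check it matches the expected count.
double barrier_loop(unsigned num_threads, int iterations, int work_per_phase) {
    pthread_barrier_t barrier;
    pthread_barrier_init(&barrier, nullptr, num_threads);
    std::vector<double> results(num_threads, 0.0);
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&, t] {
            double acc = 0.0;
            for (int it = 0; it < iterations; ++it) {
                for (int phase = 0; phase < 4; ++phase) {
                    for (int k = 0; k < work_per_phase; ++k) acc += 1.0;  // "real" work
                    pthread_barrier_wait(&barrier);  // every thread waits for the slowest
                }
            }
            results[t] = acc;
        });
    }
    for (auto& w : workers) w.join();
    pthread_barrier_destroy(&barrier);
    double total = 0.0;
    for (double r : results) total += r;
    return total;
}
```

Timing `barrier_loop(4, 100000, 1)` against `barrier_loop(4, 100, 1000)` compares the same total work with very different numbers of barrier waits. That is the granularity trade-off described in the Intel excerpt.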

