Friday, August 13, 2010

Computer Performance

From Stanford EE282 + Computer Architecture: A Quantitative Approach, 4th Edition

  • Latency or execution time or response time
    • Wall-clock time to complete a task
  • Bandwidth or throughput or execution rate
    • Number of tasks completed per unit of time
Latency metric: program execution time

CPUTime = Seconds/Program
        = Cycles/Program * Seconds/Cycle
        = Instructions/Program * Cycles/Instruction * Seconds/Cycle
        = IC * CPI * CCT

IC: Instruction Count
CPI: Clock Cycles Per Instruction
CCT: Clock Cycle Time
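
As a sanity check on the equation, here is a minimal Python sketch; the instruction count, CPI, and clock rate are made-up illustrative values, not measurements from any real machine:

# Worked example of CPUTime = IC * CPI * CCT.
# All inputs are hypothetical, chosen only to keep the arithmetic clean.
instruction_count = 2_000_000_000    # IC: dynamic instructions executed
cycles_per_instruction = 1.5         # CPI: average cycles per instruction
clock_rate_hz = 3e9                  # 3 GHz clock, so CCT = 1 / clock_rate_hz

clock_cycle_time = 1 / clock_rate_hz          # CCT, in seconds
cpu_time = instruction_count * cycles_per_instruction * clock_cycle_time

print(f"CPU time: {cpu_time:.3f} s")          # CPU time: 1.000 s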

Amdahl’s Law

Speedup = Execution time for the entire task without the enhancement
          / Execution time for the entire task using the enhancement when possible

It should be greater than 1 (when there is an improvement, that is)

new_execution_time
= original_execution_time * (1 - enhanced_fraction)
  + original_execution_time * enhanced_fraction * (1 / speedup_enhanced)
= original_execution_time * ((1 - enhanced_fraction) + enhanced_fraction / speedup_enhanced)

speedup_overall
= original_execution_time / new_execution_time
= 1/((1-enhanced_fraction) + enhanced_fraction/speedup_enhanced)
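
The overall speedup is easy to wrap in a small Python helper; this is a minimal sketch, with the function name and the example numbers (40% of execution time sped up 10x) chosen just for illustration:

def speedup_overall(enhanced_fraction, speedup_enhanced):
    # Amdahl's Law: overall speedup when enhanced_fraction of the
    # original execution time is accelerated by speedup_enhanced.
    return 1 / ((1 - enhanced_fraction)
                + enhanced_fraction / speedup_enhanced)

print(speedup_overall(0.4, 10))   # 1.5625: a 10x boost to 40% of the time is ~1.56x overall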

In the case of parallelization, Amdahl's law states that if enhanced_fraction is the proportion of a program that can be made parallel (i.e., can benefit from parallelization), and (1 − enhanced_fraction) is the proportion that cannot be parallelized (remains serial), then the maximum speedup achievable with N processors (the parallel part runs N times faster, so speedup_enhanced = N) is

speedup_overall
= 1 / ((1 - enhanced_fraction) + enhanced_fraction / N)

In the limit, as N tends to infinity, the maximum speedup tends to 1 / (1 − enhanced_fraction).
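
Numerically, with a hypothetical enhanced_fraction of 0.95, the ceiling of 1 / (1 − 0.95) = 20 shows up quickly; the loop below just evaluates the formula for growing N:

enhanced_fraction = 0.95   # hypothetical: 95% of the program parallelizes

for n in (2, 10, 100, 1000, 1_000_000):
    s = 1 / ((1 - enhanced_fraction) + enhanced_fraction / n)
    print(f"N = {n:>9,}: speedup = {s:6.2f}")
# Speedup climbs from ~1.90 (N=2) toward 20 but never reaches it,
# no matter how many processors are added.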
