12 Conclusions 185 Multi−Threaded Wide SIMD I$ D$ Multi−Threaded Wide SIMD I$ D$ Multi−Threaded Wide SIMD I$ D$ Multi−Threaded Wide SIMD I$ D$ Memory Controller Fixed FunctionTexture Logic Memory Controller Display InterfaceSystem Interface L2 Cache Fig. 12.2 Larrabee architecture from Intel DRAM I/F HOST I/F DRAM I/F DRAM I/F DRAM I/F DRAM I/F DRAM I/F Giga Thread L2 Shared Multiprocessor Core Fig. 12.3 Fermi architecture from NVIDIA multiprocessor (SM). The block diagram of a single SM is shown in Fig. 12.4 and the block diagram of a core within an SM is shown in Fig. 12.5. With these upcoming architectures, newer approaches for hardware acceleration of algorithms would become viable. These approaches could exploit the more gen- eral computing paradigm offered by the newer architectures. For example, the close coupling between the GPU and the CPU (which reside on the same die) would 186 12 Conclusions Core Core Core Core Core Core Core Core Core Core Core Core Core Core Core Core Core Core Core Core Core Core Core Core Core Core Core Core Core Core Core Core Instruction Cache Register File Dispatch Dispatch SchedulerScheduler Load/Store Units X 16 Special Func Units X 4 Interconnect Network 64K Configurable Cache/Shared Mem Uniform Cache Fig. 12.4 Block diagram of a single shared multiprocessor (SM) in Fermi reduce the communication cost. Also, in these upcoming architectures the instruc- tion dispatch unit is distributed, and the instruction set is more general purpose. These enhancements would enable a more general computing paradigm (in compar- ison to the SIMD paradigm for current GPUs), which in turn would enable acceler- ation opportunities for more EDA applications. The approaches presented in this monograph collectively aim to contribute toward enabling the CAD community to accelerate EDA algorithms on modern hardware platforms. Our work demonstrates techniques to rearchitect several EDA algorithms to maximally harness their performance on the alternative platforms under consideration. References 187 Dispatch Port Operand Collector FP Unit INT Unit Result Queue CUDA Core Fig. 12.5 Block diagram of a single processor (core) in SM References 1. http://www.cs.chalmers.se/cs/research/formalmethods/minisat/main.html. The MiniSAT Page 2. NVIDIA Tesla GPU Computing Processor. http://www.nvidia.com/object/IO_ 43499.html 3. OmegaSim Mixed-Signal Fast-SPICE Simulator. http://www.nascentric.com/ product.html 4. Lee, H.K., Ha, D.S.: An efficient, forward fault simulation algorithm based on the parallel pattern single fault propagation. In: Proceedings of the IEEE International Test Conference on Test, pp. 946–955. IEEE Computer Society, Washington, DC (1991) 5. Seiler, L., Carmean, D., Sprangle, E., Forsyth, T., Abrash, M., Dubey, P., Junkins, S., Lake, A., Sugerman, J., Cavin, R., Espasa, R., Grochowski, E., Juan, T., Hanrahan, P.: Larrabee: A many-core x86 architecture for visual computing. ACM Transactions on Graphics 27(3), 1–15 (2008) 6. Silva, M., Sakallah, J.: GRASP-a new search algorithm for satisfiability. In: Proceedings of the International Conference on Computer-Aided Design (ICCAD), pp. 220–7 (1996) Index A Accelerators, 9 ACML-GPU, 15 Activity, 93 Algorithm parallel, 120, 121, 134 Amdahl’s Law, 158, 170 Application specific, 64 Arrival time, 110 Assignment, 31, 37, 40 B Backtracking, 32 Bandwidth, 13 Bandwidth minimization, 52 Bank conflict, 27 BCP, 32, 37, 40 Bias survey propagation, 89 Bins, 64 Bin packing, 52, 70 Bin utilization, 74 Bit parallel, 135, 146 Block, 28 Block-based SSTA, 108 Board test, 15 Boolean Constant Propagation, see BCP Boolean Satisfiability, see SAT Box-Muller, 101 BRAM, 11, 14, 32, 63, 66, 72, 78 Brook+, 15 BSIM3 SPICE, 158 BSIM4 SPICE, 158 Bulldog Fortran, 171 C Capacity, 31, 35 CDFG, 160 Clause, 31 Clock speed, 11 CNF, 31, 34 Co-processors, 9 Compilers, 16 Complete SAT, 83, 85 Conflict, 37, 40, 42, 44, 71 Conflict clause, 31 Conflict clause generation, 33, 64 Conjunctive Normal Form, 34 Constant Memory, 26, 161 Control and dataflow graph, 173 Control dominated EDA, 3 Control plus data parallel EDA, 3 Core, 185 Critical line critical path tracing, 138 Critical path tracing, 138 CUBLAS, 15 CUDA, 15, 24 CUFFT, 15 Cumulative detectability, 138 Custom IC, 7, 10, 33 D Data parallel, 28, 106, 120, 122, 134 Debuggers, 16 Decision engine, 37, 39, 49, 70 Decision level, 39, 67 Decisions SAT, 32 Detectability, 138 DFF, 11 DIMACS, 45 Dimblock, 29 K. Gulati, S.P. Khatri, Hardware Acceleration of EDA Algorithms, DOI 10.1007/978-1-4419-0944-2, C Springer Science+Business Media, LLC 2010 189 190 Index Dimensionality, 29 Dimgrid, 29 Divide, 12 Dominator, 138 DPLL, 85 DRAM, 14, 66, 184 Dropped fault table, 134 Dynamic power, 10 Dynamic bulk modulation, 10 E EDA, 3 Embedded processor, 10 F Factor Graph, 87 Fault detection, 134 Fault diagnosis, 134 Fault dropping, 134 Fault grading, 102, 120 Fault injection, 135 Fault parallel data parallel, 120 Fault simulation, 4, 119 Fault table, 4, 134 Fermi, 184 Fingerprinting, 19 FPGA, 3, 7, 10, 32 Function Factor Graph, 87 G Global Memory, 13, 27, 110, 159 GPGPU, 3 Graphics Processors, see GPU GRASP, 35, 64, 85 Grid, see dimgrid GridSAT, 87 GSAT, 85 H Hardware IP cores, 15 HDL, 10, 14, 19 Hybrid SAT, 85 I Immediate dominator dominator, 138 Implication, 37, 40, 44 Implication graph, 31, 33, 37, 50, 64 Infringement security, 19 Input vector control, 10 Instance specific, 64 Inter-bin non-chronological, 32 Intra-bin non-chronological, 32 IP cores, 15 K Kernel, 28, 167, 184 L Larrabee, 184 Latency, 11, 13 Leakage power, 10 Levelize, 112 Literal, 37 free literal, 41 Local memory, 12, 27 Logic analyzers, 15 Lookup table, 11, 106, 120 LUT, 12 M Memory bandwidth, 1, 13 Memory wall, 1 Mersenne Twister, 101, 106, 112 MIMD, 171 Minimum unsatisfiable core, 31, 33, 53 MiniSAT, 85 MNA SPICE, 154 Model evaluation, 154 Model parallel, 122, 134 Monte Carlo, 4 SSTA, 101, 106 Moore’s Law, 24 MOPs, 17 MOPs per watt MOPs, 17 Multi-GPU, 16 Multi-port memory, 20 Multiprocessor, 12, 24 MUX, 11 N Newton-Raphson, 154 NMOS passgates, 11 Index 191 Non-chronological backtrack, 32, 43, 45, 64, 68, 85 Non-recurring engineering, 10, 18 Non-volatile memory, 20 O Off-chip, 14 On-chip, 14 OPB, 67, 72 P Paging, 12 Parafrase, 170 Parallel SAT, 85 Partition, 32, 35, 63, 78 Pass/fail fault dictionary, 134 Path-based SSTA, 108 Pattern parallel data parallel, 120 PCI, 15 PCI-X PCI, 15 Pipeline, 11 Piracy security, 19 PLB, 67, 72 PLB2OPB bridge, 72 Power, 10, 56 average power, 58 Power delay product, 18 Power gating, 10 Power wall, 1 PowerPC, 32 Precharged, 39 Predischarged, 39 Process variations, 106 Processor, 24 Profiling code, 16 Programmable, 12 Prototyping, 16 Q QuickPath Interconnect, 18 R Random variations, 106 Re-programmability, 19 Reconfigurable logic FPGA, 11 Reconfigure, 12 Reduced OR, 144 Register, 26, 172 Resolution, 36 Reuse-based design, 19 S Sample parallelism, 106 SAT, 4, 31, 33, 34, 36 3-SAT, 36 Scalability, 15, 31, 35, 66 Scattered reads, 29 SEE, 18, 114 Self-test, 15 Sensitive input, 138 Shared Memory, 26, 27, 110 Shared multiprocessor, 185 SIMD, 3, 18, 29 Software IP cores, 15 Span, 69 Speedup, 31 SPICE, 31, 153 Square root, 12 SRAM, 11 SSTA, 4, 101, 106 STA, 101, 106 SPICE, 154 Stem, 137 Stem region, 138, 143 Stochastic SAT, 83, 85 Subroutine, 167 Subsumption resolution, 56 Successive chord, 156 Supply voltage, 10 Survey propagation, 84 Surveys survey propagation, 88 Synchronization points, 29 Synchronize, 28 System test, 15 Systematic variations, 106 T Termination cell, 40 Texture fetching Texture Memory, 27 Texture Memory, 26, 110, 155, 160 Thread, 28, 146 Thread block, 28 Thread parallel, 135 192 Index Thread scheduler, 29 Throughput, 11 Time slicing, 29 Tree Factor Graph, 87 U Unate covering, 134 V Variable, 31, 37 Factor Graph, 87 Variable ordering SAT, 32 Variable Vt, 10 Variations, 106 Virtual memory, 12 VLIW, 171 VLSI, 106 VSIDS, 93 W WalkSAT, 85, 90, 96 Warp size, 29 Warps, 29 Watermarking, 19 X XC2VP30 FPGA, 32 . diagram of a single SM is shown in Fig. 12.4 and the block diagram of a core within an SM is shown in Fig. 12.5. With these upcoming architectures, newer approaches for hardware acceleration of algorithms. 67 Decisions SAT, 32 Detectability, 138 DFF, 11 DIMACS, 45 Dimblock, 29 K. Gulati, S.P. Khatri, Hardware Acceleration of EDA Algorithms, DOI 10.1007/978-1-4419-0944-2, C Springer Science+Business Media,. opportunities for more EDA applications. The approaches presented in this monograph collectively aim to contribute toward enabling the CAD community to accelerate EDA algorithms on modern hardware platforms.