Figure 15-6. Performance Bottlenecks and Capacities of Programs
Benchmarks and Repeatable Testing
Is It Really Faster?
Table 15-1. Runtimes for Four Trials
Table 15-2. Runtimes for Ten Trials
General Performance Optimizations
Best Algorithm
Compiler Optimization
C Compiler Optimization
Java Compiler Optimization
Buy Enough RAM
Minimize I/O
Minimize Cache Misses
Any Other Loop Optimizations
Thread-Specific Performance Optimizations
Reducing Contention
Minimizing MT Overhead
Reducing Paging
Figure 15-7. Using Threads to Optimize Paging
Communications Bandwidth
Right Number of Threads
Short-Lived Threads
Dealing with Many Open Sockets
The Lessons of NFS
Figure 15-8. NFS Throughput on a Series of Sun UE Machines (The performance improvement is somewhat exaggerated, as a two-way UE 6000 will outperform a two-way UE 2.)
Summary
Chapter 16. Hardware
Types of Multiprocessors
Shared Memory Symmetric Multiprocessors
The CPU
The System
Figure 16-1. SMP System Architecture
Store Barriers
Bus Architectures
Direct-Switched Buses
Figure 16-2. Direct-Switched Memory Bus
Packet-Switched Buses
Figure 16-3. Packet-Switched Memory Bus
Crossbar Switches
Figure 16-4. Cluster Using a Crossbar Switch
Hierarchical Interconnects
Figure 16-5. Hierarchical Design of the SGI Origin Series
ccNUMA
Packet-Switched Buses and ldstub
Figure 16-6. Packet-Switched Memory Bus Running ldstub
Example 16-1 Spin Locks Done Better
The Thundering Herds
LoadLocked/StoreConditional and Compare and Swap
Example 16-2 Atomic Increment Using LoadLocked and StoreConditional
Figure 16-7. SMP System Architecture
Lock-Free Semaphores and Reference Counting
Volatile: The Rest of the Story
Atomic Reads and Writes
Interlocked Instructions
Memory Systems
Reducing Cache Misses
Table 16-1. Selected SPEC Benchmarks for Two UE 3500s
Cache Blocking
Data Reorganization
Word Tearing
False Sharing
Example 16-3 False Sharing
Summary
Chapter 17. Examples
Threads and Windows
Example 17-1 ThreadedSwing Program
Figure 17-1. ThreadedSwing Window Example
Displaying Things for a Moment (Memory.java)
Figure 17-2. The Memory Game
Example 17-2 How to Display Something for a Short Time
Socket Server (Master/Slave Version)
Socket Server (Producer/Consumer Version)
Example 17-3 Producer/Consumer Socket Program
Making a Native Call to pthread_setconcurrency()
Example 17-4 Setting the Concurrency Level in Solaris (TimeDiskSetConc.java)
Actual Implementation of POSIX Synchronization
Example 17-5 Correct Implementation of Mutexes and Condition Variables
A Robust, Interruptible Server
Example 17-6 A Robust Server
Disk Performance with Java
Example 17-7 Measuring Disk Access Throughput
Other Programs on the Web
Summary
Appendix A. Internet
Threads Newsgroup
Code Examples
Vendor's Threads Pages
Threads Research
Freeware Tools
Other Pointers
The Authors on the Net
Appendix B. Books
Threads Books
Java Threads
POSIX Threads
Win32 Threads
Related Books
Appendix C. Timings
Timings
Mutex Lock/Unlock
Table C-1. Timings of Various Thread-Related Functions on POSIX and Java (µs)
Explicit Synchronized
Implicit Synchronized
Readers/Writer Lock/Unlock
Semaphore Post/Wait
notify()
condSignal()
Local Context Switch (unbound)
Local Context Switch (bound)
Process Context Switch
Cancellation Disable/Enable
Test for Deferred Cancellation
Reference a Global Variable
Reference Thread-Specific Data
Reference "Fake" Thread-Specific Data
Appendix D. APIs
Function Descriptions
The Class java.lang.Thread
The Interface java.lang.Runnable
The Class java.lang.Object
The Class java.lang.ThreadLocal
The Class java.lang.ThreadGroup
Helper Classes from Our Extensions Library
The Class Extensions.InterruptibleThread