About SSD
Dongjun Shin
Samsung Electronics

Outline
- SSD primer
- Optimal I/O for SSD
- Benchmarking Linux FS on SSD
  • Case study: ext4, btrfs, xfs
- Design consideration for SSD
- What's next?
  • New interfaces for SSD
  • Parallel processing of small I/O

SSD Primer (1/2)
- Physical unit of flash memory
  • NAND page – unit for read & write
  • NAND block – unit for erase (a.k.a. erasable block)
- Physical characteristics
  • Erase before re-write
  • Sequential write within an erasable block
- [Diagram: the Flash Translation Layer (FTL) maps the LBA space (visible to the OS) onto the flash memory space; NAND page = 2-4 kB; NAND block = 64-128 NAND pages]

SSD Primer (2/2)
- Internal organization: 2-dimensional (NxM parallelism)
  • N-channel (striping), M-way (pipelining)
  • Similar to RAID-0 (stripe size = sector or NAND page)
  • Effective page & block size is multiplied by NxM (max)
- [Diagram: host I/F (ex. SATA), SSD controller running F/W (FTL), and a 4-channel (Ch0-Ch3) x 4-chip (Chip0-Chip3) flash array; consecutive LBAs are striped across channels and pipelined across chips]

Optimal I/O for SSD
- Key points
  • Parallelism
    - The larger the size of the I/O request, the better
  • Match with physical characteristics
    - Alignment with the NAND page or block size*
    - Segmented sequential write (within an erasable block)
- What about Linux?
  • HDD also favors larger I/O: read-ahead, deferred aggregated write
  • Segmented FS layout: good if aligned with the erasable block boundary
  • Write optimization: FS dependent (ex. allocation policy)

* Usually, the partition layout is not aligned (1st partition at LBA 63)

Test environment (1/2)
- Hardware
  • Intel Core 2 Duo E6550 @ 2.33GHz, 1GB RAM
- Software
  • Fedora 7 (kernel 2.6.24)
  • Benchmark: postmark
- Filesystems
  • No journaling: ext2
  • Journaling: ext3, ext4, reiserfs, xfs
    - ext3, ext4: data=writeback,barrier=1[,extents]
    - xfs: logbsize=128k
  • COW, log-structured: btrfs (latest unstable, 4k block), nilfs (testing-8)
- SSD
  • Vendor M (32GB, SATA): read 100MB/s, write 80MB/s
  • Test partition starts at LBA 16384 (8MB, aligned)

Test environment (2/2)
- Postmark workload
  • Ref: Evaluating Block-level Optimization through the IO Path (USENIX 2007)

  Workload   File size   # of files (work-set)   # of transactions   Total app read/write
  SS         9-15K       10,000                  100,000             630M/755M*
  SL         9-15K       100,000                 100,000             600M/1.8G
  LS         0.1-3M      1,000                   10,000              9.7G/12G
  LL         0.1-3M      4,250                   10,000              9G/17G

  * Mostly write-only

Benchmark results (1/2)
- Small file size (SS, SL)
- [Chart: transactions/sec (0-2500) under SS and SL for ext2, ext3, ext4, reiserfs, xfs, btrfs, nilfs]

Benchmark results (2/2)
- Large file size (LS, LL)
- [Chart: transactions/sec (0-30) under LS and LL for ext2, ext3, ext4, reiserfs, xfs, btrfs, nilfs]

I/O statistics (1/2)
- Average size of I/O
- [Chart: average I/O size in kB (0-140), read and write, per workload (SS, SL, LS, LL), for ext2, ext3, ext4, reiserfs, xfs, btrfs, nilfs]

Case study - btrfs
- [Chart: LL workload, transactions/sec for btrfs-4k, btrfs-16k, btrfs-ssd-4k]
- 1. A 4k block size is better than 16k (sequentiality = 12% vs. 2%)
- 2. The ssd mount option is effective (10-40% improvement)

Case study - xfs
- Condition
  • Mount
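
The NxM figures in "SSD Primer (2/2)" translate directly into the effective I/O granularities a filesystem should target. The sketch below only illustrates that arithmetic; the 2 kB page, 64-page block, and 4-channel x 4-way geometry are assumed values picked from the ranges quoted in the primer, not the geometry of the tested drive.

# Illustrative arithmetic for the NxM organization described in "SSD Primer".
# All geometry values below are assumptions (taken from the 2-4 kB page and
# 64-128 pages/block ranges in the slides), not the tested drive's real layout.
NAND_PAGE  = 2 * 1024            # bytes per NAND page
NAND_BLOCK = 64 * NAND_PAGE      # 128 kB erasable block
N_CHANNELS = 4                   # striping
M_WAYS     = 4                   # pipelining

# With full NxM parallelism, the effective unit sizes grow by N*M (upper bound).
effective_page  = NAND_PAGE  * N_CHANNELS * M_WAYS
effective_block = NAND_BLOCK * N_CHANNELS * M_WAYS

print(f"effective page  (max): {effective_page // 1024} kB")    # 32 kB
print(f"effective block (max): {effective_block // 1024} kB")   # 2048 kB

A request smaller than the effective page cannot keep all N*M chips busy at once, which is one reason the slides favor larger I/O requests.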
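
The footnote on "Optimal I/O for SSD" points out that the legacy DOS layout starts the first partition at LBA 63, which is not erase-block aligned, while the test partition here starts at LBA 16384 (8 MB). A minimal sketch of that alignment check follows; the 512-byte sector and the 8 MB boundary are assumptions chosen to match the deck's test setup, so substitute your device's effective erasable-block size.

# Sketch: check whether a partition's starting LBA falls on an erase-block
# boundary. SECTOR_SIZE and ERASE_BLOCK are assumed values matching the
# 8 MB-aligned test partition in this deck, not vendor-confirmed numbers.
SECTOR_SIZE = 512                  # bytes per LBA
ERASE_BLOCK = 8 * 1024 * 1024      # assumed effective erasable block (8 MB)

def is_aligned(start_lba: int) -> bool:
    """True if the partition start is erase-block aligned."""
    return (start_lba * SECTOR_SIZE) % ERASE_BLOCK == 0

def next_aligned_lba(start_lba: int) -> int:
    """Smallest LBA >= start_lba that is erase-block aligned."""
    sectors_per_block = ERASE_BLOCK // SECTOR_SIZE
    return -(-start_lba // sectors_per_block) * sectors_per_block

print(is_aligned(63))        # False: legacy layout, 1st partition at LBA 63
print(is_aligned(16384))     # True:  the 8 MB-aligned test partition used here
print(next_aligned_lba(63))  # 16384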