Connecting with Computer Science Chapter 10 Review:

Chapter Summary:
A hard drive is an example of a random access device; it stores information in tracks and sectors and accesses data through read/write heads.
A file system is responsible for creating, manipulating, renaming, copying, and moving files on a storage device.
Windows uses FAT and NTFS as its file systems.
The File Allocation Table (FAT) file system keeps track of which clusters files are using.
FAT is prone to disk fragmentation.
New Technology File System (NTFS) is the DEFAULT file system on current Windows operating systems; it uses a Master File Table (MFT) to keep track of the files and directories on a volume.
NTFS has many advantages over FAT, such as better reliability and security, journaling, file encryption, and file compression.
Linux can be used with many file systems, such as XFS, JFS, ReiserFS, and ext3, but ext3 is the most commonly used.
A file contains binary or text data.
Data is usually stored and accessed either sequentially or randomly (relative access).
Hashing is a common method for accessing a relative file; it uses a hashing algorithm to generate a hash key value that identifies a record's location.
Collisions occur when the same hash key is generated for more than one relative record location.
The goal of hashing is to create an algorithm that converts a key field into a relative record number with few collisions.

Key Terms:
Cluster (356): Area of the hard drive containing a group of the smallest units that can be accessed on a disk (sectors)
Collision (368): In hashing, what happens when the hashing algorithm generates the same relative key for more than one original key value
Disk Fragmentation (358): Occurs when a file's clusters are scattered in different locations on the storage medium instead of being in contiguous locations
Encrypting File System (361): An encryption technology that converts data in a file to unreadable information by using an encryption algorithm and key value; to make the information readable again, you must decrypt it with another key value
FAT (356): File management system used to locate files on a storage medium
File Compression (361): The process of reducing file size and, therefore, taking up less disk space
File System (353): The part of the OS responsible for creating, manipulating, renaming, copying, and moving files to and from a storage device
Hash Key (366): A unique value used in hashing algorithms for identifying records
Hashing (366): A common method for accessing data in a file or database table with a unique value called the hash key
Hashing Algorithm (367): A routine of logic used for determining how hash values are created
Master File Table (360): A table used in NTFS to store data about every file and directory on the volume
NTFS (356): File management system introduced in Windows NT and incorporated into all desktop and server Windows OSs since then; used to locate files on a storage medium
Overflow Area (368): Area in a file that is used in case a collision occurs during the hashing algorithm
Random Access (355): Reading data from or writing data to anywhere on a disk
Sequential Access (355): Reading and writing data in order from the beginning

Test Yourself:
1.) Describe what a file system does.
A file system is the part of the OS responsible for creating, manipulating, renaming, copying, and moving files to and from storage devices. A hard drive is the most common place to store files.

2.) Describe the key characteristics of FAT.
FAT keeps track of which clusters are free for writing information and which are already in use storing file data. It also marks which clusters are bad and no longer usable, so that the file system does not attempt to use them.
The main advantage of FAT is its efficient use of disk space. Because large files are not required to have contiguous clusters, FAT can place the parts of a file wherever they fit.
Another advantage of FAT is that filenames can now have up to 255 characters. Under FAT16, filenames were limited to 8 characters for names and 3 characters for file extensions.
One more advantage of using FAT is how easy it is to recover files that have been deleted. When a file is deleted, the file system does not actually remove it from the hard drive; instead, it places the hex value E5h in the first position of the filename. The actual file remains on the hard drive and can be recovered with software.

3.) Describe how a drive becomes fragmented.
Fragmentation occurs when a file's clusters are scattered in different locations on the storage medium instead of being in contiguous locations. It usually happens as users continually install, uninstall, and delete files and programs on their hard drive.

4.) Explain how defragmentation works and how it can improve system performance.
Defragmentation works by moving the scattered "chunks" of each file back into contiguous blocks, so the hard drive's read/write heads do not have to move as much to retrieve the information. How much of a difference it makes depends on how fragmented the drive is at the time.

5.) How does FAT differ from NTFS, and when is each used?
FAT uses disk space efficiently, because a file's clusters do not have to be contiguous, and it makes deleted files easy to recover, because a deleted file stays on the drive with only the first character of its filename replaced by E5h. NTFS is used when journaling and metadata support are required; FAT, although an older file system, is used when easy data recovery is essential.

6.) What are the advantages and disadvantages of FAT?
Advantages of FAT:
The main advantage of FAT is its efficient use of disk space; FAT can place the parts of a file wherever they fit.
Filenames can be up to 255 characters, and file extensions can be longer than 3 characters.
Files that have been deleted are easy to recover.
Disadvantages of FAT:
Overall performance slows down as more files are stored on the drive.
Drives can become fragmented quite easily.
FAT lacks many of the security features in NTFS, such as being able to assign access rights to files and directories.
It can also have file integrity problems, such as lost clusters, invalid files and directories, and allocation errors.

7.) Describe the key characteristics of NTFS.
The key characteristics of NTFS are the Master File Table (MFT) and journaling.

8.) What are the advantages and disadvantages of NTFS?
Advantages:
The structure of NTFS makes file access fast and reliable. In NTFS, the file system goes directly to a file, using its MFT entry, as soon as you request it.
With the MFT, the file system can recover from problems without losing a lot of data. The journaling feature also makes it easy to restore the system to a stable state. There is also a mirrored backup copy of the MFT in case the main MFT is damaged.
Security has been improved compared with FAT. Under NTFS, an administrator can specify which users or groups of users can perform certain operations on files and directories. NTFS is geared more toward a networked environment, so security measures have been increased.
NTFS also supports file encryption through the Encrypting File System (EFS) and file attributes, so you can encrypt files to protect them from unauthorized access.
Finally, NTFS includes a file attribute that controls file compression; a user can set a file to be compressed to save disk space.
Disadvantages:
Because NTFS has a larger system overhead than FAT, it is not recommended as a file management system on volumes smaller than 4GB.
You cannot access NTFS volumes from MS-DOS, Windows 95, or Windows 98. Also, many Linux distributions cannot write to NTFS drives.

9.) Describe the Master File Table (MFT) and how it works.
The MFT stores data about every file and directory on the volume (metadata), and the OS uses the data in this table to retrieve files. Data stored in the MFT includes a file's size, name, and permissions, among other information.

10.) What are the advantages and disadvantages of file compression?
Advantages: It minimizes the amount of disk space needed, and the entire process is transparent to the user, meaning that the file does not have to be uncompressed before the user can read it.
Disadvantages: It slows performance, and you cannot encrypt a compressed file.

11.) What is the difference between a text file and a binary file?
Text: Text files consist of ASCII or Unicode characters. Each time you type a character in a file, the file system stores a byte in the file (some Unicode encodings use more than one byte per character). Text files are typically read with word-processing programs or text editors, such as Notepad in Windows or gedit in UNIX/Linux, and are easy to view and modify.
Binary: Binary files cannot be read with these programs, and the term binary is often used to refer to any file that is not a text file. Binary files can be read by computers but not directly by humans, and they contain coded and numeric information. They are also usually more compact than text files. Some examples of binary files are executable programs, applications, images, and sound files.

12.) How does sequential file access differ from random file access?
Sequential file access: A sequential file is accessed starting at the beginning of the file and is processed to the end of the file. The data stored in the file can be thought of as one long row of information. An example of a sequential file is an audio file or a video file. When you add data, it is written at the end of the file.
Random file access: Accessing a particular record in a file is faster if you can position the read/write head directly on the record without having to read all the records in front of it. If all records are the same size, you can mathematically calculate the record's position on the disk surface and go right to it. This is the principle behind random access, and it is why random access requires fixed-length records.
For example, suppose you have a file of 1000 records and each record contains 100 bytes. If you want to access record 538 (numbering records from 0), you multiply the record number by the record size (538 * 100) and calculate that the record you are looking for is 53,800 bytes into the file. Depending on how many bytes per sector and per track are on the disk, you can then position the read/write head at the exact point.

13.) What are the strengths and weaknesses of sequential file access and random file access?
The advantage of sequential file access is that, because data is appended to the end of the file, the writing process is fast. On the other hand, retrieving data can be extremely slow, depending on the data's location.
The advantages of random file access are getting to a particular record faster and being able to update the record in place. The disadvantage is that disk space can be wasted if data does not fill the entire record or if some record numbers do not have data. Random file access works well when a sequential record number can identify records easily.

14.) Explain how hashing works.
When records cannot be identified by a sequential numeric value, or the numeric identifier is not in sequential order beginning with 1, another technique allows nonnumeric record keys to be used to access relative records. This technique, called hashing, is widely used in database management systems. Hashing uses a hashing algorithm to generate a value called a hash key for each record; ideally this value is unique, but when two records produce the same hash key, a collision occurs. The hash key is then used as a key value in a list of rows or records of information, and together the hash keys establish an index.

15.) You are trying to create a hashing algorithm to work with information stored for a student registration system. Each student is identified by a student ID, which is 7 characters long. Student IDs range from 1,000,000 to 9,999,999. Write a hashing algorithm that minimizes collisions.

Practice Exercises:
1.) Which of the following IS NOT a responsibility of the file system?
None of the above. [The file system IS responsible for doing everything listed.]
2.) Sectors are made up of clusters.
False. Clusters are made up of sectors [the smallest units that can be accessed on a disk].
3.) In FAT, files do not need to be stored in contiguous blocks on the disk.
True.
4.) Which of the following FAT formats allows the greatest volume size?
FAT32.
5.) Which tool is used to reorganize clusters so as to minimize drive head movement?
Disk Defragmenter Utility.
6.) FAT32 provides the capability to assign access rights to a file and directory.
False.
7.) Which is NOT an advantage of using NTFS?
Efficient disk use on small volumes.
8.) Which is not a file system used in Linux?
HFS+ [HFS+ is a Mac OS file system, not a Linux one].
9.) You are tracking information on rocket launches. Each launch is assigned a number from 1,000 to 100,000. There will probably be around 5,000 launches, and you are using a hashing algorithm that divides the highest possible launch number by the expected number of launches. What is the hashing algorithm key in this situation?
20 (100,000 / 5,000 = 20).
10.) Using the information from problem 9, if you have a rocket launch number of 80,000, what is the relative record?
4,000 (80,000 / 20 = 4,000).
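Questions 2 through 4 in Test Yourself describe how FAT tracks the clusters a file uses, how files become fragmented, and how defragmentation fixes this. Below is a minimal sketch of those ideas. It is a simplified, hypothetical model and not the real on-disk format. The table is a Python dictionary that maps each cluster to the next cluster of the same file. The FREE, BAD, and END_OF_CHAIN markers are simplified values, since real FAT variants reserve specific numbers for them. The helper names cluster_chain, fragment_count, and defragment are illustrative only.

```python
from __future__ import annotations

# Simplified, hypothetical model of a File Allocation Table (FAT).
# Each entry maps a cluster number to the next cluster of the same file.
FREE = 0           # cluster available for new data
END_OF_CHAIN = -1  # last cluster of a file
BAD = -2           # damaged cluster the file system must never use


def cluster_chain(fat: dict[int, int], first_cluster: int) -> list[int]:
    """Follow the FAT from a file's first cluster to its last cluster."""
    chain = []
    cluster = first_cluster
    while cluster != END_OF_CHAIN:
        chain.append(cluster)
        cluster = fat[cluster]
    return chain


def fragment_count(chain: list[int]) -> int:
    """Count runs of contiguous clusters; 1 means the file is not fragmented."""
    if not chain:
        return 0
    return 1 + sum(1 for a, b in zip(chain, chain[1:]) if b != a + 1)


def defragment(chain: list[int], start: int) -> dict[int, int]:
    """Return FAT entries that place the same file in contiguous clusters.

    Assumes clusters start .. start + len(chain) - 1 are free; a real
    defragmenter also copies the data and frees the old clusters.
    """
    if not chain:
        return {}
    new_chain = list(range(start, start + len(chain)))
    entries = {c: nxt for c, nxt in zip(new_chain, new_chain[1:])}
    entries[new_chain[-1]] = END_OF_CHAIN
    return entries


# A file stored in clusters 2 -> 3 -> 7 -> 8 -> 12, scattered by earlier deletions.
fat = {2: 3, 3: 7, 4: FREE, 5: BAD, 6: FREE, 7: 8, 8: 12, 12: END_OF_CHAIN}
chain = cluster_chain(fat, 2)
print(chain)                  # [2, 3, 7, 8, 12]
print(fragment_count(chain))  # 3

contiguous = cluster_chain(defragment(chain, 20), 20)
print(contiguous)                  # [20, 21, 22, 23, 24]
print(fragment_count(contiguous))  # 1
```

The fragmented file is split into three separate runs of clusters, so the read/write heads must jump between them. After the simulated defragmentation, the whole file sits in a single run, which is the improvement question 4 describes.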
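Question 2 also says that deleting a file under FAT only places the hex value E5h in the first position of the filename. The sketch below models just the 11-byte 8.3 name field of a directory entry as a bytearray. The rest of the 32-byte entry is omitted, and the function names are hypothetical. A real FAT deletion also marks the file's clusters as free in the table. However, the data stays on the disk until it is overwritten, which is why recovery software can bring the file back. Because the original first character is lost, recovery tools typically ask the user to supply it.

```python
DELETED_MARKER = 0xE5  # E5h written over the first byte of a deleted file's name


def name_field(name: str, ext: str) -> bytearray:
    """Build an 8.3 name field: name padded to 8 bytes, extension to 3, upper case."""
    if len(name) > 8 or len(ext) > 3:
        raise ValueError("FAT16-style names are limited to 8 + 3 characters")
    text = name.upper().ljust(8) + ext.upper().ljust(3)
    return bytearray(text.encode("ascii"))


def delete(entry: bytearray) -> None:
    """Mark the entry as deleted; the file's data itself is not erased."""
    entry[0] = DELETED_MARKER


def is_deleted(entry: bytearray) -> bool:
    return entry[0] == DELETED_MARKER


def undelete(entry: bytearray, first_char: str) -> None:
    """Restore the entry; the overwritten first character must be supplied."""
    entry[0] = ord(first_char.upper())


entry = name_field("report", "txt")
print(bytes(entry))       # b'REPORT  TXT'
delete(entry)
print(is_deleted(entry))  # True
undelete(entry, "r")
print(bytes(entry))       # b'REPORT  TXT'
```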
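Question 10 says that file compression saves disk space and is transparent to the user, at some cost in performance. The sketch below illustrates both points with Python's standard zlib module. zlib's DEFLATE algorithm is only a stand-in here, not the algorithm NTFS actually uses, and the CompressedFile class is hypothetical. The caller writes and reads plain bytes while the stored form stays compressed. The price is that every write pays for compression and every read pays for decompression.

```python
import zlib


class CompressedFile:
    """Hypothetical file whose contents are stored compressed but read as plain bytes."""

    def __init__(self) -> None:
        self._stored = zlib.compress(b"")

    def write(self, data: bytes) -> None:
        # Extra CPU work on every write: the data is compressed before storing.
        self._stored = zlib.compress(data)

    def read(self) -> bytes:
        # Transparent to the caller: decompression happens automatically.
        return zlib.decompress(self._stored)

    def stored_size(self) -> int:
        return len(self._stored)


text = b"NTFS can compress files that contain repetitive data. " * 200
doc = CompressedFile()
doc.write(text)
print(len(text), doc.stored_size())  # the stored size is much smaller
print(doc.read() == text)            # True
```

Highly repetitive sample data like this compresses well. Data that is already compressed, such as most images and sound files, barely shrinks.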
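Question 11 says that binary files are usually more compact than text files. A quick way to see the difference is to store the same number both ways. The sketch below uses Python's struct module to pack the number as a 4-byte little-endian integer. The value 1,000,000 and the 4-byte format are arbitrary choices for illustration.

```python
import struct

value = 1_000_000

# Text: each digit is stored as one ASCII character, i.e., one byte per digit.
as_text = str(value).encode("ascii")

# Binary: the same number packed as a 4-byte little-endian signed integer.
as_binary = struct.pack("<i", value)

print(len(as_text), len(as_binary))  # 7 4

# Both forms can be turned back into the same number.
print(int(as_text.decode("ascii")) == struct.unpack("<i", as_binary)[0])  # True

# Printed raw, the text bytes are readable; the binary bytes are not.
print(as_text)
print(as_binary)
```

Every value in this format takes 4 bytes regardless of how many digits it has. The savings therefore grow with larger numbers, while a one-digit value would take fewer bytes as text.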
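Questions 12 and 13 contrast sequential access with random access to fixed-length records. The sketch below uses an in-memory io.BytesIO object as a stand-in for a file on disk, with 100-byte records as in the question 12 example. The helper names and the space padding that makes each record a fixed length are assumptions for illustration. Records are numbered from 0, so record 538 starts 538 * 100 = 53,800 bytes into the file.

```python
import io

RECORD_SIZE = 100  # bytes per fixed-length record


def record_offset(record_number: int) -> int:
    """Byte position of a record, numbering records from 0."""
    return record_number * RECORD_SIZE


def pad(text: str) -> bytes:
    data = text.encode("ascii")
    if len(data) > RECORD_SIZE:
        raise ValueError("record is longer than RECORD_SIZE")
    return data.ljust(RECORD_SIZE, b" ")  # unused space in the record is wasted


def append_record(f, text: str) -> None:
    """Sequential write: new data always goes at the end of the file."""
    f.seek(0, io.SEEK_END)
    f.write(pad(text))


def read_record(f, record_number: int) -> str:
    """Random access: jump straight to the record instead of reading those before it."""
    f.seek(record_offset(record_number))
    return f.read(RECORD_SIZE).rstrip(b" ").decode("ascii")


def update_record(f, record_number: int, text: str) -> None:
    """Random access also allows updating a record in place."""
    f.seek(record_offset(record_number))
    f.write(pad(text))


def find_sequential(f, wanted: str) -> int:
    """Sequential access: read from the beginning until the record is found."""
    f.seek(0)
    number = 0
    while chunk := f.read(RECORD_SIZE):
        if chunk.rstrip(b" ").decode("ascii") == wanted:
            return number
        number += 1
    return -1


f = io.BytesIO()  # in-memory stand-in for a file on disk
for n in range(1000):
    append_record(f, f"record {n}")

print(record_offset(538))                # 53800
print(read_record(f, 538))               # record 538
print(find_sequential(f, "record 538"))  # 538, found only after reading 539 records
update_record(f, 5, "record 5 (updated)")
print(read_record(f, 5))                 # record 5 (updated)
```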
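Question 15 asks for a student ID hashing algorithm but leaves the answer blank. One common approach is division-remainder hashing, offered here only as a possible sketch and not as the book's solution. It treats the 7-digit ID as a number and takes the remainder after dividing by the number of record slots. Choosing a prime number of slots is a widely used rule of thumb for spreading keys evenly. The table size of 10,007 (a prime) is a hypothetical choice that assumes somewhat fewer than 10,000 students. IDs that still collide would go to an overflow area.

```python
TABLE_SIZE = 10_007  # hypothetical: a prime slightly above the expected number of students


def student_hash(student_id: str) -> int:
    """Division-remainder hashing: 7-digit student ID -> relative record number."""
    if len(student_id) != 7 or not (student_id.isascii() and student_id.isdigit()):
        raise ValueError("student ID must be exactly 7 digits")
    return int(student_id) % TABLE_SIZE


def count_collisions(student_ids) -> int:
    """Count IDs whose relative record is already taken (they would go to the overflow area)."""
    used = set()
    collisions = 0
    for student_id in student_ids:
        slot = student_hash(student_id)
        if slot in used:
            collisions += 1
        else:
            used.add(slot)
    return collisions


print(student_hash("1234567"))  # 3706
# 8,000 consecutive IDs all land in different slots.
print(count_collisions(str(n) for n in range(1_000_000, 1_008_000)))  # 0
# IDs that differ by exactly TABLE_SIZE share a slot.
print(count_collisions(["1000000", "1010007"]))  # 1
```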
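Practice exercises 9 and 10 use a specific hashing algorithm. First, divide the highest possible launch number by the expected number of launches to get the hashing algorithm key (100,000 / 5,000 = 20). Then divide each launch number by that key to get its relative record (80,000 / 20 = 4,000). The sketch below follows that arithmetic and drops any remainder when a launch number does not divide evenly. That rounding rule is an assumption, since the exercise only uses 80,000. Colliding launches go to an overflow area, as described under Overflow Area in the key terms. The LaunchFile class is hypothetical.

```python
HIGHEST_LAUNCH = 100_000
EXPECTED_LAUNCHES = 5_000
HASH_KEY = HIGHEST_LAUNCH // EXPECTED_LAUNCHES  # the hashing algorithm key: 20


def relative_record(launch_number: int) -> int:
    """Hash a launch number to its relative record number (remainder dropped)."""
    return launch_number // HASH_KEY


class LaunchFile:
    """Hypothetical relative file: one slot per relative record plus an overflow area."""

    def __init__(self) -> None:
        self.slots = [None] * (relative_record(HIGHEST_LAUNCH) + 1)
        self.overflow = []

    def store(self, launch_number: int, data: str) -> None:
        slot = relative_record(launch_number)
        if self.slots[slot] is None:
            self.slots[slot] = (launch_number, data)
        else:
            self.overflow.append((launch_number, data))  # collision

    def find(self, launch_number: int):
        entry = self.slots[relative_record(launch_number)]
        if entry is not None and entry[0] == launch_number:
            return entry[1]
        for number, data in self.overflow:  # the overflow area is searched sequentially
            if number == launch_number:
                return data
        return None


print(HASH_KEY)                 # 20
print(relative_record(80_000))  # 4000

launches = LaunchFile()
launches.store(80_000, "launch 80,000")
launches.store(80_005, "launch 80,005")  # 80,005 // 20 is also 4000: a collision
print(launches.find(80_005))    # launch 80,005
print(len(launches.overflow))   # 1
```

With this algorithm, every block of 20 consecutive launch numbers (for example, 80,000 through 80,019) maps to the same relative record. Launches with nearby numbers will therefore collide and end up in the overflow area.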