I warmly recommend that you crank UAC up to the maximum (and put up with the occasional security dialog), run Visual Studio as a nonadministrator (as far as is possible), and think at every stage about the least possible privileges you can grant to your users that will still let them get their work done. Making your app more secure benefits everyone: not just your own users, but everyone who doesn't receive a spam email or a hack attempt because the bad guys couldn't exploit your application.

We've now handled the exception nicely—but is stopping really the best thing we could have done? Would it not be better to log the fact that we were unable to access particular directories, and carry on? Similarly, if we get a DirectoryNotFoundException or FileNotFoundException, wouldn't we want to just carry on in this case? The fact that someone has deleted the directory from underneath us shouldn't matter to us.

If we look again at our sample, it might be better to catch the DirectoryNotFoundException and FileNotFoundException inside the InspectDirectories method to provide a more fine-grained response to errors. Also, if we look at the documentation for FileInfo, we'll see that it may actually throw a base IOException under some circumstances, so we should catch that here, too. And in all cases, we need to catch the security exceptions.

We're relying on LINQ to iterate through the files and folders, which means it's not entirely obvious where to put the exception handling. Example 11-28 shows the code from InspectDirectories that iterates through the folders, to get a list of files. We can't put exception handling code into the middle of that query.

Example 11-28. Iterating through the directories

var allFilePaths = from directory in directoriesToSearch
                   from file in Directory.GetFiles(directory, "*.*", searchOption)
                   select file;

However, we don't have to. The simplest way to solve this is to put the code that gets the directories into a separate method, so we can add exception handling, as Example 11-29 shows.

Example 11-29. Putting exception handling in a helper method

private static IEnumerable<string> GetDirectoryFiles(
    string directory, SearchOption searchOption)
{
    try
    {
        return Directory.GetFiles(directory, "*.*", searchOption);
    }
    catch (DirectoryNotFoundException dnfx)
    {
        Console.WriteLine("Warning: The specified directory was not found");
        Console.WriteLine(dnfx.Message);
    }
    catch (UnauthorizedAccessException uax)
    {
        Console.WriteLine(
            "Warning: You do not have permission to access this directory.");
        Console.WriteLine(uax.Message);
    }
    return Enumerable.Empty<string>();
}

This method defers to Directory.GetFiles, but in the event of one of the expected errors, it displays a warning, and then just returns an empty collection.

There's a problem here when we ask GetFiles to search recursively: if it encounters a problem with even just one directory, the whole operation throws, and you'll end up not looking in any directories. So while Example 11-29 makes a difference only when the user passes multiple directories on the command line, it's not all that useful when using the /sub option. If you wanted to make your error handling more fine-grained still, you could write your own recursive directory search. The GetAllFilesInDirectory example in Chapter 7 shows how to do that.
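To give a flavor of that finer-grained approach, here is a minimal sketch of a recursive search that tolerates failures in individual directories instead of abandoning the whole walk. The method name and the exact error handling are illustrative assumptions, not the Chapter 7 listing; it assumes the same namespaces as the rest of the sample (System, System.IO, System.Collections.Generic).

private static IEnumerable<string> GetFilesTolerantly(string directory)
{
    string[] files = new string[0];
    string[] subdirectories = new string[0];
    try
    {
        // Only look at this level; the recursion below handles subfolders,
        // so each folder gets its own chance to fail without stopping the rest.
        files = Directory.GetFiles(directory, "*.*", SearchOption.TopDirectoryOnly);
        subdirectories = Directory.GetDirectories(directory);
    }
    catch (UnauthorizedAccessException)
    {
        Console.WriteLine("Warning: skipping inaccessible directory " + directory);
    }
    catch (DirectoryNotFoundException)
    {
        Console.WriteLine("Warning: directory no longer exists " + directory);
    }

    foreach (string file in files)
    {
        yield return file;
    }
    foreach (string subdirectory in subdirectories)
    {
        foreach (string file in GetFilesTolerantly(subdirectory))
        {
            yield return file;
        }
    }
}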
If we modify the LINQ query to use GetDirectoryFiles, as shown in Example 11-30, the overall progress will be undisturbed by the error handling.

Example 11-30. Iterating in the face of errors

var allFilePaths = from directory in directoriesToSearch
                   from file in GetDirectoryFiles(directory, searchOption)
                   select file;

And we can use a similar technique for the LINQ query that populates the fileNameGroups—it uses FileInfo, and we need to handle exceptions for that. Example 11-31 iterates through a list of paths, and returns details for each file that it was able to access successfully, displaying errors otherwise.

Example 11-31. Handling exceptions from FileInfo

private static IEnumerable<FileDetails> GetDetails(IEnumerable<string> paths)
{
    foreach (string filePath in paths)
    {
        FileDetails details = null;
        try
        {
            FileInfo info = new FileInfo(filePath);
            details = new FileDetails
            {
                FilePath = filePath,
                FileSize = info.Length
            };
        }
        catch (FileNotFoundException fnfx)
        {
            Console.WriteLine("Warning: The specified file was not found");
            Console.WriteLine(fnfx.Message);
        }
        catch (IOException iox)
        {
            Console.Write("Warning: ");
            Console.WriteLine(iox.Message);
        }
        catch (UnauthorizedAccessException uax)
        {
            Console.WriteLine(
                "Warning: You do not have permission to access this file.");
            Console.WriteLine(uax.Message);
        }
        if (details != null)
        {
            yield return details;
        }
    }
}

We can use this from the final LINQ query in InspectDirectories. Example 11-32 shows the modified query.

Example 11-32. Getting details while tolerating errors

var fileNameGroups = from filePath in allFilePaths
                     let fileNameWithoutPath = Path.GetFileName(filePath)
                     group filePath by fileNameWithoutPath into nameGroup
                     select new FileNameGroup
                     {
                         FileNameWithoutPath = nameGroup.Key,
                         FilesWithThisName = GetDetails(nameGroup).ToList()
                     };

Again, this enables the query to process all accessible items, while reporting errors for any problematic files without having to stop completely. If we compile and run again, we see the following output:

C:\Users\mwa\AppData\Local\dcyx0fv1.hv3
C:\Users\mwa\AppData\Local\0nf2wqwr.y3s
C:\Users\mwa\AppData\Local\kfilxte4.exy
Warning: You do not have permission to access this directory.
Access to the path 'C:\Users\mwa\AppData\Local\r2gl4q1a.ycp\' is denied.
SameNameAndContent.txt
C:\Users\mwa\AppData\Local\dcyx0fv1.hv3
C:\Users\mwa\AppData\Local\0nf2wqwr.y3s
C:\Users\mwa\AppData\Local\kfilxte4.exy

We've dealt cleanly with the directory to which we did not have access, and have continued with the job to a successful conclusion. Now that we've found a few candidate files that may (or may not) be the same, can we actually check to see that they are, in fact, identical, rather than just coincidentally having the same name and length?

Reading Files into Memory

To compare the candidate files, we could load them into memory. The File class offers three likely looking static methods: File.ReadAllBytes, which treats the file as binary, and loads it into a byte array; File.ReadAllText, which treats it as text, and reads it all into a string; and File.ReadAllLines, which again treats it as text, but loads each line into its own string, and returns an array of all the lines. We could even call File.OpenText to obtain a StreamReader (equivalent to the StreamWriter, but for reading data—we'll see this again later in the chapter).

Because we're looking at all file types, not just text, we need to use one of the binary-based methods. File.ReadAllBytes returns a byte[] containing the entire contents of the file. We could then compare the files byte for byte, to see if they are the same.
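For reference, the different helpers look like this in use. This is a minimal sketch with an assumed path, separate from the duplicate-finder sample itself:

string path = @"C:\temp\example.txt";   // hypothetical file

byte[] bytes   = File.ReadAllBytes(path);   // whole file as binary
string text    = File.ReadAllText(path);    // whole file as one string
string[] lines = File.ReadAllLines(path);   // one string per line

using (StreamReader reader = File.OpenText(path))
{
    // Read incrementally instead of loading everything at once.
    string firstLine = reader.ReadLine();
    Console.WriteLine(firstLine);
}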
Here's some code to do that. First, let's update our DisplayMatches function to do the load and compare, as shown by the highlighted lines in Example 11-33.

Example 11-33. Updating DisplayMatches for content comparison

private static void DisplayMatches(
    IEnumerable<FileNameGroup> filesGroupedByName)
{
    var groupsWithMoreThanOneFile = from nameGroup in filesGroupedByName
                                    where nameGroup.FilesWithThisName.Count > 1
                                    select nameGroup;

    foreach (var fileNameGroup in groupsWithMoreThanOneFile)
    {
        // Group the matches by the file size, then select those
        // with more than 1 file of that size.
        var matchesBySize = from match in fileNameGroup.FilesWithThisName
                            group match by match.FileSize into sizeGroup
                            where sizeGroup.Count() > 1
                            select sizeGroup;

        foreach (var matchedBySize in matchesBySize)
        {
            List<FileContents> content = LoadFiles(matchedBySize);
            CompareFiles(content);
        }
    }
}

Notice that we want our LoadFiles function to return a List of FileContents objects. Example 11-34 shows the FileContents class.

Example 11-34. File content information class

internal class FileContents
{
    public string FilePath { get; set; }
    public byte[] Content { get; set; }
}

It just lets us associate the filename with the contents so that we can use it later to display the results. Example 11-35 shows the implementation of LoadFiles, which uses ReadAllBytes to load in the file content.

Example 11-35. Loading binary file content

private static List<FileContents> LoadFiles(IEnumerable<FileDetails> fileList)
{
    var content = new List<FileContents>();
    foreach (FileDetails item in fileList)
    {
        byte[] contents = File.ReadAllBytes(item.FilePath);
        content.Add(new FileContents
        {
            FilePath = item.FilePath,
            Content = contents
        });
    }
    return content;
}

We now need an implementation for CompareFiles, which is shown in Example 11-36.

Example 11-36. CompareFiles method

private static void CompareFiles(List<FileContents> files)
{
    Dictionary<FileContents, List<FileContents>> potentiallyMatched =
        BuildPotentialMatches(files);

    // Now, we're going to look at every byte in each
    CompareBytes(files, potentiallyMatched);

    DisplayResults(files, potentiallyMatched);
}

This isn't exactly the most elegant way of comparing several files. We're building a big dictionary of all of the potential matching combinations, and then weeding out the ones that don't actually match. For large numbers of potential matches of the same size this could get quite inefficient, but we'll not worry about that right now! Example 11-37 shows the function that builds those potential matches.

Example 11-37. Building possible match combinations

private static Dictionary<FileContents, List<FileContents>>
    BuildPotentialMatches(List<FileContents> files)
{
    // Builds a dictionary where the entries look like:
    //   { 0, { 1, 2, 3, 4, ... N } }
    //   { 1, { 2, 3, 4, ... N } }
    //   ...
    //   { N - 1, { N } }
    // where N is one less than the number of files.
    var allCombinations = Enumerable.Range(0, files.Count - 1).ToDictionary(
        x => files[x],
        x => files.Skip(x + 1).ToList());
    return allCombinations;
}

This set of potential matches will be whittled down to the files that really are the same by CompareBytes, which we'll get to momentarily.
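To see the shape of the dictionary this produces, here is a tiny standalone illustration that uses strings in place of FileContents. It is purely illustrative and not part of the sample:

var files = new List<string> { "a.txt", "b.txt", "c.txt" };

var allCombinations = Enumerable.Range(0, files.Count - 1).ToDictionary(
    x => files[x],
    x => files.Skip(x + 1).ToList());

// Prints:
//   a.txt -> b.txt, c.txt
//   b.txt -> c.txt
// Every unordered pair of files appears exactly once.
foreach (var entry in allCombinations)
{
    Console.WriteLine(entry.Key + " -> " + string.Join(", ", entry.Value));
}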
The DisplayResults method, shown in Example 11-38, runs through the matches and displays their names and locations.

Example 11-38. Displaying matches

private static void DisplayResults(
    List<FileContents> files,
    Dictionary<FileContents, List<FileContents>> currentlyMatched)
{
    if (currentlyMatched.Count == 0)
    {
        return;
    }

    var alreadyMatched = new List<FileContents>();

    Console.WriteLine("Matches");
    foreach (var matched in currentlyMatched)
    {
        // Don't do it if we've already matched it previously
        if (alreadyMatched.Contains(matched.Key))
        {
            continue;
        }
        else
        {
            alreadyMatched.Add(matched.Key);
        }
        Console.WriteLine(" ");
        Console.WriteLine(matched.Key.FilePath);
        foreach (var file in matched.Value)
        {
            Console.WriteLine(file.FilePath);
            alreadyMatched.Add(file);
        }
    }
    Console.WriteLine(" ");
}

This leaves the method shown in Example 11-39 that does the bulk of the work, comparing the potentially matching files, byte for byte.

Example 11-39. Byte-for-byte comparison of all potential matches

private static void CompareBytes(
    List<FileContents> files,
    Dictionary<FileContents, List<FileContents>> potentiallyMatched)
{
    // Remember, this only ever gets called with files of equal length.
    int fileLength = files[0].Content.Length;
    var sourceFilesWithNoMatches = new List<FileContents>();
    for (int fileByteOffset = 0; fileByteOffset < fileLength; ++fileByteOffset)
    {
        foreach (var sourceFileEntry in potentiallyMatched)
        {
            byte[] sourceContent = sourceFileEntry.Key.Content;
            for (int otherIndex = 0; otherIndex < sourceFileEntry.Value.Count; ++otherIndex)
            {
                // Check the byte at i in each of the two files, if they don't
                // match, then we remove them from the collection
                byte[] otherContent = sourceFileEntry.Value[otherIndex].Content;
                if (sourceContent[fileByteOffset] != otherContent[fileByteOffset])
                {
                    sourceFileEntry.Value.RemoveAt(otherIndex);
                    otherIndex -= 1;
                    if (sourceFileEntry.Value.Count == 0)
                    {
                        sourceFilesWithNoMatches.Add(sourceFileEntry.Key);
                    }
                }
            }
        }
        foreach (FileContents fileWithNoMatches in sourceFilesWithNoMatches)
        {
            potentiallyMatched.Remove(fileWithNoMatches);
        }
        // Don't bother with the rest of the file if
        // there are no further potential matches
        if (potentiallyMatched.Count == 0)
        {
            break;
        }
        sourceFilesWithNoMatches.Clear();
    }
}

We're going to need to add a test file that differs only in the content. In CreateTestFiles, add another filename that doesn't change as we go round the loop:

string fileSameSizeInAllButDifferentContent =
    "SameNameAndSizeDifferentContent.txt";

Then, inside the loop (at the bottom), we'll create a test file that will be the same length, but varying by only a single byte:

// And now one that is the same length, but with different content
fullPath = Path.Combine(directory, fileSameSizeInAllButDifferentContent);
builder = new StringBuilder();
builder.Append("Now with ");
builder.Append(directoryIndex);
builder.AppendLine(" extra");
CreateFile(fullPath, builder.ToString());

If you build and run, you should see some output like this, showing the one identical file we have in each file location:

C:\Users\mwa\AppData\Local\e33yz4hg.mjp
C:\Users\mwa\AppData\Local\ung2xdgo.k1c
C:\Users\mwa\AppData\Local\jcpagntt.ynd
Warning: You do not have permission to access this directory.
Access to the path 'C:\Users\mwa\AppData\Local\cmoof2kj.ekd\' is denied.
Matches

C:\Users\mwa\AppData\Local\e33yz4hg.mjp\SameNameAndContent.txt
C:\Users\mwa\AppData\Local\ung2xdgo.k1c\SameNameAndContent.txt
C:\Users\mwa\AppData\Local\jcpagntt.ynd\SameNameAndContent.txt
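As an aside, when there are only two candidate files the same check can be written far more compactly with LINQ's SequenceEqual. The dictionary-based CompareBytes above is what copes with whole groups at once; this is just a sketch using the sample's FileContents type:

private static bool ContentsAreIdentical(FileContents first, FileContents second)
{
    // SequenceEqual compares the two byte arrays element by element.
    return first.Content.Length == second.Content.Length &&
           first.Content.SequenceEqual(second.Content);
}

Either way, every candidate file still has to be loaded into memory in full before the comparison starts.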
Needless to say, this isn't exactly very efficient; and it is unlikely to work so well when you get to those DVD rips and massive media repositories. Even your 64-bit machine probably doesn't have quite that much memory available to it.* There's a way to make this more memory-efficient. Instead of loading the file completely into memory, we can take a streaming approach.

* In fact, it is slightly more constrained than that. The .NET Framework limits arrays to 2 GB, and will throw an exception if you try to load a larger file into memory all at once.

Streams

You can think of a stream like one of those old-fashioned news ticker tapes. To write data onto the tape, the bytes (or characters) in the file are typed out, one at a time, on the continuous stream of tape. We can then wind the tape back to the beginning, and start reading it back, character by character, until either we stop or we run off the end of the tape. Or we could give the tape to someone else, and she could do the same. Or we could read, say, 1,000 characters off the tape, and copy them onto another tape which we give to someone to work on, then read the next 1,000, and so on, until we run out of characters.

Once upon a time, we used to store programs and data in exactly this way, on a stream of paper tape with holes punched in it; the basic technology for this was invented in the 19th century. Later, we got magnetic tape, although that was less than useful in machine shops full of electric motors generating magnetic fields, so paper systems (both tape and punched cards) lasted well into the 1980s (when disk systems and other storage technologies became more robust, and much faster). The concept of a machine that reads data items one at a time, and can step forward or backward through that stream, goes back to the very foundations of modern computing. It is one of those highly resilient metaphors that only really falls down in the face of highly parallelized algorithms: a single input stream is often the choke point for scalability in that case.

To illustrate this, let's write a method that's equivalent to File.ReadAllBytes using a stream (see Example 11-40).

Example 11-40. Reading from a stream

private static byte[] ReadAllBytes(string filename)
{
    using (FileStream stream = File.OpenRead(filename))
    {
        long streamLength = stream.Length;
        if (streamLength > 0x7fffffffL)
        {
            throw new InvalidOperationException(
                "Unable to allocate more than 0x7fffffffL bytes " +
                "of memory to read the file");
        }

        // Safe to cast to an int, because
        // we checked for overflow above
        int bytesToRead = (int) stream.Length;
        // This could be a big buffer!
        byte[] bufferToReturn = new byte[bytesToRead];
        // We're going to start at the beginning
        int offsetIntoBuffer = 0;
        while (bytesToRead > 0)
        {
            int bytesRead = stream.Read(bufferToReturn, offsetIntoBuffer, bytesToRead);
            if (bytesRead == 0)
            {
                throw new InvalidOperationException(
                    "We reached the end of file before we expected. " +
                    "Has someone changed the file while we weren't looking?");
            }
            // Read may return fewer bytes than we asked for, so be
            // ready to go round again.
            bytesToRead -= bytesRead;
            offsetIntoBuffer += bytesRead;
        }
        return bufferToReturn;
    }
}
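Example 11-40 still reads the whole file, but the same pattern also supports the memory-friendly comparison promised above: read both candidates in small, equal-sized chunks and stop at the first difference. The following is a rough sketch of that idea, with illustrative method names, not the book's eventual implementation:

private static bool StreamContentsEqual(string firstPath, string secondPath)
{
    const int ChunkSize = 4096;
    byte[] firstBuffer = new byte[ChunkSize];
    byte[] secondBuffer = new byte[ChunkSize];

    using (FileStream first = File.OpenRead(firstPath))
    using (FileStream second = File.OpenRead(secondPath))
    {
        if (first.Length != second.Length)
        {
            return false;
        }
        while (true)
        {
            int firstRead = ReadChunk(first, firstBuffer);
            int secondRead = ReadChunk(second, secondBuffer);
            if (firstRead != secondRead) { return false; }
            if (firstRead == 0) { return true; }   // both streams ended together
            for (int i = 0; i < firstRead; ++i)
            {
                if (firstBuffer[i] != secondBuffer[i]) { return false; }
            }
        }
    }
}

// Keeps calling Read until the buffer is full or the stream ends, because
// Read is allowed to return fewer bytes than we asked for.
private static int ReadChunk(Stream stream, byte[] buffer)
{
    int total = 0;
    while (total < buffer.Length)
    {
        int read = stream.Read(buffer, total, buffer.Length - total);
        if (read == 0) { break; }
        total += read;
    }
    return total;
}

At no point does this hold more than two small buffers in memory, however large the files are.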
The call to File.OpenRead in Example 11-40 creates us an instance of a FileStream. This class derives from the base Stream class, which defines most of the methods and properties we're going to use.

First, we inspect the stream's Length property to determine how many bytes we need to allocate in our result. This is a long, so it can support truly enormous files, even if we can allocate only 2 GB of memory.

If you try using the stream.Length argument as the array size without checking it for size first, it will compile, so you might wonder why we're doing this check. In fact, C# converts the argument to an int first, and if it's too big, you'll get an OverflowException at runtime. By checking the size explicitly, we can provide our own error message.

Then (once we've set up a few variables) we call stream.Read and ask it for all of the data in the stream. It is entitled to give us any number of bytes it likes, up to the number we ask for. It returns the actual number of bytes read, or 0 if we've hit the end of the stream and there's no more data.

A common programming error is to assume that the stream will give you as many bytes as you asked for. Under simple test conditions it usually will if there's enough data. However, streams can and sometimes do return you less in order to give you some data as soon as possible, even when you might think it should be able to give you everything. If you need to read a certain amount before proceeding, you need to write code to keep calling Read until you get what you require, as Example 11-40 does.

Notice that it returns us an int. So even if .NET did let us allocate arrays larger than 2 GB (which it doesn't), a stream can only tell us that it has read 2 GB worth of data at a time, and in fact, the third argument to Read, where we tell it how much we want, is also an int, so 2 GB is the most we can ask for. So while FileStream is able to work with larger files thanks to the 64-bit Length property, it will split the data into more modest chunks of 2 GB or less when we read. But then one of the main reasons for using streams in the first place is to avoid having to deal with all the content in one go, so in practice we tend to work with much smaller chunks in any case.

[...]

... args)
{
    Random random = new Random(3);
    for (int i = 0; i < 5; ++i)
    {
        Console.WriteLine(random.Next());
    }
    Console.ReadKey();
}

If you compile and run, you should see this output:

630327709
1498044246
1857544709
426253993
1203643911

No, I'm not Nostradamus. It is just that the "random" algorithm is actually entirely predictable, given a particular seed. Normally that seed comes from Environment.TickCount, ...

[...]

...second is an object that we can pass in, that will get passed back to us in the callback. This user state object is common to a lot of asynchronous operations, and is used to get information from the calling site to callbacks from the worker thread. It has become less useful in C# with the availability of lambdas and anonymous methods which have access to variables in...

[...]

.../machine option) or current roaming user (by specifying /roaming). So, if you try running this command:

storeadm /MACHINE /LIST

you will see output similar to this (listing the various stores for this machine, along with the evidence that identifies them):

Microsoft (R) .NET Framework Store Admin 4.0.30319.1
Copyright (c) Microsoft Corporation. All rights reserved.

Record ...
[Assembly] ...
[Domain] ...
[Assembly] ...
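For context, the stores that storeadm lists are created when application code writes to isolated storage. A minimal sketch of doing that follows; the method and file names are illustrative, and this is not one of the chapter's listings:

private static void SaveSettingsToIsolatedStorage()
{
    // Requires the System.IO and System.IO.IsolatedStorage namespaces.
    using (IsolatedStorageFile store = IsolatedStorageFile.GetUserStoreForAssembly())
    using (var stream = new IsolatedStorageFileStream("settings.txt", FileMode.Create, store))
    using (var writer = new StreamWriter(stream))
    {
        // Once this has run, storeadm /LIST will show a record for this
        // assembly's store in the current user's scope.
        writer.WriteLine("Example isolated storage content");
    }
}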
[...]

...stream, too. One very common programming task is to copy data from one stream to another. We use this kind of thing all the time—copying data, or concatenating the content of several files into another, for example. (If you want to copy an entire file, you'd use File.Copy, but streams give you the flexibility to concatenate or modify data, or to work with nonfile sources.) Example 11-46 shows how to read data... is just for illustrative purposes—.NET 4 added a new CopyTo method to Stream which does this for you. In practice you'd need Example 11-46 only if you were targeting an older version of the .NET Framework, but it's a good way to see how to write to a stream.

Example 11-46. Copying from one stream to another

private static void WriteTo(Stream source, Stream target, int bufferLength)
{
    bufferLength = Math.Max(100, ...

...which is the path for the file, and a value from the FileMode enumeration. What's the FileMode? Well, it lets us specify exactly what we want done to the file when we open it. Table 11-6 shows the values available.

Table 11-6. FileMode enumeration

FileMode    Purpose
CreateNew   Creates a brand new file. Throws an exception if it already existed.
Create      Creates a new file, deleting any existing file and overwriting...

[...]

...working). We'll look at that approach in Chapter 16; but you don't necessarily have to go that far. You can use the asynchronous mode built into the stream instead. To see how it works, look at Example 11-47.

Example 11-47. Asynchronous file I/O

static void Main(string[] args)
{
    string path = "mytestfile.txt";
    // Create a test file
    using (var file = File.Create(path, 4096, FileOptions.Asynchronous))
    ...

[...]

sourceFileEntry.Key.Content.Seek(0, SeekOrigin.Begin);
// Read 100 bytes from ...
for (int index = 0; index < 100; ++index)
{
    var val = sourceFileEntry.Key.Content.ReadByte();
    if (val < 0) { break; }
    if (index != 0) ...
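Tying the stream-copying and FileMode pieces together, here is a small sketch of appending one file onto the end of another using the .NET 4 CopyTo method. The file names are placeholders:

// Appends everything from source.dat onto the end of target.dat,
// copying through an internal buffer rather than loading it all at once.
using (FileStream source = File.OpenRead("source.dat"))
using (FileStream target = File.Open("target.dat", FileMode.Append, FileAccess.Write))
{
    source.CopyTo(target);
}

FileMode.Append creates the file if it does not already exist, and it has to be paired with write-only access, which is why the three-argument File.Open overload is used here.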