CHAPTER 13 ■ IN SEARCH OF C# CANONICAL FORMS 453

You’ve seen how equality tests on references to objects test identity by default. However, there might be times when an identity equivalence test makes no sense. Consider an immutable object that represents a complex number:

    public class ComplexNumber
    {
        public ComplexNumber( int real, int imaginary )
        {
            this.real = real;
            this.imaginary = imaginary;
        }

        private int real;
        private int imaginary;
    }

    public class EntryPoint
    {
        static void Main()
        {
            ComplexNumber referenceA = new ComplexNumber( 1, 2 );
            ComplexNumber referenceB = new ComplexNumber( 1, 2 );

            System.Console.WriteLine( "Result of Equality is {0}",
                                      referenceA == referenceB );
        }
    }

The output from that code looks like this:

    Result of Equality is False

Figure 13-2 shows the diagram representing the in-memory layout of the references.

Figure 13-2. References to ComplexNumber

This is the expected result based upon the default meaning of equality between references. However, this is hardly intuitive to the user of these ComplexNumber objects. It would make better sense for the comparison of the two references in the diagram to return true because the values of the two objects are the same. To achieve such a result, you need to provide a custom implementation of equality for these objects. I’ll show how to do that shortly, but first, let’s quickly discuss what value equality means.

Value Equality

From the preceding section, it should be obvious what value equality means. Equality of two values is true when the actual values of the fields representing the state of the object or value are equivalent. In the ComplexNumber example from the previous section, value equality is true when the values for the real and imaginary fields are equivalent between two instances of the class. In the CLR, and thus in C#, this is exactly what equality means for value types defined as structs.
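As a minimal sketch of the two defaults side by side, the following example compares a struct and a class that hold the same two fields. PointValue, PointReference, and ValueEqualityDemo are hypothetical names introduced only for this illustration; neither type overrides Equals.

```csharp
using System;

// Hypothetical value type: inherits ValueType.Equals, which compares fields.
public struct PointValue
{
    public PointValue( int x, int y ) { X = x; Y = y; }
    public readonly int X;
    public readonly int Y;
}

// Hypothetical reference type: inherits Object.Equals, which compares identity.
public class PointReference
{
    public PointReference( int x, int y ) { X = x; Y = y; }
    public readonly int X;
    public readonly int Y;
}

public static class ValueEqualityDemo
{
    public static void Main()
    {
        PointValue v1 = new PointValue( 1, 2 );
        PointValue v2 = new PointValue( 1, 2 );
        PointReference r1 = new PointReference( 1, 2 );
        PointReference r2 = new PointReference( 1, 2 );

        Console.WriteLine( v1.Equals(v2) );   // True: same field values
        Console.WriteLine( r1.Equals(r2) );   // False: two distinct objects
    }
}
```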
Value types derive from System.ValueType, and System.ValueType overrides the Object.Equals method. ValueType.Equals sometimes uses reflection to iterate through the fields of the value type while comparing the fields. This generic implementation will work for all value types. However, it is much more efficient if you override the Equals method in your struct types and compare the fields directly. Although using reflection to accomplish this task is a generally applicable approach, it’s very inefficient.

■ Note Before the implementation of ValueType.Equals resorts to using reflection, it makes a couple of quick checks. If the two types being compared are different, it fails the equality. If they are the same type, it first checks to see if the types in the contained fields are simple data types that can be bitwise-compared. If so, the entire type can be bitwise-compared. Failing both of these conditions, the implementation then resorts to using reflection.

Because the default implementation of ValueType.Equals iterates over the value’s contained fields using reflection, it determines the equality of those individual fields by deferring to the implementation of Object.Equals on those objects. Therefore, if your value type contains a reference type field, you might be in for a surprise, depending on the semantics of the Equals method implemented on that reference type. Generally, containing reference types within a value type is not recommended.

Overriding Object.Equals for Reference Types

Many times, you might need to override the meaning of equivalence for an object. You might want equivalence for your reference type to be value equality as opposed to referential equality, or identity. Or, as you’ll see in a later section, you might have a custom value type where you want to override the default Equals method provided by System.ValueType in order to make the operation more efficient.
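For the value type case, a direct field-by-field override might look like the following sketch. GridCell is a hypothetical struct used only for illustration, and the hash formula is an arbitrary choice. Whether the default ValueType.Equals would have needed reflection for this particular layout is a runtime detail; the point is only the shape of the override.

```csharp
using System;

// Hypothetical struct, for illustration only.
public struct GridCell
{
    public GridCell( int x, int y )
    {
        this.x = x;
        this.y = y;
    }

    // Compare the fields directly instead of relying on ValueType.Equals.
    public override bool Equals( object obj )
    {
        if( !(obj is GridCell) )
        {
            return false;   // also covers obj == null
        }

        GridCell that = (GridCell) obj;
        return (this.x == that.x) && (this.y == that.y);
    }

    // Equal values must produce equal hash codes.
    public override int GetHashCode()
    {
        return unchecked( (x * 397) ^ y );
    }

    private readonly int x;
    private readonly int y;
}
```

Note that the argument is still boxed on every call because the parameter type is object.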
No matter what your reason for overriding Equals, you must follow several rules:

• x.Equals(x) == true. This is the reflexive property of equality.
• x.Equals(y) == y.Equals(x). This is the symmetric property of equality.
• x.Equals(y) && y.Equals(z) implies x.Equals(z) == true. This is the transitive property of equality.
• x.Equals(y) must return the same result as long as the internal state of x and y has not changed.
• x.Equals(null) == false for all x that are not null.
• Equals must not throw exceptions.

An Equals implementation should adhere to these hard-and-fast rules. You should follow other suggested guidelines in order to make the Equals implementations on your classes more robust.

As already discussed, the default version of Object.Equals inherited by classes tests for referential equality, otherwise known as identity. However, in cases like the example using ComplexNumber, such a test is not intuitive. It would be natural and expected that instances of such a type are compared on a field-by-field basis. It is for this very reason that you should override Object.Equals for these types of classes that behave with value semantics.
Let’s revisit the ComplexNumber example once again to see how you can do this:

    public class ComplexNumber
    {
        public ComplexNumber( int real, int imaginary )
        {
            this.real = real;
            this.imaginary = imaginary;
        }

        public override bool Equals( object obj )
        {
            ComplexNumber other = obj as ComplexNumber;

            if( other == null )
            {
                return false;
            }

            return (this.real == other.real) &&
                   (this.imaginary == other.imaginary);
        }

        public override int GetHashCode()
        {
            return (int) real ^ (int) imaginary;
        }

        public static bool operator==( ComplexNumber me, ComplexNumber other )
        {
            return Equals( me, other );
        }

        public static bool operator!=( ComplexNumber me, ComplexNumber other )
        {
            // Inequality is the negation of Equals.
            return !Equals( me, other );
        }

        private double real;
        private double imaginary;
    }

    public class EntryPoint
    {
        static void Main()
        {
            ComplexNumber referenceA = new ComplexNumber( 1, 2 );
            ComplexNumber referenceB = new ComplexNumber( 1, 2 );

            System.Console.WriteLine( "Result of Equality is {0}",
                                      referenceA == referenceB );

            // If we really want referential equality.
            System.Console.WriteLine( "Identity of references is {0}",
                                      (object) referenceA == (object) referenceB );
            System.Console.WriteLine( "Identity of references is {0}",
                                      ReferenceEquals(referenceA, referenceB) );
        }
    }

In this example, you can see that the implementation of Equals is pretty straightforward, except that I do have to test some conditions. I must make sure that the object reference I’m comparing to is both not null and does, in fact, reference an instance of ComplexNumber. Once I get that far, I can simply test the fields of the two references to make sure they are equal. You could introduce an optimization and compare this with other in Equals. If they’re referencing the same object, you could return true without comparing the fields. However, comparing the two fields is a trivial amount of work in this case, so I’ll skip the identity test.
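For types where the field comparison is expensive, the identity shortcut mentioned above could be sketched as follows. This is an illustrative variant rather than the listing from the text: it omits the operators, makes the fields readonly, and builds the hash from the fields’ own hash codes, all of which are choices made for this sketch.

```csharp
using System;

public sealed class ComplexNumber
{
    public ComplexNumber( double real, double imaginary )
    {
        this.real = real;
        this.imaginary = imaginary;
    }

    public override bool Equals( object obj )
    {
        // Identity shortcut: an instance is always equal to itself.
        if( ReferenceEquals(this, obj) )
        {
            return true;
        }

        ComplexNumber other = obj as ComplexNumber;
        if( ReferenceEquals(other, null) )
        {
            return false;   // obj is null or not a ComplexNumber
        }

        return (this.real == other.real) &&
               (this.imaginary == other.imaginary);
    }

    public override int GetHashCode()
    {
        return real.GetHashCode() ^ imaginary.GetHashCode();
    }

    private readonly double real;
    private readonly double imaginary;
}
```

The shortcut pays off only when the comparison that follows it costs noticeably more than one reference check.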
In the majority of cases, you won’t need to override Object.Equals for your reference type objects. It is recommended that your objects treat equivalence using identity comparisons, which is what you get for free from Object.Equals. However, there are times when it makes sense to override Equals for an object. For example, if your object represents something that naturally feels like a value and is immutable, such as a complex number or the System.String class, then it could very well make sense to override Equals in order to give that object’s implementation of Equals() value equality semantics. In many cases, when overriding virtual methods in derived classes, such as Object.Equals, it makes sense to call the base class implementation at some point. However, if your object derives directly from System.Object, it makes no sense to do this. This is because Object.Equals likely carries a different semantic meaning from the semantics of your override. Remember, the only reason to override Equals for objects is to change the semantic meaning from identity to value equality. Also, you don’t want to mix the two semantics together. But there’s an ugly twist to this story. You do need to call the base class version of Equals if your class derives from a class other than System.Object and that other class does override Equals to provide the same semantic meaning you intend in your derived type. This is because the most likely reason a base class overrode Object.Equals is to switch to value semantics. This means that you must have intimate knowledge of your base class if you plan on overriding Object.Equals, so that you will know whether to call the base version. That’s the ugly truth about overriding Object.Equals for reference types. Sometimes, even when you’re dealing with reference types, you really do want to test for referential equality, no matter what. 
You cannot always rely on the Equals method for the object to determine the referential equality, so you must use other means because the method can be overridden as in the ComplexNumber example. Thankfully, you have two ways to handle this job, and you can see them both at the end of the Main method in the previous code sample. The C# compiler guarantees that if you apply the == operator to two references of type Object, you will always get back referential equality. Also, System.Object supplies a static method named ReferenceEquals that takes two reference parameters and returns true if the identity test holds true. Either way you choose to go, the result is the same.

If you do change the semantic meaning of Equals for an object, it is best to document this fact clearly for the clients of your object. If you override Equals for a class, I would strongly recommend that you tag its semantic meaning with a custom attribute, similar to the technique introduced for ICloneable implementations previously. This way, people who derive from your class and want to change the semantic meaning of Equals can quickly determine if they should call your implementation in the process. For maximum efficiency, the custom attribute should serve only a documentation purpose. Although it’s possible to look for such an attribute at run time, it would be very inefficient.

■ Note You should never throw exceptions from an implementation of Object.Equals. Instead of throwing an exception, return false as the result.

Throughout this entire discussion, I have purposely avoided talking about the equality operators because it is beneficial to consider them as an extra layer in addition to Object.Equals. Support of operator overloading is not a requirement for languages to be CLS-compliant. Therefore, not all languages that target the CLR support them thoroughly.
Visual Basic is one language that has taken a while to support operator overloading, and it only started supporting it fully in Visual Basic 2005. Visual Basic .NET 2003 supports calling overloaded operators on objects defined in languages that support overloaded operators, but they must be called through the special function name generated for the operator. For example, operator== is implemented with the name op_Equality in the generated IL code. The best approach is to implement Object.Equals as appropriate and base any operator== or operator!= implementations on Equals while only providing them as a convenience for languages that support them.

■ Note Consider implementing IEquatable<T> on your type to get a type-safe version of Equals. This is especially important for value types, because type-specific versions of methods avoid unnecessary boxing.

If You Override Equals, Override GetHashCode Too

GetHashCode is called when objects are used as keys of a hash table. When a hash table searches for an entry after being given a key to look for, it asks the key for its hash code and then uses that to identify which hash bucket the key lives in. Once it finds the bucket, it can then see if that key is in the bucket. Theoretically, the search for the bucket should be quick, and the buckets should have very few keys in them. This occurs if your GetHashCode method returns a reasonably unique value for instances of your object that support value equivalence semantics.

Given the previous discussion, you can see that it would be very bad if your hash code algorithm could return a different value between two instances that contain values that are equivalent. In such a case, the hash table might fail to find the bucket your key is in. For this reason, it is imperative that you override GetHashCode if you override Equals for an object. In fact, if you override Equals and not GetHashCode, the C# compiler will let you know about it with a friendly warning.
And because we’re all diligent with regard to building our release code with zero warnings, we should take the compiler’s word seriously.

■ Note The previous discussion should be plenty of evidence that any type used as a hash table key should be immutable. After all, the GetHashCode value is normally computed based upon the state of the object itself. If that state changes, the GetHashCode result will likely change with it.

GetHashCode implementations should adhere to the following rules:

• If, for two instances, x.Equals(y) is true, then x.GetHashCode() == y.GetHashCode().
• Hash codes generated by GetHashCode need not be unique.
• GetHashCode is not permitted to throw exceptions.

If two instances return the same hash code value, they must be further compared with Equals to determine whether they’re equivalent. Incidentally, if your GetHashCode method is very efficient, you can base the inequality code path of your operator!= and operator== implementations on it because different hash codes for objects of the same type imply inequality. Implementing the operators this way can be more efficient in some cases, but it all depends on the efficiency of your GetHashCode implementation and the complexity of your Equals method. In some cases, when using this technique, the calls to the operators could be less efficient than just calling Equals, but in other cases, they can be remarkably more efficient.

For example, consider an object that models a multidimensional point in space. Suppose that the number of dimensions (rank) of this point could easily reach into the hundreds. Internally, you could represent the dimensions of the point by using an array of integers. Say you want to implement the GetHashCode method by computing a CRC32 on the dimension points in the array. This also implies that this Point type is immutable.
This GetHashCode call could potentially be expensive if you compute the CRC32 each time it is called. Therefore, it might be wise to precompute the hash and store it in the object. In such a case, you could write the equality operators as shown in the following code:

    sealed public class Point
    {
        // other methods removed for clarity

        public override bool Equals( object other )
        {
            bool result = false;
            Point that = other as Point;

            // Cast to object so that this null test is a plain reference
            // comparison and does not call the overloaded operator!= below.
            if( (object) that != null )
            {
                if( this.coordinates.Length != that.coordinates.Length )
                {
                    result = false;
                }
                else
                {
                    result = true;
                    for( int i = 0; i < this.coordinates.Length; ++i )
                    {
                        if( this.coordinates[i] != that.coordinates[i] )
                        {
                            result = false;
                            break;
                        }
                    }
                }
            }

            return result;
        }

        public override int GetHashCode()
        {
            return precomputedHash;
        }

        // Both operators assume non-null operands; a null operand
        // causes a NullReferenceException in the GetHashCode call.
        public static bool operator ==( Point pt1, Point pt2 )
        {
            if( pt1.GetHashCode() != pt2.GetHashCode() )
            {
                return false;
            }
            else
            {
                return Object.Equals( pt1, pt2 );
            }
        }

        public static bool operator !=( Point pt1, Point pt2 )
        {
            if( pt1.GetHashCode() != pt2.GetHashCode() )
            {
                return true;
            }
            else
            {
                return !Object.Equals( pt1, pt2 );
            }
        }

        private int[] coordinates;
        private int precomputedHash;
    }

In this example, as long as the precomputed hash is sufficiently unique, the overloaded operators will execute quickly in some cases. In the worst case, one more comparison between two integers (the hash values) is executed along with the function calls to acquire them. If the call to Equals is expensive, then this optimization will return some gains on a lot of the comparisons. If the call to Equals is not expensive, then this technique could add overhead and make the code less efficient. It’s best to apply the old adage that premature optimization is poor optimization. You should only apply such an optimization after a profiler has pointed you in this direction and if you’re sure it will help.
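A variant of the same idea that tolerates null operands could be sketched as follows. The constructor and the multiply-and-add hash are assumptions made so the example is self-contained; the hash is a simple stand-in computed once at construction, not a CRC32.

```csharp
using System;

public sealed class Point
{
    public Point( params int[] coordinates )
    {
        // Copy the caller's array so the instance stays immutable.
        this.coordinates = (int[]) coordinates.Clone();

        // Stand-in hash, computed once. Not a CRC32.
        int hash = 17;
        foreach( int c in this.coordinates )
        {
            hash = unchecked( hash * 31 + c );
        }
        this.precomputedHash = hash;
    }

    public override bool Equals( object other )
    {
        Point that = other as Point;
        if( ReferenceEquals(that, null) ||
            this.coordinates.Length != that.coordinates.Length )
        {
            return false;
        }

        for( int i = 0; i < this.coordinates.Length; ++i )
        {
            if( this.coordinates[i] != that.coordinates[i] )
            {
                return false;
            }
        }
        return true;
    }

    public override int GetHashCode()
    {
        return precomputedHash;
    }

    public static bool operator ==( Point pt1, Point pt2 )
    {
        // Handle null before touching the stored hash.
        if( ReferenceEquals(pt1, null) || ReferenceEquals(pt2, null) )
        {
            return ReferenceEquals( pt1, pt2 );
        }

        // Different hashes imply inequality; equal hashes still need Equals.
        return pt1.precomputedHash == pt2.precomputedHash &&
               pt1.Equals( pt2 );
    }

    public static bool operator !=( Point pt1, Point pt2 )
    {
        return !(pt1 == pt2);
    }

    private readonly int[] coordinates;
    private readonly int precomputedHash;
}
```

Defining operator!= as the negation of operator== keeps the two from drifting apart, at the cost of one extra call.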
Object.GetHashCode exists because the developers of the Standard Library felt it would be convenient to be able to use any object as a key to a hash table. The fact is, not all objects are good candidates for hash keys. Usually, it’s best to use immutable types as hash keys. A good example of an immutable type in the Standard Library is System.String. Once such an object is created, you can never change it. Therefore, calling GetHashCode on a string instance is guaranteed to always return the same value for the same string instance. It becomes more difficult to generate hash codes for objects that are mutable. In those cases, it’s best to base your GetHashCode implementation on calculations performed on immutable fields inside the mutable object. Detailing algorithms for generating hash codes is outside the scope of this book. I recommend that you reference Donald E. Knuth’s The Art of Computer Programming, Volume 3: Sorting and Searching, Second Edition (Boston: Addison-Wesley Professional, 1998). For the sake of example, suppose that you want to implement GetHashCode for a ComplexNumber type. 
One solution is to compute the hash based on the magnitude of the complex number, as in the following example:

    using System;

    public sealed class ComplexNumber
    {
        public ComplexNumber( double real, double imaginary )
        {
            this.real = real;
            this.imaginary = imaginary;
        }

        public override bool Equals( object other )
        {
            bool result = false;
            ComplexNumber that = other as ComplexNumber;
            if( that != null )
            {
                result = (this.real == that.real) &&
                         (this.imaginary == that.imaginary);
            }

            return result;
        }

        public override int GetHashCode()
        {
            // Magnitude is sqrt(real^2 + imaginary^2).
            return (int) Math.Sqrt( Math.Pow(this.real, 2) +
                                    Math.Pow(this.imaginary, 2) );
        }

        public static bool operator ==( ComplexNumber num1, ComplexNumber num2 )
        {
            return Object.Equals(num1, num2);
        }

        public static bool operator !=( ComplexNumber num1, ComplexNumber num2 )
        {
            return !Object.Equals(num1, num2);
        }

        // Other methods removed for clarity

        private readonly double real;
        private readonly double imaginary;
    }

The GetHashCode algorithm is not meant as a highly efficient example. In fact, it’s not efficient at all because it is based on nontrivial floating-point mathematical routines. Also, the rounding could potentially cause many complex numbers to fall within the same bucket. In that case, the efficiency of the hash table would degrade. I’ll leave a more efficient algorithm as an exercise to the reader.

Notice that I don’t use the GetHashCode method to implement operator!= because of the efficiency concerns. But more importantly, I rely on the static Object.Equals method to compare them for equality. This handy method checks the references for null before calling the instance Equals method, saving you from having to do that. Had I used GetHashCode to implement operator!=, I would have had to check the references for null values before calling GetHashCode on them. Also, note that both fields used to calculate the hash code are immutable.
Thus, this instance of this object will always return the same hash code value as long as it lives. In fact, you might consider caching the hash code value once you compute it the first time to gain greater efficiency.

Does the Object Support Ordering?

Sometimes you’ll design a class for objects that are meant to be stored within a collection. When the objects in that collection need to be sorted, such as by calling Sort on an ArrayList, you need a well-defined mechanism for comparing two objects. The pattern that the Base Class Library designers provided hinges on implementing the following IComparable interface (see footnote 5):

    public interface IComparable
    {
        int CompareTo( object obj );
    }

Once again, this is an interface that contains only one method. Thankfully, IComparable doesn’t contain the same depth of pitfalls as ICloneable and IDisposable. The CompareTo method is fairly straightforward. It can return a value that is either positive, negative, or zero. Table 13-1 lists the return value meanings.

Table 13-1. Meaning of Return Values of IComparable.CompareTo

    CompareTo Return Value    Meaning
    Positive                  this > obj
    Zero                      this == obj
    Negative                  this < obj

You should be aware of a few points when implementing IComparable.CompareTo. First, notice that the return value specification says nothing about the actual value of the returned integer. It only defines the sign of the return values. So, to indicate a situation where this is less than obj, you can simply return -1. When your object represents a value that carries an integer meaning, an efficient way to compute the comparison value is by subtracting one from the other. It can be tempting to treat the return value as an indication of the degree of inequality. Although this is possible, I don’t recommend it because relying on such an implementation is outside the bounds of the IComparable specification, and not all objects can be expected to do that.
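The sign-only contract can be sketched as follows. Priority is a hypothetical type introduced for this illustration. Returning 1 for a null argument and throwing ArgumentException for an argument of the wrong type follow the usual .NET convention for IComparable, and the comparison is delegated to Int32.CompareTo instead of being computed by subtraction.

```csharp
using System;

// Hypothetical type, for illustration only.
public sealed class Priority : IComparable
{
    public Priority( int level )
    {
        this.level = level;
    }

    public int CompareTo( object obj )
    {
        if( obj == null )
        {
            return 1;   // by convention, any instance sorts after null
        }

        Priority that = obj as Priority;
        if( that == null )
        {
            throw new ArgumentException( "Object is not a Priority", "obj" );
        }

        // Only the sign of the result is meaningful to callers.
        // Delegating to Int32.CompareTo avoids the overflow that
        // this.level - that.level can produce for extreme values.
        return this.level.CompareTo( that.level );
    }

    private readonly int level;
}
```

With subtraction, comparing int.MinValue against 1 would wrap around to a positive number in an unchecked context and report the wrong order; the delegated call reports a negative result.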
Keep in mind that the subtraction operation on integers might incur an overflow. If you want to avoid that situation, you can simply defer to the IComparable.CompareTo implemented by the integer type for greater safety.

Second, keep in mind that CompareTo provides no return value definition for when two objects cannot be compared. Because the parameter type to CompareTo is System.Object, you could easily attempt to compare an Apple instance to an Orange instance. In such a case, there is no comparison, and you’re forced to indicate such by throwing an ArgumentException object.

Finally, semantically, the IComparable interface is a superset of Object.Equals. If you derive from an object that overrides Equals and implements IComparable, you’re wise to override Equals and reimplement IComparable in your derived class, or do neither. You want to make certain that your implementation of Equals and CompareTo are aligned with each other.

Footnote 5: You should consider using the generic IComparable<T> interface, as shown in Chapter 11, for greater type safety.

Based upon all of this information, a compliant IComparable interface should adhere to the following rules:

• x.CompareTo(x) must return 0. This is the reflexive property.
• If x.CompareTo(y) == 0, then y.CompareTo(x) must equal 0. This is the symmetric property.
• If x.CompareTo(y) == 0, and y.CompareTo(z) == 0, then x.CompareTo(z) must equal 0. This is the transitive property.
• If x.CompareTo(y) returns a value other than 0, then y.CompareTo(x) must return a non-0 value of the opposite sign. In other terms, this statement says that if x < y, then y > x, or if x > y, then y < x.
• If x.CompareTo(y) returns a value other than 0, and y.CompareTo(z) returns a value other than 0 with the same sign as the first, then x.CompareTo(z) is required to return a non-0 value of the same sign as the previous two. In other terms, this statement says that if x < y and y < z, then x < z, or if x > y and y > z, then x > z.

The following code shows a modified form of the ComplexNumber class that implements IComparable and consolidates some code at the same time in private helper methods:

    using System;

    public sealed class ComplexNumber : IComparable
    {
        public ComplexNumber( double real, double imaginary )
        {
            this.real = real;
            this.imaginary = imaginary;
        }

        public override bool Equals( object other )
        {
            bool result = false;
            ComplexNumber that = other as ComplexNumber;
            if( that != null )
            {
                result = InternalEquals( that );
            }

            return result;
        }

        public override int GetHashCode()
        {
            return (int) this.Magnitude;
        }

        public static bool operator ==( ComplexNumber num1, ComplexNumber num2 )
        {
            return Object.Equals(num1, num2);
        }

        public static bool operator !=( ComplexNumber num1, ComplexNumber num2 )
        {
        [...]

... field, read about the Pimpl Idiom in Herb Sutter’s Exceptional C++: 47 Engineering Puzzles, Programming Problems, and Exception-Safety Solutions (Boston: Addison-Wesley Professional, 1999).

[...]

Notice that I’ve introduced a shim class named ConstComplexNumber. When a method wants to accept a ComplexNumber object but guarantee that it won’t change that parameter, ...

... of the foreach statement, a variable emp of type Employee references the current item in the collection during iteration. One of the rules enforced by the C# compiler for the collection is that it must implement a public method named GetEnumerator, which returns a type used to enumerate the items in the collection. This method is typically implemented as ...
... type. In reality, this efficiency hit is very minor with managed reference types in C# unless you’re doing it many times within a loop. In some situations, the C# compiler will generate much more efficient code if you provide a type-safe implementation of a well-defined method. Consider this typical foreach statement in C#:

    foreach( Employee emp in collection )
    {
        // Do Something
    }

Quite simply, the code ...

... System.Convert class.

■ Note C# offers conversion operators that allow you to do essentially the same thing you can do by implementing IConvertible. However, C# implicit and explicit conversion operators aren’t CLS-compliant. Therefore, not every language that consumes your C# code might call them to do the conversion. It is recommended that you not rely on them exclusively to handle conversion. Of course, if your project ...

... across all types. However, it can come at a price. C++ and C# are both strongly typed languages where every variable is declared with a type. Along with this comes type safety, which the compiler supplies to help you avoid errors. For example, it keeps you from assigning an instance of class Apple from an instance of class MonkeyWrench. However, C# (and C++) allows you to work in a less-type-safe way. You ...

... specifiers, as described under “Standard Numeric Format Strings” in the MSDN library. In a nutshell, the format string consists of a single letter specifying the format, and then an optional number between 0 and 99 that declares the precision. For example, you can specify that a double be output as a five-significant-digit floating-point number with F5. Not all types are required to support all formats except for ...

... specific formatting needs for your objects. However, that power comes at an implementation cost. Implementing IFormattable.ToString can be a very detail-oriented task that takes a lot of time and attentiveness.

Is the Object Convertible?
The C# compiler provides support for converting instances of simple built-in value types, such as int and long, from one type ...

            ... enumerator.Current;
            }
        }

        public bool MoveNext()
        {
            return enumerator.MoveNext();
        }

        public void Reset()
        {
            enumerator.Reset();
        }

        private IEnumerator enumerator;
    }

    public ...

Footnote 6: I use the word often here because the iterators could be reverse iterators. In Chapter 9, I show how you can easily create reverse and bidirectional iterators that implement IEnumerator.

... that there are many ways to convert one type to another in C#, and in fact, there are. However, the general rule of thumb is to rely on System.Convert when casting won’t do the trick. Moreover, your custom objects, such as the ComplexNumber class, should implement IConvertible so they can work in concert with the System.Convert class.

■ Note C# offers conversion operators that allow you to do essentially ...

... value types should be discouraged. If the field is a value type that requires disposal, you cannot guarantee that disposal happens (see footnote 9).

Footnote 9: To avoid this complex ball of yarn, many of the value types defined by the .NET Framework are, in fact, immutable.

Value types and reference types do share many implementation idioms. For example, it makes sense for both to consider ...