Object-oriented Databases

Lecture Notes: Scientific Databases, Prof. Gaston Gonnet (2012)
Stephanie Fingerhuth and Thomas Tschager

Object-oriented databases (OODBs) can be viewed as an extension of relational databases (RDBs): the attributes of the database can be objects which are defined in an object-oriented language (OOL). In contrast to RDBs, OODBs are thus not completely isolated and portable, but are tightly connected to their OOL. Objectivity (C++, C#, Java, Python, Smalltalk and XML), ObjectStore (C++, Java, .NET), and O2 (C++) are the most prominent products. The field of OODBs is in general much smaller than the RDB field.

OODBs vs. RDBs

OODBs are data management systems that store data in tuples of attributes organized in relations (Figure 1). They are thus similar to RDBs (previous lecture), but there are also major differences.

The first, distinctive difference is that the attributes of the OODB can be objects, i.e. an attribute need not contain a single entry, but can consist of a collection of bits (see Example 1 and the subsection on objects). Objects are defined in an OOL like Python, Java, C++ or a language unique to the database.

    GRADES                               STUDENT
    LectureName  ECTS  S-ID  Grade       S-ID  S-Name  S-Birth
    SDB          4     001   5.75        001   Alice   (DATE)
    SDB          4     002   5.5         002   Bob     (DATE)

    GRADES.S-ID  n:1  STUDENT.S-ID
    DATE: Day, Month, Year, Age()

Figure 1: OODBs store data in tuples (rows) of attributes (columns) organized in relations (tables). Unlike in RDBs, the attributes can be objects defined by the underlying OOL. In the example, DATE is an object that stores the integers Day, Month and Year as well as the method Age().

Example 1 (OODBs are RDBs where attributes can be objects). We want to store the name and age of all students. In RDBs, the age of a student is stored, whereas in OODBs it can be computed by a complex object storing only the birthday and a method to compute the age:

• Using an RDB, the table Student contains the tuple Student.S-ID (integer), Student.S-Name (string), Student.S-Birth (date), and Student.Age (integer).
• Using an OODB, the table Student contains the tuple Student.S-ID (integer), Student.S-Name (string), and Student.S-Birth, where Student.S-Birth is an object storing the integers Day, Month, Year and the method Age() (Figure 1).

A second distinction from RDBs is that there exists no established standard. The Object Data Management Group (ODMG, http://www.odbms.org/ODMG/) defined the Object Data Management Standard ODMG 3.0 (2000). A major component of this standard is the Object Query Language (OQL), a non-procedural language similar to SQL for RDBs. OQL is based on SQL (see the similarity to SQL in Example 2); it supports update and query functionalities. But unlike SQL, OQL is not an established standard; it has never been fully implemented. This is mainly because of the tight connection between OODBs and programming languages: an OODB depends closely on an object-oriented language. This causes unavoidable differences between the various OODBs.

Example 2. Imagine the university using the database introduced above wants to find suitable candidates for a scholarship. The criteria that have to be met are

• grades that are on average better than 5.5 and
• being less than 25 years old.

Suitable candidates can be found by querying the tables Student and Grades (Figure 1) using OQL:

    SELECT Student.S-ID, AVG(Grades.Grade)
    FROM (SELECT S-ID, S-Name
          FROM Student
          WHERE S-Birth.Age() < 25) AS Student,
         Grades
    WHERE Grades.S-ID = Student.S-ID
    GROUP BY Student.S-ID
    HAVING AVG(Grades.Grade) > 5.5

where S-Birth.Age() is a method computing the age of a student from the birthday (it is part of the object that forms the attribute S-Birth in Student), AVG(Grades.Grade) computes the average of all grades obtained by the same student, and the HAVING clause keeps only the students whose average is better than 5.5. The SQL statement would look similar, but we would not be able to make use of a method to compute the age. Therefore, we would need a more complex SQL statement, e.g. using date arithmetic or the function DATEDIFF.

Although OODBs are conceptually ideal for scientific databases (SDBs), the main disadvantages are:

• not as frequently used as RDBs:
fewer tools, libraries and support available
• fewer good implementations
• restrictions imposed by use of a specific OOL
• research-your-own is very popular

Furthermore, RDBs are converging towards OODBs, which makes OODBs more and more obsolete. The convergence is possible through object-relational mapping (ORM). The key idea is to store objects defined in the object-oriented language in an RDB using a group of attributes, such that the properties and relationships are conserved. The object can be restored with all functionalities. Therefore, this mapping creates a virtual OODB using an RDB. However, this approach has some conceptual difficulties (object-relational impedance mismatch), which arise from the different concepts of RDBs (relational algebra) and OODBs (object orientation).

Glue language

All parts outside the database are programmed in the glue language. These parts include for example backups, archives, auditing, computation, validation, output production and filters (see figure "General Picture/Flow of the SDB" from the first lecture). One glue language is thus always needed to connect an OODB to its outside. In addition, there should never be more than a single glue language, as the glue language used should be the OOL the OODB is based on. Thus we arrive at:

Maxim 3: Oh No!
#(glue languages) = 1

Objects in SDBs

Objects in the context of SDBs are named collections of bits which can include attributes or any other components. They are understood by the system without additional knowledge. Understanding means knowing the following:

• the type (rich types, e.g. the object DATE in Figure 1)
• the size
• the values
• the validity

Unlike in RDBs, a matrix can be stored in an OODB in such a way that the system knows all properties of the matrix (e.g. the number of rows and columns, the values and the validity).

Objects are fully described by blocks, selectors and constructors. Blocks can for example be numbers, strings, object references, or closed entities we do not desire to look inside (pictures, movies, PDFs etc.). Constructors, on the other hand, are functions that are called when an object is initialized. They guarantee, for example, that the object is valid (see Table 1 for an example of validity rules).

    Name   Type     Validity rules
    Day    integer  0 < Day ≤ 31
                    (Month = 2 ∧ Year mod 4 ≠ 0) ⇒ Day ≤ 28
    Month  integer  0 < Month ≤ 12
    Year   integer  0 < Year ≤ today().getYear()

Table 1: The fields of the object DATE (see Figure 1) with some basic validity rules. The validity rules are checked by the constructor and on every update event.

It is also possible that objects themselves contain objects. These subordinate objects can be either included or referenced. Here, an included object is a direct part of the superordinate object, whereas a referenced object is an object that also exists outside of the superordinate object. Each attribute has a name, a type and validity rules which are enforced by the object constructor.

Another important feature of objects is that they allow the addition of arbitrary fields that can also be empty. Thus a value can be added to the fields of some objects of a class without having to update all of them. This is a very desirable property for databases, as not every possible evolution can be foreseen in the process of designing a database.
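The DATE object with its validity rules can be sketched in Python. This is only an illustration under stated assumptions: the class name `Date`, the helper `_validate` and the lower-case field names are hypothetical, the leap-year rule is reduced to a mod-4 test, and no particular OODB product works exactly this way. The constructor and every later update run the same checks, the age is computed on the fly, and an arbitrary extra field can be attached to a single object.

```python
from datetime import date


class Date:
    """Hypothetical DATE object: three integer fields plus an age() method."""

    _FIELDS = ("day", "month", "year")

    def __init__(self, day, month, year):
        # Store all three fields first, then validate the complete object once.
        for name, value in zip(self._FIELDS, (day, month, year)):
            object.__setattr__(self, name, value)
        self._validate()

    def __setattr__(self, name, value):
        # Every update event re-runs the validity rules; roll back on failure.
        missing = object()
        old = getattr(self, name, missing)
        object.__setattr__(self, name, value)
        try:
            self._validate()
        except ValueError:
            if old is missing:
                object.__delattr__(self, name)
            else:
                object.__setattr__(self, name, old)
            raise

    def _validate(self):
        d, m, y = self.day, self.month, self.year
        if not all(isinstance(v, int) for v in (d, m, y)):
            raise ValueError("day, month and year must be integers")
        if not 0 < d <= 31:
            raise ValueError("day out of range")
        if not 0 < m <= 12:
            raise ValueError("month out of range")
        if not 0 < y <= date.today().year:
            raise ValueError("year out of range")
        # Simplified leap-year rule (assumption of this sketch): mod 4 only.
        if m == 2 and y % 4 != 0 and d > 28:
            raise ValueError("February has at most 28 days in this year")

    def age(self, today=None):
        """Age in full years, computed on the fly from the stored birthday."""
        today = today or date.today()
        had_birthday = (today.month, today.day) >= (self.month, self.day)
        return today.year - self.year - (0 if had_birthday else 1)


birth = Date(29, 2, 1992)
print(birth.age(today=date(2012, 10, 20)))
birth.note = "exchange student"  # arbitrary extra field on one object only
```

An invalid update such as `birth.month = 13` raises `ValueError` and leaves the stored month unchanged, which mirrors the requirement that an object stays valid after every update event.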
Adding fields after the fact is more difficult using RDBs: even though ORMs nowadays allow for the migration of database schemas, changing the design of the relations in RDBs remains inconvenient.

Normal Forms

All attributes or objects in an OODB have to fulfil the normal forms (as known from RDBs, see RDB lecture notes). A transitive chain of dependencies A → B → C, where → means "functionally determines", is a violation of the normal forms. The 12 rules of Codd (see Appendix 1) also apply to OODBs if adapted.

Normal forms (see also previous lecture on RDBs): The normal forms (NF) were defined in RDB theory in order to avoid anomalies after insert, update or delete events. The first three normal forms were formulated by E. F. Codd in the early seventies:

• 1NF defines what qualifies as a relation: every attribute has to contain atomic values.
• 2NF: No non-prime attribute is dependent on any proper subset of any candidate key of the table.
• 3NF: Every non-prime attribute is non-transitively dependent on every candidate key.

Codd, E. F. A Relational Model of Data for Large Shared Data Banks. Communications of the ACM 13 (6): 377-387, June 1970.
Codd, E. F. Further Normalization of the Data Base Relational Model. IBM Research Report RJ909, August 1971.

Example 3 (Violation of normal forms). The relation GRADES in Figure 1 violates the second normal form: the candidate key is the set {LectureName, S-ID}, but the non-prime attribute ECTS depends only on LectureName. This database design can cause various anomalies: a change of the credit points for a lecture causes update anomalies if not all corresponding rows are updated. Moreover, a lecture can only be added if the grade for at least one student is available. Figure 2 shows an alternative design, which is in second normal form.

    GRADES                         STUDENT
    LectureName  S-ID  Grade       S-ID  S-Name  S-Birth
    SDB          001   5.75        001   Alice   (DATE)
    SDB          002   5.5         002   Bob     (DATE)

    LECTURES
    LectureName  ECTS
    SDB          4

    GRADES.S-ID  n:1  STUDENT.S-ID
    LECTURES.LectureName  1:n  GRADES.LectureName
    DATE: Day, Month, Year, Age()

Figure 2: Table Grades is in second normal form (in contrast to Figure 1), as the only non-prime attribute, Grade, depends neither on LectureName alone nor on S-ID alone.

As mentioned above, objects have names. Names can be both URLs or URIs, and the object names (onames) are used to reference the object within the DB. The names can for example be composed of Type:Locator:ID, where Locator is in principle a query, a name (of a file or database) or a computation (see for example Student.S-Birth in Figure 1).

Finally, objects have selectors, which allow parts of individual objects to be selected. They also supply default values for attributes that are not defined. Furthermore, selectors can compute results from the objects.

Operations in OODBs

Although operations in OODBs are dependent on the underlying OOL (see the comparison of frequently used OOLs in Appendix 2), they share some common characteristics of object-oriented languages. First of all, operators can be polymorphic. Polymorphism is the notion of using a common operator for various types of inputs. For example, a + b, a · b, a^b and a ∧ b will adapt to the type of operands they are applied to. This often means that an operator has different implementations for various types of inputs, a concept called operator overloading (see Example 4). The implementation used is chosen based on the type of the inputs, i.e. there are different implementations for adding up two integers or two matrices; the result will be valid in both cases.

Example 4 (Operator overloading and computations on the fly). In Darwin the operator + can be used to add up integers

    > a := 5 + 7;
    a := 12

but also to add random numbers to an existing Stat() data structure

    > c := Stat('one million [0,1] random numbers'):
    > to 1e6 do c + Rand() od: print(c);
    one million [0,1] random numbers: number of sample points=1e+06
     mean = 0.49983 +- 0.00057
     variance = 0.08332 +- 0.00015
     skewness=0.00086821, excess=-1.19832
     minimum=1.43307e-06, maximum=0.999997

The statistical information provided by the Stat() structure c is hereby not
stored, but actually computed on the fly. This can be seen if the union of another Stat() structure e with c is printed:

    > e := Stat('another million [0,1] random numbers'):
    > to 1e6 do e + Rand() od:
    > print(c union e);
    one million [0,1] random numbers and another million [0,1] random numbers: number of sample points=2e+06
     mean = 0.50006 +- 0.00040
     variance = 0.08332 +- 0.00010
     skewness=0.000104374, excess=-1.19933
     minimum=3.62886e-07, maximum=1

Here c union e is not the union of the fixed statistical values of c and e, but the statistical values of the union of c and e.

Objects in OOLs can also have methods, which can be accessed like attributes. The result of a method is not stored, but computed on the fly (see Example 4, where the statistical values mean, variance, skewness, excess, minimum and maximum are computed on the fly). Hence, methods correspond to the notion of views: a view is a stored query, the result of which is computed on the fly based on stored information.

As can be seen in Example 2, searching in OODBs works similarly to SQL queries in RDBs, with Select-From-Where statements. Apart from the obvious difference that not only attributes, but also objects and attributes of objects can be used, the main difference is that the methods of objects can also be used in queries. Again, the values have to be computed on the fly. This has two consequences: first, query optimization in OODBs is complicated; the more complex the model, the harder query optimization becomes. Second, OODB systems are slower and less efficient than their RDB counterparts because of the overhead in storing objects and the increased complexity in interpretation.

Summary

OODBs are similar to RDBs, but they have the huge advantage that their attributes can be objects. Objects are collections of bits which are understood by the system without any further knowledge. This and also the fact that computations on the fly and the addition of arbitrary fields are possible make
OODBs very appealing for SDBs. But there are also many drawbacks. First, OODBs are used less frequently than RDBs; there are thus fewer tools, less support and fewer libraries available. Also, there are fewer good implementations and no established standards. Second, OODBs are dependent on and thus restricted by an OOL. Third, OODBs are less efficient and much slower than their RDB counterparts because of the overhead in storing objects and the increased complexity in interpretation. And fourth, OODBs become more and more obsolete, with ORM allowing for virtual OODBs in RDBs and thus the convergence of RDBs towards OODBs. Nevertheless, OODBs are a very attractive model for SDBs because of their flexibility.

Appendix 1: Codd's 12 rules

Rule 0: The system must qualify as relational, as a database, and as a management system. For a system to qualify as a relational database management system (RDBMS), that system must use its relational facilities (exclusively) to manage the database.

Rule 1: The information rule: All information in a relational database (including table and column names) is represented in only one way, namely as a value in a table.

Rule 2: The guaranteed access rule: All data must be accessible. This rule is essentially a restatement of the fundamental requirement for primary keys. It says that every individual scalar value in the database must be logically addressable by specifying the name of the containing table, the name of the containing column and the primary key value of the containing row.

Rule 3: Systematic treatment of null values: The database management system must allow each field to remain null (or empty). Specifically, it must support a representation of "missing information and inapplicable information" that is systematic, distinct from all regular values (for example, "distinct from zero or any other number", in the case of numeric values), and independent of data type. It is also implied that such representations must be manipulated by the
DBMS in a systematic way.

Rule 4: Active online catalog based on the relational model: The system must support an online, inline, relational catalog that is accessible to authorized users by means of their regular query language. That is, users must be able to access the database's structure (catalog) using the same query language that they use to access the database's data.

Rule 5: The comprehensive data sublanguage rule: The system must support at least one relational language that has a linear syntax, can be used both interactively and within application programs, and supports data definition operations (including view definitions), data manipulation operations (update as well as retrieval), security and integrity constraints, and transaction management operations (begin, commit, and rollback).

Rule 6: The view updating rule: All views that are theoretically updatable must be updatable by the system.

Rule 7: High-level insert, update, and delete: The system must support set-at-a-time insert, update, and delete operators. This means that data can be retrieved from a relational database in sets constructed of data from multiple rows and/or multiple tables. This rule states that insert, update, and delete operations should be supported for any retrievable set rather than just for a single row in a single table.

(Rules cited from http://en.wikipedia.org/wiki/Codd%27s_12_rules#The_rules on 18/10/2012.)

Rule 8: Physical data independence: Changes to the physical level (how the data is stored, whether in arrays or linked lists etc.)
must not require a change to an application based on the structure.

Rule 9: Logical data independence: Changes to the logical level (tables, columns, rows, and so on) must not require a change to an application based on the structure. Logical data independence is more difficult to achieve than physical data independence.

Rule 10: Integrity independence: Integrity constraints must be specified separately from application programs and stored in the catalog. It must be possible to change such constraints as and when appropriate without unnecessarily affecting existing applications.

Rule 11: Distribution independence: The distribution of portions of the database to various locations should be invisible to users of the database. Existing applications should continue to operate successfully when a distributed version of the DBMS is first introduced and when existing distributed data are redistributed around the system.

Rule 12: The nonsubversion rule: If the system provides a low-level (record-at-a-time) interface, then that interface cannot be used to subvert the system, for example by bypassing a relational security or integrity constraint.

Appendix 2: Comparison of frequently used OOLs

(As provided on the course website, http://www.cbrg.ethz.ch/education/SDB/languages.pdf, on 20/10/2012.)

                                      C++                                  Java
    Object names                      A, B                                 A, B
    Attribute names                   X, Y, Z                              X, Y, Z
    Variables                         a, b                                 a, b
    Functions                         f, g                                 f, g
    Object construction               A a( ); A *a = new A( )              A a = new A( )
    Type conversion                   (B)a                                 (B)a
    Object selection                  a.X                                  a.X, java.lang.reflect.*
    Polymorphic functions             f(a)                                 f(a)
    Polymorphic methods               virtual                              (all)
    Polymorphic operators             A::operator+( )                      (only predefined)
    Inheritance                       class A : public B, C                public class A extends B implements C
                                      (multiple inheritance,               (classes and interfaces)
                                      different protection levels)
    Generics/templates for classes    template <…> class B                 public class A<…>
    Generics/templates for functions  template <class A> A max(A a, A b)   public static <A> A max(A a, A b)
    Introspection                     yes                                  yes
    Reflection                        no                                   yes

                                      Python                               Darwin
    Object names                      A, B                                 A, B
    Attribute names                   X, Y, Z                              X, Y, Z
    Variables                         a, b                                 a, b
    Functions                         f, g                                 f, g
    Object construction               a = A( )                             a := A( )
    Type conversion                   B(a)                                 B(a)
    Object selection                  a.X, getattr(a, "X")                 a[X], a['X'], a[other] (computed value)
    Polymorphic functions             f(a)                                 f(a)
    Polymorphic methods               (all)                                A_f, A_B (converter)
    Polymorphic operators             a+b, c | set([1]), 5*'d',            a+b, c union 1, 5*d
                                      A.__add__(self, other):
    Inheritance                       class A(B):                          Inherit(A,B), ExtendClass(A,B,[name,type,def], ...)
    Generics/templates for classes    no                                   no
    Generics/templates for functions  no                                   all parameters generic
    Introspection                     yes                                  Introspection(A)
    Reflection                        yes                                  GetMethods(A)

Table 2: Overview of some OO languages.
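The Python entries for polymorphic operators can be tried out on a small counterpart of the Darwin Stat() structure from Example 4. The following is a hedged sketch, not Darwin's implementation: the class `Stat`, its field names, and the choice of `+=` for adding a sample and `|` for the union are assumptions made for illustration. Only running sums are stored; mean and variance are computed on the fly, so the union of two structures yields the statistics of the combined samples rather than a combination of previously computed values.

```python
import math
import random


class Stat:
    """Hypothetical running-statistics object; derived values are never stored."""

    def __init__(self, name):
        self.name = name
        self.n = 0
        self.total = 0.0
        self.total_sq = 0.0
        self.minimum = math.inf
        self.maximum = -math.inf

    def __iadd__(self, x):
        # Overloaded '+=': fold one more sample into the running sums.
        self.n += 1
        self.total += x
        self.total_sq += x * x
        self.minimum = min(self.minimum, x)
        self.maximum = max(self.maximum, x)
        return self

    def __or__(self, other):
        # Overloaded '|': union of two sample sets, built from the raw sums.
        merged = Stat(f"{self.name} and {other.name}")
        merged.n = self.n + other.n
        merged.total = self.total + other.total
        merged.total_sq = self.total_sq + other.total_sq
        merged.minimum = min(self.minimum, other.minimum)
        merged.maximum = max(self.maximum, other.maximum)
        return merged

    @property
    def mean(self):
        # Computed on the fly from the stored sums.
        return self.total / self.n

    @property
    def variance(self):
        # Unbiased sample variance from the running sums; adequate for a
        # sketch, although numerically weaker than a streaming update.
        return (self.total_sq - self.n * self.mean ** 2) / (self.n - 1)


c = Stat("first sample")
e = Stat("second sample")
for _ in range(1000):
    c += random.random()
    e += random.random()

u = c | e
print(u.name, u.n, u.mean, u.variance)
```

Because `mean` and `variance` are properties, they are accessed like attributes but evaluated at access time, which is the behaviour the notes compare to views.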