Database File System — An Alternative to Hierarchy Based File Systems Author: O Gorter
Supervisors: H Scholten, B van Dijk, P.G Jansen
Copyright ©2003, 2004 O Gorter
University of Twente, Computer Sciences Enschede, the Netherlands
Abstract
Foreword
This document is my graduation report for the University of Twente and is all about the Database File System (DBFS), my graduation project. Because this document is mainly written for the University, some content might not be relevant to all readers. For those only interested in what the DBFS is, I recommend reading the chapters “Introduction” and “Database File System — Overview”.
The document assumes the reader has a fair knowledge of current computer systems and related terms. As a reference, most terms are explained in the chapter “List of Used Terms and Abbreviations”; related research and documents are in the chapter “References”.
All of the research in this document is done from a user and user interface point of view. This is different from most research on file systems, and explains why, compared to those writings, seemingly important information is left out, while almost trivial points are discussed in depth.
For those who obtained this document digitally, it is available in two versions, one optimized for screen reading and one optimized for print (dbfs-screen.pdf and dbfs-paper.pdf respectively). The screen version has a 4:3 layout with a slightly larger font and is definitely recommended when reading from a monitor.
Thanks
For all their advice, support, help and faith, I would like to thank Hans Scholten; Betsy van Dijk; Pierre Jansen; all the others on the DIES group; my flatmates; and my family.
Also a special thanks goes out to all the usability testers, whom I will only mention by first name, but you know who you are.
Contents
Abstract
Foreword
1 Introduction
1.1 Relevance
2 Hierarchy Based File Systems — Theory
3 Database File System — Theory
3.1 Relevance
3.2 Categorisation
4 Database File System — Overview
4.1 User Interaction
5 Database File System — Internals
5.1 Server
5.2 Client
5.3 Graphical User Interface
6 Usability Testing
6.1 Objective
6.2 Method
6.3 Results
7 Conclusions
7.1 Future Work and Recommendations
References
Related Work
Related Software
Other References
List of Used Terms and Abbreviations
Assignment — Database File System
Dutch Abstract
C Test 1 — Email
D Test 2 — Arrange
E Interview 1 — Email Normal
F Interview 1 — Email Dbfs
G Interview 2 — Arrange
Introduction
File access and file management is something we do on our PCs every day; lots of computer time is spent browsing our directories and opening and saving files, or worse, finding files. The basis for this system was laid down over 30 years ago, and since graphical user interfaces became mainstream not much has changed. Yet computer hardware has become increasingly powerful, and limitations that once existed are no longer there. Still, the biggest change in our file interface is preview (thumbnail) rendering in the file manager.
A new file system can introduce better metaphors for working with files and can make use of advanced GUI techniques not available when hierarchical file systems came into use. It can bring the focus of the file system to the user, instead of the computer, and in doing so change how we think about files and the whole computer. It can be an enabler for a new and more up-to-date user-oriented computer interface.
And this is exactly what this research is about: trying to bring file management to the user. It does so by providing a search based file interface, based on file meta-data, and it introduces keywords in favor of directories. Being user oriented means only storing documents and not system files like shared libraries. Documents are all files the user is interested in; this can be an MS Word document, but also images, music and more.
Because the system searches and modifies meta-data, all meta-data can be treated equally, meaning that security, ownership and sharing are just as easy to manipulate as the filename or keywords. And without directories the system does away with locations; instead it can categorize documents in a more powerful way. Without locations on a file system, we can have applications that just save everything you do. The save button can be completely removed from every interface element of the computer, doing away with the dualistic nature of how we use a computer today and creating a system where there is no longer a difference between what you see on your screen and what is stored on the hard disk.
Relevance
In the “References” we see more works that try to extend the file system with searching. It is a very relevant idea, because lately it is sometimes easier to find things on the enormous internet, using Google, than on your own hard drive. You can always use a search, but it is slow and not very ubiquitous.
Hierarchy Based File Systems — Theory
This chapter is an overview of how today's (hierarchical) file systems work from a high level point of view. Where appropriate some forward references are given to the database file system. Four properties of hierarchy based file systems are discussed, with an emphasis on their weak points. But we start with a description of what a hierarchy based file system actually is.
Hierarchy based file systems are built from directories and files. A directory is an object which has a name and can contain directories and files. Files are objects that also have a name and contain data. This data can be anything and is only relevant for the application that uses it; the file system does not impose what the data must look like, in fact, this data is not shown in the file system. We use a file system in order to keep track of this data, by keeping track of our files. We keep track of our files by knowing their name and the directory in which they reside.
This type of file system creates a two dimensional space laid out as a tree like structure. This structure is created from directory names and depth of directories (also see Figure 2.1). By choosing useful names for directories, we effectively create a categorisation over sub-hierarchies (sub-trees) and the files they contain. For example, a digital picture from a certain user on an MS Windows system would typically be found somewhere below: /Documents and Settings/username/My Pictures/. We can trace back why the picture would be stored there: the picture is a document, therefore it should be in Documents and Settings; the picture is from ‘username’, therefore it should be in username; and it should be in My Pictures because it is a picture.
Figure 2.1 A typical MS Windows XP hierarchy
Properties of Hierarchy Based File Systems
URLs
Using directories and files, hierarchy based file systems create a unique name for every file, referred to as the path or, as we will do here, the URL. This is one of the strongest points of a hierarchy based file system. An example of a URL is /Documents and Settings/username/My Documents/University/Final Project/report.doc. A URL is a clear means of identifying one file, and one file only. Hierarchy based file systems are based on URLs, or create URLs, depending on your point of view. This property comes from the fact that inside a directory there can be only one object (file or directory) with a certain name. Otherwise a URL could point to two or more files at the same time, without a clear way to know which file it actually refers to. URLs are a feature lost in the database file system, at least to some extent. There is more about this in the discussion of the database file system, see “Database File System — Theory”.
It is important to note that URLs are very useful and part of the reason why hierarchy based file systems are designed the way they are. They give both computers and humans a way to refer to files uniquely. The next three properties to be discussed are very much connected to the URL creating property, but instead of being an advantage only, they are the properties that are the main motivation for a new file system.
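The uniqueness rule behind URLs can be illustrated with a small model. This is only a sketch: the `Directory` class and the `resolve` function are hypothetical names introduced here, and the model assumes nothing more than what the text states, namely that a directory holds at most one object per name.

```python
# Illustrative model only: a directory maps each name to exactly one object,
# which is why a path (the "URL" in this text) identifies at most one file.

class Directory:
    def __init__(self):
        self.entries = {}  # name -> Directory or file contents (bytes)

    def add(self, name, obj):
        if name in self.entries:
            raise FileExistsError(name)  # a second object with this name is refused
        self.entries[name] = obj
        return obj


def resolve(root, url):
    """Follow the names in a URL such as '/a/b/report.doc' from the root."""
    node = root
    for name in url.strip("/").split("/"):
        node = node.entries[name]
    return node


root = Directory()
final = root.add("University", Directory()).add("Final Project", Directory())
final.add("report.doc", b"draft")
```

Because every step of `resolve` is a lookup by name in a mapping, a URL can never be ambiguous; the price is that a second `report.doc` cannot be placed in the same directory.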
Hiding
Directories hide what is inside of them. Directories were designed to work this way; it is what keeps the file system tidy and arranged and therefore usable. But there is a downside: because directories hide what is inside of them, there could be an endless set of sub-directories inside a directory. There is no way of knowing what is inside a directory until you traverse into it. The consequence is that, if categorisation using directory names fails, or is unclear, a file can be hidden. There are tools to locate such ‘lost’ files, but whether they are able to help depends on the situation.
Locations
Directories create locations; this enables users to create meaningful locations to store certain files. But many directories means many locations, which might mean too many locations, which makes it hard to find the right location to store documents. Also, documents can be stored at only one location, meaning that you need to find the right location if you want to find a certain document.
To make matters worse, most systems don’t put emphasis on locations anymore; they emphasize the next property, hierarchy, instead. The problem here is that locations are easier (less abstract) to understand than hierarchy, especially for less computer literate people. Stressing locations can be done by not allowing users to create two windows on the same directory, opening sub-directories in new windows and presenting one directory always in the same window at the same location with the same layout. This way a user recognizes his documents directory not only by its URL, but also by its window layout and position.
The reason for the shift to hierarchy and away from locations is the ever growing size of the file system and the number of files we store. With too many locations it is hard to identify locations. Notably good systems that placed lots of emphasis on locations were Mac OS 9 (and below) and RISC OS. Earlier versions of Microsoft Windows also placed more emphasis on locations than they do today.
Hierarchy
[…] other way around. And where do we store programs only for one user? Clearly programs do not fall in the category Documents and Settings. So we cannot store them at Documents and Settings/username. Instead we need to create a new username categorisation somewhere else where we do store programs, like Program Files/username.
The problem of imposing hierarchy becomes even clearer when thinking about other file properties like security, ownership, encryption and sharing. If we want to share a file with another user, perhaps even over a network, most of the time we must move (or copy) the file to a public location (directory) and set the rights to the file correctly. Because the concepts of ownership and sharing work through the hierarchy, we need to create a different hierarchy to prevent all our files from becoming public all at once. Two hierarchies to store our files means keeping track of them becomes even harder. This principle of splitting the hierarchy based on some properties of what is inside the sub-hierarchies is not always effective; sometimes we have documents that easily fit inside both sub-hierarchies. Which one is the best hierarchy to place that file in? Probably neither. Properties of files just are not one dimensional, and how we would like to categorize these dimensions depends on our point of view. When sharing files, we want those files to be placed somewhere in the hierarchy where they are actually shared. But when trying to retrieve certain files, we would like them to be in the most logical place in the hierarchy. Unfortunately these two views on where the files should be stored are irreconcilable.
There is a good reason why categorizing our files with directories has its shortcomings. The properties over which we are trying to categorize are all different kinds of properties. In the example /Documents and Settings/username/My Documents/University/Final Project/report.doc, the first categorisation is made over the type of the file (Documents and Settings), the second categorisation is made over the owner of the file (username), then we categorize again over the type of file (My Documents), and finally we categorize twice over the role of our file. Only the last part of this categorisation is a truly hierarchical relationship.
A last issue with hierarchy is its abstract nature (also see the previous property, locations). If we want to keep our file system organized we must create hierarchies, preferably meaningful ones, in order to keep track of our documents. This is a difficult task for less computer literate people. It is hard to understand that a directory can contain a directory. The reason this is difficult is mostly that there is no real life example that has somewhat the same properties. A house contains a closet that contains a box that contains a photo album that contains a certain photo. It is not a hard drive that contains a box that contains a box that contains a box that contains a certain photo. And even when there are boxes inside other boxes, the first box would be a big box.
Database File System — Theory
In the previous chapter we discussed current (hierarchical) file systems. Four properties were analysed that are inherent to hierarchy based file systems. The final conclusion was that those file systems impose too much hierarchy without a choice; the hierarchy forces categorization over different properties that don’t have hierarchical relationships with each other. This chapter presents the idea of the database file system in a high level, non-technical overview, explaining the overall design and workings. The main differences with hierarchical file systems will be pointed out.
The DBFS does not impose hierarchy; it stores all files in one big data store, or database, hence the name Database File System. It stores files without any restrictions on the files; multiple files can be stored with the exact same meta-data. It is almost like storing all files in one directory, but without the need for unique names. To retrieve files, the big store of files can be reduced by telling the system what files to look for, like all files that were modified today, or all files called report. The queries on the system can include any sort of meta-data that is associated with files. This introduces a new powerful feature to a file system: you can retrieve files independent of the perspective you took when storing them.
A little example to explain this some more. Suppose you are looking for a file: if you remember you edited your file last week, you can look for all files edited last week; if you remember giving it a certain property, you can look for all files that have that property; if you remember you made someone else the owner of the file, you can look for all files owned by that owner; if you remember at least some part of the filename, you look for all files containing that part in their filename; or you can use any combination of the above to look for your file.
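The kind of retrieval described above can be sketched with an in-memory SQLite database. This is only an illustration under stated assumptions: the `files` table, its columns and the `find` helper are hypothetical and not taken from the DBFS itself; only the idea of combining arbitrary meta-data criteria comes from the text.

```python
import sqlite3

# Hypothetical schema: the text does not specify how the DBFS lays out its tables.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE files (id INTEGER PRIMARY KEY, name TEXT, owner TEXT, modified TEXT)")
db.executemany(
    "INSERT INTO files (name, owner, modified) VALUES (?, ?, ?)",
    [
        ("report.doc", "alice", "2004-03-01"),
        ("report.doc", "alice", "2004-03-01"),  # identical meta-data is allowed
        ("holiday.jpg", "bob", "2004-02-10"),
        ("report-notes.txt", "bob", "2004-03-02"),
    ],
)


def find(name_part=None, owner=None, modified_since=None):
    """Return ids of files matching every criterion that was supplied."""
    clauses, params = [], []
    if name_part is not None:
        clauses.append("instr(name, ?) > 0")
        params.append(name_part)
    if owner is not None:
        clauses.append("owner = ?")
        params.append(owner)
    if modified_since is not None:
        clauses.append("modified >= ?")
        params.append(modified_since)
    where = " AND ".join(clauses) or "1"
    rows = db.execute(f"SELECT id FROM files WHERE {where} ORDER BY id", params)
    return [row[0] for row in rows]
```

Calling `find(name_part="report")` uses only the remembered part of the filename, while `find(owner="bob", modified_since="2004-03-01")` combines two other perspectives; neither depends on where a file was "put" when it was saved.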
Because the DBFS does not use directories anymore, there are no more custom properties you can categorize files on. To reintroduce this the DBFS uses keywords: a file can have zero or more keywords, and the keywords can be used in a search. Keywords can be seen as the new directories. Keywords are a superset of directories in view of their capabilities; keywords can do what directories can and more. More on this later in this chapter.
From here on, the data store of files will be called a view, just like any subset of this store of files is called a view. A search or query will be called a filter. A view is created by a filter, and every view has a filter; basically, a filter defines the files you are looking at, hence a view. The reason to use filter rather than search or query is that search and query sound too much ‘single-shot’, though the terms are almost analogous.
Relevance
In comparison to a hierarchy based file system, the DBFS is much more powerful in how to store and retrieve files. But it does sacrifice the notion of URLs. The DBFS can produce URLs by using unique file identifiers, much like inodes, but not by using symbolic identifiers, like the path in a hierarchy based file system.
The DBFS can get away with this limitation because it serves a different goal than today's file systems. The DBFS is targeted at the user by only storing documents (i.e. files the user is interested in). You could see it as a document retrieval system. Consequently it does not store system files like shared libraries, configuration files and others. These files should be stored using separate APIs, for instance using a hierarchy based file system.
For the DBFS to perform optimally, it is not so much that files should be stored; instead documents should be stored. Lots of programs today use multiple files as one document. A few examples: an IDE uses multiple files as source and header files (and more), but all these files are related and form one document. Movies are often stored as multiple files, a part one and part two, and two subtitle files, one for each part; again, all files are related and form one document. Applications (especially under MS Windows) typically come with a whole bunch of files, but none of these files make sense unless in the context of the application; again, an application is one unit and should be treated as such.
Categorisation
The DBFS categorizes files on any property they have, which creates a multi-dimensional categorization, as opposed to hierarchy based file systems, which have a one-dimensional categorization applied multiple times. In the DBFS only some categorizations are hierarchical, namely where there are hierarchical relationships (types and keywords). In a hierarchy based file system every categorization is hierarchical, even if there is no hierarchical relationship. It is important to realize that pushing categorizations into a hierarchy decreases the usefulness of the categorization. The way the DBFS categorizes is called a faceted system.
With a simple example we can explain a faceted system and its power over a hierarchical system. Let's say we are looking at carrots and oranges. They share the properties that they are both edible and orange, but the first is a vegetable and the second is a fruit. Also both could be from Europe, but the first is probably from the Netherlands and the second from Spain. All these properties have no relationship between them: being orange has nothing to do with being a vegetable. Only Spain and Europe have a relationship, which is a hierarchical relationship because Spain is part of Europe.
In a faceted system we can create a categorization on both the carrot and the orange in a very natural way, such that we can ask the system for an orange vegetable and see a carrot, or ask for a vegetable from the Netherlands and see a carrot. But in a hierarchy based categorization there is a fixed order of the properties, and only when we traverse this order can we know about the properties of an object. If the first categorization is on fruit or vegetable, then it is impossible to retrieve all edible things from Europe. Or if we are making an orange salad, it is impossible to retrieve all orange edible things.
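The carrot and orange example can be written down as a tiny faceted lookup. The data structures and the names `ITEMS`, `PART_OF`, `matches` and `select` are made up for this illustration; the sketch only assumes what the text states, that facets are independent and that only the origin facet has a hierarchical relationship.

```python
# Illustrative data only: each item carries independent facets.
ITEMS = {
    "carrot": {"colour": "orange", "kind": "vegetable", "origin": "Netherlands"},
    "orange": {"colour": "orange", "kind": "fruit", "origin": "Spain"},
}

# The only hierarchical relationship in the example: a country is part of a continent.
PART_OF = {"Netherlands": "Europe", "Spain": "Europe"}


def matches(value, wanted):
    """True if value equals wanted or lies below it in the PART_OF hierarchy."""
    while value is not None:
        if value == wanted:
            return True
        value = PART_OF.get(value)
    return False


def select(**facets):
    """Return the names of items whose facets satisfy every requested value."""
    return sorted(
        name
        for name, props in ITEMS.items()
        if all(matches(props.get(facet), wanted) for facet, wanted in facets.items())
    )
```

Here `select(colour="orange", kind="vegetable")` yields the carrot, and `select(origin="Europe")` yields both items, whichever facet one happens to start from. No facet has to come "first", which is exactly what a fixed directory order cannot offer.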
The main difference between a hierarchy based system and a faceted system (like the DBFS) is that hierarchy based systems are made to store things in some (reasonably) logical location, whereas a faceted system is made to categorize and find things. Hierarchy based systems are what we use in the physical world to categorize things. In a supermarket, oranges would be stored in the fruit department, and carrots in the vegetable department. But the fruit and vegetable departments are in the biological food department, a hierarchical ordering on the role of the product.
The reason we use such a system is that physical objects can only reside in one place at a time, whereas this limitation does not hold for virtual objects, like files. So there is no reason to limit a file system to a hierarchical system, when a faceted system is more powerful and could be considered a super-set of hierarchical systems. This is why the beginning of this chapter stated that keywords are like a super-set of directories: a location in a hierarchical system is defined by its elements in the hierarchy, and using these same elements to query a faceted system yields the same results.
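The claim that directory elements used as keywords give the same results can be checked on a small example. The paths and helper names below are hypothetical, and the sketch carries one assumption that should be stated loudly: the equivalence shown holds as long as the same directory name is not reused at different places in the tree.

```python
# Illustrative only: treat every element of a path as a keyword on the file.
PATHS = [
    "/Documents and Settings/username/My Pictures/beach.jpg",
    "/Documents and Settings/username/My Documents/University/report.doc",
    "/Documents and Settings/username/My Documents/University/Final Project/report.doc",
]


def keywords_of(path):
    """Directory names of a path, used as an unordered set of keywords."""
    return frozenset(path.strip("/").split("/")[:-1])


def below_directory(directory):
    """Hierarchical lookup: every file stored in or below the directory."""
    prefix = directory.rstrip("/") + "/"
    return [p for p in PATHS if p.startswith(prefix)]


def with_keywords(*wanted):
    """Faceted lookup: every file carrying all of the wanted keywords."""
    return [p for p in PATHS if set(wanted) <= keywords_of(p)]
```

Querying with all elements of a location, for instance `with_keywords("Documents and Settings", "username", "My Documents", "University")`, returns the same files as `below_directory` on that location. The faceted form additionally allows `with_keywords("University")` on its own, without knowing the elements above it.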
Database File System — Overview
In the previous chapter we discussed the theory behind the DBFS. In this chapter a high-level overview is given of the current implementation used in this research. It will start at the bottom and end at the GUI that has been implemented in KDE.
The DBFS has been implemented as a daemon service for Unix-like systems, which integrates a SQL library and accepts connections from clients. The clients are the open-file and save-file dialogs in the open-source desktop environment KDE, together with a standalone file manager, called KDBFS, which replaces Konqueror. Running this setup of KDE gives the user the impression that there is no hierarchy based file system, only the new database file system.
The daemon service is called dbfsd and runs in the background. It does not actually store files; it only stores references to files on the hierarchy based file system. The dbfsd tries to work together with the underlying hierarchy, such that a high level of backwards compatibility is achieved. In the current implementation it only supports a few pieces of meta-data: file-name, file-type, file-size, modification-date and keywords. The server is only meant to serve one user, but every user can run his own instance.
The dbfsd can be configured using the dbfs/dbfs.conf file in the user's home directory. The main purpose of this file is to tell the server what directories to scan and where certain new files go, according to their file-type. It can also be set to ignore certain directories or files. Which mime-type to use for which file extension is configured in dbfs/mime.conf. A log goes to dbfs/dbfsd.log and the actual database is written in dbfs/db.db. The next chapter will go much deeper into its implementation.
Figure 4.1 Overview of the new KDE (diagram showing KDE, dbfsd, file access, and the hierarchical file system)
What the user sees when using the KDE implementation from this research is a normally functioning system, until the user accesses an open-file or save-file dialog. These fundamentally differ because they use the DBFS. But because the dbfsd does not actually store files, only their references, the user might see a different file system while a KDE application sees and uses normal files as if there was no DBFS. This is important because the KDE applications do not need to change in order to work with the DBFS (also see Figure 4.1).
User Interaction
Figure 4.2 The KDBFS application. The number 1 is the view, numbers 2 through 5 are filters.
Whenever the user manipulates a filter, the view follows the filter immediately, providing direct feedback. And because the view updates in the background, the user can continue to manipulate the filters, even when thousands of files show up. The user can manipulate how the view is rendered using the few buttons just above the view, which toggle the zoom level; overall layout; and sorting on name, date or size. Right next to these buttons is a search field (number 5 in Figure 4.2), which searches the filename by manipulating a file-name filter. Files in the view can also be renamed.
Filters
Just above we already mentioned the name-filter, which is implemented as a search field. All other filters the KDBFS offers are implemented as widgets, located next to the view. These filter-widgets can be hidden or shown by the buttons at the very top of the application. The current implementation has only three of these filter-widgets: a general main-type widget (numbers 2 and 3 in Figure 4.2); a keyword widget (number 4); and a date widget (not shown in figure).
The general main-type widget has two functions. First, it can select on one or more of the main file-types there are in the system, like documents or images and more (number 2). But it also supports saving the current filter (number 3), which means that the user can save a view he created and quickly retrieve that view, without having to click around to recreate the accompanying filter. Moreover, after using a stored view, the rest of the filters can be used to create sub-views on the stored view.
The date widget can select a date range, which will select all files that have a modification date inside that range. Unfortunately the current implementation is not an optimal one; it is just two calendars on which the user can click. A more optimal widget would display one calendar, which should be zoomable, and ranges could be created by clicking and dragging on the calendar. Also, it is not possible to select on creation date or last access date.
The keyword widget is probably the most important one, because it supports user defined categorisations. The user can create new keywords, and rename or delete existing ones. The user can also drag keywords around to create hierarchical relationships between keywords. If a keyword is selected, the view will show all files which have that keyword associated with them. Multiple keywords can be selected, and the view will show all files which have at least one of the selected keywords associated with them (an OR operation). When a keyword is selected that has multiple keywords beneath it in a hierarchical relation, the created filter will be as if all the keywords beneath and the selected keyword had been selected.
[…] inverted mode. In this mode the whole filter is appended with a NOT. This is useful when categorizing lots of files, because the files already categorized will disappear from the view.
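The selection rules of the keyword widget (OR across selected keywords, expansion to the keywords beneath a selected one, and the NOT of inverted mode) can be sketched as follows. The keyword tree, the file list and the function names are hypothetical; this is an illustration of the described behaviour, not the KDBFS code.

```python
# Hypothetical data: keyword -> parent keyword (None for top-level keywords).
PARENT = {"University": None, "Final Project": "University", "SOAT": "University", "Photography": None}

# Hypothetical data: file name -> set of keywords attached to it.
FILES = {
    "report.doc": {"Final Project"},
    "notes.txt": {"University"},
    "beach.jpg": {"Photography"},
    "untagged.txt": set(),
}


def expand(keyword):
    """The keyword itself plus every keyword beneath it in the hierarchy."""
    found = {keyword}
    changed = True
    while changed:
        changed = False
        for child, parent in PARENT.items():
            if parent in found and child not in found:
                found.add(child)
                changed = True
    return found


def keyword_filter(selected, inverted=False):
    """Files with at least one selected keyword (OR); inverted applies a NOT."""
    wanted = set()
    for keyword in selected:
        wanted |= expand(keyword)
    hits = {name for name, kws in FILES.items() if kws & wanted}
    return sorted(set(FILES) - hits if inverted else hits)
```

With every top-level keyword selected and `inverted=True`, only `untagged.txt` remains, which matches the use described above: files that have already been categorized drop out of the view.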
The behaviours of the filter-widgets are very natural, but were chosen quite arbitrarily during the development of the KDBFS application. The DBFS supports much more powerful filters than can be created using the widgets described above, but in order to keep the system simple, this implementation has been chosen. To really get a feel for how the filters work together, the reader should be able to click around in the application himself. More on the usability of the DBFS and the GUI can be found in Chapter 6.
Dialogs
The new open-file dialog is quite similar to the KDBFS application, except that most buttons to manipulate the keywords and the view have been removed, keeping the focus of the dialog on opening files and not on manipulating file meta-data. For KDE applications that tell the dialog which file types they can open, the dialog displays only those files by setting an appropriate filter. This can be disabled using a little checkbox located at the bottom of the dialog. Also see Figure 4.3.
The new save-file dialog is completely different from the original. There is no need for an extensive dialog, because the DBFS does not use locations. The user can enter a name, optionally add a keyword or keywords to the new file, and press Save. For those KDE applications that tell the dialog what file-type to save to, the user can leave out the extension. Also see Figure 4.4.
It is unfortunate that KDE is not very focused on meta-data, and not all applications tell the dialogs what types they can open, or what type they will use to save. This can be confusing because the DBFS save-file dialog only asks for a filename, not a file type (in the form of a file extension). Happily the KOffice suite fully supports file-types when saving or opening. But when, for instance, Konqueror saves a file from the internet, it does not relay its type to the dialog, and the user has to manually append an extension to the name. This shortcoming is not permanent, and will be resolved if KDE gets better file-type support or works together with the DBFS more.
Figure 4.4 A save-file dialog; the keyword field can be toggled on or off
Database File System — Internals
Readers not interested in the technical design, internals and implementation of the DBFS can safely skip this chapter. In fact, the sections containing code can also be safely skipped by those not interested in that much detail.
The DBFS has been designed to be client-server oriented, where a client is a user interface to the DBFS, and the server is responsible for all the housekeeping and does all the work. The motivation behind this design is that users should never have to press refresh. Clients register views with the server, and from there on the server knows what a client is looking at. If the view of the client needs to be updated, the server tells the client to do so. Updates are necessary when a client sets a new filter for a view, but also when another client renames a file, or does any other meta-data manipulation. This scenario implies two other design aspects: the communication between the client and server is asynchronous, and the server and the client are both multi-threaded.
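The "never press refresh" idea can be sketched as a server that remembers registered views and calls back whenever a change affects them. This is a deliberately simplified, single-process illustration: the `Server` class, its methods and the callback signature are assumptions made here, and the sockets, asynchronous protocol and threads of the real design are left out.

```python
# Minimal single-process sketch; the real system uses a network protocol and threads.

class Server:
    def __init__(self, files):
        self.files = dict(files)      # file id -> name (hypothetical meta-data)
        self.views = []               # registered views: [filter, callback, current ids]

    def register_view(self, filter_fn, on_update):
        view = [filter_fn, on_update, set()]
        self.views.append(view)
        self._refresh(view)

    def rename(self, file_id, new_name):
        self.files[file_id] = new_name
        for view in self.views:       # any change may affect any registered view
            self._refresh(view)

    def _refresh(self, view):
        filter_fn, on_update, old = view
        new = {fid for fid, name in self.files.items() if filter_fn(name)}
        if new != old:                # only notify clients whose view changed
            view[2] = new
            on_update(sorted(new))


events = []
server = Server({1: "report.doc", 2: "holiday.jpg"})
server.register_view(lambda name: name.endswith(".doc"), events.append)
server.rename(2, "holiday.doc")   # another client's rename reaches this view
server.rename(1, "report.txt")
```

After these calls `events` holds one entry per change that touched the view. The client never asks again after registering; the server initiates every update, which is why the communication has to be asynchronous.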
The DBFS is mainly written in O’Caml, but on the client side there are different APIs to interface with the system. There are four low-level APIs: for O’Caml, C, C++, and Objective-C. There is also a high-level API for KDE that includes widgets and controllers.
Figure 5.1 Overview of the DBFS (components shown: Server, Client, Configuration, Initialization, Database, Database File System, Operating System)
Server
The server has two main responsibilities. First, it fills and keeps track of all the views clients have registered with it, and sends updates to the clients that need them. Second, it keeps in sync with the underlying hierarchy based file system, where it renames and deletes files when necessary.
The server keeps a SQL database (see “Data Querying”) of all the files the user is interested in. A crawler module in the server fills and updates the database with files from the underlying file system. When a client creates a view, the filter that accompanies the view is translated into a SQL query. Every time the filter or the database is changed, the SQL query is run against the database. A set of files is created from the results returned by the database; this set is compared to the old set, and any differences (added or removed files) are transmitted to the client. The same mechanism is used for meta-data like keywords and custom stored filters.
Clients connect to the server using either TCP/IP or Unix domain sockets. After a connection is established, a protocol defines the communication between the server and the client (see “Protocol and Views”). As mentioned before, this protocol is completely asynchronous, to allow either the server or the client to initiate communication.
Synchronizing with the Hierarchical File System
The server keeps in sync with the underlying file system using a crawler module. This module periodically indexes the configured directories and all their subdirectories; any new file is added to the database, and an already existing file is updated so that, for example, its modification date stays in sync.
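The indexing step can be pictured as a recursive directory walk. The sketch below is not the DBFS crawler; the function name crawl is an assumption, and the code only collects paths, leaving out the database update. It uses only the standard library and does not guard against symbolic link cycles or files that disappear during the walk.

```ocaml
(* Hypothetical sketch of a recursive directory walk; not the DBFS crawler. *)

(* Collect the paths of all non-directory entries below [dir]. *)
let rec crawl dir =
  Array.fold_left
    (fun acc entry ->
       let path = Filename.concat dir entry in
       if Sys.is_directory path then crawl path @ acc
       else path :: acc)
    []
    (Sys.readdir dir)
```

A periodic crawler would call such a function for every configured directory and compare the result with what is already stored in the database.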
The synchronisation also goes the other way around: when the user renames a file using the DBFS, the server will rename that file on the underlying file system. Because there is no need for unique file names in the DBFS but there is on the regular file system, the server uses a special scheme that appends a number to the filename when conflicts arise. Suppose the DBFS contains two files called report, which are both MSWord documents and are both stored in the same directory on the underlying file system. The first file will be called report.doc and the second report-1.doc. A third file will be called report-2.doc, and so on.
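The numbering scheme can be sketched as follows. The helper unique_name and its exists parameter are hypothetical names introduced for this illustration; the actual implementation in the server may differ.

```ocaml
(* Hypothetical helper, not taken from the DBFS sources. *)

(* [unique_name exists base ext] returns "base.ext" if that name is free,
   otherwise the first free name among "base-1.ext", "base-2.ext", ... *)
let unique_name exists base ext =
  let candidate n =
    if n = 0 then Printf.sprintf "%s.%s" base ext
    else Printf.sprintf "%s-%d.%s" base n ext
  in
  let rec search n =
    let name = candidate n in
    if exists name then search (n + 1) else name
  in
  search 0

(* Two reports already exist, so the third one gets the suffix -2. *)
let taken = ["report.doc"; "report-1.doc"]
let third = unique_name (fun name -> List.mem name taken) "report" "doc"
```

Passing the existence check as a function keeps the sketch independent of the file system; a real server could pass a check against the target directory instead.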
When the DBFS is used to save a file, the server uses the file-type to determine where to save the file in the underlying file system, so that in a standard configuration MSWord documents will be stored in Documents. The same scheme as discussed above is used to create a unique file name.
It is rather important to keep in sync with the underlying system using the scheme just discussed, because it makes it possible to use the DBFS while maintaining full backward compatibility, as has been done during this research using KDE.
Files and Filters
First, we will discuss two fundamental types in the system: files and filters. After that we will continue on to see how the major processes take place inside the server.
But before we start, a few notes. From here on there will be some code blocks in the text. Because the implementation of the DBFS is in O’Caml, not many readers will be familiar with the language used in these code blocks. Still, reading through these blocks and their accompanying texts should give a general idea of how the server is implemented and how the data flows through the system. Therefore the author encourages readers to just read on.
Whenever a code block corresponds to an implementation file, that file will be mentioned. A complete listing of the source files can be found in Appendix H.
Files are represented by the following O’Caml record type (defined in common/file.ml):
type file = {
  fid: int; (* unique *)
  version: int; (* increment when file info changes, caching optimization *)
  name: string;
  date: float;
  size: int;
  file: string;
  mime: mime;
};;
Here fid is a globally unique identifier for a file, much like an inode on a Unix file system. The rest of the fields are pretty much self-explanatory, except for the version field, which is explained below.
In the DBFS there are two notions of a file. The first is the entity represented by the file record, which is a representation of meta-data. The second is a file as a piece of data, referenced by a URL using the underlying hierarchical file system (stored in the file field of the file record). This distinction is rather faint, as the files used in the DBFS are basically a wrapper around the files stored on the underlying hierarchical file system. Still, the reader should be aware of this distinction at this level.
The version field is used as a caching optimization: every time a property of the file changes, the version is incremented. This is useful because files are processed using FileSets, which are binary trees holding files, implemented using the O’Caml Set module. The files are indexed over a total ordering, which uses the file’s fid and version. The end result is that we can compare FileSets with each other to see what has changed without making this an expensive operation. This is important, as we will see later in “Protocol and Views”.
(* module that can order files *)
module File_ord = struct
  type t = file
  let compare f1 f2 =
    let c = f1.fid - f2.fid in
    if c = 0 then f1.version - f2.version else c
end

(* ordered set of files module *)
module FileSet = Set.Make(File_ord);;
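To see why ordering on both fid and version helps, consider the following self-contained sketch. It uses a reduced file record (only fid, version and name, an assumption made to keep the example short) and the standard compare function instead of subtraction; the function changes is a hypothetical name. A renamed file gets a new version, so it no longer compares equal to its old entry and shows up in both set differences.

```ocaml
(* Reduced version of the file record, so the sketch is self-contained. *)
type file = { fid : int; version : int; name : string }

module File_ord = struct
  type t = file
  let compare f1 f2 =
    let c = compare f1.fid f2.fid in
    if c = 0 then compare f1.version f2.version else c
end

module FileSet = Set.Make (File_ord)

(* Entries only in [old_set] were removed or are outdated versions;
   entries only in [new_set] were added or are the updated versions. *)
let changes old_set new_set =
  (FileSet.diff old_set new_set, FileSet.diff new_set old_set)

let old_set =
  FileSet.of_list
    [ { fid = 1; version = 0; name = "report" };
      { fid = 2; version = 0; name = "photo" } ]

(* File 1 is renamed, which bumps its version; file 2 is untouched. *)
let new_set =
  FileSet.of_list
    [ { fid = 1; version = 1; name = "thesis" };
      { fid = 2; version = 0; name = "photo" } ]

let removed, added = changes old_set new_set
```

Here removed holds the old entry for file 1 and added holds its new entry, while the unchanged file 2 appears in neither set.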
A filter is represented by the following O’Caml type (defined in common/filter.ml):
type filter =
  | Empty
  | All
  ...
  | And of filter * filter
  | Or of filter * filter
A filter that selects every file that has the keyword university and is an MSWord document is constructed as follows:
let filter = And (Type ("application", "msword"), Keyword "university");;
But filters can also be parsed from strings:
let filter = filter_of_string
  "type \"application\" \"msword\" and keyword \"university\"";;
This results in a filter identical to the first one.
Filters can be converted into strings using string_of_filter, and converted to a (partial) SQL query using sql_of_filter, which we will see in action in “Data Querying”.
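As a rough idea of what such a translation could look like, here is a minimal sketch over a reduced filter type. It is not the DBFS implementation: the Keyword constructor is left out because its SQL form is not shown in the text, the helper quote is hypothetical, and mapping Empty to a condition that never holds and All to one that always holds is an assumption.

```ocaml
(* Reduced filter type for illustration; the real type has more constructors. *)
type filter =
  | Empty
  | All
  | Type of string * string
  | And of filter * filter
  | Or of filter * filter

(* Double any single quote so the value is safe inside an SQL string literal. *)
let quote s =
  "'" ^ String.concat "''" (String.split_on_char '\'' s) ^ "'"

(* Translate a filter into the condition part of a WHERE clause. *)
let rec sql_of_filter = function
  | Empty -> "0" (* assumed: selects nothing *)
  | All -> "1"   (* assumed: selects everything *)
  | Type (base, special) ->
    "base=" ^ quote base ^ " and special=" ^ quote special
  | And (a, b) -> "(" ^ sql_of_filter a ^ ") and (" ^ sql_of_filter b ^ ")"
  | Or (a, b) -> "(" ^ sql_of_filter a ^ ") or (" ^ sql_of_filter b ^ ")"
```

The parentheses around each operand keep the precedence of and and or explicit in the generated condition.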
Protocol and Views
When a client starts communicating with the server, a view is created for this client. A view has three components: a FileSet, which holds all the files in the current view; two sets of meta-data, KeywordSet and CustomSet, representing the keywords and the custom stored queries in the system; and a filter, which is used to select the files for the FileSet from the database (see “Data Querying”).
After this is set up, the client typically activates the asynchronous mode of communication by calling Set_callback, and sets a new filter using Set_filter. In the meantime the server responds to Set_callback by transmitting the current set of files (using Files) and the keywords and custom stored queries (using Keywords and Customs, respectively). These protocol elements are implemented using O’Caml types and are communicated using the Marshal module from O’Caml. The following type lists what a client can transmit to the server (implemented in common/protocol.ml):
(** server command type, client sends these to server *)
type server_command =
  | Set_filter of filter
  | Get_files
  | Add_file_with_keywords of string * keywordlist
  | Change_file of file
  | Delete_file of file
  | Get_keywords
  | Add_keyword of keyword
  | Delete_keyword of keyword
  | Change_keyword of keyword
  | Set_keywords_to_files of keywordlist * filelist
  | Add_keyword_with_files of keyword * filelist
  | Remove_keywords_from_files of filelist * filelist
  | Get_customs
  | Add_custom of custom
  | Delete_custom of custom
  | Change_custom of custom
  | Set_callback
  | Remove_callback
  | Set_incremental
  | Set_no_incremental
  | Read_dir of string
And the client receives these types from the server:
(** client command type, server send these to client, mostly as a response *)
type client_command =
  | Files of FileSet.t
  | Keywords of KeywordSet.t
  | Customs of CustomSet.t
  | Added_file of string
  ...
  | Ok
The actual transmission is done using functions like read_server_command:
let read_server_command inc =
(Marshal.from_channel inc : server_command) ;;
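A complete round trip through the Marshal module could look like the sketch below. The command type is reduced to two constructors, write_server_command is an assumed counterpart that does not appear in the text, and a temporary file stands in for the socket. Note that Marshal performs no type checking, so both ends have to agree on the type being transmitted.

```ocaml
(* Reduced command type for illustration; the real server_command has
   many more constructors. *)
type server_command =
  | Get_files
  | Read_dir of string

(* Assumed counterpart of read_server_command. *)
let write_server_command outc (cmd : server_command) =
  Marshal.to_channel outc cmd [];
  flush outc

let read_server_command inc =
  (Marshal.from_channel inc : server_command)

(* Round trip through a temporary file standing in for the socket. *)
let round_trip cmd =
  let path = Filename.temp_file "dbfs" ".bin" in
  let outc = open_out_bin path in
  write_server_command outc cmd;
  close_out outc;
  let inc = open_in_bin path in
  let received = read_server_command inc in
  close_in inc;
  Sys.remove path;
  received
```

The type annotation on the result of Marshal.from_channel is what tells the compiler which type to expect.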
The read_server_command function reads from an input channel and yields a server_command. This all comes together in the main loop of the server, where it listens for incoming commands and responds to the client accordingly (implemented in server/server.ml):
method run =
  let rec loop () =
    ...
  try loop () with
    ... -> info ~file:"server.ml" "client disconnected";
           _view#remove_observer (self :> 'v observer)
And here we can see why we use FileSets as a store for our files. Instead of sending all files to the client as a response to Set_filter, the server sends only the removed and added files (self#send_command (Updated_files (r, a))). How these files are acquired is discussed in the next section.
Data Querying
The server uses the SQLite SQL database to store and query the files in the system. The schema for the database is shown in Figure 5.2.
Figure 5.2 Database schema as used by the DBFS (tables shown: files, keywords, customs, crawler)
The following filter selects all the MSWord documents (type "application" "msword"):
let filter = filter_of_string("type \"application\" \"msword\"");;
This is translated into the following SQL query and processed in the database like this:
let sql = sql_of_filter (filter);;
>>> sql = "base='application' and special='msword'"
let new_files = files_of_query ("SELECT * FROM files WHERE " ^ sql);;
^
warning ~file:"files.ml" ("Error (" ^ g ^ "); q); FileSet.empty; ;
After the new FileSet is constructed, it is compared to the old FileSet, and the updated files, if any, are sent to the client (see “Protocol and Views”), as seen in the following code (implemented in server/view.ml):
let new_files = files_of_query ("SELECT * FROM files WHERE " ^ q) in
let inter = FileSet.inter new_files _files in
let removed = FileSet.diff _files inter in
let added = FileSet.diff new_files inter in
This process of filling the view is triggered by two events: when the client sets a new filter for the view (using Set_filter), or when any client changes the database. In both cases the server will respond with an Updated_files if the view has changed.
Server Management
In this section we will discuss a few constructs used inside the server implementation that are important enough to be mentioned here.
Throughout the code we can see Mutex.lock and Mutex.unlock, which are used to make the server thread-safe on database access and communications. This is needed because simultaneous access to the database can lead to incorrect results, and simultaneous access to the sending mechanisms can lead to incorrect data being transmitted.
We also see constructs like debug ~file:"database.ml" "initializing", which are logger commands. The logger is implemented in common/log.ml and can write its log to arbitrary output descriptors. A log level can be set to exclude messages from the log. By default only warning and fatal messages are logged.
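A logger with this behaviour could be organised roughly as follows. This is a sketch under assumptions, not the contents of common/log.ml: the level type, the references log_level and log_channel, the function log and the output format are all hypothetical; only the ~file label and the command names debug, info and warning follow the calls shown in the text.

```ocaml
(* Hypothetical logger sketch; the real one lives in common/log.ml. *)
type level = Debug | Info | Warning | Fatal

(* Numeric severity, used to compare levels. *)
let severity = function Debug -> 0 | Info -> 1 | Warning -> 2 | Fatal -> 3

let label = function
  | Debug -> "debug" | Info -> "info" | Warning -> "warning" | Fatal -> "fatal"

(* Messages below this level are dropped; warning is the default. *)
let log_level = ref Warning

(* Output channel of the log; any out_channel can be used. *)
let log_channel = ref stderr

(* Log [msg] from source file [file]; returns true if it was written. *)
let log level ~file msg =
  let wanted = severity level >= severity !log_level in
  if wanted then
    Printf.fprintf !log_channel "[%s] %s: %s\n%!" (label level) file msg;
  wanted

let debug ~file msg = ignore (log Debug ~file msg)
let info ~file msg = ignore (log Info ~file msg)
let warning ~file msg = ignore (log Warning ~file msg)
let fatal ~file msg = ignore (log Fatal ~file msg)
```

With the default level, a call such as debug ~file:"database.ml" "initializing" produces no output, while setting log_level to Debug makes it visible.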
When an object inside the server also represents a thread, it inherits from a thread class (implemented in common/threadobjects.ml). This provides basic thread management, including shutting threads down on command. The object only needs to re-implement the run method, which will be called after the thread is started; the thread stops when the run method exits.
The server accepts a few command line options and reads the rest from a configuration file. These configuration variables can be accessed using various data access functions like string_of_config or bool_of_config (implemented in common/config.ml). These options are initialized in the init module of the server (implemented in server/init.ml), and the configuration file is read using the parse_files function.
Client
As we have seen in the previous section, the server does all the hard work. The client only mirrors the view, while the server keeps the view updated. The main task of the client is to expose the internals of the system as a set of easily accessible programming functions.
This is why there are four APIs for the DBFS. Of these, the C-like APIs have been implemented as shared libraries. Including a header file and linking to its library is enough to implement a DBFS client. Because the client is only a thin client, we will not discuss it further here. A programmer's reference and a programming example are distributed with the DBFS implementation.
Graphical User Interface
The main interface that has been implemented for this research is KDE based. The implementation files for the DBFS client are all inside the KDE library libkio (in /kdelibs/kio/kfile/). All programs that use open-file or save-file dialogs link against this library, and will automatically use the new dialogs after a recompile. (The recompile is necessary due to a few awkward design decisions in the original KDE implementation, which make the new implementation source-compatible but not binary-compatible.) The widgets and support objects created for the DBFS are also implemented in libkio, and are used by the dialogs and the KDBFS application.
Following are a few important parts of the KDE DBFS implementation. Again, some code is given, this time assuming basic knowledge of C++ and the Qt and KDE programming constructs.
DBFS Library Interface Handling
The DbfsHandler (implemented in /kdelibs/kio/kfile/dbfshandler.cpp) is the main entry point of KDE into the DBFS. It takes care of the threaded routines and transforms asynchronous requests into Qt events, such that the GUI can take care of them. The viewHasChanged() and metaHasChanged() functions are the callbacks that a dbfs::DbfsViewChangedHandler object from the libdbfs_cpp library must implement to handle the threaded events.
// low level file stuff
void DbfsHandler::customEvent(QCustomEvent* e) {
    ...
The DbfsHandler listens to its own custom events and transforms them into Qt signals. Other objects can listen for the events by connecting their slots to these signals.
The DbfsHandler is also responsible for initializing the DBFS library. It is implemented as a factory, because the DBFS library should be initialized only once in the lifetime of an application. An instance of the DbfsHandler should be obtained using the static getInstance and deleted when the object is no longer needed; the implementation will take care of handling the internals.
DBFS Dialogs
The implementation of the new DBFS dialogs in KDE is somewhat of a hack, so that it can support full source compatibility. The setup can best be described as a proxy: the original KFileDialog class now proxies all its functions to either the KDbfsOpenDialog or the KDbfsSaveDialog. Part of the problem is that the original dialogs are not split into a save and an open dialog; instead the class’s functionality changes with a parameter that puts the dialog in either saving or opening mode (or another mode, used for whatever reason). This proxy mechanism can be seen in the following implementation (implemented in /kdelibs/kio/kfile/kfiledialog.cpp):
int
KFileDialog::exec()
{
    kdDebug() << " exec " << endl;
    if (NULL != _wOpenDialog) return _wOpenDialog->exec();
    if (NULL != _wSaveDialog) return _wSaveDialog->exec();
    kdDebug() << "creating a new KDbfsOpenFileDialog" << endl;
    _wOpenDialog = new KDbfsOpenFileDialog(this, _sFilter, _wParent, _sName, _bModal, _wWidget);
    return _wOpenDialog->exec();
}